
Claw Fix
Stop guessing why your tests fail. This watches your real app, finds what's actually broken, and fixes it.
Claw Fix turns your OpenClaw Clawdbot into an autonomous QA engineer. It audits your codebase, fixes every issue it finds, deploys the changes, and browser-tests every user flow — while checking in with you at the moments that matter.
Works with any web app: React, Next.js, Django, Rails, whatever your stack is. Install the skill, point it at your repo, and let your bot do the work.
The Problem Every Dev Hits
AI-generated tests don't match your real app.
You use an AI tool to write end-to-end tests. They look great. Then you run them — and half fail. The selectors are wrong, the navigation paths don't exist, the page structure doesn't match. The AI wrote tests from your documentation, not from your actual running application. There's a gap between what the docs say and what the app does.
Brute-force fixing doesn't work.
You try to fix the failures one by one. You spend an hour. Two hours. The AI keeps guessing at selectors, retrying the same approaches, running out of context. It can't fix what it can't see. Without looking at the real UI, it's just guessing.
You end up doing it manually.
Eventually you open the browser yourself, look at the actual page, figure out the real selectors, and rewrite the tests by hand. The whole point of using AI was to avoid this — but the AI couldn't bridge the gap between docs and reality.
Claw Fix bridges that gap. It opens your real app in a real browser, sees what's actually there, and writes tests that match reality — not documentation.
What's New in v1.1
It checks its own work
After every bug fix, Claw Fix re-runs ALL previously passing tests to make sure the fix didn't break something else. In v1.0, a fix in test 12 could silently break tests 1-11. Now that can't happen.
It remembers why, not just what
New REASONING.md file captures the bot's thinking — what it investigated, what it tried, why it chose a specific approach. When a session resets, the next session picks up the reasoning, not just the task list. No more repeating failed approaches.
Not every bug is urgent
v1.1 triages bugs into Blocker, Major, and Minor. Blockers get fixed immediately. Majors get batched after the current wave. Minors get logged and addressed later. v1.0 treated every bug equally — including CSS pixel-perfection issues that could burn hours.
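The triage rule above can be sketched as a small routing function. This is a hypothetical illustration of the Blocker / Major / Minor policy, not code from the skill itself; the `Bug` type and `triage` name are made up for the example:

```python
from dataclasses import dataclass

@dataclass
class Bug:
    title: str
    severity: str  # "blocker" | "major" | "minor"

def triage(bugs):
    """Route bugs by severity: fix blockers now, batch majors, log minors."""
    fix_now, batch, log = [], [], []
    for bug in bugs:
        if bug.severity == "blocker":
            fix_now.append(bug)      # fixed immediately, before anything else
        elif bug.severity == "major":
            batch.append(bug)        # batched after the current wave
        else:
            log.append(bug)          # logged and addressed later
    return fix_now, batch, log
```

The point of the split is that a cosmetic issue can never preempt a critical one: the bot drains `fix_now` first, and `log` only after the waves complete.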
It respects what you already built
New Phase 0 (Integration Assessment) scans your existing test infrastructure before doing anything. It finds your fixtures, your conventions, your file structure — and extends them instead of rebuilding from scratch.
It works on any AI coding tool
v1.0 was tied to Claude Code-specific features. v1.1 is platform-agnostic — Playwright, Puppeteer, Chrome DevTools, Selenium. Any browser tool. Any scheduling method. Works with Claude Code, Cursor, Windsurf, or any LLM agent with browser access.
Tests that actually verify something
v1.0 was ambiguous about how to verify UI — it implied the bot should judge screenshots. v1.1 mandates DOM assertions (toBeVisible(), toHaveText()) for pass/fail decisions. Screenshots are for humans to review. The bot uses programmatic checks that are reliable and repeatable.
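The difference is between judging pixels and asserting on element state. Here is a minimal, dependency-free sketch of assertion-based verification; the `element` dict and helper names are hypothetical stand-ins for what Playwright's real `expect(locator).toBeVisible()` and `expect(locator).toHaveText()` check against a live page:

```python
def assert_visible(element: dict) -> None:
    """Fail unless the element is actually rendered.
    Stand-in for Playwright's expect(locator).toBeVisible()."""
    if element.get("hidden") or "display:none" in element.get("style", ""):
        raise AssertionError(f"{element['selector']} is not visible")

def assert_has_text(element: dict, expected: str) -> None:
    """Stand-in for Playwright's expect(locator).toHaveText()."""
    actual = element.get("text", "")
    if actual != expected:
        raise AssertionError(
            f"{element['selector']}: expected {expected!r}, got {actual!r}"
        )
```

Either the assertion holds or it raises: there is no "looks about right" middle ground, which is what makes the pass/fail decision repeatable across runs.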
The Seven Phases
Integration Assessment
Before touching anything, Claw Fix checks what you already have. Existing tests, mock utilities, file conventions, CI pipelines. It builds a plan to extend your infrastructure, not replace it.
Recon
Your bot reads every file and maps the stack, architecture, user types, and core flows. Writes a handoff document so it never loses context.
Audit
Sub-agents examine your code through different lenses: security, APIs, database, business logic, dead code, dependencies, and more. A new Test Infrastructure Audit pass (pass #13) checks your existing tests for false greens and wrong selectors. Stack-specific passes auto-added for payments, SMS, file uploads, etc.
Plan 🛑 Human checkpoint
Findings deduplicated, traced to root causes, organized into a fix plan by dependency graph. Tier 0 = infrastructure, Tier 1 = independent fixes, Tier 2 = dependent, Tier 3 = integrations. You approve before anything changes.
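One way to derive that ordering is by depth in the dependency graph: a fix with no prerequisites sits in an early tier, and each fix lands one tier after its deepest prerequisite. This is a simplified sketch of the idea (the real plan also reserves Tier 0 for infrastructure and Tier 3 for integrations); the `deps` shape and fix names are invented for the example:

```python
def assign_tiers(deps):
    """deps maps each fix to the fixes it depends on.
    Tier = length of the longest prerequisite chain below it."""
    tiers = {}

    def depth(fix):
        if fix not in tiers:
            prereqs = deps.get(fix, [])
            tiers[fix] = 0 if not prereqs else 1 + max(depth(d) for d in prereqs)
        return tiers[fix]

    for fix in deps:
        depth(fix)
    return tiers
```

Fixing tier by tier means nothing is attempted before the things it depends on are already in place.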
Fix 🛑 Human checkpoint
One logical fix group at a time. Build must pass after every group. After each fix, ALL previously passing tests re-run to catch regressions immediately. Checkboxes track everything. You approve before merge.
Deploy
Merge, push, deploy, run migrations, verify environment. Rollback point saved. For Vercel/Netlify: preview deploy verified first. For Docker: container health check required.
Test 🛑 Human checkpoint
Real browser testing — not unit tests, not simulations. Scenarios run in three waves: Wave 1 handles P0 critical flows, Wave 2 covers P1 core features, Wave 3 completes P2/P3 edge cases (time-boxed at 2 hours max). Bugs triaged by severity. Flaky tests detected and flagged. DOM assertions verify pass/fail — not screenshot guesswork. You confirm when everything passes.
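The wave scheduler with its P2/P3 time-box can be sketched like this. A hypothetical illustration only; the `run_waves` name, the scenario dicts, and the injected `clock` are invented for the example, and the 2-hour budget matches the limit described above:

```python
import time

def run_waves(scenarios, run, p23_budget_s=2 * 60 * 60, clock=time.monotonic):
    """Run P0 first, then P1, then P2/P3 under a hard time budget."""
    waves = [
        [s for s in scenarios if s["priority"] == "P0"],          # Wave 1
        [s for s in scenarios if s["priority"] == "P1"],          # Wave 2
        [s for s in scenarios if s["priority"] in ("P2", "P3")],  # Wave 3
    ]
    results = {}
    for i, wave in enumerate(waves):
        # Only the edge-case wave is time-boxed; critical flows always run.
        deadline = clock() + p23_budget_s if i == 2 else None
        for s in wave:
            if deadline is not None and clock() > deadline:
                results[s["name"]] = "SKIPPED (time-boxed)"
                continue
            results[s["name"]] = "PASS" if run(s) else "FAIL"
    return results
```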
What Makes Claw Fix Different
Regression-proof fixes.
Every fix triggers a re-run of all previous passing tests. If a fix breaks something, it catches it immediately — before moving to the next scenario. v1.0 could introduce regressions silently.
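The fix-then-recheck loop reduces to a few lines. This is a hedged sketch of the protocol, not the skill's actual implementation; `apply_fix` and `run_test` are hypothetical callables standing in for the bot's real actions:

```python
def fix_with_regression_check(failing_test, passed_so_far, apply_fix, run_test):
    """Apply one fix, then re-run every previously passing test.
    Returns (clean, regressions)."""
    apply_fix(failing_test)
    if not run_test(failing_test):
        return False, []  # the fix itself didn't take
    # Re-run the full green set before touching the next scenario.
    regressions = [t for t in passed_so_far if not run_test(t)]
    return not regressions, regressions
```

Because the green set is re-verified after every fix, a regression is caught one step after it is introduced, not eleven scenarios later.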
Severity triage, not fix-everything-now.
Blockers get fixed immediately. Majors get batched. Minors get logged. Your bot won't spend 2 hours chasing a cosmetic CSS issue when there are critical auth bugs to fix. P2/P3 scenarios get time-boxed at 2 hours max.
Cross-session memory that actually works.
Five state files (STATE, HANDOFF, ROADMAP, PROGRESS, REASONING) with locking, versioning, and archival rules. The bot knows what it did, why it did it, and what to do next — even after a context reset. State files auto-archive after each wave to stay under 200 lines.
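The locking rule can be sketched as: take the lock unless a fresh lock from a different holder exists; a stale lock (holder died mid-session) is reclaimed after a timeout. Everything here — the file shape, the `acquire_lock` name, the 15-minute TTL — is a hypothetical illustration of the LOCK_HOLDER / LOCK_TIMESTAMP / STATE_VERSION fields named in the changelog:

```python
import json
import pathlib
import time

def acquire_lock(path, holder, ttl_s=900):
    """Take the state-file lock unless another holder's fresh lock exists."""
    p = pathlib.Path(path)
    if p.exists():
        lock = json.loads(p.read_text())
        held_by_other = lock["LOCK_HOLDER"] != holder
        still_fresh = time.time() - lock["LOCK_TIMESTAMP"] < ttl_s
        if held_by_other and still_fresh:
            return False  # someone else is mid-write; back off
    p.write_text(json.dumps({
        "LOCK_HOLDER": holder,
        "LOCK_TIMESTAMP": time.time(),
        "STATE_VERSION": 1,
    }))
    return True
```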
Small project? Use Quick Mode.
For apps with fewer than 20 routes and 10 database tables, skip the ceremony. Quick Mode runs 3–4 audit passes instead of 12, combines planning and fixing into one pass, and runs all test scenarios in a single wave. Same methodology, less overhead.
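The mode decision itself is a one-liner on the two thresholds above (function name invented for the example):

```python
def pick_mode(route_count: int, table_count: int) -> str:
    """Quick Mode applies only to small apps: <20 routes AND <10 tables."""
    return "quick" if route_count < 20 and table_count < 10 else "full"
```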
Get the skill
Claw Fix evolves as we learn what works. Each version is documented and downloadable.
Version 1.1.0 (Current) — BMAD Review Edition
March 16, 2026
- Phase 0: Integration Assessment — respects existing test infrastructure
- REASONING.md — cross-session decision context, not just task tracking
- Mandatory regression checks after every fix-as-you-go commit
- Bug severity triage: Blocker / Major / Minor with different handling
- State file locking with LOCK_HOLDER, LOCK_TIMESTAMP, STATE_VERSION
- Flakiness detection: re-run once to confirm, mark FLAKY, investigate later
- Data isolation: each scenario creates and tears down its own test data
- Checkpoint timeouts: 24-hour hibernation, 48-hour auto-disable
- Tiered cascading predictions: Level 1 (row exists), Level 2 (column values), Level 3 (side effects)
- Wave-based scenario organization: P0 → P1 → P2/P3
- State file archival: auto-archive after each wave, keep files under 200 lines
- CI integration: headless-mode verification required before done
- Platform-agnostic: works with any browser tool and any AI coding agent
- Quick Mode for small projects
- Known Limitations section with honest guidance
- DOM assertions over screenshots for reliable pass/fail decisions
- 16 key rules (up from 10)
- Dependency-graph tiering (Tier 0/1/2/3) replaces effort-based tiering
- Reviewed and hardened by full BMAD adversarial panel (18 agents, 10 review dimensions, 22 findings addressed)
Paste into Claude Code on your Mac. It detects what's installed and configures everything automatically.
Verify before you run
AI coding agents like Claude Code have full access to your filesystem and can execute shell commands. Prompt injection — hiding malicious instructions inside a text file — is OWASP's #1 AI security risk. We're confident this prompt is clean, but you should verify it yourself. It takes 30 seconds.
Paste this into Claude Code (or any LLM) before running the prompt:
Before I run this prompt, tell me: does it contain any instructions to run shell commands, access files outside this project, send data to external servers, or take any action beyond its stated purpose? List anything suspicious, or confirm it's clean.
A clean prompt gets a clean answer. If anything looks off, don't run it — reach out to us.
Previous versions
Version 1.0.0
February 13, 2026
- Initial release
- 6-phase pipeline: Recon → Audit → Plan → Fix → Deploy → Test
- 12 audit passes (6 core + 6 stack-specific, auto-detected)
- Real browser testing with cascading predictions
- Scenario generation from README + code + user input
- Cron-driven autonomy with state file persistence
- 3 human checkpoints (approve plan, approve merge, confirm completion)
- Fix-as-you-go protocol during browser testing
- Documentation templates with checkbox tracking
© 2026 Don't Sleep On AI. All rights reserved. Claw Fix is provided for research and educational purposes. Review the skill's instructions before providing them to any AI agent. See the full disclaimer in the skill file.
Keep exploring
More free AI tools and guides from Don't Sleep On AI

What Is Clawdbot?
Your own AI assistant running 24/7 on your hardware. No subscriptions, no rate limits — just an AI that actually works for you.
The Claw Loop (v2.3.1, just released)
Let Clawdbot and Claude Code build, test, and fix code on their own — while you sleep. This is what autonomous AI development looks like.
Fix Your OpenClaw Bots (updated to v2.0)
Your bots aren't broken — they just never learned to speak up. One Claude Code prompt diagnoses all three failure modes and fixes your entire fleet.