Refactors have no symptoms — you can't see when Claude gets it wrong. Three guardrails: tests, atomic commits, manual click-through.
Refactoring is the task where Claude is most dangerous. A bug has symptoms (button does nothing, value is undefined, stack trace), so you can tell whether Claude's fix worked. A refactor has none. "It still runs" may only mean the tests still pass: behavior can change silently, you don't notice, and production blows up a week later.
I recently did a fairly big refactor with Claude on how2claude: migrated the x402 crypto payments off a hand-rolled PaymentHandler + FacilitatorClient (139 lines) onto the x402-rails gem, and at the same time extracted the Purchase.create! / Subscription.create! field mapping — duplicated across two controllers — into model class methods. One commit, 4 files changed, 2 deleted, 2 added.
My prompt was one word: "refactor."
It could be that short because there were guardrails around it.
By the time this branch got there, it had 221 tests. All the critical paths in the payment flow were covered.
Claude's default move before a refactor is not "look at the tests first." So I told it to run bin/rails test first, confirm green, then touch anything.
After the refactor, run it again. Still green. That doesn't mean no regressions — it means known behavior isn't broken.
If your codepath has no test coverage, have Claude write a minimal test that locks current behavior in place. Run it, commit it. Then refactor. Otherwise what it's doing isn't refactoring, it's rewriting — and you have no way to verify equivalence.
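A behavior-locking test for the duplicated field mapping might look like the sketch below. It uses plain Minitest so it runs outside Rails; in the app it would presumably live under test/ and subclass ActiveSupport::TestCase. legacy_purchase_attrs is a hypothetical stand-in for the attributes the controller passed to Purchase.create!. Only the payer → wallet_address mapping comes from the real code; the network field is an assumed placeholder.

```ruby
require "minitest/autorun"

# Hypothetical stand-in for the mapping the controller did inline before the
# refactor. Only wallet_address <- verify_result["payer"] is from the real
# code; the :network field is an assumed placeholder.
def legacy_purchase_attrs(verify_result)
  {
    wallet_address: verify_result["payer"],
    network: verify_result["network"]
  }
end

# Characterization test: it asserts what the code does today, not what it
# ought to do. Commit it green before refactoring.
class PurchaseMappingCharacterizationTest < Minitest::Test
  def test_maps_verify_result_onto_purchase_attributes
    verify_result = { "payer" => "0xabc", "network" => "testnet" }

    attrs = legacy_purchase_attrs(verify_result)

    assert_equal "0xabc", attrs[:wallet_address]
    assert_equal "testnet", attrs[:network]
  end

  def test_missing_payer_maps_to_nil
    assert_nil legacy_purchase_attrs({})[:wallet_address]
  end
end
```

After the refactor, the same assertions get pointed at the new model method. If they have to change to stay green, the refactor changed behavior.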
This refactor was really two things:
- x402 payments: hand-rolled PaymentHandler + FacilitatorClient → x402-rails gem
- Purchase / Subscription field mapping: controller → model class methods
There was a third, on the frontend: rewriting the JS-side signing flow with viem + x402-fetch.
I made Claude split on natural boundaries: backend + model extraction in one commit (9f3e239), frontend in a separate one (93746d8). Each commit carries a full description, file list, and reasoning.
Benefits:
- Diffs stay readable. One commit, one thing.
- Rollback granularity. If production hits a frontend bug, git revert 93746d8 rolls back the frontend while leaving the backend.
- Claude's own attention stays focused. One commit, one thing — and its attention covers only that thing.
Once the refactor is complete, I make Claude stop and show me git diff --staged. Don't run the tests. Don't run the app. Read the diff first.
Signals I scan for:
- app/services/x402/payment_handler.rb gone entirely. Fine, that's the point of migrating to the gem. But if it deletes something I didn't ask it to touch, I stop and ask.
- Purchase.create!(wallet_address: verify_result["payer"], ...) → Purchase.record_x402!(payment:, settlement:) now reads payment[:payer]. The source changed (gem's request.env vs the old client's return value), but the fields have to map one-to-one.
Trap 1: The gem's Stimulus controllers silently didn't load
The x402-rails gem ships its own Stimulus controllers. Claude wrote the code, tests went green. I manually clicked the pay button — nothing happened.
Reason: config/importmap.rb had a pin for @hotwired/stimulus pointing at a vendor file that didn't exist, and importmap silently dropped the pin. The gem's controllers never loaded. Tests can't catch this because bin/rails test doesn't execute JS.
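A rough way to catch a dangling pin without opening a browser is to scan config/importmap.rb for pins with a to: target and check that each file exists. The sketch below is a plain-text check, not part of importmap-rails: missing_pin_targets and its regex are hypothetical, and it assumes pinned files live in the directories you pass in. Assets supplied by gems would need their own paths added, or they show up as false positives.

```ruby
require "tmpdir"
require "fileutils"

# Matches lines like:  pin "name", to: "file.js"
PIN_WITH_TARGET = /^\s*pin\s+["']([^"']+)["']\s*,.*\bto:\s*["']([^"']+)["']/

# Returns the names of pins whose to: file is not found in any search dir.
# Pins without to:, and pins that point at a URL, are skipped.
def missing_pin_targets(importmap_source, search_dirs)
  importmap_source.each_line.filter_map do |line|
    name, target = line.match(PIN_WITH_TARGET)&.captures
    next if name.nil? || target.start_with?("http")

    name unless search_dirs.any? { |dir| File.exist?(File.join(dir, target)) }
  end
end

# Demo against a throwaway directory: one pin resolves, one dangles.
Dir.mktmpdir do |root|
  vendor = File.join(root, "vendor/javascript")
  FileUtils.mkdir_p(vendor)
  File.write(File.join(vendor, "application.js"), "")

  source = <<~RUBY
    pin "application", to: "application.js"
    pin "@hotwired/stimulus", to: "stimulus.min.js"
  RUBY

  p missing_pin_targets(source, [vendor]) # prints ["@hotwired/stimulus"]
end
```

In the app this could run inside a regular test that reads config/importmap.rb and asserts the result is empty. It doesn't replace clicking the button, but it would turn this particular silent failure into a red test.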
Trap 2: YAML parsed 0x... as an integer
wallet_address: 0x833589... with no quotes. YAML saw the 0x prefix and read it as a hex integer. The facilitator received a non-string and rejected it. Claude didn't stop to think about YAML parsing rules while writing the config.
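The parsing behavior is easy to reproduce with Ruby's bundled YAML parser (Psych). The sketch below uses a made-up placeholder address, not the real wallet. validate_wallet_address! is a hypothetical guard you could call when loading config; the 40-hex-digit format it checks is an assumption about what the facilitator expects.

```ruby
require "yaml"

# Placeholder address for illustration only: "0x" followed by 40 hex digits.
address = "0x" + "0" * 38 + "ff"

unquoted = YAML.safe_load("wallet_address: #{address}")
quoted   = YAML.safe_load("wallet_address: \"#{address}\"")

p unquoted["wallet_address"].class # Integer: the unquoted 0x scalar became a number
p quoted["wallet_address"].class   # String: quotes keep it as text

# Hypothetical guard: fail fast at load time instead of at the facilitator.
def validate_wallet_address!(value)
  return value if value.is_a?(String) && value.match?(/\A0x\h{40}\z/)

  raise ArgumentError,
        "wallet_address must be a quoted 0x... string, got #{value.inspect}"
end

validate_wallet_address!(quoted["wallet_address"]) # passes
```

Passing the unquoted value to the guard raises ArgumentError, which turns the facilitator's rejection into an error at boot.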
Both traps were caught because I clicked the actual button. Tests passing is not the same as the feature working. Manual verification after a refactor isn't optional.
Refactoring is the highest-risk "let Claude drive" scenario. The guardrails aren't for Claude. They're for you — so that when Claude gets something wrong, you catch it in 5 minutes instead of in a production fire.