
Letting Claude Refactor

Refactors have no symptoms: you can't see when Claude gets it wrong. Three guardrails (tests, atomic commits, reading the diff), plus a manual click-through.


Refactoring is the task where Claude is most dangerous. A bug has symptoms (the button does nothing, a value is undefined, a stack trace appears), so you can tell whether Claude's fix worked. A refactor has none. "It still runs" can just mean "the tests still pass" while behavior has silently changed; you don't notice, and production blows up a week later.

I recently did a fairly big refactor with Claude on how2claude. I migrated the x402 crypto payments off a hand-rolled PaymentHandler + FacilitatorClient (139 lines) onto the x402-rails gem, and at the same time extracted the Purchase.create! / Subscription.create! field mapping, which was duplicated across two controllers, into model class methods. One commit: 4 files changed, 2 deleted, 2 added.

My prompt was one word: "refactor."

It could be that short because there were guardrails around it.

Guardrail #1: Never refactor without tests

By the time this branch reached the refactor, it had 221 tests. All the critical paths in the payment flow were covered.

Claude's default move before a refactor is not "look at the tests first." So I told it to run bin/rails test first, confirm it's green, and only then touch anything.

After the refactor, run it again. Still green. That doesn't mean there are no regressions; it means the behavior the tests already cover isn't broken.

If your code path has no test coverage, have Claude write a minimal test that locks the current behavior in place. Run it, commit it. Then refactor. Otherwise what Claude is doing isn't refactoring, it's rewriting, and you have no way to verify equivalence.
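Applied to this refactor, a characterization test for the old controller mapping might look like the sketch below. It is plain Minitest so it runs outside Rails; `legacy_purchase_attributes` is a hypothetical stand-in for the hash the controllers used to build inline, and every field except `wallet_address` (read from `"payer"`) is an assumption. In the real app you would exercise the controller action through a Rails integration test instead.

```ruby
require "minitest/autorun"

# Hypothetical stand-in for the hash a controller passed to Purchase.create!
# before the refactor. Only wallet_address <- "payer" comes from the article;
# product_id and status are invented for illustration.
def legacy_purchase_attributes(verify_result, product_id:)
  {
    wallet_address: verify_result["payer"],
    product_id: product_id,
    status: "paid"
  }
end

# Characterization test: it asserts what the code does *today*, right or
# wrong, so any behavior change during the refactor turns it red.
class LegacyPurchaseMappingTest < Minitest::Test
  VERIFY_RESULT = { "payer" => "0xabc" }.freeze

  def test_locks_current_field_mapping
    expected = { wallet_address: "0xabc", product_id: 42, status: "paid" }
    assert_equal expected, legacy_purchase_attributes(VERIFY_RESULT, product_id: 42)
  end
end
```

Commit this before the refactor; afterwards, point the same expectation at the new model class method. If the expected hash has to change, the refactor changed behavior.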

Guardrail #2: Make it split the change into atomic commits

This refactor was really two things:

  1. x402 backend: hand-rolled → gem
  2. Purchase / Subscription field mapping: controller → model class methods

There was a third, on the frontend: rewriting the JS-side signing flow with viem + x402-fetch.

I made Claude split on natural boundaries: backend + model extraction in one commit (9f3e239), frontend in a separate one (93746d8). Each commit carries a full description, file list, and reasoning.

Benefits:
- Diffs stay readable. One commit, one thing.
- Rollback granularity. If production hits a frontend bug, git revert 93746d8 rolls back the frontend and leaves the backend in place.
- Claude's own attention stays focused. While it works on one commit, it only has to reason about that one change.

Guardrail #3: Read the diff before calling it done

Once the refactor is complete, I make Claude stop and show me git diff --staged. Don't run the tests. Don't run the app. Read the diff first.

Signals I scan for:

  • What did it delete? app/services/x402/payment_handler.rb is gone entirely. Fine, that's the point of migrating to the gem. But if it deletes something I didn't ask it to touch, I stop and ask.
  • Did the field mapping change? Purchase.create!(wallet_address: verify_result["payer"], ...) became Purchase.record_x402!(payment:, settlement:), which now reads payment[:payer]. The source changed (the gem's request.env instead of the old client's return value), but the fields have to map one-to-one.
  • "Along the way" changes. Claude loves to fix stuff that "looks wrong" while refactoring: rephrasing error messages, renaming variables, extracting a method it thinks should exist. Watch for these. A refactor's promise is behavior equivalence, and a drive-by change breaks that promise.
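The one-to-one check on the field mapping can be made mechanical: feed the same data through the old and new code paths and diff the resulting attribute hashes. A minimal plain-Ruby sketch, with hypothetical names throughout: `controller_attributes` stands in for the old inline hash, `PurchaseSketch.x402_attributes` stands in for what `Purchase.record_x402!` would hand to `create!`, and the string-key vs symbol-key shapes and the `tx_hash`/`transaction` fields are illustrative assumptions, not the gem's documented payload.

```ruby
# Old path (before the refactor): the controller read string keys off the
# hand-rolled client's results. Only wallet_address <- "payer" comes from the
# article; tx_hash and status are hypothetical.
def controller_attributes(verify_result, settle_result)
  {
    wallet_address: verify_result["payer"],
    tx_hash: settle_result["transaction"],
    status: "paid"
  }
end

# New path: a model-level class method fed by the gem's data (symbol keys).
# Hypothetical stand-in for Purchase.record_x402!, minus the create! call.
module PurchaseSketch
  def self.x402_attributes(payment:, settlement:)
    {
      wallet_address: payment[:payer],
      tx_hash: settlement[:transaction],
      status: "paid"
    }
  end
end

# Lists every key whose value differs or that exists on only one side.
# An empty result means both mappings produced the same attributes.
def mapping_diff(old_attrs, new_attrs)
  (old_attrs.keys | new_attrs.keys).reject do |key|
    old_attrs.key?(key) && new_attrs.key?(key) && old_attrs[key] == new_attrs[key]
  end
end

old_attrs = controller_attributes({ "payer" => "0xabc" }, { "transaction" => "0x123" })
new_attrs = PurchaseSketch.x402_attributes(
  payment: { payer: "0xabc" },
  settlement: { transaction: "0x123" }
)
p mapping_diff(old_attrs, new_attrs) # => []
```

An empty array means the two paths agree for that input. A renamed key or a "tidied-up" value shows up as a non-empty diff, which is exactly the drive-by change the diff review is hunting for.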

Two things Claude got wrong this time

Trap 1: The gem's Stimulus controllers silently didn't load

The x402-rails gem ships its own Stimulus controllers. Claude wrote the code, tests went green. I manually clicked the pay button — nothing happened.

Reason: config/importmap.rb had a pin for @hotwired/stimulus pointing at a vendor file that didn't exist, and importmap silently dropped the pin. The gem's controllers never loaded. Tests can't catch this because bin/rails test doesn't execute JS.
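Tests still can't click buttons, but this particular failure (a pin whose file doesn't exist) can at least be smoke-checked. The sketch below is hypothetical and is not importmap-rails' own resolution logic: it evaluates an importmap.rb-style file with a stub `pin` method and looks for each target only in the directories you pass in. Real importmap-rails also resolves files from gem-provided asset paths, so those directories would need to be included, and a clean result still doesn't prove the controllers load in the browser.

```ruby
require "tmpdir"

# Hypothetical smoke check, NOT importmap-rails' resolver. It evaluates an
# importmap.rb-style file with stub DSL methods, records each pin, and looks
# for the pinned file in directories you supply.
class PinCollector
  Pin = Struct.new(:name, :target)

  attr_reader :pins

  def initialize
    @pins = []
  end

  # Same call shape as `pin "name", to: "file.js"`; options like preload:
  # are accepted and ignored. Assumes the default target is "<name>.js".
  def pin(name, to: nil, **_options)
    @pins << Pin.new(name, to || "#{name}.js")
  end

  # Directory pins are out of scope for this sketch.
  def pin_all_from(_dir, **_options); end
end

# Returns the names of local pins whose file is not found in any search dir.
# Remote (http/https) pins are skipped.
def missing_pins(importmap_source, search_dirs)
  collector = PinCollector.new
  collector.instance_eval(importmap_source)
  missing = collector.pins.reject do |pin|
    pin.target.start_with?("http://", "https://") ||
      search_dirs.any? { |dir| File.exist?(File.join(dir, pin.target)) }
  end
  missing.map(&:name)
end

Dir.mktmpdir do |vendor|
  File.write(File.join(vendor, "turbo.min.js"), "")
  source = <<~RUBY
    pin "@hotwired/turbo-rails", to: "turbo.min.js"
    pin "@hotwired/stimulus", to: "stimulus.min.js"
    pin_all_from "app/javascript/controllers", under: "controllers"
  RUBY
  p missing_pins(source, [vendor]) # => ["@hotwired/stimulus"]
end
```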

Trap 2: YAML parsed 0x... as an integer

wallet_address: 0x833589... with no quotes. YAML saw the 0x prefix and read the value as a hex integer. The facilitator received a number instead of a string and rejected it. Claude didn't stop to think about YAML parsing rules while writing the config.
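The gotcha reproduces in a few lines of Ruby with Psych, Ruby's standard YAML parser. The sketch uses a placeholder address (not the one from this config) and a hypothetical `fetch_string!` helper that fails loudly at load time instead of letting the facilitator reject the payment later.

```ruby
require "yaml"

# Placeholder 40-hex-digit value, not a real wallet.
ADDRESS = "0x" + "0" * 38 + "ff"

# Unquoted: Psych resolves a 0x-prefixed plain scalar as a hex Integer.
unquoted = YAML.safe_load("wallet_address: #{ADDRESS}")
# Quoted: the same characters stay a String.
quoted = YAML.safe_load(%(wallet_address: "#{ADDRESS}"))

p unquoted["wallet_address"]          # => 255
p quoted["wallet_address"] == ADDRESS # => true

# Hypothetical fail-fast helper (not from any library): raise at boot if a
# value that must be a string was parsed as something else.
def fetch_string!(config, key)
  value = config.fetch(key)
  unless value.is_a?(String)
    raise TypeError, "#{key} must be a String, got #{value.class}; quote it in the YAML"
  end
  value
end

fetch_string!(quoted, "wallet_address") # passes; the unquoted hash would raise
```

Quoting the value in the YAML file is the actual fix; the guard only turns a silent type change into an error you see at startup.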

Both traps were caught because I clicked the actual button. Tests passing is not the same as the feature working. Manual verification after a refactor isn't optional.

The full Claude-refactor flow

  1. Run the whole test suite. Confirm green. If the target code has no coverage, have Claude write a minimal test that locks the current behavior, then commit that.
  2. Say what surface you want extracted. "Refactor" works when the duplication is obvious; when it isn't, say "extract the field mapping from X into class methods on Y."
  3. Split into atomic commits. One commit, one thing.
  4. Read the diff yourself. Check what got deleted, whether the field mapping holds, and whether Claude made any drive-by changes.
  5. Run the tests again. Green.
  6. If it touches a user-visible path, click through the feature manually. Layers that tests can't reach (JS, importmap, CDN, YAML parsing) you have to see with your own eyes.

Refactoring is the highest-risk "let Claude drive" scenario. The guardrails aren't for Claude. They're for you, so that when Claude gets something wrong, you catch it in 5 minutes instead of in a production fire.