Nearly all 1,562 lines of TopicReview's test suite were written by Claude; I only reviewed. After two-plus weeks live, the follow-up commits patch gaps rather than rewrite tests. This post covers why testing is the ideal delegation target, what to look for and what to ignore in review (including a real missing edge case in the actual spec), and the default setup for this split.
The last post closed with "119 specs green" for TopicReview. The real next question: who wrote those tests?
Answer: Claude wrote nearly all 1,562 lines of test code. I only reviewed. Over two-plus weeks in production, maintenance has meant only adding new tests, never rewriting old ones.
This post is about why tests are the single best thing to delegate to Claude, what to look at and what to ignore when reviewing, and how far this split actually goes.
TopicReview's tests span 7 files:
spec/services/topic_review_service_spec.rb    760 lines (88 tests)
spec/requests/topic_reviews_spec.rb           281 lines (32 tests)
spec/requests/review_appeals_spec.rb          152 lines (16 tests)
spec/requests/review_votes_spec.rb            127 lines
spec/policies/topic_review_policy_spec.rb     109 lines
spec/jobs/close_topic_review_job_spec.rb       71 lines (7 tests)
spec/models/topic_review_spec.rb               62 lines
─────────────────────────────────────────────────────────
                                            1,562 lines
Four main kinds of tests: service (business logic), request (controller + integration), policy (Pundit authz), job (scheduled jobs), plus one small model spec.
The initial commit d162f1e ships with Co-Authored-By: Claude Sonnet 4.6, landing 1,100+ of those lines in one go. Every spec-related commit that followed either adds a test or makes a small mechanical update; not one refactors or rewrites an existing test:
00393fc Add test for finalize! with zero votes (expired review)
3f53304 Add test for finalize! with legacy votes missing reasoning
3b185da Update specs to use PROVISIONAL_PENALTY constant
Patching gaps, not rework. This detail matters — I come back to it.
Four hard reasons:
1. Inputs and outputs are explicit. A test is "given this state → expect this behavior." That's Claude's strongest translation task: turning a spec into an assertion. Business code sometimes has to weigh trade-offs; tests don't.
2. Mechanical × high volume. One describe .open! has to cover "has eligible jurors / no eligible jurors / no topic / already-active review" — four contexts, each with 2–5 it blocks. Humans start cutting corners by the third context. Claude writes the 88th it with the same care as the first.
3. Extremely short feedback loop. Write a test, run rspec, know in seconds whether it passes. Business code takes days of real use to surface issues. A short loop means any Claude mistake gets caught by rspec on the spot — you don't have to babysit.
4. Naturally parallel. it blocks are independent, no hidden coupling, trivially scalable. Generating dozens of isolated tests at once is exactly what Claude is good at.
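To make reasons 1 and 4 concrete, here is a minimal plain-Ruby sketch of the four `.open!` contexts written as independent "given this state, expect this behavior" cases. It is not the real service: `Post`, `OpenedReview`, and `open_review` are hypothetical stand-ins, and returning the existing review for an already-active one is an assumption about what "no double-open" means.

```ruby
# Toy stand-in for the service's open! method; every name here is hypothetical.
Post = Struct.new(:topic, :active_review, keyword_init: true)
OpenedReview = Struct.new(:post, :assignments)

def open_review(post, juror_ids)
  return nil if post.topic.nil?                    # no topic: returns nil
  return post.active_review if post.active_review  # assumed: reuse, never double-open
  post.active_review = OpenedReview.new(post, juror_ids.dup)
end

# Each case builds its own state, so no case can leak into another.
OPEN_CASES = {
  "eligible jurors" => -> { open_review(Post.new(topic: "ruby"), [4, 5]).assignments == [4, 5] },
  "no eligible jurors" => -> { open_review(Post.new(topic: "ruby"), []).assignments.empty? },
  "no topic" => -> { open_review(Post.new(topic: nil), [4, 5]).nil? },
  "already-active review" => lambda {
    post = Post.new(topic: "ruby")
    first = open_review(post, [4, 5])
    open_review(post, [6]).equal?(first)
  }
}.freeze

OPEN_RESULTS = OPEN_CASES.transform_values(&:call)
OPEN_RESULTS.each { |name, ok| puts "#{ok ? 'PASS' : 'FAIL'}  #{name}" }
```

Because every lambda constructs its own `Post`, the cases can be generated, reordered, or run in isolation, which is the property that makes this kind of work easy to hand off in bulk.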
This is the pivot of the whole split.
Ignore:
Look at:
The last one is review's real value. Claude covers the tests "it can think of," but the tests it can't think of don't get written automatically. That's exactly where human review fits — working backward from business rules to find missing coverage.
Open spec/services/topic_review_service_spec.rb at the start of describe ".open!":
describe ".open!" do
  context "when there are eligible jurors" do
    # review status ok / post under_review / assignments created / author notified / jurors notified / no double-open
  end

  context "when there are no eligible jurors" do
    # review created but no assignments
  end

  context "when post has no topic" do
    # returns nil
  end
end
Looks comprehensive. But the real eligible_jurors rule in the model excludes three groups:
def eligible_jurors
  excluded_ids = [ post.user_id ] + post.reports.pluck(:user_id) + review_votes.where(stage: :initial).pluck(:user_id)
  User.jurors_and_judges.where.not(id: excluded_ids.uniq)
end
Now look at the tests — which test asserts that "the post author is never selected as a juror"?
Search service_spec.rb and model_spec.rb: none. model_spec.rb only tests a few cases of the pending_vote_by scope; it doesn't cover eligible_jurors at all. service_spec.rb has only a comment: # Jurors must NOT be the post author — that's in setup, not an assertion.
This is what review catches: three exclusion rules (author / reporter / already-voted-initial), and none of them is protected by a test. If someone later refactors eligible_jurors and accidentally drops post.user_id from the exclusion list, every existing test passes — and production quietly lets authors sit on their own jury.
Claude wasn't wrong — it tested what it was asked to test. It just didn't spontaneously ask, "do these three rules each need test coverage?" That question — working backward from rules to coverage — is what review is for.
(Honesty: I missed this during the original review. I only spotted it during a second audit while writing this post. So review isn't a one-shot either — but it's still 10× better than no review.)
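The missing coverage can be sketched as one check per exclusion rule. This is a plain-Ruby illustration, not the project's spec: `JuryCase`, `AuthorLeakCase`, and `leaked_groups` are hypothetical names, and integer ids stand in for the ActiveRecord queries in the real `eligible_jurors`.

```ruby
# Plain-Ruby stand-in for the model rule; all names are hypothetical.
JuryCase = Struct.new(:author_id, :reporter_ids, :initial_voter_ids, keyword_init: true) do
  # Mirrors the three exclusions: post author, reporters, initial-stage voters.
  def eligible_juror_ids(pool_ids)
    pool_ids - ([author_id] + reporter_ids + initial_voter_ids).uniq
  end
end

# A "refactored" variant that accidentally drops the author exclusion.
class AuthorLeakCase < JuryCase
  def eligible_juror_ids(pool_ids)
    pool_ids - (reporter_ids + initial_voter_ids).uniq
  end
end

# One check per rule, so dropping a single exclusion trips exactly one check.
def leaked_groups(jury_case, pool_ids)
  eligible = jury_case.eligible_juror_ids(pool_ids)
  leaks = []
  leaks << :author        if eligible.include?(jury_case.author_id)
  leaks << :reporter      if (eligible & jury_case.reporter_ids).any?
  leaks << :initial_voter if (eligible & jury_case.initial_voter_ids).any?
  leaks
end

POOL = [1, 2, 3, 4, 5].freeze
IDS = { author_id: 1, reporter_ids: [2], initial_voter_ids: [3] }.freeze

p leaked_groups(JuryCase.new(**IDS), POOL)        # => []
p leaked_groups(AuthorLeakCase.new(**IDS), POOL)  # => [:author]
```

In the real suite each of the three checks would presumably become its own `it` under a `describe "#eligible_jurors"` block, so a regression names the rule that broke instead of failing one catch-all assertion.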
If "Claude writes + human reviews" were perfect, there'd be no new test commits after the initial one. What actually happened is more interesting — gap-patching without rewrites:
00393fc Add test for finalize! with zero votes (expired review)
3f53304 Add test for finalize! with legacy votes missing reasoning
The first is a post-bug regression test — e8cb2db Default to keep verdict when review expires with zero votes was the fix, 00393fc the follow-up test. Same pattern for the second, chasing abaa22e Fix CloseTopicReviewJob failing due to reasoning validation on old votes.
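As a rough illustration of what these two regression tests pin down, here is a plain-Ruby sketch. Only the "keep" default for zero votes comes from the commit message; the `Vote` struct, the `:remove` verdict, the most-common-verdict rule, and the name `finalize_verdict` are all assumptions made for the example.

```ruby
# Hypothetical stand-ins; the real finalize! works on ActiveRecord models.
Vote = Struct.new(:verdict, :reasoning, keyword_init: true)

# Assumed rule: the most common verdict wins; zero votes defaults to :keep.
# Reasoning is deliberately not required here, so legacy votes still count.
def finalize_verdict(votes)
  return :keep if votes.empty?

  votes.map(&:verdict).tally.max_by { |_verdict, count| count }.first
end

# Regression case 1: an expired review with no votes at all.
ZERO_VOTES = finalize_verdict([])

# Regression case 2: legacy votes whose reasoning was never filled in.
LEGACY_VOTES = finalize_verdict([
  Vote.new(verdict: :remove, reasoning: nil),
  Vote.new(verdict: :remove, reasoning: nil),
  Vote.new(verdict: :keep, reasoning: "on topic")
])

puts "zero votes: #{ZERO_VOTES}, legacy votes: #{LEGACY_VOTES}"
```

Both cases are states the original suite never set up, which is why they surfaced as production bugs first and tests second.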
These two commits prove two things at once:
"Good enough + able to keep patching" is a far more realistic bar than "perfect." Chasing perfect review is what stops you from handing tests off to Claude in the first place. Accepting "good enough" is what makes the split possible.
Not every test is right for full handoff:
In those cases, human leads and Claude assists.
To turn this split into a runnable default:
Programmers have less mental energy for reading tests than for reading code. Tests are repetitive, mechanical, draining, necessary. All of that describes Claude's sweet spot — it doesn't get bored, doesn't get tired, doesn't cut corners by the 50th it.
Your job isn't "writing tests" — it's "making sure every business rule is covered by a test." One is implementation, the other is judgment. Judgment stays with you; implementation goes to Claude.
119 specs and 1,562 lines, most of them landing in one commit and surviving two-plus weeks without rework: that happened not because I'm better at writing tests, but because I wrote almost none of them. I just did one thing Claude doesn't: decide which business rules deserve protection.