Getting Claude to Change Its Mind

I was working on smarts — a Rails app that renders Ethereum smart contracts as "developer documentation." The other day I hit an old problem: contracts get upgraded. If the system renders an upgraded proxy contract using a stale ABI, it's lying to the user.

I asked Claude to help me design proxy upgrade detection. It wrote up a thorough technical plan, including compatibility for EIP-1967, Beacon proxies, and ZeppelinOS. When we got to "when do we trigger the check," it laid out two options:

# Option A: add last_proxy_check_at to the DB, throttle by 5min
if @contract.last_proxy_check_at.nil? || @contract.last_proxy_check_at < 5.minutes.ago
  CheckProxyUpgradeJob.perform_later(@contract)
end

# Option B: throttle via Solid Cache, lighter
unless Rails.cache.exist?("proxy_check:#{@contract.id}")
  Rails.cache.write("proxy_check:#{@contract.id}", true, expires_in: 5.minutes)
  CheckProxyUpgradeJob.perform_later(@contract)
end

Its closing line:

B is simpler, no need for a column.

I stared at the two snippets for a moment. B was simpler, no migration, just cache calls, very Rails-idiomatic. But something about it didn't sit right.

I typed five characters and hit enter:

u recommend a?

Claude recomputed

Its reply (a few code blocks trimmed):

I'd change my mind — recommend A.

The reasoning: the entire point of proxy detection is to not trust ABI freshness. If we throttle via Solid Cache, we're putting "have we checked yet" right back into ephemeral storage:

After deploy / cache wipe, every contract enters the "unchecked" state at once → next traffic wave triggers a check storm

No way to answer "how long since this contract was last reverified?"

No way to do a scheduled backstop: cold contracts may go weeks without traffic and lazy triggering will never fire

What last_proxy_check_at unlocks:

Lazy triggering (same logic as B)

A scheduled backstop via recurring.yml that batches contracts past 24h, independent of user traffic

Surfacing freshness in the UI: "Implementation verified 12 min ago"

Cost: one migration + one update_column per job (no callbacks, basically free).

Bottom line: B looks "more Rails-native, lighter," but it's missing a layer of capability. A is the option that actually delivers on "we don't trust ABI freshness."

Notice what happened between the two recommendations:

First time: list options → compare line counts → pick B ("simpler")
Second time: anchor on why we're doing this → notice that B parks state in ephemeral storage, contradicting the whole "don't trust ABI freshness" premise → pick A

No new facts entered between the two passes. Lines of code, Rails idioms, cache semantics — all unchanged. What changed was which layer it was evaluating from.

The first pass was comparing code volume. The second was comparing "does this change actually solve the problem we set out to solve."

What "u recommend a?" did

I didn't type those five characters trying to push it anywhere. I just stalled out on B's code and instinctively asked for its preference again.

But the effect was to pull it from "pick one of two options" back to "why are we doing this." That gap is the whole point.

When LLMs make recommendations, they almost always start from local optima: which diff is smaller, which snippet looks more idiomatic, which implementation reads cleaner. These dimensions are real. But they routinely fail to connect with what this PR is actually for.

A and B are equivalent on the "5-minute throttle" axis. But A also unlocks three things B can't do — scheduled backstop, observability, freshness in the UI. If you only look at the throttle layer, B wins. If you care about why this job exists, A wins.

Claude didn't see that the first time around not because it can't. It's because the default frame for "compare these options" is the syntax/Rails layer. One follow-up question pulled it up to the problem layer, and it computed B's costs on its own.

Turning this into a habit

A few small adjustments I made to my own workflow:

When Claude closes with "simpler / lighter / more native," look twice. Those are aesthetic words, not judgment words. They describe what reads cleanly, not what holds up.

The cost of asking again is zero. "u recommend X?" — five characters. Claude gets forced to evaluate from an angle it hadn't taken yet. Even if it lands on the same recommendation, the reasoning will be more grounded, enough for you to decide whether to trust it.

Make it lay out the costs, not just the recommendation. What made this turn out well wasn't me strong-arming A — it was Claude writing out B's hidden costs (cache-wipe storm, no observability, dead-end for scheduled work). I hadn't articulated any of those before it did.

A shipped. I added the last_proxy_check_at column, lazy triggering plus a 24h scheduled backstop, and the UI now shows a line: Implementation verified 12 min ago. Three days later, working on something unrelated, that column let me write Contract.where("last_proxy_check_at < ?", 24.hours.ago).find_each — a query that simply couldn't have existed if I'd taken B.

Five characters' worth of follow-up question, worth roughly a week-out refactor.