Debugging Production Rails With Claude, Without SSH or Redeploy

Production 400/500s are nothing like local. Locally, you rerun tests, edit, rerun. In production, a user has already clicked "Pay" and is staring at a blank screen.

Three paths:

SSH into the server — works, but the shell inside the container may not have rails console, your changes have no audit trail, one typo and you're in trouble
Redeploy — push a fix, 10+ minutes at minimum, possible brief unavailability, and plenty of problems aren't code problems (they're data or credentials)
Run Rails runner inside the running container — no restart, no redeploy, no SSH, scalpel-level precision

This article covers the third path. Two real cases: a full-width ？ wedged into credentials that broke x402 payments with 500s, and a tweet in the middle of a thread that ended up with status: :failed and has to be retried without breaking thread continuity. Along the way, Claude will by default head in 4 wrong directions, and you have to intercept each one.

The tool: `kamal app exec --reuse`

Kamal is 37signals' deploy tool for containerized apps, and the default deploy path in Rails 8. It has a command:

kamal app exec --reuse 'bin/rails runner "..."'

--reuse means: don't spin up a new container, execute inside the running web container. No new build, no docker pull, no restart, no ENV re-injection. Command runs, container goes back to handling requests.

The output streams to your stdout — effectively puts inside a production Rails console, without SSH, tmux, or leaving your laptop.

A typical session:

$ kamal app exec --reuse 'bin/rails runner "puts User.count"'
Launching command with version abc123 from existing container...
  INFO [ok] Finished in 3.8 seconds with exit status 0
App Host: deploy.how2claude.com
12847

3-6 seconds round-trip. Two orders of magnitude faster than a redeploy.

Case 1: The full-width `？` that caused production 500s

Commit eba9ac9.

Symptom: A few hours after shipping x402 payments, every payment request was 500-ing. Logs full of Net::HTTPBadResponse from the HTTP client. Local worked perfectly.

Diagnosis: Tell Claude to print the x402 config in production first:

kamal app exec --reuse 'bin/rails runner "puts X402.configuration.wallet_address.inspect"'

Output:

"0xAbC123...def？"

There's an extra ？ at the tail — a full-width question mark (U+FF1F), not the half-width ?. Someone's input method switched during an edit of config/credentials/production.yml.enc and dropped this character in.

The local config/credentials.yml.enc (decrypted with the dev/test master.key) didn't have it — production and dev are two separate encrypted credentials in Rails 8, contents not shared.
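Look-alike punctuation is easy to miss in terminal output, so it can help to have the runner list every non-ASCII character along with its code point. A minimal sketch in plain Ruby: non_ascii_chars is a hypothetical helper name, and the sample value is made up because the real address is elided above.

```ruby
# Hypothetical helper: list every non-ASCII character in a string with its
# position and Unicode code point, so look-alike punctuation stands out.
def non_ascii_chars(value)
  hits = value.each_char.with_index.reject { |char, _index| char.ascii_only? }
  hits.map do |char, index|
    { index: index, char: char, codepoint: format("U+%04X", char.ord) }
  end
end

# Made-up sample value; the real wallet address is not shown in the article.
sample = "0xAbC123def\uFF1F"
non_ascii_chars(sample).each do |hit|
  puts "#{hit[:codepoint]} #{hit[:char].inspect} at index #{hit[:index]}"
end
```

Inside a runner, the same method body could be pointed at X402.configuration.wallet_address instead of the sample string.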

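Fix: You can't SSH in and edit the file (it's encrypted) and you can't pull it down and edit locally (the master.key isn't on your laptop). The move is to have Claude write a one-off Ruby script and inject it via EDITOR= into credentials:edit. credentials:edit still needs the production key, so run it wherever that key is available (config/credentials/production.key or the RAILS_MASTER_KEY environment variable):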

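# script/fix_prod_wallet.rb
# credentials:edit passes the decrypted temp file's path as the first argument.
path = ARGV.fetch(0)
content = File.read(path, encoding: "UTF-8")
# Strip a full-width question mark (U+FF1F) trailing the wallet address.
# [ \t]* instead of \s* so the match can never swallow a following blank line.
fixed = content.gsub(/(wallet_address: 0x[0-9a-fA-F]+)\uFF1F[ \t]*$/, '\1')
abort "no full-width question mark found, nothing changed" if fixed == content
File.write(path, fixed)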

EDITOR="ruby script/fix_prod_wallet.rb" \
  bin/rails credentials:edit --environment production

credentials:edit flow: decrypt → write to temp file → invoke $EDITOR with the temp file's path as its argument → re-encrypt → delete temp. That is why the script reads ARGV[0]. Swap EDITOR for our Ruby script and the edit is automated: nobody opens the decrypted file by hand.
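Because the editor script only ever sees a path to a temp file, it can be rehearsed against a throwaway file before it goes anywhere near real credentials. A sketch under stated assumptions: the YAML keys and values below are placeholders, and strip_fullwidth_question_mark and edit_in_place are hypothetical names that follow the same idea as the editor script.

```ruby
require "tempfile"

# Transformation under test: drop a full-width question mark (U+FF1F)
# that directly follows the hex wallet address.
def strip_fullwidth_question_mark(text)
  text.gsub(/(wallet_address: 0x[0-9a-fA-F]+)\uFF1F/, '\1')
end

# Stand-in for the editor script: rewrite the file at `path` in place,
# the way credentials:edit expects $EDITOR to. Returns true if it changed.
def edit_in_place(path)
  before = File.read(path, encoding: "UTF-8")
  after = strip_fullwidth_question_mark(before)
  File.write(path, after)
  before != after
end

# Made-up decrypted credentials; key names and values are placeholders.
fake_credentials = <<~YAML
  x402:
    wallet_address: 0xAbC123def\uFF1F
    network: example
YAML

Tempfile.create(["credentials", ".yml"]) do |file|
  file.write(fake_credentials)
  file.flush
  puts "changed: #{edit_in_place(file.path)}"
  puts File.read(file.path, encoding: "UTF-8")
end
```

Running the edit a second time on the same file should report no change, which is a cheap check that the script is safe to re-run.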

After that, git commit + kamal deploy once — this deploy is required because production.yml.enc changed. But diagnosis didn't cost a deploy.

Rule: When prod breaks, read first with kamal app exec --reuse. Don't guess, don't redeploy first.

Case 2: Retrying a failed XQueue::Tweet mid-thread

Symptom: After an article publishes, tweets go into the x_queue_tweets table. One of them — the 2nd in a 4-tweet thread — gets status: :failed (X API rate-limited, content rejected, whatever). Retrying it requires preserving thread continuity with the 1st tweet.

Find the failed one:

kamal app exec --reuse 'bin/rails runner "
  XQueue::Tweet.where(status: :failed).each do |t|
    puts \"#{t.id}: thread=#{t.thread_id} pos=#{t.thread_position} content=#{t.content.inspect}\"
  end
"'

Turns out to be id=87, thread_id=15, thread_position=2.

Pitfall 1: shell escaping. Tweet content often has quotes, newlines, backticks. Inline it directly and the command breaks:

# blows up — shell eats quotes and backslashes
kamal app exec --reuse 'bin/rails runner "t = XQueue::Tweet.find(87); t.update!(content: \"...\")"'

The fix is the Base64 dance: encode locally, pass a Base64 string, decode inside the runner:

# Encode locally
echo -n 'Rewritten tweet content...' | base64
# => UmV3cml0dGVuIHR3ZWV0IGNvbnRlbnQuLi4=

# Pass through
kamal app exec --reuse "bin/rails runner \"
  t = XQueue::Tweet.find(87)
  t.update!(content: Base64.decode64('UmV3cml0dGVuIHR3ZWV0IGNvbnRlbnQuLi4='), status: :scheduled)
  puts t.status
\""

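Base64 strings are ASCII-only and shell-safe: the alphabet is A-Z, a-z, 0-9, +, / and =, none of which the shell treats specially inside quotes. One caveat: GNU base64 wraps its output at 76 columns, so for content longer than 57 bytes pass -w0 to keep the string on one line.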
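The encode step can also be done in Ruby, which makes it easy to check the round trip before anything touches production. A minimal sketch: runner_snippet is a hypothetical helper, the model and column names are the ones used above, and the force_encoding call is an added assumption (Base64.decode64 returns a binary-encoded string, which matters once the content contains non-ASCII text).

```ruby
require "base64"

# Build the Ruby one-liner to hand to `bin/rails runner`. The content travels
# as a Base64 literal, so no quote, backslash, `$`, or backtick from the tweet
# ever reaches the shell.
def runner_snippet(tweet_id, content)
  encoded = Base64.strict_encode64(content) # strict: no embedded newlines
  id = Integer(tweet_id)                    # raises on anything that is not an integer
  "t = XQueue::Tweet.find(#{id}); " \
    "t.update!(content: Base64.decode64('#{encoded}').force_encoding('UTF-8'), status: :scheduled); " \
    "puts t.status"
end

# Deliberately hostile sample content.
tricky = <<~'TEXT'.chomp
  He said "it's `done`" for $5 \o/
  second line
TEXT

puts runner_snippet(87, tricky)
```

The generated snippet contains only single quotes, so it fits inside the escaped double quotes of the runner argument, as in the command above.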

Pitfall 2: thread continuity. XQueue::PostTweetJob.perform_later(87) publishes a standalone tweet that does not chain to tweet #1. The X API only threads a post when the request names the post it replies to (reply.in_reply_to_tweet_id in API v2), and the Job passes no reply_to_tweet_id by default.

Find the previous tweet's x_tweet_id (successful sends fill this field):

kamal app exec --reuse "bin/rails runner \"
  t = XQueue::Tweet.find(87)
  prev = XQueue::Tweet.where(thread_id: t.thread_id, thread_position: t.thread_position - 1).first
  puts 'prev x_tweet_id: ' + prev&.x_tweet_id.to_s
\""
# => prev x_tweet_id: 1834567890123456789

Enqueue with the reply target:

kamal app exec --reuse 'bin/rails runner "
  XQueue::PostTweetJob.perform_later(87, reply_to_tweet_id: \"1834567890123456789\")
  puts \"enqueued\"
"'

The worker's polling_interval is 0.1 seconds, so it picks up the job almost immediately. A few seconds later, run kamal app exec --reuse again to confirm that status flipped from scheduled to posted and x_tweet_id got filled. The thread is continuous.

Rule: Production data operations have to respect business-layer constraints, not just "record updated successfully." Thread continuity is a business constraint; Rails runner won't check it for you.
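One way to make that constraint hard to forget is to compute the retry arguments in code and refuse to proceed when the previous tweet has no x_tweet_id. A sketch only: the Struct stands in for the XQueue::Tweet model, retry_arguments and BrokenThreadError are hypothetical names, id 86 is made up, and thread_position is assumed to start at 1.

```ruby
# Stand-in for the XQueue::Tweet model; field names follow the article,
# but this Struct exists only for illustration.
Tweet = Struct.new(:id, :thread_id, :thread_position, :status, :x_tweet_id, keyword_init: true)

class BrokenThreadError < StandardError; end

# Return the keyword arguments a retry should be enqueued with.
# First tweet in a thread: no reply target. Any later tweet: the previous
# tweet's x_tweet_id, or an error if the previous tweet was never posted.
def retry_arguments(tweet, thread)
  return {} if tweet.thread_position == 1

  previous = thread.find { |t| t.thread_position == tweet.thread_position - 1 }
  if previous.nil? || previous.x_tweet_id.nil?
    raise BrokenThreadError, "tweet #{tweet.id}: previous tweet has no x_tweet_id"
  end

  { reply_to_tweet_id: previous.x_tweet_id }
end

thread = [
  Tweet.new(id: 86, thread_id: 15, thread_position: 1, status: :posted, x_tweet_id: "1834567890123456789"),
  Tweet.new(id: 87, thread_id: 15, thread_position: 2, status: :failed, x_tweet_id: nil)
]
p retry_arguments(thread[1], thread)
```

In the real app the result would be splatted into the enqueue call, along the lines of perform_later(87, **args).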

The 4 directions Claude defaults to (and how to redirect)

When doing this kind of work, Claude's first instinct is often wrong. Catch each:

1. Wants to SSH into the server

"Let me SSH in and take a look..."

Redirect: kamal app exec --reuse beats SSH: in-container, Rails env loaded, audited (Kamal records each exec in its audit log on the server, readable with kamal audit), doesn't touch host shell, no container drift (reuse guarantees current production version).

2. Wants to write a migration to fix data

"I'll write a migration to strip the full-width ？ from wallet_address..."

Redirect: Changing one credentials value doesn't need a migration (no DB was touched); the scripted credentials:edit handles it. Even for real data fixes, a one-off Rails runner does it in 10 seconds, while a migration needs a deploy and stays in db/migrate forever. Only use a migration if the fix might need to run again; a typo is a one-time thing.

3. Wants to shove special characters into shell strings

"I'll just call update!(content: "...")..."

Redirect: Any user-generated content (tweets, comments, user-input markdown) should route through Base64. Shell parsing of quotes, backslashes, $, and backticks is a classic minefield — production is a terrible place to rehearse.

4. `perform_later` without business parameters

"I'll just re-PostTweetJob.perform_later(87)..."

Redirect: First ask "does this record relate to others?" Threads have reply_to relations, batch jobs have batch_id relations, paginated jobs have cursor relations — a Job's argument list is the carrier for those business relations. Drop an argument, break a chain.

Checklist

Debugging + mutating production Rails data with Claude — 6 rules:

Read before you write. kamal app exec --reuse 'bin/rails runner "puts X"'. Pin down the problem before changing anything.
kamal app exec --reuse is the default tool, not SSH, not redeploy. In-container, Rails loaded, 3-6 second round-trip.
Credentials changes via EDITOR=ruby-script bin/rails credentials:edit --environment production. The Ruby script performs the edit; nobody hand-edits the decrypted file.
Shell escaping is a minefield; route special content through Base64. echo -n 'X' | base64 locally, Base64.decode64 in the runner.
Rails runner won't enforce business constraints for you. thread reply_to, batch_id, paging cursor — enqueue with all of them.
Only write a migration if "this might happen again." One-off data fixes go through runner, two orders of magnitude faster.

Debugging in production isn't about transplanting your dev workflow to prod — you don't have iteration space, and you don't have error tolerance. What you actually use is the introspection surface production already offers (Rails runner + kamal exec + credentials:edit), each step making the smallest possible change. Claude can write correct Ruby — but knowing "which things can be done directly vs must be diagnosed first" is a call it won't make for you. That's your production discipline.