Let Claude Find CDN Cache Poisoning in Rails — Three Symptoms, One Root Cause

Three PRs, one root cause: the HTML body varies with login state, UA, and flash — properties the CDN's cache key knows nothing about — leaking content across users. Letting Claude walk the PR chain surfaces three reusable patterns: lazy personalize frames, CSS client-side hiding, and cache-boundary guards.


Users reported that every public page on my site had a stray "Content missing" floating at the bottom. The first reports all came from iOS, so my first instinct was some weird Hotwire Native client behavior. Only after digging into the Cloudflare logs did I realize this had nothing to do with the client: the CDN cache was poisoned, all three platforms (Web, iOS, Android) could hit it, and iOS just surfaced it first.

Reflex one: a Hotwire Native bug. Reflex two: some weird Cloudflare edge behavior. Neither was right. Even calling this category "a CDN problem" is wrong — the CDN is doing exactly what you told it to do.

When I had Claude walk through the PR chain, it turned out that three consecutive PRs shared the same underlying problem: the HTML body varies with login state, UA, and flash messages, all request properties the CDN's cache key has no idea about. Three different symptoms, one root cause.

Trap 1: tab-badge turbo-frame leaking across users

/, /topics, /square, /coins, /searches/app are CDN-cached via public_expires_in. The layout had this:

<% if mobile_hotwire? && user_signed_in? %>
  <%= render "shared/tab_badge" %>
<% end %>

shared/tab_badge renders a turbo-frame:

<turbo-frame id="tab_badge" src="/notifications?badge_only=true"></turbo-frame>

The intent is clear: signed-in Hotwire Native users see an unread-notification dot on the tab bar. Web doesn't need it (different rendering path), and signed-out users don't either.

The problem: mobile_hotwire? && user_signed_in? makes the same URL produce two different HTML bodies. But the CDN's cache key is built from things like the URL and host, plus whatever else you explicitly configure. It has no idea whether you're signed in.

Timeline:

  1. A signed-in Hotwire Native user hits /. CDN goes to origin, caches the HTML with the tab-badge frame.
  2. A signed-out visitor (Web / iOS / Android, doesn't matter) hits /. CDN serves the same cached HTML.
  3. Turbo sees the frame's src in the served HTML and fetches /notifications?badge_only=true.
  4. NotificationsController has before_action :authenticate_user!, sees no session, and 302s to /users/sign_in.
  5. Turbo can't find a frame with id tab_badge on the sign-in page, so it paints "Content missing".
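
Steps 1 and 2 of that timeline can be reproduced in a few lines of plain Ruby. This is only a toy model: ToyCdn, TAB_BADGE_ORIGIN, and the visitor hash are made-up names, and the "cache key is the URL alone" rule is an assumption standing in for whatever your CDN is actually configured to use.

```ruby
# Toy shared cache keyed only by URL (an assumption; real CDNs can be
# configured to include more in the key).
class ToyCdn
  def initialize(origin)
    @origin = origin # callable: (url, visitor) -> HTML string
    @store = {}
  end

  def get(url, visitor)
    # The visitor reaches the origin on a miss but is NOT part of the key.
    @store[url] ||= @origin.call(url, visitor)
  end
end

# Stand-in for the original layout branch: mobile_hotwire? && user_signed_in?
TAB_BADGE_ORIGIN = lambda do |_url, visitor|
  body = +"<main>public content</main>"
  if visitor[:hotwire] && visitor[:signed_in]
    body << %(<turbo-frame id="tab_badge" src="/notifications?badge_only=true"></turbo-frame>)
  end
  body
end

cdn = ToyCdn.new(TAB_BADGE_ORIGIN)
cdn.get("/", { hotwire: true, signed_in: true })             # seeds the slot
served = cdn.get("/", { hotwire: false, signed_in: false })  # anonymous web visitor
puts served.include?("tab_badge")
```

The anonymous visitor receives the frame markup even though the origin would never have rendered it for them, which is exactly what sets up steps 3 to 5.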

All three platforms (Web / iOS / Android) can hit this — any cache slot that was first filled by a signed-in user is poisoned for everyone after. Reports clustered on iOS because iOS users have the highest login rate, so caches got seeded with logged-in HTML most often, and signed-out iOS visitors had the highest probability of landing on a poisoned slot. Web and Android weren't "fine" — they had lower trigger rates and fewer reports. A site-wide disaster was disguised by early data as "iOS-only".

The fix is two-step, both aimed at making the cache stop splitting on login state:

Step 1: drop the && user_signed_in? from the layout. Otherwise you're playing cache-key roulette forever:

<%# Render for all mobile_hotwire? requests regardless of auth —
    the user_signed_in? branch splits the same URL into two HTML
    variants, and whichever one wins the CDN slot gets served to
    the next visitor. %>
<% if mobile_hotwire? %>
  <%= render "shared/tab_badge" %>
<% end %>

Step 2: the /notifications?badge_only=true frame endpoint must return a structurally identical 200 frame for signed-out requests, not a 302. And the endpoint itself needs private, no-store so the CDN doesn't share one user's unread count with another.

class NotificationsController < ApplicationController
  before_action :authenticate_user!, except: [:index]

  def index
    if params[:badge_only] || params[:count_only]
      response.headers["Cache-Control"] = "private, no-store"
      redirect_to notifications_path and return unless turbo_frame_request?
      @unread_count = user_signed_in? ? current_user.notifications.unread.count : 0
      if params[:badge_only]
        render(partial: "notifications/tab_badge")
      else
        render(partial: "notifications/bell")
      end
      return
    end

    authenticate_user!
    # ... normal index logic
  end
end

Note authenticate_user! is bypassed only for index, and only for frame sub-requests — non-frame requests still go through the redirect.
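
The contract that controller is meant to honor can be written down without Rails at all. The sketch below is framework-free and Rack-shaped; badge_response, the dot markup, and the unread_count: nil convention for "signed out" are hypothetical. The one real detail is that Turbo sends a Turbo-Frame request header on frame fetches, which Rack exposes as HTTP_TURBO_FRAME.

```ruby
# Framework-free sketch of the badge endpoint's contract.
# unread_count: nil stands for a signed-out request (an illustrative convention).
def badge_response(env, unread_count: nil)
  headers = { "cache-control" => "private, no-store" }

  # Direct navigation (no Turbo-Frame header): bounce to the full page,
  # which still enforces authentication.
  unless env["HTTP_TURBO_FRAME"]
    return [302, headers.merge("location" => "/notifications"), []]
  end

  # Frame sub-request: always a 200 with the matching frame id,
  # whether or not anyone is signed in.
  count = unread_count || 0
  dot = count.positive? ? %(<span class="dot">#{count}</span>) : ""
  [200, headers, [%(<turbo-frame id="tab_badge">#{dot}</turbo-frame>)]]
end
```

The property worth asserting in a real request spec is the same one shown here: a signed-out frame request gets a 200 containing a frame with id tab_badge, never a redirect.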

The same PR had another bug of identical shape: signed-out Hotwire Native users were also seeing the compose FAB. Same poisoning: HTML cached for a signed-in user was being served to signed-out visitors. The original layout checked user_signed_in? directly to decide whether to render the Stimulus controller div. Fix: move the FAB into a lazy frame backed by /personalize/compose_fab, a private, no-store endpoint.

<%# Layout always renders an empty frame %>
<% if mobile_hotwire? %>
  <turbo-frame id="compose-fab" src="<%= personalize_compose_fab_path %>" loading="eager" class="hidden"></turbo-frame>
<% end %>
<%# /personalize/compose_fab template %>
<%= turbo_frame_tag "compose-fab" do %>
  <% if user_signed_in? %>
    <div data-controller="bridge--compose-fab" class="hidden"></div>
  <% end %>
<% end %>

The layout always emits an empty frame; the endpoint inside the frame decides whether to inject the controller div based on auth. The main page HTML has zero per-user branching markup — the CDN can cache it however it wants and nothing breaks.
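
One way to convince yourself of that invariance is to render both templates under both auth states and compare. The sketch below uses Ruby's stdlib ERB with plain local variables in place of the Rails helpers, so mobile_hotwire, user_signed_in, and render_erb are stand-ins, and the frame template is written as raw HTML rather than turbo_frame_tag.

```ruby
require "erb"

# Stand-in for the layout fragment: no auth check anywhere.
FAB_LAYOUT = <<~HTML
  <% if mobile_hotwire %>
    <turbo-frame id="compose-fab" src="/personalize/compose_fab" loading="eager" class="hidden"></turbo-frame>
  <% end %>
HTML

# Stand-in for the personalize endpoint's template: the only place auth matters.
FAB_FRAME = <<~HTML
  <turbo-frame id="compose-fab">
  <% if user_signed_in %>
    <div data-controller="bridge--compose-fab" class="hidden"></div>
  <% end %>
  </turbo-frame>
HTML

def render_erb(template, **locals)
  ERB.new(template).result_with_hash(locals)
end

layout_in  = render_erb(FAB_LAYOUT, mobile_hotwire: true, user_signed_in: true)
layout_out = render_erb(FAB_LAYOUT, mobile_hotwire: true, user_signed_in: false)
puts layout_in == layout_out # the cacheable part is identical for both

frame_in  = render_erb(FAB_FRAME, user_signed_in: true)
frame_out = render_erb(FAB_FRAME, user_signed_in: false)
puts frame_in == frame_out   # only the private frame body differs
```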

Trap 2: flash messages leaking to other users

Days after #117 (the tab-badge fix from Trap 1) shipped, another report: a signed-out user hit the homepage and saw a "Successfully signed in" flash banner at the top. They were baffled.

Same disease, different symptom.

The layout had:

<%= render "shared/notice" %>

This renders flash[:notice] / flash[:alert]. The first request after a redirect that sets flash carries it, e.g.:

  • redirect_to root_path after successful sign-in
  • redirect_to root_path after sign-out
  • redirect_back(fallback_location: ...) in the handler for a Pundit authorization failure

If the URL the redirect lands on is also a public_expires_in path, that flash gets baked into the cached HTML. The next anonymous visitor to that URL gets the same HTML — and sees someone else's "Successfully signed in".

The tempting fix is to refactor every cacheable view and move flash rendering into a frame. The more robust fix is to gate it at the cache boundary: if this response carries flash, it cannot be publicly cached.

def public_expires_in(duration)
  return unless Rails.env.live?
  # flash renders directly in the layout, so caching this response
  # would leak the previous user's notice to the next anonymous visitor
  if flash.any?
    response.headers["Cache-Control"] = "private, no-store"
  else
    expires_in duration, public: true
  end
end

Why this is good:

  1. No view changes — every page using public_expires_in benefits automatically.
  2. No wasted cache — the next request without flash hits origin and refills the cache cleanly.
  3. Normal traffic is untouched: requests without flash, which is nearly all of them, still cache exactly as before.
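
The guard's decision is small enough to model as a pure function. In this sketch cache_control_for and shared_cache_store? are hypothetical names, the header strings are written out by hand rather than produced by Rails' expires_in, and the "store only if public" rule is a simplification of what a shared cache really checks.

```ruby
# Hand-rolled stand-in for the guard inside public_expires_in.
def cache_control_for(flash:, max_age:)
  return "private, no-store" unless flash.empty?

  "public, max-age=#{max_age}"
end

# Simplified shared-cache rule: only responses marked public are stored.
def shared_cache_store?(cache_control)
  cache_control.split(",").map(&:strip).include?("public")
end

with_flash    = cache_control_for(flash: { notice: "Successfully signed in" }, max_age: 300)
without_flash = cache_control_for(flash: {}, max_age: 300)

puts shared_cache_store?(with_flash)
puts shared_cache_store?(without_flash)
```

The response carrying someone's notice never enters the shared cache, and the next flash-free response for the same URL does.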

While I was there I noticed coins#show had its own expires_in 5.minutes, public: true that bypassed the helper. Switched it over to public_expires_in, otherwise the same flash-leak bug would resurface from there.

After deploy, you have to run cloudflare:purge_personalized_pages once: the deploy doesn't invalidate already-poisoned cache entries, which will sit there until the URL's TTL runs out (/topics is 1 week).

Trap 3: signed-in users not seeing "+ New Topic" on /topics

A few hours later, a third symptom: signed-in users on /topics couldn't see the "+ New Topic" button. Refreshing didn't help. But when the same user reached /topics?r=1 from another page (the query string busts the cache), the button came back.

/topics is public_expires_in 1.week, the most aggressively cached. The view originally:

<% unless mobile_hotwire? %>
  <a href="<%= new_topic_path %>" class="btn-new-topic">+ Create</a>
<% end %>

<% if mobile_hotwire? %>
  <% if user_signed_in? %>
    <button data-controller="bridge--new-topic">+</button>
  <% end %>
<% end %>

Two guards: a UA guard (web vs native) and an auth guard (signed in or not). The CDN knows about neither — whichever variant fills the cache first is what gets served. The first anonymous web visitor caches "no native bridge button, has web button"; the first signed-in native visitor caches "has bridge button, no web button" — what the next visitor sees depends on which slot they land in.
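
A quick way to see how many bodies one URL can produce is to render the view under every combination of the two guards and count the distinct results. The sketch uses stdlib ERB with local variables standing in for mobile_hotwire? and user_signed_in?, and hardcodes /topics/new where the real view calls new_topic_path.

```ruby
require "erb"

# The original /topics fragment, with helpers replaced by plain locals.
ORIGINAL_TOPICS_VIEW = <<~HTML
  <% unless mobile_hotwire %>
    <a href="/topics/new" class="btn-new-topic">+ Create</a>
  <% end %>
  <% if mobile_hotwire %>
    <% if user_signed_in %>
      <button data-controller="bridge--new-topic">+</button>
    <% end %>
  <% end %>
HTML

combinations = [true, false].product([true, false])
bodies = combinations.map do |hotwire, signed_in|
  ERB.new(ORIGINAL_TOPICS_VIEW).result_with_hash(mobile_hotwire: hotwire, user_signed_in: signed_in)
end

# More than one distinct body for a single cache key means
# whichever request arrives first decides what everyone else sees.
puts bodies.uniq.size
```

Any result above 1 for a URL served with a public Cache-Control is a candidate for the same bug.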

Same approach as #117:

The web button — no branch, render it always:

<a href="<%= new_topic_path %>" class="btn-new-topic web-only">+ Create</a>

.web-only hides it under native UA via CSS. The cached HTML is always identical, the button is always there, UA decides whether to display it — CSS is a client-side decision, the cache doesn't participate.

The native bridge button — moved into a lazy personalize frame:

<% if mobile_hotwire? %>
  <turbo-frame id="topic-new-button" src="<%= personalize_topic_new_button_path %>" loading="eager"></turbo-frame>
<% end %>
<%# /personalize/topic_new_button template %>
<%= turbo_frame_tag "topic-new-button" do %>
  <% if user_signed_in? %>
    <button data-controller="bridge--new-topic">+</button>
  <% end %>
<% end %>

The main page HTML has only the <turbo-frame> container and the web button — neither branches on user. All "render this only for some users" markup has moved to the personalize endpoint, which is private, no-store and never enters the CDN.

The real lesson: CDN cache poisoning isn't a CDN bug

Each symptom looked like a different bug at the time. "Content missing" looked like a Hotwire Native problem; flash leaking looked like a cookies/session config issue; the missing button looked like a view template logic bug.

Stack the three PRs together and the root cause is one thing: the HTML body's content depends on request properties outside the cache key. The CDN's job is to cache HTML by cache key — it cannot possibly know that a given HTML is only correct for signed-in users, or only for Hotwire Native, or only for requests carrying flash.

Three reusable patterns to root this out:

  1. Push the variability out of the cached body — anything that branches per-user becomes a lazy turbo-frame backed by a private, no-store personalize endpoint. The main page HTML is invariant; the CDN can cache it any way and stay correct.

  2. Render a universal version, hide on the client — e.g. the web button is always in the HTML, CSS hides it under native UA. The branch lives in CSS/JS, not in cache-keyed HTML.

  3. Cache-boundary guard — for responses that already have state leaked into them (like flash), demote Cache-Control to private, no-store at the cache boundary. Normal traffic is untouched.

What these three share: the cached HTML and the per-user branching markup never overlap.

After shipping the fix, one more thing: a deploy doesn't invalidate poisoned CDN entries, and TTLs go out to a week. Write a one-shot rake task, cloudflare:purge_personalized_pages, to actively purge every suspect path; otherwise the bug keeps surfacing until natural expiry catches up.
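
The body of that rake task isn't shown here, so the following is only a guess at its core: building a purge-by-URL request against Cloudflare's v4 purge_cache endpoint. build_purge_request, SUSPECT_PATHS, and the parameter names are hypothetical, and the path list is just the cached routes named earlier. As far as I know, purge-by-URL matches exact URLs, so query-string variants would need their own entries.

```ruby
require "json"
require "net/http"
require "uri"

# Routes named earlier as CDN-cached; extend with any other public_expires_in path.
SUSPECT_PATHS = %w[/ /topics /square /coins /searches/app].freeze

# Builds (but does not send) the purge request.
def build_purge_request(zone_id:, api_token:, base_url:, paths: SUSPECT_PATHS)
  uri = URI("https://api.cloudflare.com/client/v4/zones/#{zone_id}/purge_cache")
  request = Net::HTTP::Post.new(uri)
  request["Authorization"] = "Bearer #{api_token}"
  request["Content-Type"] = "application/json"
  request.body = JSON.generate(files: paths.map { |path| URI.join(base_url, path).to_s })
  request
end

# Sending is left to the caller, e.g. inside the rake task:
#   request = build_purge_request(zone_id: ..., api_token: ..., base_url: ...)
#   Net::HTTP.start(request.uri.host, request.uri.port, use_ssl: true) do |http|
#     http.request(request)
#   end
```

Keeping request construction separate from sending makes the task easy to dry-run before pointing it at production.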

A side note: lazy frames have problems beyond caching

PR #122 surfaced around the same time. Same shape, different bug type — worth calling out separately because it shares a hidden mechanism with the cache-poisoning ones.

The /following page uses a lazy turbo-frame to load the next page (standard infinite-scroll). The frame's src is computed by a helper called set_load_more_path, which decides the next-page URL based on the current controller/action.

def set_load_more_path(page:, anchor_id: nil)
  if controller_name == "coins" && action_name == "show"
    path = coin_symbol_path(...)
  elsif controller_name == "posts" && action_name == "hot"
    path = hot_path(...)
  # ... lots of branches ...
  elsif controller_name == "users" && action_name == "index"
    path = users_path(page: page, ...)
  # ...
  else
    path = posts_path(page: page, ...)  # ← fallback
  end
end

posts#following isn't in the branch list, so it falls through to the posts_path fallback — which is /posts, the explore feed.

Effect: /following page 1 is fine (the controller action sets @posts = following_posts itself), but page 2+ silently fetches /posts?page=2 via the lazy frame, loading explore-feed content — posts from people you don't follow drift in.

The fix is short:

elsif controller_name == "posts" && action_name == "following"
  path = following_feed_path(page: page, anchor_id: anchor_id, r: nil)
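
The one-line fix closes this particular hole, but the silent else branch is what let it open. A possible alternative shape, sketched here under assumptions, is a lookup table that raises on an unregistered controller#action instead of falling back. LOAD_MORE_PATHS, load_more_path, and the literal URLs are illustrative, not the app's real routes or helper.

```ruby
# Hypothetical table-driven variant: every controller#action that paginates
# must be registered, and an unknown pair fails loudly.
LOAD_MORE_PATHS = {
  "posts#index"     => ->(page) { "/posts?page=#{page}" },
  "posts#following" => ->(page) { "/following?page=#{page}" },
  "users#index"     => ->(page) { "/users?page=#{page}" }
}.freeze

def load_more_path(controller_name, action_name, page:)
  key = "#{controller_name}##{action_name}"
  builder = LOAD_MORE_PATHS.fetch(key) do
    raise ArgumentError, "no load-more path registered for #{key}"
  end
  builder.call(page)
end

puts load_more_path("posts", "following", page: 2)
```

Whether raising is acceptable in production is a judgment call; even raising only in development and test would have caught the /following bug on the first scroll past page 1.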

This is not the same category as the three cache-poisoning bugs: set_load_more_path's wrong branch has nothing to do with the CDN. But it shares the same invisible mechanism: content loaded by a lazy turbo-frame is content you won't actively review.

#117's frame silently 302'd on error; #119/#121's content was wrong inside the cache, where you couldn't see it; #122's frame path was wrong and you wouldn't catch it just by looking at page 1. Once you put "dynamic content" inside a lazy frame, you have to actively review the state of the page after the frame loads. What Claude caught for me at PR review was exactly that "what does the page look like after the frame loads" check.

A self-audit checklist

If your Rails app sits behind a CDN (Cloudflare, Fastly, anyone), there's a good chance you're hitting at least one of these:

  • In any view template using public_expires_in / expires_in ..., public: true, do you have an if user_signed_in? branch?
  • Does flash render directly in the layout? Will a public_expires_in path get cached when flash is present?
  • If you ship a Hotwire Native app, is there mobile_hotwire?-guarded markup on cacheable paths?
  • For any turbo-frame src endpoint — does its controller use before_action :authenticate_user!? Will signed-out requests 302?
  • Is any lazy turbo-frame src computed by a helper? Does the helper have a fallback branch? Is the fallback the wrong URL?
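
The first three items lend themselves to a crude first pass with a text scan. The sketch below flags template lines that mention per-request state; REQUEST_STATE_GUARDS and suspect_lines are made-up names, the pattern list is not exhaustive, and the scan can't tell which templates sit on cacheable paths, so treat hits as leads rather than findings.

```ruby
# Rough audit helper: flag template lines that branch on per-request state.
REQUEST_STATE_GUARDS = /user_signed_in\?|current_user|mobile_hotwire\?|flash\[/

def suspect_lines(source)
  source.each_line.with_index(1).filter_map do |line, number|
    [number, line.strip] if line.match?(REQUEST_STATE_GUARDS)
  end
end

# Usage from the app root (assumes the conventional app/views layout):
#   Dir.glob("app/views/**/*.erb").each do |file|
#     suspect_lines(File.read(file)).each do |number, text|
#       puts "#{file}:#{number}: #{text}"
#     end
#   end
```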

Have Claude run that five-item checklist — if any one hits, that's your next PR. These bugs don't surface on their own, because lazy frames silently fail, CDNs silently hit, and CSS silently hides. Three silent mechanisms stacked on top of each other, and a bug can hide in production for months.