Case study · Incident retro
Cache Stampede During Deploy — 14 minutes to all-clear
A routine deploy bumped a cache key prefix. Every request became a cache miss, the backend pool saturated, and p99 latency climbed from 80ms to 2,400ms. We had 15 minutes of error budget left for the quarter. Here's the play-by-play.
The metric — p99 latency vs cache hit rate
Chart: p99 latency (rose line) overlaid with cache hit rate (cyan line) across the 14 minutes. The story is in the gap between them.
Timeline
- T+00:00 Deploy goes live
Service rev v2.4.1 ships. Cache key prefix was bumped as part of the schema change.
- T+00:12 p99 jumps 80ms → 2,400ms
Every request becomes a cache miss. Backend pool starts saturating.
- T+00:45 Cache hit rate 94% → 12%
Coalescing isn't in place — N concurrent misses → N upstream calls per key.
- T+02:00 PagerDuty: HighErrorBudgetBurn
Burn-rate alert fires. I'm on the bridge with the deploy lead within 90 seconds.
- T+03:30 Root cause confirmed
Key prefix bump in the deploy config invalidated the entire cache namespace.
- T+05:00 Mitigation rolled out
Pushed singleflight coalescing plus a synchronous warm-up step to the canary pod.
- T+07:00 Hit rate climbing through 60%
Coalescing is doing its job — one backend hit per key per concurrent burst.
- T+11:00 p99 back under 200ms
Back within SLO. Auth, search and feed dashboards all green.
- T+14:00 All-clear, runbook updated
Cache-prefix changes added to the migrations checklist; coalescing standardized in service template.
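The coalescing step in the timeline (N concurrent misses collapsing into one upstream call per key) can be sketched roughly as follows. This is a minimal illustration in Python, not the code that shipped during the incident: the `SingleFlight` class, the `demo` helper, the `user:42` key, and the 0.3s simulated backend delay are all hypothetical.

```python
import threading
import time
from concurrent.futures import Future


class SingleFlight:
    """Collapse concurrent calls for the same key into one upstream call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = {}  # key -> Future shared by all concurrent callers

    def do(self, key, fn):
        # The first caller for a key becomes the leader; later callers reuse its Future.
        with self._lock:
            fut = self._in_flight.get(key)
            leader = fut is None
            if leader:
                fut = Future()
                self._in_flight[key] = fut
        if leader:
            try:
                fut.set_result(fn())
            except BaseException as exc:
                # Propagate the failure to every waiter instead of leaving them blocked.
                fut.set_exception(exc)
            finally:
                with self._lock:
                    del self._in_flight[key]
        # Followers block here until the leader finishes; errors re-raise for everyone.
        return fut.result()


def demo(n_callers=20):
    """Release n_callers threads at once against one key (hypothetical workload)."""
    sf = SingleFlight()
    upstream_calls = []
    results = []
    start = threading.Barrier(n_callers)

    def slow_backend():
        upstream_calls.append(1)
        time.sleep(0.3)  # simulated slow backend, an assumption for the demo
        return "value"

    def caller():
        start.wait()  # line all callers up so they miss concurrently
        results.append(sf.do("user:42", slow_backend))

    threads = [threading.Thread(target=caller) for _ in range(n_callers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, len(upstream_calls)


if __name__ == "__main__":
    results, calls = demo()
    print(f"{len(results)} callers served by {calls} upstream call(s)")
```

With all callers released together, the backend function should run once while the others wait on the shared result. Go services get equivalent behavior from `Group.Do` in `golang.org/x/sync/singleflight`.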
What I did
- Shipped singleflight request coalescing in the cache client — one upstream call per key per concurrent burst, not N.
- Added a synchronous cache warm-up gate to the deploy pipeline whenever a key prefix changes; canary pods don't take traffic until hit rate ≥ 70%.
- Shortened the error-budget burn alert window from 5m to 2m, so this exact failure class pages earlier.
- Wrote and dry-ran the runbook: prefix-bump diagnosis in under 60 seconds.
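A warm-up gate like the one in the list above might look something like this. It is a sketch under assumptions: the cache is modelled as a plain dict, and `hot_keys`, `sample_keys`, and `load` are hypothetical stand-ins for whatever the real pipeline uses to pick keys and fetch values. Only the 70% threshold comes from the incident.

```python
def hit_rate(cache, sample_keys):
    """Fraction of sampled request keys already present in the cache."""
    if not sample_keys:
        return 0.0
    hits = sum(1 for key in sample_keys if key in cache)
    return hits / len(sample_keys)


def warm_up_gate(cache, hot_keys, load, sample_keys, threshold=0.70):
    """Synchronously load hot keys, then decide whether the pod may take traffic.

    Returns (admit, observed_hit_rate).
    """
    for key in hot_keys:
        if key not in cache:
            cache[key] = load(key)  # blocking load: the gate is synchronous by design
    rate = hit_rate(cache, sample_keys)
    return rate >= threshold, rate


if __name__ == "__main__":
    cache = {}
    hot = ["v2:a", "v2:b", "v2:c"]                       # hypothetical hot-key list
    sample = ["v2:a", "v2:a", "v2:b", "v2:c", "v2:d"]    # hypothetical traffic sample
    admit, rate = warm_up_gate(cache, hot, lambda k: f"value-for-{k}", sample)
    print(f"admit={admit} hit_rate={rate:.0%}")
```

Measuring hit rate against a sample of recent request keys, rather than against the warmed keys themselves, keeps the gate honest: warming the wrong keys would still fail the check.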
What I learned
Cache prefix changes are deploys disguised as config. Treat them like DB migrations: pre-warm, blue-green, and verify hit rate before draining the old fleet.
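To see why a prefix bump behaves like a full cache flush, consider this small illustration. The `v1`/`v2` prefixes, the `cache_key` format, and the dict-backed cache are assumptions made for the example, not the service's actual key scheme.

```python
def cache_key(prefix, entity_id):
    """Build a namespaced cache key (hypothetical format: '<prefix>:<id>')."""
    return f"{prefix}:{entity_id}"


def lookup_hit_rate(cache, prefix, entity_ids):
    """Fraction of lookups under the given prefix that find an entry."""
    hits = sum(1 for i in entity_ids if cache_key(prefix, i) in cache)
    return hits / len(entity_ids)


ids = list(range(1000))

# A fully warm cache, populated under the old prefix.
cache = {cache_key("v1", i): f"row-{i}" for i in ids}

before = lookup_hit_rate(cache, "v1", ids)  # readers still use the old prefix
after = lookup_hit_rate(cache, "v2", ids)   # readers switch to the bumped prefix

if __name__ == "__main__":
    print(f"hit rate before bump: {before:.0%}, after bump: {after:.0%}")
    print(f"entries still stored under the old prefix: {len(cache)}")
```

Nothing was deleted: the old entries are all still in the cache, but no reader asks for them any more. That is why the change needs the same staging as a data migration.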
Coalescing is cheaper than scaling. A 1-line singleflight wrapper beat an 8x increase in backend capacity at a fraction of the cost.
Synthetic monitoring should include cache hit rate, not just latency. Latency tells you something's wrong; hit rate tells you what.
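One way to act on both signals together is a small classifier like the sketch below. The function name and labels are hypothetical. The 200ms p99 bound echoes the recovery point in the timeline, and reusing the 70% warm-up threshold as a hit-rate floor is an assumption rather than a documented alert setting.

```python
def classify(p99_ms, hit_rate, p99_slo_ms=200, hit_rate_floor=0.70):
    """Combine latency and cache hit rate into a rough diagnosis."""
    latency_bad = p99_ms > p99_slo_ms
    cache_bad = hit_rate < hit_rate_floor
    if latency_bad and cache_bad:
        return "cache-miss storm"
    if latency_bad:
        return "latency regression, cache healthy"
    if cache_bad:
        return "cache degraded, latency within SLO"
    return "healthy"


if __name__ == "__main__":
    print(classify(2400, 0.12))  # the incident at its worst
    print(classify(80, 0.94))    # the pre-deploy baseline
```

Fed the incident's worst readings (2,400ms, 12%), this points at the cache right away. Latency alone would only have said that something was slow.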