A script that ran while he slept
Picture this: it's late, you've just built something clever. Claude Code, set to check for software updates every 30 minutes, running on autopilot. You go to sleep feeling productive. You wake up to a $6,000 bill. That is exactly what happened to a developer who posted about it on Reddit. Not a rogue process. Not a typo in a billing tier. The script did exactly what it was told: wake up, check for updates, go back to sleep. The catastrophe came from two things colliding silently in the dark.
I build token tooling for a living. When I first read this story, I didn't think "what a careless developer." I thought: I know that feeling of trusting a loop that was supposed to save time.
How context becomes a money fire
Claude Code doesn't reset its memory between loop iterations when you keep a session alive. Every tool call, every response, every intermediate reasoning step accumulates into a context window that grows like sediment. Early in a session that's fine: a few hundred tokens. Hours in, it's a different animal. By the time this developer's loop had been running overnight, it had built up an 800,000-token conversation history. That history attached itself to every single request.
Normally, this is where prompt caching earns its keep. You pay a slightly higher input price once to write the cache, then every subsequent call reads from it at a fraction of the cost. For Claude Sonnet 4.x:

| Token type | Cost per MTok |
|---|---|
| Base input | $3.00 |
| Cache write (5-min TTL) | $3.75 |
| Cache read (hit) | $0.30 |
| Output | $15.00 |

Applied to the 800,000-token history, each cycle costs:
- Cache hit: 800k × $0.30/MTok = $0.24 per cycle
- Cache miss (full rebuild): 800k × $3.75/MTok = $3.00 per cycle
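The per-cycle arithmetic can be reproduced in a few lines. This is a minimal sketch that only restates the price table; the names `PRICE_PER_MTOK` and `cost_usd` are illustrative and not part of any Anthropic SDK.

```python
# Prices in USD per million tokens, copied from the table in this article.
PRICE_PER_MTOK = {
    "base_input": 3.00,
    "cache_write_5m": 3.75,
    "cache_read": 0.30,
    "output": 15.00,
}


def cost_usd(tokens: int, token_type: str) -> float:
    """Cost of `tokens` tokens billed at the rate for `token_type`."""
    return tokens / 1_000_000 * PRICE_PER_MTOK[token_type]


context_tokens = 800_000
hit = cost_usd(context_tokens, "cache_read")       # context read from a warm cache
miss = cost_usd(context_tokens, "cache_write_5m")  # context rewritten after expiry

print(f"cache hit:  ${hit:.2f} per cycle")
print(f"cache miss: ${miss:.2f} per cycle")
print(f"a miss costs {miss / hit:.1f}x a hit")
```

The ratio is the point: at these prices, every cold cycle costs 12.5 times what a warm one would.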
The five-minute window that cost $6,000
Here is the part that makes this story genuinely unfair. At some point in early March 2026, Anthropic quietly changed Claude Code's default prompt cache TTL from 1 hour to 5 minutes. No announcement. No changelog entry. Nothing in the release notes. The change was only confirmed weeks later when a developer named seanGSISG opened GitHub issue #46829 on April 12, 2026, after doing what I would have done: actually looking at the data. They analysed 119,866 API calls across two machines and found a clean phase transition: 1h cache writes up until around March 6–8, then 5m cache writes from that point on, with nothing in between.
So here is the math that destroyed this developer's night:
- Loop interval: 30 minutes
- Cache TTL (now): 5 minutes
- Gap between interval and TTL: 25 minutes of guaranteed cold cache
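The mismatch is easy to check before a loop ever runs. The sketch below assumes a simplified model in which the loop is the only traffic on the session, so a cached prefix survives only if the next request arrives before the TTL expires. The function names are hypothetical.

```python
def cache_is_warm(loop_interval_min: float, cache_ttl_min: float) -> bool:
    """True if the next loop iteration arrives before the cache entry expires."""
    return loop_interval_min < cache_ttl_min


def cold_gap_min(loop_interval_min: float, cache_ttl_min: float) -> float:
    """Minutes the cache has already been expired when the next request lands."""
    return max(0.0, loop_interval_min - cache_ttl_min)


LOOP_INTERVAL_MIN = 30

for ttl in (60, 5):  # the old 1-hour default, then the new 5-minute default
    state = "warm" if cache_is_warm(LOOP_INTERVAL_MIN, ttl) else "cold"
    gap = cold_gap_min(LOOP_INTERVAL_MIN, ttl)
    print(f"TTL {ttl:>2} min: cache is {state}, expired for {gap:.0f} min")
```

Under the 1-hour TTL the same 30-minute loop would have hit a warm cache on every cycle; under the 5-minute TTL it never does.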
What the cache giveth, the TTL taketh away
The GitHub issue analysis made something else clear: this wasn't only a problem for developers running overnight loops. Even ordinary Claude Code subscribers who had never hit their limits saw a 20–32% increase in cache creation costs after the TTL change. The loop story is extreme, but the underlying mechanism (a silent TTL regression forcing unnecessary full rebuilds) affected everyone.
XDA Developers documented the pattern: every cache bust on a large context window is a full rebuild. In agentic coding sessions, where context grows into the hundreds of thousands of tokens as a matter of course, this is not a niche edge case. It's the normal operating condition for any agent that runs long enough.
The developer lost $6,000 in a single night. That figure is not approximate; it comes directly from the MakeUseOf report of the Reddit incident.
What compression would have changed (an honest estimate)
This is an estimate. I want to be precise about what compression actually addresses here. The root lever is the 800,000-token history being rebuilt 48 times. A token-optimization proxy like Tokenade intercepts every tool response before it enters the model, stripping verbose scaffolding, compressing output, and summarising earlier turns before they accumulate. That is the core idea behind reducing AI coding agent token usage. A conservative 50% compression of that history would reduce each cycle's context to ≈400k tokens.
Assuming the same 48 cold-cache cycles (worst case: TTL mismatch still present):
- Uncompressed per cycle (cache write): 800k × $3.75/MTok = $3.00
- Compressed per cycle (cache write): 400k × $3.75/MTok = $1.50
- Estimated overnight saving on cache writes alone: ≈48 × $1.50 = ≈$72
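The same estimate can be parameterised so other compression ratios are easy to try. This sketch keeps the article's simplification that the context stays at a fixed size for all 48 cycles; `overnight_write_cost` is an illustrative helper, not a Tokenade or Anthropic API.

```python
CACHE_WRITE_USD_PER_MTOK = 3.75  # 5-minute cache write price from the table


def overnight_write_cost(context_tokens: int, cycles: int) -> float:
    """Total cache-write cost if every cycle rebuilds the full context."""
    return context_tokens / 1_000_000 * CACHE_WRITE_USD_PER_MTOK * cycles


CYCLES = 48
FULL_CONTEXT = 800_000
COMPRESSION = 0.50  # assumed fraction of tokens removed

compressed_context = int(FULL_CONTEXT * (1 - COMPRESSION))
uncompressed = overnight_write_cost(FULL_CONTEXT, CYCLES)
compressed = overnight_write_cost(compressed_context, CYCLES)

print(f"uncompressed: ${uncompressed:.2f}")
print(f"compressed:   ${compressed:.2f}")
print(f"saving:       ${uncompressed - compressed:.2f}")
```

Because the cost is linear in context size, the saving scales directly with the compression ratio: change `COMPRESSION` and the saving moves in proportion.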
Three lines of code worth more than hindsight
Reading this story, three guardrails stand out, and none of them requires any new tooling:
- A hard token budget on the loop. Claude Code supports `--max-tokens` flags. Capping the context forces it to summarise and trim rather than accumulate indefinitely.
- Spend alerts configured in advance. Anthropic's dashboard lag makes reactive monitoring useless. The only alert that arrives in time is the one you set before you go to sleep.
- Input compression on every cycle, so the growing conversation history doesn't compound into the cache write cost.
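The first two guardrails can also live inside the loop itself. What follows is a minimal sketch, not Claude Code's own mechanism: `BudgetGuard`, `BudgetExceeded`, and `run_loop` are hypothetical names, the agent cycle is simulated by a fixed context growth, and every request is priced as a worst-case cache write.

```python
from dataclasses import dataclass

CACHE_WRITE_USD_PER_MTOK = 3.75  # 5-minute cache write price from the table


class BudgetExceeded(RuntimeError):
    """Raised when the loop should stop instead of sending another request."""


@dataclass
class BudgetGuard:
    max_context_tokens: int
    max_spend_usd: float
    spent_usd: float = 0.0

    def approve(self, context_tokens: int) -> float:
        """Check the next request before it is sent; return its worst-case cost."""
        if context_tokens > self.max_context_tokens:
            raise BudgetExceeded(f"context {context_tokens:,} tokens over cap")
        worst_case = context_tokens / 1_000_000 * CACHE_WRITE_USD_PER_MTOK
        if self.spent_usd + worst_case > self.max_spend_usd:
            raise BudgetExceeded(
                f"spend would reach ${self.spent_usd + worst_case:.2f}"
            )
        self.spent_usd += worst_case
        return worst_case


def run_loop(guard: BudgetGuard, growth_per_cycle: int, max_cycles: int):
    """Simulate a loop whose context grows each cycle; stop when the guard objects."""
    context_tokens = 0
    completed = 0
    try:
        for _ in range(max_cycles):
            context_tokens += growth_per_cycle  # stand-in for a real agent cycle
            guard.approve(context_tokens)
            completed += 1
    except BudgetExceeded as exc:
        return completed, str(exc)
    return completed, "finished"


guard = BudgetGuard(max_context_tokens=200_000, max_spend_usd=5.00)
completed, reason = run_loop(guard, growth_per_cycle=50_000, max_cycles=48)
print(f"stopped after {completed} cycles: {reason}")
```

The check runs before each request rather than after, so the loop halts while the bill is still small instead of reporting the damage afterwards.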
Sources: MakeUseOf — Someone left Claude Code running overnight, and it cost $6,000 · XDA Developers — Anthropic quietly nerfed Claude Code's 1-hour cache · GitHub issue #46829 — Cache TTL silently regressed from 1h to 5m · Anthropic pricing docs