Case study · indie SaaS

I Watched Someone's $6,000 Mistake Unfold Line by Line

He set up a 30-minute loop, went to sleep, and woke up to a $6,000 bill. The script wasn't broken — a silent infrastructure change had turned every cache hit into a full rebuild, 48 times before sunrise.


By Paul Irolla

Founder · AI & developer tools · Tokenade

Ph.D. in AI · builds token-optimization tooling for AI coding agents


A script that ran while he slept

Picture this: it's late, you've just built something clever. Claude Code, set to check for software updates every 30 minutes, running on autopilot. You go to sleep feeling productive. You wake up to a $6,000 bill.

That is exactly what happened to a developer who posted about it on Reddit. Not a rogue process. Not a typo in a billing tier. The script did exactly what it was told: wake up, check for updates, go back to sleep. The catastrophe came from two things colliding silently in the dark.

I build token tooling for a living. When I first read this story, I didn't think "what a careless developer." I thought: I know that feeling of trusting a loop that was supposed to save time.

How context becomes a money fire

Claude Code doesn't reset its memory between loop iterations when you keep a session alive. Every tool call, every response, every intermediate reasoning step accumulates into a context window that grows like sediment. Early in a session that's fine: a few hundred tokens. Hours in, it's a different animal. By the time this developer's loop had been running overnight, it had built up an 800,000-token conversation history. That history attached itself to every single request.

Normally, this is where prompt caching earns its keep. You pay a one-time write price (slightly above the base input rate) to populate the cache, then every subsequent call reads from it at a fraction of the cost. For Claude Sonnet 4.x:
Token type                 Cost per MTok
Base input                 $3.00
Cache write (5-min TTL)    $3.75
Cache read (hit)           $0.30
Output                     $15.00
Source: Anthropic pricing docs

On 800,000 tokens, the gap between those two paths is brutal:
  • Cache hit: 800k × $0.30/MTok = $0.24 per cycle
  • Cache miss (full rebuild): 800k × $3.75/MTok = $3.00 per cycle
A 12.5× multiplier. Per cycle. Before a single output token is counted.
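The per-cycle arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch: the prices are the ones quoted in the table, while the dictionary keys and the `input_cost` helper are names made up for this illustration.

```python
# USD per million tokens, as quoted in the pricing table above (Claude Sonnet 4.x).
PRICE_PER_MTOK = {
    "base_input": 3.00,
    "cache_write_5m": 3.75,
    "cache_read": 0.30,
}


def input_cost(tokens: int, kind: str) -> float:
    """Input-side cost in USD for `tokens` billed at the given rate (hypothetical helper)."""
    return tokens / 1_000_000 * PRICE_PER_MTOK[kind]


history_tokens = 800_000
hit = input_cost(history_tokens, "cache_read")       # history served from cache
miss = input_cost(history_tokens, "cache_write_5m")  # history rewritten into the cache

print(f"cache hit:  ${hit:.2f} per cycle")
print(f"cache miss: ${miss:.2f} per cycle")
print(f"multiplier: {miss / hit:.1f}x")
```

Only the history itself is counted here; output tokens and any new input added in a cycle are left out, as in the bullets above.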

The five-minute window that cost $6,000

Here is the part that makes this story genuinely unfair. At some point in early March 2026, Anthropic quietly changed Claude Code's default prompt cache TTL from 1 hour to 5 minutes. No announcement. No changelog entry. Nothing in the release notes.

The change was only confirmed weeks later when a developer named seanGSISG opened GitHub issue #46829 on April 12, 2026, after doing what I would have done: actually looking at the data. They analysed 119,866 API calls across two machines and found a clean phase transition: 1h cache writes up until around March 6–8, then 5m cache writes from that point on, with nothing in between.

So here is the math that destroyed this developer's night:
  • Loop interval: 30 minutes
  • Cache TTL (now): 5 minutes
  • Gap between interval and TTL: 25 minutes of guaranteed cold cache
Every single wake-up found a cache that had already expired. The 800,000-token history was rebuilt from scratch 48 times before morning. Forty-eight full cache writes at $3.75/MTok instead of 48 cache reads at $0.30/MTok.

There was no live counter to watch. Anthropic's usage dashboard updates with a delay of several days. The developer's first warning was an email notification, at which point the money was already gone.
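The interval-versus-TTL mismatch can be checked before a loop ever starts. The sketch below is a simplified model, not a description of how Anthropic's cache behaves internally: it assumes the TTL clock restarts at each request, that nothing else touches the cache between wake-ups, and that the history stays at a fixed 800,000 tokens. The function names are hypothetical.

```python
def cache_is_warm(loop_interval_min: float, cache_ttl_min: float) -> bool:
    """True if the next wake-up lands inside the TTL window of the previous request.

    Simplification: the TTL is assumed to restart at every request and no other
    traffic refreshes the cache in between.
    """
    return loop_interval_min < cache_ttl_min


def history_cost(cycles: int, history_tokens: int,
                 loop_interval_min: float, cache_ttl_min: float,
                 write_price: float = 3.75, read_price: float = 0.30) -> float:
    """USD spent re-sending a fixed-size history over `cycles` wake-ups."""
    warm = cache_is_warm(loop_interval_min, cache_ttl_min)
    price_per_mtok = read_price if warm else write_price
    return cycles * history_tokens / 1_000_000 * price_per_mtok


for ttl in (60, 5):  # the old 1-hour default vs the new 5-minute default
    state = "warm" if cache_is_warm(30, ttl) else "cold"
    cost = history_cost(48, 800_000, 30, ttl)
    print(f"TTL {ttl:>2} min: cache is {state}, history cost = ${cost:.2f}")
```

This counts the history rebuild only. It does not attempt to reproduce the full reported bill, which also includes output tokens and everything else sent during the run.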

What the cache giveth, the TTL taketh away

The GitHub issue analysis made something else clear: this wasn't only a problem for developers running overnight loops. Even ordinary Claude Code subscribers who had never hit their limits saw a 20–32% increase in cache creation costs after the TTL change. The loop story is extreme, but the underlying mechanism (a silent TTL regression forcing unnecessary full rebuilds) affected everyone.

XDA Developers documented the pattern: every cache bust on a large context window is a full rebuild. In agentic coding sessions, where context grows into the hundreds of thousands of tokens as a matter of course, this is not a niche edge case. It's the normal operating condition for any agent that runs long enough.

The developer lost $6,000 in a single night. That figure is not approximate: it comes directly from the MakeUseOf report of the Reddit incident.

What compression would have changed (an honest estimate)

This is an estimate. I want to be precise about what compression actually addresses here. The root lever is the 800,000-token history being rebuilt 48 times.

A token-optimization proxy like Tokenade intercepts every tool response before it enters the model, stripping verbose scaffolding, compressing output, and summarising earlier turns before they accumulate. That is the core idea behind reducing AI coding agent token usage.

A conservative 50% compression of that history would reduce each cycle's context to ≈400k tokens. Assuming the same 48 cold-cache cycles (worst case: TTL mismatch still present):
  • Uncompressed per cycle (cache write): 800k × $3.75/MTok = $3.00
  • Compressed per cycle (cache write): 400k × $3.75/MTok = $1.50
  • Estimated overnight saving on cache writes alone: 48 × ($3.00 − $1.50) = ≈$72
I'll be honest: $72 against a $6,000 bill sounds underwhelming. But the output tokens and additional scaffolding account for a significant share of the bill too, and that is not what compression targets. What compression targets is the part that compounds: the history that grew all night and multiplied every subsequent cache write.

The real question isn't whether $72 or $600 would have been saved. It's whether the context ever reaches 800,000 tokens at all, if every tool response is compacted on the way in, before it sediments into the history that gets rebuilt from scratch 48 times over the course of the run.

Tokenade's free tier covers up to 20 million tokens compressed per month. For a loop like this, the compression costs nothing. The $9.90/month paid tier covers sessions well beyond what this script would have generated.
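The question of whether the history ever reaches 800,000 tokens can be explored with a toy growth model. Everything in this sketch is an assumption made for illustration: the history is taken to grow by the same amount each cycle, every cycle is billed as a cold-cache write of the whole history so far, and `keep_ratio` stands in for whatever fraction of incoming tokens survives compression. It deliberately differs from the flat worst-case estimate above, which treats all 48 cycles as full-size.

```python
def cumulative_write_cost(cycles: int, tokens_added_per_cycle: float,
                          keep_ratio: float = 1.0,
                          write_price_per_mtok: float = 3.75) -> tuple[float, float]:
    """Return (final history size in tokens, total cache-write cost in USD).

    Hypothetical model: each cycle appends `tokens_added_per_cycle * keep_ratio`
    tokens, then pays a full cache write for the entire history.
    """
    history = 0.0
    total_usd = 0.0
    for _ in range(cycles):
        history += tokens_added_per_cycle * keep_ratio
        total_usd += history / 1_000_000 * write_price_per_mtok
    return history, total_usd


# Assumed growth rate: whatever reaches 800k tokens after 48 cycles.
per_cycle = 800_000 / 48

raw_size, raw_cost = cumulative_write_cost(48, per_cycle)
small_size, small_cost = cumulative_write_cost(48, per_cycle, keep_ratio=0.5)

print(f"uncompressed: {raw_size:,.0f} tokens, ${raw_cost:.2f} in cache writes")
print(f"50% kept:     {small_size:,.0f} tokens, ${small_cost:.2f} in cache writes")
```

Because each cycle re-pays for everything accumulated so far, trimming what goes in early lowers the cost of every later cycle, which is the compounding effect described above.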

Three lines of code worth more than hindsight

Three guardrails stand out from this story, and none of them requires any new tooling:
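  1. A hard token budget on the loop. Claude Code reportedly supports a --max-tokens flag (check claude --help on your installed version, since option names change between releases). Capping the context forces it to summarise and trim rather than accumulate indefinitely.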
  2. Spend alerts configured in advance. Anthropic's dashboard lag makes reactive monitoring useless. The only alert that arrives in time is the one you set before you go to sleep.
  3. Input compression on every cycle, so the growing conversation history doesn't compound into the cache write cost.
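The first two guardrails can also be enforced locally, inside the loop itself. The sketch below is one possible shape, assuming you can measure the context size before each cycle; `BudgetGuard`, `BudgetExceeded`, and `run_loop` are hypothetical names, no real Claude Code or Anthropic API is called, and the price is a pessimistic assumption that every cycle is billed as a cache write. It complements provider-side spend alerts rather than replacing them.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when the loop should stop instead of sending another request."""


@dataclass
class BudgetGuard:
    max_context_tokens: int        # hard cap on history size (guardrail 1)
    max_spend_usd: float           # local spend ceiling (guardrail 2)
    price_per_mtok: float = 3.75   # pessimistic: treat every cycle as a cache write
    spent_usd: float = 0.0

    def charge(self, context_tokens: int) -> float:
        """Check both caps, then record the estimated cost of one cycle."""
        if context_tokens > self.max_context_tokens:
            raise BudgetExceeded(f"context of {context_tokens:,} tokens exceeds cap")
        cost = context_tokens / 1_000_000 * self.price_per_mtok
        if self.spent_usd + cost > self.max_spend_usd:
            raise BudgetExceeded(f"next cycle would exceed ${self.max_spend_usd:.2f}")
        self.spent_usd += cost
        return cost


def run_loop(guard: BudgetGuard, context_sizes) -> int:
    """Run cycles until a guard trips; returns the number of completed cycles."""
    completed = 0
    for tokens in context_sizes:
        try:
            guard.charge(tokens)
        except BudgetExceeded as stop:
            print(f"stopping after {completed} cycles: {stop}")
            break
        # A real loop would send the request here, then sleep until the next wake-up.
        completed += 1
    return completed


# Simulated run: the history grows by 50k tokens per cycle.
guard = BudgetGuard(max_context_tokens=200_000, max_spend_usd=5.00)
completed = run_loop(guard, [50_000 * n for n in range(1, 49)])
print(f"completed {completed} cycles, estimated spend ${guard.spent_usd:.2f}")
```

Checking the caps before each request, rather than after, means the loop stops before the expensive call is made instead of reporting it afterwards.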
Tokenade covers the third guardrail. It intercepts every tool response and context chunk before it reaches the model, compresses it, and lets you keep the same loop logic without the ballooning footprint.

If you run any kind of automated or overnight agent, you can start for free at tokenade.net: no credit card required, no quota surprises on day one.
Sources:
  • MakeUseOf: Someone left Claude Code running overnight, and it cost $6,000
  • XDA Developers: Anthropic quietly nerfed Claude Code's 1-hour cache
  • GitHub issue #46829: Cache TTL silently regressed from 1h to 5m
  • Anthropic pricing docs


Profiles are sourced from public statements, podcast interviews, Twitter/X posts, and Indie Hackers / Reddit threads cited inline. No private claims; if you spot a factual error, contact [email protected].