A number that should alarm every engineering leader
Imagine telling your CFO in mid-April that the full-year AI budget is gone. Not overrun by 20%. Gone. Four months into the fiscal year, with eight months still ahead. That is exactly what happened at Uber. In December 2025, Uber CTO Praveen Neppalli Naga rolled out Anthropic's Claude Code to roughly 5,000 engineers. By mid-April 2026, he was telling The Information that the company was "back to the drawing board" on AI budgeting. The tool had worked too well, and nobody had thought to ask what "working" would cost.

The leaderboard that ate the budget
Uber didn't just give engineers Claude Code. It built internal leaderboards ranking engineers by AI tool usage volume. The incentive was explicit: use it more, rank higher. The result was predictable in hindsight, though apparently not in advance. Agentic coding usage went from 32% of engineers in February 2026 to 84% by March. Ninety-five percent of engineers were touching AI tools every month. Nearly 70% of code commits involved them in some capacity. Adoption metrics like that are the kind of thing you put in a board slide. Until the invoice arrives. Per-engineer monthly API costs ran between $500 and $2,000. At 5,000 engineers averaging $1,000/month, that's roughly $5M/month, or $20M over four months. That is consistent, in round numbers, with exhausting an annual AI coding tools budget before summer.

Where five thousand engineers' tokens actually went
Here is the part that I find genuinely clarifying, as someone who builds token tooling for a living: the problem wasn't that Claude Code was doing useless work. The problem was that agentic coding is architecturally expensive when nothing filters it. Every time an agent takes a step (reads a file, fires a shell command, checks test output, loops back), the entire accumulated context is re-sent to the model. That's not a Claude-specific quirk; it's how stateless API calls work. Each loop turn starts from scratch. An agent doing 20 iterations on a refactor might send 200,000 input tokens per task without ever generating more than 300 lines of code. Now add the patterns that make this worse:

- Full-directory scans at session start to orient the agent in the codebase
- Repeated reads of the same files: the same 500-line module re-ingested on every turn
- Verbose tool output folded wholesale into the next prompt: raw git log, complete stack traces, full test output
- No context pruning between steps: the window grows, not shrinks
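To make the re-sending cost concrete, here is a minimal sketch in Python. The helper `cumulative_input_tokens` and the numbers fed to it (a 2,000-token starting prompt, 800 tokens appended per turn) are illustrative assumptions, not figures reported by Uber or Anthropic, and the model ignores prompt caching and output tokens.

```python
def cumulative_input_tokens(turns: int, base_tokens: int, added_per_turn: int) -> int:
    """Total input tokens billed when the full context is re-sent on every turn.

    Assumption: each turn sends the whole context so far, then appends
    `added_per_turn` tokens (tool output plus the model's reply) to it.
    """
    total = 0
    context = base_tokens
    for _ in range(turns):
        total += context           # the entire context goes out as input this turn
        context += added_per_turn  # and grows before the next turn
    return total


# Hypothetical task: 20 turns, 2,000-token starting prompt.
unfiltered = cumulative_input_tokens(turns=20, base_tokens=2_000, added_per_turn=800)
filtered = cumulative_input_tokens(turns=20, base_tokens=2_000, added_per_turn=300)

print(f"unfiltered: {unfiltered:,} input tokens")  # unfiltered: 192,000 input tokens
print(f"filtered:   {filtered:,} input tokens")    # filtered:   97,000 input tokens
```

Under these assumed numbers, 20 turns bill about 192,000 input tokens even though the context sent on any single turn never exceeds about 17,000. Trimming what each turn appends from 800 to 300 tokens roughly halves the total, because every appended token is paid for again on each later turn.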
What an efficiency layer would have changed (ESTIMATE)
The levers behind Uber's bill are well understood: redundant file reads, full-directory context dumps at session start, verbose tool output re-injected each turn, and no semantic retrieval to limit what gets loaded. These are precisely the patterns that token optimization targets. A conservative estimate: reduce input tokens 40–50% through smarter context management (semantic retrieval instead of directory dumps, output filtering on tool calls, cache-stable prefixes for shared context) and API cost drops nearly in proportion. The figures below are consistent with input tokens making up about 90% of spend, so a 40% input cut saves roughly 36% of the bill. At the mid-range of $1,000/month per engineer across 5,000 engineers:

| Scenario | Monthly spend | Annual run-rate |
|---|---|---|
| Baseline (as reported) | $5,000,000 | $60,000,000 |
| 40% input reduction | $3,200,000 | $38,400,000 |
| 50% input reduction | $2,750,000 | $33,000,000 |
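The arithmetic behind the table can be sketched in a few lines of Python. The 90% input share is an assumption inferred from the table's own figures rather than a reported number, and `monthly_spend` is a hypothetical helper. Integer percentages are used so the dollar amounts come out exact.

```python
ENGINEERS = 5_000
MONTHLY_COST_PER_ENGINEER_USD = 1_000  # mid-range of the reported $500 to $2,000
INPUT_SHARE_PCT = 90                   # ASSUMPTION: share of spend that is input tokens


def monthly_spend(input_reduction_pct: int, input_share_pct: int = INPUT_SHARE_PCT) -> int:
    """Monthly spend in USD after cutting input tokens by `input_reduction_pct` percent."""
    baseline = ENGINEERS * MONTHLY_COST_PER_ENGINEER_USD
    # 90% share * 40% reduction = 3,600 basis points, i.e. a 36% saving overall.
    saved_basis_points = input_share_pct * input_reduction_pct
    return baseline * (10_000 - saved_basis_points) // 10_000


for reduction in (0, 40, 50):
    monthly = monthly_spend(reduction)
    print(f"{reduction:>2}% input reduction: ${monthly:,}/month, ${monthly * 12:,}/year")
```

Changing `INPUT_SHARE_PCT` shows how sensitive the estimate is to that assumption: at 100% the saving equals the input reduction exactly, and it shrinks as output tokens take a larger share of the bill.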
The lesson isn't "use AI less"
Uber is the clearest public example yet of what happens when a large engineering team adopts agentic coding at scale without a token efficiency layer. LLM API token pricing is not a SaaS seat fee. It doesn't cap. It scales with every loop turn, every file re-read, every verbose tool output folded back into the next prompt. At 5,000 engineers, that compounding produces budget exhaustion in four months. The fix is not to use AI less. It is to stop sending tokens you don't need. Tokenade sits between your AI coding agent and the API, compressing context, filtering tool output, and routing reads semantically, so the model sees what it needs without re-reading your entire codebase on every turn. The token cost calculator lets you run the numbers for your own team: what your current usage actually costs, and what a 40–50% reduction looks like in cash. Free up to approximately 20 million tokens saved.