The AI Tool Nobody At Uber Could Put Down

A number that should alarm every engineering leader

Imagine telling your CFO in mid-April that the full-year AI budget is gone. Not over-run by 20%. Gone. Four months into the fiscal year, with eight months still ahead. That is exactly what happened at Uber. In December 2025, Uber CTO Praveen Neppalli Naga rolled out Anthropic's Claude Code to roughly 5,000 engineers. By mid-April 2026, he was telling The Information that the company was "back to the drawing board" on AI budgeting. The tool had worked too well — and nobody had thought to ask what "working" would cost.

The leaderboard that ate the budget

Uber didn't just give engineers Claude Code. They built internal leaderboards ranking engineers by AI tool usage volume. Explicit incentive: use it more, rank higher. The result was predictable in hindsight, though apparently not in advance. Agentic coding usage went from 32% of engineers in February 2026 to 84% by March. Ninety-five percent of engineers were touching AI tools every month. Nearly 70% of code commits involved them in some capacity. Adoption metrics like that are the kind of thing you put in a board slide. Until the invoice arrives. Per-engineer monthly API costs ran between $500 and $2,000. At 5,000 engineers averaging $1,000/month, that's roughly $5M/month — $20M over four months. Consistent, in round numbers, with exhausting an annual AI coding tools budget before summer.

Where five thousand engineers' tokens actually went

Here is the part that I find genuinely clarifying, as someone who builds token tooling for a living: the problem wasn't that Claude Code was doing useless work. The problem was that agentic coding is architecturally expensive when nothing filters what gets sent. Every time an agent runs a step (reads a file, fires a shell command, checks test output, loops back), the entire accumulated context is re-sent to the model. That's not a Claude-specific quirk; it's how stateless API calls work. The model keeps no memory between calls, so each loop turn starts from scratch. An agent doing 20 iterations on a refactor might send 200,000 input tokens per task without ever generating more than 300 lines of code. Now multiply the patterns that make this worse:

Full-directory scans at session start to orient the agent in the codebase
Repeated reads of the same files — the same 500-line module re-ingested on every turn
Verbose tool output folded wholesale into the next prompt: raw git log, complete stack traces, full test output
No context pruning between steps — the window grows, not shrinks
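
The compounding effect of these patterns can be sketched with a toy model. The numbers below (a 5,250-token starting context growing by 500 tokens per turn) are hypothetical, chosen only so that 20 turns land on the 200,000-token figure mentioned above; they are not measurements from Uber or from any agent. The `cap` parameter is likewise an illustrative stand-in for context pruning.

```python
from typing import Optional


def cumulative_input_tokens(
    base_context: int,
    growth_per_turn: int,
    turns: int,
    cap: Optional[int] = None,
) -> int:
    """Total input tokens sent when every turn re-sends the whole context.

    cap is a hypothetical pruning limit: if set, no single turn sends more
    than cap tokens, however large the unpruned context has grown.
    """
    total = 0
    context = base_context
    for _ in range(turns):
        sent = context if cap is None else min(context, cap)
        total += sent               # the context is sent again on this turn
        context += growth_per_turn  # tool output and replies get appended
    return total


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    unpruned = cumulative_input_tokens(5_250, 500, turns=20)
    pruned = cumulative_input_tokens(5_250, 500, turns=20, cap=6_000)
    print(f"unpruned: {unpruned:,} input tokens")
    print(f"pruned:   {pruned:,} input tokens")
```

Because each turn re-sends everything before it, total input grows roughly with the square of the number of turns unless something bounds the context.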

At Claude Sonnet pricing of $3/MTok input and $15/MTok output, a single session hitting 1M input tokens and 100K output tokens costs $4.50: $3.00 for input plus $1.50 for output. Ten sessions a day over a 30-day month puts one developer at $1,350/month, squarely inside the reported range. The AI coding agent token cost data shows input tokens drive 80–90% of the bill, and they're dominated by context re-reads, not by anything the model actually generates.

The leaderboard made it structurally worse. An engineer optimizing for usage volume has no signal to be efficient. The system rewarded consumption; it didn't reward the thing consumption was supposed to produce. Uber COO Andrew Macdonald acknowledged this in a May 2026 interview: "It's very hard to draw a line between one of those stats and, 'Okay, now we're actually producing 25 percent more useful consumer features.'"

For context, Uber's total R&D spend was $3.4 billion in 2025. The AI coding budget is a rounding error against that, but it was the rounding error that ran dry.
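
The per-session arithmetic above is easy to check in a few lines. This is a minimal sketch using the prices quoted in the text. The function names are illustrative, and the 30-day month is an assumption inferred from the $1,350 figure (a 22-working-day month would give $990 instead).

```python
INPUT_PRICE_PER_MTOK = 3.00    # USD per million input tokens, as quoted above
OUTPUT_PRICE_PER_MTOK = 15.00  # USD per million output tokens, as quoted above


def session_cost(input_tokens: int, output_tokens: int) -> float:
    """API cost in USD for one session at the quoted per-token prices."""
    input_cost = input_tokens * INPUT_PRICE_PER_MTOK / 1_000_000
    output_cost = output_tokens * OUTPUT_PRICE_PER_MTOK / 1_000_000
    return input_cost + output_cost


def monthly_cost(per_session: float, sessions_per_day: int, days: int) -> float:
    """Monthly cost for one developer; days is an assumption, not a reported figure."""
    return per_session * sessions_per_day * days


if __name__ == "__main__":
    cost = session_cost(input_tokens=1_000_000, output_tokens=100_000)
    print(f"per session: ${cost:.2f}")
    print(f"30-day month, 10 sessions/day: ${monthly_cost(cost, 10, 30):,.2f}")
    print(f"22-day month, 10 sessions/day: ${monthly_cost(cost, 10, 22):,.2f}")
```

Either reading of "a month" keeps a ten-session-a-day developer inside the reported $500 to $2,000 range.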

What an efficiency layer would have changed (ESTIMATE)

The levers behind Uber's bill are well understood: redundant file reads, full-directory context dumps at session start, verbose tool output re-injected each turn, and no semantic retrieval to limit what gets loaded. These are precisely the patterns that token optimization targets. A conservative estimate: reduce input tokens 40–50% through smarter context management (semantic retrieval instead of directory dumps, output filtering on tool calls, cache-stable prefixes for shared context) and API cost drops almost in proportion, because input tokens are most of the bill. The figures below correspond to input tokens at 90% of spend, the top of the 80–90% range, so a 40% input cut trims the total by 36% and a 50% cut trims it by 45%. At the mid-range of $1,000/month per engineer across 5,000 engineers:

Scenario	Monthly spend	Annual run-rate
Baseline (as reported)	$5,000,000	$60,000,000
40% input reduction	$3,200,000	$38,400,000
50% input reduction	$2,750,000	$33,000,000
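
The table's figures can be reproduced with a short calculation. One assumption is needed: the numbers match if input tokens account for 90% of spend, the upper end of the 80–90% range quoted earlier. That share is inferred from the table, not stated by any source, and the names below are illustrative.

```python
BASELINE_MONTHLY = 5_000_000  # USD: 5,000 engineers x $1,000/month, per the text
INPUT_SHARE = 0.90            # ASSUMPTION: fraction of spend from input tokens


def monthly_spend(input_reduction: float) -> int:
    """Monthly spend in USD after cutting input tokens by input_reduction (0 to 1)."""
    saved_fraction = INPUT_SHARE * input_reduction  # only the input side shrinks
    return round(BASELINE_MONTHLY * (1 - saved_fraction))


scenarios = [
    ("Baseline (as reported)", 0.00),
    ("40% input reduction", 0.40),
    ("50% input reduction", 0.50),
]

for label, reduction in scenarios:
    monthly = monthly_spend(reduction)
    annual = monthly * 12
    saved = BASELINE_MONTHLY * 12 - annual
    print(f"{label:<24} ${monthly:>10,} ${annual:>11,}  saves ${saved:>11,}/yr")
```

Changing `INPUT_SHARE` to 0.80 shows how sensitive the estimate is to that single assumption.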

That 40–50% range is what independent benchmarks cite for tools applying semantic retrieval, context compression, and structure-first file reads, the techniques covered in the token reduction guide. Estimated savings at Uber's scale: roughly $21M–$27M/year. Put another way, the roughly $20M that lasted four months would have covered about six to seven months at the reduced run-rates.

To be fair about what optimization does and doesn't fix: the leaderboard governance problem is real and separate. Rewarding volume over outcomes is a management call; no tool patches that. But the architectural problem, the absence of an efficiency layer between the agent and the API, is exactly what tooling addresses. The two problems compounded each other. Fix the architecture and the governance failure at least stops getting amplified.
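
To make "efficiency layer" concrete, here is a minimal sketch of two of the simpler techniques: trimming verbose tool output and skipping repeat reads of unchanged files. It is an illustration under stated assumptions, not a description of how any particular product works. The names `trim_tool_output` and `FileReadCache` are hypothetical, and semantic retrieval and cache-stable prefixes are not shown.

```python
import hashlib


def trim_tool_output(text: str, head: int = 20, tail: int = 20) -> str:
    """Keep the first and last lines of verbose output and note what was dropped."""
    lines = text.splitlines()
    if len(lines) <= head + tail:
        return text  # short enough to pass through untouched
    omitted = len(lines) - head - tail
    marker = f"[... {omitted} lines omitted ...]"
    return "\n".join(lines[:head] + [marker] + lines[len(lines) - tail:])


class FileReadCache:
    """Send a file's content once; later reads of unchanged content get a stub."""

    def __init__(self) -> None:
        self._seen = {}  # maps path -> SHA-256 hex digest of the last content sent

    def read_for_prompt(self, path: str, content: str) -> str:
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if self._seen.get(path) == digest:
            return f"[{path}: unchanged since last read, content omitted]"
        self._seen[path] = digest
        return content


if __name__ == "__main__":
    log = "\n".join(f"test_{i} ... ok" for i in range(500))
    print(trim_tool_output(log, head=3, tail=3))

    cache = FileReadCache()
    print(cache.read_for_prompt("billing.py", "def charge(): ..."))
    print(cache.read_for_prompt("billing.py", "def charge(): ..."))
```

Keeping the head and tail of output is a deliberate trade-off: errors and summaries usually sit at the ends of logs, but a real filter would need to preserve failing lines found in the middle as well.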

The lesson isn't "use AI less"

Uber is the clearest public example yet of what happens when a large engineering team adopts agentic coding at scale without a token efficiency layer. LLM API token pricing is not a SaaS seat fee. It doesn't cap. It scales with every loop turn, every file re-read, every verbose tool output folded back into the next prompt. At 5,000 engineers, that compounding produces budget exhaustion in four months. The fix is not to use AI less. It is to stop sending tokens you don't need.

Tokenade sits between your AI coding agent and the API, compressing context, filtering tool output, and routing reads semantically, so the model sees what it needs without re-reading your entire codebase on every turn. The token cost calculator lets you run the numbers for your own team: what your current usage actually costs, and what a 40–50% reduction looks like in cash. Free up to approximately 10 million tokens saved. Start free, no card required.

Profiles are sourced from public statements, podcast interviews, Twitter/X posts, and Indie Hackers / Reddit threads cited inline. No private claims; if you spot a factual error, contact [email protected].