Claude Code's token bill comes down to a handful of numbers: how many tokens a session actually consumes, the lopsided split between what it reads and what it writes, the discount caching gives you on repeated context, and the per-million-token rate of the model behind it. This page collects those numbers from primary sources so you don't have to reconstruct them from a billing dashboard at 2 a.m. I build token-reduction tooling for a living, so I read these analyses with a vested interest — and the headline I keep coming back to is unglamorous: the expensive thing is what the agent reads, not what it writes. All figures are as of June 2026; model versions and rates move fast, so re-verify at the primary source before you quote them in a contract.

Key Takeaways

Claude Code token usage at a glance

•A typical 50-turn agentic session burns roughly 1,000,000 input tokens and 40,000 output tokens — a 25:1 ratio. [1]
•Input tokens drive ≈85% of total session cost — the agent re-reads its whole context on every tool call. [1]
•Uber burned through its entire 2026 AI budget in four months after Claude Code adoption reached 84% across 5,000 engineers, at $500–$2,000 per heavy user per month. [2]
•Prompt-cache reads cost 10% of fresh input on Anthropic — a 90% saving on repeated context. [3]
•Re-sent context is ≈62% of the bill at production teams audited in 2026. [4]
•On Sonnet 5 ($2/$10 per MTok), that 50-turn session costs around $2.40 with no caching. [5]

How many tokens does a Claude Code session consume?

A typical 50-turn agentic coding session consumes roughly 1,000,000 input tokens and 40,000 output tokens — a 25:1 input-to-output ratio. [1] That ratio is the single most important fact about Claude Code's economics, and it surprises almost everyone the first time they see it. The mechanism is straightforward once you say it out loud. Claude Code is an agent: every tool call (read a file, run a test, grep the repo) ships the entire accumulated context back to the model — system prompt, tool definitions, conversation history, file contents, command output. By turn 30 of a debugging session, the agent is re-reading 25,000–35,000 tokens of history on every single request. The output it writes — a patch, a paragraph of explanation — is tiny by comparison. [1] Vantage's 2026 analysis of production developer costs put a typical full-time agent user at $400–$1,500 per month, with extreme single-day spikes past $4,000. [1] If your monthly bill looks like a salary line item, the context window accumulation pattern above is why.

25:1

input-to-output token ratio in a typical 50-turn session

Source: Vantage, 2026 [1]

≈85%

of total session cost driven by input, not output

Source: Vantage, 2026 [1]

Why is input the expensive side, when output costs 5× more per token?

This is the part that trips people up. On every Claude tier, output tokens are priced at 5× the input rate — $25 vs $5 per million on Opus 4.8, $10 vs $2 on Sonnet 5, $5 vs $1 on Haiku 4.5. [5] So the naive instinct is to make the agent write less. It's the wrong lever. Input tokens drive approximately 85% of total session cost in production, because the volume gap dwarfs the price gap. [1] A 25:1 volume ratio against a 5:1 price ratio means input still wins the bill by a factor of five. Telling Claude Code to "be concise" trims the cheapest, smallest part of your spend. The high-leverage move is the opposite: control what the agent reads on each turn. Retrieve only the relevant code instead of dumping whole files, prune stale conversation history, and don't re-feed the model output it already saw. That is precisely the gap output filtering and semantic code search are built to close.

How much do the tokens actually cost?

The per-million-token rate depends on which model is driving Claude Code. The three production Claude tiers span a 5× range.

Model	Input ($/MTok)	Output ($/MTok)	Cache read	≈ cost of one 50-turn session
Claude Opus 4.8	$5.00	$25.00	10% of input	≈$6.00
Claude Sonnet 5	$2.00	$10.00	10% of input	≈$3.60
Claude Haiku 4.5	$1.00	$5.00	10% of input	≈$1.20

Session cost assumes ≈1M input + 40K output tokens, no caching. Prices: Anthropic, Claude API Pricing docs (platform.claude.com, June 2026) [5]; session volume from Vantage [1].

For the full cross-vendor picture — GPT-5.5 at $5/$30 per MTok and Google's Gemini line included — see the LLM API token pricing reference and the AI coding agent token costs breakdown. The detail that matters here: cache reads on Anthropic are billed at 10% of the base input price. [3]

How much does prompt caching change the Claude Code bill?

Anthropic prompt caching bills cache-read tokens at 10% of the base input price — a 90% saving on repeated context — while a cache write costs 1.25× the base input price for a 5-minute TTL, or 2× for a 1-hour TTL. [3] The catch is that the cached prefix must be byte-identical across requests. Reorder one instruction and the cache misses. GitHub measured this in their own production agentic workflows in 2026 and found that keeping tool definitions and system prompts stable across turns, and pruning unused MCP tool schemas, was enough to cut per-workflow token costs by up to 62%. [6] When the prefix drifts — which is the default for many agent harnesses — you pay full freight. Caching helps with the stable prefix. It does nothing for the part of the bill that is genuinely new-but-redundant: the same file re-read three turns apart, the test output you already acted on, the directory listing you've now seen five times. That residue is the bulk of what context compression targets.

How much of the Claude Code bill is just re-sent context?

A lot. A 2026 LeanOps audit of 30 engineering teams running agentic AI in production found that re-sent context accounts for ≈62% of the total bill. [4] The same audit reported that teams hitting 50–70% cost reductions within two weeks consistently combined per-user budget caps, prompt caching, model-tier routing, and context-window pruning. [4] That is the headline number for anyone trying to cut a Claude Code bill: roughly six of every ten dollars is the model re-reading things it has already seen. You cannot fix that by switching models or writing terser prompts — only by changing what gets sent.

Where the Claude Code bill actually goes

Re-sent / redundant context≈62% of bill

Net-new input (relevant code, fresh tool output)remainder of input

Output (patches, explanations)≈15% of cost

Sources: LeanOps audit [4], Vantage analysis [1] (2026). Bar widths are qualitative rankings, not a single computed metric.

What does Claude Code actually cost at scale?

Enterprise and individual data points from 2026 give a concrete sense of how token spend scales with usage intensity. Uber burned through its entire 2026 AI tools budget by April — four months in — after Claude Code adoption jumped from 32% to 84% across its 5,000-engineer workforce. Heavy users were running up $500–$2,000 per engineer per month, and Uber's leadership acknowledged they could not yet quantify the productivity return despite 70%+ of committed code being AI-generated. [2] Peter Steinberger's team ran roughly 100 Codex instances in the cloud for a month, working on the OpenClaw project, and posted a bill of $1.3 million — 603 billion tokens across 7.6 million requests. One caveat worth carrying: the exact figures come from a screenshot in Steinberger's own post rather than a published statement. But the numbers remain the most concrete public benchmark for autonomous agentic coding at scale. [7] A Reddit developer left a Claude Code loop running overnight — polling for software updates every 30 minutes — and woke up to a $6,000 bill. The culprit was a combination of an 800,000-token context window being rebuilt from scratch on every cycle (Anthropic had quietly changed the default cache TTL from 1 hour to 5 minutes) and no spending cap. There was no real-time dashboard to warn him; the first alert was the email confirming the damage. [8] At the other end of the scale, Henry Godnick documented a surprise $80 bill from a batch-processing script he left running unattended — a reminder that even small loops compound fast when each iteration rebuilds a multi-turn context. [9] These data points share a common structure: the cost spike is not a single large model call but many medium calls, each re-sending a growing context. The practical implication is that the savings dashboard matters most precisely when a session is long or a loop is running — the exact moments when you're least likely to be watching.

What does research say about agent context management costs?

A peer-reviewed study — Lindenbauer et al. (JetBrains Research), arXiv:2508.21433, published August 2025 — benchmarked context-management strategies for LLM-based software engineering agents on SWE-bench Verified across five model configurations. [10] The headline finding is counter-intuitive: simple observation masking (truncating older tool outputs) halves cost relative to the raw agent baseline while matching the solve rate of more expensive LLM-based summarisation. With Qwen3-Coder 480B, observation masking was 52% cheaper than the raw baseline and improved solve rate by 2.6 percentage points. The study challenges the assumption that smarter context management necessarily means more LLM calls — sometimes it just means not sending old data. The broader implication for Claude Code users: the agent harness design (what it keeps, what it discards, how it handles long tool outputs) is as important a cost variable as the model tier you choose.

Methodology note

Per-token prices are from Anthropic's official pricing page (primary source, verified June 2026). Prompt-caching multipliers are from Anthropic's prompt caching documentation (primary source). Session-volume and cost-split figures are from the Vantage production analysis. The 62% re-sent-context figure is from an industry audit of 30 production teams, not an academic study, and should be treated as indicative. Enterprise cost data (Uber, OpenClaw) is from contemporaneous press reporting. The Lindenbauer et al. arXiv preprint (2508.21433) is the academic source for context-management cost benchmarks on SWE-bench Verified. Indicative session costs use the published Anthropic rates with the Vantage volume profile and no caching applied. Re-verify every figure at its primary source before citing in commercial contexts.

Want these numbers to go down?

Tokenade sits in front of Claude Code (and Cursor, Codex, Copilot, Windsurf) and cuts the input side automatically — semantic code search instead of whole-file dumps, output filtering, skeleton compression, and lazy MCP loading — with a dashboard that shows exactly what you saved. Free up to ≈10M tokens/month; Pro is $24.90/mo (excl. tax), unlimited machines. Source-available, MIT-licensed.

See how Tokenade reduces Claude Code token usage →

Sources and references

[1]Vantage. "The Hidden Cost Driver in Agentic Coding Sessions in 2026". vantage.sh, 2026. Link ↗
[2]Storyboard18 / The Information. "Uber exhausts 2026 AI budget in four months amid massive Claude Code adoption". May 2026. Link ↗
[3]Anthropic. "Prompt caching — Claude API Docs". platform.claude.com, 2026. Link ↗
[4]LeanOps. "AI Agents Burn 50x More Tokens Than Chats". leanopstech.com, 2026. Link ↗
[5]Anthropic. "Pricing — Claude API Docs". platform.claude.com, 2026. Link ↗
[6]GitHub. "Improving token efficiency in GitHub Agentic Workflows". github.blog, 2026. Link ↗
[7]Tom's Hardware. "OpenClaw creator burns through $1.3 million in OpenAI API tokens in a single month — 603 billion tokens across 7.6 million requests and 100 coding agents". 2026. Link ↗
[8]MakeUseOf. "Someone left Claude Code running overnight, and it cost $6,000". makeuseof.com, 2025. Link ↗
[9]Godnick, Henry. "What Happened When I Got a Surprise $80 Claude Bill". dev.to, 2026. Link ↗
[10]Lindenbauer, Tobias et al. (JetBrains Research). "The Complexity Trap: Simple Observation Masking Is as Efficient as LLM Summarization for Agent Context Management". arXiv:2508.21433, August 2025. Link ↗

Claude Code Token Usage Statistics (2026)

Key figures