Key figures
TL;DR
- $3 / $15: Claude Sonnet 4.6 input / output price per million tokens, the most common coding default (Anthropic, Claude API Pricing docs, platform.claude.com, 2026)
- 5×: output tokens cost roughly five times input tokens across Claude models (Anthropic, Claude API Pricing docs, platform.claude.com, 2026)
- 10%: cost of a cached input token vs. a fresh one under Anthropic's prompt caching (Anthropic, Prompt Caching docs, platform.claude.com, 2026)
- $500–$2,000: monthly Claude Code spend per engineer at Uber after 84% agentic adoption (The Information / Fortune, reported May 2026)
- ≈85%: share of agentic session cost driven by input tokens, not output (Vantage, 'The Hidden Cost Driver in Agentic Coding Sessions', 2026)
- $1.3M: OpenAI token bill in 30 days running 100 Codex agents on OpenClaw (Tom's Hardware / TheNextWeb, reporting on Peter Steinberger, 2026)
- 50%: Anthropic Batch API discount on both input and output tokens (async, ≤24 h) (Anthropic, Claude API Pricing docs, platform.claude.com, 2026)
Key Takeaways
Key numbers at a glance
- Claude Sonnet 4.6 costs $3/MTok input and $15/MTok output, making it the most common coding default. [1]
- Output tokens cost ≈5× more than input tokens across all Claude model tiers. [1]
- Prompt-cache reads cost 10% of fresh input on Anthropic; cache writes carry a 1.25× premium for 5-min TTL or 2× for 1-hour TTL. [2]
- Agentic coding tasks consume up to 1,000× more tokens than standard code-chat, a figure reported across multiple industry sources as real-world usage patterns have become clear. [6]
- A typical 50-turn agentic session uses roughly 1 million input tokens and 40,000 output tokens (a 25:1 ratio) and costs $3–$6 on Sonnet. [7]
- Uber burned its entire 2026 AI budget in four months on Claude Code: monthly spend ran $500–$2,000 per engineer at 84% agentic adoption. [10]
- Re-sent context makes up ≈62% of the typical agent bill: the largest single lever is what the model reads, not what it writes. [8]
How much does a million tokens cost on Claude in 2026?
Claude's three production tiers (Opus 4.8, Sonnet 4.6, and Haiku 4.5) span a 5× price range from the top to the bottom of the stack. Claude Opus 4.8 costs $5.00 per million input tokens and $25.00 per million output tokens. [1] Claude Sonnet 4.6 costs $3.00 per million input tokens and $15.00 per million output tokens. It is the most commonly used coding-agent model, balancing capability and cost. [1] Claude Haiku 4.5 costs $1.00 per million input tokens and $5.00 per million output tokens, and is used for grunt-work steps where lower latency and cost matter more than peak reasoning. [1]
| Model | Input ($/MTok) | Output ($/MTok) | Output/Input ratio | Context window |
|---|---|---|---|---|
| Claude Opus 4.8 | $5.00 | $25.00 | 5× | 200K |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 5× | 200K |
| Claude Haiku 4.5 | $1.00 | $5.00 | 5× | 200K |
Source: Anthropic, Claude API Pricing docs (platform.claude.com, June 2026) [1]
How does Claude pricing compare to GPT-4.1 and Gemini 2.5 Flash?
Claude is not the only option for coding agents. GPT-4.1 from OpenAI and Gemini 2.5 Flash from Google are the most commonly benchmarked alternatives. GPT-4.1 costs $2.00 per million input tokens and $8.00 per million output tokens, with a 4× output-to-input ratio. [3] OpenAI applies automatic prompt caching on GPT-4.1 at a 75% discount on cached input tokens. That is a smaller discount than Anthropic's 90%, and the mechanism differs: OpenAI caches automatically with no explicit API call, while Anthropic requires a cache_control breakpoint. [4]
Gemini 2.5 Flash costs $0.30 per million input tokens and $2.50 per million output tokens for non-thinking output. [5] Thinking tokens on Gemini 2.5 Flash are billed separately at $3.50 per million, meaning reasoning-heavy tasks cost substantially more than the headline rate suggests. [5]
| Model | Input ($/MTok) | Output ($/MTok) | Cache read discount | Context window |
|---|---|---|---|---|
| Claude Sonnet 4.6 | $3.00 | $15.00 | −90% (10% of input) | 200K |
| Claude Opus 4.8 | $5.00 | $25.00 | −90% (10% of input) | 200K |
| GPT-4.1 | $2.00 | $8.00 | −75% (automatic) | 1M |
| Gemini 2.5 Flash | $0.30 | $2.50 (non-thinking) | not listed | 1M |
Sources: Anthropic pricing [1], OpenAI pricing [3], Google AI pricing [5] (June 2026)
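To put the list prices in the table on a common footing, here is a minimal sketch that prices one workload across the four models. The workload (1,000,000 input and 40,000 output tokens) is the "typical 50-turn session" figure quoted in this article, used here as an assumption; the function and variable names are illustrative. The sketch ignores caching, batch discounts, and Gemini thinking tokens, so it shows headline rates only.

```python
# Prices in USD per million tokens (input, output), copied from the table above.
PRICES = {
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Opus 4.8": (5.00, 25.00),
    "GPT-4.1": (2.00, 8.00),
    "Gemini 2.5 Flash": (0.30, 2.50),  # non-thinking output rate only
}


def session_cost(input_tokens, output_tokens, input_price, output_price):
    """List-price cost in USD: no caching, no batch discount, no thinking tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Assumed workload: the typical 50-turn session figures quoted in this article.
INPUT_TOKENS = 1_000_000
OUTPUT_TOKENS = 40_000

costs = {
    model: session_cost(INPUT_TOKENS, OUTPUT_TOKENS, in_price, out_price)
    for model, (in_price, out_price) in PRICES.items()
}

for model, cost in costs.items():
    print(f"{model:<18} ${cost:.2f}")
```

Under these assumptions the same session ranges from about $0.40 on Gemini 2.5 Flash to $6.00 on Opus 4.8, with input tokens making up most of the bill on every model.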
Why does the output/input price ratio matter so much for agents?
Output tokens cost 5× more than input tokens across the entire Claude model line: $25 vs $5 per million on Opus 4.8, $15 vs $3 on Sonnet 4.6, $5 vs $1 on Haiku 4.5. [1] At face value, this makes output the expensive side of the ledger. The counter-intuitive reality for coding agents is that input is the higher-cost line item in practice. A 2026 analysis of production agentic sessions found that input tokens drive approximately 85% of total session cost. [7]
This is because the agent re-reads its full context (system prompt, tool definitions, conversation history, file contents, command output) on every single tool call. A 50-turn debugging session accumulates that context turn by turn, so by turn 30 the agent is paying to re-read 25,000–35,000 tokens of accumulated history on every request. [7] The output premium (5×) is real, but it applies to a much smaller volume. A typical 50-turn session generates roughly 40,000 output tokens and 1,000,000 input tokens, a 25:1 ratio. [7]
How much does prompt caching save?
Anthropic's prompt caching bills cache-read tokens at 10% of the base input price, a 90% saving on repeated context. [2] The cache-write cost is 1.25× the base input price for a 5-minute TTL, or 2× for a 1-hour TTL. [2] The breakeven is favourable: with the 5-minute cache, a single re-read pays off the write premium; with the 1-hour cache, two re-reads are enough. For a 50-turn session on Sonnet 4.6 where the system prompt and tool definitions (10,000 tokens) are stable across all turns:
- Without caching: 10,000 × 50 × $3 / 1,000,000 = $1.50 just for that static context
- With caching (1 write + 49 reads): (10,000 × 1.25 × $3 / 1,000,000) + (10,000 × 49 × $0.30 / 1,000,000) = $0.038 + $0.147 = $0.185, a reduction of roughly 88%
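The arithmetic above, and the breakeven claim for both TTLs, can be checked with a short sketch. It assumes the Sonnet 4.6 input price and the cache multipliers quoted in this section, and it assumes every turn after the first hits the cache (no expiry between turns). The function names are illustrative, not part of any SDK.

```python
BASE_INPUT = 3.00                       # Sonnet 4.6, USD per million input tokens
READ_MULT = 0.10                        # cache read = 10% of base input price
WRITE_MULT = {"5m": 1.25, "1h": 2.00}   # cache write premium by TTL


def static_context_cost(tokens, turns, ttl=None):
    """Cost in USD of sending a stable prefix on every turn.

    ttl=None means no caching. Otherwise: one cache write on the first turn,
    then a cache read on each of the remaining turns (assumes no cache expiry).
    """
    per_token = BASE_INPUT / 1_000_000
    if ttl is None:
        return tokens * turns * per_token
    write = tokens * WRITE_MULT[ttl] * per_token
    reads = tokens * (turns - 1) * READ_MULT * per_token
    return write + reads


def breakeven_reads(ttl):
    """Smallest number of cache reads after which caching is strictly cheaper
    than sending the prefix uncached on every request (1 write + n reads
    compared with n + 1 uncached sends)."""
    reads = 1
    while WRITE_MULT[ttl] + reads * READ_MULT >= 1 + reads:
        reads += 1
    return reads


uncached = static_context_cost(10_000, 50)
cached = static_context_cost(10_000, 50, ttl="5m")
saving = 1 - cached / uncached

print(f"uncached ${uncached:.2f}, cached ${cached:.4f}, saving {saving:.1%}")
print("breakeven reads:", {ttl: breakeven_reads(ttl) for ttl in WRITE_MULT})
```

With these inputs the uncached prefix costs $1.50 and the cached one $0.1845, which is where the rounded $0.185 and the roughly 88% reduction come from.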
How many tokens does a real coding session consume?
Production usage data from 2026 makes clear that agentic coding costs are not theoretical.
At the session level, Vantage's analysis of production teams found a typical 50-turn coding session at approximately 1 million input + 40,000 output tokens, with indicative costs ranging from $0.60 (lighter models) to $6.00 (Opus-class), and average monthly spend of $400–$1,500 for developers running coding agents full-time. [7]
At the organisation level, Uber burned its entire 2026 AI budget in four months after rolling out Claude Code to approximately 5,000 engineers. Monthly costs ran $500–$2,000 per engineer for power users; overall adoption reached 84% agentic and 95% of engineers using AI tools monthly. [10] Microsoft's response was to cancel Claude Code licences across its Experiences and Devices division and redirect engineers to GitHub Copilot CLI ($39/seat flat vs Claude Code's $20 seat + usage), with a deadline of 30 June 2026. [11]
At the extreme end, Peter Steinberger (creator of OpenClaw) ran 100 Codex agents simultaneously for a full month, accumulating 603 billion tokens across 7.6 million requests, for a total bill of $1,305,088.81 covered by OpenAI. [13]
A research paper on agent context management (arXiv:2508.21433, presented at the NeurIPS 2025 DL4Code workshop) confirmed that re-read context is the dominant cost driver and found that simple observation masking (replacing old tool outputs with placeholders) halves agent cost while matching the solve rate of more expensive LLM-summarisation approaches. [6] LeanOps audited 30 engineering teams running agentic AI in production between March and May 2026, finding that re-sent context accounts for ≈62% of the total bill at those organisations. [8]
| Session type | Approx. input tokens | Approx. output tokens | Indicative cost (Sonnet 4.6, no cache) |
|---|---|---|---|
| Typical 50-turn agentic session | ≈1,000,000 | ≈40,000 | ≈$3.60 |
| Complex single debugging session | 500,000+ | variable | $1.50+ |
| Multi-agent runaway loop (11 days) | billions | variable | $47,000 total [14] |
Sources: Vantage [7], dev.to postmortem [14]. Indicative costs use $3/MTok input + $15/MTok output; no caching applied.
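To illustrate how re-sent context adds up to figures like those in the table, and how observation masking reduces it, here is a toy simulation. It is a sketch under stated assumptions, not a reproduction of the cited paper or of any vendor's billing: the per-turn token sizes, the `keep_last` window, and the placeholder size are all hypothetical values chosen so that the unmasked total lands near 1 million tokens.

```python
def total_input_tokens(turns, base, action, observation,
                       keep_last=None, placeholder=10):
    """Sum the input tokens billed over a session in which every request
    re-sends the whole context built so far.

    base        -- stable prefix (system prompt + tool definitions), in tokens
    action      -- tokens the model's own message adds to history each turn
    observation -- tokens each tool output adds to history
    keep_last   -- None: every past tool output stays verbatim.
                   k: only the k most recent tool outputs stay verbatim;
                   older ones are replaced by a short placeholder.
    """
    total = 0
    for turn in range(1, turns + 1):
        past = turn - 1                      # completed turns already in context
        if keep_last is None:
            verbatim, masked = past, 0
        else:
            verbatim = min(past, keep_last)
            masked = past - verbatim
        context = (base + past * action
                   + verbatim * observation + masked * placeholder)
        total += context
    return total


# Hypothetical session shape: 50 turns, 10,000-token prefix,
# 50-token actions, 350-token tool outputs.
unmasked = total_input_tokens(50, base=10_000, action=50, observation=350)
masked = total_input_tokens(50, base=10_000, action=50, observation=350,
                            keep_last=10)

print(f"unmasked: {unmasked:,} input tokens")
print(f"masked:   {masked:,} input tokens ({1 - masked / unmasked:.0%} fewer)")
```

The size of the saving depends entirely on the assumed ratio of tool-output tokens to stable prefix tokens; sessions dominated by large file reads or command output would see a larger effect than this example shows.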
What discounts are available beyond prompt caching?
Three additional cost-reduction mechanisms are available on Anthropic's platform as of 2026.
Batch API (50% discount). The Anthropic Message Batches API processes requests asynchronously within 24 hours at exactly 50% off both input and output tokens. Claude Sonnet 4.6 drops from $3/$15 to $1.50/$7.50 per million; Haiku 4.5 from $1/$5 to $0.50/$2.50. [1] There is no quality difference between batch and synchronous responses, only timing. This is suited to offline code-analysis pipelines, nightly evaluation runs, and bulk refactoring jobs that can tolerate hours of latency.
Combining caching and Batch API. The two discounts stack. A batch request that writes a stable 10,000-token system prompt to the cache pays half the usual write price (1.25× × 0.5 = 0.625× the base input price), and a batch request that reads it from the cache pays half the usual read price (0.1× × 0.5 = 0.05×, or 5% of the base input price). [2]
Model tier routing. Because Haiku 4.5 costs one-third of Sonnet 4.6 on both input and output, routing simpler agent steps (file listing, output formatting, search-equivalent reads) to the lighter model cuts the per-token price of those steps by two-thirds. The LeanOps audit found that teams achieving 50–70% cost reductions within two weeks consistently combined per-user budget caps, prompt caching, model-tier routing, and context-window pruning. [8]
What the numbers mean in aggregate
Figure: Cost levers ranked by typical impact (bar chart not reproduced).
Sources: LeanOps audit [8], Vantage analysis [7], Anthropic pricing [1] (2026). Bar widths are qualitative rankings, not a single computed metric.
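The multipliers quoted in this article for caching, the Batch API, and tier routing can be combined in a small sketch to see the effective input price each lever produces. The prices and multipliers are the ones cited above; the function names and the 40% routing share are hypothetical and only illustrate the arithmetic.

```python
SONNET_INPUT = 3.00          # USD per million input tokens
HAIKU_INPUT = 1.00
BATCH_MULT = 0.50            # Batch API: 50% off
CACHE_READ_MULT = 0.10       # cache read: 10% of base input
CACHE_WRITE_5M_MULT = 1.25   # cache write, 5-minute TTL


def effective_input_price(base, cache=None, batch=False):
    """Effective USD price per million input tokens.

    cache -- None (fresh input), "read", or "write" (5-minute TTL).
    batch -- True if the request goes through the Batch API.
    """
    multiplier = 1.0
    if cache == "read":
        multiplier *= CACHE_READ_MULT
    elif cache == "write":
        multiplier *= CACHE_WRITE_5M_MULT
    elif cache is not None:
        raise ValueError(f"unknown cache mode: {cache!r}")
    if batch:
        multiplier *= BATCH_MULT
    return base * multiplier


def blended_input_price(share_to_haiku):
    """Average input price when a share of tokens is routed to Haiku 4.5
    and the rest stays on Sonnet 4.6 (the share is a hypothetical input)."""
    if not 0.0 <= share_to_haiku <= 1.0:
        raise ValueError("share_to_haiku must be between 0 and 1")
    return share_to_haiku * HAIKU_INPUT + (1 - share_to_haiku) * SONNET_INPUT


print(f"fresh input:         ${effective_input_price(SONNET_INPUT):.3f}")
print(f"cache read:          ${effective_input_price(SONNET_INPUT, 'read'):.3f}")
print(f"batch + cache read:  ${effective_input_price(SONNET_INPUT, 'read', True):.3f}")
print(f"batch + cache write: ${effective_input_price(SONNET_INPUT, 'write', True):.3f}")
print(f"40% routed to Haiku: ${blended_input_price(0.4):.3f}")
```

Read this way, a cached and batched Sonnet input token costs 5% of the list price, while routing changes the price only for the share of tokens that actually moves to the cheaper tier.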
Methodology note
Per-token prices are from Anthropic's official pricing page (primary source, verified June 2026). Prompt-caching multipliers are from Anthropic's prompt caching documentation (primary source). OpenAI and Google prices are from their respective developer pricing pages. Session-volume figures are from the Vantage production analysis and the LeanOps industry audit of 30 production teams. The Uber and Microsoft figures are from The Information and Windows Central respectively. The OpenClaw figures are from Tom's Hardware and TheNextWeb, reporting on a public disclosure by Peter Steinberger. The $47,000 agent-loop and $6,000 overnight figures are from published postmortems on dev.to and MakeUseOf. The arXiv 2508.21433 findings are from a peer-reviewed paper presented at the NeurIPS 2025 DL4Code workshop. All figures should be re-verified at primary sources before citing in commercial contexts.
Sources and references
- [1] Anthropic. "Pricing — Claude API Docs". platform.claude.com, 2026.
- [2] Anthropic. "Prompt caching — Claude API Docs". platform.claude.com, 2026.
- [3] OpenAI. "OpenAI API Pricing". openai.com, 2026.
- [4] OpenAI. "Prompt caching — OpenAI Platform Docs". platform.openai.com, 2026.
- [5] Google. "Gemini Developer API pricing". ai.google.dev, 2026.
- [6] Lindenbauer et al. (JetBrains Research). "The Complexity Trap: Simple Observation Masking Is as Efficient as LLM Summarization for Agent Context Management". arXiv:2508.21433, presented at NeurIPS 2025 DL4Code workshop.
- [7] Vantage. "The Hidden Cost Driver in Agentic Coding Sessions in 2026". vantage.sh, 2026.
- [8] LeanOps. "AI Agents Burn 50x More Tokens Than Chats". leanopstech.com, 2026.
- [9] GitHub. "Improving token efficiency in GitHub Agentic Workflows". github.blog, 2026.
- [10] The Information. "Uber CTO Shows How Claude Code Can Blow Up AI Budgets". theinformation.com, May 2026.
- [11] Windows Central. "Microsoft cancels Claude Code licenses, shifting developers to GitHub Copilot CLI". windowscentral.com, 2026.
- [12] MakeUseOf. "Someone left Claude Code running overnight, and it cost $6,000". makeuseof.com, 2026.
- [13] Tom's Hardware. "OpenClaw creator burns through $1.3 million in OpenAI API tokens in a single month". tomshardware.com, 2026.
- [14] Anhaia, Gabriel. "The Agent That Spent $47K on Itself: An Autonomous-Loop Postmortem". dev.to, 2026.