Key figures
TL;DR- $5.00 / $25.00Claude Opus 4.8 input / output per million tokens — Anthropic's flagship modelAnthropic, anthropic.com/pricing, June 2026
- $5.00 / $30.00GPT-5.5 input / output per million tokens — OpenAI's frontier flagshipOpenAI, openai.com/api/pricing, June 2026
- $1.50 / $9.00Gemini 3.5 Flash input / output per million tokens — Google's latest fast modelGoogle, ai.google.dev/gemini-api/docs/pricing, June 2026
- 4–6×output tokens cost 4–6× more than input tokens across major 2026 modelsDerived from Anthropic, OpenAI, and Google official pricing pages, June 2026
- 90%prompt cache-read discount on Claude (cache read = 10% of base input price)Anthropic, anthropic.com/pricing, June 2026
- 50%batch API discount offered by Anthropic, OpenAI, and Google on all listed modelsAnthropic, OpenAI, and Google official pricing pages, June 2026
- $0.20 / $1.25GPT-5.4-nano input / output per million tokens — lowest-cost OpenAI text model in 2026OpenAI, platform.openai.com/docs/pricing, June 2026
Why these numbers matter
The per-million-token prices from Anthropic, OpenAI, and Google are the single most actionable numbers for any developer running AI workloads at scale. A model that costs 5× more per output token doesn't just cost 5× more per response — it compounds across every turn of an agent loop, every test-generation run, and every large-file refactor. This page collects the canonical June 2026 figures from official pricing pages so developers can reason about cost before optimising. Two ratios determine your bill more than the model name: how much more output costs than input (the output premium), and how cheaply caching serves repeated context (the caching discount). Both are covered here with exact numbers. For how these prices translate into real coding-session costs, see AI Coding Agent Token Costs. For the mechanics behind reducing the volume of tokens you send and receive, see the guide on reducing AI coding agent token usage.Key Takeaways
- •Claude Opus 4.8 costs $5.00 input / $25.00 output per million tokens — the 5× output premium is uniform across all three Claude tiers [1]
- •GPT-5.5 costs $5.00 input / $30.00 output per million tokens — a 6× output premium, the steepest among the flagships surveyed [2]
- •Gemini 3.5 Flash costs $1.50 input / $9.00 output per million tokens — the most affordable single-tier flagship in 2026 [3]
- •Claude prompt cache reads cost $0.50/MTok on Opus 4.8 — a 90% discount vs the $5.00 base input price [1]
- •Batch APIs cut costs by 50% across Anthropic, OpenAI, and Google for asynchronous workloads [1][2][3]
- •GPT-5.4-nano at $0.20 input / $1.25 output is the cheapest OpenAI text model — 25× cheaper input than GPT-5.5 [4]
Anthropic Claude pricing in 2026
Anthropic publishes three active Claude 4.x models as of June 2026: Opus 4.8, Sonnet 4.6, and Haiku 4.5. All prices are list rates in USD per million tokens for standard (non-batch) synchronous API calls. [1]Claude model prices — Input vs Output ($/MTok)
Source: [1] Anthropic, anthropic.com/pricing, June 2026
Claude prompt caching prices
Anthropic's prompt caching allows frequently reused context (system prompts, reference documents, tool definitions) to be stored and re-read at a steep discount. The TTL is 5 minutes for standard prompt caching. [1]| Model | Base Input $/MTok | Cache Write $/MTok | Cache Read $/MTok | Cache Read Discount |
|---|---|---|---|---|
| Claude Opus 4.8 | $5.00 | $6.25 | $0.50 | 90% |
| Claude Sonnet 4.6 | $3.00 | $3.75 | $0.30 | 90% |
| Claude Haiku 4.5 | $1.00 | $1.25 | $0.10 | 90% |
Source: [1] Anthropic, anthropic.com/pricing, June 2026. Cache write is priced at 1.25× base input; cache read at 0.10× base input (5-minute TTL).
OpenAI GPT-5.x pricing in 2026
OpenAI's 2026 lineup has shifted from the GPT-4.x generation to a GPT-5.x family. The main production models are GPT-5.5 (frontier), GPT-5.4 (performance/cost balance), GPT-5.4-mini (lightweight), and GPT-5.4-nano (ultra-low-cost). All prices below are standard (non-batch) list rates. [2][4] GPT-5.5, the flagship model, is priced at $5.00 per million input tokens and $30.00 per million output tokens, with cached input at $0.50/MTok. [2] Context window: 1 million tokens. GPT-5.4 sits at $2.50 / $15.00 with cached input at $0.25/MTok. [4] Context window: 1.05 million tokens. Note: prompts exceeding 272k tokens are billed at 2× input and 1.5× output for the full session. GPT-5.4-mini is priced at $0.75 / $4.50 with cached input at $0.075/MTok. [4] Context window: 400k tokens. GPT-5.4-nano, OpenAI's cheapest text model, costs $0.20 / $1.25 with cached input at $0.02/MTok. [4]OpenAI GPT-5.x model pricing — Input cost comparison ($/MTok)
Source: [2][4] OpenAI, openai.com/api/pricing and platform.openai.com/docs/pricing, June 2026
Google Gemini 3.x pricing in 2026
Google's Gemini API has moved to a Gemini 3.x generation as of mid-2026. The primary developer-facing models are Gemini 3.5 Flash (the new fast flagship), Gemini 3.1 Flash-Lite (lightweight), and Gemini 3.1 Pro Preview (high-performance, preview status). [3] Gemini 3.5 Flash is priced at $1.50 per million input tokens and $9.00 per million output tokens (paid tier), with context caching at $0.15/MTok. [3] Context window: 1 million tokens. The output price includes thinking tokens. Gemini 3.1 Flash-Lite costs $0.25 input / $1.50 output per million tokens (text/image/video; audio is double), with context caching at $0.025/MTok. [3] Context window: 1 million tokens. Gemini 3.1 Pro Preview operates on a two-tier input price: $2.00 / $12.00 per million tokens for prompts up to 200k tokens, scaling to $4.00 / $18.00 for prompts exceeding 200k tokens. Context caching is $0.20/MTok (≤200k) or $0.40/MTok (>200k). [3] Context window: 1 million tokens.| Model | Input $/MTok | Output $/MTok | Context Cache $/MTok | Output Premium |
|---|---|---|---|---|
| Gemini 3.5 Flash | $1.50 | $9.00 | $0.15 | 6× |
| Gemini 3.1 Flash-Lite | $0.25 | $1.50 | $0.025 | 6× |
| Gemini 3.1 Pro Preview (≤200k) | $2.00 | $12.00 | $0.20 | 6× |
| Gemini 3.1 Pro Preview (>>200k) | $4.00 | $18.00 | $0.40 | 4.5× |
Source: [3] Google, ai.google.dev/gemini-api/docs/pricing, June 2026. Context caching = cache-read price. Output includes thinking tokens where applicable.
Master comparison: all models, all prices (June 2026)
| Provider | Model | Input $/MTok | Output $/MTok | Output Premium | Context Window |
|---|---|---|---|---|---|
| Anthropic | Claude Opus 4.8 | $5.00 | $25.00 | 5× | 1M tokens |
| Anthropic | Claude Sonnet 4.6 | $3.00 | $15.00 | 5× | 1M tokens |
| Anthropic | Claude Haiku 4.5 | $1.00 | $5.00 | 5× | 200k tokens |
| OpenAI | GPT-5.5 | $5.00 | $30.00 | 6× | 1M tokens |
| OpenAI | GPT-5.4 | $2.50 | $15.00 | 6× | 1.05M tokens |
| OpenAI | GPT-5.4-mini | $0.75 | $4.50 | 6× | 400k tokens |
| OpenAI | GPT-5.4-nano | $0.20 | $1.25 | 6.25× | |
| Gemini 3.5 Flash | $1.50 | $9.00 | 6× | 1M tokens | |
| Gemini 3.1 Flash-Lite | $0.25 | $1.50 | 6× | 1M tokens | |
| Gemini 3.1 Pro Preview (≤200k) | $2.00 | $12.00 | 6× | 1M tokens |
Sources: [1][2][3][4]. All prices are standard (non-batch) list rates per million tokens ($/MTok), USD, as of June 2026. Verify before quoting in a cost model — prices change at model release.
The output premium: why it dominates your bill
The output-to-input price ratio ranges from 5× to 6.25× across the models on this page. Claude's 5× ratio is the most favourable for output-heavy workloads; OpenAI and Gemini consistently run at 6×.Batch API and caching: the two structural discounts
Both prompt caching and batch APIs represent orthogonal discounts — they can be combined, and they target different workload shapes. Batch processing (50% off, all three providers) suits large asynchronous jobs: nightly test runs, bulk document processing, offline refactor passes. The constraint is a 24-hour turnaround window and the inability to use real-time results mid-run. Prompt caching (90% off cache reads on Claude; ≈90% on Gemini; ≈90% on OpenAI cached input) targets repeated context — system prompts, knowledge bases, long preambles — that recurs on every turn of a live conversation or agent loop. It rewards real-time interactive workloads that batch APIs cannot serve. Applied together (for batch workloads that also carry repeated context), the effective discount on cached input tokens is multiplicative: 50% batch × 90% cache-read = 95% discount on that portion of input. At scale on Opus 4.8, that is a $0.25/MTok effective input price for cached context in a batch job, versus the $5.00 standard list rate.Source notes
Sources and references
- [1]Anthropic. "API Pricing — Claude models". anthropic.com/pricing, June 2026. Link ↗
- [2]OpenAI. "OpenAI API Pricing". openai.com/api/pricing, June 2026. Link ↗
- [3]Google. "Gemini Developer API Pricing". ai.google.dev/gemini-api/docs/pricing, June 2026. Link ↗
- [4]OpenAI. "Pricing — Detailed model table". platform.openai.com/docs/pricing, June 2026. Link ↗
- [5]Anthropic. "Models overview — context windows and max output". platform.claude.com/docs/en/about-claude/models/overview, June 2026. Link ↗
All prices are standard (non-batch) USD list rates per million tokens as of June 2026. Batch discounts (50%), data-residency surcharges (typically +10%), and volume-tier discounts are not reflected in the base rates. Re-verify all figures before using in a cost model — LLM API prices change at each model release without notice.
Up to 88% fewer tokens. Zero config.
Tokenade is the simplest way to cut what your coding agent sends to the model — set it up once, save on every prompt. Works with Claude Code, Cursor, Codex, Copilot & more.