LLM API Token Pricing (2026)

Canonical 2026 reference for LLM API token prices: Anthropic Claude, OpenAI GPT-5.x, and Google Gemini 3.x — input/output rates, output premiums, caching discounts, and batch savings, all sourced from official pricing pages.

Profile photo of Paul Irolla

By Paul Irolla

Founder · AI & developer tools · Tokenade

Ph.D. in AI · builds token-optimization tooling for AI coding agents

View author page
14 min read
Cite this page

Key figures

TL;DR
  • $5.00 / $25.00
    Claude Opus 4.8 input / output per million tokens — Anthropic's flagship model
    Anthropic, anthropic.com/pricing, June 2026
  • $5.00 / $30.00
    GPT-5.5 input / output per million tokens — OpenAI's frontier flagship
    OpenAI, openai.com/api/pricing, June 2026
  • $1.50 / $9.00
    Gemini 3.5 Flash input / output per million tokens — Google's latest fast model
    Google, ai.google.dev/gemini-api/docs/pricing, June 2026
  • 4–6×
    output tokens cost 4–6× more than input tokens across major 2026 models
    Derived from Anthropic, OpenAI, and Google official pricing pages, June 2026
  • 90%
    prompt cache-read discount on Claude (cache read = 10% of base input price)
    Anthropic, anthropic.com/pricing, June 2026
  • 50%
    batch API discount offered by Anthropic, OpenAI, and Google on all listed models
    Anthropic, OpenAI, and Google official pricing pages, June 2026
  • $0.20 / $1.25
    GPT-5.4-nano input / output per million tokens — lowest-cost OpenAI text model in 2026
    OpenAI, platform.openai.com/docs/pricing, June 2026

Why these numbers matter

The per-million-token prices from Anthropic, OpenAI, and Google are the single most actionable numbers for any developer running AI workloads at scale. A model that costs 5× more per output token doesn't just cost 5× more per response — it compounds across every turn of an agent loop, every test-generation run, and every large-file refactor. This page collects the canonical June 2026 figures from official pricing pages so developers can reason about cost before optimising. Two ratios determine your bill more than the model name: how much more output costs than input (the output premium), and how cheaply caching serves repeated context (the caching discount). Both are covered here with exact numbers. For how these prices translate into real coding-session costs, see AI Coding Agent Token Costs. For the mechanics behind reducing the volume of tokens you send and receive, see the guide on reducing AI coding agent token usage.

Key Takeaways

  • Claude Opus 4.8 costs $5.00 input / $25.00 output per million tokens — the 5× output premium is uniform across all three Claude tiers [1]
  • GPT-5.5 costs $5.00 input / $30.00 output per million tokens — a 6× output premium, the steepest among the flagships surveyed [2]
  • Gemini 3.5 Flash costs $1.50 input / $9.00 output per million tokens — the most affordable single-tier flagship in 2026 [3]
  • Claude prompt cache reads cost $0.50/MTok on Opus 4.8 — a 90% discount vs the $5.00 base input price [1]
  • Batch APIs cut costs by 50% across Anthropic, OpenAI, and Google for asynchronous workloads [1][2][3]
  • GPT-5.4-nano at $0.20 input / $1.25 output is the cheapest OpenAI text model — 25× cheaper input than GPT-5.5 [4]

Anthropic Claude pricing in 2026

Anthropic publishes three active Claude 4.x models as of June 2026: Opus 4.8, Sonnet 4.6, and Haiku 4.5. All prices are list rates in USD per million tokens for standard (non-batch) synchronous API calls. [1]

Claude model prices — Input vs Output ($/MTok)

Opus 4.8$5 / $25
Sonnet 4.6$3 / $15
Haiku 4.5$1 / $5
Input Output

Source: [1] Anthropic, anthropic.com/pricing, June 2026

Claude Opus 4.8 is priced at $5.00 per million input tokens and $25.00 per million output tokens. [1] It supports a 1-million-token context window on the Claude API and 128k max output tokens (extendable to 300k via streaming). Claude Sonnet 4.6, the standard production default for coding tasks, sits at $3.00 / $15.00 per million tokens, also with a 1M context window. [1] Claude Haiku 4.5, the fastest and most economical tier, is priced at $1.00 / $5.00 per million tokens with a 200k context window. [1] The output-to-input ratio is a clean across all three Claude models. This consistency means model-switching within the Claude family preserves the same structural cost profile — it changes the absolute price but not the ratio. For a 128k-output session on Opus 4.8, output tokens alone cost $3.20 regardless of how the input is structured.

Claude prompt caching prices

Anthropic's prompt caching allows frequently reused context (system prompts, reference documents, tool definitions) to be stored and re-read at a steep discount. The TTL is 5 minutes for standard prompt caching. [1]
ModelBase Input $/MTokCache Write $/MTokCache Read $/MTokCache Read Discount
Claude Opus 4.8$5.00$6.25$0.5090%
Claude Sonnet 4.6$3.00$3.75$0.3090%
Claude Haiku 4.5$1.00$1.25$0.1090%

Source: [1] Anthropic, anthropic.com/pricing, June 2026. Cache write is priced at 1.25× base input; cache read at 0.10× base input (5-minute TTL).

The cache write penalty — 1.25× the base input rate — is recovered on the second cache hit. Every hit after that represents a net $4.50/MTok saving on Opus 4.8. For an agent that re-reads a 50k-token system context on every turn, enabling prompt caching cuts Opus 4.8 input costs from $0.25 per turn to $0.025 per turn after the first write.

OpenAI GPT-5.x pricing in 2026

OpenAI's 2026 lineup has shifted from the GPT-4.x generation to a GPT-5.x family. The main production models are GPT-5.5 (frontier), GPT-5.4 (performance/cost balance), GPT-5.4-mini (lightweight), and GPT-5.4-nano (ultra-low-cost). All prices below are standard (non-batch) list rates. [2][4] GPT-5.5, the flagship model, is priced at $5.00 per million input tokens and $30.00 per million output tokens, with cached input at $0.50/MTok. [2] Context window: 1 million tokens. GPT-5.4 sits at $2.50 / $15.00 with cached input at $0.25/MTok. [4] Context window: 1.05 million tokens. Note: prompts exceeding 272k tokens are billed at 2× input and 1.5× output for the full session. GPT-5.4-mini is priced at $0.75 / $4.50 with cached input at $0.075/MTok. [4] Context window: 400k tokens. GPT-5.4-nano, OpenAI's cheapest text model, costs $0.20 / $1.25 with cached input at $0.02/MTok. [4]

OpenAI GPT-5.x model pricing — Input cost comparison ($/MTok)

GPT-5.5$5.00 in / $30.00 out
GPT-5.4$2.50 in / $15.00 out
GPT-5.4-mini$0.75 in / $4.50 out
GPT-5.4-nano$0.20 in / $1.25 out

Source: [2][4] OpenAI, openai.com/api/pricing and platform.openai.com/docs/pricing, June 2026

The output-to-input ratio on GPT-5.5 is ($30 ÷ $5) — the highest among the flagship models in this comparison. GPT-5.4 and GPT-5.4-mini both carry a 6× ratio at lower absolute prices. GPT-5.4-nano's ratio drops to 6.25× ($1.25 ÷ $0.20). The 6× ratio across OpenAI's lineup is consistently higher than Claude's 5×, which makes OpenAI models comparatively less sensitive to input reduction and more sensitive to output reduction. OpenAI's Batch API cuts input and output prices by 50% for asynchronous tasks processed within 24 hours. [2] At batch rates, GPT-5.5 costs $2.50 / $15.00 per million tokens — matching GPT-5.4 standard pricing.

Google Gemini 3.x pricing in 2026

Google's Gemini API has moved to a Gemini 3.x generation as of mid-2026. The primary developer-facing models are Gemini 3.5 Flash (the new fast flagship), Gemini 3.1 Flash-Lite (lightweight), and Gemini 3.1 Pro Preview (high-performance, preview status). [3] Gemini 3.5 Flash is priced at $1.50 per million input tokens and $9.00 per million output tokens (paid tier), with context caching at $0.15/MTok. [3] Context window: 1 million tokens. The output price includes thinking tokens. Gemini 3.1 Flash-Lite costs $0.25 input / $1.50 output per million tokens (text/image/video; audio is double), with context caching at $0.025/MTok. [3] Context window: 1 million tokens. Gemini 3.1 Pro Preview operates on a two-tier input price: $2.00 / $12.00 per million tokens for prompts up to 200k tokens, scaling to $4.00 / $18.00 for prompts exceeding 200k tokens. Context caching is $0.20/MTok (≤200k) or $0.40/MTok (>200k). [3] Context window: 1 million tokens.
ModelInput $/MTokOutput $/MTokContext Cache $/MTokOutput Premium
Gemini 3.5 Flash$1.50$9.00$0.15
Gemini 3.1 Flash-Lite$0.25$1.50$0.025
Gemini 3.1 Pro Preview (≤200k)$2.00$12.00$0.20
Gemini 3.1 Pro Preview (>>200k)$4.00$18.00$0.404.5×

Source: [3] Google, ai.google.dev/gemini-api/docs/pricing, June 2026. Context caching = cache-read price. Output includes thinking tokens where applicable.

Gemini 3.5 Flash's output-to-input ratio is ($9 ÷ $1.50), identical to the OpenAI flagship ratio. The 3.1 Pro Preview drops to 4.5× on long-context prompts because the input price doubles while the output price increases by only 50%. Gemini's Batch API cuts all prices by 50% for asynchronous workloads. [3] Google's context caching is priced at roughly 10% of base input across all three Gemini models — numerically similar to Anthropic's 90% cache-read discount. The storage cost ($1.00/MTok per hour) applies on top, making Gemini's caching economics most favourable for frequently reused large contexts accessed many times within a short window.

Master comparison: all models, all prices (June 2026)

ProviderModelInput $/MTokOutput $/MTokOutput PremiumContext Window
AnthropicClaude Opus 4.8$5.00$25.001M tokens
AnthropicClaude Sonnet 4.6$3.00$15.001M tokens
AnthropicClaude Haiku 4.5$1.00$5.00200k tokens
OpenAIGPT-5.5$5.00$30.001M tokens
OpenAIGPT-5.4$2.50$15.001.05M tokens
OpenAIGPT-5.4-mini$0.75$4.50400k tokens
OpenAIGPT-5.4-nano$0.20$1.256.25×
GoogleGemini 3.5 Flash$1.50$9.001M tokens
GoogleGemini 3.1 Flash-Lite$0.25$1.501M tokens
GoogleGemini 3.1 Pro Preview (≤200k)$2.00$12.001M tokens

Sources: [1][2][3][4]. All prices are standard (non-batch) list rates per million tokens ($/MTok), USD, as of June 2026. Verify before quoting in a cost model — prices change at model release.

The output premium: why it dominates your bill

The output-to-input price ratio ranges from 5× to 6.25× across the models on this page. Claude's 5× ratio is the most favourable for output-heavy workloads; OpenAI and Gemini consistently run at 6×.
Claude output premium (all tiers)
Source: [1]
OpenAI GPT-5.x & Gemini 3.x output premium
Source: [2][3]
50%
Batch API discount (all three providers)
Source: [1][2][3]
The practical implication: if an agent emits 10k output tokens per turn and runs 100 turns, the output cost alone on Claude Opus 4.8 is $25.00. Cutting output volume by 30% saves $7.50 per 100 turns — more than the total input cost of many sessions. The token volume, not the model tier, is the primary bill driver for agentic workloads. Tools that reduce token volume — trimming context, filtering file reads, compressing repetitive content — apply a fixed multiplier to costs regardless of which model or provider is running. A 40% token reduction at $5.00 input / $30.00 output on GPT-5.5 saves more in absolute terms than the same reduction on Haiku 4.5, but both yield the same percentage saving. If you want to reduce your API bill proportionally across the stack, Tokenade compresses the tokens sent and received by AI coding agents at the proxy layer.

Batch API and caching: the two structural discounts

Both prompt caching and batch APIs represent orthogonal discounts — they can be combined, and they target different workload shapes. Batch processing (50% off, all three providers) suits large asynchronous jobs: nightly test runs, bulk document processing, offline refactor passes. The constraint is a 24-hour turnaround window and the inability to use real-time results mid-run. Prompt caching (90% off cache reads on Claude; ≈90% on Gemini; ≈90% on OpenAI cached input) targets repeated context — system prompts, knowledge bases, long preambles — that recurs on every turn of a live conversation or agent loop. It rewards real-time interactive workloads that batch APIs cannot serve. Applied together (for batch workloads that also carry repeated context), the effective discount on cached input tokens is multiplicative: 50% batch × 90% cache-read = 95% discount on that portion of input. At scale on Opus 4.8, that is a $0.25/MTok effective input price for cached context in a batch job, versus the $5.00 standard list rate.

Source notes

Sources and references

  1. [1]Anthropic. "API Pricing — Claude models". anthropic.com/pricing, June 2026. Link ↗
  2. [2]OpenAI. "OpenAI API Pricing". openai.com/api/pricing, June 2026. Link ↗
  3. [3]Google. "Gemini Developer API Pricing". ai.google.dev/gemini-api/docs/pricing, June 2026. Link ↗
  4. [4]OpenAI. "Pricing — Detailed model table". platform.openai.com/docs/pricing, June 2026. Link ↗
  5. [5]Anthropic. "Models overview — context windows and max output". platform.claude.com/docs/en/about-claude/models/overview, June 2026. Link ↗

All prices are standard (non-batch) USD list rates per million tokens as of June 2026. Batch discounts (50%), data-residency surcharges (typically +10%), and volume-tier discounts are not reflected in the base rates. Re-verify all figures before using in a cost model — LLM API prices change at each model release without notice.

Up to 88% fewer tokens. Zero config.

Tokenade is the simplest way to cut what your coding agent sends to the model — set it up once, save on every prompt. Works with Claude Code, Cursor, Codex, Copilot & more.