Is Claude Code cheaper on a subscription or on the API?

For most people who code with Claude every working day, the Max subscription is cheaper than the API, and it's not close. For occasional or bursty use, pay-as-you-go API billing wins because you only pay for the tokens you actually burn. The break-even sits somewhere around a few hours of heavy agentic work per week, and exactly where that line falls depends almost entirely on how many tokens you waste.

I run Claude Code daily (it's the agent I build tooling around), so I've watched both meters closely. This article is the math I wish someone had handed me before I picked a billing mode. I'll keep the numbers concrete, cite the real prices, and then make the point that everyone skips: the subscription-vs-API question is downstream of a token question. Cut the tokens and both bills shrink, but the ranking between them can flip.

If you want the agent-agnostic version of the cost picture first, read How to reduce AI coding agent token usage. If you're specifically on Claude Code, the companion piece is How to reduce Claude Code token usage. If you want the full price list first (every Claude plan, the API rates, and what each one actually includes), start with Claude Code pricing. This piece assumes those numbers and goes straight to the comparison.

What are the two ways to pay for Claude Code?

There are two billing models, and they bill completely different units. The Claude subscription (Pro or Max) bills a flat monthly fee for access, metered by usage limits rather than per-token charges. The Anthropic API bills per million tokens, split into input and output, with no flat fee.

The subscription. Claude Pro and the Max tiers give you Claude Code access bundled with claude.ai. You pay a fixed amount per month and get a usage allowance that resets on a rolling window. You don't see a per-token line item: you hit a cap and wait for the window to reset, or upgrade. Anthropic publishes the current tier prices and limits on its plans page (anthropic.com/pricing); I'm deliberately not quoting the subscription dollar figures here because Anthropic adjusts the tiers and limits more often than the API rates, and a stale number is worse than no number.

The API. You generate an API key, drop it into Claude Code, and every token is billed at the published per-model rate. Nothing is bundled; nothing is capped except your own spending limits. This is metered electricity: you pay for exactly what you use, including the tokens you wasted.

The mechanics matter because they fail differently. A subscription fails by throttling: you run out of allowance mid-task and lose momentum. The API fails by surprising you: you leave an agent looping overnight and find a bill that bought you very little. Picking between them is partly a question of which failure mode you'd rather manage.

What does the API actually cost per token?

The API cost is set by which model you run and how your tokens split between input and output. Here are the current per-million-token (MTok) rates from Anthropic's API pricing (anthropic.com/pricing):

Model	Input / MTok	Output / MTok
Claude Opus 4.8	$5	$25
Claude Sonnet 5	$2	$10
Claude Haiku 4.5	$1	$5

For comparison, OpenAI's GPT-5.5 runs $5 / MTok input and $30 / MTok output (openai.com/pricing), so Opus and GPT-5.5 cost the same on input and GPT-5.5 is pricier on output. The numbers that matter most for an agent, though, aren't in that table. They're in two facts about how agents consume tokens.

First, output is 5x the price of input on every Claude tier. That sounds like it favours read-heavy work, and it does, but agents read enormously: a single agentic task can pull tens of thousands of input tokens through file reads and tool results before it writes a line.

Second, cache reads cost about 10% of the input rate. Anthropic's prompt caching lets repeated context (the system prompt, a file you keep re-reading, the early transcript) bill at roughly a tenth of the normal input price once it's cached. On Opus that turns $5/MTok of repeated input into about $0.50/MTok. Claude Code uses caching automatically, which is the single biggest reason API costs aren't as terrifying as the raw input rate suggests. The catch: caching only helps the context you re-send. It does nothing for the context you should never have sent in the first place.

Both of these facts point the same direction. On the API, the bill is dominated by how much context an agentic loop drags through the model on every turn, not by the headline price.
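To make the caching discount concrete, here is a minimal Python sketch of the blended input price at different cache hit rates. The input rates come from the table above; the flat 10% cache-read multiplier is the approximation used in this article, and the sketch ignores whatever it costs to write an entry into the cache in the first place. The function and constant names are illustrative, not part of any Anthropic SDK.

```python
# Per-million-token (MTok) input rates in USD, taken from the table above.
INPUT_RATE = {"opus": 5.00, "sonnet": 2.00, "haiku": 1.00}

# Assumption: cache reads bill at roughly 10% of the normal input rate.
CACHE_READ_MULTIPLIER = 0.10


def blended_input_rate(model: str, cache_hit_fraction: float) -> float:
    """Average USD per MTok of input when a fraction of it is read from cache."""
    if not 0.0 <= cache_hit_fraction <= 1.0:
        raise ValueError("cache_hit_fraction must be between 0 and 1")
    base = INPUT_RATE[model]
    fresh = (1.0 - cache_hit_fraction) * base
    cached = cache_hit_fraction * base * CACHE_READ_MULTIPLIER
    return fresh + cached


if __name__ == "__main__":
    for hit in (0.0, 0.5, 0.7, 0.9):
        rate = blended_input_rate("opus", hit)
        print(f"opus, {hit:.0%} of input cached: ${rate:.2f}/MTok")
```

Even at a 90% hit rate the blended Opus input price in this model is still about $0.95/MTok, and it is multiplied by every token in the context, which is why sending less matters more than caching more.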

Where's the break-even between them?

The break-even is the point where your monthly API token spend equals the subscription fee, and you can estimate it from one number: tokens burned per day. Because Claude Code re-reads the full transcript on every step of its agentic loop, a day of real work moves a surprising volume of tokens.

Here's a worked example. Say a focused Claude Code session on Sonnet 5 moves roughly 2M input tokens and 200k output tokens: file reads, tool results, a long transcript, and the code it writes. With prompt caching covering, say, 70% of the input as cache reads, the session bill is roughly: 0.6M fresh input at $2 (= $1.20), plus 1.4M cached input at ~$0.20/MTok (= $0.28), plus 200k output at $10 (= $2.00). That's about $3.48 for the session. Twenty working days of one such session each is around $70/month on the API.

That figure is squarely in subscription territory, which is exactly why heavy daily users come out ahead on Max. But notice how load-bearing my assumptions were: the model tier, the cache hit rate, and above all the 2M input tokens. Halve the wasted input and the API session drops below the point where the subscription pays for itself. The break-even isn't a fixed dollar amount; it's a function of your token discipline.

This is also why I distrust generic "subscription is always cheaper" advice. It's true for an undisciplined daily user on Opus. It's false for someone who has pruned their context and uses the agent in tight bursts. You have to run your own numbers, and the inputs to those numbers are things you control.
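The arithmetic in the worked example is easy to rerun with your own inputs. The sketch below reproduces it in Python under the same assumptions: Sonnet 5 rates, cache reads at 10% of the input rate, one session per working day. The `Rates` class and `session_cost` function are names made up for this illustration, and halving the total input is only a stand-in for trimming wasted context.

```python
from dataclasses import dataclass

MTOK = 1_000_000


@dataclass(frozen=True)
class Rates:
    input_per_mtok: float
    output_per_mtok: float
    # Assumption: cache reads bill at ~10% of the input rate.
    cache_read_multiplier: float = 0.10


SONNET = Rates(input_per_mtok=2.00, output_per_mtok=10.00)


def session_cost(rates: Rates, input_tokens: int, output_tokens: int,
                 cache_hit_fraction: float) -> float:
    """USD cost of one session, splitting input into fresh and cached reads."""
    fresh_mtok = input_tokens * (1.0 - cache_hit_fraction) / MTOK
    cached_mtok = input_tokens * cache_hit_fraction / MTOK
    output_mtok = output_tokens / MTOK
    return (fresh_mtok * rates.input_per_mtok
            + cached_mtok * rates.input_per_mtok * rates.cache_read_multiplier
            + output_mtok * rates.output_per_mtok)


if __name__ == "__main__":
    baseline = session_cost(SONNET, 2_000_000, 200_000, 0.70)
    trimmed = session_cost(SONNET, 1_000_000, 200_000, 0.70)
    print(f"baseline session: ${baseline:.2f}")       # $3.48
    print(f"baseline month:   ${baseline * 20:.2f}")  # $69.60
    print(f"trimmed session:  ${trimmed:.2f}")        # $2.74
    print(f"trimmed month:    ${trimmed * 20:.2f}")   # $54.80
```

Note that output tokens account for $2.00 of the $3.48 here, so in this particular scenario input trimming alone moves the monthly figure by about $15; sessions with a lower cache hit rate or a larger input share would shift more.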

How do you decide which to pick?

Pick the subscription if you code with Claude most days and value predictable cost over precision; pick the API if your usage is spiky, multi-seat, or you want per-token visibility. Concretely:

Estimate your daily token volume. Run a week on the API (or check a usage dashboard) and record input/output tokens per day. This single measurement answers the question better than any blog post.
Multiply out a month at your dominant model's rate, billing cached input at ~10% of the input rate. Compare that to the Max fee on Anthropic's plans page.
If you're within ~30% either way, pick the subscription — the predictability and the no-overnight-surprise property are worth the rounding error.
If you're a team, lean API for now: per-seat subscription math gets expensive fast, and the API gives you one consolidated, attributable bill.
Either way, cut the tokens first. Both bills are a function of token volume; reducing it improves whichever side you land on.
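The steps above can be sketched as a small Python helper. It is one possible reading of the rule of thumb rather than a definitive procedure: the function names are invented for this example, the 20 working days and 10% cache-read multiplier are the assumptions used earlier in the article, and the subscription fee is left as a parameter because this article deliberately does not quote it.

```python
from statistics import mean

MTOK = 1_000_000


def estimate_monthly_api_cost(daily_usage, input_rate, output_rate,
                              cache_hit_fraction, working_days=20,
                              cache_read_multiplier=0.10):
    """Project a month of API spend from measured (input, output) tokens per day."""
    avg_input = mean(day[0] for day in daily_usage)
    avg_output = mean(day[1] for day in daily_usage)
    # Blend fresh and cached input into one effective per-MTok rate.
    blended_input = input_rate * ((1.0 - cache_hit_fraction)
                                  + cache_hit_fraction * cache_read_multiplier)
    daily = avg_input / MTOK * blended_input + avg_output / MTOK * output_rate
    return daily * working_days


def recommend_billing(monthly_api_cost, subscription_fee,
                      is_team=False, tolerance=0.30):
    """Apply the rule of thumb: teams lean API; near-ties go to the subscription."""
    if monthly_api_cost < 0 or subscription_fee <= 0:
        raise ValueError("costs must be non-negative and the fee positive")
    if is_team:
        return "api"
    # Within the tolerance band, or above the fee outright: take the flat fee.
    if monthly_api_cost >= subscription_fee * (1.0 - tolerance):
        return "subscription"
    return "api"


if __name__ == "__main__":
    # A measured week of (input_tokens, output_tokens); values are illustrative.
    week = [(2_000_000, 200_000)] * 5
    cost = estimate_monthly_api_cost(week, input_rate=2.00, output_rate=10.00,
                                     cache_hit_fraction=0.70)
    # PLACEHOLDER_FEE is a made-up number, not Anthropic's price. Look it up.
    PLACEHOLDER_FEE = 100.0
    print(f"projected API spend: ${cost:.2f}/month")
    print("recommendation:", recommend_billing(cost, PLACEHOLDER_FEE))
```

The helper cannot price the subscription's throttling risk or the API's overnight-surprise risk, so treat its output as the starting point for the decision, not the decision.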

That last point is the one I actually care about. The billing-mode decision is a one-time toggle. Token volume is something you pay for every single day, on either plan.

How does cutting tokens change the answer?

Cutting tokens lowers both bills and can flip the break-even, because the API cost is volume-based and the subscription's usage cap is volume-based too. On the API you pay less directly; on the subscription you hit the throttle far less often, which is the same as buying yourself a bigger effective allowance for free. The waste in a Claude Code session is concentrated in a few places, and none of them are signal:

Eager whole-file reads when a semantic code search would have returned the 30 relevant lines.
Raw tool output — a 15,000-token failing test run when the agent needed the failing assertion. That's what output filtering is for.
The MCP manifest re-sent every turn, whether or not any tool fires. See Best MCP servers for Claude Code for what's worth keeping connected.
An unbounded transcript that re-bills every past read on every new turn.
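The last item on that list compounds in a way that is easy to underestimate. The toy Python model below assumes every turn adds a fixed number of tokens to the transcript and that the whole transcript is re-sent as input on each turn; real sessions are lumpier, and much of the re-sent context would bill as discounted cache reads. The function name, the 4,000 tokens per turn, and the 40,000-token cap are all hypothetical.

```python
def total_input_tokens(turns, tokens_added_per_turn, max_transcript_tokens=None):
    """Sum the input tokens sent over a session where each turn re-sends the transcript.

    If max_transcript_tokens is set, the transcript is assumed to be pruned
    back to that size before each turn (a stand-in for any trimming strategy).
    """
    transcript = 0
    total = 0
    for _ in range(turns):
        transcript += tokens_added_per_turn
        if max_transcript_tokens is not None:
            transcript = min(transcript, max_transcript_tokens)
        total += transcript
    return total


if __name__ == "__main__":
    unbounded = total_input_tokens(turns=50, tokens_added_per_turn=4_000)
    bounded = total_input_tokens(turns=50, tokens_added_per_turn=4_000,
                                 max_transcript_tokens=40_000)
    print(f"unbounded transcript: {unbounded:,} input tokens")  # 5,100,000
    print(f"capped at 40k:        {bounded:,} input tokens")    # 1,820,000
```

In this model the unbounded total grows with the square of the turn count, while the capped version grows linearly once the cap is reached.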

This is the problem I built Tokenade to attack. It sits between your agent and the model and trims the noise automatically: semantic code search instead of blind reads, output filtering on tool results, skeleton compression for large files, and lazy MCP loading so dormant tool manifests stop riding along on every turn, with a dashboard so you can watch the saved tokens add up. It works with Claude Code, Cursor, Codex, Copilot, Windsurf and the rest, and it's source-available under MIT, so you can read exactly what it does to your prompts before you trust it.

The free tier covers up to about 10M tokens a month, which handles a lot of solo work outright. Pro is $24.90/mo (excl. tax) in the US and €19.90/mo (incl. tax) in France, with unlimited machines. If you're running anywhere near the daily volumes in the break-even example above, the math is easy: the tool costs a fraction of the tokens it saves on either billing model. See the pricing page for the details.

What goes wrong (anti-patterns)

The most expensive mistakes here are decision mistakes, not pricing mistakes. A few I've watched people make, including me:

Picking a billing mode without measuring. "I'll just get Max, everyone says it's cheaper" — then using the agent twice a week and paying for capacity you never touch. Measure first.
Running everything on Opus. Opus 4.8 is five times the price of Haiku 4.5 and 2.5 times the price of Sonnet 5, on both input and output. Reserve the big model for work that needs it; a lot of agentic grunt work runs fine on Sonnet.
Assuming caching saves you from waste. Prompt caching discounts repeated context to ~10%, but it can't discount a 15,000-token tool dump you should have filtered. Caching rewards re-sending the same context, not sending less.
Leaving an agent looping unattended on the API. This is the overnight-bill failure mode. Set a spending limit, and prune the context so each loop iteration is cheap even if it runs long.
Treating the subscription cap as free. When you stop seeing a per-token bill, token discipline quietly erodes. The cap is real; you're just paying for it in throttled afternoons instead of dollars.

The honest summary: the subscription-vs-API question has a clean answer once you know your token volume, and almost no clean answer before you do. So go get that number — and then go shrink it.

See also:

Claude Code: Subscription vs API Pricing