Is Claude Code cheaper on a subscription or on the API?
For most people who code with Claude every working day, the Max subscription is cheaper than the API, and it's not close. For occasional or bursty use, pay-as-you-go API billing wins because you only pay for the tokens you actually burn. The break-even sits somewhere around a few hours of heavy agentic work per week, and exactly where that line falls depends almost entirely on how many tokens you waste.

I run Claude Code daily (it's the agent I build tooling around), so I've watched both meters closely. This article is the math I wish someone had handed me before I picked a billing mode. I'll keep the numbers concrete, cite the real prices, and then make the point that everyone skips: the subscription-vs-API question is downstream of a token question. Cut the tokens and both bills shrink, but the ranking between them can flip.

If you want the agent-agnostic version of the cost picture first, read How to reduce AI coding agent token usage. If you're specifically on Claude Code, the companion piece is How to reduce Claude Code token usage.

What are the two ways to pay for Claude Code?
There are two billing models, and they bill completely different units. The Claude subscription (Pro or Max) bills a flat monthly fee for access, metered by usage limits rather than per-token charges. The Anthropic API bills per million tokens, split into input and output, with no flat fee.

The subscription. Claude Pro and the Max tiers give you Claude Code access bundled with claude.ai. You pay a fixed amount per month and get a usage allowance that resets on a rolling window. You don't see a per-token line item; you hit a cap and wait for the window to reset, or upgrade. Anthropic publishes the current tier prices and limits on its plans page (anthropic.com/pricing); I'm deliberately not quoting the subscription dollar figures here because Anthropic adjusts the tiers and limits more often than the API rates, and a stale number is worse than no number.

The API. You generate an API key, drop it into Claude Code, and every token is billed at the published per-model rate. Nothing is bundled; nothing is capped except your own spending limits. This is metered electricity: you pay for exactly what you use, including the tokens you wasted.

The mechanics matter because they fail differently. A subscription fails by throttling: you run out of allowance mid-task and lose momentum. The API fails by surprising you: you leave an agent looping overnight and find a bill that bought you very little. Picking between them is partly a question of which failure mode you'd rather manage.

What does the API actually cost per token?
The API cost is set by which model you run and how your tokens split between input and output. Here are the current per-million-token (MTok) rates from Anthropic's API pricing (anthropic.com/pricing):

| Model | Input / MTok | Output / MTok |
|---|---|---|
| Claude Opus 4.8 | $5 | $25 |
| Claude Sonnet 4.6 | $3 | $15 |
| Claude Haiku 4.5 | $1 | $5 |
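
To make the table concrete, here is a minimal sketch that turns those rates into a cost function. The dictionary keys are shorthand labels I made up for this example, not official API model identifiers, and the 1M-input / 100k-output session is a hypothetical workload. Caching is ignored here.

```python
# Per-million-token (MTok) rates in USD, copied from the table above.
# The keys are illustrative labels, not real API model IDs.
RATES = {
    "opus-4.8": {"input": 5.00, "output": 25.00},
    "sonnet-4.6": {"input": 3.00, "output": 15.00},
    "haiku-4.5": {"input": 1.00, "output": 5.00},
}


def api_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the uncached API cost in USD for a batch of tokens."""
    rate = RATES[model]
    total = input_tokens * rate["input"] + output_tokens * rate["output"]
    return total / 1_000_000


# Hypothetical session: 1M input tokens and 100k output tokens on each model.
for model in RATES:
    print(f"{model}: ${api_cost(model, 1_000_000, 100_000):.2f}")
```

The same workload costs five times as much on the top model as on the smallest one, which is why the model choice matters as much as the billing mode.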
Where's the break-even between them?
The break-even is the point where your monthly API token spend equals the subscription fee, and you can estimate it from one number: tokens burned per day. Because Claude Code re-reads the full transcript on every step of its agentic loop, a day of real work moves a surprising volume of tokens.

Here's a worked example. Say a focused Claude Code session on Sonnet 4.6 moves roughly 2M input tokens and 200k output tokens: file reads, tool results, a long transcript, and the code it writes. With prompt caching covering, say, 70% of the input as cache reads, the bill is roughly 0.6M fresh input at $3/MTok (= $1.80), plus 1.4M cached input at ~$0.30/MTok (= $0.42), plus 200k output at $15/MTok (= $3.00). That's about $5.20 for the session. Twenty working days of one such session each is around $104/month on the API.

That figure is squarely in subscription territory, which is exactly why heavy daily users come out ahead on Max. But notice how load-bearing my assumptions were: the model tier, the cache hit rate, and above all the 2M input tokens. Halve the wasted input and the API session drops below the point where the subscription pays for itself. The break-even isn't a fixed dollar amount; it's a function of your token discipline.

This is also why I distrust generic "subscription is always cheaper" advice. It's true for an undisciplined daily user on Opus. It's false for someone who has done their context engineering and uses the agent in tight bursts. You have to run your own numbers, and the inputs to those numbers are things you control.

How do you decide which to pick?
Pick the subscription if you code with Claude most days and value predictable cost over precision; pick the API if your usage is spiky or multi-seat, or if you want per-token visibility. Concretely:

- Estimate your daily token volume. Run a week on the API (or check a usage dashboard) and record input/output tokens per day. This single measurement answers the question better than any blog post.
- Multiply out a month at your dominant model's rate, discounting cached input to ~10%. Compare that to the Max fee on Anthropic's plans page.
- If you're within ~30% either way, pick the subscription — the predictability and the no-overnight-surprise property are worth the rounding error.
- If you're a team, lean API for now: per-seat subscription math gets expensive fast, and the API gives you one consolidated, attributable bill.
- Either way, cut the tokens first. Both bills are a function of token volume; reducing it improves whichever side you land on.
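
The estimate-multiply-compare steps above can be sketched in a few lines. This is a rough model under stated assumptions, not a billing calculator: cached input is billed at a flat 10% of the input rate (cache-write premiums are ignored), the 30% tolerance is measured against the subscription fee, and `PLACEHOLDER_FEE_USD` is a made-up number you should replace with the current figure from Anthropic's plans page.

```python
def session_cost(input_tokens, output_tokens, input_rate, output_rate,
                 cache_hit=0.0, cache_discount=0.10):
    """Estimate one session's API cost in USD.

    Rates are USD per million tokens. `cache_hit` is the fraction of input
    served as cache reads, billed at `cache_discount` times the input rate
    (the ~10% simplification used in this article).
    """
    cached = input_tokens * cache_hit
    fresh = input_tokens - cached
    total = (fresh * input_rate
             + cached * input_rate * cache_discount
             + output_tokens * output_rate)
    return total / 1_000_000


def recommend(monthly_api_usd, subscription_usd, tolerance=0.30):
    """Rule of thumb: pick the API only if it is more than `tolerance` cheaper."""
    if monthly_api_usd < subscription_usd * (1 - tolerance):
        return "api"
    return "subscription"


# The worked example from earlier: Sonnet 4.6, 2M input, 200k output, 70% cached.
per_session = session_cost(2_000_000, 200_000, 3.00, 15.00, cache_hit=0.70)
monthly = per_session * 20  # twenty working days, one session per day

PLACEHOLDER_FEE_USD = 100.0  # hypothetical; look up the real fee before deciding
print(f"per session: ${per_session:.2f}, per month: ${monthly:.2f}")
print("pick:", recommend(monthly, PLACEHOLDER_FEE_USD))
```

Swap in your own measured token counts and the current fee; the recommendation is only as good as those two inputs.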
How does cutting tokens change the answer?
Cutting tokens lowers both bills and can flip the break-even, because the API cost is volume-based and the subscription's usage cap is volume-based too. On the API you pay less directly; on the subscription you hit the throttle far less often, which is the same as buying yourself a bigger effective allowance for free.

The waste in a Claude Code session is concentrated in a few places, and none of them are signal:

- Eager whole-file reads when a semantic code search would have returned the 30 relevant lines.
- Raw tool output — a 15,000-token failing test run when the agent needed the failing assertion. That's what output filtering is for.
- The MCP manifest re-sent every turn, whether or not any tool fires. See Best MCP servers for Claude Code for what's worth keeping connected.
- An unbounded transcript that re-bills every past read on every new turn.
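
To see why the unbounded transcript is the expensive item on that list, here is a deliberately simplified model. It assumes every turn adds a fixed number of tokens and re-sends the whole transcript as input, with no caching and no compaction. The 5,000 tokens per turn is an illustrative figure, not a measurement.

```python
def cumulative_input_tokens(tokens_added_per_turn: int, turns: int) -> int:
    """Total input tokens sent when each turn re-sends the whole transcript."""
    total = 0
    transcript = 0
    for _ in range(turns):
        transcript += tokens_added_per_turn  # new reads and tool output pile up
        total += transcript                  # the full transcript goes out again
    return total


# Illustrative: 5,000 new tokens per turn.
for turns in (10, 20, 40):
    print(turns, "turns ->", cumulative_input_tokens(5_000, turns), "input tokens")
```

In this model, doubling the number of turns roughly quadruples the input volume, while halving what each turn adds only halves it. That asymmetry is the argument for bounding the transcript rather than just trimming individual reads.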
What goes wrong (anti-patterns)
The most expensive mistakes here are decision mistakes, not pricing mistakes. A few I've watched people make, including me:

- Picking a billing mode without measuring. "I'll just get Max, everyone says it's cheaper," followed by using the agent twice a week and paying for capacity you never touch. Measure first.
- Running everything on Opus. Opus 4.8 is five times the input price of Haiku 4.5 and meaningfully pricier than Sonnet. Reserve the big model for work that needs it; a lot of agentic grunt work runs fine on Sonnet.
- Assuming caching saves you from waste. Prompt caching discounts repeated context to ~10%, but it can't discount a 15,000-token tool dump you should have filtered. Caching rewards re-sending the same context, not sending less.
- Leaving an agent looping unattended on the API. This is the overnight-bill failure mode. Set a spending limit, and prune the context so each loop iteration is cheap even if it runs long.
- Treating the subscription cap as free. When you stop seeing a per-token bill, token discipline quietly erodes. The cap is real; you're just paying for it in throttled afternoons instead of dollars.
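
The caching point above is easy to check with arithmetic. The sketch below compares carrying the 15,000-token tool dump through a session against carrying a filtered version. The 20 turns and the 500-token filtered size are assumptions chosen for illustration, the first send is billed at the full input rate, and cache-write premiums are ignored.

```python
INPUT_RATE = 3.00         # Sonnet 4.6 input, USD per MTok (from the pricing table)
CACHE_READ_FACTOR = 0.10  # cached input billed at ~10%, per the simplification above


def resend_cost(tokens: int, turns: int, cached: bool) -> float:
    """USD cost of carrying `tokens` of context through `turns` turns."""
    first_send = tokens * INPUT_RATE
    factor = CACHE_READ_FACTOR if cached else 1.0
    later_sends = tokens * INPUT_RATE * factor * (turns - 1)
    return (first_send + later_sends) / 1_000_000


TURNS = 20  # assumed session length
print(f"raw dump, uncached: ${resend_cost(15_000, TURNS, cached=False):.4f}")
print(f"raw dump, cached:   ${resend_cost(15_000, TURNS, cached=True):.4f}")
print(f"filtered, uncached: ${resend_cost(500, TURNS, cached=False):.4f}")
```

Under these assumptions the filtered output with no caching at all still costs less than the raw dump with perfect caching, which is the sense in which caching rewards re-sending rather than sending less.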