What is the Claude Code token limit, and how do you stay under it?
The Claude Code token limit is a usage budget, not a fixed token count printed on your plan. It resets on a rolling 5-hour window with an additional weekly cap, and you stay under it by spending fewer tokens per task rather than by rationing your prompts. Every file Claude Code reads, every command output it ingests, and every turn on which it re-reads the transcript draws down that budget. Cut the tokens each action costs and you do more work before you ever see the "limit reached" banner.
I build token-reduction tooling for a living, so I'll be blunt about the part most guides skip: on Pro and Max plans, Anthropic does not publish a hard number of tokens you're allowed per week. The limit is described as a usage budget that resets on a rolling window, and the same budget is shared across claude.ai, Claude Desktop and Claude Code (Anthropic: usage and length limits; Anthropic: Claude Code with Pro or Max). That sounds vague until you reframe it: if the cap is "how much you can spend," then spending less per task is the entire game.
This is the Claude Code–specific companion to How to reduce Claude Code token usage. Read that for the deep mechanics; this piece is about the limit itself and how not to faceplant into it.
How does the Claude Code limit actually work?
The Claude Code limit works as a shared usage budget that refills on a rolling 5-hour cycle, with a separate weekly cap layered on top, and it counts your usage across every Claude surface you touch. Three properties matter for staying under it:
- It's a rolling window, not a daily reset. The session-style budget refills roughly every five hours rather than at midnight (Anthropic: usage and length limits). So a brutal one-hour debugging session can lock you out for the rest of that window even if you've barely worked that day. Pacing matters.
- It's shared across surfaces. Your usage of claude.ai, Claude Desktop and Claude Code all draws from the same budget (Anthropic: Claude Code with Pro or Max). If you've spent the afternoon chatting in the web app, you walk into your terminal session with less headroom than you think.
- It's denominated in tokens you can't see directly. What gets billed against the budget is tokens: input and output, plus the full transcript re-read on every turn. This is the crucial bit: Claude Code re-reads the entire conversation history on each step of its agentic loop, so an oversized file read early in a session is paid for again on every turn that follows. The limit isn't measuring your prompts; it's measuring the cumulative token weight of everything still in the context window.
Anthropic doesn't publish exact per-plan token counts, and the caps are explicitly subject to change to manage capacity (Anthropic: usage and length limits). I'd rather give you a method that survives a pricing-page edit than a number that's stale by next quarter.
Why do I hit the limit faster than I expect?
You hit the limit faster than expected because agentic coding spends tokens on a quadratic curve, not a linear one: the transcript re-read makes early waste compound over the whole session.
Here's the mechanism in one sentence: on every turn, Claude Code re-reads the entire context, so a 6,000-token file you read on turn 2 is re-billed on turns 3, 4, 5, and so on. Read a dozen files up front "to understand the codebase," carry them for twenty turns, and you've paid for that exploration twenty times. That's agentic coding working as designed; it's also why naive sessions burn budget alarmingly fast.
Four patterns do most of the damage:
- Eager whole-file reads. A 500-line TypeScript file is roughly 5,000–7,000 tokens. Claude Code defaults to reading the whole thing even when it only needs a function signature.
- Unfiltered command output. A failed npm test can dump 15,000 tokens of stack traces and passing-test checkmarks. The model needed about 50 of them: the failing test name and the broken assertion. See output filtering for the fix.
- The MCP manifest. Every connected MCP server re-advertises its full tool definitions on every turn, used or not. Five servers is a silent constant overhead bolted onto your budget.
- One endless session. A transcript that sprawls across several unrelated tasks re-bills every early read forever. This is the single biggest avoidable drain I see.
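
To make the compounding concrete, here is a rough model of the re-read cost described above. It is a sketch under simplifying assumptions, not a description of how Anthropic meters usage: it counts only input tokens, ignores output tokens, prompt caching and the system prompt, and assumes nothing is ever dropped from the context. The function name `cumulative_input_tokens` is made up for illustration.

```python
def cumulative_input_tokens(new_tokens_per_turn):
    """Total input tokens sent when every turn re-reads everything read so far.

    new_tokens_per_turn[i] is the number of new tokens added to the
    context on turn i (a file read, a command output, a prompt).
    """
    total = 0
    context = 0
    for new_tokens in new_tokens_per_turn:
        context += new_tokens  # the context grows by whatever arrived this turn
        total += context       # the whole context is sent as input again
    return total


# One 6,000-token file read on the first turn, carried for 20 turns in total.
whole_file = [6_000] + [0] * 19
print(cumulative_input_tokens(whole_file))  # 120000

# The same session if a targeted search had returned 200 tokens instead.
snippet = [200] + [0] * 19
print(cumulative_input_tokens(snippet))     # 4000

# Steady growth: 1,000 new tokens on each of 20 turns (20,000 tokens read in).
steady = [1_000] * 20
print(cumulative_input_tokens(steady))      # 210000
```

The last case is the quadratic curve in miniature: only 20,000 tokens ever entered the context, yet the model was sent 210,000 input tokens across the session.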
How do you stay under the Claude Code token limit?
You stay under the limit by attacking the four token drains above, in roughly this order: bound the transcript, filter command output, search instead of reading whole files, and prune your MCP servers. Concrete steps you can apply today:
- Run /compact proactively, between subtasks, not when the warning fires. /compact replaces earlier turns with a compressed summary, reclaiming context. Running it after each finished subtask means every new subtask starts from a lean baseline instead of re-billing the last one's file reads. Treat it as session hygiene, not an emergency brake.
- Start a fresh session per task. Unrelated work doesn't belong in the same transcript. A new session is the cheapest possible context: there's nothing to re-read.
- Make the agent search, not read. Point Claude Code at a function or symbol rather than asking it to "read the file." Semantic code search returns the relevant 200 tokens instead of the whole 6,000-token file.
- Filter before the model sees it. Pipe noisy commands through something that returns the error line, not the full log. The model fixes the bug just as well from 50 tokens as from 15,000.
- Disconnect MCP servers you're not using this session. Each one's manifest is a per-turn tax. Lazy-loading tool definitions only when invoked removes that constant overhead.
- Keep CLAUDE.md stable and short. Prompt caching on Claude only hits when the prefix is byte-identical across turns, and a cache read is priced at roughly 10% of a fresh input token. A CLAUDE.md you edit mid-session defeats the cache and gets re-billed at full price. Stable and concise is both a token cut and a cache multiplier.
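
The "filter before the model sees it" step above can be sketched in a few lines. This is a minimal illustration, not any particular tool's implementation: the `NOISE` patterns are guesses at what a test runner's low-value lines look like, the function name `filter_output` is hypothetical, and the sample log is hand-written for the example rather than captured from a real run.

```python
import re

# Lines that rarely help with a fix: passing tests, blank lines, stack frames.
# These patterns are illustrative assumptions; tune them for your own runner.
NOISE = re.compile(r"^\s*(PASS\b|✓|at\s|$)")


def filter_output(raw: str, max_lines: int = 20) -> str:
    """Drop noise lines and cap what is left at max_lines."""
    kept = [line.rstrip() for line in raw.splitlines() if not NOISE.match(line)]
    return "\n".join(kept[:max_lines])


# Hand-written sample in the general shape of a test report.
sample = """PASS src/cart.test.ts
  ✓ adds an item
PASS src/user.test.ts
FAIL src/price.test.ts
  ● applies discount

    Expected: 90
    Received: 100

      at Object.<anonymous> (src/price.test.ts:12:5)
"""

print(filter_output(sample))
```

On this sample the filter keeps four lines: the failing file, the failing test name, and the expected and received values. A deny-list is used rather than an allow-list so that an unfamiliar error format is passed through instead of silently dropped; the `max_lines` cap bounds the worst case.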
Does buying a bigger plan fix it?
Buying a bigger plan raises the ceiling but doesn't change the slope: if your sessions waste tokens, a 20x plan just lets you waste 20x more before stalling. The Max tiers genuinely do offer much higher limits, and if you're a heavy user that's the right move (Anthropic: Claude Code with Pro or Max). But spend a moment on the unit economics first.
If you ever move to the API, the waste is no longer rate-limiting; it's billed. At list pricing, Claude Opus 4.8 is $5 per million input tokens and $25 per million output; Sonnet 4.6 is $3 / $15; Haiku 4.5 is $1 / $5. A cached read costs roughly 10% of a fresh input token. Re-reading that 6,000-token file twenty times on Opus is 120,000 input tokens, about $0.60 at the uncached rate, for one file you read once. Multiply across a real session and the "limit" stops being abstract; it's money. The AI coding agent token costs breakdown has the full math, and the LLM token cost calculator lets you plug in your own numbers.
My honest take: optimise the slope first, then size the plan to your real usage. Most people who think they need Max actually just need to stop carrying dead context.
How do you do all this without micromanaging every prompt?
You automate the levers instead of remembering them, because the techniques above only work if you apply them consistently, and humans are bad at consistency under deadline.
That's the gap Tokenade closes. It applies semantic retrieval, output compression, structure-first reads and lazy MCP loading automatically inside Claude Code (and Cursor, Codex, Copilot and Windsurf; the same mechanics are agent-agnostic, since every transcript-re-reading agent has the same problem), with a dashboard so you can actually see what you're saving instead of guessing. It's source-available under MIT, so you can audit exactly what it sends. Free up to roughly 20M tokens a month; Pro is $9.90/mo excl. tax in the US (€9.90/mo TTC in France) with 3 seats.
The point isn't to never hit the limit. It's to make hitting it the exception instead of your Tuesday.
What goes wrong (anti-patterns)
- "Read the whole project first." Feels thorough; it's a token grenade. Claude Code front-loads tens of thousands of tokens it will largely ignore, then re-bills them every turn, straight into your limit.
- Asking the model to "be brief" to save budget. This trims output, which is the small part of a session's token volume. The limit is dominated by input: the transcript and file reads you keep feeding in. Brevity on output barely moves the needle.
- One marathon session for the whole day. Unbounded history is the fastest route to the limit. Compacting and restarting between tasks isn't a hack; it's correct session management.
- Waiting for the limit warning before compacting. By the time the warning fires, you've already paid for the bloat on dozens of turns. Compact proactively.
- Adding MCP servers and forgetting them. Each connected server's manifest is billed every turn, used or not. Connect what this session needs; disconnect the rest.
Frequently asked questions
What is the exact token limit for Claude Code on Pro or Max?
Anthropic doesn't publish a fixed token number. The limit is described as a usage budget that resets on a rolling window (roughly five hours) with a weekly cap, shared across claude.ai, Claude Desktop and Claude Code, and explicitly subject to change to manage capacity (Anthropic: usage and length limits). Practically, "the limit" is however much token weight your sessions accumulate, which is why reducing per-task tokens is the lever, not memorising a number.
Does Claude Code share its limit with claude.ai?
Yes. Usage across claude.ai, Claude Desktop and Claude Code all counts toward the same budget (Anthropic: Claude Code with Pro or Max). A heavy web-app afternoon leaves you with less terminal headroom.
Will cutting tokens make Claude Code's answers worse?
No. Done correctly, it makes them better. You're removing low-value context (raw logs, irrelevant file bodies, unused tool definitions), not the signal the model reasons over. A leaner context window raises the signal-to-noise ratio. The only way to hurt quality is to compress away something load-bearing, which is why structure-first reads keep every function signature and output filters keep the actual error.
Does /compact reset my limit?
No. /compact shrinks the context window for future turns by summarising earlier history; it doesn't refund tokens you've already spent or reset the usage budget. Its value is forward-looking: every turn after a compact is cheaper, so you stretch the budget you have left further.
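
The forward-looking effect can be illustrated with a toy model. This is a sketch under stated assumptions, not a measurement: it counts only input tokens, assumes the full context is re-sent each turn, treats a compact as replacing the context with a fixed-size summary (500 tokens here, an arbitrary figure), and ignores whatever the summarisation step itself costs. The name `session_input_tokens` and its parameters are hypothetical.

```python
def session_input_tokens(new_tokens_per_turn, compact_after=(), summary_tokens=500):
    """Cumulative input tokens for a session, with optional compaction.

    compact_after holds 0-based turn indexes; after each of those turns the
    context is replaced by a summary of summary_tokens tokens.
    """
    compact_turns = set(compact_after)
    total = 0
    context = 0
    for turn, new_tokens in enumerate(new_tokens_per_turn):
        context += new_tokens
        total += context              # the full context is re-read this turn
        if turn in compact_turns:
            context = summary_tokens  # later turns start from the summary
    return total


# Three subtasks of 10 turns each: a 6,000-token read, then 500 tokens per turn.
turns = ([6_000] + [500] * 9) * 3

print(session_input_tokens(turns))                         # 562500
print(session_input_tokens(turns, compact_after=(9, 19)))  # 257500
```

In both runs the first subtask costs the same 82,500 tokens, since compacting afterwards refunds nothing. The difference comes entirely from the second and third subtasks starting at 500 tokens of context instead of 10,500 and 21,000.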
Do these techniques work with Cursor, Copilot or Windsurf?
Yes. The mechanics are agent-agnostic: every token-billed agent that re-reads its transcript each turn has the same problem and benefits from the same levers. The cross-agent picture is in How to reduce AI coding agent token usage.
See also:
- How to reduce Claude Code token usage — the deep mechanics behind every lever here.
- How to reduce AI coding agent token usage — the cross-agent pillar this branches from.
- Context engineering for AI coding agents — the discipline behind a lean context window.
- AI coding agent token costs — the pricing math behind the limit.
- LLM token cost calculator — plug in your own session numbers.