A token is the basic unit of text a large language model reads and generates: typically a short chunk such as a word, part of a word, or a piece of punctuation. Models don't see characters or whole sentences directly; they see sequences of tokens, and everything they do is measured in them.

For AI coding agents, tokens are the unit of cost and capacity. Every prompt you send and every answer you get back is counted in input and output tokens, and providers bill (and rate-limit) on that count. A rough rule of thumb for English is that one token is about four characters, or about three-quarters of a word, so ~750 words is roughly 1,000 tokens. Code tokenizes differently, often into more tokens than prose.
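The four-characters-per-token rule of thumb can be turned into a quick estimator. The sketch below is only an approximation for English prose: the function name `estimate_tokens` and the constant `CHARS_PER_TOKEN` are hypothetical, and no real tokenizer is involved, so actual counts (especially for code) will differ.

```python
import math

# Assumption: ~4 characters per token, the rough rule of thumb for English prose.
# This is NOT a real tokenizer; treat the result as a ballpark figure only.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Return a rough token estimate based purely on character count."""
    if not text:
        return 0
    # Round up so that any non-empty text counts as at least one token.
    return math.ceil(len(text) / chars_per_token)
```

For billing or rate-limit decisions, a provider's own tokenizer or token-counting endpoint is the more reliable source; an estimator like this is mainly useful for quick budgeting.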
Why tokens matter in 2026
Tokens matter because agentic coding tools re-read their working context on every step, so the same tokens get paid for again and again across a session. The expensive part of a coding session is rarely the model's output — it's the input the agent keeps re-ingesting: files, command output, and the growing conversation. Controlling token usage is therefore the main lever on both cost and rate limits. See How to reduce AI coding agent token usage.
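To see why re-ingested input dominates, here is a minimal sketch of the arithmetic. It assumes a simplified model in which the agent re-sends its entire context on every step and pays for all of it as input each time; the function name and the numbers are illustrative, and real tools may cache or trim context.

```python
from typing import Sequence


def cumulative_input_tokens(base_context: int, added_per_step: Sequence[int]) -> int:
    """Total input tokens billed across a session under the re-send-everything assumption.

    base_context:   tokens in the context before the first step
    added_per_step: tokens appended to the context after each step
                    (files read, command output, new messages)
    """
    total = 0
    context = base_context
    for added in added_per_step:
        total += context   # the whole current context is read as input on this step
        context += added   # then the context grows for the next step
    return total


# Three steps, starting from 2,000 tokens and growing by 1,000 after each step:
# the steps read 2,000 + 3,000 + 4,000 input tokens.
session_total = cumulative_input_tokens(2000, [1000, 1000, 1000])
```

In this example the context only ever holds 5,000 tokens, yet 9,000 input tokens are billed, and the gap widens with every additional step.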
When token-counting is less useful
Tiny, one-off prompts where cost is negligible — optimisation isn't worth the effort.
Comparing across models blindly — the same text becomes a different number of tokens under different tokenizers, so token counts aren't directly comparable between providers.
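The tokenizer-dependence point can be illustrated with a toy sketch. The two functions below are made-up tokenizers for demonstration only; neither corresponds to any provider's actual tokenizer, which typically uses learned subword vocabularies.

```python
import re
from typing import List


def whitespace_tokens(text: str) -> List[str]:
    """Toy tokenizer A: split on whitespace only."""
    return text.split()


def word_and_punct_tokens(text: str) -> List[str]:
    """Toy tokenizer B: runs of word characters, plus each punctuation mark on its own."""
    return re.findall(r"\w+|[^\w\s]", text)


sample = "print(total_cost)"
count_a = len(whitespace_tokens(sample))      # one chunk: the text has no spaces
count_b = len(word_and_punct_tokens(sample))  # "print", "(", "total_cost", ")"
```

The same string yields 1 token under one scheme and 4 under the other, which is why a token count only means something relative to the tokenizer that produced it.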