Context Window

What is a context window?

A context window is the maximum amount of text, measured in tokens, that a language model can take into account at one time. It holds everything the model is "thinking about" on a given turn: the system prompt, the conversation so far, any files or search results pulled in, tool definitions, and command output. If the combined input exceeds the window, something has to be dropped or summarised. Modern coding models have large windows (often hundreds of thousands of tokens), which can make it tempting to fill them. But the window is a budget, not a target: every token you put in is billed, and billed again on each turn in which an agent re-sends it.
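One way to picture "something has to be dropped" is a trimming step that runs before each request. The sketch below is illustrative only: `Message`, `estimate_tokens`, and `trim_to_budget` are hypothetical names, and the four-characters-per-token estimate is a rough assumption, not a real tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str      # "system", "user", "assistant", or "tool"
    content: str


def estimate_tokens(text: str) -> int:
    """Rough guess of ~4 characters per token (an assumption, not a tokenizer)."""
    return max(1, len(text) // 4)


def trim_to_budget(messages: list[Message], budget: int) -> list[Message]:
    """Keep every system message, then the newest turns that still fit."""
    system = [m for m in messages if m.role == "system"]
    rest = [m for m in messages if m.role != "system"]

    used = sum(estimate_tokens(m.content) for m in system)
    kept: list[Message] = []
    # Walk backwards so the most recent turns are considered first.
    for m in reversed(rest):
        cost = estimate_tokens(m.content)
        if used + cost > budget:
            break  # this turn and everything older is dropped
        kept.append(m)
        used += cost
    return system + list(reversed(kept))


history = [
    Message("system", "You are a coding assistant."),
    Message("user", "x" * 400),
    Message("assistant", "y" * 400),
    Message("user", "z" * 40),
]
trimmed = trim_to_budget(history, budget=120)
print([m.role for m in trimmed])
```

Dropping the oldest turns is only one policy; summarising them instead would preserve more of the conversation at the price of an extra model call.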

Why the context window matters in 2026

It matters because filling the window has a hidden quality cost on top of the obvious price. Models tend to attend less to information buried in the middle of a long context, so a window stuffed with marginally relevant files can make answers worse: the signal the model needs gets lost in the noise. Managing what goes into the window is the discipline of context engineering. Keeping the window lean, with only the material the task needs, is often the quickest way to cut cost without hurting quality.
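The price side can be made concrete with a little arithmetic. The sketch below assumes the simplest billing model: the whole context is re-sent as input on every agent turn, with no prompt caching or discounts. The function name and all the numbers are illustrative assumptions.

```python
def cumulative_input_tokens(base: int, added_per_turn: int, turns: int) -> int:
    """Total input tokens billed when the full context is re-sent every turn.

    Assumes no prompt caching: turn t sends base + t * added_per_turn tokens.
    """
    return sum(base + t * added_per_turn for t in range(turns))


# Same 20-turn task, starting from a stuffed window versus a lean one.
large = cumulative_input_tokens(base=100_000, added_per_turn=2_000, turns=20)
small = cumulative_input_tokens(base=10_000, added_per_turn=2_000, turns=20)
print(large, small)  # 2380000 580000
```

Under these assumptions the starting context dominates the bill, because it is paid for once per turn rather than once per task.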

When a bigger context window doesn't help

  • When the extra content is low-signal: adding files the model doesn't need tends to lower answer quality while raising cost.
  • When you could retrieve instead: pulling in only the few relevant functions via semantic code search usually beats dumping everything into a large window.
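Both points can be sketched as a selection step that ranks candidate snippets and admits only the relevant ones that fit a token budget. This is a toy under stated assumptions: plain word overlap stands in for the embedding similarity a real semantic code search would use, and `select_snippets`, `min_score`, and the sample snippets are all hypothetical.

```python
import re


def words(text: str) -> set[str]:
    """Lowercased alphanumeric words; underscores split identifiers apart."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(query: str, snippet: str) -> float:
    """Fraction of query words found in the snippet (stand-in for embeddings)."""
    q = words(query)
    return len(q & words(snippet)) / len(q) if q else 0.0


def select_snippets(query: str, snippets: dict[str, str],
                    budget: int, min_score: float = 0.2) -> list[str]:
    """Pick the highest-scoring snippets that clear min_score and fit the budget."""
    ranked = sorted(snippets.items(),
                    key=lambda kv: score(query, kv[1]), reverse=True)
    chosen: list[str] = []
    used = 0
    for name, body in ranked:
        if score(query, body) < min_score:
            break  # the rest of the ranking is lower-signal still
        cost = max(1, len(body) // 4)  # rough ~4 chars/token assumption
        if used + cost > budget:
            continue  # too big for what is left; try a smaller snippet
        chosen.append(name)
        used += cost
    return chosen


snippets = {
    "parse_config": "def parse_config(path): return load yaml config from path",
    "render_page": "def render_page(template): return html for template",
    "retry_request": "def retry_request(url): retry http request with backoff",
}
picked = select_snippets("where do we parse the yaml config", snippets, budget=200)
print(picked)  # ['parse_config']
```

The `min_score` cutoff is what keeps low-signal files out even when there is budget to spare.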

See also