Prompt caching is a provider feature that lets a model reuse the work it already did on a stable prefix of your prompt, instead of reprocessing (and fully re-billing) that text on every call. You mark or structure the unchanging part — a system prompt, project conventions, reference material — so that repeated requests hit the cache and only the new tokens are charged at the full rate.It doesn't make a single request smaller; it makes repeated requests that share a common prefix much cheaper. The savings compound over a session.
Why prompt caching matters in 2026
It matters because agentic coding sessions are long and repetitive: the same system prompt and project rules are sent on turn after turn. Without caching, that fixed context is re-billed every step. With it, the stable prefix is largely free after the first call, which is why keeping your instructions stable and cache-friendly is a real token lever — see Reduce Claude Code token usage.
When prompt caching doesn't help
When the prefix keeps changing — if you rewrite your system prompt or reorder context each turn, there's nothing stable to cache.
For one-shot prompts — a single request with no repetition has no prefix to reuse.