Context compression is any technique that reduces the size of the text an agent puts into the model's context window while keeping the information the model actually needs. It covers several moves: showing a file's skeleton (signatures and top-level declarations) instead of its full body, summarising long history, and filtering noisy output down to its signal. The goal isn't to send less for its own sake; it's to drop the parts the model won't use while preserving the parts it will. A good skeleton, for example, keeps every public function signature so the model can reason about the file, and omits only the bodies, which it can fetch later if needed.
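To make the skeleton idea concrete, here is a minimal sketch for Python source files, assuming Python 3.9+ (for `ast.unparse`). The names `skeleton`, `_BodyStripper`, `KEEP` and `SAMPLE` are hypothetical and chosen for illustration; a real tool would likely also keep docstrings and handle other languages.

```python
import ast

# Top-level node types treated as "declarations" worth keeping (an assumption).
KEEP = (
    ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef,
    ast.Import, ast.ImportFrom, ast.Assign, ast.AnnAssign,
)


class _BodyStripper(ast.NodeTransformer):
    """Replace every function or method body with a single `...` placeholder."""

    def _strip(self, node):
        node.body = [ast.Expr(value=ast.Constant(value=...))]
        return node

    visit_FunctionDef = _strip
    visit_AsyncFunctionDef = _strip


def skeleton(source: str) -> str:
    """Return signatures and top-level declarations, without function bodies."""
    tree = ast.parse(source)
    # Drop top-level statements that are not declarations.
    tree.body = [node for node in tree.body if isinstance(node, KEEP)]
    # Class bodies are walked by the default visitor, so methods are stripped too.
    tree = _BodyStripper().visit(tree)
    return ast.unparse(tree)


SAMPLE = '''
import math

MAX_RETRIES = 3

def area(radius: float) -> float:
    """Area of a circle."""
    return math.pi * radius ** 2

class Counter:
    def increment(self, by: int = 1) -> None:
        self.total = getattr(self, "total", 0) + by
'''

print(skeleton(SAMPLE))
```

The output keeps the import, the constant, and both signatures, so the model can still see what the file offers; the bodies are the part it would have to request separately.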
Why context compression matters in 2026
It matters because agents re-read their context every turn, so any bloat is paid for again on each turn of a session. Compression directly attacks the two biggest slices, file reads and history: structure-first reads can cut a file by more than half, and that saving recurs every time the file would otherwise be re-read. Combined with semantic search and output filtering, compression is how context engineering keeps both cost and noise down.
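The recurring nature of the saving is easy to see with a little arithmetic. The sketch below uses a deliberately simplified model in which the same context is resent unchanged on every turn; the function name `session_input_tokens` and all token counts are hypothetical illustration values, not measurements.

```python
def session_input_tokens(base_tokens: int, file_tokens: int, turns: int) -> int:
    """Total input tokens when the same context is resent on every turn."""
    return turns * (base_tokens + file_tokens)


# Hypothetical numbers: a 4,000-token file versus a 1,600-token skeleton of it.
full = session_input_tokens(base_tokens=2_000, file_tokens=4_000, turns=20)
skel = session_input_tokens(base_tokens=2_000, file_tokens=1_600, turns=20)
saved = full - skel

print(full, skel, saved)  # 120000 72000 48000
```

Under these assumptions, a one-time saving of 2,400 tokens on the file becomes 48,000 tokens over a 20-turn session, which is the sense in which bloat is paid repeatedly.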
When compression can backfire
When the dropped detail was load-bearing: compressing away a body the model actually needed forces a re-fetch, which costs more than the compression saved.
Lossy summaries of precise facts: exact numbers, IDs, or error strings should be preserved verbatim, not summarised.
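One way to respect the second point when filtering noisy output is to pass signal lines through untouched and collapse only the rest. The following is a minimal sketch, not a definitive filter: the keyword pattern `SIGNAL`, the function `filter_output`, and the sample log `LOG` are all assumptions made for illustration.

```python
import re

# Assumed heuristic: lines containing these words carry the signal.
SIGNAL = re.compile(r"error|warn|fail|exception|traceback", re.IGNORECASE)


def filter_output(raw: str) -> str:
    """Keep signal lines verbatim; replace runs of other lines with a count."""
    kept = []
    skipped = 0
    for line in raw.splitlines():
        if SIGNAL.search(line):
            if skipped:
                kept.append(f"[{skipped} line(s) omitted]")
                skipped = 0
            kept.append(line)  # never rewritten, so numbers and IDs survive
        else:
            skipped += 1
    if skipped:
        kept.append(f"[{skipped} line(s) omitted]")
    return "\n".join(kept)


LOG = """\
Downloading package 1/3
Downloading package 2/3
Downloading package 3/3
ERROR: build failed with exit code 137 (job id 8f3a2c)
Cleaning up
"""

print(filter_output(LOG))
```

Because matching lines are copied rather than paraphrased, the exit code and job ID reach the model exactly as emitted, while the omission markers tell it that something was dropped and can be fetched again if it turns out to matter.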