Glossary
Plain-language definitions of 11 terms about tokens, LLMs and AI coding agents, from context windows to MCP, written for developers who pay the bills.
Context Compression
Context compression shrinks what an agent feeds the model, using code skeletons, summaries and filtering, while keeping the information the task actually needs, so the context window stays small and cheap.
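One compression technique, the skeleton, can be sketched for Python source with the standard ast module. Each function keeps its signature and docstring, and its body becomes "...". This is a minimal illustration that assumes the agent is reading Python files. skeleton is a hypothetical helper name, not any particular tool's API. It needs Python 3.9+ for ast.unparse.

```python
import ast


class _StripBodies(ast.NodeTransformer):
    """Replace every function body with its docstring (if any) plus `...`."""

    def _strip(self, node):
        self.generic_visit(node)  # visit nested defs first
        body = []
        doc = ast.get_docstring(node)
        if doc is not None:
            body.append(ast.Expr(ast.Constant(doc)))
        body.append(ast.Expr(ast.Constant(Ellipsis)))
        node.body = body
        return node

    visit_FunctionDef = _strip
    visit_AsyncFunctionDef = _strip


def skeleton(source: str) -> str:
    """Hypothetical helper: return a signatures-and-docstrings view of the source."""
    tree = _StripBodies().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


src = 'def add(a, b):\n    """Return a + b."""\n    total = a + b\n    return total\n'
print(skeleton(src))
```

Methods inside classes are stripped too, because the transformer's default traversal visits class bodies. The model sees what each function is for without paying for how it is implemented.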
Context Window
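A context window is the maximum amount of text, measured in tokens, that a model can consider at once. Everything the agent sends on a turn, including the system prompt, conversation history, file contents and tool output, must fit inside it, and with most providers the model's reply counts against the same limit.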
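The budgeting arithmetic can be sketched as follows. The 200,000-token window and the per-part counts are hypothetical example numbers. The code also assumes the window covers both input and a reserved reply budget.

```python
# Hypothetical window size; real limits vary by model.
WINDOW_TOKENS = 200_000


def budget_left(system: int, history: int, reserved_output: int,
                window: int = WINDOW_TOKENS) -> int:
    """Tokens still free for new file contents or tool output this turn.

    Assumes the window covers both the input and the reserved reply.
    """
    return window - (system + history + reserved_output)


def fits(new_input: int, system: int, history: int, reserved_output: int,
         window: int = WINDOW_TOKENS) -> bool:
    """True if adding `new_input` tokens keeps the whole turn inside the window."""
    return new_input <= budget_left(system, history, reserved_output, window)


# 3k system prompt + 150k history + 8k reserved for the reply leaves 39k.
print(budget_left(3_000, 150_000, 8_000))  # 39000
```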
Embeddings
Embeddings are numeric vector representations of text (or code) that capture semantic meaning, so similar content maps to nearby vectors. This lets AI systems find and rank content by similarity, commonly measured with cosine similarity, rather than by keyword match.
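Here is a minimal sketch of ranking by cosine similarity. The three-dimensional vectors are invented for illustration. Real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings; the two config-related phrases point the same way.
vectors = {
    "parse JSON config": [0.9, 0.1, 0.0],
    "load settings file": [0.8, 0.2, 0.1],
    "render HTML table": [0.0, 0.3, 0.9],
}
query = [0.85, 0.15, 0.05]  # pretend embedding of "read configuration"

ranked = sorted(vectors, key=lambda k: cosine_similarity(query, vectors[k]), reverse=True)
print(ranked)
```

The two phrases about configuration share no keywords, yet both rank above the unrelated one because their vectors point in nearly the same direction.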
MCP (Model Context Protocol)
MCP is an open protocol that lets AI agents connect to external tools and data sources through MCP servers; it is a standard way to extend coding agents such as Claude Code.
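MCP messages are JSON-RPC 2.0. The sketch below only builds the request a client sends to invoke a server tool. It omits the transport (stdio or HTTP) and the initialization handshake. The tool name search_code and its query argument are hypothetical. A real server advertises its actual tools in response to a tools/list request.

```python
import json


def tools_call_request(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,  # lets the client match the server's response
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool and arguments, for illustration only.
print(tools_call_request(1, "search_code", {"query": "retry logic"}))
```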
Output Filtering
Output filtering compacts noisy command and tool output, such as logs, builds and test runs, down to its signal before it reaches the model, cutting tokens with little or no loss of useful information.
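A minimal line filter might look like the sketch below. It keeps lines that look like errors, failures or warnings and replaces the rest with a one-line count. The keyword pattern is an assumption chosen for illustration. Real filters are usually tuned per tool, for example to keep test summaries or a few lines of context around an error.

```python
import re

# Hypothetical "signal" pattern; tune per tool.
SIGNAL = re.compile(r"\b(error|fail(ed|ure)?|warning|exception)\b", re.IGNORECASE)


def filter_output(raw: str) -> str:
    """Keep signal lines and append a count of what was dropped."""
    lines = raw.splitlines()
    kept = [line for line in lines if SIGNAL.search(line)]
    dropped = len(lines) - len(kept)
    return "\n".join(kept + [f"[{dropped} of {len(lines)} lines omitted]"])


log = "\n".join([
    "Compiling a.c",
    "Compiling b.c",
    "b.c:12: error: undeclared identifier 'x'",
    "Linking",
    "Build FAILED",
])
print(filter_output(log))
```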
Prompt Caching
Prompt caching lets a provider reuse a previously processed, unchanging prompt prefix instead of billing it at the full input rate on every request, which cuts cost on long, repetitive sessions.
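The savings can be estimated with simple arithmetic. The price and multipliers below are assumptions, not any provider's published rates: $3 per million input tokens, a 1.25x surcharge to write the cache once, and 0.1x to read it afterwards. The sketch also assumes the cache stays warm for the whole session, and it ignores output tokens and the uncached part of each prompt.

```python
BASE = 3.00 / 1_000_000  # hypothetical $ per input token
WRITE_MULT = 1.25        # assumed surcharge for the first (cache-writing) request
READ_MULT = 0.10         # assumed discount for later cache reads


def prefix_cost(prefix_tokens: int, turns: int, cached: bool) -> float:
    """Dollar cost of re-sending the same prompt prefix on every turn."""
    if not cached:
        return prefix_tokens * turns * BASE
    first = prefix_tokens * BASE * WRITE_MULT
    rest = prefix_tokens * BASE * READ_MULT * (turns - 1)
    return first + rest


# A 50k-token prefix re-sent on each of 20 turns.
print(f"${prefix_cost(50_000, 20, cached=False):.4f}")  # $3.0000
print(f"${prefix_cost(50_000, 20, cached=True):.4f}")   # $0.4725
```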
RAG (Retrieval-Augmented Generation)
A pattern that fetches relevant documents at query time and injects them into the LLM prompt, letting the model answer from current, specific knowledge without retraining.
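The retrieve-then-inject loop can be sketched in a few lines. To keep the example self-contained, retrieval here scores documents by word overlap. That is a plain lexical stand-in for the embedding search a real RAG system would use. The document strings and the prompt template are hypothetical.

```python
import re


def words(text: str) -> set[str]:
    """Lowercase word set; punctuation is ignored."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank docs by word overlap with the query (stand-in for embedding similarity)."""
    q = words(query)
    return sorted(docs, key=lambda d: len(q & words(d)), reverse=True)[:k]


def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    """Inject the top-k retrieved docs into the prompt ahead of the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


docs = [
    "The deploy script lives in scripts/deploy.sh",
    "Unit tests run with pytest",
    "Staging deploys need the VPN",
]
print(build_prompt("How do I run the deploy script?", docs, k=1))
```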
Rate Limit
A provider-enforced ceiling on how many tokens or requests an API client can send per minute or day, which throttles or blocks calls that exceed the threshold.
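Providers enforce limits on their side, typically rejecting excess calls with HTTP 429. A client can avoid most rejections by tracking its own usage. Below is a minimal client-side sliding-window guard for a tokens-per-minute limit. The limit value is hypothetical, and a provider's own algorithm (often a token bucket) may count differently. Timestamps are passed in explicitly so the behavior is deterministic.

```python
from collections import deque


class TokenRateLimiter:
    """Client-side sliding-window guard for a tokens-per-window limit."""

    def __init__(self, tokens_per_window: int, window_s: float = 60.0):
        self.limit = tokens_per_window
        self.window = window_s
        self.events = deque()  # (timestamp, tokens), oldest first

    def _used(self, now: float) -> int:
        # Forget requests that have slid out of the window.
        while self.events and now - self.events[0][0] >= self.window:
            self.events.popleft()
        return sum(tokens for _, tokens in self.events)

    def try_acquire(self, tokens: int, now: float) -> bool:
        """Record the request and return True if it fits; else False (wait and retry)."""
        if self._used(now) + tokens > self.limit:
            return False
        self.events.append((now, tokens))
        return True


limiter = TokenRateLimiter(1_000)  # hypothetical 1,000 tokens per minute
print(limiter.try_acquire(600, now=0))   # True
print(limiter.try_acquire(500, now=10))  # False: 1,100 would exceed the limit
```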
Semantic Code Search
Semantic code search finds code by meaning rather than exact keywords, using embeddings — so an agent retrieves the relevant functions instead of reading whole files.
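A sketch of the indexing side is below. It splits a file into one chunk per function so the agent can retrieve individual functions, then ranks chunks against a query. The embed function is a clearly labeled stand-in that uses a bag of words, so this demo is lexical and will not match synonyms. A real system would call an embedding model at that point. The sample source is hypothetical.

```python
import ast
import math
import re
from collections import Counter


def function_chunks(source: str) -> dict[str, str]:
    """Split Python source into one chunk per top-level function."""
    tree = ast.parse(source)
    return {
        node.name: ast.get_source_segment(source, node)
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def embed(text: str) -> Counter:
    """Stand-in embedding (bag of words); a real system calls an embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def search(query: str, source: str, k: int = 1) -> list[str]:
    """Return the names of the k functions most similar to the query."""
    chunks = function_chunks(source)
    q = embed(query)
    return sorted(chunks, key=lambda n: cosine(q, embed(chunks[n])), reverse=True)[:k]


SAMPLE = '''
def retry_request(url, attempts):
    """Retry an HTTP request with exponential backoff."""
    for _ in range(attempts):
        pass


def parse_config(path):
    """Parse the YAML config file at path."""
    return path
'''

print(search("retry http request backoff", SAMPLE))  # ['retry_request']
```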
Token
A token is the unit of text an LLM processes: a word, a sub-word chunk, or a piece of punctuation or whitespace. AI coding agent usage is billed and rate-limited per token, with input and output tokens counted (and usually priced) separately.
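Per-token billing reduces to a weighted sum. The prices in this sketch are hypothetical: $3 per million input tokens and $15 per million output tokens.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one request, with input and output priced separately."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# A turn that sends 40k tokens of context and gets a 2k-token reply.
print(f"${request_cost(40_000, 2_000, 3.0, 15.0):.2f}")  # $0.15
```

With these assumed numbers, input accounts for $0.12 of the $0.15 even though output tokens cost five times as much each. That is why trimming context matters.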
Tokenizer
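A tokenizer splits text into the tokens a model processes, commonly using byte-pair encoding (BPE) or a related subword algorithm. Because each model family has its own tokenizer and vocabulary, the same text can produce different token counts across tokenizers.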
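The core of BPE training is repeatedly merging the most frequent adjacent pair of symbols. The toy sketch below works on characters of whole words. Production tokenizers typically operate on bytes and pre-split text first. The encode step also shows why token counts differ between tokenizers: the same word splits into a different number of tokens depending on which merges were learned.

```python
from collections import Counter
from typing import Optional

Symbols = tuple[str, ...]


def most_frequent_pair(words: list[Symbols]) -> Optional[tuple[str, str]]:
    """Count adjacent symbol pairs; ties go to the pair seen first."""
    pairs = Counter()
    for word in words:
        pairs.update(zip(word, word[1:]))
    return pairs.most_common(1)[0][0] if pairs else None


def merge(word: Symbols, pair: tuple[str, str]) -> Symbols:
    """Replace every occurrence of `pair` in `word` with the joined symbol."""
    out, i = [], 0
    while i < len(word):
        if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
            out.append(word[i] + word[i + 1])
            i += 2
        else:
            out.append(word[i])
            i += 1
    return tuple(out)


def train_bpe(corpus: list[str], num_merges: int):
    words = [tuple(w) for w in corpus]  # start from single characters
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        merges.append(pair)
        words = [merge(w, pair) for w in words]
    return merges, words


def encode(word: str, merges: list[tuple[str, str]]) -> Symbols:
    """Tokenize a new word by applying learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge(symbols, pair)
    return symbols


merges, _ = train_bpe(["low", "lower", "lowest"], num_merges=2)
print(merges)                        # [('l', 'o'), ('lo', 'w')]
print(encode("lowly", merges[:1]))   # ('lo', 'w', 'l', 'y'): 4 tokens
print(encode("lowly", merges))       # ('low', 'l', 'y'): 3 tokens
```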