Best Claude Code Token Optimizers (2026)

Which Claude Code token optimizer actually cuts your bill?

There are roughly two kinds of tools in this space. Focused tools nail one mechanism (output filtering, prompt compression, semantic search, skeleton compression) and do it well. Combined tools stack several mechanisms behind a single install. Which type wins depends on your bottleneck: if you have one specific pain (noisy logs, huge file reads), a sharp focused tool is clean and low-overhead; if token waste shows up everywhere in your sessions, installing four separate tools is overhead in itself.

This roundup covers the tools that developers actually deploy and use in 2026, ranked on four criteria, with honest limitations for every entry, including our own.

TL;DR, the ranked picks:

Tokenade — broadest coverage, one install, savings dashboard.
rtk — outstanding command-output filtering; the focused tool to beat.
LLMLingua — research-grade prompt compression; heavy to integrate.
claude-context — production-ready semantic search; requires external vector DB.
codegraph — deep symbol + call-graph indexing; strong published benchmarks.
tokensave — same mechanism as codegraph in Rust; no published numbers.
token-optimizer — comprehensive multi-layer compression; PolyForm license.
ccusage — measures usage, does not reduce it; include it anyway.

How we ranked these

Four criteria, weighted for solo-developer Claude Code use:

Savings — how much it actually trims, on what work, and how reproducibly those claims are backed.
Coverage — how many sources of token waste it addresses (a single mechanism vs many).
Setup — install time, external dependencies, ongoing config burden.
Quality-safety — does it cut tokens without dropping the signal the model needs to write correct code?

Tools are ranked on the combination. A focused tool that is trivially correct (can't corrupt your code) scores well on quality-safety even if coverage is low. A broad tool that requires a cloud vector database scores down on setup.

1. Tokenade — broadest coverage in one install

Tokenade ranks first because it stacks the mechanisms the other tools offer individually (output filtering, symbol indexing, semantic search, skeleton compression, MCP optimization) behind a single binary with no external dependencies and a zero-config install. The layers are:

Pre-execution command rewriting: terser flags are applied before the shell runs.
Per-format output filtering: covers git, cargo, docker, kubectl, Terraform, and more.
On-device semantic code search: surfaces only the relevant files, with no external vector database, no API key, and no model download.
Skeleton compression: for code, YAML, Markdown, and Terraform (−64% on file reads with every top-level declaration preserved).
MCP optimization: works with any connected server and keeps unused tools out of the context.
SERP and HTML compaction.
A tokenade dashboard showing live measured savings.

On a 14-repo benchmark (Rust/Python/Go/JS/TS, with adversarial impact tests), Tokenade's internal evaluation reports:

Session mix	Token savings
Balanced	88.3%
Build-heavy	86.7%
Navigation-heavy	83.5%
Web-heavy	68.6%

Quality score was 1.00 (no regression) across all adversarial tests on three of four mixes; navigation-heavy dropped to 0.94.

Install. Run tokenade install. It auto-detects Claude Code, Cursor, Codex, Copilot, Kilo Code, and Windsurf, and merges the MCP config without overwriting yours. The embedder is bundled, so there's no first-run download and no API key.

Pricing. Freemium: free up to ~10 million tokens/month (no card required), then Pro at €19.90/month incl. tax (France) or $19.90/month excl. tax (US).

Genuine limitation. If you have a single, known bottleneck (e.g. only noisy build logs), the relevant focused tool below is lighter. Tokenade is also not yet on crates.io or Homebrew, so you build from source. And like any compactor, it can in theory fold output that a command returned correctly; the escape hatch (tokenade raw <cmd>) exists for that case.

Best for: developers who want most of the savings without assembling and maintaining their own stack.
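To put the savings table above in concrete terms, the arithmetic can be sketched as follows. The percentages are copied from Tokenade's reported figures; the 1,000,000-token baseline is a hypothetical session size chosen for illustration, not a measurement.

```python
# Reported savings per session mix, copied from the table above
# (Tokenade's internal evaluation, not independently verified here).
REPORTED_SAVINGS = {
    "balanced": 0.883,
    "build-heavy": 0.867,
    "navigation-heavy": 0.835,
    "web-heavy": 0.686,
}


def remaining_tokens(baseline_tokens: int, savings: float) -> int:
    """Tokens still sent to the model after applying a fractional saving."""
    return round(baseline_tokens * (1.0 - savings))


def reduction_factor(savings: float) -> float:
    """How many times smaller the context becomes (0.5 savings -> 2.0x)."""
    return 1.0 / (1.0 - savings)


if __name__ == "__main__":
    baseline = 1_000_000  # hypothetical baseline, for illustration only
    for mix, savings in REPORTED_SAVINGS.items():
        left = remaining_tokens(baseline, savings)
        print(f"{mix:17s} {left:>8,d} tokens left "
              f"(~{reduction_factor(savings):.1f}x smaller)")
```

Read this way, the gap between the mixes is larger than the percentages suggest: on the same baseline, the web-heavy mix leaves 314,000 tokens against 117,000 for the balanced mix, roughly 2.7 times as much.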

2. rtk — the best focused tool for command-output filtering

rtk is the sharpest single-mechanism tool in this roundup: it wraps CLI commands and compacts their output before it reaches the model, covering 100+ commands with a Rust binary that adds under 10 ms of startup overhead.

According to the project's README (source: reports/rtk.md in Tokenade's internal analysis), rtk claims "60–90% token reduction" with per-command breakdowns: cargo test −90%, git operations −80%, and similar wins on npm, pytest, docker, aws, and terraform. It supports 13 AI coding tools including Claude Code, Cursor, Copilot, and Gemini CLI. The hook-based integration is transparent, so you don't need to prefix commands manually. An rtk gain subcommand tracks per-command savings in SQLite so you can verify the actual reduction on your own sessions.

A useful design detail: on filter failure, rtk falls back to raw output and saves the full log via a tee mechanism. That means a missed filter costs you nothing worse than you'd have had without rtk.

Install. brew install rtk, a one-line curl installer, or cargo install. No external services. One of the easiest setups in this list.

Genuine limitation. rtk operates at the shell boundary: it only touches commands, not file reads, MCP tool manifests, or prompt content. In a session dominated by file reads rather than noisy commands, it won't move the needle. The hook covers Bash commands but not Claude Code's built-in Read or Grep tools. And if you're already running Tokenade, rtk's output-filtering mechanism is subsumed.

Best for: sessions dominated by verbose build, test, and infrastructure output; developers who want a proven, focused tool with zero external dependencies.
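The fall-back-to-raw design described in this section can be sketched in a few lines. This is not rtk's code (rtk is a Rust binary); the function names, the log file name, and the test-output format below are all assumptions made for illustration.

```python
import re
import tempfile
from pathlib import Path
from typing import Callable


def summarize_test_output(raw: str) -> str:
    """Hypothetical filter: keep failure lines plus the final summary line."""
    lines = raw.splitlines()
    summary = [ln for ln in lines if ln.startswith("test result:")]
    if not summary:
        # Unrecognized format: signal failure rather than guess.
        raise ValueError("no summary line found")
    failures = [
        ln for ln in lines
        if re.search(r"\b(FAILED|error|panicked)\b", ln)
        and not ln.startswith("test result:")
    ]
    return "\n".join(failures + summary)


def compact(raw: str, output_filter: Callable[[str], str], log_dir: Path) -> str:
    """Return filtered output; if the filter raises, return the raw output."""
    # Tee: always write the complete log to disk so nothing is lost.
    log_dir.mkdir(parents=True, exist_ok=True)
    (log_dir / "last_command.log").write_text(raw, encoding="utf-8")
    try:
        return output_filter(raw)
    except Exception:
        return raw  # worst case equals running without the filter


if __name__ == "__main__":
    raw = ("test a ... ok\n"
           "test b ... FAILED\n"
           "test result: FAILED. 1 passed; 1 failed")
    with tempfile.TemporaryDirectory() as tmp:
        print(compact(raw, summarize_test_output, Path(tmp)))
```

The point of the design is that the filter is allowed to fail loudly: an unrecognized format raises, and the caller hands the model the unfiltered text instead of a wrong summary.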

3. LLMLingua — research-grade prompt compression

LLMLingua is the strongest choice for programmatic prompt compression, with peer-reviewed savings claims and LangChain/LlamaIndex integrations, but it requires a heavy ML dependency and is closer to a library than a turnkey Claude Code plugin.

LLMLingua comes from Microsoft Research (EMNLP 2023, ACL 2024). It scores each token with a small language model (GPT2-small or LLaMA-7B for the original variant; a BERT-level encoder for the faster LLMLingua-2), then drops tokens below a threshold. On research benchmarks (RAG, meeting transcripts, chain-of-thought) it claims up to 20× compression with minimal performance loss. LLMLingua-2 is 3–6× faster than the first variant at comparable quality. The companion LongLLMLingua variant is specifically tuned for "lost in the middle" RAG quality, recovering up to 21.4% on downstream metrics while using one quarter of the tokens. LangChain retriever and LlamaIndex node postprocessor integrations exist, which means it slots naturally into Python agent pipelines.

Genuine limitation. The dependency weight is real: you load a scoring model (GPT2-small, LLaMA-7B, or a BERT-level encoder for LLMLingua-2) just to compress the prompt. That means hundreds of megabytes to several gigabytes of model weights to install, a Python environment, and meaningful latency per call. There is no packaged Claude Code plugin; integration means writing glue code. The research benchmarks are on document/RAG tasks, not necessarily on the kind of code-navigation output Claude Code produces. For purely CLI-level usage without Python, this tool doesn't apply.

Best for: builders running Python-based agent pipelines who want principled, reproducible prompt compression and don't mind the integration weight.
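The score-then-drop idea can be illustrated with a toy sketch. This is not LLMLingua's API or algorithm: the scorer below is a stand-in that uses in-prompt word frequency, where LLMLingua uses a language model's per-token probabilities, and the threshold is expressed as a keep ratio (a budget) rather than a fixed score cutoff. Function names are hypothetical.

```python
import math
from collections import Counter


def score_tokens(tokens: list[str]) -> list[float]:
    """Stand-in scorer: self-information from in-prompt word frequency.

    Rare words score high, repeated words score low. A real system would
    take these scores from a language model instead.
    """
    counts = Counter(t.lower() for t in tokens)
    total = len(tokens)
    return [-math.log(counts[t.lower()] / total) for t in tokens]


def compress(prompt: str, keep_ratio: float) -> str:
    """Keep the highest-scoring fraction of tokens, in their original order."""
    tokens = prompt.split()
    if not tokens:
        return ""
    scores = score_tokens(tokens)
    budget = max(1, int(len(tokens) * keep_ratio))
    # Rank positions by score (ties broken by position), then restore order.
    ranked = sorted(range(len(tokens)), key=lambda i: (-scores[i], i))
    keep = sorted(ranked[:budget])
    return " ".join(tokens[i] for i in keep)


if __name__ == "__main__":
    print(compress("the cat and the dog and the bird the fox", 0.5))
```

Even this crude scorer shows the trade-off the section describes: the output is shorter and keeps the content words, but its quality depends entirely on how good the scores are, which is why the real tool pays for a model.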

4. claude-context — production-ready semantic search with a managed hosting option

claude-context is the most complete semantic code search implementation in the field, combining tree-sitter AST chunking, hybrid BM25 + dense vector search, and incremental Merkle-tree indexing, with a production-ready managed option via Zilliz Cloud.

The tool (by Zilliz, the team behind the Milvus vector database) reports "~40% token reduction under the condition of equivalent retrieval quality" in its own evaluation. It chunks code at AST boundaries (never splits a long function between chunks), supports 13+ languages, and allows multiple embedding providers: OpenAI, VoyageAI, Ollama, Gemini. Its incremental indexing only re-indexes changed files. There is both an MCP server package (npx @zilliz/claude-context-mcp@latest) and a VSCode extension.

The retrieval mechanism directly addresses the biggest single source of token waste in navigation-heavy sessions: agents reading entire files to find one function. Semantic code search replaces that with a ranked retrieval over chunks, so the model sees the three relevant blocks instead of thirty files.

Genuine limitation. Unlike the Rust tools in this list, claude-context requires an external vector database: either a self-hosted Milvus instance or a Zilliz Cloud account. That's a real dependency: setup overhead, an external service to keep running, and (for Zilliz Cloud) a second bill. An embedding API key is also required unless you run Ollama locally. The tool focuses exclusively on the retrieval mechanism; it does nothing for command output, MCP manifests, or file structure.

Best for: teams with existing Milvus/Zilliz infrastructure, or developers whose primary bottleneck is file-read-heavy navigation on a large codebase.
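The "only re-index changed files" behavior can be sketched with content hashes. This is a flattened simplification, assuming one root hash over per-file digests; a full Merkle tree adds intermediate nodes (for example per directory) so unchanged subtrees can be skipped. It is not claude-context's implementation, and the function names are hypothetical.

```python
import hashlib


def fingerprint(files: dict[str, str]) -> dict[str, str]:
    """Map each path to a SHA-256 digest of its content."""
    return {
        path: hashlib.sha256(text.encode("utf-8")).hexdigest()
        for path, text in files.items()
    }


def root_hash(fingerprints: dict[str, str]) -> str:
    """One digest over all (path, digest) pairs; equal roots mean no change."""
    h = hashlib.sha256()
    for path in sorted(fingerprints):
        h.update(path.encode("utf-8"))
        h.update(b"\0")
        h.update(fingerprints[path].encode("ascii"))
    return h.hexdigest()


def plan_reindex(old: dict[str, str],
                 new: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (paths to re-embed, paths to drop from the index)."""
    if root_hash(old) == root_hash(new):
        return [], []  # fast path: nothing changed, skip per-file comparison
    changed = sorted(p for p, digest in new.items() if old.get(p) != digest)
    removed = sorted(p for p in old if p not in new)
    return changed, removed


if __name__ == "__main__":
    old = fingerprint({"a.py": "x = 1", "b.py": "y = 2", "c.py": "z = 3"})
    new = fingerprint({"a.py": "x = 1", "b.py": "y = 20", "d.py": "w = 4"})
    print(plan_reindex(old, new))
```

The practical benefit is that embedding cost scales with the size of the edit rather than the size of the repository, which matters when each re-embedded chunk is a paid API call.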

5. codegraph — deep call-graph indexing with the strongest published benchmark

codegraph is the best-benchmarked pure navigation tool: its published results across 7 repositories show −35% cost, −57% tokens, and −71% tool calls at the median, with framework-aware routing across 14 frameworks.

codegraph builds a SQLite + FTS5 knowledge graph from tree-sitter extraction, covering 20+ languages. Its framework detection spans Django, Flask, Express, NestJS, Laravel, Rails, Spring, Axum, and others, including rare ones (Drupal, Vapor). A bundled Node.js runtime means zero installation complexity. A debounced file watcher keeps the index current as you edit, with per-file staleness banners so the agent knows which files are pending. The interactive installer auto-detects 8 agent environments.

Among the pure-navigation tools, codegraph has the clearest, most verifiable benchmark methodology (4 runs per repo across 7 repositories), which is why it ranks above tokensave despite similar architecture.

Genuine limitation. The mechanism is powerful for navigation but covers only one dimension of token waste. It won't help with noisy command output, fat MCP manifests, or file reads that aren't navigational. The TypeScript/Node.js runtime adds a dependency relative to compiled Rust tools. Token savings numbers are real but derive from 4 runs per repo; a larger battery would strengthen the claim.

Best for: large multi-language repositories where the agent's bottleneck is "I don't know where X is defined, so I'll read 10 files to find out".
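A minimal sketch of what a symbol and call-graph index answers, assuming Python's standard ast module as a stand-in for tree-sitter and plain dictionaries as a stand-in for the SQLite store. The function name and the sample file are hypothetical, and this is not codegraph's implementation.

```python
import ast
from collections import defaultdict


def index_source(path: str, source: str,
                 definitions: dict[str, tuple[str, int]],
                 callers: dict[str, set[str]]) -> None:
    """Record where each function is defined and who calls which name."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            definitions[node.name] = (path, node.lineno)
            for inner in ast.walk(node):
                # Only simple calls such as foo(); attribute calls like
                # obj.method() are ignored in this sketch.
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    callers[inner.func.id].add(node.name)


if __name__ == "__main__":
    definitions: dict[str, tuple[str, int]] = {}
    callers: dict[str, set[str]] = defaultdict(set)
    sample = (
        "def tax(x):\n"
        "    return x * 0.2\n"
        "\n"
        "def total(x):\n"
        "    return x + tax(x)\n"
    )
    index_source("billing.py", sample, definitions, callers)
    print(definitions["tax"])   # where is tax defined?
    print(callers["tax"])       # who calls tax?
```

With an index like this, "where is X defined" becomes one lookup that returns a path and line number, instead of the agent opening files until it finds the definition.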

6. tokensave — same mechanism as codegraph, compiled in Rust

tokensave delivers the same call-graph indexing architecture as codegraph in a compiled Rust binary (34 languages, multi-branch indexing, subprocess isolation) but publishes no benchmark numbers.

The technical depth is impressive: 34 languages via feature-gated tiers, libSQL graph DB, multi-branch indexing (diff/search across branches without checkout), subprocess isolation so a single tree-sitter parser crash doesn't kill the service, atomic edit primitives with AST rewriting, and 48 MCP tools. Framework routing matches codegraph's 14-framework coverage. The Rust foundation gives it a fast startup and low memory footprint relative to the Node.js tools.

Genuine limitation. No published benchmark. "Fewer tokens · Fewer tool calls · 100% local" is the whole claim, which is honest but doesn't let you predict the savings on your codebase before installing. As with codegraph, the mechanism is navigation-only.

Best for: developers who prefer a compiled binary and the broader language/branch support, and are comfortable evaluating the saving themselves.
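Subprocess isolation, as a general pattern, means running the risky parse step in a child process so a crash there returns an error to the service instead of terminating it. The sketch below shows the pattern only, using a child Python interpreter as the worker; tokensave is written in Rust and its actual mechanism is not documented here, so every name is an assumption.

```python
import subprocess
import sys
from typing import Optional


def run_isolated(worker_code: str, timeout_s: float = 10.0) -> Optional[str]:
    """Run a parse step in a child interpreter.

    Returns the child's stdout, or None if it crashed or timed out.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", worker_code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None
    # Any non-zero exit code is treated as a parser crash.
    return result.stdout if result.returncode == 0 else None


if __name__ == "__main__":
    good = "import ast; print(len(ast.parse('x = 1').body))"
    bad = "import sys; sys.exit(3)"  # stands in for a parser crash
    for job in (good, bad):
        outcome = run_isolated(job)
        print("skipped file" if outcome is None else outcome.strip())
```

The cost is process start-up per job, so a production indexer would more likely keep a pool of long-lived workers and restart only the one that died.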

7. token-optimizer — comprehensive multi-layer compression with a quality dashboard

token-optimizer (by alexgreensh) is the most feature-complete Python-based solution: it combines AST structure maps, session continuity checkpoints, 16 bash output handlers, a quality score, and a per-turn HTML dashboard. The PolyForm Noncommercial license, however, is a meaningful restriction for commercial use.

The reported savings are striking: 180,000-token files compressed to roughly 250 tokens via AST-based structure maps (Python/TypeScript, 95–99% claimed compression). Over 30 days and 942 sessions, the developer reports "$1,500–$2,500/month" savings, a self-reported figure without public reproducibility, but the mechanism is coherent. The 7-signal quality score (context fill, stale reads, bloated results, compaction depth, duplicates, decision density, agent efficiency) is a thoughtful anti-regression guard. A Coach mode runs 11 waste detectors to audit your CLAUDE.md and session patterns.

Genuine limitation. The PolyForm Noncommercial license means you can't use it in a commercial product without a separate agreement. Setup is more complex than the Rust binaries above: Python 3.9+, TypeScript adapters for non-Claude-Code platforms, an HTML dashboard to configure. The savings numbers are self-reported. This is a powerful tool for personal or research use, but the license matters.

Best for: hobbyists and researchers on Claude Code who want the most comprehensive instrumentation and don't have commercial restrictions.
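An AST-based structure map replaces a file's full text with its outline. The following is a minimal sketch of that idea for Python sources using the standard ast module; it is not token-optimizer's implementation, the function name is hypothetical, and it lists positional parameter names only.

```python
import ast


def _signature(fn: ast.AST) -> str:
    """Render 'def name(params)' using positional parameter names only."""
    params = ", ".join(a.arg for a in fn.args.args)
    return f"def {fn.name}({params})"


def structure_map(source: str) -> str:
    """One line per top-level class, method, and function; bodies dropped."""
    func_types = (ast.FunctionDef, ast.AsyncFunctionDef)
    lines = []
    for node in ast.parse(source).body:
        if isinstance(node, func_types):
            lines.append(_signature(node))
        elif isinstance(node, ast.ClassDef):
            lines.append(f"class {node.name}")
            for item in node.body:
                if isinstance(item, func_types):
                    lines.append("    " + _signature(item))
    return "\n".join(lines)


if __name__ == "__main__":
    sample = (
        "import os\n"
        "\n"
        "class Cart:\n"
        "    def total(self, rate):\n"
        "        return sum(self.items) * (1 + rate)\n"
        "\n"
        "def main():\n"
        "    print(Cart().total(0.2))\n"
    )
    print(structure_map(sample))
```

The compression ratio grows with body length, since the outline stays the same size however long each function is. That is also the limit of the technique: the model sees that a function exists, not what it does, so it still needs a full read before editing one.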

8. ccusage — measures usage; doesn't reduce it

ccusage is not a token optimizer, but it belongs in this list because you should run it first: it tells you exactly where your tokens are going before you decide which optimizer to reach for.

ccusage reads the JSONL transcripts that Claude Code writes locally and produces daily, weekly, monthly, and session-level reports with per-model breakdowns and cache tracking (separate columns for cache-creation vs cache-read cost). It supports 15 agent environments and distributes as a platform-specific binary (bunx ccusage or npx ccusage@latest). With about 15,000 GitHub stars, it is the de facto standard in the measurement category; a half-dozen other trackers (codeburn, Claude-Code-Usage-Monitor, tokscale) are essentially UIs built on top of it.

The correct workflow: run ccusage, then work out whether your expensive sessions are build-heavy (rtk or Tokenade's output filter wins), navigation-heavy (semantic search or codegraph), or a mix (Tokenade). Don't optimize blind.

Genuine limitation. It is purely a meter. It reports what you spent; it does not reduce that spend by one token. Pair it with one of the tools above.

Best for: everyone. Run this before deciding which optimizer to install.
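The kind of aggregation a usage meter performs over JSONL transcripts can be sketched as below. The record layout is an assumption: it supposes each relevant line carries a message object with a model name and a usage object whose field names follow the Anthropic API's usage block. This is not ccusage's code, and real transcripts may differ.

```python
import json
from collections import defaultdict
from typing import Iterable

# Field names follow the Anthropic API usage object; their presence in
# local transcript lines is an assumption of this sketch.
FIELDS = (
    "input_tokens",
    "output_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
)


def summarize(jsonl_lines: Iterable[str]) -> dict[str, dict[str, int]]:
    """Sum token counts per model across lines that carry a usage record."""
    totals = defaultdict(lambda: dict.fromkeys(FIELDS, 0))
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or partially written lines
        message = entry.get("message") if isinstance(entry, dict) else None
        if not isinstance(message, dict):
            continue
        usage = message.get("usage")
        if not isinstance(usage, dict):
            continue
        model = message.get("model", "unknown")
        for field in FIELDS:
            totals[model][field] += int(usage.get(field) or 0)
    return dict(totals)


if __name__ == "__main__":
    sample = [
        '{"message": {"model": "m1", "usage": {"input_tokens": 10, '
        '"output_tokens": 5, "cache_read_input_tokens": 100}}}',
        '{"type": "summary"}',
        '{"message": {"model": "m1", "usage": {"input_tokens": 1, "output_tokens": 2}}}',
    ]
    print(summarize(sample))
```

Keeping cache-creation and cache-read counts separate matters because they are billed at different rates, so a single "total tokens" figure hides where the cost actually sits.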

At a glance

Tool	Mechanism(s)	Coverage	Setup	License
Tokenade	Output filter + semantic search + skeleton + lazy MCP + web compact	Broad (13 layers)	One binary, build from source	Freemium
rtk	Command output filtering	Focused	`brew install`	OSS
LLMLingua	Learned prompt compression	Focused	Python lib + LLM dependency	MIT
claude-context	Hybrid BM25 + vector search	Focused	External vector DB required	Apache-2.0
codegraph	Symbol + call-graph index	Focused	Node.js, bundled runtime	OSS
tokensave	Symbol + call-graph index	Focused	Rust binary, build from source	OSS
token-optimizer	Structure map + session compression + dashboard	Broad	Python setup	PolyForm NC
ccusage	Usage measurement	Diagnostic	`bunx ccusage`	MIT

How to choose

If your transcripts are full of build logs and command noise: rtk is the simplest, most battle-tested fix. If you want output filtering plus everything else, Tokenade covers it.

If your agent reads too many files to find what it needs: claude-context (managed hosting available) or codegraph (best published benchmark) are the right focused tools. Tokenade's built-in semantic search runs fully on-device if you'd rather not add a separate service.

If you want to compress prompts programmatically inside a Python pipeline: LLMLingua is the only peer-reviewed option. Accept the ML dependency overhead.

If you want broad coverage without assembling a stack: Tokenade installs as one binary, applies output filtering, semantic search, skeleton compression, and lazy MCP loading automatically, and shows you the saving on every session. The freemium tier (free up to ~10M tokens/month, no card) lets you verify the impact before committing.

Start with measurement: run ccusage to understand your session profile, then match the tool to the bottleneck. The full breakdown of which lever to apply to which waste pattern is in How to reduce AI coding agent token usage.

Methodology note

Tool facts come from each project's README and the Tokenade internal competitor analysis (reports/ directory) as of 2026-06-02. Savings numbers are as-claimed by each project; where methodology was available (Tokenade's 14-repo bench, codegraph's 7-repo median, claude-context's controlled evaluation, LLMLingua's peer-reviewed papers), that is noted. Self-reported numbers without public reproducibility are labelled as such. No tool was paid to appear in this list; Tokenade is our own product and is ranked on the same criteria as the rest.