Cache-read tokens: the baseline trap.
25 April 2026
Cache-read tokens are billed separately from input tokens on Anthropic — and reported separately on the invoice, the API response, and most third-party dashboards. If you set a pre-engagement baseline by summing only the input-token line, you will appear to "save" 30–60% the moment caching turns on. Those savings are not real. They are a column shift. This is the single most common audit mistake we see.
How the line items lay out
An Anthropic invoice (and the `usage` block on each API response) breaks token consumption into four kinds:
| Counter | Billing meaning |
|---|---|
| input_tokens | uncached input — full price |
| cache_creation_input_tokens | cache writes — 1.25× input price (5-min) or 2× (1-hour) |
| cache_read_input_tokens | cache hits — ~0.1× input price |
| output_tokens | generated output — full output price |
Before caching is enabled, `cache_read_input_tokens` is zero and every input token is in the `input_tokens` bucket. After caching is enabled, the same prompt traffic is split: the static prefix shows up under `cache_read_input_tokens` (or `cache_creation_input_tokens` on first hit), and only the dynamic suffix is in `input_tokens`.
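As an illustration (counter names as they appear in the `usage` block; the token counts are hypothetical), the same 8,500-token prompt reports very differently before and after a warm cache hit:

```python
# Illustrative usage blocks for one request with an 8,000-token static
# prefix and a 500-token dynamic suffix.
usage_before = {               # caching off: everything is full-price input
    "input_tokens": 8500,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0,
    "output_tokens": 300,
}
usage_after = {                # caching on, warm hit
    "input_tokens": 500,       # only the dynamic suffix
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 8000,   # the static prefix, at ~0.1x price
    "output_tokens": 300,
}

def total_prompt_tokens(u):
    """The model processed the same prompt either way; only billing shifts."""
    return (u["input_tokens"]
            + u["cache_creation_input_tokens"]
            + u["cache_read_input_tokens"])

assert total_prompt_tokens(usage_before) == total_prompt_tokens(usage_after) == 8500
```

The assertion is the whole point: total prompt tokens are identical, so any drop in `input_tokens` alone is a column shift, not a saving.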
The trap
Pre-engagement, an analyst exports usage and writes:
baseline_spend = input_tokens × input_price + output_tokens × output_price
That formula is correct only when caching is off. The moment caching is enabled — by anyone, including a default in the SDK or a colleague's experiment — `input_tokens` drops, the formula returns a smaller number, and the dashboard reports "savings." The actual cash going to Anthropic may be unchanged or even higher (cache writes carry a premium).
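A minimal sketch of the wrong and right formulas, using Sonnet 3.5 list prices ($3/M input, $15/M output) and the 5-minute cache-write multiplier; the helper names are mine:

```python
INPUT_P, OUTPUT_P = 3.00, 15.00          # dollars per million tokens
CACHE_WRITE_MULT, CACHE_READ_MULT = 1.25, 0.10

def naive_cost(u):
    """The analyst's formula. Correct only while caching is off."""
    return (u["input_tokens"] * INPUT_P
            + u["output_tokens"] * OUTPUT_P) / 1e6

def true_cost(u):
    """Every token bucket times its own price."""
    return (u["input_tokens"] * INPUT_P
            + u["cache_creation_input_tokens"] * INPUT_P * CACHE_WRITE_MULT
            + u["cache_read_input_tokens"] * INPUT_P * CACHE_READ_MULT
            + u["output_tokens"] * OUTPUT_P) / 1e6
```

The two functions agree exactly while the cache counters are zero, which is why the bug stays invisible until caching turns on.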
Concrete example
Endpoint: a support bot, 10,000 requests/day, 8,000-token system+tools prefix, ~500-token user message. Output ~300 tokens. Sonnet 3.5 list price: $3 / M input, $15 / M output.
Day 1 — caching off:
| Uncached line item | Cost |
|---|---|
| input_tokens | 10,000 × 8,500 = 85,000,000 → $255.00 |
| output_tokens | 10,000 × 300 = 3,000,000 → $45.00 |
| total | $300.00 |
Day 2 — caching on, 95% cache-read hit on the 8,000-token prefix:
| Cached line item | Cost |
|---|---|
| input_tokens (dynamic suffix only) | 10,000 × 500 = 5,000,000 → $15.00 |
| cache_creation_input_tokens (cold prefix writes) | 500 × 8,000 = 4,000,000 × 1.25 → $15.00 |
| cache_read_input_tokens (warm prefix hits) | 9,500 × 8,000 = 76,000,000 × 0.1 → $22.80 |
| output_tokens | 3,000,000 → $45.00 |
| true total | $97.80 |
Note that the cold prefix lands only in `cache_creation_input_tokens`: cache writes are billed in full at 1.25×, not as regular input plus a surcharge, so `input_tokens` holds nothing but the dynamic suffixes. The real saving is ~67%. But if the analyst's dashboard only tracks `input_tokens × input_price + output_tokens × output_price`, they see $60.00 and report an 80% saving. They are double-counting the discount because they never added the cache lines back in. When the CFO asks why the invoice doesn't match the dashboard, the engagement loses credibility.
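The whole worked example fits in a few lines of arithmetic (a sketch; every constant is a figure from the tables above):

```python
# 10,000 requests/day; 8,000-token prefix; 500-token suffix; 300-token
# output; 95% warm-hit rate on the prefix; Sonnet 3.5 list prices.
REQS, PREFIX, SUFFIX, OUT = 10_000, 8_000, 500, 300
HIT, IN_P, OUT_P = 0.95, 3.00, 15.00     # prices in $/M tokens

# Day 1, caching off: the entire prompt lands in input_tokens.
day1 = (REQS * (PREFIX + SUFFIX) * IN_P + REQS * OUT * OUT_P) / 1e6

# Day 2, caching on: the prefix splits into warm reads and cold writes.
inp   = REQS * SUFFIX                 # suffixes, always full price
write = REQS * (1 - HIT) * PREFIX     # cold prefixes, 1.25x
read  = REQS * HIT * PREFIX           # warm prefixes, ~0.1x
day2 = (inp * IN_P + write * IN_P * 1.25 + read * IN_P * 0.10
        + REQS * OUT * OUT_P) / 1e6

print(f"day 1: ${day1:.2f}  day 2: ${day2:.2f}  saving: {1 - day2/day1:.0%}")
```

Running it prints `day 1: $300.00  day 2: $97.80  saving: 67%`, matching the invoice rather than the broken dashboard.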
How to reconcile properly
- Pull from the provider invoice or Cost Console, not from a derived table. Anthropic's console shows all four line items.
- Sum every token bucket × its own price. Not just input + output.
- Pin the baseline to a fixed time window before caching was enabled anywhere in the codebase. Audit the SDK initialization across all services — `cache_control` blocks may be live in code you didn't write.
- Recompute the baseline cost-per-task (cost ÷ business-meaningful unit: per ticket, per query, per generated row). Cost-per-task is invariant to traffic and the only number worth reporting up.
- Snapshot the four-bucket breakdown weekly during the engagement. Movement between buckets is the audit trail for why the bill changed.
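The checklist above can be sketched as a small helper; `WeeklySnapshot` and its fields are hypothetical names, and the default prices are the Sonnet 3.5 figures from the example:

```python
from dataclasses import dataclass

@dataclass
class WeeklySnapshot:
    """One row of the four-bucket audit trail. Token counts come from the
    invoice or Cost Console; the task count from your own business metrics."""
    input_tokens: int
    cache_creation_input_tokens: int
    cache_read_input_tokens: int
    output_tokens: int
    tasks: int                          # tickets, queries, generated rows

    def cost(self, in_p=3.00, out_p=15.00, write_mult=1.25, read_mult=0.10):
        """Sum every bucket times its own price, in dollars."""
        return (self.input_tokens * in_p
                + self.cache_creation_input_tokens * in_p * write_mult
                + self.cache_read_input_tokens * in_p * read_mult
                + self.output_tokens * out_p) / 1e6

    def cost_per_task(self, **prices):
        """The traffic-invariant number worth reporting up."""
        return self.cost(**prices) / self.tasks
```

Diffing the four token fields between consecutive snapshots shows exactly which bucket moved, which is the audit trail the last bullet asks for.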
OpenAI's version is subtler
OpenAI also reports cached input separately (`prompt_tokens_details.cached_tokens` on the API; a "cached input" line on the bill). The discount is ~50%, not ~90%, and the trap is smaller — but the same arithmetic applies. If your dashboard sums `prompt_tokens × input_price`, you are over-counting cost when caching is on, because `prompt_tokens` includes the cached portion at full price. Subtract `cached_tokens × input_price × 0.5` to get the real number.
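That correction can be sketched against an OpenAI chat-completions `usage` dict; the 0.5 discount is the rate quoted above, so treat it as a parameter and check the current price sheet for your model:

```python
def openai_true_cost(usage, input_price, output_price, cached_discount=0.5):
    """True request cost in dollars; prices are $/M tokens.

    `prompt_tokens` already includes the cached portion at full price,
    so subtract the discount on `cached_tokens` rather than adding a
    separate cache line as on Anthropic invoices.
    """
    cached = usage.get("prompt_tokens_details", {}).get("cached_tokens", 0)
    return (usage["prompt_tokens"] * input_price
            - cached * input_price * cached_discount
            + usage["completion_tokens"] * output_price) / 1e6
```

Note the sign difference from the Anthropic case: there you add cache lines back in, here you subtract a discount out, but both reconcile to the same rule of pricing every token at the rate it was actually billed.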