
LLM cost optimization.

A practical guide · 25 April 2026

If your LLM bill is north of $20K/month, 30–60% of it is almost certainly waste. This guide is the order in which we look for that waste during a Tidal/LLM CFO engagement, and the techniques we apply once we find it.

1. Get the baseline right

Before touching anything, reconcile to raw provider invoices — not gateway logs, not internal dashboards. Gateway logs systematically under-count because they miss retries, fallback calls, and model-side reasoning tokens. The invoice is the ground truth.

Track these axes separately: provider, model, token type (input / output / cache-read / cache-write), endpoint or feature, environment (prod / staging / batch). Conflating cache-read tokens with input tokens is the single most common baseline error we see.
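A minimal sketch of what a correctly separated baseline looks like. The price table and record shape here are illustrative assumptions — real per-million-token prices come from your provider's price sheet and contract — but the point stands: cache-read tokens get their own price line, never folded into input.

```python
from collections import defaultdict

# Hypothetical $ per 1M tokens; substitute your contracted rates.
PRICE_PER_MTOK = {
    ("claude-sonnet", "input"): 3.00,
    ("claude-sonnet", "output"): 15.00,
    ("claude-sonnet", "cache_read"): 0.30,   # priced ~10x below input
    ("claude-sonnet", "cache_write"): 3.75,
}

def baseline(usage_records):
    """Aggregate spend along each axis separately, keeping
    cache-read tokens distinct from ordinary input tokens."""
    spend = defaultdict(float)
    for r in usage_records:
        key = (r["provider"], r["model"], r["token_type"],
               r["endpoint"], r["environment"])
        price = PRICE_PER_MTOK[(r["model"], r["token_type"])]
        spend[key] += r["tokens"] / 1_000_000 * price
    return dict(spend)
```

Pricing 10M cache-read tokens at the input rate instead of the cache-read rate would overstate that line item by roughly 10x — which is exactly the baseline error described above.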

2. Find the spend concentration

Almost every engagement looks the same: ~80% of spend on 3–5 endpoints. Optimizing the long tail is a waste of operator time. Sort by spend, pick the top five, and ignore everything else for the first six weeks.
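The selection step is mechanical enough to sketch: sort endpoints by spend and take the smallest prefix that covers the target share. The 80% threshold below mirrors the concentration figure above; the function name and data shape are ours, not from any particular tool.

```python
def top_endpoints(spend_by_endpoint, coverage=0.80):
    """Return the smallest set of endpoints, largest spend first,
    that together cover `coverage` of total spend."""
    total = sum(spend_by_endpoint.values())
    picked, running = [], 0.0
    for ep, cost in sorted(spend_by_endpoint.items(),
                           key=lambda kv: kv[1], reverse=True):
        picked.append(ep)
        running += cost
        if running / total >= coverage:
            break
    return picked
```

Everything outside the returned set is the long tail: leave it alone for the first six weeks.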

3. Apply the playbook in this order

  1. Prompt caching — provider-native, lowest implementation cost, 50–90% off cache-read tokens. Always do this first.
  2. Model routing — route easy queries to a smaller/cheaper model, hard queries to the frontier model. Often 30–50% reduction with no quality regression.
  3. Semantic caching — embed the request, look up similar prior responses. Effective on RAG, classification, and high-volume customer-support flows. 20–40% reduction on cache-friendly endpoints.
  4. Prompt compression — strip redundancy, move static context to system prompts (where it caches), tighten few-shot examples. 10–25% reduction on token-heavy endpoints.
  5. Batch routing — non-realtime workloads (eval pipelines, content generation, data enrichment) move to the Batch API. Flat 50% off on eligible volume.
  6. Provider arbitrage — equivalent-quality models priced lower on Bedrock / Vertex / OpenRouter / Together. 30–60% reduction when quality A/Bs pass.
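As one concrete instance of step 2, model routing can start as nothing more than a heuristic gate. The model names, prices, and hardness markers below are invented for illustration — in practice the classifier is tuned per endpoint and validated in the A/B step described next — but the shape of the decision is this simple:

```python
# Illustrative only: model names and per-1M-token prices are assumptions.
CHEAP, FRONTIER = "small-model", "frontier-model"
PRICE = {CHEAP: 0.15, FRONTIER: 3.00}

# Hypothetical markers of queries the small model tends to fumble.
HARD_MARKERS = ("analyze", "prove", "multi-step", "legal")

def route(prompt: str) -> str:
    """Send short, marker-free prompts to the cheap model;
    long or hard-looking prompts to the frontier model."""
    hard = len(prompt) > 2000 or any(m in prompt.lower() for m in HARD_MARKERS)
    return FRONTIER if hard else CHEAP
```

With a 20x price gap between the two tiers, routing even half the traffic to the cheap model cuts that endpoint's bill dramatically — which is where the 30–50% figure comes from when the easy/hard split is favorable.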

4. A/B before you ship

Every optimization gets a 7-day A/B against the production baseline. Quality SLOs are agreed up front. Regressions auto-rollback. "Savings" that come with a quality regression aren't savings — they're a refund waiting to happen.
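The ship/rollback gate reduces to two checks. The SLO band below (max 0.5-point quality drop) is a placeholder — the real band is whatever was agreed up front — and the function is our sketch, not a specific tool's API:

```python
def ship_decision(baseline_quality, candidate_quality,
                  baseline_cost, candidate_cost,
                  max_quality_drop=0.005):
    """Ship only if cost falls AND quality stays inside the
    agreed SLO band; anything else auto-rolls back."""
    if candidate_quality < baseline_quality - max_quality_drop:
        return "rollback"   # quality regression: not real savings
    if candidate_cost >= baseline_cost:
        return "rollback"   # no cost win either
    return "ship"
```

Note the order: the quality check comes first, so a cheap-but-worse variant never ships no matter how large the apparent saving.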

5. Reconcile monthly

Delivered savings = locked baseline − reconciled actuals against raw provider invoices, normalized for traffic growth. Anything else is marketing.
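The formula above, made concrete. Here traffic is normalized by request count; other normalizers (tokens, sessions) work the same way — the choice is an assumption of this sketch:

```python
def delivered_savings(baseline_cost, baseline_requests,
                      actual_cost, actual_requests):
    """Scale the locked baseline to current traffic, then subtract
    reconciled actuals from raw provider invoices."""
    scaled_baseline = baseline_cost * (actual_requests / baseline_requests)
    return scaled_baseline - actual_cost
```

So a team whose traffic grew 20% while its invoice fell from $100K to $90K delivered $30K/month of savings, not $10K — the un-normalized number understates the win just as easily as it can overstate it.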
