LLM cost optimization.
A practical guide · 25 April 2026
If your LLM bill is north of $20K/month, 30–60% of it is almost certainly waste. This guide is the order in which we look for that waste during a Tidal/LLM CFO engagement, and the techniques we apply once we find it.
1. Get the baseline right
Before touching anything, reconcile to raw provider invoices — not gateway logs, not internal dashboards. Gateway logs systematically under-count because they miss retries, fallback calls, and model-side reasoning tokens. The invoice is the ground truth.
Track these axes separately: provider, model, token type (input / output / cache-read / cache-write), endpoint or feature, environment (prod / staging / batch). Conflating cache-read tokens with input tokens is the single most common baseline error we see.
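A minimal sketch of what the aggregation looks like, assuming hypothetical invoice line items (the field names here are illustrative, not any provider's real export schema). The point is that `cache_read` gets its own key and never collapses into `input`:

```python
from collections import defaultdict

# Hypothetical invoice line items -- field names are assumptions, not a real schema.
line_items = [
    {"provider": "openai", "model": "gpt-4o", "token_type": "input",
     "endpoint": "search", "env": "prod", "usd": 3.00},
    {"provider": "openai", "model": "gpt-4o", "token_type": "cache_read",
     "endpoint": "search", "env": "prod", "usd": 1.50},
    {"provider": "openai", "model": "gpt-4o", "token_type": "output",
     "endpoint": "search", "env": "prod", "usd": 3.00},
]

def baseline(items):
    """Aggregate spend on every axis separately; never merge cache_read into input."""
    agg = defaultdict(float)
    for it in items:
        key = (it["provider"], it["model"], it["token_type"],
               it["endpoint"], it["env"])
        agg[key] += it["usd"]
    return dict(agg)

spend = baseline(line_items)
```

Once the keys are this granular, rolling up to any coarser view (per model, per endpoint) is a one-line sum; going the other direction is impossible.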
2. Find the spend concentration
Almost every engagement looks the same: ~80% of spend on 3–5 endpoints. Optimizing the long tail is a waste of operator time. Sort by spend, pick the top five, and ignore everything else for the first six weeks.
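The concentration check is a two-liner. A sketch with made-up per-endpoint spend figures (the numbers are illustrative only):

```python
def top_endpoints(spend_by_endpoint, n=5):
    """Return the top-n endpoints by spend and their share of total spend."""
    total = sum(spend_by_endpoint.values())
    ranked = sorted(spend_by_endpoint.items(), key=lambda kv: kv[1], reverse=True)
    top = ranked[:n]
    share = sum(usd for _, usd in top) / total
    return top, share

# Hypothetical monthly spend per endpoint, USD.
spend = {"chat": 9_800, "summarize": 5_200, "search": 3_100,
         "classify": 1_400, "enrich": 900, "misc_a": 120, "misc_b": 80}

top, share = top_endpoints(spend, n=5)  # share ends up around 0.99 here
```

If `share` for your top five comes out under ~0.7, your spend is unusually flat and the rest of this playbook needs a wider net.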
3. Apply the playbook in this order
- Prompt caching — provider-native, lowest implementation cost, 50–90% off cache-read tokens. Always do this first.
- Model routing — route easy queries to a smaller/cheaper model, hard queries to the frontier model. Often 30–50% reduction with no quality regression.
- Semantic caching — embed the request, look up similar prior responses. Effective on RAG, classification, and high-volume customer-support flows. 20–40% reduction on cache-friendly endpoints.
- Prompt compression — strip redundancy, move static context to system prompts (where it caches), tighten few-shot examples. 10–25% reduction on token-heavy endpoints.
- Batch routing — non-realtime workloads (eval pipelines, content generation, data enrichment) move to the Batch API. Flat 50% off on eligible volume.
- Provider arbitrage — equivalent-quality models priced lower on Bedrock / Vertex / OpenRouter / Together. 30–60% reduction when quality A/Bs pass.
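To make one of these concrete: model routing is usually a thin predicate in front of the provider call. A minimal sketch, assuming a lightweight upstream classifier produces an `easy_score`, and with illustrative model names and per-token prices (none of these are real quotes):

```python
# Illustrative price table, USD per 1M input tokens -- not real quotes.
PRICES = {"small": 0.15, "frontier": 2.50}

def route(query: str, easy_score: float) -> str:
    """Send short, confidently-easy queries to the small model; everything
    else to the frontier model. In a real system easy_score would come from
    a cheap classifier or heuristic -- it is an assumption here."""
    if len(query) < 500 and easy_score >= 0.9:
        return "small"
    return "frontier"

model = route("what is our refund policy?", easy_score=0.97)  # -> "small"
```

The savings come from the traffic mix: if 60% of queries route to the small model, blended input cost drops from $2.50 to roughly $1.09 per 1M tokens under this price table. The thresholds (500 chars, 0.9) are exactly what the A/B in the next section exists to validate.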
4. A/B before you ship
Every optimization gets a 7-day A/B against the production baseline. Quality SLOs are agreed up front. Regressions auto-rollback. "Savings" that come with a quality regression aren't savings — they're a refund waiting to happen.
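The ship/rollback decision reduces to two comparisons. A sketch assuming a single scalar quality metric and an agreed maximum drop (the 1-point SLO band here is illustrative; real bands are negotiated per endpoint up front):

```python
def ab_decision(baseline_quality, variant_quality,
                baseline_cost, variant_cost,
                max_quality_drop=0.01):
    """Ship only if quality stays inside the agreed SLO band AND cost falls.
    max_quality_drop is an illustrative threshold, not a recommendation."""
    if baseline_quality - variant_quality > max_quality_drop:
        return "rollback"  # quality regression: savings don't count
    if variant_cost < baseline_cost:
        return "ship"
    return "rollback"      # no cost win either: keep the baseline

ab_decision(0.92, 0.915, baseline_cost=100, variant_cost=60)  # -> "ship"
```

Note the order: quality is checked before cost. A variant that saves 40% but fails the SLO never reaches the cost comparison.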
5. Reconcile monthly
Delivered savings = traffic-normalized locked baseline − actuals reconciled against raw provider invoices. Anything else is marketing.
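The arithmetic, using token volume as the traffic normalizer (a simplifying assumption; a real reconciliation may normalize per endpoint):

```python
def delivered_savings(baseline_cost, baseline_tokens, actual_cost, actual_tokens):
    """Scale the locked baseline to this month's traffic, then subtract
    reconciled actuals. All four inputs come from raw provider invoices."""
    growth = actual_tokens / baseline_tokens
    normalized_baseline = baseline_cost * growth
    return normalized_baseline - actual_cost

# Illustrative: $20K baseline at 1B tokens; this month $18K at 1.5B tokens.
# Normalized baseline is $30K, so delivered savings are $12K -- even though
# the raw bill only fell by $2K.
savings = delivered_savings(20_000, 1_000_000_000, 18_000, 1_500_000_000)
```

This is why normalization matters: without it, a team that grew traffic 50% while cutting unit cost would report $2K of savings instead of $12K.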