CHANGELOG
What's new.
Service updates, new optimization techniques we've added to the playbook, and platform notes. Updated when something actually ships — not on a marketing cadence.
2026-04 · Q2 playbook refresh
- Added Anthropic extended prompt caching (1-hour TTL) to the playbook. ~90% discount on cache-read tokens for endpoints with steady traffic; sketch after this list.
- Added OpenAI Batch API routing for non-realtime workloads. 50% discount on eligible jobs; sketch after this list.
- Updated cache-read token tracking in baselines: cache reads are billed separately from regular input tokens and were quietly inflating "input token" lines on some Anthropic invoices. Cost-split sketch after this list.
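A minimal sketch of the extended-caching setup, using the Anthropic Python SDK. The model id, the example prompt, and the `"ttl": "1h"` field are assumptions here; the 1-hour TTL was a beta feature at the time of writing, so check the current Anthropic prompt-caching docs before copying.

```python
# Sketch: mark a large, stable system prompt as cacheable with a 1-hour TTL on
# Anthropic's Messages API. The model id and the "ttl" field are assumptions;
# the 1-hour cache may also require a beta header, per the current docs.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM_PROMPT = "..."  # large, rarely-changing instructions and reference material

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # assumed model id
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": SYSTEM_PROMPT,
            # Cache this block; "ttl": "1h" requests the extended 1-hour cache
            # instead of the default 5-minute window.
            "cache_control": {"type": "ephemeral", "ttl": "1h"},
        }
    ],
    messages=[{"role": "user", "content": "Summarize this month's spend."}],
)

# Cached reads are reported separately from regular input tokens.
u = response.usage
print(u.input_tokens, u.cache_creation_input_tokens, u.cache_read_input_tokens)
```

Cache writes are billed at a premium over the base input rate, so the long TTL only pays for itself on endpoints where the same prefix is read repeatedly, which is the "steady traffic" caveat above.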
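The Batch API routing is easiest to show end to end. A sketch using the official OpenAI Python SDK; the file name, model, and example requests are illustrative.

```python
# Sketch: submit a non-realtime job through the OpenAI Batch API.
# File name, model, and request contents are illustrative.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1) Write one JSON request per line (JSONL), each with a unique custom_id.
requests = [
    {
        "custom_id": f"invoice-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": f"Categorize line item {i}."}],
        },
    }
    for i in range(3)
]
with open("batch_input.jsonl", "w") as f:
    f.write("\n".join(json.dumps(r) for r in requests))

# 2) Upload the file and create the batch; results arrive within the window.
batch_file = client.files.create(file=open("batch_input.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(batch.id, batch.status)  # poll client.batches.retrieve(batch.id) until completed
```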
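And a small sketch of the baseline fix: split cache reads (and cache writes) out of the plain input-token line when costing a usage record. Field names follow Anthropic's usage object; the per-million rates are placeholders, not current list prices.

```python
# Sketch: keep cache reads out of the plain "input tokens" line when baselining.
# Rates are placeholders in USD per million tokens, not current list prices.
RATES = {"input": 3.00, "cache_read": 0.30, "cache_write": 3.75, "output": 15.00}

def cost_usd(usage: dict) -> dict:
    """Break one usage record into the separately billed lines."""
    per_m = lambda tokens, key: tokens / 1_000_000 * RATES[key]
    return {
        "input": per_m(usage.get("input_tokens", 0), "input"),
        "cache_read": per_m(usage.get("cache_read_input_tokens", 0), "cache_read"),
        "cache_write": per_m(usage.get("cache_creation_input_tokens", 0), "cache_write"),
        "output": per_m(usage.get("output_tokens", 0), "output"),
    }

print(cost_usd({"input_tokens": 12_000, "cache_read_input_tokens": 480_000,
                "output_tokens": 2_500}))
```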
2026-03 · Provider arbitrage
- Added provider arbitrage to recommended optimizations: route high-volume endpoints to equivalent-quality models priced 30–60% lower on Bedrock, Vertex, or OpenRouter. Routing sketch after this list.
- Built quality-equivalence A/B harness with LLM-as-judge scoring + human spot-checks; judge sketch after this list.
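A sketch of the routing side of provider arbitrage: keep a per-endpoint price table and pick the cheapest offer that has already passed the quality-equivalence check. Provider names come from the entry above; the prices, models, and input/output blend are placeholders.

```python
# Sketch: choose the cheapest quality-approved provider for an endpoint.
# Prices and model names are placeholders, not quotes.
from dataclasses import dataclass

@dataclass
class Offer:
    provider: str           # e.g. "first_party", "bedrock", "vertex", "openrouter"
    model: str
    usd_per_m_input: float
    usd_per_m_output: float
    quality_ok: bool        # set by the quality-equivalence harness

def pick(offers: list[Offer], in_ratio: float = 0.8) -> Offer:
    """Cheapest approved offer, weighted by the endpoint's input/output token mix."""
    blended = lambda o: in_ratio * o.usd_per_m_input + (1 - in_ratio) * o.usd_per_m_output
    approved = [o for o in offers if o.quality_ok]
    return min(approved, key=blended)

offers = [
    Offer("first_party", "model-a", 3.00, 15.00, quality_ok=True),
    Offer("openrouter", "model-b", 1.20, 6.00, quality_ok=True),
    Offer("bedrock", "model-c", 0.90, 4.50, quality_ok=False),  # failed the A/B check
]
print(pick(offers).provider)  # -> "openrouter"
```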
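A sketch of the judge step in that harness: pairwise LLM-as-judge scoring between the incumbent model and a cheaper candidate, with answer positions randomized to reduce position bias. The judge model and prompt are illustrative, and the human spot-checks on a sample of verdicts are not shown.

```python
# Sketch: pairwise LLM-as-judge scoring for quality equivalence.
# Judge model and prompt are illustrative; human spot-checks are not shown.
import random
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = (
    "You are grading two answers to the same request.\n"
    "Request:\n{request}\n\nAnswer A:\n{a}\n\nAnswer B:\n{b}\n\n"
    "Reply with exactly one word: A, B, or TIE."
)

def judge(request: str, incumbent: str, candidate: str) -> str:
    # Randomize position so the judge cannot favor whichever answer comes first.
    flipped = random.random() < 0.5
    a, b = (candidate, incumbent) if flipped else (incumbent, candidate)
    verdict = client.chat.completions.create(
        model="gpt-4o",  # assumed judge model
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(request=request, a=a, b=b)}],
        temperature=0,
    ).choices[0].message.content.strip().upper()
    if verdict == "TIE":
        return "tie"
    candidate_won = (verdict == "A") == flipped
    return "candidate" if candidate_won else "incumbent"
```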
2026-02 · Semantic cache tuning
- Added embedding-model selection guide (`text-embedding-3-small` vs Cohere `embed-v3` vs Voyage) for semantic cache layers.
- Tuned default similarity thresholds per workload type (RAG retrieval, chat, classification); cache-lookup sketch below.
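A minimal sketch of the lookup those thresholds apply to: embed the incoming prompt, compare against cached embeddings by cosine similarity, and return the cached answer only above the workload's threshold. The threshold values below are illustrative, not the tuned defaults; `text-embedding-3-small` is used as the embedding model per the guide above.

```python
# Sketch: semantic cache lookup with per-workload similarity thresholds.
# Threshold values are illustrative placeholders, not tuned defaults.
import numpy as np
from openai import OpenAI

client = OpenAI()

THRESHOLDS = {"rag": 0.92, "chat": 0.88, "classification": 0.95}  # placeholders

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

class SemanticCache:
    def __init__(self) -> None:
        self.entries: list[tuple[np.ndarray, str]] = []  # (embedding, cached response)

    def lookup(self, prompt: str, workload: str) -> str | None:
        if not self.entries:
            return None
        q = embed(prompt)
        sims = [float(q @ e / (np.linalg.norm(q) * np.linalg.norm(e)))
                for e, _ in self.entries]
        best = int(np.argmax(sims))
        if sims[best] >= THRESHOLDS[workload]:
            return self.entries[best][1]  # cache hit: skip the LLM call entirely
        return None

    def store(self, prompt: str, response: str) -> None:
        self.entries.append((embed(prompt), response))
```

Classification-style workloads tolerate the tightest threshold because near-duplicate prompts should map to identical labels, while chat tolerates a looser one; the per-workload defaults in the playbook follow that shape.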
2026-01 · Anomaly detection
- Rolled out cost anomaly detector in the Operator Console: flags token leaks, mis-routed traffic, and runaway loops within hours instead of waiting for the monthly invoice. Detection sketch below.
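A sketch of the kind of check such a detector runs: a rolling z-score over hourly token spend per route, flagging spikes consistent with a token leak or a runaway agent loop. The window size and z-score cutoff are illustrative, not the Console's actual settings.

```python
# Sketch: rolling z-score over hourly token spend for one route. Window and
# cutoff are illustrative, not the Operator Console's actual settings.
import statistics

def flag_anomalies(hourly_tokens: list[int], window: int = 72, z_cutoff: float = 4.0) -> list[int]:
    """Return indices of hours whose spend is an outlier vs the trailing window."""
    flags = []
    for i in range(window, len(hourly_tokens)):
        history = hourly_tokens[i - window : i]
        mean = statistics.fmean(history)
        stdev = statistics.pstdev(history) or 1.0  # avoid division by zero on flat history
        if (hourly_tokens[i] - mean) / stdev > z_cutoff:
            flags.append(i)
    return flags

# Example: flat baseline around 50k tokens/hour, then a runaway loop at hour 80.
usage = [50_000] * 80 + [400_000]
print(flag_anomalies(usage))  # -> [80]
```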
Older
Pre-2026 updates aren't published. Email us if you want the full history.