CHANGELOG
What's new.
Service updates, new optimization techniques we've added to the playbook, and platform notes. Updated when something actually ships — not on a marketing cadence.
2026-04 · Q2 playbook refresh
- Added Anthropic extended prompt caching (1-hour TTL) to the playbook. ~90% discount on cache-read tokens for endpoints with steady traffic; sketch after this list.
- Added OpenAI Batch API routing for non-realtime workloads. 50% discount on eligible jobs; sketch after this list.
- Updated cache-read token tracking in baselines: cache reads are billed separately from regular input tokens and were quietly inflating "input token" lines on some Anthropic invoices. Cost-split sketch after this list.
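A minimal sketch of the extended-caching setup, using the Anthropic Python SDK. The model id, the example prompt, and the `"ttl": "1h"` field are assumptions here; the 1-hour TTL was a beta feature at the time of writing, so check the current Anthropic prompt-caching docs before copying.

```python
# Sketch: mark a large, stable system prompt as cacheable with a 1-hour TTL on
# Anthropic's Messages API. The model id and the "ttl" field are assumptions;
# the 1-hour cache may also require a beta header, per the current docs.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM_PROMPT = "..."  # large, rarely-changing instructions and reference material

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # assumed model id
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": SYSTEM_PROMPT,
            # Cache this block; "ttl": "1h" requests the extended 1-hour cache
            # instead of the default 5-minute window.
            "cache_control": {"type": "ephemeral", "ttl": "1h"},
        }
    ],
    messages=[{"role": "user", "content": "Summarize this month's spend."}],
)

# Cached reads are reported separately from regular input tokens.
u = response.usage
print(u.input_tokens, u.cache_creation_input_tokens, u.cache_read_input_tokens)
```

Cache writes are billed at a premium over the base input rate, so the long TTL only pays for itself on endpoints where the same prefix is read repeatedly, which is the "steady traffic" caveat above.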
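The Batch API routing is easiest to show end to end. A sketch using the official OpenAI Python SDK; the file name, model, and example requests are illustrative.

```python
# Sketch: submit a non-realtime job through the OpenAI Batch API.
# File name, model, and request contents are illustrative.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1) Write one JSON request per line (JSONL), each with a unique custom_id.
requests = [
    {
        "custom_id": f"invoice-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": f"Categorize line item {i}."}],
        },
    }
    for i in range(3)
]
with open("batch_input.jsonl", "w") as f:
    f.write("\n".join(json.dumps(r) for r in requests))

# 2) Upload the file and create the batch; results arrive within the window.
batch_file = client.files.create(file=open("batch_input.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(batch.id, batch.status)  # poll client.batches.retrieve(batch.id) until completed
```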
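And a small sketch of the baseline fix: split cache reads (and cache writes) out of the plain input-token line when costing a usage record. Field names follow Anthropic's usage object; the per-million rates are placeholders, not current list prices.

```python
# Sketch: keep cache reads out of the plain "input tokens" line when baselining.
# Rates are placeholders in USD per million tokens, not current list prices.
RATES = {"input": 3.00, "cache_read": 0.30, "cache_write": 3.75, "output": 15.00}

def cost_usd(usage: dict) -> dict:
    """Break one usage record into the separately billed lines."""
    per_m = lambda tokens, key: tokens / 1_000_000 * RATES[key]
    return {
        "input": per_m(usage.get("input_tokens", 0), "input"),
        "cache_read": per_m(usage.get("cache_read_input_tokens", 0), "cache_read"),
        "cache_write": per_m(usage.get("cache_creation_input_tokens", 0), "cache_write"),
        "output": per_m(usage.get("output_tokens", 0), "output"),
    }

print(cost_usd({"input_tokens": 12_000, "cache_read_input_tokens": 480_000,
                "output_tokens": 2_500}))
```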
2026-03 · Provider arbitrage
- Added provider arbitrage to recommended optimizations: route high-volume endpoints to equivalent-quality models priced 30–60% lower on Bedrock, Vertex, or OpenRouter. Routing sketch after this list.
- Built quality-equivalence A/B harness with LLM-as-judge scoring + human spot-checks; judge sketch after this list.
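A sketch of the routing side of provider arbitrage: keep a per-endpoint price table and pick the cheapest offer that has already passed the quality-equivalence check. Provider names come from the entry above; the prices, models, and input/output blend are placeholders.

```python
# Sketch: choose the cheapest quality-approved provider for an endpoint.
# Prices and model names are placeholders, not quotes.
from dataclasses import dataclass

@dataclass
class Offer:
    provider: str           # e.g. "first_party", "bedrock", "vertex", "openrouter"
    model: str
    usd_per_m_input: float
    usd_per_m_output: float
    quality_ok: bool        # set by the quality-equivalence harness

def pick(offers: list[Offer], in_ratio: float = 0.8) -> Offer:
    """Cheapest approved offer, weighted by the endpoint's input/output token mix."""
    blended = lambda o: in_ratio * o.usd_per_m_input + (1 - in_ratio) * o.usd_per_m_output
    approved = [o for o in offers if o.quality_ok]
    return min(approved, key=blended)

offers = [
    Offer("first_party", "model-a", 3.00, 15.00, quality_ok=True),
    Offer("openrouter", "model-b", 1.20, 6.00, quality_ok=True),
    Offer("bedrock", "model-c", 0.90, 4.50, quality_ok=False),  # failed the A/B check
]
print(pick(offers).provider)  # -> "openrouter"
```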
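A sketch of the judge step in that harness: pairwise LLM-as-judge scoring between the incumbent model and a cheaper candidate, with answer positions randomized to reduce position bias. The judge model and prompt are illustrative, and the human spot-checks on a sample of verdicts are not shown.

```python
# Sketch: pairwise LLM-as-judge scoring for quality equivalence.
# Judge model and prompt are illustrative; human spot-checks are not shown.
import random
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = (
    "You are grading two answers to the same request.\n"
    "Request:\n{request}\n\nAnswer A:\n{a}\n\nAnswer B:\n{b}\n\n"
    "Reply with exactly one word: A, B, or TIE."
)

def judge(request: str, incumbent: str, candidate: str) -> str:
    # Randomize position so the judge cannot favor whichever answer comes first.
    flipped = random.random() < 0.5
    a, b = (candidate, incumbent) if flipped else (incumbent, candidate)
    verdict = client.chat.completions.create(
        model="gpt-4o",  # assumed judge model
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(request=request, a=a, b=b)}],
        temperature=0,
    ).choices[0].message.content.strip().upper()
    if verdict == "TIE":
        return "tie"
    candidate_won = (verdict == "A") == flipped
    return "candidate" if candidate_won else "incumbent"
```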
2026-02 · Semantic cache tuning
- Added embedding-model selection guide (`text-embedding-3-small` vs Cohere `embed-v3` vs Voyage) for semantic cache layers.
- Tuned default similarity thresholds per workload type (RAG retrieval, chat, classification); cache-lookup sketch below.
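A minimal sketch of the lookup those thresholds apply to: embed the incoming prompt, compare against cached embeddings by cosine similarity, and return the cached answer only above the workload's threshold. The threshold values below are illustrative, not the tuned defaults; `text-embedding-3-small` is used as the embedding model per the guide above.

```python
# Sketch: semantic cache lookup with per-workload similarity thresholds.
# Threshold values are illustrative placeholders, not tuned defaults.
import numpy as np
from openai import OpenAI

client = OpenAI()

THRESHOLDS = {"rag": 0.92, "chat": 0.88, "classification": 0.95}  # placeholders

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

class SemanticCache:
    def __init__(self) -> None:
        self.entries: list[tuple[np.ndarray, str]] = []  # (embedding, cached response)

    def lookup(self, prompt: str, workload: str) -> str | None:
        if not self.entries:
            return None
        q = embed(prompt)
        sims = [float(q @ e / (np.linalg.norm(q) * np.linalg.norm(e)))
                for e, _ in self.entries]
        best = int(np.argmax(sims))
        if sims[best] >= THRESHOLDS[workload]:
            return self.entries[best][1]  # cache hit: skip the LLM call entirely
        return None

    def store(self, prompt: str, response: str) -> None:
        self.entries.append((embed(prompt), response))
```

Classification-style workloads tolerate the tightest threshold because near-duplicate prompts should map to identical labels, while chat tolerates a looser one; the per-workload defaults in the playbook follow that shape.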
2026-01 · Anomaly detection
- Rolled out cost anomaly detector in the Operator Console: flags token leaks, mis-routed traffic, and runaway loops within hours instead of waiting for the monthly invoice. Detection sketch below.
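A sketch of the kind of check such a detector runs: a rolling z-score over hourly token spend per route, flagging spikes consistent with a token leak or a runaway agent loop. The window size and z-score cutoff are illustrative, not the Console's actual settings.

```python
# Sketch: rolling z-score over hourly token spend for one route. Window and
# cutoff are illustrative, not the Operator Console's actual settings.
import statistics

def flag_anomalies(hourly_tokens: list[int], window: int = 72, z_cutoff: float = 4.0) -> list[int]:
    """Return indices of hours whose spend is an outlier vs the trailing window."""
    flags = []
    for i in range(window, len(hourly_tokens)):
        history = hourly_tokens[i - window : i]
        mean = statistics.fmean(history)
        stdev = statistics.pstdev(history) or 1.0  # avoid division by zero on flat history
        if (hourly_tokens[i] - mean) / stdev > z_cutoff:
            flags.append(i)
    return flags

# Example: flat baseline around 50k tokens/hour, then a runaway loop at hour 80.
usage = [50_000] * 80 + [400_000]
print(flag_anomalies(usage))  # -> [80]
```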
Older
Pre-2026 updates aren't published. Email us if you want the full history.