RESEARCH
AI FinOps research.
Field notes on reducing LLM spend without hiding quality regressions. Start with the category guides, then use the technique pages for specific levers.
Start here
- AI FinOps - the operating model for measuring, allocating, optimizing, and reconciling LLM spend.
- LLM cost monitoring - what to track, how to tag it, and which dashboards actually matter.
- OpenAI cost optimization - the highest-leverage ways to lower API spend in production.
- LLM cost optimization: a practical guide - the full playbook: routing, caching, compression, batch APIs, and provider arbitrage.
- Provider price benchmarks - current list-price references and methodology.
- Glossary - plain-English definitions for billing and optimization terms.
Governance
- AI governance for finance leaders - the pillar guide: policy, allocation, accountability.
- What is AI governance? - plain-English definition for 2026.
- AI governance framework - the five-layer operating model and 90-day plan.
New and timely
- AI FinOps in 2026 - why cost optimization is shifting from isolated prompt tricks to an operating model.
- OpenAI Flex vs Batch - the 2026 playbook for moving low-priority work off the expensive standard path.
- Prompt caching in 2026 - why teams still leave money on the table even after "turning caching on."
- Reasoning tokens are the hidden line item on your AI bill - why invisible output spend is becoming a real production issue.
- Agent spend guardrails - the budgets, retry limits, and loop controls that keep agent workflows sane.
- Conversation state is a cost lever now - why state handling is becoming a real AI cost architecture problem.
- Multimodal costs sneak up faster than text costs - why image, audio, and realtime AI often need their own cost model.
- AI chargeback and showback are becoming real in 2026 - what has to be true before allocating AI spend internally.
- Evals need cost discipline too - why evaluation quality and evaluation economics now have to coexist.
- Background mode is an economics feature, not just a reliability feature - why async AI work changes cost architecture, not just uptime.
- Built-in tools are not free sidecars anymore - why search, retrieval, and code execution deserve their own budgets.
- Your GenAI telemetry schema is now a cost decision - why observability structure now shapes FinOps quality.
Optimization levers
- LLM cost per request - the unit economics metric and how to segment it.
- Model routing without quality regressions
- Semantic caching for LLMs
- Prompt caching: OpenAI vs Anthropic vs Bedrock
- Batch API routing: 50% off for the work that can wait
- Provider arbitrage: same model, different price
Accounting and tooling
- LLM cost monitoring: what to track and how to control it
- LLM cost dashboards - the five views that matter and the fields they need.
- Tracking LLM costs with OpenTelemetry GenAI conventions - gen_ai.* attributes and deriving cost from spans.
- How to track AI token usage - token types, per-request fields, and aggregation levels.
- Cache-read tokens: the baseline trap
- Cache invalidation cost: the hidden line item
- Scale Tier vs Flex vs Batch vs Standard
- LiteLLM vs Helicone vs Langfuse