
AI FinOps.

Operating model guide · 29 April 2026

By the LLM CFO team

AI FinOps is the discipline of measuring, allocating, optimizing, and reconciling AI spend against business value. In practice, that means making LLM usage visible by team, feature, customer, and provider, then applying the same financial rigor to token spend that cloud teams already apply to compute and storage.

Why AI FinOps exists at all

Traditional cloud FinOps assumes infrastructure-like units: instances, storage, bandwidth, reservations. LLM systems behave differently. Cost is driven by prompts, output length, retries, model choice, cache behavior, tool calls, and agent loops. The unit of waste is not only infrastructure. It is architectural.

That is why AI FinOps sits between engineering, finance, and product. Finance needs reconciled reporting. Engineering needs request-level visibility. Product needs to know whether a feature is economically viable at current usage and quality.

The four jobs of AI FinOps

  1. Measure. Track tokens, tool calls, latency, model mix, cache hits, and estimated cost per request.
  2. Allocate. Attribute spend to teams, features, environments, customers, and internal cost centers.
  3. Optimize. Reduce waste with routing, batching, prompt cleanup, caching, and quota controls.
  4. Reconcile. Tie internal estimates back to provider or cloud billing truth every month.

The minimum data model

Every model call should be tagged with provider, model, feature, environment, customer or workspace, and request path. You also want input tokens, output tokens, cache-read or cache-write tokens where available, latency, retry count, and estimated cost. Without that, AI FinOps turns into month-end guesswork.
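The tagging requirements above translate naturally into one record per model call. A minimal sketch, with illustrative field names (your schema will differ):

```python
from dataclasses import dataclass

@dataclass
class ModelCallRecord:
    """One row per model call. Field names are illustrative, but each maps
    to a dimension named in the minimum data model above."""
    provider: str
    model: str
    feature: str
    environment: str        # e.g. "prod" or "staging"
    customer_id: str        # or workspace id
    request_path: str
    input_tokens: int
    output_tokens: int
    cache_read_tokens: int = 0
    cache_write_tokens: int = 0
    latency_ms: float = 0.0
    retry_count: int = 0
    estimated_cost_usd: float = 0.0

def allocation_key(r: ModelCallRecord) -> tuple:
    """Grouping key for the 'allocate' job: the business-facing dimensions."""
    return (r.feature, r.environment, r.customer_id, r.provider, r.model)
```

Grouping records by `allocation_key` and summing `estimated_cost_usd` is the core of allocation; everything else is joins and presentation.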

OpenTelemetry's GenAI semantic conventions are useful here because they give teams a common schema for token usage, model identity, conversation identifiers, and retrieval context. Even if you never expose raw traces to finance, the discipline of structured telemetry keeps cost data consistent.

Where the real savings usually come from

AI FinOps is not mostly about negotiating list prices. Most savings come from changing the request path: routing requests to cheaper models where quality allows, caching repeated context, trimming prompts, batching where latency permits, and cutting unnecessary retries and agent loops.
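One request-path change is model routing. The sketch below sends short, tool-free prompts to a cheaper model; the model names, the token threshold, and the heuristic itself are all hypothetical, and a real router would gate on measured quality, not just size.

```python
# Illustrative routing heuristic: the 400-token threshold and model names
# are assumptions, not recommendations.
def route_model(prompt_tokens: int, needs_tools: bool) -> str:
    """Pick a model for a request based on a crude complexity signal."""
    if needs_tools or prompt_tokens > 400:
        return "large-model"
    return "small-model"
```

Even a heuristic this crude changes the cost curve if a large share of traffic is short completions; the FinOps job is then proving the quality holds on the rerouted slice.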

What a good AI FinOps stack looks like

The best stack usually has four layers. Provider or cloud billing is the system of record. A gateway or proxy layer centralizes routing and policy. An observability layer adds product context. A warehouse or BI layer joins AI spend to business dimensions like account, plan, and margin.
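The reconciliation step in the warehouse layer can be sketched as a per-model variance check between internal estimates and the billing system of record. The function and its 5% tolerance are illustrative, not a standard.

```python
def reconcile(internal_by_model: dict[str, float],
              billed_by_model: dict[str, float],
              tolerance: float = 0.05) -> dict[str, float]:
    """Compare internal cost estimates to provider billing, per model.
    Returns models whose relative variance exceeds the tolerance
    (an illustrative 5% default)."""
    flagged = {}
    for model, billed in billed_by_model.items():
        if billed == 0:
            continue
        estimated = internal_by_model.get(model, 0.0)
        variance = abs(estimated - billed) / billed
        if variance > tolerance:
            flagged[model] = round(variance, 3)
    return flagged
```

Running this monthly, per provider and per model, is what turns internal telemetry from "directionally useful" into numbers finance will sign off on.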

Rule of thumb: if finance, platform, and product all use different numbers for AI spend, you do not have AI FinOps yet. You have dashboards.

Which metrics matter most

Start with cost per request, cost per feature, tokens per request, cache hit rate, model mix, and the monthly variance between internal estimates and the provider invoice. Track them per feature and per customer, not only in aggregate, or the expensive endpoints stay hidden.
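Two of these roll up directly from per-call records. A minimal sketch, assuming dict records with illustrative keys and one simple definition of cache hit rate (cache-read tokens over all context tokens; providers define this differently):

```python
def scorecard(records: list[dict]) -> dict:
    """Roll per-call records into headline metrics. Keys and the cache
    hit rate definition are illustrative assumptions."""
    total_cost = sum(r["estimated_cost_usd"] for r in records)
    total_input = sum(r["input_tokens"] for r in records)
    cache_reads = sum(r.get("cache_read_tokens", 0) for r in records)
    context = total_input + cache_reads
    return {
        "cost_per_request": total_cost / len(records),
        "cache_hit_rate": cache_reads / context if context else 0.0,
    }
```

The same rollup, grouped by feature or customer instead of globally, produces the scorecard a platform lead can actually act on.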

Who owns AI FinOps

No single team can do it alone. Engineering owns instrumentation and optimization. Finance owns reporting and controls. Product owns value and traffic intent. A workable operating model usually gives one platform or FinOps lead responsibility for the scorecard, then pushes optimization into the engineering teams that own the expensive endpoints.

When teams need help

The inflection point is usually when AI becomes a top-three line item or when no one can explain the invoice by feature. At that stage, you need more than prompt advice. You need baseline reconciliation, ranked savings opportunities, and a governance model that survives the next provider or product launch.
