RESEARCH · OPERATIONS

LLM cost monitoring.

Operations guide · 29 April 2026

By the LLM CFO team

LLM cost monitoring is the practice of tracking spend at request level, not just invoice level. If you cannot explain which feature, user, model, or retry pattern created the cost spike, you are not monitoring spend. You are only seeing the bill after the damage is done.

The four layers of monitoring

Good LLM cost monitoring combines four data sources. Provider billing tells you what was charged. Gateway logs tell you routing and policy decisions. Application telemetry tells you product context. Warehouse reporting lets you join spend to customers, plans, releases, and margin.

Each layer answers a different question. Provider billing answers "what were we actually charged?" Gateway logs answer "how was the request routed, and which policy applied?" Application telemetry answers "why did it happen, and for which feature or customer?" The warehouse answers "what does it mean for margin?"

What to track on every request

At a minimum, log the model, input and output tokens, cache-read tokens, latency, retry count, and the business metadata you will need later: customer or workspace, feature, and release. Raw usage numbers without that context cannot be tied back to anything.

Use a consistent telemetry schema

OpenTelemetry's GenAI conventions are a strong baseline because they standardize fields like model, input tokens, output tokens, cache-read tokens, conversation identifiers, and provider names. That makes it easier to swap vendors or observability backends without rebuilding your data model every quarter.
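A minimal sketch of what such a record can look like, using attribute keys in the style of the OTel GenAI conventions. The `genai_record` helper is illustrative, and you should verify the exact key names (especially the cache-read key) against the current semantic-convention spec:

```python
# Sketch: build a flat per-request attribute dict in the style of the
# OpenTelemetry GenAI semantic conventions. Key names are assumptions;
# check them against the current semconv spec before relying on them.

def genai_record(provider, model, input_tokens, output_tokens,
                 cache_read_tokens=0, conversation_id=None):
    """Return attributes suitable for attaching to a span or log record."""
    record = {
        "gen_ai.system": provider,                # e.g. "openai", "anthropic"
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
        # Cache reads are usually billed at a discount; track them separately.
        "gen_ai.usage.cache_read_tokens": cache_read_tokens,
    }
    if conversation_id is not None:
        record["gen_ai.conversation.id"] = conversation_id
    return record

rec = genai_record("openai", "gpt-4o-mini", 1200, 350, cache_read_tokens=800)
```

Keeping these keys consistent across services is what makes the later warehouse joins cheap.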

The dashboards that matter

Most teams overbuild their dashboard layer and underbuild their tagging layer. Start with five views:

  1. Spend by feature. Top ten endpoints or workflows by total cost.
  2. Spend by model. Where the premium models are actually being used.
  3. Spend by customer or workspace. Profitability and quota control.
  4. Input and output token trends. Detect prompt growth and verbose output drift.
  5. Cache hit and retry rates. Two of the easiest places to find waste.
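As an illustration of the first view, a sketch that rolls per-request logs up into spend by feature. The price table and model names are made up, not real provider rates:

```python
# Sketch of the "spend by feature" dashboard view: aggregate request logs
# into total cost per feature tag. Prices are illustrative placeholders.

PRICES_PER_1M = {  # USD per million tokens (hypothetical numbers)
    "small-model": {"input": 0.15, "output": 0.60},
    "large-model": {"input": 3.00, "output": 15.00},
}

def request_cost(model, input_tokens, output_tokens):
    p = PRICES_PER_1M[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

def spend_by_feature(requests):
    totals = {}
    for r in requests:
        cost = request_cost(r["model"], r["input_tokens"], r["output_tokens"])
        totals[r["feature"]] = totals.get(r["feature"], 0.0) + cost
    # Biggest spenders first, like the dashboard view.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

logs = [
    {"feature": "chat", "model": "large-model", "input_tokens": 2000, "output_tokens": 500},
    {"feature": "summarize", "model": "small-model", "input_tokens": 8000, "output_tokens": 300},
]
ranked = spend_by_feature(logs)
```

The same grouping key swapped from `feature` to `model` or `customer` gives views two and three.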

Alerts worth setting up

Alert on the same drivers the dashboards surface: daily spend above a multiple of the trailing average, a single customer or feature taking an outsized share of spend, retry rate spikes, cache hit rate drops, and sustained growth in input or output tokens.
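One alert that pays for itself quickly is a spend-spike check against a trailing baseline. A minimal sketch, where the multiplier and window length are illustrative defaults rather than recommendations:

```python
# Sketch: flag today's spend when it exceeds a multiple of the trailing
# daily average. Threshold and window are illustrative, not tuned values.

def spend_alert(prior_daily_spend, today_total, multiplier=2.0):
    """prior_daily_spend: recent daily totals (e.g. trailing 7 days)."""
    if not prior_daily_spend:
        return False  # no baseline yet; nothing to compare against
    baseline = sum(prior_daily_spend) / len(prior_daily_spend)
    return today_total > multiplier * baseline
```

Run it against today's running total so the page fires mid-day, not after the invoice closes.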

Common failure modes

The most common monitoring mistake is relying on response usage fields alone and assuming they are enough for finance. They are not. Another common mistake is logging usage without business metadata, which makes it impossible to tie spend back to customers or features. A third is ignoring tool-call costs and caching behavior, which increasingly show up as material line items.

Monitoring rule: if your dashboard cannot tell you whether the spike came from prompt growth, model drift, retries, or one noisy customer, your instrumentation is still too thin.
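That rule can be made mechanical. A sketch that decomposes a spike window against a baseline window along those drivers, assuming a request-log schema with `input_tokens`, `output_tokens`, and `customer` fields (the field names are assumptions about your schema):

```python
# Sketch: attribute a spend spike by comparing a baseline window to the
# spike window. Ratios near 1.0 rule a driver out; a high top-customer
# share points at one noisy tenant.

def spike_drivers(baseline, spike):
    """Each arg is a non-empty list of request dicts."""
    def avg(rows, key):
        return sum(r[key] for r in rows) / len(rows)

    drivers = {
        "request_count_ratio": len(spike) / len(baseline),       # retries / traffic
        "avg_input_tokens_ratio": avg(spike, "input_tokens")     # prompt growth
            / avg(baseline, "input_tokens"),
        "avg_output_tokens_ratio": avg(spike, "output_tokens")   # verbose drift
            / avg(baseline, "output_tokens"),
    }
    # Concentration on one customer shows up as a high share of requests.
    counts = {}
    for r in spike:
        counts[r["customer"]] = counts.get(r["customer"], 0) + 1
    drivers["top_customer_share"] = max(counts.values()) / len(spike)
    return drivers
```

Splitting the same comparison by model would surface model drift as well.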

Simple stack that works

For a startup, a practical stack is provider-native billing plus a gateway, request metadata in application logs, and an observability tool like Langfuse or Helicone. For larger teams, add a warehouse layer and monthly reconciliation against provider or cloud billing exports.
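The reconciliation step can start as something very simple: sum request-level costs from your logs and compare them against the invoice total. The `reconcile` helper and the 2% tolerance below are illustrative:

```python
# Sketch: monthly reconciliation of request-level logged costs against the
# provider invoice. A persistent gap beyond tolerance usually means some
# traffic is not being instrumented.

def reconcile(logged_request_costs, invoice_total, tolerance=0.02):
    logged = sum(logged_request_costs)
    gap = invoice_total - logged
    return {
        "logged": logged,
        "invoice": invoice_total,
        "gap": gap,
        "within_tolerance": abs(gap) <= tolerance * invoice_total,
    }
```

The direction of the gap matters: invoice above logs suggests untagged traffic; logs above invoice suggests a stale price table.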

Monitoring is not optimization, but it is the prerequisite

Monitoring alone does not lower your bill. It gives you the ranked list of what to fix first. Once that list is visible, the usual wins are smaller default models, output caps, prompt cleanup, prompt caching, batching, and quota controls.
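As a back-of-envelope for the first of those wins, a sketch estimating the savings from routing a share of premium-model traffic to a smaller default model. All numbers here are illustrative assumptions:

```python
# Sketch: estimated monthly savings from moving a fraction of premium-model
# traffic to a cheaper model. Inputs are illustrative, not real rates.

def reroute_savings(premium_monthly_cost, share_movable, cheap_price_ratio):
    """
    premium_monthly_cost: current spend on the premium model
    share_movable: fraction of that traffic a smaller model can handle
    cheap_price_ratio: cheap model price as a fraction of the premium price
    """
    return premium_monthly_cost * share_movable * (1 - cheap_price_ratio)
```

The same one-line arithmetic, run per feature from the spend-by-feature view, produces the ranked fix-first list.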
