← home
RESEARCH · OPERATIONS

LLM cost per request.

Operations guide · 11 June 2026

By the LLM CFO team

LLM cost per request is the unit economics metric that measures the total cost to serve one API call. It is the atomic unit of LLM profitability. If you only know total spend and total requests, you cannot distinguish between a cheap mass-market feature and an expensive specialist workflow—yet they require opposite optimization strategies.

The formula

Cost per request is deceptively simple to state and challenging to compute correctly. The formula is:

cost_per_request =
(input_tokens × input_price_per_1m) +
(output_tokens × output_price_per_1m) +
(cache_read_tokens × discounted_price_per_1m)

The trick is the pricing table. It must be keyed by provider + model + token type because OpenAI's GPT-4o input and output prices differ, Anthropic's cache-read discount is ~90% (not 50%), and the same model name on Azure or Bedrock has different pricing than native.

Why averages are dangerous

Segment by feature, model, customer, and request type

The operational power of cost per request comes from segmentation:

Use it for margin analysis

Cost per request is the denominator of unit economics. The numerator is revenue per request, which you compute the same way:

margin_per_request = revenue_per_request − cost_per_request

For a feature billed at $0.01 per request, a cost per request of $0.004 leaves a $0.006 contribution. For the same billed price, if cost per request rises to $0.008, the contribution drops to $0.002. At scale, that difference is material. Tools like Langfuse and Helicone expose this data directly; internal deployments can compute it from provider billing exports plus application logs.

What moves cost per request

Unit economics rule: if your margin per request is below 50% of your gross margin per user, that feature is a loss leader masquerading as a commodity.

How to measure it in practice

Start with your observability layer. If you use OpenTelemetry GenAI conventions, you have provider, model, input tokens, output tokens, and cache tokens on every span. Multiply each by the price table. If you use LiteLLM, it tracks cost per request natively. If you self-host, export provider usage CSVs (OpenAI, Anthropic, Bedrock, Azure, Vertex) and join them to application logs by timestamp and model. The result is a cost column you can segment by any business dimension.

Directional guidance only: compute your own baseline

Every team's cost per request is shaped by their own models, context length, output requirements, and caching strategy. Publishing a benchmark number like "Sonnet costs $0.002 per request" is tempting but useless: it depends on whether you cache, how long your prompts are, whether you retry, and which token ratios apply to your workload. Instead, use the formula above, gather 30 days of real telemetry, compute the p50 and p95 cost per request for each feature, and use those as your baseline. Then measure whether prompt caching, smaller models, or shorter contexts actually reduce it. Real data beats industry folklore every time.

Related

← Back to llmcfo.com