What is LLM cost per request?

LLM cost per request is the total cost to serve one API call. It is the atomic unit of LLM profitability because it lets teams compare features, models, customers, and request types instead of looking only at total spend.

How do I calculate cost per request?

Calculate input tokens multiplied by the input price, plus output tokens multiplied by the output price, plus cache-read tokens multiplied by their discounted price. The pricing table must be keyed by provider, model, and token type because input, output, cache, Azure, Bedrock, and native provider prices can all differ.

Why are average LLM request costs misleading?

Averages hide the long tail, model mix shifts, feature-level profitability, and caching effects. One customer with very long prompts or one endpoint moving to a premium model can be invisible in the aggregate until the bill arrives. Segmenting by feature, model, customer, and request type is what makes the metric operational.

What reduces cost per request?

The main levers are model choice, prompt length, output caps, prompt caching, fewer retries and fallbacks, and batch processing for latency-insensitive work. Measure against your own 30-day baseline because each team's cost per request depends on its models, context lengths, output requirements, and caching strategy.

RESEARCH · OPERATIONS

LLM cost per request.

Operations guide · July 1, 2026

By the LLM CFO team

LLM cost per request is the unit economics metric that measures the total cost to serve one API call. It is the atomic unit of LLM profitability. If you only know total spend and total requests, you cannot distinguish between a cheap mass-market feature and an expensive specialist workflow;yet they require opposite optimization strategies.

The formula

Cost per request is deceptively simple to state and challenging to compute correctly. The formula is:

cost_per_request =
(input_tokens × input_price_per_1m) +
(output_tokens × output_price_per_1m) +
(cache_read_tokens × discounted_price_per_1m)

The trick is the pricing table. It must be keyed by provider + model + token type because OpenAI's GPT-4o input and output prices differ, Anthropic's cache-read discount is ~90% (not 50%), and the same model name on Azure or Bedrock has different pricing than native.

Why averages are dangerous

Mean hides the long tail. If one customer submits 100k-token prompts and the rest submit 1k tokens, the mean cost per request is misleading. The p95 cost per request matters more for margin planning.
Model mix shift is invisible at the aggregate. Switching one endpoint from GPT-3.5 to GPT-4 doubles cost per request for that endpoint. If it is 5% of your traffic, the overall cost per request rise is barely visible until the bill arrives.
Feature profitability inverts with scale. A feature that costs $0.001 per request and generates $0.002 revenue is profitable at 100k users. At 1M users, if operational costs are fixed, it becomes a loss leader;but only if you segment by feature.
Caching returns are invisible to the aggregate. Implementing prompt caching on one endpoint might reduce its cost per request by 60%, but if that endpoint is only 3% of your traffic, you will not see it in the company-wide metric.

Segment by feature, model, customer, and request type

The operational power of cost per request comes from segmentation:

By feature or endpoint. Which workflows are expensive? Which are cheap? Are your high-margin features using cheap models?
By model. What is the actual cost per request for GPT-4, Sonnet, and Haiku in your actual prompt distribution? Real data beats benchmark sheets.
By customer or workspace. Do heavy users trigger longer contexts or more retries? Does one customer pay more per request than another because they use different models?
By request type. Is your search feature cheaper than your summarization feature? Which retry loops are the most expensive?

Use it for margin analysis

Cost per request is the denominator of unit economics. The numerator is revenue per request, which you compute the same way:

margin_per_request = revenue_per_request − cost_per_request

For a feature billed at $0.01 per request, a cost per request of $0.004 leaves a $0.006 contribution. For the same billed price, if cost per request rises to $0.008, the contribution drops to $0.002. At scale, that difference is material. Tools like Langfuse and Helicone expose this data directly; internal deployments can compute it from provider billing exports plus application logs.

What moves cost per request

Model choice. Switching to a smaller model is the fastest lever. GPT-3.5 costs ~12% of GPT-4o input price.
Prompt length. Longer system prompts, few-shot examples, and context windows all scale input tokens linearly with cost.
Output caps. Limiting max completion tokens is a hard ceiling on output cost per request.
Prompt caching. OpenAI and Anthropic both discount cached tokens heavily (Anthropic ~90%, OpenAI ~50%). For stable context (system prompts, retrieved documents, code samples), caching returns 5–10× ROI.
Retries and fallbacks. Each retry or model fallback adds cost per request. Minimize failed requests and timeouts.
Batch processing. For latency-insensitive work, batch API endpoints usually have lower per-token pricing than synchronous calls.

Unit economics rule: if your margin per request is below 50% of your gross margin per user, that feature is a loss leader masquerading as a commodity.

How to measure it in practice

Start with your observability layer. If you use OpenTelemetry GenAI conventions, you have provider, model, input tokens, output tokens, and cache tokens on every span. Multiply each by the price table. If you use LiteLLM, it tracks cost per request natively. If you self-host, export provider usage CSVs (OpenAI, Anthropic, Bedrock, Azure, Vertex) and join them to application logs by timestamp and model. The result is a cost column you can segment by any business dimension.

Directional guidance only: compute your own baseline

Every team's cost per request is shaped by their own models, context length, output requirements, and caching strategy. Publishing a benchmark number like "Sonnet costs $0.002 per request" is tempting but useless: it depends on whether you cache, how long your prompts are, whether you retry, and which token ratios apply to your workload. Instead, use the formula above, gather 30 days of real telemetry, compute the p50 and p95 cost per request for each feature, and use those as your baseline. Then measure whether prompt caching, smaller models, or shorter contexts actually reduce it. Real data beats industry folklore every time.

← Back to llmcfo.com

LLM cost per request.

The formula

Why averages are dangerous

Segment by feature, model, customer, and request type

Use it for margin analysis

What moves cost per request

How to measure it in practice

Directional guidance only: compute your own baseline

Related