OpenTelemetry GenAI cost tracking.

Operations guide · June 11, 2026

By the LLM CFO team

OpenTelemetry GenAI semantic conventions define a standard set of attributes to track LLM requests: provider, model, token counts, and cache behavior. Using them as your telemetry foundation makes cost derivation, vendor swaps, and multi-provider reconciliation straightforward.

What the GenAI conventions are

OpenTelemetry's GenAI conventions are a published schema for instrumenting generative AI workloads. They define which span and event attributes to record so that LLM requests look the same in any backend that reads the schema. Adopting them means you do not have to invent your own token-tracking fields: you instrument once against the standard and export to whichever backend you choose.

Key attributes for cost tracking

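gen_ai.system. The provider name (e.g., "openai", "anthropic", "aws.bedrock"). Recent versions of the conventions rename this attribute to gen_ai.provider.name, so check which version your instrumentation emits.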
gen_ai.request.model. The model sent in the request (e.g., "gpt-4o", "claude-3-opus").
gen_ai.response.model. The actual model that processed the request (often identical, sometimes a fallback or version alias).
gen_ai.usage.input_tokens. Count of input tokens consumed.
gen_ai.usage.output_tokens. Count of output tokens generated.
gen_ai.usage.cache_creation_input_tokens. Tokens written to cache (with cache-write cost).
gen_ai.usage.cache_read_input_tokens. Tokens read from cache (with cache-read discount).
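gen_ai.operation.name. Semantic operation name (e.g., "chat", "text_completion", "embeddings").
gen_ai.user. Identifier for the user or account making the request (optional but recommended for chargeback). If the version of the conventions you target does not define it, treat it as a custom attribute and confirm that your backend preserves it.
gen_ai.request.temperature, gen_ai.request.top_p, gen_ai.request.frequency_penalty, gen_ai.request.presence_penalty. Sampling parameters. They do not change unit prices, but they can change output length and therefore output token counts.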
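As a minimal sketch of what recording these fields could look like, the helper below builds a flat attribute dictionary from one request's usage numbers. The function name `genai_cost_attributes` and the sample values are hypothetical; the keys are the attribute names listed above.

```python
def genai_cost_attributes(
    provider: str,
    request_model: str,
    response_model: str,
    input_tokens: int,
    output_tokens: int,
    cache_creation_input_tokens: int = 0,
    cache_read_input_tokens: int = 0,
    operation: str = "chat",
) -> dict:
    """Return a flat dict keyed by the attribute names listed above."""
    return {
        "gen_ai.system": provider,
        "gen_ai.operation.name": operation,
        "gen_ai.request.model": request_model,
        "gen_ai.response.model": response_model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
        "gen_ai.usage.cache_creation_input_tokens": cache_creation_input_tokens,
        "gen_ai.usage.cache_read_input_tokens": cache_read_input_tokens,
    }


# Sample values are illustrative only.
attrs = genai_cost_attributes(
    provider="anthropic",
    request_model="claude-3-opus",
    response_model="claude-3-opus",
    input_tokens=1_000,
    output_tokens=500,
    cache_read_input_tokens=10_000,
)
```

With the OpenTelemetry Python API, a dictionary like this can be attached to the active span with `span.set_attributes(attrs)`.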

Deriving cost from attributes

Once you have these attributes in your spans, calculating cost becomes a lookup join:

Build a price table. Keys are (provider, model, token_type), values are cost-per-million-tokens. Separate rows for input, output, cache-write, and cache-read to account for provider discounts.
Join the span to the price table. Use gen_ai.system, gen_ai.response.model, and token type to look up the unit price.
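Apply token counts separately. Multiply gen_ai.usage.input_tokens by the input price, gen_ai.usage.output_tokens by the output price, gen_ai.usage.cache_creation_input_tokens by the cache-write price, and gen_ai.usage.cache_read_input_tokens by the cache-read price. Cache pricing varies by provider and model: some providers bill cache writes at a premium over the input price while others do not bill them separately, and cache-read discounts have ranged from roughly 50% (OpenAI) to roughly 90% (Anthropic). Check the current rate card rather than hard-coding a ratio.
Sum and emit. Total cost = (input_tokens × input_price) + (output_tokens × output_price) + (cache_creation_input_tokens × cache_write_price) + (cache_read_input_tokens × cache_read_price), with each product divided by 1,000,000 because prices are quoted per million tokens. Confirm whether the provider's input token count already includes cached tokens; if it does, subtract them before applying the input price so they are not billed twice.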
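The steps above can be sketched as a small lookup-and-sum function. Everything here is illustrative: the provider and model names are placeholders, the prices are made-up numbers rather than any vendor's rate card, and the sketch assumes the input token count excludes cached tokens.

```python
from decimal import Decimal

# Hypothetical rate card in USD per million tokens (placeholder values).
PRICE_TABLE = {
    ("example-provider", "example-model-v1", "input"): Decimal("3.00"),
    ("example-provider", "example-model-v1", "output"): Decimal("15.00"),
    ("example-provider", "example-model-v1", "cache_write"): Decimal("3.75"),
    ("example-provider", "example-model-v1", "cache_read"): Decimal("0.30"),
}

# Which span attribute feeds which row of the price table.
TOKEN_ATTRIBUTES = {
    "input": "gen_ai.usage.input_tokens",
    "output": "gen_ai.usage.output_tokens",
    "cache_write": "gen_ai.usage.cache_creation_input_tokens",
    "cache_read": "gen_ai.usage.cache_read_input_tokens",
}

MILLION = Decimal(1_000_000)


def span_cost(attrs: dict) -> Decimal:
    """Price one span; assumes input_tokens does NOT include cached tokens."""
    provider = attrs["gen_ai.system"]
    model = attrs["gen_ai.response.model"]
    total = Decimal(0)
    for token_type, attr_name in TOKEN_ATTRIBUTES.items():
        tokens = attrs.get(attr_name, 0)
        if tokens == 0:
            continue
        # A KeyError here means the rate card lacks this model or token type,
        # which is safer than silently pricing the tokens at zero.
        price = PRICE_TABLE[(provider, model, token_type)]
        total += Decimal(tokens) * price / MILLION
    return total


example_span = {
    "gen_ai.system": "example-provider",
    "gen_ai.response.model": "example-model-v1",
    "gen_ai.usage.input_tokens": 1_000,
    "gen_ai.usage.output_tokens": 500,
    "gen_ai.usage.cache_creation_input_tokens": 2_000,
    "gen_ai.usage.cache_read_input_tokens": 10_000,
}
cost = span_cost(example_span)  # 0.003 + 0.0075 + 0.0075 + 0.003 = 0.021 USD
```

Decimal is used instead of float so that per-span costs sum without rounding drift when aggregated for finance.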

Where to export these spans

Langfuse. Native GenAI span support; cost derivation is built-in if you populate the usage attributes.
Helicone. Captures model and token fields; you can configure custom rate cards for cost calculation.
Vanilla OpenTelemetry collectors. Export to a data warehouse (BigQuery, Snowflake, PostgreSQL) for manual reconciliation and BI dashboarding.
Application logs. Emit structured JSON with the GenAI attributes to stdout or a log sink; downstream cost calculations read the logs.
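For the application-log option, a minimal sketch of emitting one JSON line per request might look like the following. The function name `emit_genai_log` and the `event` and `timestamp` fields are assumptions for illustration, not part of the conventions.

```python
import json
import sys
from datetime import datetime, timezone


def emit_genai_log(attrs: dict, stream=sys.stdout) -> str:
    """Write one JSON line carrying the GenAI attributes and return it."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),  # hypothetical field
        "event": "gen_ai.request",  # hypothetical event label
        **attrs,
    }
    line = json.dumps(record, sort_keys=True)
    stream.write(line + "\n")
    return line


line = emit_genai_log(
    {
        "gen_ai.system": "openai",
        "gen_ai.response.model": "gpt-4o",
        "gen_ai.usage.input_tokens": 1_200,
        "gen_ai.usage.output_tokens": 300,
    }
)
```

Keeping the attribute names unchanged as JSON keys means a downstream cost job can use the same price-table join it would apply to spans.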

Common pitfalls

Model version granularity. Pricing can differ between dated snapshots of the same model (e.g., gpt-4o-2024-05-13 vs gpt-4o-2024-08-06), not only between model families. Record gen_ai.response.model with the full version string; do not abbreviate or normalize unless you have verified the prices are identical.
Missing cache fields. If your instrumentation does not capture cache_read_input_tokens separately, one of two errors follows. If cache reads are folded into input tokens, they are priced at the full input rate and cost is overstated. If they are dropped, token volume is undercounted and cost is understated. With cache discounts at 50–90%, either error materially distorts margin calculations.
Sampling. Observability tools often sample high-volume traces to reduce storage cost. Finance requires unsampled token counts. Either disable sampling for GenAI spans or ensure sampled spans are extrapolated before cost aggregation.
Mixing token types. Never sum input, output, and cache-read tokens into a single "total tokens" field for cost; they have different unit prices. Keep them as separate attributes.
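If sampling cannot be disabled, one common way to extrapolate is to weight each sampled span by the inverse of its sampling probability. The sketch below assumes each span record carries a hypothetical `sample_rate` field (the probability with which it was kept); the result is an estimate, not an invoice-grade count.

```python
def extrapolated_tokens(sampled_spans: list, attr_name: str) -> float:
    """Estimate the unsampled token total by inverse-probability weighting."""
    total = 0.0
    for span in sampled_spans:
        rate = span["sample_rate"]  # hypothetical field: probability in (0, 1]
        if not 0 < rate <= 1:
            raise ValueError(f"invalid sample_rate: {rate}")
        # A span kept with probability 0.25 stands in for four spans.
        total += span.get(attr_name, 0) / rate
    return total


spans = [
    {"sample_rate": 0.25, "gen_ai.usage.input_tokens": 100},
    {"sample_rate": 1.0, "gen_ai.usage.input_tokens": 50},
]
estimate = extrapolated_tokens(spans, "gen_ai.usage.input_tokens")  # 450.0
```

Extrapolate each token type separately, for the same reason the token types are priced separately.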

Cost tracking rule: if your telemetry does not distinguish cache-read tokens from regular input tokens, your cost dashboard will systematically overstate the per-token expense and miss the largest optimization opportunity.
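A short worked example shows the size of the distortion. The prices are placeholder values chosen to reflect a 90% cache-read discount, not a real rate card.

```python
from decimal import Decimal

# Placeholder prices in USD per million tokens (90% cache-read discount).
INPUT_PRICE = Decimal("3.00")
CACHE_READ_PRICE = Decimal("0.30")

uncached_input_tokens = 1_000
cache_read_tokens = 10_000

# Correct: each token type is priced at its own rate.
true_cost = (
    uncached_input_tokens * INPUT_PRICE + cache_read_tokens * CACHE_READ_PRICE
) / 1_000_000

# Incorrect: cache reads are folded into input tokens at the full input rate.
lumped_cost = (uncached_input_tokens + cache_read_tokens) * INPUT_PRICE / 1_000_000

overstatement = lumped_cost / true_cost  # 0.033 / 0.006 = 5.5
```

In this example the dashboard would report 5.5 times the actual input spend, which is the kind of gap that hides how much caching is already saving.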

Integration with observability stacks

Many observability platforms (for example Datadog and New Relic) accept OpenTelemetry data over OTLP, and some understand the GenAI conventions natively or publish mappings for them. Start by instrumenting your LLM client library to emit the attributes through the OpenTelemetry SDK, send them to an OpenTelemetry Collector, and export via OTLP to your chosen backend. If your backend does not natively understand the GenAI schema, define a simple transformation rule that maps the attributes to your internal cost model.
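A transformation rule of this kind can be as simple as a rename table. In the sketch below, `CostRecord` and its field names stand in for a hypothetical internal cost model; only the attribute names on the left come from the conventions discussed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostRecord:
    """Hypothetical internal cost-model row."""
    provider: str
    model: str
    input_tokens: int = 0
    output_tokens: int = 0
    cache_write_tokens: int = 0
    cache_read_tokens: int = 0


# GenAI attribute name -> internal field name.
ATTRIBUTE_TO_FIELD = {
    "gen_ai.system": "provider",
    "gen_ai.response.model": "model",
    "gen_ai.usage.input_tokens": "input_tokens",
    "gen_ai.usage.output_tokens": "output_tokens",
    "gen_ai.usage.cache_creation_input_tokens": "cache_write_tokens",
    "gen_ai.usage.cache_read_input_tokens": "cache_read_tokens",
}


def to_cost_record(attrs: dict) -> CostRecord:
    """Map span attributes to a CostRecord; missing token counts default to 0."""
    fields = {
        field: attrs[name]
        for name, field in ATTRIBUTE_TO_FIELD.items()
        if name in attrs
    }
    # Raises TypeError if provider or model is absent, since those are required.
    return CostRecord(**fields)


record = to_cost_record(
    {
        "gen_ai.system": "openai",
        "gen_ai.response.model": "gpt-4o",
        "gen_ai.usage.input_tokens": 800,
        "gen_ai.usage.output_tokens": 200,
    }
)
```

Attributes that are not in the rename table are ignored, so the rule keeps working when the instrumentation starts emitting additional fields.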
