How to track AI token usage.

Operations guide · 11 June 2026

By the LLM CFO team

Token usage tracking is the practice of recording and categorizing every type of token that flows through your LLM requests. Without it, cost baselines are unreliable, the effect of caching cannot be measured, and you cannot tell growth in reasoning tokens apart from growth in ordinary input and output tokens.

Token types and their cost structure

Essential fields to log on every request

Aggregation levels that matter

Structure your telemetry to support rollups at each level:

  1. Per-request. The atomic fact: one call to the provider with timestamp, tokens, model, and tags.
  2. Per-feature or endpoint. Sum requests by the feature that triggered them. This is where most quick wins live.
  3. Per-customer or account. Necessary for invoicing, quota enforcement, and unit economics.
  4. Per-invoice period. Match your provider statement. Variance here usually signals missing requests or timestamp misalignment.
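
The four levels above can be sketched as one per-request record plus a generic rollup keyed by feature, customer, or period. This is a minimal illustration, not a reference implementation: the record fields, the `rollup` helper, and the sample data are all hypothetical, and the monthly UTC bucket is an assumption that should be replaced with whatever period and timezone your provider statement uses.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class RequestRecord:
    """One provider call: the per-request atomic fact (field names are illustrative)."""
    timestamp: datetime
    model: str
    feature: str
    customer_id: str
    input_tokens: int
    output_tokens: int


def rollup(records, key_fn):
    """Sum requests and tokens per group, keeping input and output counts separate."""
    totals = defaultdict(lambda: {"requests": 0, "input_tokens": 0, "output_tokens": 0})
    for r in records:
        bucket = totals[key_fn(r)]
        bucket["requests"] += 1
        bucket["input_tokens"] += r.input_tokens
        bucket["output_tokens"] += r.output_tokens
    return dict(totals)


# Made-up sample data; two requests sit one minute either side of a month boundary.
records = [
    RequestRecord(datetime(2026, 5, 31, 23, 59, tzinfo=timezone.utc), "model-a", "search", "acme", 1200, 300),
    RequestRecord(datetime(2026, 6, 1, 0, 1, tzinfo=timezone.utc), "model-a", "search", "globex", 800, 200),
    RequestRecord(datetime(2026, 6, 1, 9, 30, tzinfo=timezone.utc), "model-b", "summarize", "acme", 5000, 700),
]

by_feature = rollup(records, lambda r: r.feature)
by_customer = rollup(records, lambda r: r.customer_id)
# Assumption: the invoice period is a UTC calendar month.
by_period = rollup(records, lambda r: r.timestamp.astimezone(timezone.utc).strftime("%Y-%m"))
```

The two requests near midnight land in different periods, which is the kind of timestamp misalignment that shows up as variance against the provider statement if your logs and the statement use different timezones.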

Common pitfalls that corrupt your baseline

Tracking rule: if you cannot disaggregate your daily cost growth into (prompt length × request volume), (model mix shift), (reasoning token growth), and (cache hit rate change), your telemetry is incomplete.
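
One way to check telemetry against this rule is to confirm that each driver can be computed from the logged fields for two consecutive days. The sketch below is illustrative only. It assumes each request is a dict with the hypothetical keys shown, that `input_tokens` includes cached tokens, and that `output_tokens` includes reasoning tokens; providers differ on both points. It reports the change in each driver and does not attribute dollar amounts to them.

```python
from collections import Counter


def daily_drivers(requests):
    """Summarize one day's requests into the drivers named in the tracking rule.

    Assumes input_tokens includes cached tokens and output_tokens includes
    reasoning tokens; adjust the ratios if your provider reports them differently.
    """
    if not requests:
        raise ValueError("need at least one request to compute drivers")
    n = len(requests)
    input_total = sum(r["input_tokens"] for r in requests)
    output_total = sum(r["output_tokens"] for r in requests)
    cache_read = sum(r["cache_read_input_tokens"] for r in requests)
    reasoning = sum(r["reasoning_tokens"] for r in requests)
    model_counts = Counter(r["model"] for r in requests)
    return {
        "request_volume": n,
        "avg_prompt_tokens": input_total / n,
        "model_mix": {m: c / n for m, c in model_counts.items()},
        # Share of output tokens spent on reasoning.
        "reasoning_share": reasoning / output_total if output_total else 0.0,
        # Token-weighted cache hit rate: share of input tokens read from cache.
        "cache_hit_rate": cache_read / input_total if input_total else 0.0,
    }


def driver_changes(before, after):
    """Day-over-day change in each driver; model mix is reported per model."""
    models = sorted(set(before["model_mix"]) | set(after["model_mix"]))
    return {
        "request_volume": after["request_volume"] - before["request_volume"],
        "avg_prompt_tokens": after["avg_prompt_tokens"] - before["avg_prompt_tokens"],
        "model_mix": {
            m: after["model_mix"].get(m, 0.0) - before["model_mix"].get(m, 0.0)
            for m in models
        },
        "reasoning_share": after["reasoning_share"] - before["reasoning_share"],
        "cache_hit_rate": after["cache_hit_rate"] - before["cache_hit_rate"],
    }


def make_request(model, input_tokens, cache_read, output_tokens, reasoning):
    """Build one made-up request record for the example."""
    return {
        "model": model,
        "input_tokens": input_tokens,
        "cache_read_input_tokens": cache_read,
        "output_tokens": output_tokens,
        "reasoning_tokens": reasoning,
    }


yesterday = [make_request("model-a", 1000, 500, 200, 0) for _ in range(2)]
today = (
    [make_request("model-a", 1000, 250, 400, 0) for _ in range(2)]
    + [make_request("model-b", 1000, 250, 400, 200) for _ in range(2)]
)
changes = driver_changes(daily_drivers(yesterday), daily_drivers(today))
```

If any of these values cannot be computed because a field was never logged, that missing field is the gap the tracking rule is pointing at.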

Practical tools and schema

OpenTelemetry's GenAI semantic conventions are a strong starting point because they standardize attribute names (for example gen_ai.request.model, gen_ai.usage.input_tokens, and gen_ai.usage.output_tokens) and reduce vendor lock-in. Cache creation and cache read input tokens should be recorded as their own fields alongside these. Tools such as Langfuse, Helicone, and LiteLLM offer OpenTelemetry integrations, though coverage of individual attributes varies by tool and version. For teams with custom logging, the schema should mirror these fields: separate buckets for each token type, not a sum.
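
For custom logging, a schema with one bucket per token type might look like the sketch below. The class and field names are illustrative plain names chosen for this example, not official OpenTelemetry attribute names. Treating a cache field as `None` when the provider does not report it, rather than defaulting it to zero, is a design assumption of this sketch.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

_TOKEN_FIELDS = (
    "input_tokens",
    "output_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
)


@dataclass(frozen=True)
class TokenUsage:
    """Per-request usage with one field per token type and no pre-summed total."""
    model: str
    input_tokens: int
    output_tokens: int
    # None means "not reported by the provider", which is different from 0.
    cache_creation_input_tokens: Optional[int] = None
    cache_read_input_tokens: Optional[int] = None

    def __post_init__(self):
        # Reject negative counts early so bad records never reach the baseline.
        for name in _TOKEN_FIELDS:
            value = getattr(self, name)
            if value is not None and value < 0:
                raise ValueError(f"{name} must be non-negative, got {value}")

    def to_json(self) -> str:
        """Serialize to a flat JSON object suitable for a structured log line."""
        return json.dumps(asdict(self), sort_keys=True)


usage = TokenUsage(model="model-a", input_tokens=1200, output_tokens=300,
                   cache_read_input_tokens=800)
log_line = usage.to_json()
```

Keeping the buckets separate means any total can be derived later under whichever pricing rules apply, whereas a logged sum cannot be split back into its parts.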
