RESEARCH · COST DRIVER

Reasoning tokens are the hidden bill.

Cost note · 4 May 2026

By the LLM CFO team

A lot of teams still think in a 2024 pricing model: input tokens, visible output tokens, done. That mental model is getting outdated. Reasoning models add an extra cost center because the model spends tokens reasoning before it answers, and those tokens are billed as output even though you never see them.

What changed

OpenAI's reasoning-model guidance is explicit: reasoning tokens are not visible through the API, but they still occupy space in the context window and are billed as output tokens. That matters because output tokens are usually priced higher per token than input tokens, so they are the expensive side of the bill. Once teams start routing harder traffic to reasoning models, the economics change fast.
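To make the billing mechanics concrete, here is a minimal Python sketch of per-request cost when part of the billed output is hidden reasoning. The prices, the `Usage` class, and its field names are hypothetical placeholders. OpenAI's usage objects do report a reasoning-token count (for example `completion_tokens_details.reasoning_tokens` in Chat Completions), so map your own provider's fields in. The sketch assumes, as the guidance above says, that reasoning tokens are counted inside billed output.

```python
from dataclasses import dataclass

# Hypothetical per-million-token prices, for illustration only.
# Substitute your provider's current price sheet.
PRICE_IN_PER_M = 2.00
PRICE_OUT_PER_M = 8.00


@dataclass
class Usage:
    input_tokens: int
    output_tokens: int     # billed output: visible answer + hidden reasoning
    reasoning_tokens: int  # the part of output_tokens the API never shows you


def request_cost(u: Usage) -> dict:
    """Split one request's cost into what the user sees and what they don't."""
    visible = u.output_tokens - u.reasoning_tokens
    input_cost = u.input_tokens * PRICE_IN_PER_M / 1_000_000
    # Reasoning tokens are billed at the output rate, same as the visible answer.
    output_cost = u.output_tokens * PRICE_OUT_PER_M / 1_000_000
    reasoning_cost = u.reasoning_tokens * PRICE_OUT_PER_M / 1_000_000
    total = input_cost + output_cost
    return {
        "visible_output_tokens": visible,
        "total_cost": total,
        "reasoning_share": reasoning_cost / total,
    }


# A short answer (200 visible tokens) that needed 3,000 reasoning tokens.
print(request_cost(Usage(input_tokens=1_000, output_tokens=3_200, reasoning_tokens=3_000)))
```

In this made-up example the user sees a 200-token answer, yet the 3,000 hidden reasoning tokens account for roughly 87% of the request's cost.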

Why this catches teams off guard

Most dashboards still emphasize prompt length because input size is easy to see and easy to blame. Reasoning-token cost is harder to spot. The user sees a short answer and assumes the request was cheap. Finance sees output spend drift upward and assumes the model is being verbose. Sometimes the real source is hidden reasoning that happens between the prompt and the visible answer.

Where it usually shows up

Aggressive routing thresholds. Too much traffic gets escalated to premium reasoning paths.
Agent loops. The model re-thinks each step of a long workflow.
Overusing reasoning models. Work that should stay on a smaller model ends up on a heavier path by default.
Loose output budgets. Teams optimize the prompt but never constrain total output and reasoning headroom.
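The routing-threshold problem is mostly arithmetic: when the premium path costs far more per request, a small change in escalation rate moves the blended cost a lot. A minimal sketch, assuming hypothetical average per-request costs for a cheap tier and a premium reasoning tier:

```python
# Hypothetical average cost per request on each tier, in dollars.
CHEAP_COST = 0.0005
PREMIUM_COST = 0.0276


def blended_cost(escalation_rate: float) -> float:
    """Average cost per request when a fraction of traffic goes to the premium path."""
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be in [0, 1]")
    return (1 - escalation_rate) * CHEAP_COST + escalation_rate * PREMIUM_COST


for rate in (0.05, 0.20, 0.50):
    # Report per 1,000 requests so the differences are easy to read.
    print(f"{rate:.0%} escalated -> ${blended_cost(rate) * 1000:.2f} per 1k requests")
```

With these made-up numbers, moving from 5% to 20% escalation more than triples the cost per thousand requests, even though most traffic still goes to the cheap tier.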

How to manage it

Separate model tiers by task value. Not every endpoint deserves deep reasoning.
Track output spend by endpoint and model. That is the most direct operational signal when the reasoning itself is invisible. Where the API's usage data reports a reasoning-token count, log it alongside.
Reserve reasoning for real complexity. Classification, extraction, and simple transformations usually do not need it.
Use smaller models for pre-routing. Let a cheap model decide whether the expensive model is necessary.
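One way to combine the last two ideas (reserve reasoning for real complexity, pre-route with a cheap model) is a two-stage router. This is a sketch only: the model names are hypothetical, and `toy_pre_router` is a keyword placeholder standing in for a call to a small model, not a recommended classifier.

```python
from typing import Callable

# Task types that usually do not need deep reasoning.
NO_REASONING_TASKS = {"classification", "extraction", "transformation"}

# Hypothetical model identifiers; substitute your own tiers.
SMALL_MODEL = "small-model"
REASONING_MODEL = "reasoning-model"


def route(task_type: str, prompt: str, needs_reasoning: Callable[[str], bool]) -> str:
    """Pick a model tier for one request."""
    if task_type in NO_REASONING_TASKS:
        # Low-complexity work stays on the small model without consulting the pre-router.
        return SMALL_MODEL
    # `needs_reasoning` stands in for a cheap model's yes/no judgment on the prompt.
    return REASONING_MODEL if needs_reasoning(prompt) else SMALL_MODEL


def toy_pre_router(prompt: str) -> bool:
    """Placeholder keyword check; a real pre-router would be a small model call."""
    return any(k in prompt.lower() for k in ("prove", "debug", "multi-step", "trade-off"))


print(route("extraction", "Pull the invoice total", toy_pre_router))       # small-model
print(route("analysis", "Debug this multi-step failure", toy_pre_router))  # reasoning-model
```

The task-type check runs first so that the cheapest requests never pay even for the pre-router call.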

Practical rule: if your output spend rises faster than visible answer length, suspect hidden reasoning before you blame the user prompt.
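The practical rule can be turned into a periodic check. The sketch below computes billed output tokens per visible output token for each endpoint and model, then flags pairs where that ratio jumped. The record keys, the sample data, and the 1.25x tolerance are all hypothetical. `visible_tokens` could come from subtracting a reported reasoning-token count from billed output, or from tokenizing the returned text.

```python
from collections import defaultdict


def output_ratio(records):
    """Billed output tokens per visible output token, keyed by (endpoint, model)."""
    billed = defaultdict(int)
    visible = defaultdict(int)
    for r in records:
        key = (r["endpoint"], r["model"])
        billed[key] += r["output_tokens"]    # billed output, includes hidden reasoning
        visible[key] += r["visible_tokens"]  # tokens the user actually sees
    return {k: billed[k] / visible[k] for k in billed if visible[k] > 0}


def suspect_hidden_reasoning(before_records, after_records, tolerance=1.25):
    """Return (endpoint, model) pairs whose billed-to-visible ratio grew by more than `tolerance`x."""
    before, after = output_ratio(before_records), output_ratio(after_records)
    return sorted(k for k in after if k in before and after[k] > before[k] * tolerance)


# Hypothetical weekly aggregates.
last_week = [
    {"endpoint": "/support", "model": "reasoning-model", "output_tokens": 4_000, "visible_tokens": 1_000},
    {"endpoint": "/tagging", "model": "small-model", "output_tokens": 500, "visible_tokens": 500},
]
this_week = [
    {"endpoint": "/support", "model": "reasoning-model", "output_tokens": 9_000, "visible_tokens": 1_100},
    {"endpoint": "/tagging", "model": "small-model", "output_tokens": 800, "visible_tokens": 800},
]

print(suspect_hidden_reasoning(last_week, this_week))  # [('/support', 'reasoning-model')]
```

Here `/support` went from 4 billed tokens per visible token to about 8, while `/tagging` stayed flat, so only the support endpoint is flagged for a closer look at reasoning usage.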

What this means for cost optimization

Reasoning tokens do not make reasoning models bad. They make sloppy routing expensive. The right question is not "should we use reasoning?" It is "which requests are valuable enough to justify invisible output spend?" That pushes teams back toward the basics: better task segmentation, tighter routing, and explicit economic thresholds.
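An explicit economic threshold can be as simple as comparing the expected value gained by escalating with the extra spend it causes. A minimal sketch, assuming you can estimate a per-request value for a correct answer and a success rate for each tier on your own traffic; every number in the example is invented for illustration.

```python
def worth_reasoning(value_per_success: float,
                    success_rate_small: float,
                    success_rate_reasoning: float,
                    cost_small: float,
                    cost_reasoning: float) -> bool:
    """Escalate only when the expected value gained exceeds the extra spend.

    All inputs are per-request estimates measured on your own traffic (hypothetical here).
    """
    # Expected extra value from the reasoning tier's higher success rate.
    value_gain = value_per_success * (success_rate_reasoning - success_rate_small)
    # Extra spend per request, including hidden reasoning tokens billed as output.
    extra_cost = cost_reasoning - cost_small
    return value_gain > extra_cost


# Invented examples: cheap ticket triage vs. high-value contract review.
print(worth_reasoning(0.01, 0.95, 0.97, cost_small=0.0005, cost_reasoning=0.0276))  # False
print(worth_reasoning(5.00, 0.70, 0.85, cost_small=0.0005, cost_reasoning=0.0276))  # True
```

Under these assumptions, low-value triage stays on the small model even though reasoning is slightly more accurate, while high-value review clears the bar easily.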
