Reasoning tokens are the hidden bill.
Cost note · 4 May 2026
Many teams still think in a 2024 pricing model: input tokens, visible output tokens, done. That mental model is out of date. Reasoning models add a cost center of their own, because the model can spend tokens thinking, and those tokens are billed as output even when you never see them.
What changed
OpenAI's reasoning-model guidance is explicit: reasoning tokens are not visible through the API, but they still consume context space and are billed as output tokens. What stays hidden is the content of the reasoning; the count is reported in the usage data returned with each response. That matters because output tokens are usually the expensive side of the bill. Once teams start routing harder traffic to reasoning models, the economics change fast.
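A minimal sketch of the arithmetic shows how far the bill can move. The prices and token counts below are illustrative assumptions, not any provider's actual rates, and `request_cost` is a hypothetical helper.

```python
def request_cost(input_tokens, visible_output_tokens, reasoning_tokens,
                 input_price_per_m, output_price_per_m):
    """Return the cost in dollars of one request.

    Reasoning tokens are billed at the output rate, so they are added
    to the visible output tokens before pricing.
    """
    billed_output = visible_output_tokens + reasoning_tokens
    return (input_tokens * input_price_per_m
            + billed_output * output_price_per_m) / 1_000_000


# Assumed prices: $2 per 1M input tokens, $8 per 1M output tokens.
naive = request_cost(1_000, 200, 0, 2.0, 8.0)        # ignores reasoning
actual = request_cost(1_000, 200, 3_000, 2.0, 8.0)   # 3,000 hidden tokens

print(f"naive estimate: ${naive:.4f}")
print(f"with reasoning: ${actual:.4f}")
print(f"ratio: {actual / naive:.1f}x")
```

Under these assumed numbers the visible answer is the same 200 tokens in both cases; the difference comes entirely from the reasoning tokens priced at the output rate.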
Why this catches teams off guard
Most dashboards still emphasize prompt length because input size is easy to see and easy to blame. Reasoning-token cost is harder to spot. The user sees a short answer and assumes the request was cheap. Finance sees output spend drift upward and assumes the model is over-talking. Sometimes the real source is hidden reasoning work in the middle.
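One way to make that middle layer visible on a dashboard is to split billed output into visible and reasoning tokens. The sketch below assumes a usage payload shaped like the one returned by OpenAI's Chat Completions API, where `completion_tokens` is the billed total and `completion_tokens_details.reasoning_tokens` is the reasoning portion; other providers may name or structure these fields differently. `split_output_tokens` is a hypothetical helper.

```python
def split_output_tokens(usage):
    """Split billed output tokens into (visible, reasoning).

    `usage` is assumed to be a dict shaped like the Chat Completions
    usage object. Missing or null fields are treated as zero.
    """
    total = usage.get("completion_tokens", 0) or 0
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens", 0) or 0
    return total - reasoning, reasoning


# Example payload with made-up numbers.
usage = {
    "prompt_tokens": 1_000,
    "completion_tokens": 3_200,
    "completion_tokens_details": {"reasoning_tokens": 3_000},
}
visible, reasoning = split_output_tokens(usage)
print(f"visible={visible} reasoning={reasoning}")
```

Charting the two numbers separately distinguishes a model that is over-talking from one that is over-thinking.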
Where it usually shows up
- Badly tuned routing thresholds. A rigid or overly cautious escalation rule sends too much traffic to premium reasoning paths.
- Agent loops. The model re-thinks each step of a long workflow, so reasoning spend grows with the number of steps.
- Overusing reasoning models. Work that should stay on a smaller model ends up on a heavier path by default.
- Loose output budgets. Teams trim the prompt but never cap total output, which has to cover both the visible answer and the reasoning headroom behind it.
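The agent-loop and output-budget items can be sketched together as a back-of-the-envelope estimate. This assumes every step uses roughly the same number of visible and reasoning tokens, which real workflows will not; the function names and numbers are hypothetical.

```python
def agent_run_output_tokens(steps, visible_per_step, reasoning_per_step):
    """Estimate billed output tokens for a multi-step agent run.

    Assumes uniform per-step usage; reasoning is re-paid on every step.
    """
    return steps * (visible_per_step + reasoning_per_step)


def within_budget(billed_output_tokens, output_budget):
    """True if the run stays inside its total output-token budget."""
    return billed_output_tokens <= output_budget


# Illustrative numbers: a 12-step workflow with short visible answers.
run_tokens = agent_run_output_tokens(
    steps=12, visible_per_step=150, reasoning_per_step=2_000
)
print(run_tokens)                         # 12 * 2,150 = 25,800
print(within_budget(run_tokens, 20_000))  # False: reasoning blows the cap
```

The visible output alone would be 1,800 tokens here, so a budget set by looking only at answers would miss most of the spend.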
How to manage it
- Separate model tiers by task value. Not every endpoint deserves deep reasoning.
- Track output spend by endpoint and model. Where the API reports a reasoning-token count in its usage data, log that as well; where it does not, output spend is the closest operational signal you have.
- Reserve reasoning for real complexity. Classification, extraction, and simple transformations usually do not need it.
- Use smaller models for pre-routing. Let a cheap model decide whether the expensive model is necessary.
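The pre-routing and tracking steps above might look like the following sketch. The tier names, the task categories, the `complexity_score` input (assumed to come from a cheap pre-routing model and lie in [0, 1]), and the 0.7 threshold are all hypothetical choices for illustration.

```python
from collections import defaultdict

# Task types the note says usually do not need deep reasoning.
SIMPLE_TASKS = {"classification", "extraction", "transformation"}


def choose_tier(task_type, complexity_score, threshold=0.7):
    """Pick a model tier for one request.

    Simple task types never escalate; everything else escalates only
    when the cheap pre-router's score reaches the threshold.
    """
    if task_type in SIMPLE_TASKS:
        return "small"
    return "reasoning" if complexity_score >= threshold else "small"


class SpendTracker:
    """Accumulates billed output tokens per (endpoint, model) pair."""

    def __init__(self):
        self._totals = defaultdict(int)

    def record(self, endpoint, model, output_tokens):
        self._totals[(endpoint, model)] += output_tokens

    def top(self, n=5):
        """Return the n largest spenders, biggest first."""
        ranked = sorted(self._totals.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:n]


tracker = SpendTracker()
tracker.record("/summarize", choose_tier("extraction", 0.9), 400)
tracker.record("/plan", choose_tier("planning", 0.9), 3_200)
tracker.record("/plan", choose_tier("planning", 0.9), 2_800)
print(tracker.top(1))
```

Keeping the threshold as an explicit parameter makes it something a team can tune against the tracked spend rather than a default nobody revisits.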
What this means for cost optimization
Reasoning tokens do not make reasoning models bad. They make sloppy routing expensive. The right question is not "should we use reasoning?" It is "which requests are valuable enough to justify invisible output spend?" That pushes teams back toward the basics: better task segmentation, tighter routing, and explicit economic thresholds.
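An explicit economic threshold can be as simple as comparing expected gain against the extra output spend. The sketch below is one possible formulation, not a standard one: `value_if_correct` (dollar value of a good answer), `uplift` (assumed increase in the chance of a good answer when reasoning is used), and the price are all inputs a team would have to estimate for itself.

```python
def justifies_reasoning(value_if_correct, uplift,
                        extra_output_tokens, output_price_per_m):
    """True if the expected gain covers the extra output spend.

    expected gain = value_if_correct * uplift
    extra cost    = extra_output_tokens * output price per token
    """
    extra_cost = extra_output_tokens * output_price_per_m / 1_000_000
    expected_gain = value_if_correct * uplift
    return expected_gain >= extra_cost


# Assumed: 3,000 extra reasoning tokens at $8 per 1M output tokens ($0.024).
print(justifies_reasoning(0.50, 0.10, 3_000, 8.0))  # gain $0.05  -> True
print(justifies_reasoning(0.01, 0.10, 3_000, 8.0))  # gain $0.001 -> False
```

Even rough estimates for these inputs force the routing decision to be stated in dollars per request instead of left as a default.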