RESEARCH · COST DRIVER

Reasoning tokens are the hidden bill.

Cost note · 4 May 2026

By the LLM CFO team

A lot of teams still think in a 2024 pricing model: input tokens, visible output tokens, done. That mental model is now out of date. Reasoning models add a new cost center: the model can spend tokens thinking, and those thinking tokens are billed as output even when you never see them.

What changed

OpenAI's reasoning-model guidance is explicit: reasoning tokens are not visible through the API, but they still consume context space and are billed as output tokens. That matters because output tokens are usually the expensive side of the bill. Once teams start routing harder traffic to reasoning models, the economics change fast.
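The billing mechanics can be sketched in a few lines. This is a minimal model, not a real pricing table: the per-million-token prices below are hypothetical placeholders, and the only load-bearing assumption (taken from the guidance above) is that reasoning tokens are charged at the output rate.

```python
def request_cost(input_tokens: int, visible_output_tokens: int,
                 reasoning_tokens: int,
                 price_in_per_m: float = 2.00,   # hypothetical $/1M input tokens
                 price_out_per_m: float = 8.00   # hypothetical $/1M output tokens
                 ) -> float:
    """Dollar cost of one request; reasoning tokens are billed as output."""
    billed_output = visible_output_tokens + reasoning_tokens
    return (input_tokens * price_in_per_m
            + billed_output * price_out_per_m) / 1_000_000

# A short visible answer can still be expensive:
without_reasoning = request_cost(1_000, 150, 0)       # ~$0.0032
with_reasoning = request_cost(1_000, 150, 6_000)      # ~$0.0512, same visible answer
```

With these placeholder prices, 6,000 hidden reasoning tokens make an identical-looking answer sixteen times more expensive, which is exactly why dashboards that only show visible output miss the story.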

Why this catches teams off guard

Most dashboards still emphasize prompt length because input size is easy to see and easy to blame. Reasoning-token cost is harder to spot. The user sees a short answer and assumes the request was cheap. Finance sees output spend drift upward and assumes the model is over-talking. Sometimes the real source is hidden reasoning work in the middle.
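One way to surface the hidden work is to compute, per request, what fraction of billed output was never shown to the user. The sketch below assumes a usage record with a total completion-token count and a reasoning-token count; the field names are illustrative (modeled loosely on OpenAI-style usage payloads) and will vary by provider.

```python
def hidden_share(usage: dict) -> float:
    """Fraction of billed output tokens the user never saw."""
    completion = usage["completion_tokens"]        # total billed output tokens
    reasoning = usage.get("reasoning_tokens", 0)   # hidden thinking tokens
    return reasoning / completion if completion else 0.0

def looks_cheap_but_is_not(usage: dict, threshold: float = 0.5) -> bool:
    """Flag requests whose output bill is dominated by hidden reasoning."""
    return hidden_share(usage) > threshold

usage = {"completion_tokens": 4_200, "reasoning_tokens": 3_900}
hidden_share(usage)  # a short answer backed by a long hidden trace
```

A per-endpoint rollup of this ratio is what lets finance distinguish an over-talking model from a quietly-thinking one.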

How to manage it

  1. Separate model tiers by task value. Not every endpoint deserves deep reasoning.
  2. Track output spend by endpoint and model. That is the closest operational signal when reasoning tokens are invisible.
  3. Reserve reasoning for real complexity. Classification, extraction, and simple transformations usually do not need it.
  4. Use smaller models for pre-routing. Let a cheap model decide whether the expensive model is necessary.

Practical rule: if your output spend rises faster than visible answer length, suspect hidden reasoning before you blame the user prompt.
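Step 4 above can be sketched as a trivial pre-router. The classifier here is a stub keyword check and the tier names are hypothetical; in practice the routing decision would itself be a small, inexpensive model call.

```python
# Tasks the article calls out as rarely needing deep reasoning.
SIMPLE_TASK_MARKERS = ("classify", "extract", "transform", "reformat")

def route(task_description: str) -> str:
    """Decide which model tier should serve a request (stub classifier)."""
    text = task_description.lower()
    if any(marker in text for marker in SIMPLE_TASK_MARKERS):
        return "cheap-model"       # hypothetical tier name
    return "reasoning-model"       # hypothetical tier name

route("Extract the invoice date from this PDF text")   # -> "cheap-model"
route("Plan a multi-step migration with rollback")     # -> "reasoning-model"
```

The design point is not the keyword list; it is that the expensive tier is opt-in per request, so hidden reasoning spend only accrues where someone decided the task was worth it.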

What this means for cost optimization

Reasoning tokens do not make reasoning models bad. They make sloppy routing expensive. The right question is not "should we use reasoning?" It is "which requests are valuable enough to justify invisible output spend?" That pushes teams back toward the basics: better task segmentation, tighter routing, and explicit economic thresholds.
