Conversation state is a cost lever.
Architecture note · 4 May 2026
In 2026, conversation state is no longer just a product-design concern. It is part of your cost architecture. Teams that keep re-sending too much history, or restart reasoning unnecessarily between tool steps, are paying avoidable token tax on every multi-step workflow.
Why this matters more now
OpenAI's newer conversation-state and reasoning guidance makes the shift clear: context windows now include input, output, and reasoning tokens. Meanwhile, multi-step workflows increasingly depend on previous responses, tool outputs, and intermediate state. Once these flows get longer, how you carry state becomes an economic decision.
The expensive mistake
The expensive pattern is simple: every turn replays too much history, too much tool output, and too much irrelevant context. The system feels stateless from an application perspective, but the model keeps paying to re-ingest what it already effectively knows.
What better state handling does
- Reduces repeated input tokens.
- Reduces the chance of restarting reasoning work.
- Keeps long workflows from inflating into giant prompts.
- Makes tool-based conversations more cache- and context-friendly.
Where teams should look first
- Tool loops. Are you re-sending full history between every function call?
- Session summaries. Are you summarizing older turns instead of carrying raw transcript forever?
- Previous-response chaining. Are you using API features that preserve relevant context instead of reconstructing it manually every time?
- Prompt boundaries. Have you separated durable instructions from noisy conversational state?
Why this pairs with prompt caching
Conversation-state discipline and prompt caching reinforce each other. Better state handling reduces unnecessary replay, while stable prompt structure increases the portion of that replay that becomes cheap. If your state is chaotic, your cache will be chaotic too.
What to measure
- Average context size by turn number
- Input-token growth across long sessions
- Model calls per completed workflow
- Cost per tool step