RESEARCH · PROVIDER TREND

OpenAI Flex vs Batch.

Provider playbook · 1 May 2026

By the LLM CFO team

One of the more useful 2026 shifts in OpenAI economics is that teams now have two clearer low-cost paths for non-urgent work: Batch API and Flex processing. The hard part is not knowing they exist. It is deciding which workloads belong where.

What changed

Batch API has been the obvious answer for asynchronous jobs: collect requests, submit them, and get lower pricing in exchange for waiting. Flex processing is a newer, more interesting option because it keeps the same request shape as standard processing while trading off speed and availability for lower cost.
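The "collect requests, submit them" flow starts with a JSONL file, one addressed request per line. A minimal sketch of building that input, assuming the documented Batch input format (`custom_id`, `method`, `url`, `body`); the model name and prompts are illustrative:

```python
import json

def build_batch_line(custom_id: str, model: str, prompt: str) -> str:
    """One line of Batch API input JSONL: a single addressed chat request."""
    return json.dumps({
        "custom_id": custom_id,            # your key for matching results later
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    })

# Collect the non-urgent requests into one .jsonl payload...
lines = [build_batch_line(f"doc-{i}", "gpt-4.1-mini", f"Classify document {i}")
         for i in range(3)]
batch_input = "\n".join(lines)
# ...then upload the file with purpose="batch" and create a batch against
# /v1/chat/completions with a completion window (e.g. "24h") via the SDK.
```

Results come back keyed by `custom_id`, which is what makes the fire-and-forget trade workable at scale.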

The simplest distinction

Batch is for work that is job-shaped: nobody waits on the result, so you can collect requests, submit them together, and pick up the output later. Flex is for work that still needs an API-style call pattern but is low priority, so it can tolerate slower, less guaranteed responses in exchange for lower cost.

Good Batch candidates

Jobs the user never waits on and that naturally accumulate: bulk classification, evaluation runs, and backfills that can be collected into a single submission.

Good Flex candidates

Low-priority requests that still need per-request handling, such as background summarization or enrichment that runs outside the user's critical path.

Where teams get this wrong

The common mistake is leaving too much work on the standard real-time path because "it only takes a few seconds." At scale, that assumption becomes expensive. The better question is not whether the call is fast enough. It is whether the user needs the result immediately, and what the business loses if it arrives later.

Why Flex is interesting

Flex matters because it lowers the activation energy for cost optimization. You do not need to redesign a workload into a fully separate batch pipeline on day one. For some teams, moving a low-priority request to service_tier=flex is the first realistic step toward a cheaper architecture.
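That low activation energy is visible in the request itself. A minimal sketch of the keyword arguments for a Flex call, assuming the `openai` Python SDK's `service_tier` parameter and a generous per-request timeout (Flex requests can queue); the model name and helper function are illustrative:

```python
def flex_call_kwargs(model: str, messages: list[dict]) -> dict:
    """Build kwargs for client.chat.completions.create(**kwargs).

    The request shape is identical to standard processing; the only
    changes are the service tier and a longer timeout.
    """
    return {
        "model": model,
        "messages": messages,
        "service_tier": "flex",  # opt into lower-cost, lower-priority processing
        "timeout": 900.0,        # assumption: allow Flex requests time to queue
    }

kwargs = flex_call_kwargs(
    "gpt-4.1-mini",  # illustrative model name
    [{"role": "user", "content": "Summarize this support ticket."}],
)
```

Because the call site stays the same, demoting a workload to Flex is a one-line change rather than a pipeline rewrite.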

Why Batch still matters

Batch remains the cleanest answer when the workflow is clearly job-shaped. It gives you economic discipline by forcing you to admit that some workloads were never meant to be synchronous in the first place. That architectural honesty is often worth as much as the pricing delta.

The 2026 takeaway: real savings are coming from reclassifying work by urgency. Standard processing is no longer the default answer for everything that looks like an API call.

How to choose between them

  1. If the user never waits on it, start by asking whether it should be Batch.
  2. If it still needs API-style handling but is low priority, consider Flex.
  3. If the workload is revenue-critical or latency-sensitive, keep standard processing and optimize elsewhere first.
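The three questions above can be collapsed into a single routing heuristic. A sketch, with hypothetical flag names standing in for whatever signals your system actually has:

```python
def choose_tier(user_waits: bool, revenue_critical: bool, job_shaped: bool) -> str:
    """Map the three decision questions onto a processing tier."""
    if revenue_critical or user_waits:
        return "standard"   # latency-sensitive: keep it fast, optimize elsewhere
    if job_shaped:
        return "batch"      # nobody waits: collect, submit, pick up later
    return "flex"           # still API-shaped, but low priority
```

The ordering matters: urgency is checked first, so a revenue-critical job never gets demoted just because it happens to be batchable.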
