OpenAI Flex vs Batch
Provider playbook · 1 May 2026
One of the more useful 2026 shifts in OpenAI economics is that teams now have two clearer low-cost paths for non-urgent work: Batch API and Flex processing. The hard part is not knowing they exist. It is deciding which workloads belong where.
What changed
Batch API has been the obvious answer for asynchronous jobs: collect requests, submit them, and get lower pricing (roughly half the standard rate, with a 24-hour completion window) in exchange for waiting. Flex processing is a newer, more interesting option because it keeps the same request shape as standard processing while trading off speed and availability for lower cost.
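To make the "collect requests, submit them" shape concrete, here is a minimal sketch. The JSONL record format, the `files.create`/`batches.create` calls, and the `completion_window="24h"` value follow OpenAI's documented Batch API; the model name and prompts are placeholders.

```python
import json

def build_batch_lines(prompts, model="gpt-4.1-mini"):
    """Serialize prompts into the JSONL request format the Batch API expects.

    Each record carries a custom_id so results in the output file can be
    matched back to the original inputs.
    """
    lines = []
    for i, prompt in enumerate(prompts):
        lines.append(json.dumps({
            "custom_id": f"req-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            },
        }))
    return lines

# Submitting is a separate step (needs an API key and the openai package):
#
#   from openai import OpenAI
#   client = OpenAI()
#   batch_file = client.files.create(
#       file=open("requests.jsonl", "rb"), purpose="batch")
#   batch = client.batches.create(
#       input_file_id=batch_file.id,
#       endpoint="/v1/chat/completions",
#       completion_window="24h",
#   )

lines = build_batch_lines(["Classify this ticket: spam or not spam?"])
print(len(lines))  # → 1
```

The `custom_id` field is the important habit: Batch results come back as a file, not as individual responses, so every request needs a stable key you can join on.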
The simplest distinction
- Use Batch when work is explicitly offline and can be handled as a job queue.
- Use Flex when the request still looks like a normal API call but is low-priority enough to tolerate slower handling and occasional resource unavailability.
Good Batch candidates
- eval runs
- nightly enrichment
- content backfills
- large document reprocessing
- offline classification
Good Flex candidates
- internal tools
- non-customer-facing background jobs
- low-priority asynchronous requests
- workloads that benefit from simpler request plumbing than Batch
Where teams get this wrong
The common mistake is leaving too much work on the standard real-time path because "it only takes a few seconds." At scale, that assumption becomes expensive. The better question is not whether the call is fast enough. It is whether the user needs the result immediately, and what the business loses if it arrives later.
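A back-of-the-envelope calculation shows why "it only takes a few seconds" is the wrong test. The volumes and per-token prices below are illustrative placeholders, not quoted rates; the halving reflects the roughly 50% discount both discounted tiers have carried, but check current pricing.

```python
def monthly_cost(requests_per_day, tokens_per_request, price_per_million_tokens):
    """Rough monthly spend for a steady request volume (30-day month)."""
    tokens = requests_per_day * tokens_per_request * 30
    return tokens / 1_000_000 * price_per_million_tokens

# Illustrative numbers only: 50k background requests/day at 2k tokens each.
standard = monthly_cost(50_000, 2_000, 2.00)  # $2.00 / 1M tokens (assumed)
flex = monthly_cost(50_000, 2_000, 1.00)      # half price (assumed)

print(f"standard: ${standard:,.0f}/mo, flex: ${flex:,.0f}/mo")
# → standard: $6,000/mo, flex: $3,000/mo
```

Individually each call is cheap and fast; in aggregate, the premium for real-time handling nobody is waiting on is a line item worth thousands per month.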
Why Flex is interesting
Flex matters because it lowers the activation energy for cost optimization. You do not need to redesign a workload into a fully separate batch pipeline on day one. For some teams, moving a low-priority request to service_tier=flex is the first realistic step toward a cheaper architecture.
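As a sketch of how small that first step is: the only change versus a standard call is the `service_tier` parameter, which is how the documented Flex tier is selected. The model name, the fallback tier, and the retry advice in the comment are assumptions for illustration.

```python
def flex_request_kwargs(prompt, model="o4-mini", tier="flex"):
    """Build kwargs for a low-priority completion request.

    Versus a standard call, the only delta is service_tier="flex";
    model name here is a placeholder.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "service_tier": tier,
    }

# With the official client (needs an API key; Flex calls can be slow,
# so raise the client timeout and be prepared for a 429 when Flex
# capacity is unavailable -- e.g. retry on the default tier):
#
#   from openai import OpenAI
#   client = OpenAI(timeout=900.0)
#   resp = client.chat.completions.create(
#       **flex_request_kwargs("Summarize this internal report: ..."))

print(flex_request_kwargs("hi")["service_tier"])  # → flex
```

That is the whole migration for a single call site, which is exactly why the activation energy is low: no new file formats, no job queue, no result-polling loop.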
Why Batch still matters
Batch remains the cleanest answer when the workflow is clearly job-shaped. It gives you economic discipline by forcing you to admit that some workloads were never meant to be synchronous in the first place. That architectural honesty is often worth as much as the pricing delta.
How to choose between them
- If the user never waits on it, start by asking whether it should be Batch.
- If it still needs API-style handling but is low priority, consider Flex.
- If the workload is revenue-critical or latency-sensitive, keep standard processing and optimize elsewhere first.
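The checklist above can be sketched as a tiny routing helper. The tier names and the three boolean questions are this sketch's own conventions, not API values.

```python
def pick_tier(user_waits: bool, job_shaped: bool, latency_sensitive: bool) -> str:
    """Route a workload per the checklist: standard if anyone is waiting
    or revenue depends on latency, batch if it is job-shaped offline work,
    flex for low-priority API-shaped calls."""
    if user_waits or latency_sensitive:
        return "standard"  # keep real-time handling; optimize elsewhere first
    if job_shaped:
        return "batch"     # offline, queue-able: submit as a batch job
    return "flex"          # still API-shaped, but low priority

print(pick_tier(user_waits=False, job_shaped=True, latency_sensitive=False))  # → batch
```

The useful property of writing it down this way is that "user_waits" becomes an explicit input someone has to answer, rather than an assumption baked into whichever path the code happened to ship on.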