What are the four OpenAI processing paths?

Standard is the default synchronous path at list price per token. Scale Tier is committed throughput at a premium with latency guarantees. Flex is synchronous at lower per-token price but provider can deprioritize when demand is high. Batch is offline processing with ~50% of standard pricing.

What three variables decide which processing tier to use?

Latency tolerance: users waiting favor Standard or Scale Tier; queues favor Flex; dashboards refreshing daily favor Batch. Traffic shape: smooth and predictable favors Scale Tier; spiky and unpredictable favor Standard plus Flex; pure offline favors Batch. Committed-spend posture: can finance commit 6–12 months? Scale Tier becomes interesting. If unsure, Standard plus Flex is cheaper in expectation.

Which tier should I use for customer-facing workloads?

Use Standard for interactive traffic you cannot predict, or Scale Tier if volume is large and smooth. User is waiting; capacity SLA matters more than per-token price. Move to Scale Tier only once the daily curve is predictable and boring.

Which tier should I use for batch and background workloads?

Use Batch for nightly enrichment, classification, eval runs, and content backfill. Async by design; hours of wall time are fine. The discount is the entire reason these workloads exist at scale.

RESEARCH · PROVIDER POSTURE

Scale Tier vs Flex vs Batch vs Standard.

Published · July 12, 2026

By the LLM CFO team

Most teams default every call to standard processing. That is rarely the right answer. OpenAI now ships four processing paths with materially different price and latency profiles, and the cost gap between "all standard" and "the right tier per workload" is usually the biggest line item we touch in a first engagement. This is the decision matrix we use.

The four paths, in one sentence each

Standard is the default synchronous path ; pay list price per token, get normal latency, no capacity commitments either direction. Good for interactive traffic you cannot predict.

Scale Tier is committed throughput at a premium. You reserve units of capacity for a term; in return you get latency and capacity guarantees that standard does not promise. Cheaper than standard only at the high end of utilization, and only if the commitment matches actual traffic.

Flex is the same synchronous request shape as standard, at a lower per-token price, with the explicit trade that the provider can deprioritize, slow, or briefly refuse your traffic when overall demand is high. The request returns; it just may return slower, or with a retry-after.

Batch is offline. You submit a JSONL file of requests, the provider processes them within a documented window (commonly up to 24 hours), and you collect results. The list-price posture is roughly half of standard. Wrong tool for anything a human is waiting on; right tool for anything a job scheduler is waiting on.

The three variables that decide

Every workload picks a tier on three axes, in this order:

1. Latency tolerance. If a user is staring at a spinner, you are in the standard or Scale Tier conversation. If a queue is staring at a spinner, Flex is on the table. If a dashboard refreshes in the morning, Batch is on the table.

2. Traffic shape. Smooth and predictable favors Scale Tier ; you can size the commitment honestly. Spiky and unpredictable favors standard plus Flex overflow ; you pay list for the peaks you actually use, and run cheaper paths for everything that can wait. Pure offline favors Batch.

3. Committed-spend posture. Can finance commit to a 6 or 12 month throughput line and defend it? Scale Tier becomes interesting. If next quarter's volume is a guess, do not lock anything in ; standard plus Flex is cheaper in expectation than an over-committed reservation.

Get those three answers before you look at price. Picking a tier on price first is how teams end up paying for capacity they do not use.

A decision matrix

The archetypes below cover most of what we see in the field. Read across; the rationale matters more than the label.

Workload	Default tier	Rationale
Customer-facing chat / support	Standard, Scale Tier if volume is large and smooth	User is waiting. Capacity SLA matters more than per-token price. Move to Scale Tier only once the daily curve is boring.
Internal copilot / IDE assistant	Standard for the hot path, Flex for non-blocking calls	Employees tolerate more variance than customers. Background suggestions, refactors, and explanations can run cheaper and slower without complaint.
Nightly enrichment / classification	Batch	Async by design. Hours of wall time are fine. The discount is the entire reason this workload exists at scale.
Eval runs / regression suites	Batch, Flex if you are iterating live	Most evals can wait. The exception is the engineer debugging an eval at 2pm ; Flex keeps the same request shape and is cheaper than standard.
Content backfill / corpus rewrite	Batch	Large volume, no user attached, retry-friendly. If you are running this on standard, you are leaving the most money on the table of any line item we touch.
RAG retrieval-time generation	Standard	Latency-sensitive, hard to predict shape per query. Flex risks making search feel broken when capacity is tight.
Agent loop steps (interactive)	Standard	A slow step compounds across the loop. Save Flex for tool calls that the agent can already handle async.
Agent loop steps (background workflows)	Flex, Batch for plan-then-execute	If no human is watching the agent, latency variance is acceptable. Plan-and-execute agents with offline plans can batch the plan step.
Embeddings refresh / re-index	Batch	Almost always offline; the discount applies cleanly; failure handling is simple.
Spiky high-volume customer feature	Standard with Flex overflow	Pay list for the peaks you actually serve; do not commit Scale Tier units to a peak you only hit twice a week.

When Scale Tier actually pays off

Scale Tier is the tier most teams pick wrong. The case for it is narrow:

Smooth, high-volume traffic. A daily curve you can draw from memory. If the variance band on your hourly request rate is wider than the commitment unit, you will over-buy.
Capacity matters more than price. When a missed capacity SLA hurts revenue ; checkout assistants, paid-tier chat, anything where degraded latency is a customer-facing incident ; the premium buys insurance, not throughput per dollar.
The business can predict 12 months out. If finance signs the commit and product cannot defend the volume assumption, you have just turned a variable cost into a fixed cost on bad terms.

Outside those three conditions, standard plus Flex usually wins on total cost, and Batch usually wins on the offline portion. We have moved more workloads off Scale Tier than onto it.

When Flex wins over Batch

Flex and Batch are both "cheaper than standard," but they fit different shapes.

Flex wins when the workload still wants the synchronous request shape ; request in, response out, same code path ; but is happy to retry or wait a little longer. Internal tooling, second-pass enrichment that fires from a web request, agent steps that are not on the user's critical path, regenerations after a user clicks "try again." None of those want a batch job orchestrator. They want the same SDK call, with a cheaper bill and a tolerance for the occasional retry-after.

Batch wins when you can rephrase the workload as a file. If you are already writing rows to a queue or table, you can already write JSONL. If you are already triggering work on a schedule, you can already wait on a job ID. The orchestration overhead is the price of admission; the discount pays it back fast at volume.

When Batch is the right answer even at the cost of latency

Sometimes hours of delay is a feature, not a bug. The signal that Batch is the correct call:

The output is consumed by another system, not a person.
The volume is large enough that the discount shows up as a real line item, not a rounding error.
The work is naturally idempotent ; re-running a chunk costs nothing the business cares about.
The downstream system can wait until tomorrow morning for last night's data.

Most enrichment, classification, summarization-of-corpora, eval, and content-rewrite workloads check all four boxes. Putting any of them on standard is a posture mistake, not a performance choice.

What we would never put on Scale Tier

Workloads that should not see a Scale Tier commitment, even at scale:

Anything where capacity is not the bottleneck. If your problem is per-token cost or model quality, Scale Tier does not solve it.
Anything already comfortably served by Flex. You are paying a premium to undo the discount.
Anything served by Batch. You are paying a premium to undo a larger discount.
Workloads with traffic variance wider than the commitment unit. You will eat the unused capacity every quiet hour.
Pilot or proof-of-concept volume. Commitments are for boring, proven workloads.

The pricing posture that ties it together

The pattern that holds up across engagements: most teams should be on a mix, not a single tier.

Standard carries the interactive customer path ; chat, search, checkout, anything where a slow response is a product defect. Flex covers internal tools and non-blocking customer features ; the workloads that benefit from the synchronous shape but do not need the capacity guarantee. Batch handles offline work ; enrichment, evals, embeddings, content rewrites, anything a scheduler can consume. Scale Tier is reserved, narrowly, for the workloads where a missed capacity SLA would visibly hurt revenue.

The mistake we see most often is the inverse: everything on standard "to keep it simple," with the Batch and Flex savings left on the table for a year, and a Scale Tier commitment bolted on top because someone in finance asked for predictable pricing. That stack is the most expensive way to run any of these workloads.

How to evaluate this in a week

You do not need a quarter to get the first cut right.

Day 1. Pull a week of usage. Tag every workload as interactive, internal, or offline. Note traffic shape ; smooth or spiky.

Day 2. For each workload, write the latency budget in plain language. "User waits" / "queue waits" / "scheduler waits." That sentence picks the tier.

Day 3. Move the clearest Batch candidates first ; nightly enrichment, eval runs, embeddings refresh. These are the safest moves and usually the largest dollar wins.

Day 4. Move internal-tool traffic to Flex behind a gateway or feature flag. Watch retry rates. If they stay reasonable, leave it there.

Day 5. Re-examine any existing Scale Tier commitment against the last 30 days of actual hourly usage. If utilization is below the breakeven, do not renew at that size.

That sequence usually lands the bulk of the savings before anyone has to argue about a commit. Scale Tier is a conversation you have later, once the mix is honest.

← Back to research

Scale Tier vs Flex vs Batch vs Standard.

The four paths, in one sentence each

The three variables that decide

A decision matrix

When Scale Tier actually pays off

When Flex wins over Batch

When Batch is the right answer even at the cost of latency

What we would never put on Scale Tier

The pricing posture that ties it together

How to evaluate this in a week

Related