Langfuse vs LiteLLM vs OpenLIT
20 June 2026
We already compared LiteLLM, Helicone, and LangFuse as gateway vs. proxy vs. platform. This is the other question we keep getting: of the open-source tools that emit LLM cost and usage data, which ones speak OpenTelemetry — so the numbers land in the stack you already run (Grafana, Datadog, Honeycomb, ClickHouse) instead of a tool-specific dashboard somebody has to babysit?
The one-line version
| Tool | Primary role | OpenTelemetry-native? |
|---|---|---|
| LiteLLM | multi-provider gateway / SDK: unify the API, route, budget, fail over | Partial: emits OTLP via a callback |
| LangFuse | tracing + eval + prompt platform with its own data model | Partial: ingests OTLP, stores in its own schema |
| OpenLIT | OpenTelemetry-native GenAI observability: traces, metrics, cost | Yes: OTLP by design |
LiteLLM
A multi-provider gateway and SDK: one OpenAI-shaped API surface across ~100 providers, with virtual keys, per-team budgets, fallbacks, and a local cost table. It is where you put routing and spend caps. We cover it in depth in the three-way gateway comparison. For this discussion the relevant fact is that LiteLLM can emit OpenTelemetry spans through a callback, so the gateway you already route through can become your cost-telemetry source without a second integration.
LangFuse
An observability platform built around traces, evals, prompt management, and datasets. It accepts OTLP and has SDKs, but it stores data in its own model and you read it primarily in the LangFuse UI. Pick it when you need agent traces, LLM-as-judge evals, and prompt versioning — the analysis surface, not just the cost line. Its cost numbers are derived from a price table you maintain, the same caveat as everything else here.
OpenLIT
OpenLIT is the OpenTelemetry-native option. You add one auto-instrumentation call and it wraps your LLM, vector-DB, and framework calls, emitting OTLP traces and metrics that follow the OpenTelemetry GenAI semantic conventions (gen_ai.* attributes). Cost is computed from token counts against a pricing file and attached to the span, so it flows to whatever OTLP backend you already run.
What it does well:
- OTLP-first: traces and metrics go through any OpenTelemetry collector to the backend you already run (Grafana/Tempo, Datadog, Honeycomb, ClickHouse), with no proprietary lock-in on the data.
- Auto-instrumentation across many SDKs and frameworks, plus vector DBs and GPU utilization, so token cost and infra cost sit in one trace.
- Standard gen_ai.usage.input_tokens / gen_ai.usage.output_tokens attributes, which means your cost queries are portable across services and tools.
- Ships extras (a UI, prompt hub, secrets vault, evals), but none of them are required to get the telemetry out.
What it doesn't do (or does weakly):
- It is not a gateway. It does not route, set budgets, or fail over between providers; pair it with LiteLLM for that.
- Its dashboards are younger and thinner than LangFuse's eval and prompt tooling.
- You need somewhere to send OTLP. If you have no collector and no backend, "OTel-native" is setup cost, not a gift.
- Cost is still token-count × price table — see below.
Where the cost number actually comes from
All three compute spend the same way: token counts multiplied by a pricing table they ship or you maintain. None of them read your invoice. That means all three share the same failure modes: a stale price table, mis-accounted cache-read tokens (the baseline trap), and provider-specific discounts the table doesn't know about. Whatever you pick, reconcile the derived number against the provider bill monthly, or the dashboard quietly drifts from reality.
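To make the arithmetic concrete, here is a minimal sketch of the token-count × price-table calculation and a monthly drift check. Everything named here is hypothetical: the model name, the prices, the `Usage` fields, and the invoice figure. It also assumes the provider reports cached tokens as a subset of input tokens, which is not true of every provider.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per 1M tokens. Real tables ship with each tool
# or are maintained by you, and go stale when providers change pricing.
PRICES_PER_MTOK = {
    "example-model": {"input": 3.00, "cached_input": 0.30, "output": 15.00},
}


@dataclass
class Usage:
    model: str
    input_tokens: int          # all prompt tokens, cached ones included (assumption)
    cached_input_tokens: int   # subset of input_tokens served from the provider cache
    output_tokens: int


def derived_cost(u: Usage) -> float:
    """Token counts x price table, billing cache reads at the cached rate."""
    p = PRICES_PER_MTOK[u.model]
    fresh = u.input_tokens - u.cached_input_tokens
    total = (
        fresh * p["input"]
        + u.cached_input_tokens * p["cached_input"]
        + u.output_tokens * p["output"]
    )
    return total / 1_000_000


def naive_cost(u: Usage) -> float:
    """Same calculation, but wrongly bills every input token at the full rate."""
    p = PRICES_PER_MTOK[u.model]
    return (u.input_tokens * p["input"] + u.output_tokens * p["output"]) / 1_000_000


def drift(derived_total: float, invoiced_total: float) -> float:
    """Relative gap between the dashboard number and the provider bill."""
    return (derived_total - invoiced_total) / invoiced_total


usage = Usage("example-model", input_tokens=1_000_000,
              cached_input_tokens=400_000, output_tokens=100_000)
print(f"derived: ${derived_cost(usage):.2f}")
print(f"naive:   ${naive_cost(usage):.2f}")
print(f"drift vs. a hypothetical $3.00 invoice: {drift(derived_cost(usage), 3.00):.1%}")
```

The gap between `derived_cost` and `naive_cost` is the cache-read failure mode in miniature, and `drift` is the number worth checking each month against a threshold you choose.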
The OpenTelemetry question
The reason "OTel-native" is worth caring about is not purity. It is that a gen_ai.* span looks the same whether it came from your checkout service or your support agent, so one query answers "cost per request, by team" across the whole system — and it lives next to your latency and error telemetry instead of in a separate tool. If your telemetry schema is already an OpenTelemetry decision, an OTel-native emitter like OpenLIT (or LiteLLM's OTLP callback) keeps LLM cost in that same pipe. If you are buying an analysis product anyway, LangFuse's richer surface may matter more than schema portability.
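As a rough illustration of that single query, the sketch below aggregates a handful of flattened spans into cost per request by team. The span records are invented, and the `team` and `cost_usd` attribute names are assumptions made for this example; only the gen_ai.usage.* keys come from the OpenTelemetry GenAI conventions. In practice you would write the equivalent query in your backend's own query language.

```python
from collections import defaultdict

# Hypothetical spans as they might look after export from an OTLP backend.
# "team" and "cost_usd" are illustrative attribute names, not standard ones.
spans = [
    {"service.name": "checkout", "team": "payments",
     "gen_ai.usage.input_tokens": 1200, "gen_ai.usage.output_tokens": 300,
     "cost_usd": 0.010},
    {"service.name": "checkout", "team": "payments",
     "gen_ai.usage.input_tokens": 2500, "gen_ai.usage.output_tokens": 600,
     "cost_usd": 0.020},
    {"service.name": "support-agent", "team": "support",
     "gen_ai.usage.input_tokens": 9000, "gen_ai.usage.output_tokens": 4000,
     "cost_usd": 0.090},
]


def cost_per_request_by_team(spans):
    """Average cost per LLM request, grouped by the span's team attribute."""
    totals = defaultdict(lambda: {"requests": 0, "cost_usd": 0.0})
    for span in spans:
        bucket = totals[span["team"]]
        bucket["requests"] += 1
        bucket["cost_usd"] += span["cost_usd"]
    return {team: t["cost_usd"] / t["requests"] for team, t in totals.items()}


for team, cost in cost_per_request_by_team(spans).items():
    print(f"{team}: ${cost:.4f} per request")
```

The function never inspects `service.name`: because both services emit the same attribute shape, the grouping works without per-service special cases.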
How to pick
| Need | Recommended |
|---|---|
| Routing, virtual keys, per-team budgets across providers | LiteLLM |
| LLM cost and usage inside your existing OTel/Grafana/Datadog stack | OpenLIT |
| Agent traces, evals, and prompt versioning as a product surface | LangFuse |
| You want all three things | LiteLLM to route + OpenLIT to emit OTLP; add LangFuse if you need evals |
| You have no observability backend yet | LangFuse hosted is the fastest path to a usable dashboard |
Combining them is normal
These are layers, not competitors. A common 2026 stack is LiteLLM as the gateway (routing + budgets), OpenLIT auto-instrumentation emitting OTLP to your collector, and LangFuse where teams that need evals and prompt management want them. The unifying thread is the OpenTelemetry GenAI convention: if every layer speaks gen_ai.*, the cost number survives swapping any single tool out.
The honest caveats
- All three move fast. Feature parity shifts quarterly; verify before betting a quarter on any single capability.
- None of them save money on their own. They make spend visible. The savings come from acting on what you see — routing, caching, and budgets.
- Self-host vs. hosted is a data-handling decision. Sending prompts to a third-party SaaS is an event your DPA should cover before you turn it on.
- OTel-native is a means, not a metric. The goal is one trustworthy cost number per request; OpenTelemetry is just the cheapest way to keep that number portable.
Related
- LiteLLM vs Helicone vs LangFuse — the gateway / proxy / platform comparison this page builds on
- Tracking LLM costs with OpenTelemetry GenAI conventions
- Your GenAI telemetry schema is now a cost decision
- LLM cost monitoring: what to track and how to control it