TL;DR
- Open-source (Apache 2.0) LLM observability platform from Helicone Inc., founded 2022, backed by Y Combinator. Available as managed cloud (helicone.ai) and self-hosted.
- Proxy-first architecture: route OpenAI, Anthropic, or any OpenAI-compatible endpoint through Helicone's gateway and every request, response, latency, and cost is logged with no code changes beyond a base URL swap and an auth header.
- Provides traces, custom properties, user-level analytics, cost tracking, prompt experiments, caching, rate limiting, and a feedback API. Supports async logging via SDK when proxying is undesirable.
- A good fit when the priority is shipping LLM telemetry fast without restructuring application code. Pairs with Langfuse or Phoenix when deeper evaluation workflows are needed.
Proxy vs SDK
LLM observability tools differ mainly in how they intercept calls. Langfuse and Phoenix prefer SDK-level instrumentation: import their library, decorate a function, and traces flow out. Helicone's default mode is the inverse — it sits in front of the LLM provider as a proxy. The application points at `https://oai.helicone.ai/v1` instead of `https://api.openai.com/v1`, adds a `Helicone-Auth` header, and the proxy logs the request and response on the way through.
The trade-off is integration cost vs. depth. The proxy captures everything that flows through it but knows nothing about internal application steps (retrieval, tool calls, planning). For application-internal spans, Helicone offers an async logging SDK and OTel-compatible ingest. Many teams use both: proxy for raw LLM calls, SDK for orchestration steps.
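To make the integration cost concrete, here is a minimal sketch that builds the keyword arguments for an OpenAI client in either direct or proxied mode. The helper name `client_kwargs` and the `HELICONE_API_KEY` environment variable are illustrative assumptions; the two URLs and the `Helicone-Auth` header are the ones named above.

```python
import os
from typing import Optional

OPENAI_BASE_URL = "https://api.openai.com/v1"
HELICONE_BASE_URL = "https://oai.helicone.ai/v1"


def client_kwargs(helicone_api_key: Optional[str] = None) -> dict:
    """Return kwargs for OpenAI(...): proxied if a Helicone key is given, direct otherwise."""
    if not helicone_api_key:
        return {"base_url": OPENAI_BASE_URL}
    return {
        "base_url": HELICONE_BASE_URL,
        "default_headers": {"Helicone-Auth": f"Bearer {helicone_api_key}"},
    }


# Hypothetical wiring: the proxy is used only when the variable is set.
kwargs = client_kwargs(os.environ.get("HELICONE_API_KEY"))
```

Passing the result as `OpenAI(**kwargs)` leaves every call site untouched, which is the main appeal of the proxy approach. Internal steps such as retrieval would still need separate instrumentation.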
Getting Started
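```python
import os

from openai import OpenAI

HELICONE_API_KEY = os.environ["HELICONE_API_KEY"]
user_id = "user-123"  # placeholder: your application's end-user identifier
question = "What does Helicone log?"  # placeholder prompt

# Two small changes: point the base URL at Helicone and add the auth header.
# The OpenAI key is still read from OPENAI_API_KEY as usual.
client = OpenAI(
    base_url="https://oai.helicone.ai/v1",
    default_headers={
        "Helicone-Auth": f"Bearer {HELICONE_API_KEY}",
        # Optional: tag requests for filtering in the dashboard
        "Helicone-User-Id": user_id,
        "Helicone-Property-Feature": "rag-search",
    },
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": question}],
)
# The request now appears in Helicone with user, feature, cost, latency, and tokens.
```

Custom Properties and Sessions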
Helicone segments traffic mainly through HTTP headers. `Helicone-User-Id` groups requests by end user. `Helicone-Session-Id` groups multi-turn conversations. `Helicone-Property-*` headers attach arbitrary tags (feature flag, experiment arm, customer tier) that flow into filters and aggregations on the dashboard.
Because tagging happens in headers, the application does not depend on a Helicone SDK: switching observability vendors means changing the base URL and the headers you emit, not rewriting call sites.
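A small helper can keep header construction in one place. The sketch below is illustrative only: the function name `helicone_headers` and the sample values are assumptions, while the header names are the ones described above.

```python
from typing import Mapping, Optional


def helicone_headers(
    user_id: Optional[str] = None,
    session_id: Optional[str] = None,
    properties: Optional[Mapping[str, str]] = None,
) -> dict:
    """Build per-request Helicone tagging headers; omitted values are skipped."""
    headers = {}
    if user_id:
        headers["Helicone-User-Id"] = user_id
    if session_id:
        headers["Helicone-Session-Id"] = session_id
    for name, value in (properties or {}).items():
        # Each custom property becomes its own Helicone-Property-<Name> header.
        headers[f"Helicone-Property-{name}"] = str(value)
    return headers


tags = helicone_headers(
    user_id="user-123",
    session_id="chat-42",
    properties={"Feature": "rag-search", "Tier": "pro"},
)
```

With the OpenAI Python client, per-request headers like these can be passed through the `extra_headers` argument of `client.chat.completions.create(...)`, so tags can vary per call while the auth header stays in `default_headers`.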
Gateway Features
Sitting in the request path lets Helicone do more than log. The gateway can transparently:
- Cache identical prompts and return the cached completion (configurable TTL).
- Apply per-user or per-key rate limits before requests reach the provider.
- Route across providers (OpenAI ↔ Azure OpenAI ↔ Anthropic) with failover.
- Mask sensitive fields in logs (PII redaction) before they hit storage.
- Stream responses through the proxy without breaking SSE clients.
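Like tagging, gateway features are typically enabled per request through headers. The sketch below assumes the header names `Helicone-Cache-Enabled`, `Cache-Control`, and `Helicone-RateLimit-Policy`, and the policy format `quota;w=window_seconds;s=segment`. These are recalled from Helicone's documentation rather than stated in this article, so check them against the current docs before relying on them. The helper name `gateway_headers` is hypothetical.

```python
from typing import Optional, Tuple


def gateway_headers(
    cache_ttl_seconds: Optional[int] = None,
    rate_limit: Optional[Tuple[int, int, Optional[str]]] = None,
) -> dict:
    """Build opt-in headers for caching and rate limiting.

    Header names and the policy format are assumptions to verify
    against Helicone's documentation.
    """
    headers = {}
    if cache_ttl_seconds is not None:
        headers["Helicone-Cache-Enabled"] = "true"
        headers["Cache-Control"] = f"max-age={cache_ttl_seconds}"
    if rate_limit is not None:
        quota, window_seconds, segment = rate_limit
        policy = f"{quota};w={window_seconds}"
        if segment:
            # Assumed: the segment scopes the limit, e.g. per end user.
            policy += f";s={segment}"
        headers["Helicone-RateLimit-Policy"] = policy
    return headers


# Example: cache for one hour, allow 100 requests per 60 seconds per user.
opts = gateway_headers(cache_ttl_seconds=3600, rate_limit=(100, 60, "user"))
```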
Every proxy adds a hop. Helicone Cloud's gateway is global and low-latency, but for self-hosted deployments, measure the added latency against your SLO before enabling the proxy on hot paths.
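One way to measure that overhead is to time the same call through both paths and compare percentiles. The following is a rough sketch, not a benchmark harness: the function names are hypothetical, and the p95 uses a simple nearest-rank estimate.

```python
import math
import statistics
import time
from typing import Callable, List


def sample_latencies(call: Callable[[], object], runs: int = 20) -> List[float]:
    """Invoke `call` `runs` times and return wall-clock durations in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def p95(samples: List[float]) -> float:
    """Nearest-rank 95th percentile."""
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[max(rank, 1) - 1]


def overhead_ms(direct: List[float], proxied: List[float]) -> dict:
    """Difference in median and p95 latency (proxied minus direct), in milliseconds."""
    return {
        "median": statistics.median(proxied) - statistics.median(direct),
        "p95": p95(proxied) - p95(direct),
    }
```

In practice, `call` would wrap an identical request issued once through a direct client and once through a proxied one, for example `sample_latencies(lambda: client.chat.completions.create(...))`. Provider latency varies widely between calls, so collect enough samples for the percentiles to be meaningful.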
Pricing and Hosting
Helicone Cloud has a free tier (10k requests/month) and usage-based paid tiers above that. The open-source self-hosted edition is Apache 2.0 and stores data in ClickHouse, PostgreSQL, and MinIO. Self-hosting suits regulated industries that cannot send prompts to a third party.
Helicone vs Langfuse vs Phoenix
Pick Helicone when the priority is visibility into raw LLM calls without restructuring code. Pick Langfuse when prompt versioning and evaluation runs sit at the centre of the workflow. Pick Phoenix when retrieval debugging or wider ML observability matters. The three are not mutually exclusive: many production stacks proxy through Helicone and ingest application-level traces into Langfuse or Phoenix.
| Tool | Default integration | Strength |
|---|---|---|
| Helicone | Proxy (URL swap) | Fastest to ship, gateway features |
| Langfuse | SDK + decorators | Prompt management and versioning, evaluation runs |
| Phoenix | OTel SDK | Offline evaluation, retrieval debugging, wider ML observability |