Tracing Helicone

Helicone AI Gateway is an open-source LLM gateway that provides unified access to 100+ AI models through an OpenAI-compatible API. It offers built-in caching, rate limiting, automatic failover, and comprehensive analytics with minimal latency overhead.

Since Helicone AI Gateway exposes an OpenAI-compatible API, you can use MLflow's OpenAI autolog integration to automatically trace all your LLM calls through the gateway.

Getting Started

Prerequisites
Before following the steps below, you need to set up the Helicone AI Gateway server:
  1. Set up your .env file with your LLM provider API keys (e.g., OPENAI_API_KEY, ANTHROPIC_API_KEY).
  2. Run the gateway locally with npx @helicone/ai-gateway@latest.
See the Helicone AI Gateway docs for more details.

1. Install Dependencies

bash
pip install mlflow openai

2. Start MLflow Server

If you have a local Python environment with Python 3.10 or later, you can start the MLflow server locally using the mlflow CLI command.

bash
mlflow server

3. Enable Tracing and Make API Calls

Enable tracing with mlflow.openai.autolog() and configure the OpenAI client to use Helicone AI Gateway's base URL.

python
import mlflow
from openai import OpenAI

# Enable auto-tracing for OpenAI
mlflow.openai.autolog()

# Set tracking URI and experiment
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("Helicone")

# Create OpenAI client pointing to Helicone AI Gateway
client = OpenAI(
    base_url="http://localhost:8080/ai",
    api_key="placeholder-api-key",
)

# Make API calls - traces will be captured automatically
response = client.chat.completions.create(
    model="anthropic/claude-4-5-sonnet",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
)
print(response.choices[0].message.content)

4. View Traces in MLflow UI

Open the MLflow UI at http://localhost:5000 to see the traces from your Helicone AI Gateway API calls.
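
If you prefer to inspect traces programmatically instead of (or in addition to) the UI, MLflow also provides a search API. The sketch below assumes the tracking URI and the "Helicone" experiment name used in the example above; mlflow.search_traces returns the matching traces as a pandas DataFrame.

python
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")

# Look up the experiment created earlier (assumes the name "Helicone")
experiment = mlflow.get_experiment_by_name("Helicone")

# Fetch the most recent traces as a pandas DataFrame
traces = mlflow.search_traces(
    experiment_ids=[experiment.experiment_id],
    max_results=5,
)
print(f"Found {len(traces)} traces")
print(traces.head())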

Combining with Manual Tracing

You can combine auto-tracing with MLflow's manual tracing to create comprehensive traces that include your application logic:

python
import mlflow
from mlflow.entities import SpanType
from openai import OpenAI

mlflow.openai.autolog()

client = OpenAI(
    base_url="http://localhost:8080/ai",
    api_key="placeholder-api-key",
)


@mlflow.trace(span_type=SpanType.CHAIN)
def ask_question(question: str) -> str:
"""A traced function that calls the LLM through Helicone AI Gateway."""
response = client.chat.completions.create(
model="anthropic/claude-4-5-sonnet", messages=[{"role": "user", "content": question}]
)
return response.choices[0].message.content


# The entire function call and nested LLM call will be traced
answer = ask_question("What is machine learning?")
print(answer)
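
If you need finer-grained visibility inside a traced function, you can also open child spans explicitly with mlflow.start_span. The sketch below is a minimal illustration rather than part of the original example: the build_prompt step and its inputs/outputs are hypothetical, while the LLM call is still captured by the autolog integration.

python
import mlflow
from mlflow.entities import SpanType
from openai import OpenAI

mlflow.openai.autolog()

client = OpenAI(
    base_url="http://localhost:8080/ai",
    api_key="placeholder-api-key",
)


@mlflow.trace(span_type=SpanType.CHAIN)
def answer_with_context(question: str) -> str:
    # Explicit child span around a (hypothetical) prompt-building step
    with mlflow.start_span(name="build_prompt") as span:
        span.set_inputs({"question": question})
        prompt = f"Answer concisely: {question}"
        span.set_outputs({"prompt": prompt})

    # The gateway call is traced automatically and nested under the CHAIN span
    response = client.chat.completions.create(
        model="anthropic/claude-4-5-sonnet",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


print(answer_with_context("What is machine learning?"))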

Streaming Support

MLflow supports tracing streaming responses from Helicone AI Gateway:

python
import mlflow
from openai import OpenAI

mlflow.openai.autolog()

client = OpenAI(
    base_url="http://localhost:8080/ai",
    api_key="placeholder-api-key",
)

stream = client.chat.completions.create(
    model="anthropic/claude-4-5-sonnet",
    messages=[{"role": "user", "content": "Write a haiku about machine learning."}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

MLflow will automatically capture the complete streamed response in the trace.
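
Recent MLflow versions can also trace the asynchronous OpenAI client; check your MLflow version's documentation before relying on this. Assuming async tracing is available, the streaming example above translates to the async client as follows.

python
import asyncio

import mlflow
from openai import AsyncOpenAI

mlflow.openai.autolog()

# Async client pointed at the same Helicone AI Gateway endpoint
client = AsyncOpenAI(
    base_url="http://localhost:8080/ai",
    api_key="placeholder-api-key",
)


async def main():
    stream = await client.chat.completions.create(
        model="anthropic/claude-4-5-sonnet",
        messages=[{"role": "user", "content": "Write a haiku about machine learning."}],
        stream=True,
    )
    async for chunk in stream:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="")


asyncio.run(main())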

Next Steps