Tracing LiteLLM Proxy

LiteLLM Proxy is a self-hosted LLM gateway that provides a unified OpenAI-compatible API to access 100+ LLM providers. It offers features like load balancing, spend tracking, and rate limiting across multiple providers.

Looking for LiteLLM SDK?

This guide covers the LiteLLM Proxy Server. If you're using the LiteLLM Python SDK directly in your application, see the LiteLLM SDK Integration guide instead.
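
For quick orientation, the SDK path looks broadly similar to what follows, except that MLflow hooks into the LiteLLM SDK directly instead of the proxy. The sketch below is a minimal, illustrative example that assumes a recent MLflow version providing `mlflow.litellm.autolog()`; see the linked guide for the authoritative setup.

```python
import mlflow
import litellm

# Assumes an MLflow version that ships the LiteLLM autologging flavor
mlflow.litellm.autolog()

mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("LiteLLM SDK")  # illustrative experiment name

# Calls made through the LiteLLM SDK are traced automatically
response = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)
print(response.choices[0].message.content)
```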

Integration Options

There are two ways to trace LLM calls through LiteLLM Proxy with MLflow:

| Approach | Description | Best For |
| --- | --- | --- |
| Server-side Callback | Configure MLflow as a callback in the LiteLLM Proxy config | Centralized tracing for all requests through the proxy |
| Client-side Tracing | Use the OpenAI SDK with MLflow autolog | Combining LLM traces with your agent or application traces (see Option 2 below) |
Note: With server-side tracing, all requests through the proxy are captured in a single MLflow experiment, regardless of which client or application made them. For application-specific tracing, consider client-side tracing, where each application manages its own traces.

Option 1: Server-side Callback

This approach configures LiteLLM Proxy to send traces directly to MLflow, capturing all LLM calls from every client that uses the proxy.

Step 1: Install LiteLLM with MLflow Support

```bash
pip install 'litellm[mlflow]'
```
Step 2: Start MLflow Server

If you have a local Python environment (Python 3.10 or later), you can start the MLflow server locally with the mlflow CLI:

```bash
mlflow server
```
Step 3: Configure LiteLLM Proxy

Add MLflow as a callback in your LiteLLM Proxy configuration file:

litellm_config.yaml:

```yaml
model_list:
  - model_name: gpt-4o-mini
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY

litellm_settings:
  success_callback: ["mlflow"]
  failure_callback: ["mlflow"]
```
Step 4: Set Environment Variables

Configure the MLflow tracking URI before starting the proxy:

```bash
# Required: Point to your MLflow server
export MLFLOW_TRACKING_URI="http://localhost:5000"

# Optional: Set the experiment name
export MLFLOW_EXPERIMENT_NAME="LiteLLM Proxy"
```
Step 5: Start LiteLLM Proxy

```bash
litellm --config litellm_config.yaml
```
Step 6: Make API Calls

Make requests to the proxy using any OpenAI-compatible client:

```bash
curl -X POST "http://localhost:4000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
```

Or use the OpenAI Python SDK:

```python
from openai import OpenAI

# Point to your LiteLLM Proxy
client = OpenAI(
    base_url="http://localhost:4000/v1",
    api_key="sk-1234",  # Your LiteLLM Proxy API key
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)
print(response.choices[0].message.content)
```
Step 7: View Traces in MLflow UI

Open the MLflow UI at http://localhost:5000 to see the traces from your LiteLLM Proxy calls.
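
You can also fetch traces programmatically, for example to spot-check latency or volume from a notebook. The following is a minimal sketch that assumes a recent MLflow version providing mlflow.search_traces, and that the experiment name matches the MLFLOW_EXPERIMENT_NAME set in Step 4:

```python
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")

# Look up the experiment used by the proxy callback
experiment = mlflow.get_experiment_by_name("LiteLLM Proxy")

# search_traces returns a pandas DataFrame with one row per trace
traces = mlflow.search_traces(
    experiment_ids=[experiment.experiment_id],
    max_results=10,
)
print(f"Fetched {len(traces)} traces")
print(traces.head())
```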

Option 2: Client-side Tracing

If you don't have access to configure the LiteLLM Proxy server, you can trace calls on the client side using the OpenAI SDK with MLflow autolog. Since LiteLLM Proxy exposes an OpenAI-compatible API, this works seamlessly.

Step 1: Install Dependencies

```bash
pip install 'mlflow[genai]' openai
```
Step 2: Start MLflow Server

If you have a local Python environment (Python 3.10 or later), you can start the MLflow server locally with the mlflow CLI:

```bash
mlflow server
```
Step 3: Enable Tracing and Make API Calls

Enable tracing with mlflow.openai.autolog() and configure the OpenAI client to use LiteLLM Proxy's base URL.

```python
import mlflow
from openai import OpenAI

# Enable auto-tracing for OpenAI
mlflow.openai.autolog()

# Set tracking URI and experiment
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("LiteLLM Proxy")

# Point the OpenAI client to LiteLLM Proxy
client = OpenAI(
    base_url="http://localhost:4000/v1",  # LiteLLM Proxy URL
    api_key="sk-1234",  # Your LiteLLM Proxy API key
)

# Make API calls as usual - traces will be captured automatically
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
)
print(response.choices[0].message.content)
```
Step 4: View Traces in MLflow UI

Open the MLflow UI at http://localhost:5000 to see the traces.

Note: With client-side tracing, you see traces from your application's perspective. Server-side callback tracing provides a complete view of all proxy activity, including requests from other clients.

Combining with Manual Tracing

You can combine auto-tracing with MLflow's manual tracing to create comprehensive traces that include your application logic:

```python
import mlflow
from mlflow.entities import SpanType
from openai import OpenAI

mlflow.openai.autolog()

client = OpenAI(base_url="http://localhost:4000/v1", api_key="sk-1234")


@mlflow.trace(span_type=SpanType.CHAIN)
def ask_question(question: str) -> str:
    """A traced function that calls the LLM through LiteLLM Proxy."""
    response = client.chat.completions.create(
        model="gpt-4o-mini", messages=[{"role": "user", "content": question}]
    )
    return response.choices[0].message.content


# The entire function call and the nested LLM call will be traced
answer = ask_question("What is machine learning?")
print(answer)
```
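
If you want finer-grained spans inside your own logic, you can also open child spans explicitly with mlflow.start_span. The sketch below is illustrative: the answer_with_context function and its "build_prompt" step are made up for the example, but the span APIs (set_inputs, set_outputs) are standard MLflow tracing calls.

```python
import mlflow
from mlflow.entities import SpanType
from openai import OpenAI

mlflow.openai.autolog()
client = OpenAI(base_url="http://localhost:4000/v1", api_key="sk-1234")


@mlflow.trace(span_type=SpanType.CHAIN)
def answer_with_context(question: str) -> str:
    # Explicit child span around a non-LLM step (hypothetical prompt-building stage)
    with mlflow.start_span(name="build_prompt") as span:
        prompt = f"Answer concisely: {question}"
        span.set_inputs({"question": question})
        span.set_outputs({"prompt": prompt})

    # The autologged LLM call appears as a sibling span under the same trace
    response = client.chat.completions.create(
        model="gpt-4o-mini", messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content


print(answer_with_context("What is machine learning?"))
```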

Next Steps