Tracing Arize Phoenix
Arize Phoenix is an open-source AI observability platform. Its OpenInference instrumentation libraries produce OpenTelemetry spans with rich AI-specific attributes for model providers like OpenAI, Anthropic, and others.
Because OpenInference is built on OpenTelemetry, you can use mlflow.otel.autolog() to mirror every instrumented call to the MLflow backend. The MLflow server automatically translates OpenInference attributes to MLflow span types, inputs, outputs, token usage, and model name. Phoenix tracing is completely unaffected; this is purely additive.
```python
import mlflow

mlflow.otel.autolog()
```
MLflow automatically captures the following information from OpenInference spans:
- Span inputs and outputs
- Latencies
- Span name
- Span type mapped from OpenInference span kind (e.g., LLM, RETRIEVER, TOOL)
- Token usage (prompt, completion, total)
- Model name
- Parent-child span nesting
- Exceptions, if raised
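To make the token-usage and model-name items above concrete, the sketch below reads those fields out of a plain attribute dictionary. The attribute keys follow the OpenInference semantic conventions (`openinference.span.kind`, `llm.model_name`, `llm.token_count.*`); the actual translation logic lives inside the MLflow server, so treat this purely as an illustration of which attributes carry the data:

```python
# Illustrative sketch: reading the AI-specific fields from the raw
# attributes of an OpenInference span. The real mapping is performed
# by the MLflow server, not by user code.

def summarize_openinference_span(attributes: dict) -> dict:
    """Pull span kind, model name, and token counts out of an attribute mapping."""
    return {
        "span_kind": attributes.get("openinference.span.kind"),
        "model": attributes.get("llm.model_name"),
        "prompt_tokens": attributes.get("llm.token_count.prompt"),
        "completion_tokens": attributes.get("llm.token_count.completion"),
        "total_tokens": attributes.get("llm.token_count.total"),
    }

# Example attributes, shaped like what an OpenInference instrumentor records
attrs = {
    "openinference.span.kind": "LLM",
    "llm.model_name": "gpt-4o-mini",
    "llm.token_count.prompt": 12,
    "llm.token_count.completion": 34,
    "llm.token_count.total": 46,
}
print(summarize_openinference_span(attrs))
```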
Getting Started
Install Dependencies
Install MLflow and the OpenInference instrumentor for your model provider. This example uses OpenAI:
```shell
pip install mlflow openai openinference-instrumentation-openai opentelemetry-sdk
```
Other instrumentors are available for LangChain, LlamaIndex, Anthropic, and more. See the OpenInference repository for the full list.
Start MLflow Server
- Local (pip)
- Local (docker)
If you have a local Python environment >= 3.10, you can start the MLflow server locally using the mlflow CLI command.
```shell
mlflow server
```
MLflow also provides a Docker Compose file to start a local MLflow server with a PostgreSQL database and a MinIO server.
```shell
git clone --depth 1 --filter=blob:none --sparse https://github.com/mlflow/mlflow.git
cd mlflow
git sparse-checkout set docker-compose
cd docker-compose
cp .env.dev.example .env
docker compose up -d
```
Refer to the instructions in the repository for more details, such as how to override the default environment variables.
Enable Tracing and Run Your Application
```python
import mlflow
from openai import OpenAI
from openinference.instrumentation.openai import OpenAIInstrumentor

# Register the MLflow span processor on the global OTel TracerProvider
mlflow.otel.autolog()

# Auto-instrument all OpenAI SDK calls
OpenAIInstrumentor().instrument()

mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("Phoenix")

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is MLflow?"}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```
mlflow.otel.autolog() can be called before or after the OpenInference instrumentor. Both perform the same ProxyTracerProvider to SdkTracerProvider replacement and reuse an existing provider if one is already set, so initialization order does not matter.
View Traces in MLflow UI
Browse to the MLflow UI at http://localhost:5000 (or your MLflow server URL) and you should see the traces for your OpenInference-instrumented application. Each LLM call will appear with proper inputs, outputs, span types, token usage, and model name.
Dual Tracing with Phoenix
You can send spans to both MLflow and Phoenix simultaneously by adding an OTLP exporter to the shared TracerProvider:
```python
import mlflow
from openai import OpenAI
from openinference.instrumentation.openai import OpenAIInstrumentor
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace.export import SimpleSpanProcessor

# Register MLflow span processor
mlflow.otel.autolog()

# Also send spans to Phoenix (running on port 6006)
provider = trace.get_tracer_provider()
provider.add_span_processor(
    SimpleSpanProcessor(OTLPSpanExporter("http://localhost:6006/v1/traces"))
)

# Auto-instrument OpenAI
OpenAIInstrumentor().instrument()

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
This works because both MLflow and Phoenix register as span processors on the same OpenTelemetry TracerProvider. Every span is dispatched to all registered processors.
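The fan-out described above can be sketched in plain Python. The classes here are toy stand-ins for the OpenTelemetry SDK's provider and span processors, not the real API; they only illustrate that every finished span is dispatched to every registered processor:

```python
# Toy sketch of span fan-out: one provider, many processors.
# Both MLflow and Phoenix receive the same spans because each ended
# span is handed to every registered processor.

class RecordingProcessor:
    def __init__(self, name):
        self.name = name
        self.spans = []

    def on_end(self, span):
        self.spans.append(span)

class ToyTracerProvider:
    def __init__(self):
        self._processors = []

    def add_span_processor(self, processor):
        self._processors.append(processor)

    def end_span(self, span):
        # Dispatch the finished span to every registered processor
        for p in self._processors:
            p.on_end(span)

provider = ToyTracerProvider()
mlflow_proc = RecordingProcessor("mlflow")
phoenix_proc = RecordingProcessor("phoenix")
provider.add_span_processor(mlflow_proc)
provider.add_span_processor(phoenix_proc)

provider.end_span({"name": "chat.completions"})
print(mlflow_proc.spans == phoenix_proc.spans)  # both saw the span
```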
Span Type Mapping
All OpenInference span kinds (LLM, EMBEDDING, TOOL, RETRIEVER, AGENT, CHAIN, RERANKER, GUARDRAIL, EVALUATOR) map 1-to-1 to the corresponding MLflow span type of the same name.
Export Mode
By default, mlflow.otel.autolog() uses batched export (BatchSpanProcessor), which buffers spans and flushes them in the background. This follows the MLFLOW_ENABLE_ASYNC_TRACE_LOGGING environment variable (default True).
For debugging or low-throughput use cases, you can switch to synchronous export so that each trace is sent immediately:
```python
mlflow.otel.autolog(batch=False)
```
Or explicitly enable batched export:
```python
mlflow.otel.autolog(batch=True)
```
Disable Auto-Tracing
Auto tracing can be disabled globally by calling mlflow.otel.autolog(disable=True).

```python
mlflow.otel.autolog(disable=True)
```
After disabling, new OpenInference spans will no longer be forwarded to MLflow.