# Tracing FAQ

## Getting Started and Basic Usage

### Q: How do I start using MLflow Tracing?

The easiest way to start is with automatic tracing for supported libraries:
```python
import mlflow
import openai

# Enable automatic tracing for OpenAI
mlflow.openai.autolog()

# Your existing code now generates traces automatically
client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini", messages=[{"role": "user", "content": "Hello!"}]
)
```
For custom code, use the `@mlflow.trace` decorator:

```python
@mlflow.trace
def my_function(input_data):
    # Your logic here
    return "processed result"
```
### Q: Which libraries does MLflow Tracing support automatically?
MLflow provides automatic tracing (autolog) for 20+ popular libraries. See the complete list at Automatic Tracing Integrations.
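Each integration follows the same `autolog()` pattern as the OpenAI example above. For instance (LangChain and Anthropic are shown here as two of the supported integrations; see the linked page for the authoritative list):

```python
import mlflow

# Each supported library exposes its own autolog() entry point.
mlflow.langchain.autolog()
mlflow.anthropic.autolog()
```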
## User Interface and Jupyter Integration

### Q: Can I view traces directly in Jupyter notebooks?

Yes! Jupyter integration is available in MLflow 2.20 and above. The trace UI automatically displays within notebooks when:

- Cell code generates a trace
- You call `mlflow.search_traces()`
- You display a trace object
```python
import mlflow

# Set tracking URI to your MLflow server
mlflow.set_tracking_uri("http://localhost:5000")


@mlflow.trace
def my_function():
    return "Hello World"


# Trace UI will appear automatically in the notebook
my_function()
```
To control the display:

```python
# Disable notebook display
mlflow.tracing.disable_notebook_display()

# Enable notebook display
mlflow.tracing.enable_notebook_display()
```
### Q: How can I customize the request and response previews in the UI?

You can customize what appears in the Request and Response columns of the trace list using `mlflow.update_current_trace()`:
```python
@mlflow.trace
def predict(messages: list[dict]) -> str:
    # Customize the request preview for long message histories
    custom_preview = f'{messages[0]["content"][:10]} ... {messages[-1]["content"][:10]}'
    mlflow.update_current_trace(request_preview=custom_preview)

    # Your model logic here
    result = process_messages(messages)

    # Customize response preview
    mlflow.update_current_trace(response_preview=f"Result: {result[:50]}...")
    return result
```
## Production and Performance

### Q: Can I use MLflow Tracing for production applications?

Yes, MLflow Tracing is stable and designed for use in production environments.

In production, we recommend using the MLflow Tracing SDK (`mlflow-tracing`) to instrument your code, models, and agents. It ships with a minimal set of dependencies and a smaller installation footprint, making it well suited for production environments that need an efficient, lightweight tracing solution. Please refer to the Production Monitoring section for more details.
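As a minimal sketch of a production setup with the lightweight SDK (the tracking URI and experiment name below are placeholders), configuration looks the same as with the full `mlflow` package:

```python
# Installed via the lightweight SDK: pip install mlflow-tracing
import mlflow

# Point the SDK at your tracking server (placeholder URL and experiment name).
mlflow.set_tracking_uri("https://your-mlflow-server:5000")
mlflow.set_experiment("production-agent")

# Instrument your application as usual.
mlflow.openai.autolog()
```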
### Q: How do I enable asynchronous trace logging?

Asynchronous logging can significantly reduce performance overhead (about 80% for typical workloads):
```python
import mlflow

# Enable async logging
mlflow.config.enable_async_logging()

# Traces will be logged asynchronously
with mlflow.start_span(name="foo") as span:
    span.set_inputs({"a": 1})
    span.set_outputs({"b": 2})

# Manually flush if needed
mlflow.flush_trace_async_logging()
```
Configuration options:

You can configure the detailed behavior of asynchronous logging with the following environment variables:

| Environment Variable | Description | Default |
|---|---|---|
| `MLFLOW_ASYNC_TRACE_LOGGING_MAX_WORKERS` | Maximum worker threads | 10 |
| `MLFLOW_ASYNC_TRACE_LOGGING_MAX_QUEUE_SIZE` | Maximum queued traces | 1000 |
| `MLFLOW_ASYNC_TRACE_LOGGING_RETRY_TIMEOUT` | Retry timeout in seconds | 500 |
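These are read from the process environment, so one way to tune them (the values below are illustrative) is to set them before MLflow starts exporting traces:

```python
import os

# Set before any traces are logged; values here are illustrative.
os.environ["MLFLOW_ASYNC_TRACE_LOGGING_MAX_WORKERS"] = "20"
os.environ["MLFLOW_ASYNC_TRACE_LOGGING_MAX_QUEUE_SIZE"] = "2000"

import mlflow

mlflow.config.enable_async_logging()
```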
### Q: How do I optimize trace size in production?

MLflow's automatic tracing integrations capture rich information that is helpful for debugging and evaluating the model or agent. However, this comes at the cost of trace size. For example, you may not want to log all of the retrieved document texts from your RAG application.

MLflow supports plugging in custom post-processing hooks that are applied to trace data before it is exported to the backend. These hooks let you reduce trace size by removing unnecessary data, or apply security guardrails such as masking sensitive data.

To register a custom hook, use the `mlflow.tracing.configure` API. For example, the following code filters document contents out of the retriever span output to reduce the trace size:
```python
import mlflow
from mlflow.entities.span import Span, SpanType


# Define a custom hook that takes a span as input and mutates it in-place.
def filter_retrieval_output(span: Span):
    """Filter out the document contents from the retriever span output and only keep the document ids."""
    if span.span_type == SpanType.RETRIEVAL:
        documents = span.outputs.get("documents")
        document_ids = [doc.id for doc in documents]
        span.set_outputs({"document_ids": document_ids})


# Register the hook
mlflow.tracing.configure(span_processors=[filter_retrieval_output])

# Any traces created after the configuration will be filtered by the hook.
...
```
Refer to the Redacting Sensitive Data guide for more details about the hook API and examples.
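For the masking use case mentioned above, a hook can rewrite span inputs the same way the example above rewrites outputs. The sketch below assumes string-valued inputs and that `span.set_inputs` mirrors the `span.set_outputs` call shown earlier:

```python
import re

import mlflow
from mlflow.entities.span import Span

EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")


def mask_emails(span: Span):
    """Replace email addresses in string-valued span inputs with a placeholder."""
    inputs = span.inputs or {}
    masked = {
        k: EMAIL_PATTERN.sub("<redacted>", v) if isinstance(v, str) else v
        for k, v in inputs.items()
    }
    span.set_inputs(masked)


mlflow.tracing.configure(span_processors=[mask_emails])
```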
### Q: Can I log traces to different experiments from a single application?

Yes. By default, MLflow logs traces to the current active experiment, set via either the `mlflow.set_experiment` API or the `MLFLOW_EXPERIMENT_ID` environment variable.

However, you may sometimes want to dynamically route traces to different experiments from a single application. For example, your application server may expose two endpoints, each serving a different model. Switching the active experiment does not work here, because the active experiment is defined globally and is not isolated per thread or async context.

Therefore, MLflow provides two ways to switch the target experiment for traces:
#### Option 1: Set the `trace_destination` parameter when starting a manual trace

The `trace_destination` parameter was introduced to the `@mlflow.trace` decorator and the `mlflow.start_span` API in MLflow 3.3, allowing you to specify the target experiment for each trace explicitly.
```python
import mlflow
from mlflow.tracing.destination import MlflowExperiment

# `Request` is assumed to come from the same FastAPI-style app as in Option 2.
from fastapi import Request


@mlflow.trace(trace_destination=MlflowExperiment(experiment_id="1234"))
def math_agent(request: Request):
    # Your model logic here
    ...
```
Note that the `trace_destination` parameter is only effective when set on the root span of the trace. If it is set on a child span, MLflow will ignore it and print a warning.
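To illustrate (a hypothetical sketch reusing `MlflowExperiment` from above): when `child` is invoked inside `root`, its own `trace_destination` is ignored with a warning, and the whole trace goes to experiment `"1234"`:

```python
@mlflow.trace(trace_destination=MlflowExperiment(experiment_id="1234"))
def root(query):
    return child(query)


# This destination only takes effect if `child` starts its own trace;
# inside `root`'s trace it is a child span, so the parameter is ignored.
@mlflow.trace(trace_destination=MlflowExperiment(experiment_id="5678"))
def child(query):
    ...
```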
#### Option 2: Use `mlflow.tracing.set_destination` with `context_local=True`

The `mlflow.tracing.set_destination()` API is purpose-built for setting the destination of traces while bypassing the overhead of `mlflow.set_experiment`. The `context_local` parameter lets you set the destination per async task or thread, providing isolation in concurrent applications. This option is useful when you use automatic tracing and are not using the manual tracing APIs.
```python
import mlflow
from mlflow.tracing.destination import MlflowExperiment

# Assumes a FastAPI app, e.g. `app = FastAPI()`, with `Request` imported from fastapi.


@app.get("/math-agent")
def math_agent(request: Request):
    # The API is super low-overhead, so you can call it inside the request handler.
    # `context_local=True` isolates the destination per thread / async context.
    mlflow.tracing.set_destination(
        MlflowExperiment(experiment_id="1234"), context_local=True
    )
    # Your model logic here
    with mlflow.start_span(name="math-agent") as span:
        ...


@app.get("/chat-agent")
def chat_agent(request: Request):
    mlflow.tracing.set_destination(
        MlflowExperiment(experiment_id="5678"), context_local=True
    )
    # Your model logic here
    with mlflow.start_span(name="chat-agent") as span:
        ...
```