Monitoring GenAI Applications in Production

Machine learning projects don't conclude with their initial launch. Ongoing monitoring and incremental enhancements are critical for long-term success. MLflow Tracing offers observability for your production application, supporting the iterative process of continuous improvement.

If you are looking for a managed solution for monitoring your GenAI application with complete observability powered by MLflow Tracing, we recommend using Lakehouse Monitoring for GenAI on Databricks.

info

Don't have a Databricks account? Sign up for free and get started in minutes!


This solution gives you instant access to a fully functional monitoring system and dashboard for your GenAI application, which lets you:

  • Track operational metrics such as request volume, latency, errors, and cost.
  • Monitor quality metrics such as correctness, safety, and context sufficiency using managed evaluation.
  • Configure custom metrics with Python functions.
  • Perform root cause analysis by inspecting the traces recorded by MLflow Tracing.

Lakehouse Monitoring for GenAI can be used with your GenAI application regardless of whether it is hosted on Databricks. You can run the application on any cloud or on-premises and configure MLflow Tracing to send traces to Databricks for monitoring.
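
As a minimal sketch of this setup, the snippet below points MLflow at a Databricks workspace and traces a simple placeholder function. The experiment path and the function are illustrative only, and Databricks authentication is assumed to be configured (for example via the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables).

import mlflow

# Point MLflow at your Databricks workspace (authentication is assumed to be
# configured, e.g. via DATABRICKS_HOST and DATABRICKS_TOKEN).
mlflow.set_tracking_uri("databricks")

# Placeholder experiment path; traces are grouped under this experiment.
mlflow.set_experiment("/Shared/genai-app-monitoring")


@mlflow.trace
def generate_answer(question: str) -> str:
    # Placeholder for your actual application logic (e.g., an LLM call).
    return f"Echo: {question}"


generate_answer("What is MLflow Tracing?")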

For more details about the product and how to set it up, please refer to the Lakehouse Monitoring for GenAI documentation.

OpenTelemetry Integration

Traces generated by MLflow are compatible with the OpenTelemetry trace specs. Therefore, MLflow traces can be exported to various observability platforms that support OpenTelemetry.

By default, MLflow exports traces to the MLflow Tracking Server. To enable exporting traces to an OpenTelemetry Collector, set the OTEL_EXPORTER_OTLP_ENDPOINT environment variable (or OTEL_EXPORTER_OTLP_TRACES_ENDPOINT) to the target URL of the OpenTelemetry Collector before starting any trace.

pip install opentelemetry-exporter-otlp

import os

import mlflow

# Set the endpoint of the OpenTelemetry Collector
os.environ["OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"] = "http://localhost:4317/v1/traces"
# Optionally, set the service name to group traces
os.environ["OTEL_SERVICE_NAME"] = "<your-service-name>"

# The trace will be exported to the OTel Collector at http://localhost:4317/v1/traces
with mlflow.start_span(name="foo") as span:
    span.set_inputs({"a": 1})
    span.set_outputs({"b": 2})

The following observability platforms can ingest MLflow traces through an OpenTelemetry Collector; refer to the corresponding setup guide to learn how to configure the Collector for your specific platform:

  • Datadog
  • New Relic
  • SigNoz
  • Splunk
  • Grafana
  • ServiceNow

Configurations

MLflow uses the standard OTLP exporter to export traces to OpenTelemetry Collector instances, so you can use all of the configuration options supported by OpenTelemetry. The following example configures the OTLP exporter to use the HTTP protocol instead of the default gRPC and sets custom headers:

export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="http://localhost:4317/v1/traces"
export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="http/protobuf"
export OTEL_EXPORTER_OTLP_TRACES_HEADERS="api_key=12345"
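
The same settings can also be applied from Python before the first trace is started, which is convenient when you cannot export shell variables. This is only a sketch mirroring the exports above; the endpoint and header values are placeholders.

import os

import mlflow

# Mirror the shell exports above; set these before starting any trace.
os.environ["OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"] = "http://localhost:4317/v1/traces"
os.environ["OTEL_EXPORTER_OTLP_TRACES_PROTOCOL"] = "http/protobuf"
os.environ["OTEL_EXPORTER_OTLP_TRACES_HEADERS"] = "api_key=12345"

with mlflow.start_span(name="configured-export") as span:
    span.set_inputs({"a": 1})
    span.set_outputs({"b": 2})
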
warning

MLflow exports traces to only a single destination. When the OTEL_EXPORTER_OTLP_ENDPOINT (or OTEL_EXPORTER_OTLP_TRACES_ENDPOINT) environment variable is configured, MLflow will not export traces to the MLflow Tracking Server, and you will not see traces in the MLflow UI.

Similarly, if you deploy the model to Databricks Model Serving with tracing enabled, using the OpenTelemetry Collector will result in traces not being recorded in the Inference Table.

Self-Hosted Tracking Server

You can keep using the MLflow Tracking Server to store production traces. However, the tracking server is optimized for offline experimentation and is generally not suitable for handling hyper-scale traffic, so we recommend the other two options above for production monitoring use cases.

If you choose to keep using the tracking server in production, we strongly recommend using SQL-based tracking server on top of a scalable database and artifact storage, as it will be a key factor for write and query performance. Refer to the tracking server setup guide for more details. In addition, tracking server by default uses infinite retention date for trace data, hence it is recommended to set up periodic deletion job using the SDK or REST API.