Introducing MLflow Agents Dashboard
We're excited to introduce the Agent Dashboard, a new Overview tab in MLflow GenAI experiments that gives you instant, comprehensive visibility into your agent's health.
Challenges
Building GenAI applications is hard, but keeping them running well in production is harder. Teams often struggle to answer basic questions:
- Why is my application slow today?
- Are my outputs maintaining quality over time?
- Which tools in my agent are causing failures?
- How much am I spending on tokens?
Previously, answering these questions required digging through individual traces or building custom dashboards. The Overview tab changes that.
One Dashboard, Complete Visibility
The Overview tab automatically aggregates metrics from all traces logged to an experiment. It consolidates these insights into three focused views:
- Usage: Track requests, latency, errors, and token consumption over time
- Quality: Monitor agent quality based on your MLflow scorers (see the scorer sketch below)
- Tool Calls: Analyze agent tool performance, success rates, and error patterns
Each view includes time range and granularity controls, making it easy to zoom in on issues or spot long-term trends.
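
Quality metrics come from scorer feedback attached to your traces. As a minimal sketch (assuming MLflow 3's `mlflow.genai` evaluation API; the scorer and dataset below are illustrative, not part of the dashboard itself), you can define a custom scorer and run it over a small evaluation set; feedback like this is the kind of signal the Quality view aggregates:

```python
import mlflow
from mlflow.genai.scorers import scorer


# Illustrative custom scorer: flags responses that are empty or overly long.
# A @scorer-decorated function can accept `inputs`, `outputs`, `expectations`,
# or `trace`, and return a number, boolean, or Feedback object.
@scorer
def concise_answer(outputs) -> bool:
    return bool(outputs) and len(str(outputs)) < 2000


# Tiny illustrative dataset; in practice, evaluate your own agent's outputs
# or pass a predict_fn so MLflow invokes the agent for you.
eval_data = [
    {
        "inputs": {"question": "What does the Overview tab show?"},
        "outputs": "Usage, Quality, and Tool Calls metrics for an experiment.",
    },
]

mlflow.genai.evaluate(data=eval_data, scorers=[concise_answer])
```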
Get Started
The Overview tab is available starting with MLflow 3.9.0. Install the latest version:
```bash
# Using uv (quote the spec so the shell doesn't treat ">" as a redirect)
uv pip install "mlflow>=3.9.0"

# Using pip
pip install "mlflow>=3.9.0"
```
Then enable tracing in your application. For example, with OpenAI autologging:
```python
import mlflow

mlflow.openai.autolog()
```
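
For an end-to-end sketch (the experiment name, model, and prompt here are placeholders), point MLflow at an experiment, enable autologging, and make a call with the official OpenAI client; each call is captured as a trace:

```python
import mlflow
from openai import OpenAI

# Optional: point at your tracking server, e.g. mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("agent-dashboard-demo")  # placeholder experiment name
mlflow.openai.autolog()

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user", "content": "Summarize yesterday's error spikes."}],
)
print(response.choices[0].message.content)
```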
MLflow will automatically trace all OpenAI calls and populate the dashboard with your metrics. MLflow supports 40+ integrations including LangChain, LiteLLM, Claude Code, and more.
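
Parts of an agent that no autologging integration covers, such as hand-written tools, can be traced manually so their calls can feed the Tool Calls view. A minimal sketch using the `mlflow.trace` decorator (the tool itself is hypothetical):

```python
import mlflow
from mlflow.entities import SpanType

# Hypothetical custom tool, marked as a TOOL span so it is tracked as a tool call.
@mlflow.trace(span_type=SpanType.TOOL)
def lookup_order_status(order_id: str) -> str:
    # ...call your order system here...
    return f"Order {order_id}: shipped"
```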
For details on each tab and available metrics, check out the MLflow Tracing UI documentation.
We'd love to hear your feedback as you explore this new feature! If you find MLflow useful, please give us a ⭐ on GitHub.
