
MLflow 3.13.0 Highlights: Role-Based Access Control, Trace Archival, Coding Agents, and Hermes Agent Support

· 6 min read
MLflow maintainers

MLflow 3.13.0 is a major release for running AI observability at scale, focused on access control, the lifecycle of your trace data, and richer support for agents. Highlights include a full Role-Based Access Control system with a new Admin UI, automatic trace archival, one-click onboarding for coding agents, new engines for MLflow Assistant, span log levels, an official Helm chart for Kubernetes, and Hermes Agent support.

1. Role-Based Access Control and Admin UI

Sharing a self-hosted MLflow server across a team used to mean granting permissions one resource at a time, with no central place to manage them. A new Role-Based Access Control (RBAC) system replaces that: define roles as reusable bundles of permissions, assign them to users, and let workspace-level grants express both membership and admin authority. A user's effective access is the union of their roles, and experiments, models, prompts, scorers, and AI Gateway endpoints are all covered.

A new web Admin UI makes this manageable without touching REST endpoints, with a self-service /account page for viewing your roles and changing your password, and Platform Admin pages for managing users, roles, and grants. Just start mlflow server with authentication enabled.
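The union rule for effective access can be sketched in a few lines. The `Role` type and the resource/permission strings below are hypothetical and chosen for illustration. They are not the server's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    """Hypothetical role: a named, reusable bundle of (resource, permission) grants."""
    name: str
    permissions: frozenset


def effective_permissions(roles):
    """A user's effective access is the union of the grants in all of their roles."""
    granted = set()
    for role in roles:
        granted |= role.permissions
    return granted


# Two illustrative roles assigned to the same user
viewer = Role("viewer", frozenset({("experiment:42", "READ")}))
prompt_editor = Role(
    "prompt-editor",
    frozenset({("prompt:summarize", "READ"), ("prompt:summarize", "EDIT")}),
)

user_roles = [viewer, prompt_editor]
print(sorted(effective_permissions(user_roles)))
```

Because access is a plain union, adding a role can only widen what a user can do. Removing a role is the only way to narrow it.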

Learn more about Role-Based Access Control

2. Trace Retention and Auto Archival

On a long-running tracking server, trace span data piles up in your SQL backend and eventually slows it down. Trace archival keeps it in check: a background pass automatically moves traces older than your retention window out of SQL and into cheap object storage such as S3, while keeping every trace fully readable in the UI and through the APIs. Retention is policy-driven, resolving from server to workspace to experiment.

Enable it by pointing the server at a YAML config:

trace_archival:
  enabled: true
  location: s3://my-bucket/trace-archive
  retention: 7d
  interval_seconds: 60

Workspace managers and experiment owners can then tighten retention from the UI or CLI, and the effective cutoff shows up as an "Archive after" badge on each trace.
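Here is a minimal sketch of how an effective "Archive after" cutoff could be computed. It rests on two assumptions. First, retention strings use the `<number><unit>` form seen in the config above (`7d`). Second, narrower levels can only tighten the window, so the shortest configured value wins. The parsing rules and the helper names are hypothetical, not MLflow's implementation.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed unit suffixes; only "d" appears in the example config above.
_UNITS = {"h": "hours", "d": "days", "w": "weeks"}


def parse_retention(text):
    """Parse a duration such as '7d' or '12h' into a timedelta (assumed format)."""
    match = re.fullmatch(r"(\d+)([hdw])", text.strip())
    if match is None:
        raise ValueError(f"unrecognized retention: {text!r}")
    value, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(value)})


def effective_retention(server, workspace=None, experiment=None):
    """Resolve server -> workspace -> experiment, assuming each level can only tighten."""
    windows = [parse_retention(v) for v in (server, workspace, experiment) if v]
    return min(windows)


def archive_after(trace_start, retention):
    """Time at which a trace becomes eligible to move from SQL to object storage."""
    return trace_start + retention


start = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(archive_after(start, effective_retention("7d", workspace="3d")))
```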

Learn more about Trace Archival

3. One-Click Observability and Governance for Coding Agents

Putting a coding agent like Claude Code, OpenAI Codex, or Gemini CLI under observability and governance used to require wiring up a gateway endpoint by hand. The AI Gateway QuickStart now does it in one click: pick your agent and MLflow provisions a pre-configured endpoint (no API key needed, since the agent brings its own credentials) and hands you a ready-to-paste starter snippet. From then on, every request the agent makes is captured as a trace and subject to usage tracking, budgets, and guardrails.

export ANTHROPIC_BASE_URL="http://localhost:5000/gateway/proxy/my-claude-endpoint" && claude
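You may launch the agent from a Python script rather than a shell. In that case, pass a modified environment to the child process to get the same effect. This sketch assumes the proxy URL pattern from the command above. `gateway_proxy_url` and `claude_env` are hypothetical helper names.

```python
import os


def gateway_proxy_url(server_url, endpoint_name):
    """Build <server>/gateway/proxy/<endpoint>, following the example command."""
    return f"{server_url.rstrip('/')}/gateway/proxy/{endpoint_name}"


def claude_env(server_url, endpoint_name, base_env=None):
    """Copy the environment and point Claude Code's API base URL at the gateway."""
    env = dict(os.environ if base_env is None else base_env)
    env["ANTHROPIC_BASE_URL"] = gateway_proxy_url(server_url, endpoint_name)
    return env
```

The next call starts Claude Code with its requests routed through the endpoint:

`subprocess.run(["claude"], env=claude_env("http://localhost:5000", "my-claude-endpoint"))`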

Learn more about coding agents in the AI Gateway

4. New Engines for MLflow Assistant

MLflow Assistant launched in 3.9.0 with Claude Code as its only engine. You can now choose the engine that powers it: run the Assistant on a local Ollama model, the OpenAI Codex CLI, or any MLflow AI Gateway endpoint, all selectable from the setup wizard. That means you can keep everything on your own machine with Ollama, or reuse a provider you already route through the Gateway.

Thanks to community contributor @SuperSonnix71 for contributing the Ollama and OpenAI Codex engines!

Learn more about MLflow Assistant

5. Helm Chart for Kubernetes Deployment

Deploying MLflow on Kubernetes used to mean writing and maintaining your own manifests. An official, production-ready Helm chart now does it for you, with TLS, persistent storage, Ingress, Prometheus metrics, a restrictive NetworkPolicy, RBAC, and optional mlflow gc garbage collection built in. Download the chart and install it (requires Kubernetes 1.23+ and Helm 3.8+):

helm install mlflow ./charts \
--namespace mlflow \
--create-namespace \
--set storage.enabled=true \
--set mlflow.backendStoreUri="sqlite:////mlflow/mlflow.db" \
--set mlflow.artifactsDestination="/mlflow/artifacts"

Learn more about the Kubernetes Helm deployment

6. Hermes Agent Support

Agent observability now reaches beyond coding agents to long-running autonomous runtimes. Hermes Agent from Nous Research integrates on two surfaces: route its model calls through the AI Gateway for centralized usage tracking, budgets, and guardrails, and capture full end-to-end traces, including LLM calls, tool invocations, and long-running sessions, through MLflow Tracing over OpenTelemetry.

To route Hermes through the Gateway, create an endpoint and run hermes setup model; for tracing, install the community hermes-otel plugin to export OTLP traces to your MLflow server.

Learn more about Hermes Agent and the AI Gateway

7. Log Levels for Trace Spans

A busy trace can bury the spans you care about under chain plumbing and parser calls. Spans now carry Python-logging-style severity levels (DEBUG through CRITICAL), assigned automatically from the span type, so LLM, tool, and retriever calls surface as INFO while internal steps stay DEBUG, and any span that raises is promoted to ERROR. In the trace explorer, a new Minimum log level slider hides everything below the threshold, with no code changes required. You can also set a level explicitly:

import mlflow

with mlflow.start_span("plumbing", log_level="DEBUG") as span:
    ...  # internal work that should be recorded at DEBUG
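The default-level rules described above can be sketched with the level constants from Python's standard `logging` module. The span-type names and the `span_level` and `visible` helpers are hypothetical. They only mirror the described behavior:

- LLM, tool, and retriever spans default to INFO; everything else defaults to DEBUG.
- A span that raises is promoted to at least ERROR.
- A minimum-level filter works like the trace explorer's slider.

```python
import logging

LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}

# Hypothetical defaults keyed by span type; unlisted types fall back to DEBUG.
DEFAULT_LEVELS = {"LLM": logging.INFO, "TOOL": logging.INFO, "RETRIEVER": logging.INFO}


def span_level(span_type, explicit=None, raised=False):
    """Pick a span's level: explicit override, else type default; errors promote to ERROR."""
    if explicit is not None:
        level = LEVELS[explicit]
    else:
        level = DEFAULT_LEVELS.get(span_type, logging.DEBUG)
    if raised:
        level = max(level, logging.ERROR)
    return level


def visible(spans, min_level="INFO"):
    """Keep only span names at or above the threshold, like the minimum log level slider."""
    threshold = LEVELS[min_level]
    return [name for name, level in spans if level >= threshold]


spans = [
    ("llm-call", span_level("LLM")),
    ("plumbing", span_level("CHAIN")),
    ("tool-fail", span_level("TOOL", raised=True)),
]
print(visible(spans, "INFO"))
```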

Thanks to community contributor @rrtheonlyone for contributing this feature!

Learn more about span log levels

Full Changelog

For a comprehensive list of changes, see the release change log.

What's Next

Get Started

Upgrade to try these new features:

pip install mlflow==3.13.0

Share Your Feedback

We'd love to hear about your experience with these new features:

Learn More

MLflow 3.12.0

· 3 min read
MLflow maintainers

MLflow 3.12.0 is a release focused on improving our LLM observability workflows, making tracing more accessible, feature-rich, and performant.

🖼️ Multimodal Tracing


Users can now store multimodal content in tracing spans as artifact attachments instead of inline binary data. We've also updated the UI to support the new mlflow-attachment:// URI scheme, with rich rendering for PDFs, audio, and images.

This feature works out of the box with autologging, but manual attachment management is also possible. Visit the documentation page to learn more.

🤖 Codex, Gemini, Qwen coding agent tracing support


Similar to our Claude Code tracing integration, we've now added support for the Codex, Gemini, and Qwen coding agent platforms as well! For instructions on how to get started, check out the doc pages at:

🛡️ Gateway guardrails

You can now set guardrails on your gateway endpoints to prevent unsafe or non-compliant model inputs and outputs. Try it out in the MLflow UI, and visit the documentation page to learn more!

⚡ Trace table pagination

The traces tab is now paginated instead of fetching all traces (up to a limit of 1,000) at once. This improves initial load time and makes the page feel more responsive overall.
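Paginated fetching generally follows a token-based loop: request one page, then follow the returned token until there is none. In this sketch, `fetch_page(page_size, page_token)` is a hypothetical callable standing in for the tracking server's API, and the in-memory backend is illustrative only.

```python
def iter_pages(fetch_page, page_size=100):
    """Yield items lazily, one page at a time, until no next-page token is returned."""
    token = None
    while True:
        items, token = fetch_page(page_size, token)
        yield from items
        if not token:
            break


def make_fake_backend(data):
    """Hypothetical in-memory stand-in for a paginated search endpoint."""
    def fetch_page(page_size, token):
        start = int(token or 0)
        end = start + page_size
        next_token = str(end) if end < len(data) else None
        return data[start:end], next_token
    return fetch_page


traces = iter_pages(make_fake_backend(list(range(250))), page_size=100)
print(next(traces))  # only the first page has been requested at this point
```

Because the generator is lazy, a UI can render the first page as soon as it arrives. It does not have to wait for every trace.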

Full Changelog

For a comprehensive list of changes, see the release change log.

What's Next

Get Started

Install MLflow 3.12.0 to try these new features:

pip install mlflow==3.12.0

Share Your Feedback

We'd love to hear about your experience with these new features:

Learn More

MLflow 3.11.1 Highlights: Automatic Issue Detection, Gateway Budget Management, and Pickle-Free Models!

· 5 min read
MLflow maintainers

MLflow 3.11.1 is a major release that significantly advances MLflow's AI Observability, security, and governance capabilities. This release brings automated quality issue detection for agents, fine-grained spending controls for AI Gateway, interactive trace graph visualization, native OpenTelemetry GenAI semantic convention support, and safer pickle-free model serialization — alongside broad improvements to tracing integrations, evaluation pipelines, and the MLflow UI.

1. Automatic Issue Identification

Automatically surface quality problems in your agent without manual inspection! Use the new Detect Issues button in the traces table to analyze selected traces with AI and identify potential problems across categories like correctness, safety, and performance. Detected issues are linked directly to the relevant traces, making it easy to investigate root causes and debug your agent at scale.

Learn more about automatic issue detection

2. Gateway Budget Alerts & Limits


Take control of your AI Gateway spending with configurable budget policies. Set spending limits by time window (daily, weekly, or monthly), receive alerts before hitting limits, and block runaway costs automatically when thresholds are exceeded. The new budget management UI lets you track current spending, configure webhook notifications, and monitor violations across all gateway endpoints — all without writing any code.
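The allow/alert/block behavior can be sketched as a single check run before each request. Several details are assumptions for illustration, not the gateway's actual configuration schema:

- the 80% alert threshold;
- the `BudgetPolicy` fields;
- blocking only when projected spend strictly exceeds the limit.

```python
from dataclasses import dataclass


@dataclass
class BudgetPolicy:
    """Hypothetical policy for one time window (daily, weekly, or monthly)."""
    limit_usd: float
    alert_fraction: float = 0.8  # assumed: alert once 80% of the limit is reached


def check_budget(policy, spent_usd, request_cost_usd):
    """Return 'block', 'alert', or 'allow' based on projected spend in the current window."""
    projected = spent_usd + request_cost_usd
    if projected > policy.limit_usd:
        return "block"
    if projected >= policy.alert_fraction * policy.limit_usd:
        return "alert"
    return "allow"


daily = BudgetPolicy(limit_usd=100.0)
print(check_budget(daily, spent_usd=75.0, request_cost_usd=6.0))
```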

Learn more about Gateway budget alerts and limits

3. Trace Graph View


Navigate complex agent interactions with a new interactive graph view for traces. Visualize multi-level trace hierarchies, understand parent-child span relationships at a glance, and debug intricate multi-agent systems more effectively with a visual representation of your trace topology.
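A graph view of a trace is built from the parent-child links that each span records. The sketch below is a rough, text-only version of that reconstruction. It assumes spans are available as `(span_id, parent_id)` pairs, which is a simplified, hypothetical shape. Children are grouped under their parents, and the tree is walked down from the root.

```python
from collections import defaultdict


def build_tree(spans):
    """Group span IDs under their parent; spans with parent None are roots."""
    children = defaultdict(list)
    roots = []
    for span_id, parent_id in spans:
        if parent_id is None:
            roots.append(span_id)
        else:
            children[parent_id].append(span_id)
    return roots, children


def render(span_id, children, depth=0):
    """Depth-first walk that indents each span under its parent."""
    lines = ["  " * depth + span_id]
    for child in children.get(span_id, []):
        lines.extend(render(child, children, depth + 1))
    return lines


spans = [("agent", None), ("planner", "agent"), ("llm", "planner"), ("search_tool", "agent")]
roots, children = build_tree(spans)
print("\n".join(render(roots[0], children)))
```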

Learn more about the trace graph view

4. Native OpenTelemetry GenAI Convention Support


MLflow now natively supports the OpenTelemetry GenAI Semantic Conventions for trace export. When exporting traces via OTLP with MLFLOW_ENABLE_OTEL_GENAI_SEMCONV enabled, MLflow automatically translates spans to follow the OTel GenAI semantic conventions — enabling seamless integration with OTel-compatible observability platforms while preserving all GenAI-specific metadata.
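The sketch below shows what such a translation involves. It renames a few span attributes to their OpenTelemetry GenAI semantic-convention names (`gen_ai.request.model`, `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`) and passes unmapped keys through unchanged. The left-hand source keys are hypothetical. MLflow's actual attribute names and mapping rules are not shown here.

```python
# Hypothetical source keys (left) mapped to OTel GenAI semantic-convention names (right).
ATTRIBUTE_MAP = {
    "model": "gen_ai.request.model",
    "input_tokens": "gen_ai.usage.input_tokens",
    "output_tokens": "gen_ai.usage.output_tokens",
}


def to_genai_semconv(attributes):
    """Rename known keys; keep any other metadata as-is so nothing is dropped."""
    return {ATTRIBUTE_MAP.get(key, key): value for key, value in attributes.items()}


print(to_genai_semconv({"model": "my-model", "input_tokens": 12, "custom.tag": "x"}))
```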

Learn more about OTel GenAI semantic convention support

5. OpenCode Tracing Integration

Debug smarter with the new OpenCode CLI tracing integration. OpenCode is an open-source, terminal-based AI coding assistant. Track and analyze code execution flows directly from your development workflow, making it easier to identify performance bottlenecks and trace issues back to specific code paths without leaving your terminal.

Learn more about OpenCode tracing

6. Native UV Support for Model Dependencies

Automatic dependency inference now supports UV. MLflow detects UV projects and captures exact, locked dependencies — including SHA-256 hashes for every package — from your lockfile when logging models, ensuring fully reproducible environments when serving or sharing models that were built with UV. This provides a safer approach against supply chain attacks: if an attacker publishes a modified package under an existing version number, the hash check fails and installation is blocked.
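The protection described above comes down to comparing a downloaded artifact's SHA-256 digest with the one pinned in the lockfile. This is a minimal sketch, assuming the `sha256:<hex>` hash form used in `uv.lock` entries. The `matches_pinned_hash` helper is hypothetical; the installer performs this check itself.

```python
import hashlib
import hmac


def matches_pinned_hash(artifact: bytes, pinned: str) -> bool:
    """Compare an artifact's SHA-256 digest with a lockfile-style 'sha256:<hex>' pin."""
    algorithm, _, expected = pinned.partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algorithm!r}")
    actual = hashlib.sha256(artifact).hexdigest()
    # Constant-time comparison of the two hex digests
    return hmac.compare_digest(actual, expected.lower())


original = b"original wheel bytes"
pin = "sha256:" + hashlib.sha256(original).hexdigest()
print(matches_pinned_hash(b"same version number, modified contents", pin))
```

A re-published package with the same version number but different contents produces a different digest, so the check fails.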

Learn more about UV dependency management

7. Pickle-Free Model Serialization

Enhance the security of your ML pipelines with pickle-free model formats. MLflow now supports safer model serialization using torch.export and skops formats, with improved controls when MLFLOW_ALLOW_PICKLE_DESERIALIZATION=False. Comprehensive documentation guides you through migrating existing models to pickle-free formats for production deployments.
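To see why pickle-free formats matter, note that unpickling can call any importable function that the file's author chooses. The demonstration below is harmless and uses Python's standard `pickle` module. The payload's `__reduce__` tells the loader to call `operator.mul(6, 7)`, so loading returns `42` instead of a `Payload` object. A malicious model file could name a far more dangerous callable in the same way.

```python
import operator
import pickle


class Payload:
    """Stand-in for an object embedded in an untrusted pickled model file."""

    def __reduce__(self):
        # Instructs pickle: "to rebuild me, call operator.mul(6, 7)".
        # An attacker could name something like os.system here instead.
        return (operator.mul, (6, 7))


blob = pickle.dumps(Payload())
restored = pickle.loads(blob)  # calls operator.mul(6, 7) during loading
print(type(restored).__name__, restored)
```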

Learn more about pickle-free model formats

Breaking Changes

  • TypeScript SDK Package Renaming: The MLflow TypeScript SDK packages have been renamed to use npm organization scoping. Update your package.json dependencies: mlflow-tracing → @mlflow/core, mlflow-openai → @mlflow/openai, mlflow-anthropic → @mlflow/anthropic, mlflow-gemini → @mlflow/gemini. All packages are now at version 0.2.0.
  • The MLFLOW_ENABLE_INCREMENTAL_SPAN_EXPORT environment variable has been removed.
  • litellm and gepa have been removed from the genai extras.
  • The characters / and : are no longer allowed in registered model names.
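Before upgrading, you may want to check existing or planned model names against the new rule. Here is a minimal pre-flight check. `check_registered_model_name` is a hypothetical helper, not MLflow's own validator.

```python
FORBIDDEN_CHARS = ("/", ":")


def check_registered_model_name(name):
    """Raise if the name contains a character that registered model names no longer allow."""
    bad = [ch for ch in FORBIDDEN_CHARS if ch in name]
    if bad:
        raise ValueError(f"registered model name {name!r} contains forbidden characters: {bad}")
    return name


print(check_registered_model_name("churn-model"))
```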

Full Changelog

For a comprehensive list of changes, see the release change log, and check out the latest documentation on mlflow.org.

What's Next

Get Started

Install MLflow 3.11.1 to try these new features:

pip install mlflow==3.11.1

Share Your Feedback

We'd love to hear about your experience with these new features:

Learn More

MLflow 3.10.1

· 4 min read
MLflow maintainers

MLflow 3.10.1 is a patch release that contains some minor feature enhancements, bug fixes, and documentation updates.

Features:

Bug fixes:

  • [UI] Fix "View full dashboard" link in gateway usage tab when workspace is enabled (#21191, @copilot-swe-agent)
  • [UI] Persist AI Gateway default passphrase security banner dismissal to localStorage (#21292, @copilot-swe-agent)
  • [Evaluation] Demote unused parameters log message from WARNING to DEBUG in instructions judge (#21294, @copilot-swe-agent)
  • [UI] Clear "All" time selector when switching to overview tab (#21371, @daniellok-db)
  • [Prompts / UI] Fix Traces view in Prompts tab not being scrollable (#21282, @TomeHirata)
  • [UI] Fix judge builder instruction textarea (#21299, @daniellok-db)
  • [UI] Fix group mode to aggregate "Additional runs" as "Unassigned" group in charts (#21155, @copilot-swe-agent)
  • [UI] Fix artifact download when workspaces are enabled (#21074, @timsolovev)
  • [Tracing] Fix NOT NULL constraint on assessments.trace_id during trace export (#21348, @dbczumar)
  • [Tracking] Fix 403 Forbidden for artifact list via query param when default_permission=NO_PERMISSIONS (#21220, @copilot-swe-agent)
  • [UI] [ML-63097] Fix broken LLM judge documentation links (#21347, @smoorjani)
  • [Tracing] Fix Run Judge failed with litellm.InternalServerError: Invalid response object. (#21262, @PattaraS)
  • [Tracing / UI] Update Action menu: indentation to avoid confusion (#21266, @PattaraS)
  • [Model Registry] Fix MlflowClient.copy_model_version for the case that copy UC model across workspaces (#21212, @WeichenXu123)
  • [UI] Fix empty description box rendering for sanitized-empty experiment descriptions (#21223, @copilot-swe-agent)
  • [Artifacts] Fix single artifact downloading through HttpArtifactRepository (#12955, @Koenkk)
  • [Tracing] Fix find_last_user_message_index skipping skill content injections (#21119, @alkispoly-db)
  • [Tracing] Fix retrieval context extraction when span outputs are stored as strings (#21213, @smoorjani)
  • [UI] Fix visibility toggle button in chart tooltip not working (#21071, @daniellok-db)
  • [UI] Move gateway experiment filtering to server-side query to fix inconsistent page sizes (#21138, @copilot-swe-agent)
  • [Gateway] Downgrade spurious warning to debug log for gateway endpoints with fallback_config but no FALLBACK models (#21123, @copilot-swe-agent)
  • [Tracing] Fix MCP fn_wrapper to pass None for optional params with UNSET defaults (#21051, @yangbaechu)
  • [Tracking] Add CASCADE to logged_model tables experiment_id foreign keys (#20185, @harupy)
  • [Tracing] Fix MCP fn_wrapper handling of Click UNSET defaults (#20953) (#20962, @yangbaechu)

Documentation updates:

  • [Docs] Update SSO oidc plugin doc: add google identity platform / AWS cognito / Azure Entra ID configuration guide (#20591, @WeichenXu123)
  • [Docs / Tracing] Fix distributed tracing rendering and improve doc (#21070, @B-Step62)
  • [Docs] docs: Add single quotes to install commands with extras to prevent zsh errors (#21227, @mshavliuk)
  • [Docs / Model Registry] Fix outdated docstring claiming models:/ URIs are unsupported in register_model (#21197, @copilot-swe-agent)
  • [Docs] Replace MinIO with RustFS in docker-compose setup (#21099, @jmaggesi)

For a comprehensive list of changes, see the release change log, and check out the latest documentation on mlflow.org.

MLflow 3.10.0 Highlights: Multi-Workspace Support, Multi-Turn Evaluation, and many UI Enhancements!

· 5 min read
MLflow maintainers

MLflow 3.10.0 is a major release that enhances MLflow's AI Observability and evaluation capabilities, while also making these features easier to use, both for new users and organizations operating at scale. This release brings multi-workspace support, evaluation and simulation for chatbot conversations, cost tracking for your traces, usage tracking for your AI Gateway endpoints, and a number of UI enhancements that make app and agent development more intuitive.

1. Workspace Support in MLflow Tracking Server

MLflow now supports multi-workspace environments. Users can group experiments, models, and prompts into workspaces, a coarser level of organization than experiments, and logically isolate them within a single tracking server. To enable this feature, pass the --enable-workspaces flag to the mlflow server command, or set the MLFLOW_ENABLE_WORKSPACES environment variable to true.

Learn more about multi-workspace support

2. Multi-turn Evaluation & Conversation Simulation

MLflow now supports multi-turn evaluation, including evaluating existing conversations with session-level scorers and simulating conversations to test new versions of your agent, without the toil of regenerating conversations. Use the session-level scorers introduced in MLflow 3.8.0 and the brand new session UIs to evaluate the quality of your conversational agents and enable automatic scoring to monitor quality as traces are ingested.
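Session-level scorers judge a conversation as a whole rather than each trace on its own. The sketch below shows the general shape of that process. It assumes each trace carries a session ID and a turn index; the dict keys are hypothetical, not MLflow's trace schema. Traces are grouped into sessions and ordered by turn, and each session is scored once. `resolved_in_last_turn` is a toy stand-in for an LLM judge.

```python
from collections import defaultdict


def group_by_session(traces):
    """Group traces by session and order each conversation by turn."""
    sessions = defaultdict(list)
    for trace in traces:
        sessions[trace["session_id"]].append(trace)
    return {sid: sorted(turns, key=lambda t: t["turn"]) for sid, turns in sessions.items()}


def resolved_in_last_turn(turns):
    """Toy session-level scorer: does the final reply say the issue is resolved?"""
    return "resolved" in turns[-1]["response"].lower()


def score_sessions(traces, scorer):
    """Apply a session-level scorer once per conversation."""
    return {sid: scorer(turns) for sid, turns in group_by_session(traces).items()}


traces = [
    {"session_id": "s1", "turn": 2, "response": "Great, your issue is resolved."},
    {"session_id": "s1", "turn": 1, "response": "Let me check that."},
    {"session_id": "s2", "turn": 1, "response": "Please try again later."},
]
print(score_sessions(traces, resolved_in_last_turn))
```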

Learn more about multi-turn evaluation

3. Trace Cost Tracking

Gain visibility into your LLM spending! MLflow now automatically extracts model information from LLM spans and calculates costs, with a new UI that renders model and cost data directly in your trace views. Additionally, costs are aggregated and broken down in the "Overview" tab, giving you granular insights into your LLM spend patterns.
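Cost calculation follows directly from the token counts on each LLM span multiplied by per-token prices. The sketch below also totals costs per model, in the spirit of the breakdown in the "Overview" tab. The model names and prices are made up for illustration, and the span dict shape is hypothetical.

```python
from collections import defaultdict

# Hypothetical prices in USD per 1M tokens; real prices vary by provider and model.
PRICES = {
    "model-a": {"input": 2.50, "output": 10.00},
    "model-b": {"input": 0.15, "output": 0.60},
}


def span_cost(model, input_tokens, output_tokens):
    """Cost of one LLM call from its token usage."""
    price = PRICES[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


def cost_by_model(llm_spans):
    """Aggregate per-span costs into a per-model breakdown."""
    totals = defaultdict(float)
    for span in llm_spans:
        totals[span["model"]] += span_cost(span["model"], span["input_tokens"], span["output_tokens"])
    return dict(totals)


spans = [
    {"model": "model-a", "input_tokens": 1000, "output_tokens": 500},
    {"model": "model-b", "input_tokens": 2000, "output_tokens": 1000},
    {"model": "model-a", "input_tokens": 1000, "output_tokens": 500},
]
print(cost_by_model(spans))
```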

Learn more about trace cost tracking

4. Navigation bar redesign

As we continue to add more features to the MLflow UI, we found that navigation was getting cluttered and overwhelming, with poor separation of features for different workflow types. We've redesigned the navigation bar to be more intuitive and easier to use, with a new sidebar that provides a more relevant set of tabs for both GenAI apps and agent developers, as well as classic model training workflows. The new experience also gives more space to the main content area, making it easier to focus on the task at hand.

5. MLflow Demo Experiment

New to MLflow GenAI? With one click, launch a pre-populated demo and explore LLM tracing, evaluation, and prompt management in action. No configuration, no code required. This feature is available on the MLflow UI homepage and provides a comprehensive overview of the different functionality MLflow has to offer.

Get started by clicking the demo button on the MLflow UI homepage, or by running mlflow demo in your terminal.

6. Gateway Usage Tracking

Monitor your AI Gateway endpoints with detailed usage analytics. A new "Usage" tab shows request patterns and metrics, with trace ingestion that links gateway calls back to your experiments for end-to-end AI observability.

To turn this feature on for an AI Gateway endpoint, switch on the "Enable usage tracking" toggle in the endpoint settings.

Learn more about Gateway usage tracking

7. In-UI Trace Evaluation

Run custom or pre-built LLM judges directly from the traces and sessions UI, no code required! This enables quick evaluation of individual traces and sessions without switching to the Python SDK. To use this feature, set up an AI Gateway endpoint first, since you'll need to select an endpoint when running LLM judges.

Full Changelog

For a comprehensive list of changes, see the release change log.

What's Next

Get Started

Install MLflow 3.10.0 to try these new features:

pip install mlflow==3.10.0

Share Your Feedback

We'd love to hear about your experience with these new features:

Learn More

MLflow 3.9.0 Highlights: AI Assistant, Dashboards, and Judge Optimization

· 6 min read
MLflow maintainers

MLflow 3.9.0 is a major release focused on AI Observability and Evaluation capabilities, bringing powerful new features for building, monitoring, and optimizing AI agents. This release introduces an AI-powered assistant, comprehensive dashboards for agent performance, a new judge optimization algorithm, judge builder UI, continuous monitoring with LLM judges, and distributed tracing.

1. MLflow Assistant Powered by Claude Code

MLflow Assistant transforms coding agents like Claude Code into experienced AI engineers by your side. Unlike typical chatbots, the assistant is aware of your codebase and context—it's not just a Q&A tool, but a full-fledged AI engineer that can find root causes for issues, set up quality tests, and apply LLMOps best practices to your project.

Key capabilities include:

  • No additional costs: Use your existing Claude Code subscription. MLflow provides the knowledge and integration at no cost.
  • Context-rich assistance: Understands your local codebase, project structure, and provides tailored recommendations—not generic advice.
  • Complete dev-loop: Goes beyond Q&A to fetch MLflow data, read your code, and add tracing, evaluation, and versioning to your project.
  • Fully customizable: Add custom skills, sub-agents, and permissions. Everything runs on your machine with full transparency.

Open the MLflow UI, navigate to the Assistant panel in any experiment page, and follow the setup wizard to get started.

Learn more about MLflow Assistant

2. Dashboards for Agent Performance Metrics

A new "Overview" tab in GenAI experiments provides pre-built charts and visualizations for monitoring agent performance at a glance. Monitor key metrics like latency, request counts, and quality scores without manual configuration. Identify performance trends and anomalies across your agent deployments, and get tool call summaries to understand how your agents are utilizing available tools.

Navigate to any GenAI experiment and click the "Overview" tab to access the dashboard. Charts are automatically populated based on your trace data. Have a specific visualization need? Request additional charts via GitHub Issues.

Learn more about GenAI Dashboards

3. MemAlign: A New Judge Optimizer Algorithm

MemAlign is a new optimization algorithm for LLM-as-a-judge evaluation that learns evaluation guidelines from past feedback and dynamically retrieves relevant examples at runtime. Improve judge accuracy by learning from human feedback patterns, reduce prompt engineering effort with automatic guideline extraction, and adapt judge behavior dynamically based on the input being evaluated.
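MemAlign's internals aren't described here, so the following is only a toy illustration of the general idea of retrieving relevant past feedback at evaluation time. It ranks stored (input, verdict) pairs by word-overlap (Jaccard) similarity to the new input. It then places the closest pairs in the judge prompt alongside the learned guidelines. The helper names and prompt layout are hypothetical. This is not MemAlign's algorithm or an MLflow API.

```python
def tokens(text):
    """Lowercased bag of words used for a crude similarity measure."""
    return set(text.lower().split())


def jaccard(a, b):
    """Overlap between two word sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve_examples(query, memory, k=2):
    """Return the k stored (input, verdict) pairs most similar to the query."""
    q = tokens(query)
    ranked = sorted(memory, key=lambda ex: jaccard(q, tokens(ex[0])), reverse=True)
    return ranked[:k]


def build_judge_prompt(guidelines, query, memory, k=2):
    """Combine guidelines with the retrieved examples for this specific input."""
    lines = ["Guidelines:"] + [f"- {g}" for g in guidelines]
    lines.append("Similar past examples:")
    for text, verdict in retrieve_examples(query, memory, k):
        lines.append(f"Input: {text} -> {verdict}")
    lines.append(f"Now evaluate: {query}")
    return "\n".join(lines)


memory = [
    ("thanks so much for your patience", "polite"),
    ("figure it out yourself", "impolite"),
    ("the weather is nice today", "n/a"),
]
print(build_judge_prompt(["Penalize dismissive tone"], "thanks for your patience with this", memory, k=1))
```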

Use the MemAlignOptimizer to optimize your judges with historical feedback:

import mlflow
from mlflow.genai.judges import make_judge
from mlflow.genai.judges.optimizers import MemAlignOptimizer

# Create a judge
judge = make_judge(
    name="politeness",
    instructions=(
        "Given a user question, evaluate if the chatbot's response is polite and respectful. "
        "Consider the tone, language, and context of the response.\n\n"
        "Question: {{ inputs }}\n"
        "Response: {{ outputs }}"
    ),
    feedback_value_type=bool,
    model="openai:/gpt-5-mini",
)

# Create the MemAlign optimizer
optimizer = MemAlignOptimizer(reflection_lm="openai:/gpt-5-mini")

# Retrieve traces with human feedback
traces = mlflow.search_traces(return_type="list")

# Align the judge
aligned_judge = judge.align(traces=traces, optimizer=optimizer)

Learn more about MemAlign

4. Configuring and Building a Judge with Judge Builder UI

A new visual interface lets you create and test custom LLM judge prompts without writing code. Iterate quickly on judge criteria and scoring rubrics with immediate feedback, test judges on sample traces before deploying to production, and export validated judges to the Python SDK for programmatic integration.

Navigate to the "Judges" section in the MLflow UI and click "Create Judge." Define your evaluation criteria, scoring rubric, and test your judge against sample traces. Once satisfied, export the configuration to use with the MLflow SDK.

Learn more about Judge Builder

5. Continuous Online Monitoring with MLflow LLM Judges

Automatically run LLM judges on incoming traces without writing any code, enabling continuous quality monitoring of your agents in production. Detect quality issues in real-time as traces flow through your system, leverage pre-defined judges for common evaluations like safety, relevance, groundedness, and correctness, and get actionable assessments attached directly to your traces.

Go to the "Judges" tab in your experiment, select from pre-defined judges or use your custom judges, and configure which traces to evaluate. Assessments are automatically attached to matching traces as they arrive.

Learn more about Agent Evaluation

6. Distributed Tracing for Tracking End-to-end Requests

Track requests across multiple services with context propagation, enabling end-to-end visibility into distributed AI systems. Maintain trace continuity across microservices and external API calls, debug issues that span multiple services with a unified trace view, and understand latency and errors at each step of your distributed pipeline.

Use the get_tracing_context_headers_for_http_request and set_tracing_context_from_http_request_headers functions to inject and extract trace context:

# Service A: Inject context into the headers of the outgoing request
import requests
import mlflow
from mlflow.tracing import get_tracing_context_headers_for_http_request

with mlflow.start_span("client-root"):
    headers = get_tracing_context_headers_for_http_request()
    requests.post(
        "https://your.service/handle", headers=headers, json={"input": "hello"}
    )

# Service B: Extract context from the incoming request
import mlflow
from flask import Flask, request
from mlflow.tracing import set_tracing_context_from_http_request_headers

app = Flask(__name__)

@app.post("/handle")
def handle():
    headers = dict(request.headers)
    with set_tracing_context_from_http_request_headers(headers):
        with mlflow.start_span("server-handler") as span:
            # ... your logic ...
            span.set_attribute("status", "ok")
    return {"ok": True}
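Context propagation generally works by serializing the active trace and span IDs into HTTP headers. The sketch below assumes the headers follow the W3C Trace Context `traceparent` format (`version-traceid-parentid-flags`). That format is an assumption here, not something stated above. The sketch shows what the receiving side has to recover. `parse_traceparent` is a hypothetical helper for illustration; in practice `set_tracing_context_from_http_request_headers` handles this for you.

```python
import re

# version (2 hex) - trace-id (32 hex) - parent-id (16 hex) - trace-flags (2 hex)
TRACEPARENT_RE = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(value):
    """Return the trace ID, parent span ID, and sampled flag, or None if the header is invalid."""
    match = TRACEPARENT_RE.match(value.strip())
    if not match:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # 'ff' is an invalid version; all-zero IDs are invalid per the spec.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return {
        "trace_id": trace_id,
        "parent_span_id": parent_id,
        "sampled": (int(flags, 16) & 1) == 1,
    }


print(parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"))
```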

Learn more about Distributed Tracing

Full Changelog

For a comprehensive list of changes, see the release change log.

What's Next

Get Started

Install MLflow 3.9.0 to try these new features:

pip install mlflow==3.9.0

Share Your Feedback

We'd love to hear about your experience with these new features:

Learn More

MLflow 3.8.1

· One min read
MLflow maintainers

MLflow 3.8.1 includes several bug fixes and documentation updates.

Bug fixes:

  • [Tracking] Skip registering sqlalchemy store when sqlalchemy lib is not installed (#19563, @WeichenXu123)
  • [Models / Scoring] fix(security): prevent command injection via malicious model artifacts (#19583, @ColeMurray)
  • [Prompts] Fix prompt registration with model_config on Databricks (#19617, @TomeHirata)
  • [UI] Fix UI blank page on plain HTTP by replacing crypto.randomUUID with uuid library (#19644, @copilot-swe-agent)

Small bug fixes and documentation updates:

#19539, #19451, #19409, @smoorjani; #19493, @alkispoly-db

For a comprehensive list of changes, see the release change log, and check out the latest documentation on mlflow.org.

MLflow 3.8.0

· 5 min read
MLflow maintainers

MLflow 3.8.0 includes several major features and improvements.

Major Features

  • ⚙️ Prompt Model Configuration: Prompts can now include model configuration, allowing you to associate specific model settings with prompt templates for more reproducible LLM workflows. (#18963, #19174, #19279, @chenmoneygithub)
  • In-Progress Trace Display: The Traces UI now supports displaying spans from in-progress traces with auto-polling, enabling real-time debugging and monitoring of long-running LLM applications. (#19265, @B-Step62)
  • ⚖️ DeepEval and RAGAS Judges Integration: New get_judge API enables using DeepEval and RAGAS evaluation metrics as MLflow scorers, providing access to 20+ evaluation metrics including answer relevancy, faithfulness, and hallucination detection. (#18988, @smoorjani, #19345, @SomtochiUmeh)
  • 🛡️ Conversational Safety Scorer: New built-in scorer for evaluating safety of multi-turn conversations, analyzing entire conversation histories for hate speech, harassment, violence, and other safety concerns. (#19106, @joelrobin18)
  • Conversational Tool Call Efficiency Scorer: New built-in scorer for evaluating tool call efficiency in multi-turn agent interactions, detecting redundant calls, missing batching opportunities, and poor tool selections. (#19245, @joelrobin18)

Important Notice

  • Collection of UI Telemetry. From MLflow 3.8.0 onwards, MLflow will collect anonymized data about UI interactions, similar to the telemetry we collect for the Python SDK. If you manage your own server, UI telemetry is automatically disabled by setting the existing environment variables: MLFLOW_DISABLE_TELEMETRY=true or DO_NOT_TRACK=true. If you do not manage your own server (e.g. you use a managed service or are not the admin), you can still opt out personally via the new "Settings" tab in the MLflow UI. For more information, please read the documentation on usage tracking.

Features:

  • [Tracking] Add default passphrase support (#19360, @BenWilson2)
  • [Tracing] Pydantic AI Stream support (#19118, @joelrobin18)
  • [Docs] Deprecate Unity Catalog function integration in AI Gateway (#19457, @harupy)
  • [Tracking] Add --max-results option to mlflow experiments search (#19359, @alkispoly-db)
  • [Tracking] Enhance encryption security (#19253, @BenWilson2)
  • [Tracking] Fix and simplify Gateway store interfaces (#19346, @BenWilson2)
  • [Evaluation] Add inference_params support for LLM Judges (#19152, @debu-sinha)
  • [Tracing] Support batch span export to UC Table (#19324, @B-Step62)
  • [Tracking] Add endpoint tags (#19308, @BenWilson2)
  • [Docs / Evaluation] Add MLFLOW_GENAI_EVAL_MAX_SCORER_WORKERS to limit concurrent scorer execution (#19248, @debu-sinha)
  • [Evaluation / Tracking] Enable search_datasets in Databricks managed MLflow (#19254, @alkispoly-db)
  • [Prompts] render text prompt previews in markdown (#19200, @ispoljari)
  • [UI] Add linked prompts filter for trace search tab (#19192, @TomeHirata)
  • [Evaluation] Automatically wrap async functions when passed to predict_fn (#19249, @smoorjani)
  • [Evaluation] [3/6][builtin judges] Conversational Role Adherence (#19247, @joelrobin18)
  • [Tracking] [Endpoints] [1/x] Add backend DB tables for Endpoints (#19002, @BenWilson2)
  • [Tracking] [Endpoints] [3/x] Entities base definitions (#19004, @BenWilson2)
  • [Tracking] [Endpoints] [4/x] Abstract store interface (#19005, @BenWilson2)
  • [Tracking] [Endpoints] [5/x] SQL Store backend for Endpoints (#19006, @BenWilson2)
  • [Tracking] [Endpoints] [6/x] Protos and entities interfaces (#19007, @BenWilson2)
  • [Tracking] [Endpoints] [7/x] Add rest store implementation (#19008, @BenWilson2)
  • [Tracking] [Endpoints] [8/x] Add credential cache (#19014, @BenWilson2)
  • [Tracking] [Endpoints] [9/x] Add provider, model, and configuration handling (#19009, @BenWilson2)
  • [Evaluation / UI] Add show/hide visibility control for Evaluation runs chart view (#18797) (#18852, @pradpalnis)
  • [Tracking] Add mlflow experiments get command (#19097, @alkispoly-db)
  • [Server-infra] [ Gateway 1/10 ] Simplify secrets and masked secrets with map types (#19440, @BenWilson2)

Bug fixes:

  • [Tracing / UI] Branch 3.8 patch: Fix GraphQL SearchRuns filter using invalid attribute key in trace comparison (#19526, @WeichenXu123)
  • [Scoring / Tracking] Fix artifact download performance regression (#19520, @copilot-swe-agent)
  • [Tracking] Fix SQLAlchemy alias conflict in _search_runs for dataset filters (#19498, @fredericosantos)
  • [Tracking] Add auth support for GraphQL routes (#19278, @BenWilson2)
  • [] Fix SQL injection vulnerability in UC function execution (#19381, @harupy)
  • [UI] Fix MultiIndex column search crash in dataset schema table (#19461, @copilot-swe-agent)
  • [Tracking] Make datasource failures fail gracefully (#19469, @BenWilson2)
  • [Tracing / Tracking] Fix litellm autolog for versions >= 1.78 (#19459, @harupy)
  • [Model Registry / Tracking] Fix SQLAlchemy engine connection pool leak in model registry and job stores (#19386, @harupy)
  • [UI] [Bug fix] Traces UI: Support filtering on assessments with multiple values (e.g. error and boolean) (#19262, @dbczumar)
  • [Evaluation / Tracing] Fix error initialization in Feedback (#19340, @alkispoly-db)
  • [Models] Switch container build to subprocess for Sagemaker (#19277, @BenWilson2)
  • [Scoring] Fix scorers issue on Strands traces (#18835, @joelrobin18)
  • [Tracking] Stop initializing backend stores in artifacts only mode (#19167, @mprahl)
  • [Evaluation] Parallelize multi-turn session evaluation (#19222, @AveshCSingh)
  • [Tracing] Add safe attribute capture for pydantic_ai (#19219, @BenWilson2)
  • [Model Registry] Fix UC to UC copying regression (#19280, @BenWilson2)
  • [Tracking] Fix artifact path traversal vector (#19260, @BenWilson2)
  • [UI] Fix issue with auth controls on system metrics (#19283, @BenWilson2)
  • [Models] Add context loading for ChatModel (#19250, @BenWilson2)
  • [Tracing] Fix trace decorators usage for LangGraph async callers (#19228, @BenWilson2)
  • [Tracking] Update docker compose to use --artifacts-destination not --default-artifact-root (#19215, @B-Step62)
  • [Build] Reduce clint error message verbosity by consolidating README instructions (#19155, @copilot-swe-agent)
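The artifact path traversal fix (#19260) concerns client-supplied paths escaping the artifact root. As a rough sketch of the general defense, and not the code from that PR, a server can reject absolute paths and any `..` component before joining the path onto the root. `safe_artifact_path` is a hypothetical helper name, and this version only handles POSIX-style separators.

```python
from pathlib import PurePosixPath


def safe_artifact_path(root: str, requested: str) -> str:
    """Join a client-supplied relative path onto root, rejecting paths that could escape it."""
    rel = PurePosixPath(requested)
    # PurePosixPath keeps ".." as a literal component, so it can be detected before joining
    if rel.is_absolute() or ".." in rel.parts:
        raise ValueError(f"Invalid artifact path: {requested!r}")
    return str(PurePosixPath(root) / rel)


print(safe_artifact_path("/artifacts/run1", "model/MLmodel"))
```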

Documentation updates:

  • [Docs] Add specific references for correctness scorers (#19472, @BenWilson2)
  • [Docs] Add documentation for Fluency scorer (#19481, @alkispoly-db)
  • [Docs] Update eval quickstart to put all code into a script (#19444, @achen530)
  • [Docs] Add documentation for KnowledgeRetention scorer (#19478, @alkispoly-db)
  • [Evaluation] Fix non-reproducible code examples in deep-learning.mdx (#19376, @saumilyagupta)
  • [Docs / Evaluation] fix: Confusing documentation for mlflow.genai.evaluate() (#19380, @brandonhawi)
  • [Docs] Deprecate model logging of OpenAI flavor (#19325, @TomeHirata)
  • [Docs] Add rounded corners to video elements in documentation (#19231, @copilot-swe-agent)
  • [Docs] Sync Python/TypeScript tab selections in tracing quickstart docs (#19184, @copilot-swe-agent)

For a comprehensive list of changes, see the release change log, and check out the latest documentation on mlflow.org.

MLflow 3.7.0

· 9 min read
MLflow maintainers

MLflow 3.7.0 includes several major features and improvements for GenAI Observability, Evaluation, and Prompt Management.

Major Features

  • 📝 Experiment Prompts UI: New prompts functionality in the experiment UI allows you to manage and search prompts directly within experiments, with support for filter strings and prompt version search in traces. (#19156, #18919, #18906, @TomeHirata)
  • 💬 Multi-turn Evaluation Support: mlflow.genai.evaluate now supports multi-turn conversations, enabling comprehensive assessment of conversational AI applications with DataFrame and list inputs. (#18971, @AveshCSingh)
  • ⚖️ Trace Comparison: New side-by-side comparison view in the Traces UI allows you to analyze and debug LLM application behavior across different runs, making it easier to identify regressions and improvements. (#17138, @joelrobin18)
  • 🌐 Gemini TypeScript SDK: Auto-tracing support for Google's Gemini in TypeScript, expanding MLflow's observability capabilities for JavaScript/TypeScript AI applications. (#18207, @joelrobin18)
  • 🎯 Structured Outputs in Judges: The make_judge API now supports structured outputs, enabling more precise and programmatically consumable evaluation results. (#18529, @TomeHirata)
  • 🔗 VoltAgent Tracing: Added auto-tracing support for VoltAgent, extending MLflow's observability to this AI agent framework. (#19041, @joelrobin18)
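To show why structured judge outputs are "programmatically consumable", here is a hypothetical sketch that parses a judge's JSON reply into a typed result and rejects malformed values. This is not the make_judge API: the `JudgeResult` schema and the `parse_judge_output` helper are assumptions made up for illustration.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class JudgeResult:
    value: bool      # hypothetical schema: pass/fail verdict
    rationale: str   # free-text explanation from the judge


def parse_judge_output(raw: str) -> JudgeResult:
    """Parse a judge's JSON response and type-check it against the JudgeResult schema."""
    data = json.loads(raw)
    if not isinstance(data.get("value"), bool):
        raise TypeError("'value' must be a boolean")
    if not isinstance(data.get("rationale"), str):
        raise TypeError("'rationale' must be a string")
    return JudgeResult(value=data["value"], rationale=data["rationale"])


result = parse_judge_output('{"value": true, "rationale": "Answer cites the source."}')
```

With a fixed schema, downstream code can filter or aggregate on `result.value` directly instead of scraping free text.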

Breaking Changes

Features

Bug Fixes

Documentation Updates

For a comprehensive list of changes, see the release change log, and check out the latest documentation on mlflow.org.

MLflow 3.6.0

· 3 min read
MLflow maintainers

MLflow 3.6.0 includes several major features and improvements for AI Observability, Experiment UI, Agent Evaluation and Deployment.

#1: Full OpenTelemetry Support in MLflow Tracking Server

OpenTelemetry Trace Example

MLflow now offers comprehensive OpenTelemetry integration, allowing you to use OpenTelemetry and MLflow seamlessly together for your observability stack.

  • Ingest OpenTelemetry spans directly into the MLflow tracking server
  • Monitor existing applications that are instrumented with OpenTelemetry
  • Trace AI applications written in any language, including Java, Go, Rust, and more
  • Create unified traces that combine MLflow SDK instrumentation with OpenTelemetry auto-instrumentation from third-party libraries
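Because ingestion uses the OpenTelemetry protocol, an application in any language can usually be pointed at MLflow through the standard OTLP exporter environment variables. The helper below is a sketch that only builds those variables. The `OTEL_EXPORTER_OTLP_TRACES_*` names come from the OpenTelemetry specification, but the `/v1/traces` path and the `x-mlflow-experiment-id` header are our assumptions about MLflow's endpoint; check the OpenTelemetry blog post and MLflow docs before relying on them. `otlp_env_for_mlflow` is a hypothetical name.

```python
def otlp_env_for_mlflow(tracking_uri: str, experiment_id: str) -> dict:
    """Build standard OTLP exporter env vars that point an OpenTelemetry SDK at an MLflow server."""
    return {
        # Standard OpenTelemetry exporter variables (defined by the OTel specification)
        "OTEL_EXPORTER_OTLP_TRACES_ENDPOINT": f"{tracking_uri.rstrip('/')}/v1/traces",  # path is an assumption
        "OTEL_EXPORTER_OTLP_TRACES_PROTOCOL": "http/protobuf",
        # Assumption: MLflow reads the destination experiment from this header
        "OTEL_EXPORTER_OTLP_TRACES_HEADERS": f"x-mlflow-experiment-id={experiment_id}",
    }


env = otlp_env_for_mlflow("http://localhost:5000/", "123")
for key, value in env.items():
    print(f"export {key}={value}")
```

Exporting these variables before starting a Java, Go, or Rust service lets its existing OpenTelemetry SDK send spans without code changes, provided the assumed endpoint details match your MLflow version.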

See the blog post for more details.

#2: Session-level View in Trace UI


A new chat sessions tab provides a dedicated view for organizing and analyzing related traces at the session level, making it easier to track conversational workflows.

See the Track Users & Sessions guide for more details.
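Conceptually, a session view groups traces that share a session ID and orders each group's turns in time. The sketch below does this over plain dictionaries rather than MLflow `Trace` objects; the record shape is hypothetical, and the `mlflow.trace.session` metadata key is our understanding of the key used in the Track Users & Sessions guide, so treat it as an assumption.

```python
from collections import defaultdict

SESSION_KEY = "mlflow.trace.session"  # assumption: metadata key that tags a trace's session


def group_by_session(traces):
    """Group trace records by session ID and order each session's turns by start time."""
    sessions = defaultdict(list)
    for trace in traces:
        session_id = trace["metadata"].get(SESSION_KEY)
        if session_id is not None:  # traces without a session are left out
            sessions[session_id].append(trace)
    return {sid: sorted(turns, key=lambda t: t["start_time_ms"]) for sid, turns in sessions.items()}


traces = [
    {"id": "t2", "start_time_ms": 200, "metadata": {SESSION_KEY: "chat-1"}},
    {"id": "t1", "start_time_ms": 100, "metadata": {SESSION_KEY: "chat-1"}},
    {"id": "t3", "start_time_ms": 150, "metadata": {SESSION_KEY: "chat-2"}},
    {"id": "t4", "start_time_ms": 50, "metadata": {}},
]
sessions = group_by_session(traces)
```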

#3: New Supported Frameworks in TypeScript Tracing SDK

The TypeScript SDK adds auto-tracing support for Vercel AI SDK, LangChain.js, Mastra, Anthropic SDK, and Gemini SDK, expanding MLflow's observability capabilities across popular JavaScript/TypeScript frameworks.

For more information, see the TypeScript Tracing SDK documentation.

#4: Tracking Judge Cost and Traces

MLflow now tracks the cost and traces of LLM judge evaluations, automatically calculating and rendering costs so you have visibility into evaluation expenses and performance.

See the LLM Evaluation Guide for more details.
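Judge cost is, at its core, token usage multiplied by per-token prices and summed over every judge call. The arithmetic sketch below illustrates that idea only; it is not how MLflow computes costs, and the model name and prices in `PRICES_PER_1K` are hypothetical placeholders rather than real provider pricing.

```python
from dataclasses import dataclass

# Hypothetical USD prices per 1,000 tokens; real prices depend on the provider and model.
PRICES_PER_1K = {"judge-model": {"input": 0.5, "output": 1.5}}


@dataclass
class JudgeCall:
    model: str
    input_tokens: int
    output_tokens: int


def judge_cost(calls):
    """Sum input and output token costs across all judge calls."""
    total = 0.0
    for call in calls:
        price = PRICES_PER_1K[call.model]
        total += call.input_tokens / 1000 * price["input"]
        total += call.output_tokens / 1000 * price["output"]
    return total


calls = [JudgeCall("judge-model", 2000, 200), JudgeCall("judge-model", 1000, 100)]
cost = judge_cost(calls)  # (1.0 + 0.3) + (0.5 + 0.15) = 1.95
```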

#5: New Experiment Tab Bar

The experiment tab bar has been fully overhauled to provide more intuitive and discoverable navigation of different features in MLflow.

Upgrade to MLflow 3.6.0 to try it out!

#6: Agent Server for Lightning Agent Deployment

# start_server.py
import agent  # noqa: F401  (your own agent module, imported so the server can find the agent)
from mlflow.genai.agent_server import AgentServer

agent_server = AgentServer("ResponsesAgent")
app = agent_server.app


def main():
    agent_server.run(app_import_string="start_server:app")


if __name__ == "__main__":
    main()

Start the server:

python3 start_server.py

Then send a request from another terminal:

curl -X POST http://localhost:8000/invocations \
  -H "Content-Type: application/json" \
  -d '{
    "input": [{ "role": "user", "content": "What is the 14th Fibonacci number?" }],
    "stream": true
  }'

New agent server infrastructure lets you manage and deploy scoring agents with enhanced orchestration capabilities.

See the Agent Server Guide for more details.
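If you would rather call the server from Python than from curl, the standard library is enough. This sketch builds the same `POST /invocations` request with `urllib.request`, but sets `"stream": false`; we are assuming the server then returns a single JSON body instead of a stream, so confirm that against the Agent Server Guide. `build_invocation_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_invocation_request(prompt: str, url: str = "http://localhost:8000/invocations") -> urllib.request.Request:
    """Build a POST /invocations request equivalent to the curl example, with streaming off."""
    payload = {"input": [{"role": "user", "content": prompt}], "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_invocation_request("What is the 14th Fibonacci number?")
# With start_server.py running, send the request and decode the JSON response:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```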

Breaking Changes and Deprecations

  • Drop numbering suffix (_1, _2, ...) from span names (#18531)
  • Deprecate promptflow, pmdarima, and diviner flavors (#18597, #18577)
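Dropping the numbering suffix from span names (#18531) can break dashboards or queries that matched names like `ChatOpenAI_1`. One hypothetical migration aid, not part of MLflow, is to normalize older span names by stripping a trailing `_<digits>` suffix so old and new traces line up. The caveat is that a name that legitimately ends in digits, such as `step_2`, would also be shortened.

```python
import re

_NUMBER_SUFFIX = re.compile(r"_\d+$")


def strip_span_suffix(name: str) -> str:
    """Remove a trailing _<digits> suffix so older span names match the new naming."""
    return _NUMBER_SUFFIX.sub("", name)


old_names = ["ChatOpenAI_1", "ChatOpenAI_2", "retriever"]
normalized = [strip_span_suffix(n) for n in old_names]
```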

For a comprehensive list of changes, see the release change log, and check out the latest documentation on mlflow.org.