MLflow 3.0 (Preview)
Discover the next generation of MLflow, designed to streamline your AI experimentation and accelerate your journey from idea to production. MLflow 3.0 brings cutting-edge support for GenAI workflows, enabling seamless integration of generative AI models into your projects.
What is MLflow 3.0?
MLflow 3.0 delivers best-in-class experiment tracking, observability, and performance evaluation for machine learning models, AI applications, and generative AI agents! With MLflow 3.0, it's now easier than ever to:
- Centrally track and analyze the performance of your models, agents, and generative AI applications across all environments, from interactive queries in a development notebook through production batch or real-time serving deployments.
- Select the best models, agents, and generative AI applications for production with an enhanced performance comparison experience powered by MLflow's tracing and evaluation capabilities.
MLflow 3.0's enhanced model tracking helps you manage and evaluate different model, agent, and generative AI application configurations in your experiments. Improved comparison workflows with the new LoggedModel enable you to quickly identify the best candidates for production, and MLflow tracing provides rich observability everywhere that your models, agents, and generative AI applications are deployed.
Enhanced Model Tracking
MLflow 3.0 introduces a refined architecture, along with revamped APIs and UI, tailored to generative AI and deep learning workflows. GenAI agent development typically involves multiple rounds of offline evaluation, via batch jobs and interactive queries from human beta testers. In deep learning, training often generates multiple model checkpoints, and the best candidates are further evaluated before production deployment.
We are introducing a new first-class object, the LoggedModel entity, into MLflow Tracking to streamline these processes. As you define and evaluate your GenAI agents in code, or your deep learning jobs create and evaluate models, they will be automatically stored as MLflow Logged Models in your MLflow Experiment.
When developing a new version of a GenAI agent or application, you can group all of the traces and metrics from interactive queries and automated evaluation jobs using MLflow's LoggedModel API. This enables rich, comprehensive comparisons across versions. Similarly, every deep learning checkpoint is stored as a LoggedModel with its own metrics and parameters, simplifying the process of identifying the best checkpoint for deployment or continued training.
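For example, after a new agent version has been exercised with autologging enabled, you can attach evaluation metrics to its auto-created LoggedModel and pull back its traces for side-by-side comparison. The following is a minimal sketch using the same APIs as the quickstart below; the metric name and value are purely illustrative:
import mlflow

mlflow.openai.autolog()  # autologged agent calls are linked to an auto-created LoggedModel

# ... exercise the new agent version: interactive queries, automated evaluation jobs ...

version = mlflow.last_logged_model()  # the LoggedModel created for this agent version
mlflow.log_metrics({"avg_correctness": 0.92}, model_id=version.model_id)  # illustrative metric
traces = mlflow.search_traces(model_id=version.model_id)  # all traces grouped under this version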
Quickstart
Prerequisites: Run the following commands to install the MLflow 3.0 preview and the OpenAI package.
pip install --upgrade 'mlflow>=3.0.0rc0' --pre
pip install openai
This quickstart demonstrates how to create a generative AI application with prompt engineering and evaluate it using MLflow 3.0. It highlights the integration of the LoggedModel lineage feature with runs and traces, showcasing seamless tracking and observability for GenAI workflows.
from openai import OpenAI
import mlflow
from mlflow.metrics.genai import answer_correctness, answer_similarity, faithfulness
# turn on autologging for automatic tracing
mlflow.openai.autolog()
# Initialize OpenAI client
client = OpenAI()
# define a prompt template
prompt_template = """\
You are an expert AI assistant. Answer the user's question with clarity, accuracy, and conciseness.
## Question:
{question}
## Guidelines:
- Keep responses factual and to the point.
- If relevant, provide examples or step-by-step instructions.
- If the question is ambiguous, clarify before answering.
Respond below:
"""
# groundtruth result for evaluation
mlflow_ground_truth = (
    "MLflow is an open-source platform for managing "
    "the end-to-end machine learning (ML) lifecycle. It was developed by Databricks, "
    "a company that specializes in big data and machine learning solutions. MLflow is "
    "designed to address the challenges that data scientists and machine learning "
    "engineers face when developing, training, and deploying machine learning models."
)
# Define evaluation metrics
metrics = {
    "answer_similarity": answer_similarity(model="openai:/gpt-4o"),
    "answer_correctness": answer_correctness(model="openai:/gpt-4o"),
    "faithfulness": faithfulness(model="openai:/gpt-4o"),
}
question = "What is MLflow?"
# Start a run to represent the evaluation process
with mlflow.start_run():
    response = (
        client.chat.completions.create(
            messages=[
                {"role": "user", "content": prompt_template.format(question=question)}
            ],
            model="gpt-4o-mini",
            temperature=0.1,
            max_tokens=2000,
        )
        .choices[0]
        .message.content
    )

    # Calculate metrics based on the input, response and ground truth
    # The evaluation metrics are callables that can be invoked directly
    answer_similarity_score = metrics["answer_similarity"](
        predictions=response, inputs=question, targets=mlflow_ground_truth
    ).scores[0]
    answer_correctness_score = metrics["answer_correctness"](
        predictions=response, inputs=question, targets=mlflow_ground_truth
    ).scores[0]
    faithfulness_score = metrics["faithfulness"](
        predictions=response, inputs=question, context=mlflow_ground_truth
    ).scores[0]

    # Fetch the LoggedModel that's automatically created during autologging
    logged_model = mlflow.last_logged_model()

    # Log metrics and pass model_id to link them to the LoggedModel
    mlflow.log_metrics(
        {
            "answer_similarity": answer_similarity_score,
            "answer_correctness": answer_correctness_score,
            "faithfulness": faithfulness_score,
        },
        model_id=logged_model.model_id,
    )
print(f"LoggedModel model id: {logged_model.model_id}")
# LoggedModel model id: a208e70b-e80b-4f8e-b210-20737faadd20
traces = mlflow.search_traces(model_id=logged_model.model_id)
print(traces)
# request_id trace ... tags assessments
# 0 5882df1240cf4dbf845fdc9fa26c4168 Trace(request_id=5882df1240cf4dbf845fdc9fa26c4... ... {'mlflow.artifactLocation': 'file:///Users/ser... []
# [1 rows x 11 columns]
Navigate to the Models tab of the experiment to view the newly created LoggedModel. Evaluation metrics, model ID, source run, parameters, and other details are displayed on the model's details page, providing a comprehensive overview of its performance and lineage.
The Traces tab of the model page displays the auto-generated traces.
Clicking the source run takes you to the evaluation run's page, which shows all of the logged metrics.
MLflow 3.0 Showcases
Explore the examples below to see how MLflow 3.0's powerful features can be applied across various domains.
- Discover how to log, evaluate, and trace GenAI agents using MLflow 3.0.
- Learn how to leverage MLflow 3.0 to identify the best models in deep learning workflows.
Migration Guide
MLflow 3.0 introduces some key API changes and removes some outdated features. This guide will help you transition smoothly to the latest version.
Key changes
- mlflow.<flavor>.log_model API usage: the artifact_path parameter is deprecated; use name instead. Passing name also allows you to later search for LoggedModels by that name.
MLflow 2.x:
with mlflow.start_run():
    mlflow.pyfunc.log_model(artifact_path="model", python_model=python_model, ...)
MLflow 3.0:
with mlflow.start_run():
    mlflow.pyfunc.log_model(name="python_model", python_model=python_model, ...)
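A model logged with name="python_model" can later be looked up by that name. The following is a minimal sketch; it assumes mlflow.search_logged_models accepts a filter_string expression on the model name, so check the API reference for the exact filter syntax:
import mlflow

# Find LoggedModels by the name passed to log_model (filter syntax is an assumption)
models = mlflow.search_logged_models(filter_string="name = 'python_model'")
print(models)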
- Model artifacts storage location change: In MLflow 2.x, model artifacts are stored as run artifacts. In MLflow 3.0, they are stored in the model's own artifact location instead. Note: this changes the behavior of the list_artifacts API.
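For example, listing run artifacts in MLflow 3.0 no longer returns the model files; they live under the LoggedModel's own artifact location. A minimal sketch, assuming the LoggedModel entity exposes an artifact_location attribute (verify against the API reference):
import mlflow
from mlflow import MlflowClient

client = MlflowClient()
run_id = "<your-run-id>"  # placeholder

# In MLflow 3.0, the model directory is no longer part of the run's artifacts
print([artifact.path for artifact in client.list_artifacts(run_id)])

# The model files are stored under the LoggedModel's artifact location instead
logged_model = mlflow.last_logged_model()
print(logged_model.artifact_location)  # assumption: attribute name may differ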
Removed Features
- MLflow Recipes
- Flavors: the following model flavors are no longer supported
- fastai
- h2o
- mleap
- AI gateway client APIs: use deployments APIs instead
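In place of the removed gateway client, requests can go through the MLflow Deployments client. A minimal sketch, assuming a deployments server is running locally and exposes a chat endpoint named "chat" (both the URI and the endpoint name are hypothetical):
from mlflow.deployments import get_deploy_client

# Hypothetical local deployments server and endpoint name
client = get_deploy_client("http://localhost:5000")
response = client.predict(
    endpoint="chat",
    inputs={"messages": [{"role": "user", "content": "What is MLflow?"}]},
)
print(response)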