Usage Tracking

Starting with version 3.2.0, MLflow collects anonymized usage data by default. This data contains no sensitive or personally identifiable information.

important

MLflow does not collect any data that contains personal information, in accordance with GDPR and other privacy regulations. As a Linux Foundation project, MLflow adheres to the LF telemetry data collection and usage policy. This implementation has been reviewed and approved by the Linux Foundation, with the approved proposal documented at the Completed Reviews section in the official policy. See the Data Explanation section below for details on what is collected.

note

Telemetry is only enabled in Open Source MLflow. If you're using MLflow through a managed service or distribution, please consult your vendor to determine whether telemetry is enabled in your environment. In all cases, you can choose to opt out by following the guidance provided in our documentation.

Why is data being collected?

MLflow uses anonymous telemetry to understand feature usage, which helps guide development priorities and improve the library. This data helps us identify which features are most valuable and where to focus on bug fixes or enhancements.

Under the General Data Protection Regulation (GDPR), data controllers and processors are responsible for handling personal data with care, transparency, and accountability.

MLflow complies with GDPR in the following ways:

No Personal Data Collected: The telemetry data collected is fully anonymized and does not include any personal or sensitive information (e.g., usernames, IP addresses, file names, parameters, or model content). MLflow generates a random UUID for each session for aggregating usage events, which cannot be used to identify or track individual users.
Purpose Limitation: Data is only used to improve the MLflow project based on aggregate feature usage patterns.
Data Minimization: Only the minimum necessary metadata is collected to inform project priorities (e.g., feature toggle state, SDK/platform used, version info).
User Control: Users can opt out of telemetry at any time by setting the environment variable MLFLOW_DISABLE_TELEMETRY=true or DO_NOT_TRACK=true. MLflow respects these settings immediately without requiring a restart.
Transparency: Telemetry endpoints and behavior are documented publicly, and MLflow users can inspect or block the relevant network calls.

For further inquiries or data protection questions, users can file an issue on the MLflow GitHub repository.

What data is collected?

MLflow collects only non-sensitive, anonymized data to help us better understand usage patterns. The below section outlines the data currently collected in this version of MLflow. You can view the exact data collected in the source code.

Data Explanation

Data Element	Explanation	Example	Why we track this
Unique session ID	A randomly generated, non-personally identifiable UUID is created for each session—defined as each time MLflow is imported	45e2751243e84c7e87aca6ac25d75a0d	As an identifier for the data in current MLflow session
Source SDK	The current used SDK name	mlflow \| mlflow-skinny \| mlflow-tracing	To understand adoption of different MLflow SDKs and identify enhancement areas
MLflow version	The current SDK version	3.2.0	To identify version-specific usage patterns and support, bug fixes, or deprecation decisions
Python version	The current python version	3.10.16	To ensure compatibility across Python versions and guide testing or upgrade recommendations
Operating System	The operating system on which MLflow is running	macOS-15.4.1-arm64-arm-64bit	To understand platform-specific usage and detect platform-dependent issues
Tracking URI Scheme	The scheme of the current tracking URI	file \| sqlite \| mysql \| postgresql \| mssql \| https \| http \| custom_scheme \| None	To determine which tracking backends are most commonly used and optimize backend support
Event Name	The tracked event name (see below table for what events are tracked)	create_experiment	To measure feature usage and improvements
Event Status	Whether the event succeeds or not	success \| failure \| unknown	To identify common failure points and improve reliability and error handling
Timestamp (nanoseconds)	Time when the event occurred	1753760188623715000	As an identifier for the event
Duration	The time the event call takes, in milliseconds	1000	To monitor performance trends and detect regressions in response time
Parameters (boolean or enumerated values)	See below table for collected parameters for each event	create_logged_model event: `{"flavor": "langchain"}`	To better understand the usage pattern for each event

Tracked Events

No details about the specific model, code, or weights are collected. Only the parameters listed under the Tracked Parameters column are recorded alongside the event; For events with None in the Tracked Parameters column, only the event name is recorded. If "MLFLOW_EXPERIMENT_ID" environment variable exists, it is tracked as a param.

Event Name	Tracked Parameters	Example
create_experiment	Created Experiment ID (random uuid or integer)	`{"experiment_id": "0"}`
create_run	Imported packages among MODULES_TO_CHECK_IMPORT are imported or not; experiment ID used when creating the run	`{"imports": ["sklearn"], "experiment_id": "0"}`
create_logged_model	Flavor of the model (e.g. langchain, sklearn)	`{"flavor": "langchain"}`
get_logged_model	Imported packages among MODULES_TO_CHECK_IMPORT are imported or not	`{"imports": ["sklearn"]}`
create_registered_model	None	None
create_model_version	None	None
create_prompt	None	None
load_prompt	Whether alias is used	`{"uses_alias": True}`
start_trace	None	None
log_assessment	Type of the assessment and source	`{"type": "feedback", "source_type": "CODE"}`
evaluate	None	None
create_webhook	Entities of the webhook	`{"events": ["model_version.created"]}`
genai_evaluate	Builtin scorers used during GenAI Evaluate	`{"builtin_scorers": ["relevance_to_query"]}`
prompt_optimization	Optimizer type, number of prompts, and number of scorers	`{"optimizer_type": True, "prompt_count": 5, "scorer_count": 1}`
log_dataset	None	None
log_metric	Whether synchronous mode is on or not	`{"synchronous": False}`
log_param	Whether synchronous mode is on or not	`{"synchronous": True}`
log_batch	Information on whether metrics, parameters, or tags are logged, and the logging mode	`{"metrics": False, "params": True, "tags": False, "synchronous": False}`
invoke_custom_judge_model	Judge model provider	`{"model_provider": "databricks"}`
make_judge	Model provider (extracted from model string if format is provider:model)	`{"model_provider": "openai"}`
align_judge	Number of traces provided and optimizer type	`{"trace_count": 100, "optimizer_type": "AlignmentOptimizer"}`
autologging	Flavor and metadata	`{"flavor": "openai", "log_traces": True, "disable": False}`
ai_command_run	Command key and invocation context (cli or mcp)	`{"command_key": "genai/analyze_experiment", "context": "cli"}`

Why is MLflow Telemetry Opt-Out?

MLflow uses an opt-out telemetry model to help improve the platform for all users based on real-world usage patterns. Collecting anonymous usage data by default allows us to:

Understand how MLflow is being used across a wide range of environments and workflows
Identify common pain points and identify feature improvements area more effectively
Measure the impact of changes and ensure they improve the experience for the broader community

If telemetry were opt-in, only a small, self-selected subset of users would be represented, leading to biased insights and potentially misaligned priorities. We are committed to transparency and user choice. Telemetry is clearly documented, anonymized, and can be easily disabled at any time through configuration. This approach helps us make MLflow better for everyone, while giving you full control. Check what we are doing with this data section for more information.

How to opt-out?

MLflow supports opt-out telemetry through either of the following environment variables:

MLFLOW_DISABLE_TELEMETRY=true
DO_NOT_TRACK=true

Setting either of these will immediately disable telemetry, no need to re-import MLflow or restart your session.

note

MLflow automatically disables telemetry in some CI environments. If you'd like support for additional CI environments, please open an issue on our GitHub repository.

CI
Github Actions
CircleCI
GitLab CI/CD
Jenkins Pipeline
Travis CI
Azure Pipelines
BitBucket
AWS CodeBuild
BuildKite
...

Scope of the setting

The environment variable only takes effect in processes where it is explicitly set or inherited.
If you spawn subprocesses from a clean environment, those subprocesses may not inherit your shell's environment, and telemetry could remain enabled. e.g. subprocess.run([...], env={})

Recommendations to ensure telemetry is consistently disabled across all environments:

Add the variable to your shell startup file (~/.bashrc, ~/.zshrc, etc.): export MLFLOW_DISABLE_TELEMETRY=true
If you're using subprocesses or isolated environments, use a dotenv manager or explicitly pass the variable when launching.

How to validate telemetry is disabled?

Use the following code to validate telemetry is disabled.

python
from mlflow.telemetry import get_telemetry_client

assert get_telemetry_client() is None, "Telemetry is enabled"

How to opt-out for your organization?

Organizations can disable telemetry by blocking network access to https://config.mlflow-telemetry.io. When this endpoint is unreachable, MLflow automatically disables telemetry.

What are we doing with this data?

We aggregate anonymized usage data and plan to share insights with the community through public dashboards. You'll be able to see how MLflow features are used and help improve them by contributing.

Why is data being collected?​

GDPR Compliance​

What data is collected?​

Data Explanation​

Tracked Events​

Why is MLflow Telemetry Opt-Out?​

How to opt-out?​

Scope of the setting​

How to validate telemetry is disabled?​

How to opt-out for your organization?​

What are we doing with this data?​