mlflow
The mlflow module provides an API for starting and managing MLflow runs.
For example:
import mlflow
mlflow.start_run()
mlflow.log_param("my", "param")
mlflow.log_metric("score", 100)
mlflow.end_run()
You can also use syntax like this:
with mlflow.start_run() as run:
    ...
which automatically terminates the run at the end of the block.
The tracking API is not currently threadsafe. Any concurrent callers to the tracking API must implement mutual exclusion manually.
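One way to provide the mutual exclusion mentioned above is to route every tracking call through a single lock. The following is a minimal sketch, not part of MLflow: locked is a hypothetical helper, and record_metric is a stand-in for a tracking call such as mlflow.log_metric so that the example runs without a tracking backend.

```python
import threading

# One process-wide lock shared by every caller of the tracking API.
_tracking_lock = threading.Lock()


def locked(fn):
    """Wrap fn so that only one thread at a time can execute it."""
    def wrapper(*args, **kwargs):
        with _tracking_lock:
            return fn(*args, **kwargs)
    return wrapper


# Stand-in for a tracking call (assumption for illustration); in real code
# you would wrap mlflow.log_metric, mlflow.log_param, and so on instead.
recorded = []


def record_metric(key, value):
    recorded.append((key, value))


safe_record_metric = locked(record_metric)

# Eight threads log concurrently; the lock serializes the calls.
threads = [
    threading.Thread(target=safe_record_metric, args=("score", i))
    for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Wrapping each function once at import time keeps call sites unchanged, at the cost of serializing all tracking calls in the process.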
For a lower-level API, see the mlflow.tracking module.
-
class mlflow.ActiveRun(run)
Bases: mlflow.entities.run.Run
Wrapper around mlflow.entities.Run to enable using Python with syntax.
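To illustrate how a wrapper can enable the with syntax, here is a minimal sketch of the context-manager protocol. It is not MLflow's implementation: FakeRun and ActiveRunSketch are hypothetical names, and the status strings and run ID are assumptions chosen for the example.

```python
class FakeRun:
    """Hypothetical stand-in for a run record; not an MLflow class."""

    def __init__(self, run_id):
        self.run_id = run_id
        self.status = "RUNNING"  # assumed status label


class ActiveRunSketch:
    """Illustrative wrapper that ends the wrapped run when a with block exits."""

    def __init__(self, run):
        self.run = run

    def __enter__(self):
        # The object bound by "as" in the with statement.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Mark the run finished or failed depending on whether the block raised.
        self.run.status = "FINISHED" if exc_type is None else "FAILED"
        # Returning False lets any exception propagate to the caller.
        return False


run = FakeRun("abc123")  # hypothetical run ID
with ActiveRunSketch(run) as active:
    status_inside = active.run.status
status_after = run.status
```

Because the cleanup lives in __exit__, the run is closed even if the body of the block raises.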
-
mlflow.log_param(key, value)
Log a parameter under the current run, creating a run if necessary.
Parameters:
- key – Parameter name (string).
- value – Parameter value (string, but will be string-ified if not).
-
mlflow.log_metric(key, value)
Log a metric under the current run, creating a run if necessary.
Parameters:
- key – Metric name (string).
- value – Metric value (float).
-
mlflow.set_tag(key, value)
Set a tag under the current run, creating a run if necessary.
Parameters:
- key – Tag name (string).
- value – Tag value (string, but will be string-ified if not).
-
mlflow.log_artifacts(local_dir, artifact_path=None)
Log all the contents of a local directory as artifacts of the run.
Parameters:
- local_dir – Path to the directory of files to write.
- artifact_path – If provided, the directory in artifact_uri to write to.
-
mlflow.log_artifact(local_path, artifact_path=None)
Log a local file or directory as an artifact of the currently active run.
Parameters:
- local_path – Path to the file to write.
- artifact_path – If provided, the directory in artifact_uri to write to.
-
mlflow.start_run(run_uuid=None, experiment_id=None, source_name=None, source_version=None, entry_point_name=None, source_type=None, run_name=None, nested=False)
Start a new MLflow run, setting it as the active run under which metrics and parameters will be logged. The return value can be used as a context manager within a with block; otherwise, you must call end_run() to terminate the current run.
If you pass a run_uuid or the MLFLOW_RUN_ID environment variable is set, start_run attempts to resume a run with the specified run ID and other parameters are ignored. run_uuid takes precedence over MLFLOW_RUN_ID.
Parameters:
- run_uuid – If specified, get the run with the specified UUID and log parameters and metrics under that run. The run’s end time is unset and its status is set to running, but the run’s other attributes (source_version, source_type, etc.) are not changed.
- experiment_id – ID of the experiment under which to create the current run (applicable only when run_uuid is not specified). If experiment_id is unspecified, MLflow looks for a valid experiment in the following order: the experiment activated using set_experiment, the MLFLOW_EXPERIMENT_ID environment variable, or the default experiment.
- source_name – Name of the source file or URI of the project to be associated with the run. If none is provided, defaults to the current file.
- source_version – Optional Git commit hash to associate with the run.
- entry_point_name – Optional name of the entry point for the current run.
- source_type – Integer mlflow.entities.SourceType describing the type of the run (“local”, “project”, etc.). Defaults to mlflow.entities.SourceType.LOCAL (“local”).
- run_name – Name of new run. Used only when run_uuid is unspecified.
- nested – Parameter which must be set to True to create nested runs.
Returns: mlflow.ActiveRun object that acts as a context manager wrapping the run’s state.
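The precedence rules described for start_run can be summarized in a few lines of plain Python. This is only a sketch of the documented ordering, not MLflow's code: resolve_run_id and resolve_experiment_id are hypothetical helpers, the active_experiment_id argument stands in for whatever set_experiment activated, and the default experiment ID of 0 is an assumption.

```python
import os


def resolve_run_id(run_uuid=None, environ=os.environ):
    """Return the run ID to resume, or None if a new run should be created."""
    if run_uuid is not None:
        # An explicit argument takes precedence over MLFLOW_RUN_ID.
        return run_uuid
    # Treat an unset or empty variable as "no run to resume".
    return environ.get("MLFLOW_RUN_ID") or None


def resolve_experiment_id(experiment_id=None, active_experiment_id=None,
                          environ=os.environ, default_experiment_id=0):
    """Pick the experiment for a new run, following the documented order."""
    if experiment_id is not None:
        return experiment_id
    if active_experiment_id is not None:
        # Stand-in for the experiment activated via set_experiment.
        return active_experiment_id
    env_value = environ.get("MLFLOW_EXPERIMENT_ID")
    if env_value:
        # Returned as the raw string read from the environment.
        return env_value
    # Assumed ID of the default experiment.
    return default_experiment_id
```

Passing the environment as an argument keeps the helpers easy to exercise with a plain dictionary instead of the real process environment.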
-
mlflow.get_artifact_uri()
Get the artifact URI of the currently active run. Calls to log_artifact and log_artifacts write artifact(s) to subdirectories of the returned URI.
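As a rough illustration of how artifact_path relates to the artifact URI, the sketch below composes a destination location from the two. The exact layout is decided by the artifact store, so treat this as an assumption; artifact_destination is a hypothetical helper and the URIs used with it are made up.

```python
import posixpath


def artifact_destination(artifact_uri, filename, artifact_path=None):
    """Compose the location under artifact_uri where a file would be written."""
    parts = [artifact_uri.rstrip("/")]
    if artifact_path:
        # Optional subdirectory inside the run's artifact root.
        parts.append(artifact_path.strip("/"))
    parts.append(filename)
    return posixpath.join(*parts)
```

For example, with a hypothetical root of s3://bucket/run1/artifacts and artifact_path set to models, a file named model.pkl would land at s3://bucket/run1/artifacts/models/model.pkl under this assumption.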
-
mlflow.set_tracking_uri(uri)
Set the tracking server URI. This does not affect the currently active run (if one exists), but takes effect for successive runs.
Parameters: uri – One of the following:
- An empty string, or a local file path, prefixed with file:/. Data is stored locally at the provided file (or ./mlruns if empty).
- An HTTP URI like https://my-tracking-server:5000.
- A Databricks workspace, provided as the string “databricks” or, to use a Databricks CLI profile, “databricks://<profileName>”.
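The accepted URI forms can be told apart with simple string checks. The following sketch classifies a tracking URI into the three documented kinds; describe_tracking_uri is a hypothetical helper rather than an MLflow function, and treating any other string as a local path is an assumption.

```python
def describe_tracking_uri(uri):
    """Classify a tracking URI as (kind, detail) using the documented forms."""
    if uri == "":
        # Empty string: data is stored locally under ./mlruns.
        return ("local", "./mlruns")
    if uri == "databricks":
        # No profile given; None stands for the default CLI profile.
        return ("databricks", None)
    if uri.startswith("databricks://"):
        # Everything after the scheme is the CLI profile name.
        return ("databricks", uri[len("databricks://"):])
    if uri.startswith(("http://", "https://")):
        return ("http", uri)
    # Assumption: anything else (e.g. a file:/ path) is a local store.
    return ("local", uri)
```

A caller could use the returned kind to decide, for instance, whether Databricks credentials are needed before starting a run.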
-
mlflow.create_experiment(name, artifact_location=None)
Create an experiment.
Parameters:
- name – The experiment name. Must be unique.
- artifact_location – The location to store run artifacts. If not provided, the server picks an appropriate default.
Returns: Integer ID of the created experiment.
-
mlflow.set_experiment(experiment_name)
Set the given experiment as the active experiment. If the experiment does not exist, create an experiment with the provided name.
Parameters: experiment_name – Name of the experiment to be activated.
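The difference between create_experiment and set_experiment is essentially "create" versus "get or create, then activate". The in-memory sketch below models that behavior under stated assumptions: ExperimentRegistrySketch is a hypothetical class, IDs are assigned sequentially from 0, and a ValueError on duplicate names is an assumption about how the uniqueness requirement might surface.

```python
class ExperimentRegistrySketch:
    """Toy in-memory model of experiment creation and activation."""

    def __init__(self):
        self._ids_by_name = {}
        self.active_experiment_id = None

    def create_experiment(self, name):
        # Names must be unique; the error type here is an assumption.
        if name in self._ids_by_name:
            raise ValueError("Experiment %r already exists" % name)
        experiment_id = len(self._ids_by_name)  # assumed sequential IDs
        self._ids_by_name[name] = experiment_id
        return experiment_id

    def set_experiment(self, experiment_name):
        # Create the experiment on first use, then mark it active.
        if experiment_name not in self._ids_by_name:
            self.create_experiment(experiment_name)
        self.active_experiment_id = self._ids_by_name[experiment_name]


registry = ExperimentRegistrySketch()
registry.set_experiment("baseline")   # created and activated
registry.set_experiment("baseline")   # already exists, only activated
```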
-
mlflow.run(uri, entry_point='main', version=None, parameters=None, experiment_id=None, mode=None, cluster_spec=None, git_username=None, git_password=None, use_conda=True, storage_dir=None, block=True, run_id=None)
Run an MLflow project. The project can be local or stored at a Git URI.
You can run the project locally or remotely on Databricks.
For information on using this method in chained workflows, see Building Multistep Workflows.
Raises: ExecutionException – If a run launched in blocking mode is unsuccessful.
Parameters:
- uri – URI of project to run. A local filesystem path or a Git repository URI (e.g. https://github.com/mlflow/mlflow-example) pointing to a project directory containing an MLproject file.
- entry_point – Entry point to run within the project. If no entry point with the specified name is found, runs the project file entry_point as a script, using “python” to run .py files and the default shell (specified by environment variable $SHELL) to run .sh files.
- version – For Git-based projects, either a commit hash or a branch name.
- experiment_id – ID of experiment under which to launch the run.
- mode – Execution mode of the run: “local” or “databricks”. When running against Databricks, the workspace is determined as follows: if a Databricks tracking URI of the form ‘databricks://profile’ has been set (e.g. by setting the MLFLOW_TRACKING_URI environment variable), the run uses the workspace specified by <profile>. Otherwise, it uses the workspace specified by the default Databricks CLI profile.
- cluster_spec – When mode is “databricks”, dictionary or path to a JSON file containing a Databricks cluster specification to use when launching a run.
- git_username – Username for HTTP(S) authentication with Git.
- git_password – Password for HTTP(S) authentication with Git.
- use_conda – If True (the default), create a new Conda environment for the run and install project dependencies within that environment. Otherwise, run the project in the current environment without installing any project dependencies.
- storage_dir – Used only if mode is “local”. MLflow downloads artifacts from distributed URIs passed to parameters of type path to subdirectories of storage_dir.
- block – Whether to block while waiting for a run to complete. Defaults to True. Note that if block is False and mode is “local”, this method will return, but the current process will block when exiting until the local run completes. If the current process is interrupted, any asynchronous runs launched via this method will be terminated.
- run_id – Note: this argument is used internally by the MLflow project APIs and should not be specified. If specified, the run ID will be used instead of creating a new run.
Returns: mlflow.projects.SubmittedRun exposing information (e.g. run ID) about the launched run.
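The entry_point fallback described above (running a project file as a script when no named entry point matches) can be sketched as a small command builder. This is an illustration of the documented rule only: fallback_command is a hypothetical helper, and raising ValueError for other file types is an assumption, since the behavior for them is not documented here.

```python
import os


def fallback_command(entry_point, environ=os.environ):
    """Build the command used to run entry_point as a script."""
    _, ext = os.path.splitext(entry_point)
    if ext == ".py":
        # .py files are run with "python".
        return ["python", entry_point]
    if ext == ".sh":
        # .sh files are run with the shell named by $SHELL;
        # a KeyError is raised if the variable is not set.
        return [environ["SHELL"], entry_point]
    # Assumption: other extensions are rejected.
    raise ValueError("Unsupported entry point: %r" % entry_point)
```

For instance, an entry_point of train.py yields a command that starts with python, while prep.sh is handed to whatever shell $SHELL names.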