mlflow.projects
The mlflow.projects module provides an API for running MLflow projects locally or remotely.
mlflow.projects.run(uri, entry_point='main', version=None, parameters=None, experiment_id=None, mode=None, cluster_spec=None, git_username=None, git_password=None, use_conda=True, storage_dir=None, block=True, run_id=None)
Run an MLflow project. The project can be local or stored at a Git URI.
You can run the project locally or remotely on Databricks.
For information on using this method in chained workflows, see Building Multistep Workflows.
Raises: ExecutionException – If a run launched in blocking mode is unsuccessful.
Parameters:
- uri – URI of project to run. A local filesystem path or a Git repository URI (e.g. https://github.com/mlflow/mlflow-example) pointing to a project directory containing an MLproject file.
- entry_point – Entry point to run within the project. If no entry point with the specified name is found, runs the project file entry_point as a script, using “python” to run .py files and the default shell (specified by environment variable $SHELL) to run .sh files.
- version – For Git-based projects, a commit hash.
- experiment_id – ID of experiment under which to launch the run.
- mode – Execution mode of the run: “local” or “databricks”.
- cluster_spec – When mode is “databricks”, path to a JSON file containing a Databricks cluster specification to use when launching a run.
- git_username – Username for HTTP(S) authentication with Git.
- git_password – Password for HTTP(S) authentication with Git.
- use_conda – If True (the default), create a new Conda environment for the run and install project dependencies within that environment. Otherwise, run the project in the current environment without installing any project dependencies.
- storage_dir – Used only if mode is “local”. MLflow downloads artifacts from distributed URIs passed to parameters of type path to subdirectories of storage_dir.
- block – Whether to block while waiting for a run to complete. Defaults to True. Note that if block is False and mode is “local”, this method will return, but the current process will block when exiting until the local run completes. If the current process is interrupted, any asynchronous runs launched via this method will be terminated.
- run_id – Note: this argument is used internally by the MLflow project APIs and should not be specified. If specified, the run ID will be used instead of creating a new run.
Returns: mlflow.projects.SubmittedRun exposing information (e.g. run ID) about the launched run.
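As an illustration of the signature above, the following is a minimal sketch of a non-blocking local launch. The helper names build_run_kwargs and launch_example_project are hypothetical, the "alpha" parameter is an assumption about what the example project's main entry point accepts, and passing parameters as a dictionary is likewise an assumption, since the parameter list above does not describe that argument. MLflow is imported lazily so the sketch can be loaded without it installed.

```python
# Keyword arguments taken from the documented signature of mlflow.projects.run.
RUN_KWARGS = {
    "uri": "https://github.com/mlflow/mlflow-example",
    "entry_point": "main",
    # Assumption: the example project defines an "alpha" parameter and
    # parameters are passed as a dictionary.
    "parameters": {"alpha": 0.5},
    "mode": "local",
    "use_conda": True,
    # Return immediately instead of waiting for the run to finish.
    "block": False,
}


def build_run_kwargs(**overrides):
    """Return a copy of RUN_KWARGS with any caller-supplied overrides applied."""
    kwargs = dict(RUN_KWARGS)
    kwargs.update(overrides)
    return kwargs


def launch_example_project(**overrides):
    """Hypothetical helper: launch the project and return the SubmittedRun."""
    # Imported here so that defining this function does not require MLflow.
    import mlflow.projects

    return mlflow.projects.run(**build_run_kwargs(**overrides))
```

Calling launch_example_project() should hand back a SubmittedRun whose run_id identifies the launched run. Because block is False and mode is “local” here, the calling process would still block on exit until the run completes, as described for the block parameter.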
class mlflow.projects.SubmittedRun
Bases: object
Wrapper around an MLflow project run (e.g. a subprocess running an entry point command or a Databricks job run) that exposes methods for waiting on and cancelling the run. This class defines the interface that the MLflow project runner uses to manage the lifecycle of runs launched in different environments (e.g. runs launched locally or on Databricks).
SubmittedRun is not thread-safe. That is, concurrent calls to wait() / cancel() from multiple threads may inadvertently kill resources (e.g. local processes) unrelated to the run.
Note: Subclasses of SubmittedRun must expose a run_id member containing the run’s MLflow run ID.
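To make the interface concrete, here is a rough, standard-library-only sketch of a wrapper around a local subprocess that exposes the members named above: run_id, wait() and cancel(). The class LocalProcessRun is hypothetical and deliberately does not inherit from mlflow.projects.SubmittedRun so that it runs without MLflow; a real implementation would subclass SubmittedRun. Having wait() return a boolean success flag is an assumption, not something stated in this reference.

```python
import subprocess
import sys


class LocalProcessRun:
    """Hypothetical stand-in mirroring the SubmittedRun surface for a local process."""

    def __init__(self, run_id, command):
        # Subclasses of SubmittedRun must expose the MLflow run ID as run_id.
        self.run_id = run_id
        self._process = subprocess.Popen(command)

    def wait(self):
        """Block until the process exits; True means exit code 0 (assumed convention)."""
        return self._process.wait() == 0

    def cancel(self):
        """Terminate the process if it is still running, then reap it."""
        if self._process.poll() is None:
            self._process.terminate()
        self._process.wait()


# Launch a trivial command with the current interpreter and wait for it.
run = LocalProcessRun("hypothetical-run-id", [sys.executable, "-c", "pass"])
succeeded = run.wait()
```

Like SubmittedRun, this sketch is not thread-safe: wait() and cancel() should be called from a single thread.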