mlflow.sklearn

The mlflow.sklearn module provides an API for logging and loading scikit-learn models. This module exports scikit-learn models with the following flavors:

  • Python (native) pickle format – This is the main flavor that can be loaded back into scikit-learn.
  • mlflow.pyfunc – Produced for use by generic pyfunc-based deployment tools and batch inference.

mlflow.sklearn.load_model(path, run_id=None)

Load a scikit-learn model from a local file (if run_id is None) or a run.

Parameters:
  • path – Local filesystem path or run-relative artifact path to the model saved by mlflow.sklearn.save_model().
  • run_id – Run ID. If provided, it is combined with path to identify the model within that run's artifacts.

>>> import mlflow.sklearn
>>> sk_model = mlflow.sklearn.load_model("sk_models", run_id="96771d893a5e46159d9f3b49bf9013e2")
>>> # Use a pandas DataFrame to make predictions
>>> pandas_df = ...
>>> predictions = sk_model.predict(pandas_df)
mlflow.sklearn.log_model(sk_model, artifact_path, conda_env=None, serialization_format='pickle')

Log a scikit-learn model as an MLflow artifact for the current run.

Parameters:
  • sk_model – scikit-learn model to be saved.
  • artifact_path – Run-relative artifact path.
  • conda_env – Path to a Conda environment file. If provided, this describes the environment this model should be run in. At minimum, it should specify python, scikit-learn, and mlflow with appropriate versions.
  • serialization_format – The format in which to serialize the model. This should be one of the following: mlflow.sklearn.SERIALIZATION_FORMAT_PICKLE, mlflow.sklearn.SERIALIZATION_FORMAT_CLOUDPICKLE.
>>> import mlflow
>>> import mlflow.sklearn
>>> from sklearn.datasets import load_iris
>>> from sklearn import tree
>>> iris = load_iris()
>>> sk_model = tree.DecisionTreeClassifier()
>>> sk_model = sk_model.fit(iris.data, iris.target)
>>> # Log model params
>>> mlflow.log_param("criterion", sk_model.criterion)
>>> mlflow.log_param("splitter", sk_model.splitter)
>>> # Log the model; "sk_models" is the run-relative artifact path
>>> mlflow.sklearn.log_model(sk_model, "sk_models")
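The conda_env parameter expects a path to a Conda environment file that names python, scikit-learn, and mlflow. The following is a minimal sketch of writing such a file with the standard library only. The helper write_conda_env, the environment name, the file name conda.yaml, and the "X.Y.Z" version strings are all assumptions made for illustration and are not part of MLflow; replace the placeholders with the versions actually used to train the model.

```python
import sys
import tempfile
from pathlib import Path


def write_conda_env(path, sklearn_version, mlflow_version):
    """Write a minimal Conda environment file and return its path.

    Hypothetical helper for illustration; it is not part of MLflow.
    """
    # Pin the interpreter to the version running this script.
    python_version = "{}.{}.{}".format(*sys.version_info[:3])
    lines = [
        "name: sklearn-model-env",  # hypothetical environment name
        "channels:",
        "  - defaults",
        "dependencies:",
        "  - python={}".format(python_version),
        "  - scikit-learn={}".format(sklearn_version),
        "  - pip:",
        "    - mlflow=={}".format(mlflow_version),
    ]
    path = Path(path)
    path.write_text("\n".join(lines) + "\n")
    return path


# Placeholder versions: substitute the versions from the training environment.
SKLEARN_VERSION = "X.Y.Z"
MLFLOW_VERSION = "X.Y.Z"

env_dir = tempfile.mkdtemp()
conda_env_path = write_conda_env(
    Path(env_dir) / "conda.yaml", SKLEARN_VERSION, MLFLOW_VERSION
)
print(conda_env_path.read_text())
```

The resulting path can then be passed as the conda_env argument, for example mlflow.sklearn.log_model(sk_model, "sk_models", conda_env=str(conda_env_path)).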
mlflow.sklearn.save_model(sk_model, path, conda_env=None, mlflow_model=<mlflow.models.Model object>, serialization_format='cloudpickle')

Save a scikit-learn model to a path on the local file system.

Parameters:
  • sk_model – scikit-learn model to be saved.
  • path – Local path where the model is to be saved.
  • conda_env – Path to a Conda environment file. If provided, this describes the environment this model should be run in. At minimum, it should specify python, scikit-learn, and mlflow with appropriate versions.
  • mlflow_model – mlflow.models.Model this flavor is being added to.
  • serialization_format – The format in which to serialize the model. This should be one of the formats listed in mlflow.sklearn.SUPPORTED_SERIALIZATION_FORMATS. The Cloudpickle format, mlflow.sklearn.SERIALIZATION_FORMAT_CLOUDPICKLE, provides better cross-system compatibility by identifying and packaging code dependencies with the serialized model.
>>> import mlflow.sklearn
>>> from sklearn.datasets import load_iris
>>> from sklearn import tree
>>> iris = load_iris()
>>> sk_model = tree.DecisionTreeClassifier()
>>> sk_model = sk_model.fit(iris.data, iris.target)
>>> # Save the model in cloudpickle format
>>> # Set path to location for persistence
>>> sk_path_dir_1 = ...
>>> mlflow.sklearn.save_model(
...         sk_model, sk_path_dir_1,
...         serialization_format=mlflow.sklearn.SERIALIZATION_FORMAT_CLOUDPICKLE)
>>>
>>> # Save the model in pickle format
>>> # Set path to location for persistence
>>> sk_path_dir_2 = ...
>>> mlflow.sklearn.save_model(sk_model, sk_path_dir_2,
...                           serialization_format=mlflow.sklearn.SERIALIZATION_FORMAT_PICKLE)
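The difference between the two serialization formats comes from how functions are stored. The sketch below uses only the standard library pickle module to show the limitation that motivates the Cloudpickle format: pickle records functions by reference (module and name), so a function that cannot be looked up by name, such as a locally defined closure, cannot be serialized. Cloudpickle instead serializes such functions by value. The helpers try_pickle and make_scaler are hypothetical names introduced for this illustration.

```python
import pickle


def try_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


def make_scaler(factor):
    """Return a closure, similar to a custom function attached to a model."""
    def scale(x):
        return x * factor
    return scale


# A builtin is stored by reference to its importable name, so this succeeds.
builtin_ok = try_pickle(len)

# A local closure has no importable name, so plain pickle rejects it.
closure_ok = try_pickle(make_scaler(2.0))

print("builtin picklable:", builtin_ok)
print("closure picklable:", closure_ok)
```

A model that holds such a function (for instance, through a custom transformer) is a case where mlflow.sklearn.SERIALIZATION_FORMAT_CLOUDPICKLE is likely the safer choice.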