Export and import of generic Python models.

This module defines a generic filesystem format for Python models and provides utilities for saving to and loading from this format. The format is self-contained in the sense that it includes all the information needed to load and use the model. Dependencies are either stored directly with the model or referenced via a Conda environment.

The convention for pyfunc models is to have a predict method or function with the following signature:

predict(data: pandas.DataFrame) -> numpy.ndarray | pandas.Series | pandas.DataFrame

This convention is relied on by other MLflow components.
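
As a minimal sketch of this convention, the class below exposes a predict method that accepts a pandas.DataFrame and returns a numpy.ndarray. The class name ConstantSlopeModel and its slope parameter are hypothetical and not part of MLflow; only the predict signature comes from the text above.

```python
import numpy as np
import pandas as pd


class ConstantSlopeModel:
    """Hypothetical model (not part of MLflow) that follows the pyfunc predict convention."""

    def __init__(self, slope):
        self.slope = slope

    def predict(self, data: pd.DataFrame) -> np.ndarray:
        # Sum the features of each row and scale by the slope: one prediction per row.
        return data.sum(axis=1).to_numpy() * self.slope


model = ConstantSlopeModel(slope=2.0)
frame = pd.DataFrame({"x1": [1.0, 2.0], "x2": [3.0, 4.0]})
predictions = model.predict(frame)
```

Any object with a method of this shape could in principle be treated the same way by a caller that only relies on predict.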

The pyfunc model format is defined as a directory structure containing all required data, code, and configuration:

        ./MLmodel: configuration
        <code>: code packaged with the model (specified in the MLmodel file)
        <data>: data packaged with the model (specified in the MLmodel file)
        <env>: Conda environment definition (specified in the MLmodel file)

The directory must contain an MLmodel file in its root that declares the “python_function” format with the following parameters:

  • loader_module [required]:

    Python module that can load the model. It is expected as a module identifier, e.g. mlflow.sklearn, and is imported via importlib.import_module. The imported module must contain a function with the following signature:

    load_pyfunc(path: string) -> <pyfunc model>

    The path argument is specified by the data parameter and may refer to a file or directory.

  • code [optional]:

    Relative path to a directory containing the code packaged with this model. All files and directories inside this directory are added to the Python path prior to importing the model loader.

  • data [optional]:

    Relative path to a file or directory containing model data. The path is passed to the model loader.

  • env [optional]:

    Relative path to an exported Conda environment. If present, this environment should be activated prior to running the model.
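
The way a consumer might act on these parameters can be sketched as follows. This is not MLflow's implementation: the helper load_from_conf, the module name sketch_loader, and the throwaway directory are all hypothetical. For simplicity the sketch adds only the code directory itself to the Python path, passes the model root to the loader when no data entry is given (an assumption), and ignores env.

```python
import importlib
import os
import sys
import tempfile


def load_from_conf(model_root, conf):
    """Hypothetical helper: resolve the pyfunc parameters and call the loader."""
    code = conf.get("code")
    if code:
        # Make the packaged code importable before the loader is imported.
        sys.path.insert(0, os.path.join(model_root, code))
    data = conf.get("data")
    # Assumption: fall back to the model root when no data entry is present.
    data_path = os.path.join(model_root, data) if data else model_root
    importlib.invalidate_caches()
    loader = importlib.import_module(conf["loader_module"])
    return loader.load_pyfunc(data_path)


# Build a throwaway model directory so the sketch runs without MLflow.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "code"))
os.makedirs(os.path.join(root, "data"))
with open(os.path.join(root, "data", "greeting.txt"), "w") as f:
    f.write("hello")
with open(os.path.join(root, "code", "sketch_loader.py"), "w") as f:
    # A real loader would return an object with a predict method;
    # this stand-in simply returns the file contents.
    f.write(
        "def load_pyfunc(path):\n"
        "    with open(path) as handle:\n"
        "        return handle.read()\n"
    )

conf = {"loader_module": "sketch_loader", "code": "code", "data": "data/greeting.txt"}
loaded = load_from_conf(root, conf)
```

The ordering matters: the code directory has to be on the Python path before importlib.import_module runs, otherwise a loader packaged with the model cannot be found.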


Example:

>tree example/sklearn_iris/mlruns/run1/outputs/linear-lr
├── MLmodel
├── code
│   ├──
├── data
│   └── model.pkl
└── mlflow_env.yml

>cat example/sklearn_iris/mlruns/run1/outputs/linear-lr/MLmodel
  code: code
  data: data/model.pkl
  loader_module: mlflow.sklearn
  env: mlflow_env.yml
  main: sklearn_iris

mlflow.pyfunc.add_to_model(model, loader_module, data=None, code=None, env=None)

Add a pyfunc spec to the model configuration.

Defines the pyfunc configuration schema. Callers can use this to create a valid pyfunc model flavor out of an existing directory structure. For example, other model flavors can use this to specify how to use their output as a pyfunc.

NOTE: all paths are relative to the exported model root directory.

  • loader_module – The module to be used to load the model.
  • model – Existing model.
  • data – Path to the model data.
  • code – Path to the code dependencies.
  • env – Conda environment.

Returns: Updated model configuration.
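
To make the shape of such a spec concrete, the sketch below builds one as a plain dictionary. The helper build_pyfunc_spec and the use of a dict as the model configuration are assumptions made for illustration; they mirror the parameters listed above but are not MLflow's implementation.

```python
def build_pyfunc_spec(loader_module, data=None, code=None, env=None):
    """Hypothetical helper mirroring the parameters of add_to_model."""
    spec = {"loader_module": loader_module}
    # Optional entries are recorded only when the caller supplies them.
    for key, value in (("data", data), ("code", code), ("env", env)):
        if value is not None:
            spec[key] = value
    return spec


# A plain dict stands in for the model configuration (an assumption of this sketch).
model_conf = {}
model_conf["python_function"] = build_pyfunc_spec(
    "mlflow.sklearn", data="data/model.pkl", env="mlflow_env.yml"
)
```

All paths in the resulting spec are relative, in line with the note that paths are relative to the exported model root directory.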

mlflow.pyfunc.get_module_loader_src(src_path, dst_path)

Generate Python source of the model loader.

The model loader contains a load_pyfunc method with no parameters. It hardcodes the loading of the given model into Python source. This is done so that the exported model has no unnecessary dependencies on MLflow or on any other configuration file format or parsing library.

  • src_path – Current path to the model.
  • dst_path – Relative or absolute path where the model will be stored in the deployment environment.

Returns: Python source code of the model loader as a string.
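
The idea of hardcoding the load step into generated source can be sketched with a string template. Everything here is hypothetical: render_loader_src takes the loader module and data path directly (unlike get_module_loader_src, which takes src_path and dst_path), and stub_loader is a stand-in module registered only so that the generated source can be executed.

```python
import sys
import types

LOADER_TEMPLATE = '''\
import importlib

def load_pyfunc():
    # Loader module and data path are fixed when this source is generated.
    loader = importlib.import_module({loader_module!r})
    return loader.load_pyfunc({data_path!r})
'''


def render_loader_src(loader_module, data_path):
    """Hypothetical generator: bake the loader module and path into source text."""
    return LOADER_TEMPLATE.format(loader_module=loader_module, data_path=data_path)


# Register a stand-in loader module so the generated source can be executed here.
stub = types.ModuleType("stub_loader")
stub.load_pyfunc = lambda path: "model loaded from " + path
sys.modules["stub_loader"] = stub

source = render_loader_src("stub_loader", "/deploy/model/data/model.pkl")
namespace = {}
exec(source, namespace)
result = namespace["load_pyfunc"]()
```

Because the generated load_pyfunc takes no arguments and reads no configuration file, the deployed code needs only the loader module itself at run time.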

mlflow.pyfunc.load_pyfunc(path, run_id=None)

Load a model stored in Python function format.

mlflow.pyfunc.log_model(artifact_path, **kwargs)

Export the model in Python function form and log it with the current MLflow tracking service.

The model is exported by calling @save_model and the result is logged with @tracking.log_output_files.

mlflow.pyfunc.save_model(dst_path, loader_module, data_path=None, code_path=(), conda_env=None, model=<mlflow.models.Model object>)

Export model as a generic Python function model.

  • dst_path – Path where the model is stored.
  • loader_module – The module to be used to load the model.
  • data_path – Path to a file or directory containing model data.
  • code_path – List of paths (files or directories) containing code dependencies not present in the environment. Every path in code_path is added to the Python path before the model is loaded.
  • conda_env – Path to the Conda environment definition. This environment is activated prior to running model code.

Returns: Model configuration containing model info.

mlflow.pyfunc.spark_udf(spark, path, run_id=None, result_type='double')

Return a Spark UDF that can be used to invoke the Python function formatted model.

Parameters passed to the UDF are forwarded to the model as a DataFrame where the names are simply ordinals (0, 1, …).


predict = mlflow.pyfunc.spark_udf(spark, "/my/local/model")
df.withColumn("prediction", predict("name", "age")).show()

  • spark – A SparkSession object.
  • path – A path containing a pyfunc model.
  • run_id – ID of the run that produced this model. If provided, run_id is used to retrieve the model logged with MLflow.
  • result_type – Spark UDF type returned by the model’s prediction method. Default: double.
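
The ordinal column naming described above can be illustrated without Spark. The helper columns_to_ordinal_frame below is hypothetical and only mimics how positional UDF arguments might be packed into a DataFrame; using string labels ("0", "1", …) rather than integers is an assumption of this sketch.

```python
import pandas as pd


def columns_to_ordinal_frame(*columns):
    """Hypothetical helper: pack positional column values into one DataFrame."""
    # Column i of the frame holds the i-th argument; names are "0", "1", ...
    return pd.DataFrame({str(i): pd.Series(col) for i, col in enumerate(columns)})


names = ["alice", "bob"]
ages = [31, 45]
frame = columns_to_ordinal_frame(names, ages)
```

A model used this way cannot rely on the original column names ("name", "age" in the example above) and has to address its inputs by position.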