Export / Import of generic python models.
This module defines a generic filesystem format for Python models and provides utilities for saving to and loading from this format. The format is self-contained in the sense that it includes all the information needed to load and use the model. Dependencies are either stored directly with the model or referenced via a conda environment.
The convention for pyfunc models is to have a predict method or function with the following signature:
predict(data: pandas.DataFrame) -> numpy.ndarray | pandas.Series | pandas.DataFrame
This convention is relied upon by other mlflow components.
Pyfunc model format is defined as a directory structure containing all required data, code and configuration:
- ./MLmodel - config
- <code> - any code packaged with the model (specified in the conf file, see below)
- <data> - any data packaged with the model (specified in the conf file, see below)
- <env> - conda environment definition (specified in the conf file, see below)
It must contain an MLmodel file in its root with the “python_function” format and the following parameters:
- loader_module [required]: Python module that can load the model. Expected as a module identifier, e.g. mlflow.sklearn; it will be imported via importlib.import_module. The imported module must contain a function with the following signature:
  load_pyfunc(path: string) -> <pyfunc model>
  The path argument is specified by the data parameter and may refer to a file or directory.
- code [optional]: relative path to a directory containing the code packaged with this model. All files and directories inside this directory are added to the Python path prior to importing the model loader.
- data [optional]: relative path to a file or directory containing model data. The path is passed to the model loader.
- env [optional]: relative path to an exported conda environment. If present, this environment should be activated prior to running the model.
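The loader contract described above can be sketched end to end with the standard library only. Everything named here is hypothetical and not part of mlflow: the module hypothetical_iris_loader, the helper load_from_conf, and the toy linear model. As a simplification, the code directory itself is put on the Python path, and predict takes plain lists of rows rather than a pandas.DataFrame.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Source of a hypothetical loader module; it only has to expose load_pyfunc(path).
LOADER_SOURCE = '''
import pickle


class _LinearModel:
    def __init__(self, params):
        self.weights = params["weights"]
        self.bias = params["bias"]

    def predict(self, rows):
        # Stand-in for predict(data: pandas.DataFrame); rows are plain lists here.
        return [sum(w * x for w, x in zip(self.weights, row)) + self.bias
                for row in rows]


def load_pyfunc(path):
    with open(path, "rb") as handle:
        return _LinearModel(pickle.load(handle))
'''


def load_from_conf(model_root, conf):
    """Load a model the way the python_function parameters describe (sketch)."""
    root = Path(model_root)
    if conf.get("code"):
        # Simplification: add the code directory itself to the Python path.
        sys.path.insert(0, str(root / conf["code"]))
    importlib.invalidate_caches()
    loader = importlib.import_module(conf["loader_module"])
    return loader.load_pyfunc(str(root / conf["data"]))


with tempfile.TemporaryDirectory() as model_root:
    root = Path(model_root)
    (root / "code").mkdir()
    (root / "data").mkdir()
    (root / "code" / "hypothetical_iris_loader.py").write_text(LOADER_SOURCE)
    with open(root / "data" / "model.pkl", "wb") as handle:
        pickle.dump({"weights": [0.5, 0.25], "bias": 1.0}, handle)

    conf = {"loader_module": "hypothetical_iris_loader",
            "code": "code", "data": "data/model.pkl"}
    model = load_from_conf(model_root, conf)
    predictions = model.predict([[1.0, 2.0], [0.0, 0.0]])

print(predictions)
```

The point of the indirection is that the caller never needs to know how the model was serialized; it only needs the module identifier and the data path from the configuration.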
```
>tree example/sklearn_iris/mlruns/run1/outputs/linear-lr
├── MLmodel
├── code
│   ├── sklearn_iris.py
│
├── data
│   └── model.pkl
└── mlflow_env.yml

>cat example/sklearn_iris/mlruns/run1/outputs/linear-lr/MLmodel
python_function:
  code: code
  data: data/model.pkl
  env: mlflow_env.yml
  main: sklearn_iris
```
Todo:
- Get default conda_env of the project.
add_to_model(model, loader_module, data=None, code=None, env=None)
Add pyfunc spec to the model configuration.
Defines pyfunc configuration schema. Caller can use this to create a valid pyfunc model flavor out of an existing directory structure. For example, other model flavors can use this to specify how to use their output as a pyfunc.
NOTE: all paths are relative to the exported model root directory.
- model – existing servable
- loader_module – module used to load the model
- data – path to the model data
- code – path to the code dependencies
- env – conda environment
Returns the updated model configuration.
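As an illustration of the configuration schema, the following minimal sketch builds the python_function entry as a plain dictionary. The helper build_pyfunc_conf is hypothetical and is not mlflow's implementation; it assumes the entry uses the parameter names listed in the format description (loader_module, data, code, env) and omits optional keys that were not supplied.

```python
def build_pyfunc_conf(loader_module, data=None, code=None, env=None):
    """Return the python_function entry as a dict, skipping unset optional keys."""
    if not loader_module:
        raise ValueError("loader_module is required")
    conf = {"loader_module": loader_module}
    # All paths are expected to be relative to the exported model root.
    for key, value in (("data", data), ("code", code), ("env", env)):
        if value is not None:
            conf[key] = value
    return conf


flavors = {"python_function": build_pyfunc_conf(
    "mlflow.sklearn", data="data/model.pkl", env="mlflow_env.yml")}
print(flavors)
```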
Generate python source of the model loader.
The model loader contains a load_pyfunc function that takes no parameters: the loading of the given model is hardcoded into the generated Python source. This way the exported model has no unnecessary dependencies on mlflow or on any other configuration file format or parsing library.
- src_path – current path to the model
- dst_path – relative or absolute path where the model will be stored in the deployment environment
Returns the Python source code of the model loader as a string.
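The idea of hardcoding the load step into generated source can be sketched as follows. This is an assumption-laden illustration, not mlflow's generator: render_loader_src, its two arguments, the template, and the stand-in module hypothetical_loader are all hypothetical. The generated function takes no parameters because the loader module name and the deployment-time data path are baked into the source text.

```python
import sys
import types

# Template for the generated loader; repr() formatting keeps the strings safely quoted.
LOADER_TEMPLATE = '''import importlib


def load_pyfunc():
    loader = importlib.import_module({loader_module!r})
    return loader.load_pyfunc({data_path!r})
'''


def render_loader_src(loader_module, data_path):
    """Return Python source for a load_pyfunc() with both values hardcoded."""
    return LOADER_TEMPLATE.format(loader_module=loader_module, data_path=data_path)


# Register a stand-in loader module so the generated source can be exercised.
fake_loader = types.ModuleType("hypothetical_loader")
fake_loader.load_pyfunc = lambda path: ("loaded", path)
sys.modules["hypothetical_loader"] = fake_loader

source = render_loader_src("hypothetical_loader", "/deploy/model/data/model.pkl")
namespace = {}
exec(source, namespace)  # in deployment this source would be saved as a module
result = namespace["load_pyfunc"]()
print(result)
```

The generated source depends only on importlib and the loader module, which matches the stated goal of avoiding a configuration parser at load time.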
Export the model in python-function form and log it with the current mlflow tracking service.
The model is exported by calling save_model, and the result is logged with tracking.log_output_files.
save_model(dst_path, loader_module, data_path=None, code_path=(), conda_env=None, model=<mlflow.models.Model object>)
Export model as a generic python-function model.
- dst_path (str) – path where the model will be stored.
- loader_module (str) – the module to be used to load the model.
- data_path (str) – path to a file or directory containing model data.
- code_path (list[str]) – list of paths (files or directories) containing code dependencies not present in the environment. Every path in code_path is added to the Python path before the model is loaded.
- conda_env (str) – path to the conda environment definition (.yml). This environment will be activated prior to running model code.
Returns the model config (Servable) containing model info.
spark_udf(spark, path, run_id=None, result_type='double')
Returns a Spark UDF that can be used to invoke the python-function formatted model.
Note that parameters passed to the UDF will be forwarded to the model as a DataFrame where the names are simply ordinals (0, 1, …).
predict = mlflow.pyfunc.spark_udf(spark, "/my/local/model")
df.withColumn("prediction", predict("name", "age")).show()
- spark (SparkSession) – a SparkSession object
- path (str) – A path containing a pyfunc model.
- run_id – ID of the run that produced this model. If provided, run_id is used to retrieve the model logged with mlflow.
- result_type (str) – Spark UDF type returned by the model’s prediction method. Defaults to double.
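The note about ordinal names means the model cannot look up inputs by their original column names. The sketch below mimics that renaming with plain Python containers; to_ordinal_columns is a hypothetical helper, and a real UDF would hand the model a pandas.DataFrame rather than a dict.

```python
def to_ordinal_columns(*columns):
    """Key each positional column by its position: 0, 1, ..."""
    return {index: list(values) for index, values in enumerate(columns)}


# Columns passed as predict("name", "age") arrive without their names.
names = ["ada", "bob"]
ages = [36, 41]
frame_like = to_ordinal_columns(names, ages)
print(sorted(frame_like))
```

Under this assumption, model code would read the first argument as column 0 and the second as column 1, so the order of arguments in the UDF call matters.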