mlflow.spark

MLflow integration for Spark MLlib models.

Spark MLlib models are saved and loaded using native Spark MLlib persistence. A model can be exported as a pyfunc for deployment outside of Spark, or loaded back as a Spark Transformer for scoring within Spark. The pyfunc flavor instantiates a SparkContext internally and converts the input data to a Spark DataFrame before scoring.

mlflow.spark.load_model(path, run_id=None)

Load the Spark MLlib model from the path.

Parameters:
  • path – Local filesystem path or run-relative artifact path to the model.
  • run_id – Run ID. If provided, it is combined with path to identify the model.
Returns:
  SparkML model.
Return type:
  pyspark.ml.pipeline.PipelineModel

mlflow.spark.load_pyfunc(path)

Load a Python Function model from a local file.

Parameters:
  • path – Local path.
Returns:
  The model as PyFunc.
mlflow.spark.log_model(spark_model, artifact_path, conda_env=None, jars=None)

Log a Spark MLlib model as an MLflow artifact for the current run.

Parameters:
  • spark_model – PipelineModel to be saved.
  • artifact_path – Run-relative artifact path.
  • conda_env – Path to a Conda environment file. If provided, it defines the environment for the model. At a minimum, it should specify python, pyspark, and mlflow with appropriate versions.
  • jars – List of JARs needed by the model.

mlflow.spark.save_model(spark_model, path, mlflow_model=<mlflow.models.Model object>, conda_env=None, jars=None)

Save a Spark MLlib PipelineModel at the given local path, using the native Spark MLlib persistence mechanism.

Parameters:
  • spark_model – Spark PipelineModel to be saved. Only PipelineModels can be saved.
  • path – Local path where the model is to be saved.
  • mlflow_model – MLflow model config this flavor is being added to.
  • conda_env – Conda environment this model depends on.
  • jars – List of JARs needed by the model.