mlflow.spark

MLflow integration for Spark MLlib models.

Spark MLlib models are saved and loaded using native Spark MLlib persistence. A model can be exported as a pyfunc for out-of-Spark deployment, or loaded back as a Spark Transformer to score data in Spark. The pyfunc flavor instantiates a SparkContext internally and converts the input data to a Spark DataFrame before scoring.
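
For example, a fitted PipelineModel can be logged to the current run and later reloaded through either flavor (a minimal sketch; the pipeline stages, training data, and artifact path are illustrative assumptions, not part of this API):

    import mlflow
    import mlflow.spark
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import HashingTF, Tokenizer
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Illustrative training data: a "text" column and a binary "label" column.
    training_df = spark.createDataFrame(
        [("mlflow spark example", 1.0), ("unrelated text", 0.0)],
        ["text", "label"],
    )

    # Assemble and fit a small example pipeline.
    pipeline = Pipeline(stages=[
        Tokenizer(inputCol="text", outputCol="words"),
        HashingTF(inputCol="words", outputCol="features"),
        LogisticRegression(maxIter=10),
    ])
    model = pipeline.fit(training_df)

    # Log the fitted PipelineModel as an artifact of the current run. It can later be
    # loaded back with mlflow.spark.load_model (Spark) or mlflow.spark.load_pyfunc (pyfunc).
    with mlflow.start_run():
        mlflow.spark.log_model(model, "spark-model")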

mlflow.spark.load_model(path, run_id=None)

Load the Spark MLlib model from the given path.

Parameters:
  • path – Local filesystem path or Run-relative artifact path to the model.
  • run_id – Run ID. If provided, it is combined with path to identify the model.

Returns: SparkML model.

Return type: pyspark.ml.pipeline.PipelineModel
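
For example, a model logged under the artifact path "spark-model" can be reloaded for scoring in Spark (a sketch; the run ID, artifact path, and test_df DataFrame are illustrative placeholders):

    import mlflow.spark

    # Combine the run ID with the run-relative artifact path to locate the model.
    model = mlflow.spark.load_model("spark-model", run_id="96771d893a5e46159d9f3b49bf9013e2")

    # The returned PipelineModel scores Spark DataFrames directly.
    predictions = model.transform(test_df)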

mlflow.spark.load_pyfunc(path)

Load the model as a PyFunc.

Parameters:
  • path – Local path.

Returns: The model as a PyFunc.
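
For example (a sketch; the local path is a placeholder for the location of the saved Spark model data, and the input column matches the example pipeline above):

    import pandas as pd
    import mlflow.spark

    # Load the saved model data as a generic Python function.
    pyfunc_model = mlflow.spark.load_pyfunc("/tmp/spark-model")

    # Scoring takes a pandas DataFrame; a SparkContext and Spark DataFrame are created internally.
    predictions = pyfunc_model.predict(pd.DataFrame({"text": ["mlflow spark example"]}))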

mlflow.spark.log_model(spark_model, artifact_path, conda_env=None, jars=None)

Log a Spark MLlib model as an MLflow artifact for the current run.

Parameters:
  • spark_model – PipelineModel to be saved.
  • artifact_path – Run-relative artifact path.
  • conda_env – Path to a Conda environment file. If provided, this defines the environment for the model. At minimum, it should specify Python, pyspark, and mlflow with appropriate versions.
  • jars – List of jars needed by the model.
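
For example, a fitted PipelineModel can be logged together with an environment file (a sketch; model and the conda.yaml file are assumptions):

    import mlflow
    import mlflow.spark

    # model is a fitted pyspark.ml PipelineModel; conda.yaml lists python, pyspark, and mlflow.
    with mlflow.start_run():
        mlflow.spark.log_model(model, "spark-model", conda_env="conda.yaml")
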
mlflow.spark.save_model(spark_model, path, mlflow_model=<mlflow.models.Model object>, conda_env=None, jars=None)

Save a Spark MLlib PipelineModel at the given local path.

Uses the Spark MLlib persistence mechanism.

Parameters:
  • spark_model – Spark PipelineModel to be saved. Currently, only PipelineModels can be saved.
  • path – Local path where the model is to be saved.
  • mlflow_model – MLflow model config this flavor is being added to.
  • conda_env – Conda environment this model depends on.
  • jars – List of jars needed by the model.
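
For example (a sketch; the fitted model and output path are assumptions):

    import mlflow.spark

    # Persist a fitted PipelineModel to a local directory via Spark MLlib persistence.
    mlflow.spark.save_model(model, "/tmp/spark-model")

    # The same directory can later be reloaded as a Spark PipelineModel.
    reloaded = mlflow.spark.load_model("/tmp/spark-model")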