Deploy MLflow Model to Kubernetes

Using MLServer as the Inference Server

By default, MLflow deployment uses Flask, a widely used WSGI web application framework for Python, to serve the inference endpoint. However, Flask is mainly designed for lightweight applications and might not be suitable for production use cases at scale. To address this gap, MLflow integrates with MLServer as an alternative deployment option, which is used as a core Python inference server in Kubernetes-native frameworks like Seldon Core and KServe (formerly known as KFServing). Using MLServer, you can take advantage of the scalability and reliability of Kubernetes to serve your model at scale. See Serving Framework for a detailed comparison between Flask and MLServer, and why MLServer is a better choice for production ML use cases.
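As a minimal sketch of how this looks in practice, the example below logs a scikit-learn model and then builds a Docker image that serves it with MLServer instead of Flask, assuming the mlflow.models.build_docker Python API with its enable_mlserver parameter (mirroring the mlflow models build-docker CLI). The model, run setup, and image name are placeholders for illustration only.

```python
import mlflow
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Train a small example model and log it to MLflow
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

with mlflow.start_run():
    model_info = mlflow.sklearn.log_model(model, artifact_path="model")

# Build a Docker image that serves the logged model with MLServer rather than
# the default Flask server. The resulting image can be pushed to a registry
# and deployed on Kubernetes, e.g. via Seldon Core or KServe.
# "my-registry/mlflow-iris:latest" is a hypothetical image name.
mlflow.models.build_docker(
    model_uri=model_info.model_uri,
    name="my-registry/mlflow-iris:latest",
    enable_mlserver=True,
)
```

Once the image is pushed to a container registry, it can be referenced from the deployment manifests described in the partner documentation below.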

Deployment Steps

Please refer to the following partner documentation for deploying MLflow Models to Kubernetes using MLServer: