Deploy MLflow Model to Amazon SageMaker
Amazon SageMaker is a fully managed service designed for scaling ML inference containers. MLflow simplifies the deployment process by offering easy-to-use commands without the need for writing container definitions.
If you are new to MLflow model deployment, please read MLflow Deployment first to understand the basic concepts of MLflow models and deployments.
How it works
SageMaker features a capability called Bring Your Own Container (BYOC), which allows you to run custom Docker containers on the inference endpoint. These containers must meet specific requirements, such as running a web server that exposes certain REST endpoints, having a designated container entrypoint, and setting environment variables. Writing a Dockerfile and a serving script that meet these requirements can be tedious.
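To give a sense of what that involves, the sketch below implements only the web-server part of the contract, using the Python standard library. It assumes SageMaker's documented BYOC conventions: a `GET /ping` health check and a `POST /invocations` route, served on port 8080. The `predict` callable, the JSON-only input handling, and the `{"predictions": ...}` response shape are illustrative placeholders, not MLflow's actual scoring server, and a real image would still need a Dockerfile and an entrypoint around it.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_handler(predict):
    """Return a handler class serving the two routes a BYOC container must expose."""

    class Handler(BaseHTTPRequestHandler):
        def _send(self, status, body=b"", content_type="text/plain"):
            self.send_response(status)
            self.send_header("Content-Type", content_type)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def do_GET(self):
            # Health check: a 200 from GET /ping signals that the container is healthy.
            self._send(200 if self.path == "/ping" else 404)

        def do_POST(self):
            if self.path != "/invocations":
                self._send(404)
                return
            # Read the JSON request body and pass it to the placeholder model.
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length) or b"null")
            body = json.dumps({"predictions": predict(payload)}).encode("utf-8")
            self._send(200, body, "application/json")

        def log_message(self, *args):
            pass  # silence per-request logging in this example

    return Handler


def start_server(predict, host="0.0.0.0", port=8080):
    """Serve in a daemon thread; SageMaker sends traffic to port 8080 by default."""
    server = HTTPServer((host, port), make_handler(predict))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

For a quick local check, `start_server(lambda rows: [sum(r) for r in rows], port=0)` binds a free port, which you can read from `server.server_address[1]`. Even this toy version omits input validation, CSV support, and model loading, which is the work MLflow's generated container takes care of.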
MLflow automates this process by building a Docker image from the MLflow Model on your behalf. It then pushes the image to Amazon Elastic Container Registry (ECR) and creates a SageMaker endpoint that uses this image. It also uploads the model artifact to an S3 bucket and configures the endpoint to download the model from there.
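The pushed image lives in your account's private ECR registry, whose image URIs follow a fixed `<registry>/<repository>:<tag>` pattern. The hypothetical helper below assembles such a URI. The account ID, the repository name `mlflow-pyfunc`, and the tag are placeholder values assumed for illustration; check what your own push actually produced (for example, in the ECR console) before relying on a specific name.

```python
def ecr_image_uri(account_id: str, region: str, repository: str, tag: str) -> str:
    """Build an image URI in ECR's <registry>/<repository>:<tag> form (hypothetical helper)."""
    # Each account has one private registry per region, addressed by this host name.
    # Only standard commercial regions are handled; some partitions use other domains.
    if not (len(account_id) == 12 and account_id.isdigit()):
        raise ValueError("an AWS account ID is a 12-digit string")
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    return f"{registry}/{repository}:{tag}"


# Placeholder account, repository, and tag values.
print(ecr_image_uri("123456789012", "us-west-2", "mlflow-pyfunc", "2.9.2"))
# 123456789012.dkr.ecr.us-west-2.amazonaws.com/mlflow-pyfunc:2.9.2
```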
The container provides the same REST endpoints as a local inference server. For instance, the /invocations endpoint accepts CSV and JSON input data and returns prediction results. For more details on the endpoints, refer to Local Inference Server.
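As an illustration of the two input formats, the sketch below builds (but does not send) a POST request for an `/invocations` endpoint. It assumes MLflow's `dataframe_split` JSON layout and a CSV body whose first row holds the column names. The helper name `invocation_request` is hypothetical, and the URL is a placeholder for a local server, assumed here to listen on port 5000.

```python
import csv
import io
import json
import urllib.request


def invocation_request(url, columns, rows, fmt="json"):
    """Build a POST request for an /invocations endpoint in JSON or CSV form."""
    if fmt == "json":
        # "dataframe_split" sends column names once, followed by row-wise data.
        body = json.dumps({"dataframe_split": {"columns": columns, "data": rows}})
        content_type = "application/json"
    elif fmt == "csv":
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerow(columns)  # header row carries the column names
        writer.writerows(rows)
        body = buf.getvalue()
        content_type = "text/csv"
    else:
        raise ValueError(f"unsupported format: {fmt}")
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": content_type},
        method="POST",
    )


req = invocation_request("http://127.0.0.1:5000/invocations", ["x", "y"], [[1, 2], [3, 4]])
# urllib.request.urlopen(req) would send it to a running local inference server.
```

A deployed SageMaker endpoint is not called through a plain URL like this; there you would send the same body and content type through the SageMaker runtime API, which handles request signing.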
Deploying Model to SageMaker Endpoint
This section outlines the process of deploying a model to SageMaker using the MLflow CLI. For Python API references and tutorials, see the Useful links section.
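If you prefer to drive the CLI from a Python script, one minimal approach is to assemble the argument list for `mlflow deployments create` and hand it to `subprocess`. The helper name `sagemaker_deploy_command` is hypothetical. `--target`, `--name`, `--model-uri`, and repeated `-C key=value` pairs are options of that command, while the config keys shown (`region_name`, `instance_type`, `instance_count`) are assumed SageMaker target settings that you should confirm against the documentation for your MLflow version.

```python
import shlex


def sagemaker_deploy_command(name, model_uri, config):
    """Assemble an `mlflow deployments create` command for the SageMaker target."""
    cmd = [
        "mlflow", "deployments", "create",
        "--target", "sagemaker",
        "--name", name,
        "--model-uri", model_uri,
    ]
    for key, value in config.items():
        # Each target-specific setting is passed as its own -C key=value pair.
        cmd += ["-C", f"{key}={value}"]
    return cmd


if __name__ == "__main__":
    cmd = sagemaker_deploy_command(
        "my-endpoint",              # placeholder deployment name
        "runs:/<run_id>/model",     # placeholder model URI
        {"region_name": "us-west-2", "instance_type": "ml.m5.xlarge", "instance_count": 1},
    )
    print(shlex.join(cmd))  # review the command before running it
```

Printing the command first makes it easy to review; `subprocess.run(cmd, check=True)` would then execute it, provided MLflow and AWS credentials are already set up.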
Step 0: Preparation
Install Tools
Install the following tools if you have not already done so: