Model Signatures and Input Examples
Model signatures and input examples are foundational components that define how your models should be used, ensuring consistent and reliable interactions across MLflow's ecosystem.
What Are Model Signatures and Input Examples?
Model Signature - Defines the expected format for model inputs, outputs, and parameters. Think of it as a contract that specifies exactly what data your model expects and what it will return.
Model Input Example - Provides a concrete example of valid model input. This helps developers understand the required data format and validates that your model works correctly.
Why They Matter
Model signatures and input examples provide crucial benefits:
- Consistency: Ensure all model interactions follow the same data format
- Validation: Catch data format errors before they reach your model
- Documentation: Serve as living documentation for model usage
- Deployment Safety: Enable MLflow deployment tools to validate requests automatically
- UI Integration: Allow MLflow UI to display clear model requirements
Model signatures are REQUIRED for registering models in Databricks Unity Catalog. Unity Catalog enforces concrete type definitions for all registered models and will reject models without proper signatures. Always include a signature when logging models that you plan to register in Databricks environments.
# ✅ Required for Databricks registration
mlflow.sklearn.log_model(
    model,
    name="my_model",
    input_example=X_sample,  # Generates required signature
    signature=signature,  # Or provide explicit signature
)

# ❌ Will fail in Databricks Unity Catalog
mlflow.sklearn.log_model(model, name="my_model")  # No signature
Quick Start: Adding Signatures to Your Models
The easiest way to add a signature is to provide an input example when logging your model:
import mlflow
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
# Load data and train model
iris = load_iris(as_frame=True)
X = iris.data
y = iris.target
model = RandomForestClassifier().fit(X, y)
with mlflow.start_run():
    # The input example automatically generates a signature
    mlflow.sklearn.log_model(
        model, name="iris_model", input_example=X.iloc[[0]]  # First row as example
    )
MLflow automatically:
- Infers the signature from your input example
- Validates the model works with the example
- Stores both signature and example with your model
MLflow automatically generates model signatures when you provide an input_example during model logging. This works for all model flavors and is the recommended approach for most use cases.
Understanding Model Signatures
Model signatures consist of three components:
- Inputs Schema
- Outputs Schema
- Parameters Schema
Inputs Schema - Defines the structure and types of data your model expects:
import numpy as np
from mlflow.types import ColSpec, Schema, TensorSpec

# Column-based signature (DataFrames)
input_schema = Schema(
    [
        ColSpec("double", "sepal_length"),
        ColSpec("double", "sepal_width"),
        ColSpec("string", "species", required=False),  # Optional field
    ]
)

# Tensor-based signature (NumPy arrays)
input_schema = Schema(
    [TensorSpec(np.dtype(np.float32), (-1, 28, 28, 1))]  # Batch of 28x28 images
)
Key Features: Support for both tabular (DataFrame) and tensor (NumPy) data, optional fields using required=False, and rich data type support including arrays and objects.
Outputs Schema - Specifies what your model returns:
# Single prediction column
output_schema = Schema([ColSpec("long", "prediction")])

# Multiple outputs
output_schema = Schema(
    [
        ColSpec("double", "probability"),
        ColSpec("string", "predicted_class"),
        ColSpec("long", "confidence_score"),
    ]
)

# Tensor output
output_schema = Schema(
    [TensorSpec(np.dtype(np.float32), (-1, 10))]  # 10-class probabilities
)
Parameters Schema - Defines optional inference parameters (like temperature, max_length):
from mlflow.models import ModelSignature
from mlflow.types import ParamSchema, ParamSpec

# Define inference parameters
params_schema = ParamSchema(
    [
        ParamSpec("temperature", "double", 0.7),  # Default temperature
        ParamSpec("max_tokens", "long", 100),  # Default max tokens
        ParamSpec("stop_words", "string", [".", "!"], (-1,)),  # List parameter
    ]
)

# Use in model signature
signature = ModelSignature(
    inputs=input_schema, outputs=output_schema, params=params_schema
)
Common Parameters: temperature controls randomness in generation, max_length/max_tokens limits output length, top_k and top_p control sampling strategies, and repetition_penalty reduces repetitive outputs.
Signature Types Overview
MLflow supports two primary signature types:
Column-Based Signatures - For tabular data (DataFrames, dictionaries):
# Perfect for traditional ML models
{"feature_1": 1.5, "feature_2": "category_a", "feature_3": [1, 2, 3]}
Tensor-Based Signatures - For array data (images, audio, embeddings):
# Perfect for deep learning models
np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [1, 2, 3]]]) # Shape: (2, 2, 3)
Type Hints for Model Signatures
Type hint support was introduced in MLflow 2.20.0. If you are using an earlier version of MLflow, see the Working with Signatures section.
You can use Python type hints to automatically define model signatures and enable data validation. This provides a more Pythonic way to specify your model's interface while getting automatic validation and schema inference.
- Overview & Benefits
- Supported Type Hints
- Pydantic Models
- Validation & Conversion
- Special Type Hints
- Serving & Deployment
Quick Start with Type Hints
import mlflow
from typing import List, Dict, Optional
import pydantic


class Message(pydantic.BaseModel):
    role: str
    content: str
    metadata: Optional[Dict[str, str]] = None


class CustomModel(mlflow.pyfunc.PythonModel):
    def predict(self, model_input: List[Message]) -> List[str]:
        # Signature automatically inferred from type hints!
        return [msg.content for msg in model_input]


# Log model - signature is auto-generated from type hints
with mlflow.start_run():
    mlflow.pyfunc.log_model(
        name="chat_model",
        python_model=CustomModel(),
        input_example=[
            {"role": "user", "content": "Hello"}
        ],  # Validates against type hints
    )
Key Benefits
- Automatic Validation: Input data validated against type hints at runtime
- Schema Inference: Model signature automatically generated from type annotations
- Type Safety: Catch type mismatches before they reach your model
- IDE Support: Better autocomplete and error detection during development
- Documentation: Type hints serve as self-documenting code
- Consistency: Same validation for PythonModel instances and loaded PyFunc models
When to Use Type Hints
✅ Recommended for: Complex data structures (chat messages, tool definitions, nested objects), models requiring strict input validation, teams using modern Python development practices, and GenAI and LLM applications with structured inputs.
⚠️ Consider alternatives for: Simple tabular data (DataFrames work fine with input examples), legacy codebases without type hint adoption, and models with highly dynamic input structures.
Input Type Requirements
Input signatures must be List[...] since PythonModel expects batch data:
# ✅ Correct - Always use List wrapper
def predict(self, model_input: List[str]) -> List[str]:
    ...


def predict(self, model_input: List[Message]) -> List[Dict]:
    ...


# ❌ Incorrect - Missing List wrapper
def predict(self, model_input: str) -> str:
    ...


def predict(self, model_input: Message) -> Dict:
    ...
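The batch rule above can be checked mechanically. The helper below is a hypothetical illustration that uses only the standard library; it is not MLflow's implementation, and the name `has_batch_input_hint` is invented here. It mirrors the rule that the annotation on `model_input` must be a `List[...]`.

```python
from typing import List, get_origin, get_type_hints


def has_batch_input_hint(predict_fn) -> bool:
    """Return True when the `model_input` annotation is a List[...] type."""
    hints = get_type_hints(predict_fn)
    # get_origin(List[str]) is `list`; for a bare `str` it is None.
    return get_origin(hints.get("model_input")) is list


def predict_batch(model_input: List[str]) -> List[str]:
    return model_input


def predict_single(model_input: str) -> str:
    return model_input


print(has_batch_input_hint(predict_batch))
print(has_batch_input_hint(predict_single))
```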
Primitive Types
List[str] # String inputs
List[int] # Integer inputs
List[float] # Float inputs
List[bool] # Boolean inputs
List[bytes] # Binary data
List[datetime.datetime] # Timestamps
Collection Types
List[List[str]] # Nested lists
List[Dict[str, int]] # Dictionaries
List[Dict[str, List[str]]] # Complex nested structures
Union and Optional Types
List[Union[int, str]] # Multiple possible types (becomes AnyType)
List[Optional[str]] # Optional fields (in Pydantic models only)
List[Any] # Any type (no validation)
Pydantic Models (Recommended)
class UserData(pydantic.BaseModel):
    name: str
    age: int
    email: Optional[str] = None  # Optional with default
    preferences: List[str] = []  # List with default


List[UserData]  # Clean, validated structure
Type Hint to Schema Mapping
| Type Hint | Generated Schema |
|---|---|
| List[str] | Schema([ColSpec(type=DataType.string)]) |
| List[List[str]] | Schema([ColSpec(type=Array(DataType.string))]) |
| List[Dict[str, str]] | Schema([ColSpec(type=Map(DataType.string))]) |
| List[Union[int, str]] | Schema([ColSpec(type=AnyType())]) |
| List[Message] | Schema([ColSpec(type=Object(...))]) |
Basic Pydantic Usage
import mlflow
import pydantic
from typing import Optional, List, Dict


class Message(pydantic.BaseModel):
    role: str
    content: str
    timestamp: Optional[str] = None


class CustomModel(mlflow.pyfunc.PythonModel):
    def predict(self, model_input: List[Message]) -> List[str]:
        return [f"{msg.role}: {msg.content}" for msg in model_input]


model = CustomModel()

# Both work - automatic conversion
model.predict([Message(role="user", content="Hi")])  # Pydantic object
model.predict([{"role": "user", "content": "Hi"}])  # Dict (auto-converted)
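Conceptually, the dict form is accepted because each dict can be validated into the declared Pydantic model. The sketch below shows that conversion with Pydantic alone, assuming Pydantic is installed; it illustrates the idea and makes no claim about how MLflow performs the conversion internally.

```python
from typing import Optional

import pydantic


class Message(pydantic.BaseModel):
    role: str
    content: str
    timestamp: Optional[str] = None


# A plain dict with the right fields validates into a Message instance.
payload = {"role": "user", "content": "Hi"}
message = Message(**payload)
print(message.role, message.content, message.timestamp)

# A dict missing a required field is rejected with a ValidationError.
try:
    Message(**{"role": "user"})
    rejected = False
except pydantic.ValidationError as exc:
    rejected = True
    print(f"rejected with {len(exc.errors())} error(s)")
```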