# Model Providers
MLflow AI Gateway supports 100+ model providers through the LiteLLM integration. This page covers the major providers, their capabilities, and how to use their passthrough APIs.
## Supported Providers
The AI Gateway supports providers across these categories:
### Major Cloud Providers
| Provider | Chat | Embeddings | Passthrough API |
|---|---|---|---|
| OpenAI | Yes | Yes | `/gateway/openai/v1/...` |
| Anthropic | Yes | No | `/gateway/anthropic/v1/...` |
| Google Gemini | Yes | Yes | `/gateway/gemini/v1beta/...` |
| Azure OpenAI | Yes | Yes | Via OpenAI passthrough |
| AWS Bedrock | Yes | Yes | - |
| Vertex AI | Yes | Yes | - |
### Additional Providers
| Provider | Chat | Embeddings | Notes |
|---|---|---|---|
| Cohere | Yes | Yes | Command and Embed models |
| Mistral | Yes | Yes | Mistral AI models |
| Groq | Yes | No | Open-source models |
| Together AI | Yes | Yes | Open-source models |
| Fireworks AI | Yes | Yes | Open-source models |
| Ollama | Yes | Yes | Local models |
| Databricks | Yes | Yes | Foundation Model APIs |
For a complete list of supported providers, view the provider dropdown when creating an endpoint or see the LiteLLM documentation.
## Provider-Specific Passthrough APIs
### OpenAI
The OpenAI passthrough exposes the full OpenAI API:
**Base URL:** `http://localhost:5000/gateway/openai/v1`

**Supported Endpoints:**

- `POST /chat/completions` - Chat completions
- `POST /embeddings` - Text embeddings
- `POST /responses` - Responses API (multi-turn conversations)
**Python SDK:**

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5000/gateway/openai/v1",
    api_key="dummy",  # Not needed, configured server-side
)

# Chat completion
response = client.chat.completions.create(
    model="my-endpoint",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Embeddings
embeddings = client.embeddings.create(
    model="my-embeddings-endpoint",
    input="Text to embed",
)

# Responses API
response = client.responses.create(
    model="my-endpoint",
    input="Hello!",
)
```
**cURL:**

```bash
# Chat completion
curl -X POST http://localhost:5000/gateway/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "my-endpoint", "messages": [{"role": "user", "content": "Hello!"}]}'

# Embeddings
curl -X POST http://localhost:5000/gateway/openai/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "my-embeddings-endpoint", "input": "Text to embed"}'

# Responses API
curl -X POST http://localhost:5000/gateway/openai/v1/responses \
  -H "Content-Type: application/json" \
  -d '{"model": "my-endpoint", "input": "Hello!"}'
```
See OpenAI API Reference for complete documentation.
### Anthropic
Access Claude models through the Anthropic passthrough:
**Base URL:** `http://localhost:5000/gateway/anthropic`

**Supported Endpoints:**

- `POST /v1/messages` - Messages API
**Python SDK:**

```python
import anthropic

client = anthropic.Anthropic(
    base_url="http://localhost:5000/gateway/anthropic",
    api_key="dummy",  # Not needed, configured server-side
)

response = client.messages.create(
    model="my-endpoint",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.content[0].text)
```
**cURL:**

```bash
curl -X POST http://localhost:5000/gateway/anthropic/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-endpoint",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
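Streaming uses the same `/v1/messages` endpoint with `stream: true` in the request body, which the Anthropic SDK handles through its streaming helper. Assuming the gateway forwards streamed responses, a minimal sketch:

```python
import anthropic

client = anthropic.Anthropic(
    base_url="http://localhost:5000/gateway/anthropic",
    api_key="dummy",
)

# The SDK sets stream=true on /v1/messages and yields text deltas
with client.messages.stream(
    model="my-endpoint",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="")
```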
See Anthropic API Reference for complete documentation.
### Google Gemini
Access Gemini models through Google's API format:
**Base URL:** `http://localhost:5000/gateway/gemini`

**Supported Endpoints:**

- `POST /v1beta/models/{model}:generateContent` - Content generation
- `POST /v1beta/models/{model}:streamGenerateContent` - Streaming generation
**Python SDK:**

```python
from google import genai

client = genai.Client(
    api_key="dummy",  # Not needed, configured server-side
    http_options={
        "base_url": "http://localhost:5000/gateway/gemini",
    },
)

response = client.models.generate_content(
    model="my-endpoint",
    contents={"text": "Hello!"},
)
print(response.candidates[0].content.parts[0].text)
client.close()
```
**cURL:**

```bash
curl -X POST http://localhost:5000/gateway/gemini/v1beta/models/my-endpoint:generateContent \
  -H "Content-Type: application/json" \
  -d '{"contents": [{"parts": [{"text": "Hello!"}]}]}'
```
See Google Gemini API Reference for complete documentation.
### Azure OpenAI
Azure OpenAI uses the same passthrough as OpenAI with additional configuration:
**Base URL:** `http://localhost:5000/gateway/openai/v1`
When creating an Azure OpenAI endpoint:
- Select Azure OpenAI as the provider
- Enter your Azure endpoint URL
- Enter your Azure API key
- Specify your deployment name
Once the endpoint is configured, call it through the OpenAI passthrough as usual:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5000/gateway/openai/v1",
    api_key="dummy",
)

response = client.chat.completions.create(
    model="my-azure-endpoint",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
### Databricks Foundation Models
Databricks Foundation Model APIs are OpenAI-compatible:

**Base URL:** `http://localhost:5000/gateway/openai/v1`
When creating a Databricks endpoint:
- Select Databricks as the provider
- Enter your Databricks workspace URL
- Enter your Databricks personal access token
- Specify the model endpoint name
Once the endpoint is configured, call it through the OpenAI passthrough:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5000/gateway/openai/v1",
    api_key="dummy",
)

response = client.chat.completions.create(
    model="my-databricks-endpoint",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
## Model Capabilities
When creating endpoints, the model selector shows capability badges:
| Badge | Description |
|---|---|
| Tools | Model supports function/tool calling |
| Reasoning | Model has enhanced reasoning capabilities |
| Caching | Model supports prompt caching for efficiency |
| Vision | Model can process images |
Additional information displayed:
- Context window: Maximum tokens the model can process
- Token costs: Input and output pricing per million tokens
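For example, an endpoint backed by a model with the Tools badge can receive function definitions in the standard OpenAI format through the OpenAI passthrough. A minimal, hypothetical sketch (the `get_weather` tool schema is illustrative, and whether tool calls are produced depends on the underlying model):

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5000/gateway/openai/v1",
    api_key="dummy",
)

# A hypothetical tool definition in the standard OpenAI format
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="my-endpoint",
    messages=[{"role": "user", "content": "What is the weather in Paris?"}],
    tools=tools,
)

# If the model chose to call the tool, the call details are returned here
print(response.choices[0].message.tool_calls)
```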