Model Providers

MLflow AI Gateway supports 100+ model providers through the LiteLLM integration. This page covers the major providers, their capabilities, and how to use their passthrough APIs.

Supported Providers

The AI Gateway supports providers across these categories:

Major Cloud Providers

| Provider | Chat | Embeddings | Passthrough API |
| --- | --- | --- | --- |
| OpenAI | Yes | Yes | `/gateway/openai/v1/...` |
| Anthropic | Yes | No | `/gateway/anthropic/v1/...` |
| Google Gemini | Yes | Yes | `/gateway/gemini/v1beta/...` |
| Azure OpenAI | Yes | Yes | Via OpenAI passthrough |
| AWS Bedrock | Yes | Yes | - |
| Vertex AI | Yes | Yes | - |

Additional Providers

| Provider | Chat | Embeddings | Notes |
| --- | --- | --- | --- |
| Cohere | Yes | Yes | Command and Embed models |
| Mistral | Yes | Yes | Mistral AI models |
| Groq | Yes | No | Open-source models |
| Together AI | Yes | Yes | Open-source models |
| Fireworks AI | Yes | Yes | Open-source models |
| Ollama | Yes | Yes | Local models |
| Databricks | Yes | Yes | Foundation Model APIs |

For a complete list of supported providers, view the provider dropdown when creating an endpoint or see the LiteLLM documentation.

Provider-Specific Passthrough APIs

OpenAI

The OpenAI passthrough exposes the OpenAI API in its native format:

Base URL: `http://localhost:5000/gateway/openai/v1`

Supported Endpoints:

  • `POST /chat/completions` - Chat completions
  • `POST /embeddings` - Text embeddings
  • `POST /responses` - Responses API (multi-turn conversations)
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5000/gateway/openai/v1",
    api_key="dummy",  # Not needed; credentials are configured server-side
)

# Chat completion
response = client.chat.completions.create(
    model="my-endpoint",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Embeddings
embeddings = client.embeddings.create(
    model="my-embeddings-endpoint",
    input="Text to embed",
)

# Responses API
response = client.responses.create(
    model="my-endpoint",
    input="Hello!",
)
```
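
Because the passthrough mirrors the OpenAI API shape, streaming should work with the standard `stream=True` flag. A minimal sketch, assuming the gateway relays the upstream server-sent events unchanged:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5000/gateway/openai/v1",
    api_key="dummy",
)

# Stream tokens as they arrive (assumes the gateway forwards SSE chunks as-is)
stream = client.chat.completions.create(
    model="my-endpoint",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content if chunk.choices else None
    if delta:
        print(delta, end="", flush=True)
```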

See OpenAI API Reference for complete documentation.

Anthropic

Access Claude models through the Anthropic passthrough:

Base URL: `http://localhost:5000/gateway/anthropic`

Supported Endpoints:

  • `POST /v1/messages` - Messages API
```python
import anthropic

client = anthropic.Anthropic(
    base_url="http://localhost:5000/gateway/anthropic",
    api_key="dummy",  # Not needed; credentials are configured server-side
)

response = client.messages.create(
    model="my-endpoint",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.content[0].text)
```
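
The Anthropic SDK's streaming helper should also work through the passthrough. A minimal sketch, assuming the gateway relays the Messages API's SSE stream unchanged:

```python
import anthropic

client = anthropic.Anthropic(
    base_url="http://localhost:5000/gateway/anthropic",
    api_key="dummy",
)

# Stream text as it is generated (assumes SSE events pass through unchanged)
with client.messages.stream(
    model="my-endpoint",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```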

See Anthropic API Reference for complete documentation.

Google Gemini

Access Gemini models through Google's API format:

Base URL: `http://localhost:5000/gateway/gemini`

Supported Endpoints:

  • `POST /v1beta/models/{model}:generateContent` - Content generation
  • `POST /v1beta/models/{model}:streamGenerateContent` - Streaming generation
```python
from google import genai

client = genai.Client(
    api_key="dummy",  # Not needed; credentials are configured server-side
    http_options={
        "base_url": "http://localhost:5000/gateway/gemini",
    },
)

response = client.models.generate_content(
    model="my-endpoint",
    contents={"text": "Hello!"},
)
client.close()
print(response.candidates[0].content.parts[0].text)
```
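
For the `streamGenerateContent` endpoint, the SDK's streaming method should behave the same way. A minimal sketch, assuming the gateway forwards the streamed chunks unchanged:

```python
from google import genai

client = genai.Client(
    api_key="dummy",
    http_options={"base_url": "http://localhost:5000/gateway/gemini"},
)

# Iterate over streamed response chunks (assumes the gateway relays them as-is)
for chunk in client.models.generate_content_stream(
    model="my-endpoint",
    contents={"text": "Hello!"},
):
    if chunk.text:
        print(chunk.text, end="", flush=True)
client.close()
```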

See Google Gemini API Reference for complete documentation.

Azure OpenAI

Azure OpenAI uses the same passthrough as OpenAI with additional configuration:

Base URL: `http://localhost:5000/gateway/openai/v1`

When creating an Azure OpenAI endpoint:

  1. Select Azure OpenAI as the provider
  2. Enter your Azure endpoint URL
  3. Enter your Azure API key
  4. Specify your deployment name
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5000/gateway/openai/v1",
    api_key="dummy",
)

response = client.chat.completions.create(
    model="my-azure-endpoint",
    messages=[{"role": "user", "content": "Hello!"}],
)
```

Databricks Foundation Models

Databricks Foundation Model APIs are OpenAI-compatible:

Base URL: `http://localhost:5000/gateway/openai/v1`

When creating a Databricks endpoint:

  1. Select Databricks as the provider
  2. Enter your Databricks workspace URL
  3. Enter your Databricks personal access token
  4. Specify the model endpoint name
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5000/gateway/openai/v1",
    api_key="dummy",
)

response = client.chat.completions.create(
    model="my-databricks-endpoint",
    messages=[{"role": "user", "content": "Hello!"}],
)
```

Model Capabilities

When creating endpoints, the model selector shows capability badges:

| Badge | Description |
| --- | --- |
| Tools | Model supports function/tool calling |
| Reasoning | Model has enhanced reasoning capabilities |
| Caching | Model supports prompt caching for efficiency |
| Vision | Model can process images |

Additional information displayed:

  • Context window: Maximum tokens the model can process
  • Token costs: Input and output pricing per million tokens