AI Gateway Configuration

Configure providers, endpoints, and advanced settings for your MLflow AI Gateway.

Provider Configurations

Configure endpoints for different LLM providers using these YAML examples:

yaml
endpoints:
  - name: gpt4-chat
    endpoint_type: llm/v1/chat
    model:
      provider: openai
      name: gpt-4
      config:
        openai_api_key: $OPENAI_API_KEY
        openai_api_base: https://api.openai.com/v1  # Optional
        openai_organization: your_org_id  # Optional
note

The MosaicML, PaLM, and Cohere providers are deprecated and will be removed in a future MLflow version.
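
With an endpoint defined, you can smoke-test it from Python using the MLflow deployments client. This is a minimal sketch, assuming the gateway is running locally on port 5000 and serves the gpt4-chat endpoint from the example above:

python
from mlflow.deployments import get_deploy_client

# Assumes the gateway is served at http://localhost:5000
client = get_deploy_client("http://localhost:5000")

response = client.predict(
    endpoint="gpt4-chat",
    inputs={"messages": [{"role": "user", "content": "Hello, gateway!"}]},
)
print(response)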

Environment Variables

Store API keys as environment variables for security:

bash
# OpenAI
export OPENAI_API_KEY=sk-...

# Azure OpenAI
export AZURE_OPENAI_API_KEY=your-azure-key
export AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/

# Anthropic
export ANTHROPIC_API_KEY=sk-ant-...

# AWS Bedrock
export AWS_ACCESS_KEY_ID=AKIA...
export AWS_SECRET_ACCESS_KEY=...
export AWS_REGION=us-east-1

# Cohere
export COHERE_API_KEY=...
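
Since a missing variable only surfaces as a provider error at request time, it can be worth failing fast at startup. The sketch below is a hypothetical pre-flight check; adjust the list to the variables your endpoints actually use:

python
import os

# Hypothetical list -- include only the variables your config references
required = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY"]

missing = [name for name in required if not os.environ.get(name)]
if missing:
    raise SystemExit(f"Missing environment variables: {', '.join(missing)}")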

Advanced Configuration

Rate Limiting

Configure rate limits per endpoint:

yaml
endpoints:
  - name: rate-limited-chat
    endpoint_type: llm/v1/chat
    model:
      provider: openai
      name: gpt-3.5-turbo
      config:
        openai_api_key: $OPENAI_API_KEY
    limit:
      renewal_period: minute
      calls: 100  # max calls per renewal period
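
Requests beyond the configured limit are rejected until the renewal period rolls over, so clients should be prepared to back off. A minimal sketch, assuming a local gateway and the endpoint above:

python
from mlflow.deployments import get_deploy_client

client = get_deploy_client("http://localhost:5000")  # assumed local gateway

try:
    client.predict(
        endpoint="rate-limited-chat",
        inputs={"messages": [{"role": "user", "content": "ping"}]},
    )
except Exception as exc:
    # Calls past the 100/minute budget are rejected; wait and retry
    print(f"Request rejected: {exc}")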

Model Parameters

Set default model parameters:

yaml
endpoints:
  - name: configured-chat
    endpoint_type: llm/v1/chat
    model:
      provider: openai
      name: gpt-3.5-turbo
      config:
        openai_api_key: $OPENAI_API_KEY
        temperature: 0.7
        max_tokens: 1000
        top_p: 0.9
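
Defaults apply when a request does not set its own values; chat requests can still pass parameters per call, which would take precedence. A sketch under that assumption, using a local gateway:

python
from mlflow.deployments import get_deploy_client

client = get_deploy_client("http://localhost:5000")  # assumed local gateway

# Request-level parameters (temperature, max_tokens) override the defaults
response = client.predict(
    endpoint="configured-chat",
    inputs={
        "messages": [{"role": "user", "content": "Summarize MLflow in one line."}],
        "temperature": 0.2,
        "max_tokens": 64,
    },
)
print(response)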

Multiple Endpoints

Configure multiple endpoints for different use cases:

yaml
endpoints:
  # Fast, cost-effective endpoint
  - name: fast-chat
    endpoint_type: llm/v1/chat
    model:
      provider: openai
      name: gpt-3.5-turbo
      config:
        openai_api_key: $OPENAI_API_KEY

  # High-quality endpoint
  - name: quality-chat
    endpoint_type: llm/v1/chat
    model:
      provider: openai
      name: gpt-4
      config:
        openai_api_key: $OPENAI_API_KEY

  # Embeddings endpoint
  - name: embeddings
    endpoint_type: llm/v1/embeddings
    model:
      provider: openai
      name: text-embedding-ada-002
      config:
        openai_api_key: $OPENAI_API_KEY
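
Clients address each endpoint by name, so one gateway URL can serve every use case. A sketch, assuming a local gateway; note that the embeddings payload uses an input field rather than messages:

python
from mlflow.deployments import get_deploy_client

client = get_deploy_client("http://localhost:5000")  # assumed local gateway

# Cheap, fast endpoint for routine traffic
quick = client.predict(
    endpoint="fast-chat",
    inputs={"messages": [{"role": "user", "content": "Write a haiku."}]},
)

# Embeddings endpoint for vector search
vectors = client.predict(
    endpoint="embeddings",
    inputs={"input": ["MLflow AI Gateway", "traffic splitting"]},
)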

Traffic Routes

Add a routes section to the configuration to split incoming traffic across multiple endpoints:

yaml
endpoints:
  - name: chat1
    endpoint_type: llm/v1/chat
    model:
      provider: openai
      name: gpt-5
      config:
        openai_api_key: $OPENAI_API_KEY

  - name: chat2
    endpoint_type: llm/v1/chat
    model:
      provider: openai
      name: gpt-4.1
      config:
        openai_api_key: $OPENAI_API_KEY

routes:
  - name: chat-route
    task_type: llm/v1/chat
    destinations:
      - name: chat1
        traffic_percentage: 80
      - name: chat2
        traffic_percentage: 20
    routing_strategy: TRAFFIC_SPLIT

Currently, MLflow supports only the TRAFFIC_SPLIT strategy, which randomly routes each incoming request to a destination based on the configured percentages.
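
Conceptually, TRAFFIC_SPLIT is a weighted random choice over the route's destinations. The toy sketch below illustrates that behavior; it is an illustration, not MLflow's internal implementation:

python
import random

destinations = ["chat1", "chat2"]
weights = [80, 20]  # traffic_percentage values from the route above

# Each incoming request is independently assigned a destination
picks = [random.choices(destinations, weights=weights)[0] for _ in range(1000)]
print(picks.count("chat1") / len(picks))  # roughly 0.8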

Dynamic Configuration Updates

The AI Gateway supports hot-reloading of configurations without a server restart: update your config.yaml file, and the changes are detected and applied automatically.

Security Best Practices

API Key Management

  1. Never commit API keys to version control
  2. Use environment variables for all sensitive credentials
  3. Rotate keys regularly and update environment variables
  4. Use separate keys for development and production

Network Security

  1. Use HTTPS in production with proper TLS certificates
  2. Implement authentication and authorization layers
  3. Configure firewalls to restrict access to the gateway
  4. Monitor and log all gateway requests for audit trails

Configuration Security

yaml
# Secure configuration example
endpoints:
  - name: production-chat
    endpoint_type: llm/v1/chat
    model:
      provider: openai
      name: gpt-4
      config:
        openai_api_key: $OPENAI_API_KEY  # From environment
    limit:
      renewal_period: minute
      calls: 1000
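
To help enforce the "no keys in version control" rule, a pre-commit check can scan the config for values that look like literal secrets rather than $VAR references. A rough sketch; the sk- prefix test is only a heuristic:

python
import yaml  # PyYAML

def find_literal_secrets(path):
    """Flag config values that look like hard-coded API keys."""
    hits = []

    def walk(node, trail=""):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, f"{trail}.{key}" if trail else key)
        elif isinstance(node, list):
            for i, value in enumerate(node):
                walk(value, f"{trail}[{i}]")
        elif isinstance(node, str):
            # $VAR references are fine; literal key-like strings are not
            if not node.startswith("$") and node.startswith("sk-"):
                hits.append(trail)

    with open(path) as f:
        walk(yaml.safe_load(f))
    return hits

print(find_literal_secrets("config.yaml"))  # expect an empty list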

Next Steps

Now that your providers are configured, learn how to use and integrate your gateway: