Tracing Databricks AI Gateway

Databricks AI Gateway (formerly Mosaic AI Gateway) is a centralized Databricks service that brings governance, monitoring, and production readiness to generative AI models and their associated model serving endpoints.

Looking for Databricks Foundation Model APIs?

This guide covers tracing LLM calls through Databricks AI Gateway. If you're using Databricks Foundation Model APIs directly, see the Databricks Integration guide instead.

What is Databricks AI Gateway?

Databricks AI Gateway streamlines the usage and management of generative AI models and agents within an organization. Key capabilities include:

  • Governance: Control access with permissions and rate limiting
  • Monitoring: Track usage and costs with system tables, audit requests with payload logging
  • Safety: Configure AI guardrails to prevent harmful content and detect PII
  • Reliability: Minimize outages with fallbacks and load balance traffic across models

All data is logged into Delta tables in Unity Catalog.

Integration Options

There are two ways to trace LLM calls through Databricks AI Gateway:

  • Server-side (Inference Tables): Databricks automatically logs all requests to inference tables. Best for centralized tracing of all requests through the gateway.
  • Client-side Tracing: Use the OpenAI SDK with MLflow autolog. Best for combining LLM traces with your agent or application traces.
Note: With server-side tracing, all requests through the gateway are captured in inference tables, regardless of which client made them. For application-specific tracing where you want to combine LLM calls with your application logic, use client-side tracing.

Option 1: Server-side Tracing (Inference Tables)

Databricks AI Gateway can automatically log all requests and responses to inference tables in Unity Catalog. This provides centralized monitoring without any client-side code changes.

See the Databricks Inference Tables documentation for setup instructions.
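As a sketch of what this centralized log looks like, the query below reads recent gateway requests from an inference table inside a Databricks notebook. The table name and columns are assumptions; the actual location and schema depend on how the inference table was configured for your endpoint.

python
# Hypothetical example: inspect recent gateway traffic from an inference table.
# Assumes a Databricks notebook (where `spark` and `display` are predefined);
# the table name and columns are placeholders -- adjust to your configuration.
recent_requests = spark.sql(
    """
    SELECT request_time, status_code, request, response
    FROM main.default.gateway_payload
    ORDER BY request_time DESC
    LIMIT 20
    """
)
display(recent_requests)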

Option 2: Client-side Tracing

If you want to trace LLM calls as part of your application traces, you can use MLflow's automatic tracing with the OpenAI SDK or other supported SDKs.

Step 1: Install Dependencies

bash
pip install 'mlflow[genai]' openai

Step 2: Set tracking URI and experiment

To save traces to Databricks-managed MLflow, set the tracking URI to databricks and set the active experiment to the path of an experiment in your Databricks workspace (for example, /Shared/<experiment-name>).

python
import mlflow

mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Shared/<your-experiment-name>")  # Path to an experiment in your workspace

If you want to self-host MLflow outside Databricks, follow the Self-hosting guide to set up your MLflow server and point the tracking URI at it, as in the sketch below.
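For example, a minimal sketch for a self-hosted server (the URL and experiment name below are placeholders):

python
import mlflow

# Point MLflow at a self-hosted tracking server instead of Databricks.
mlflow.set_tracking_uri("http://localhost:5000")  # Replace with your server's URL
mlflow.set_experiment("databricks-ai-gateway-tracing")  # Any experiment name works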

Step 3: Enable Tracing and Make API Calls

Since Databricks AI Gateway exposes an OpenAI-compatible API, enable tracing with mlflow.openai.autolog() and configure the OpenAI client to use your Databricks serving endpoint.

python
import mlflow
from openai import OpenAI

# Enable auto-tracing for OpenAI
mlflow.openai.autolog()

# Point OpenAI client to Databricks AI Gateway
client = OpenAI(
    base_url="https://<databricks-workspace>/serving-endpoints",
    api_key="<DATABRICKS_TOKEN>",
)

# Make API calls - traces will be captured automatically
response = client.chat.completions.create(
    model="<your-endpoint-name>",  # Your Databricks serving endpoint name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
)
print(response.choices[0].message.content)
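Because autolog records each OpenAI call as a span, you can nest those calls under your own application logic with the mlflow.trace decorator. The sketch below reuses the client configured above; the answer_question function and its prompt are illustrative only, not part of the gateway API.

python
# Optional sketch: group LLM calls under an application-level span.
# `answer_question` is a hypothetical helper; `client` is the OpenAI client
# configured above to point at the Databricks AI Gateway.
@mlflow.trace
def answer_question(question: str) -> str:
    response = client.chat.completions.create(
        model="<your-endpoint-name>",
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

answer_question("What is the capital of France?")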
Step 4: View Traces

  • On Databricks: Navigate to the MLflow Experiments page in your workspace
  • Local MLflow: Open the MLflow UI at http://localhost:5000
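You can also retrieve traces programmatically. A minimal sketch using mlflow.search_traces, which returns recent traces from the active experiment as a DataFrame (the experiment path below is a placeholder):

python
import mlflow

mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Shared/<your-experiment-name>")

# Fetch the ten most recent traces from the active experiment.
traces = mlflow.search_traces(max_results=10)
print(traces.head())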

Next Steps