
Gemini CLI + MLflow AI Gateway

Route Gemini CLI through the MLflow AI Gateway to get centralized tracing and observability, while each developer authenticates with their own Google subscription.

Prerequisites

  • MLflow server running with a SQL backend, e.g. mlflow server --backend-store-uri sqlite:///mlflow.db --port 5000
  • Gemini CLI installed (npm install -g @google/gemini-cli)

Step 1: Create a Gemini Endpoint

Navigate to the AI Gateway tab at http://localhost:5000/#/gateway and click Create Endpoint.

  • Provider: Gemini
  • Model: gemini-3.0-flash (or your preferred model)
  • Endpoint name: choose a name, e.g. my-gemini-endpoint
  • Authentication: select API Key based authentication
  • LLM Connection: select an existing connection or create a new one (see Create an LLM Connection)
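If you manage the gateway from a configuration file rather than the UI, the endpoint might look roughly like the sketch below. The field names follow the MLflow AI Gateway YAML format, but the exact keys for the Gemini provider are an assumption; check the gateway docs for your MLflow version.

```yaml
endpoints:
  - name: my-gemini-endpoint
    endpoint_type: llm/v1/chat
    model:
      provider: gemini
      name: gemini-3.0-flash
      config:
        # Can be a dummy value; the gateway forwards the client's own key (see tip)
        gemini_api_key: $GEMINI_API_KEY
```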
Tip

The server-side API key in the LLM Connection can be set to a dummy value (e.g. dummy). The gateway detects Gemini CLI's User-Agent header and forwards the client's own GEMINI_API_KEY to the upstream provider instead.

Step 2: Configure Environment Variables

Set the following environment variables so Gemini CLI routes through the gateway and uses your endpoint:

```bash
export GOOGLE_GEMINI_BASE_URL="http://localhost:5000/gateway/gemini"
export GEMINI_MODEL="my-gemini-endpoint"  # your endpoint name from Step 1
export GEMINI_API_KEY="your-google-api-key"
```
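Equivalently, the variables can be set from a small launcher script; a minimal sketch (the helper name gateway_env is ours, not part of any tool):

```python
import os
import subprocess

def gateway_env(endpoint_name: str, api_key: str,
                base_url: str = "http://localhost:5000/gateway/gemini") -> dict:
    """Return a copy of the current environment with the gateway variables set."""
    env = dict(os.environ)
    env["GOOGLE_GEMINI_BASE_URL"] = base_url  # route requests through the gateway
    env["GEMINI_MODEL"] = endpoint_name       # endpoint name from Step 1
    env["GEMINI_API_KEY"] = api_key           # your own Google API key
    return env

# To launch Gemini CLI with the gateway configuration applied:
# subprocess.run(["gemini"], env=gateway_env("my-gemini-endpoint", "your-google-api-key"))
```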

Step 3: Run Gemini CLI

```bash
gemini
```

Gemini CLI authenticates with your own GEMINI_API_KEY, and every request is proxied through the gateway.

What You Get

Every conversation is captured as an MLflow trace. Open the Logs tab in the MLflow UI to inspect inputs, outputs, token usage, and latency for every request.

(Screenshot: Gemini CLI trace in MLflow)