Coding Agents & Long-Running Agents
The MLflow AI Gateway supports popular AI coding agents such as Claude Code, OpenAI Codex, and Gemini CLI, as well as long-running agent runtimes like Hermes Agent.
Coding agents and long-running agent runtimes can make dozens or hundreds of LLM calls per session, often running autonomously for extended periods while also invoking tools and managing their own state. Without visibility or controls in place, it's easy to lose track of what they're doing, how much they're spending, and whether they're operating within your organization's policies. Routing these agents through the gateway gives your team three key capabilities:
Observability
Every request is automatically captured as an MLflow trace, giving you full visibility into inputs, outputs, token counts, and latency, without any code changes.
Budget Control
Set spending limits globally or per workspace to prevent runaway costs. Configure alerts and hard limits to keep coding agent sessions within budget.
Guardrails
Enforce content policies on all coding agent requests. Block sensitive topics, redact PII, and ensure responses meet your organization's standards.
Get Started

Claude Code
Route Claude Code through the gateway using your Anthropic subscription

OpenAI Codex
Route Codex through the gateway using your OpenAI subscription
Gemini CLI
Route Gemini CLI through the gateway using your Google subscription
Hermes Agent
Route Hermes Agent through the gateway using Hermes's custom provider support