MLflow Agent Server
Agent Server Features
- Simple FastAPI server to host agents at the `/invocations` endpoint
- Decorator-based function registration (`@invoke`, `@stream`) for easy agent development
- Automatic request and response validation for Responses API schema agents
- Automatic MLflow tracing integration and aggregation
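For a quick sense of how these pieces fit together, here is a minimal, hypothetical single-file sketch (the file and function names are illustrative, the output item dict is assumed to follow the Responses API message schema, and calling `run()` without arguments is assumed to start a single-worker server); the Full Example below shows the recommended multi-file layout:

```python
# minimal_agent.py -- a hedged, single-file sketch of the Agent Server pattern.
from uuid import uuid4

from mlflow.genai.agent_server import AgentServer, invoke
from mlflow.types.responses import ResponsesAgentRequest, ResponsesAgentResponse


@invoke()
async def answer(request: ResponsesAgentRequest) -> ResponsesAgentResponse:
    # A real agent would call an LLM here; this sketch returns a fixed message.
    return ResponsesAgentResponse(
        output=[
            {
                "type": "message",
                "id": str(uuid4()),
                "status": "completed",
                "role": "assistant",
                "content": [{"type": "output_text", "text": "Hello from the agent"}],
            }
        ]
    )


# The server exposes the registered @invoke function at POST /invocations.
agent_server = AgentServer("ResponsesAgent")
app = agent_server.app

if __name__ == "__main__":
    agent_server.run()
```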
Full Example
In this example, we'll use the openai-agents-sdk to define our Responses API-compatible agent. See the openai-agents-sdk quickstart for more information.
- Install the openai-agents-sdk and mlflow, and set your OpenAI API key:

  ```bash
  pip install -U openai-agents 'mlflow>=3.6.0'
  export OPENAI_API_KEY=sk-...
  ```
- Define your agent in `agent.py` and create methods to annotate with `@invoke`:

  ```python
  from agents import Agent, Runner
  from mlflow.genai.agent_server import invoke, stream
  from mlflow.types.responses import ResponsesAgentRequest, ResponsesAgentResponse

  agent = Agent(
      name="Math Tutor",
      instructions="You provide help with math problems. Explain your reasoning and include examples",
  )


  @invoke()
  async def non_streaming(request: ResponsesAgentRequest) -> ResponsesAgentResponse:
      msgs = [i.model_dump() for i in request.input]
      result = await Runner.run(agent, msgs)
      return ResponsesAgentResponse(
          output=[item.to_input_item() for item in result.new_items]
      )


  # You can also optionally register a @stream function to support streaming
  # responses (see the sketch after these steps).
  ```
- Define a `start_server.py` file to start the `AgentServer`:

  ```python
  # Need to import the agent to register the functions with the server
  import agent  # noqa: F401
  from mlflow.genai.agent_server import (
      AgentServer,
      setup_mlflow_git_based_version_tracking,
  )

  agent_server = AgentServer("ResponsesAgent")
  app = agent_server.app

  # Optionally, set up MLflow git-based version tracking
  # to associate your agent's traces with a specific git commit
  setup_mlflow_git_based_version_tracking()


  def main():
      # To support multiple workers, pass the app as an import string
      agent_server.run(app_import_string="start_server:app")


  if __name__ == "__main__":
      main()
  ```
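As a rough illustration of the optional `@stream` registration mentioned in step 2 (not the SDK's canonical streaming example), the sketch below runs the agent to completion and emits one event per output item rather than token-level deltas. It assumes `@stream()` is used like `@invoke()` and that events follow `ResponsesAgentStreamEvent` from `mlflow.types.responses`; check both against your MLflow version.

```python
# Continuing in agent.py -- a hedged sketch of a streaming handler.
from mlflow.genai.agent_server import stream
from mlflow.types.responses import ResponsesAgentStreamEvent


@stream()
async def streaming(request: ResponsesAgentRequest):
    msgs = [i.model_dump() for i in request.input]
    result = await Runner.run(agent, msgs)
    # Emit each completed output item as a "response.output_item.done" event.
    for item in result.new_items:
        yield ResponsesAgentStreamEvent(
            type="response.output_item.done",
            item=item.to_input_item(),
        )
```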
Deploying and Testing Your Agent
Run your agent server with the `--reload` flag to automatically reload the server on code changes:

```bash
python3 start_server.py --reload
# Pass in a number of workers to support multiple concurrent requests
# python3 start_server.py --workers 4
# Pass in a port to run the server on
# python3 start_server.py --reload --port 8000
```
Send a request to the server to test your agent out:

```bash
curl -X POST http://localhost:8000/invocations \
  -H "Content-Type: application/json" \
  -d '{ "input": [{ "role": "user", "content": "What is the 14th Fibonacci number?"}]}'
```
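Equivalently, you can exercise the endpoint from Python. Here is a small sketch using the `requests` library (the file name is illustrative and it assumes the server above is listening on port 8000):

```python
# test_request.py -- hypothetical helper for calling the agent server from Python.
import requests

payload = {
    "input": [{"role": "user", "content": "What is the 14th Fibonacci number?"}]
}
resp = requests.post("http://localhost:8000/invocations", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())
```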
After testing your agent, you can view the traces in the MLflow UI by clicking on the "Traces" tab.
If you have registered a `@stream` function, you can send a streaming request to the server by passing in `"stream": true`:
```bash
curl -X POST http://localhost:8000/invocations \
  -H "Content-Type: application/json" \
  -d '{
    "input": [{ "role": "user", "content": "What is the 14th Fibonacci number?"}],
    "stream": true
  }'
```
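If you prefer to consume the stream from Python, a rough sketch follows. The exact wire format (e.g. server-sent `data:` lines vs. newline-delimited JSON) depends on your server version, so the parsing here is a best-effort guess that you may need to adjust:

```python
# stream_request.py -- hypothetical streaming client; event framing may differ.
import json

import requests

payload = {
    "input": [{"role": "user", "content": "What is the 14th Fibonacci number?"}],
    "stream": True,
}
with requests.post(
    "http://localhost:8000/invocations", json=payload, stream=True, timeout=60
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        text = line.decode("utf-8")
        # Strip an SSE-style "data: " prefix, if present, before parsing JSON.
        if text.startswith("data: "):
            text = text[len("data: "):]
        try:
            print(json.loads(text))
        except json.JSONDecodeError:
            print(text)
```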
Evaluating Your Agent
You can use `mlflow.genai.evaluate()` to evaluate your agent. See the Evaluating Agents guide and Scorer documentation for more information.
- Define a file like `eval_agent.py` to evaluate your agent:

  ```python
  import asyncio

  import mlflow

  # need to import agent for our @invoke-registered function to be found
  from agent import agent  # noqa: F401
  from mlflow.genai.agent_server import get_invoke_function
  from mlflow.genai.scorers import RelevanceToQuery, Safety
  from mlflow.types.responses import ResponsesAgentRequest, ResponsesAgentResponse

  eval_dataset = [
      {
          "inputs": {
              "request": {
                  "input": [
                      {"role": "user", "content": "What's the 15th Fibonacci number"}
                  ]
              }
          },
          "expected_response": "The 15th Fibonacci number is 610.",
      }
  ]


  def sync_invoke_fn(request: dict) -> ResponsesAgentResponse:
      # Get the invoke function that was registered via @invoke decorator in your agent
      invoke_fn = get_invoke_function()
      return asyncio.run(invoke_fn(ResponsesAgentRequest(**request)))


  mlflow.genai.evaluate(
      data=eval_dataset,
      predict_fn=sync_invoke_fn,
      scorers=[RelevanceToQuery(), Safety()],
  )
  ```
- Run the evaluation:

  ```bash
  python3 eval_agent.py
  ```

  You should see the evaluation results and MLflow run information in the console output. In the MLflow UI, you can find the resulting runs from the evaluation on the experiment page. Click the run name to view the aggregated metrics and metadata in the overview pane.
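Beyond the built-in scorers, the Scorer documentation referenced above also covers custom scorers. A minimal sketch of one is below; the `@scorer` decorator import and the exact parameter names it accepts are assumptions to verify against your MLflow version:

```python
# A hedged sketch of a custom scorer to use alongside RelevanceToQuery and Safety.
from mlflow.genai.scorers import scorer


@scorer
def mentions_a_number(outputs) -> bool:
    # Pass if the agent's output contains at least one digit.
    return any(ch.isdigit() for ch in str(outputs))


# Then include it in the evaluate call, e.g.:
# mlflow.genai.evaluate(..., scorers=[RelevanceToQuery(), Safety(), mentions_a_number])
```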