MLflow Agent Server

Agent Server Features

note

The MLflow Agent Server was released with MLflow 3.6.0. It is currently under active development and is marked as Experimental. Public APIs are subject to change, and new features are being added to enhance its functionality.

  • Simple FastAPI server to host agents at /invocations endpoint
  • Decorator-based function registration (@invoke, @stream) for easy agent development
  • Automatic request and response validation for Responses API schema agents
  • Automatic MLflow tracing integration and aggregation
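
At its core, hosting an agent takes two pieces: a handler function registered with @invoke (or @stream) and an AgentServer that serves it at /invocations. The sketch below shows that shape in a single file (a hypothetical minimal_agent.py returning a canned reply; the output item follows the OpenAI Responses API schema and may need adjusting for your setup). The full, runnable example with the openai-agents-sdk follows in the next section.

python
from mlflow.genai.agent_server import AgentServer, invoke
from mlflow.types.responses import ResponsesAgentRequest, ResponsesAgentResponse


@invoke()
async def respond(request: ResponsesAgentRequest) -> ResponsesAgentResponse:
    # Canned assistant reply in the Responses API output-item format
    return ResponsesAgentResponse(
        output=[
            {
                "type": "message",
                "id": "msg_1",
                "status": "completed",
                "role": "assistant",
                "content": [
                    {"type": "output_text", "text": "Hello from the agent server!", "annotations": []}
                ],
            }
        ]
    )


agent_server = AgentServer("ResponsesAgent")
app = agent_server.app

if __name__ == "__main__":
    # Single-process local run; see the full example below for a multi-worker setup
    agent_server.run(app_import_string="minimal_agent:app")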

Full Example

In this example, we'll use the openai-agents-sdk to define a Responses API-compatible agent. See the openai-agents-sdk quickstart for more information.

  1. Install the openai-agents-sdk and mlflow, and set your OpenAI API key:

    bash
    pip install -U openai-agents "mlflow>=3.6.0"
    export OPENAI_API_KEY=sk-...
  2. Define your agent in agent.py and register handler functions with the @invoke decorator:

    python
    from agents import Agent, Runner
    from mlflow.genai.agent_server import invoke, stream
    from mlflow.types.responses import ResponsesAgentRequest, ResponsesAgentResponse

    agent = Agent(
        name="Math Tutor",
        instructions="You provide help with math problems. Explain your reasoning and include examples",
    )


    @invoke()
    async def non_streaming(request: ResponsesAgentRequest) -> ResponsesAgentResponse:
        msgs = [i.model_dump() for i in request.input]
        result = await Runner.run(agent, msgs)
        return ResponsesAgentResponse(
            output=[item.to_input_item() for item in result.new_items]
        )


    # You can also optionally register a @stream function to support streaming responses
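
    # A minimal sketch of a streaming handler (assumptions: the @stream decorator
    # registers an async generator, and Runner.run_streamed / ResponsesAgentStreamEvent
    # fit your setup; adapt the event handling to your agent):
    from mlflow.types.responses import ResponsesAgentStreamEvent


    @stream()
    async def streaming(request: ResponsesAgentRequest):
        msgs = [i.model_dump() for i in request.input]
        result = Runner.run_streamed(agent, msgs)
        async for event in result.stream_events():
            # Forward each completed run item as a Responses API stream event
            if event.type == "run_item_stream_event":
                yield ResponsesAgentStreamEvent(
                    type="response.output_item.done",
                    item=event.item.to_input_item(),
                )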
  3. Define a start_server.py file to start the AgentServer:

    python
    # Need to import the agent to register the functions with the server
    import agent  # noqa: F401
    from mlflow.genai.agent_server import (
        AgentServer,
        setup_mlflow_git_based_version_tracking,
    )

    agent_server = AgentServer("ResponsesAgent")
    app = agent_server.app

    # Optionally, set up MLflow git-based version tracking
    # to associate your agent's traces with a specific git commit
    setup_mlflow_git_based_version_tracking()


    def main():
        # To support multiple workers, pass the app as an import string
        agent_server.run(app_import_string="start_server:app")


    if __name__ == "__main__":
        main()

Deploying and Testing Your Agent

Run your agent server with the --reload flag to automatically reload the server on code changes:

bash
python3 start_server.py --reload
# Pass in a number of workers to support multiple concurrent requests
# python3 start_server.py --workers 4
# Pass in a port to run the server on
# python3 start_server.py --reload --port 8000

Send a request to the server to test your agent out:

bash
curl -X POST http://localhost:8000/invocations \
  -H "Content-Type: application/json" \
  -d '{"input": [{"role": "user", "content": "What is the 14th Fibonacci number?"}]}'

After testing your agent, you can view its traces in the MLflow UI by clicking the "Traces" tab.

If you have registered a @stream function, you can send a streaming request to the server by passing "stream": true in the request body:

bash
curl -X POST http://localhost:8000/invocations \
  -H "Content-Type: application/json" \
  -d '{
    "input": [{"role": "user", "content": "What is the 14th Fibonacci number?"}],
    "stream": true
  }'

Evaluating Your Agent

You can use mlflow.genai.evaluate() to evaluate your agent. See the Evaluating Agents guide and Scorer documentation for more information; a sketch of a custom scorer follows the steps below.

  1. Define a file like eval_agent.py to evaluate your agent:

    python
    import asyncio

    import mlflow

    # need to import agent for our @invoke-registered function to be found
    from agent import agent # noqa: F401
    from mlflow.genai.agent_server import get_invoke_function
    from mlflow.genai.scorers import RelevanceToQuery, Safety
    from mlflow.types.responses import ResponsesAgentRequest, ResponsesAgentResponse

    eval_dataset = [
        {
            "inputs": {
                "request": {
                    "input": [
                        {"role": "user", "content": "What's the 15th Fibonacci number"}
                    ]
                }
            },
            "expected_response": "The 15th Fibonacci number is 610.",
        }
    ]


    def sync_invoke_fn(request: dict) -> ResponsesAgentResponse:
        # Get the invoke function that was registered via @invoke decorator in your agent
        invoke_fn = get_invoke_function()
        return asyncio.run(invoke_fn(ResponsesAgentRequest(**request)))


    mlflow.genai.evaluate(
        data=eval_dataset,
        predict_fn=sync_invoke_fn,
        scorers=[RelevanceToQuery(), Safety()],
    )
  2. Run the evaluation:

    bash
    python3 eval_agent.py

    You should see the evaluation results and MLflow run information in the console output. In the MLflow UI, you can find the resulting runs from the evaluation on the experiment page. Click the run name to view the aggregated metrics and metadata in the overview pane.
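
Beyond the built-in scorers, you can register a custom check with the @scorer decorator from mlflow.genai.scorers and pass it alongside RelevanceToQuery and Safety. A minimal sketch (the function name and check here are hypothetical; it only verifies that predict_fn returned at least one output item):

python
from mlflow.genai.scorers import scorer


@scorer
def produces_output(outputs) -> bool:
    # Hypothetical check: the agent response contains at least one output item
    items = outputs.get("output", []) if isinstance(outputs, dict) else getattr(outputs, "output", [])
    return len(items) > 0

Include it in the evaluation call, e.g. scorers=[RelevanceToQuery(), Safety(), produces_output].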