MLflow Agent Server
Agent Server Features
- Simple FastAPI server to host agents at the `/invocations` endpoint
- Decorator-based function registration (`@invoke`, `@stream`) for easy agent development
- Automatic request and response validation for Responses API schema agents
- Automatic MLflow tracing integration and aggregation
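For a quick sense of how these pieces fit together, here is a minimal, hypothetical single-file sketch (the file and function names are illustrative, the output item dict is assumed to follow the Responses API message schema, and calling `run()` without arguments is assumed to start a single-worker server); the Full Example below shows the recommended multi-file layout:

```python
# minimal_agent.py -- a hedged, single-file sketch of the Agent Server pattern.
from uuid import uuid4

from mlflow.genai.agent_server import AgentServer, invoke
from mlflow.types.responses import ResponsesAgentRequest, ResponsesAgentResponse


@invoke()
async def answer(request: ResponsesAgentRequest) -> ResponsesAgentResponse:
    # A real agent would call an LLM here; this sketch returns a fixed message.
    return ResponsesAgentResponse(
        output=[
            {
                "type": "message",
                "id": str(uuid4()),
                "status": "completed",
                "role": "assistant",
                "content": [{"type": "output_text", "text": "Hello from the agent"}],
            }
        ]
    )


# The server exposes the registered @invoke function at POST /invocations.
agent_server = AgentServer("ResponsesAgent")
app = agent_server.app

if __name__ == "__main__":
    agent_server.run()
```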
Full Example
In this example, we'll use the openai-agents-sdk to define our Responses API-compatible agent. See the openai-agents-sdk quickstart for more information.
- Install the openai-agents-sdk and mlflow, and set your OpenAI API key:

  ```bash
  pip install -U openai-agents 'mlflow>=3.6.0'
  export OPENAI_API_KEY=sk-...
  ```
- Define your agent in `agent.py` and create methods to annotate with `@invoke`:

  ```python
  from agents import Agent, Runner
  from mlflow.genai.agent_server import invoke, stream
  from mlflow.types.responses import ResponsesAgentRequest, ResponsesAgentResponse

  agent = Agent(
      name="Math Tutor",
      instructions="You provide help with math problems. Explain your reasoning and include examples",
  )


  @invoke()
  async def non_streaming(request: ResponsesAgentRequest) -> ResponsesAgentResponse:
      msgs = [i.model_dump() for i in request.input]
      result = await Runner.run(agent, msgs)
      return ResponsesAgentResponse(
          output=[item.to_input_item() for item in result.new_items]
      )


  # You can also optionally register a @stream function to support streaming
  # responses (see the sketch after these steps).
  ```
- Define a `start_server.py` file to start the `AgentServer`:

  ```python
  # Need to import the agent to register the functions with the server
  import agent  # noqa: F401
  from mlflow.genai.agent_server import (
      AgentServer,
      setup_mlflow_git_based_version_tracking,
  )

  agent_server = AgentServer("ResponsesAgent")
  app = agent_server.app

  # Optionally, set up MLflow git-based version tracking
  # to associate your agent's traces with a specific git commit
  setup_mlflow_git_based_version_tracking()


  def main():
      # To support multiple workers, pass the app as an import string
      agent_server.run(app_import_string="start_server:app")


  if __name__ == "__main__":
      main()
  ```
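As a rough illustration of the optional `@stream` registration mentioned in step 2 (not the SDK's canonical streaming example), the sketch below runs the agent to completion and emits one event per output item rather than token-level deltas. It assumes `@stream()` is used like `@invoke()` and that events follow `ResponsesAgentStreamEvent` from `mlflow.types.responses`; check both against your MLflow version.

```python
# Continuing in agent.py -- a hedged sketch of a streaming handler.
from mlflow.genai.agent_server import stream
from mlflow.types.responses import ResponsesAgentStreamEvent


@stream()
async def streaming(request: ResponsesAgentRequest):
    msgs = [i.model_dump() for i in request.input]
    result = await Runner.run(agent, msgs)
    # Emit each completed output item as a "response.output_item.done" event.
    for item in result.new_items:
        yield ResponsesAgentStreamEvent(
            type="response.output_item.done",
            item=item.to_input_item(),
        )
```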
Deploying and Testing Your Agent
Run your agent server with the `--reload` flag to automatically reload the server on code changes:

```bash
python3 start_server.py --reload
# Pass in a number of workers to support multiple concurrent requests
# python3 start_server.py --workers 4
# Pass in a port to run the server on
# python3 start_server.py --reload --port 8000
```
Send a request to the server to test your agent out:

```bash
curl -X POST http://localhost:8000/invocations \
  -H "Content-Type: application/json" \
  -d '{ "input": [{ "role": "user", "content": "What is the 14th Fibonacci number?"}]}'
```
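Equivalently, you can exercise the endpoint from Python. Here is a small sketch using the `requests` library (the file name is illustrative and it assumes the server above is listening on port 8000):

```python
# test_request.py -- hypothetical helper for calling the agent server from Python.
import requests

payload = {
    "input": [{"role": "user", "content": "What is the 14th Fibonacci number?"}]
}
resp = requests.post("http://localhost:8000/invocations", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())
```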
After testing your agent, you can view the traces in the MLflow UI by clicking on the "Traces" tab.
If you have registered a `@stream` function, you can send a streaming request to the server by passing in `"stream": true`:
```bash
curl -X POST http://localhost:8000/invocations \
  -H "Content-Type: application/json" \
  -d '{
    "input": [{ "role": "user", "content": "What is the 14th Fibonacci number?"}],
    "stream": true
  }'
```
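If you prefer to consume the stream from Python, a rough sketch follows. The exact wire format (e.g. server-sent `data:` lines vs. newline-delimited JSON) depends on your server version, so the parsing here is a best-effort guess that you may need to adjust:

```python
# stream_request.py -- hypothetical streaming client; event framing may differ.
import json

import requests

payload = {
    "input": [{"role": "user", "content": "What is the 14th Fibonacci number?"}],
    "stream": True,
}
with requests.post(
    "http://localhost:8000/invocations", json=payload, stream=True, timeout=60
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        text = line.decode("utf-8")
        # Strip an SSE-style "data: " prefix, if present, before parsing JSON.
        if text.startswith("data: "):
            text = text[len("data: "):]
        try:
            print(json.loads(text))
        except json.JSONDecodeError:
            print(text)
```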
Evaluating Your Agent
You can use `mlflow.genai.evaluate()` to evaluate your agent. See the Evaluating Agents guide and Scorer documentation for more information.
- Define a file like `eval_agent.py` to evaluate your agent:

  ```python
  import asyncio

  import mlflow

  # need to import agent for our @invoke-registered function to be found
  from agent import agent  # noqa: F401
  from mlflow.genai.agent_server import get_invoke_function
  from mlflow.genai.scorers import RelevanceToQuery, Safety
  from mlflow.types.responses import ResponsesAgentRequest, ResponsesAgentResponse

  eval_dataset = [
      {
          "inputs": {
              "request": {
                  "input": [
                      {"role": "user", "content": "What's the 15th Fibonacci number"}
                  ]
              }
          },
          "expected_response": "The 15th Fibonacci number is 610.",
      }
  ]


  def sync_invoke_fn(request: dict) -> ResponsesAgentResponse:
      # Get the invoke function that was registered via @invoke decorator in your agent
      invoke_fn = get_invoke_function()
      return asyncio.run(invoke_fn(ResponsesAgentRequest(**request)))


  mlflow.genai.evaluate(
      data=eval_dataset,
      predict_fn=sync_invoke_fn,
      scorers=[RelevanceToQuery(), Safety()],
  )
  ```
- Run the evaluation:

  ```bash
  python3 eval_agent.py
  ```

  You should see the evaluation results and MLflow run information in the console output. In the MLflow UI, you can find the resulting runs from the evaluation on the experiment page. Click the run name to view the aggregated metrics and metadata in the overview pane.
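Beyond the built-in scorers, the Scorer documentation referenced above also covers custom scorers. A minimal sketch of one is below; the `@scorer` decorator import and the exact parameter names it accepts are assumptions to verify against your MLflow version:

```python
# A hedged sketch of a custom scorer to use alongside RelevanceToQuery and Safety.
from mlflow.genai.scorers import scorer


@scorer
def mentions_a_number(outputs) -> bool:
    # Pass if the agent's output contains at least one digit.
    return any(ch.isdigit() for ch in str(outputs))


# Then include it in the evaluate call, e.g.:
# mlflow.genai.evaluate(..., scorers=[RelevanceToQuery(), Safety(), mentions_a_number])
```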