# MLflow Agent Server

## Agent Server Features
The MLflow Agent Server was released with MLflow 3.6.0. It is currently under active development and is marked as Experimental. Public APIs are subject to change, and new features are being added to enhance its functionality.

- Simple FastAPI server to host agents at the `/invocations` endpoint
- Decorator-based function registration (`@invoke`, `@stream`) for easy agent development
- Automatic request and response validation for Responses API schema agents
- Automatic MLflow tracing integration and aggregation

## Full Example

In this example, we'll use the openai-agents-sdk to define a Responses API-compatible agent. See the openai-agents-sdk quickstart for more information.

- Install the openai-agents-sdk and mlflow, and set your OpenAI API key:

```bash
pip install -U openai-agents 'mlflow>=3.6.0'
export OPENAI_API_KEY=sk-...
```

- Define your agent in `agent.py` and create methods to annotate with `@invoke`:

```python
from agents import Agent, Runner

from mlflow.genai.agent_server import invoke, stream
from mlflow.types.responses import ResponsesAgentRequest, ResponsesAgentResponse

agent = Agent(
    name="Math Tutor",
    instructions="You provide help with math problems. Explain your reasoning and include examples",
)


@invoke()
async def non_streaming(request: ResponsesAgentRequest) -> ResponsesAgentResponse:
    msgs = [i.model_dump() for i in request.input]
    result = await Runner.run(agent, msgs)
    return ResponsesAgentResponse(
        output=[item.to_input_item() for item in result.new_items]
    )


# You can also optionally register a @stream function to support streaming responses
```

- Define a `start_server.py` file to start the `AgentServer`:

```python
# Need to import the agent to register the functions with the server
import agent  # noqa: F401

from mlflow.genai.agent_server import (
    AgentServer,
    setup_mlflow_git_based_version_tracking,
)

agent_server = AgentServer("ResponsesAgent")
app = agent_server.app

# Optionally, set up MLflow git-based version tracking
# to associate your agent's traces with a specific git commit
setup_mlflow_git_based_version_tracking()


def main():
    # To support multiple workers, pass the app as an import string
    agent_server.run(app_import_string="start_server:app")


if __name__ == "__main__":
    main()
```
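
Before starting the server, you can optionally smoke-test the registered function directly in Python, without going through HTTP. This is a minimal sketch that assumes the `agent.py` above and reuses the `get_invoke_function()` helper that also appears in the evaluation example below (the file name `smoke_test.py` is illustrative):

```python
# smoke_test.py -- quick local check of the @invoke-registered function
import asyncio

import agent  # noqa: F401  # importing agent.py registers the @invoke function
from mlflow.genai.agent_server import get_invoke_function
from mlflow.types.responses import ResponsesAgentRequest

invoke_fn = get_invoke_function()
request = ResponsesAgentRequest(
    input=[{"role": "user", "content": "What is 7 * 8?"}]
)
response = asyncio.run(invoke_fn(request))
print(response.output)
```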

## Deploying and Testing Your Agent

Run your agent server with the `--reload` flag to automatically reload the server on code changes:

```bash
python3 start_server.py --reload

# Pass in a number of workers to support multiple concurrent requests
# python3 start_server.py --workers 4

# Pass in a port to run the server on
# python3 start_server.py --reload --port 8000
```

Send a request to the server to test your agent out:

```bash
curl -X POST http://localhost:8000/invocations \
  -H "Content-Type: application/json" \
  -d '{"input": [{"role": "user", "content": "What is the 14th Fibonacci number?"}]}'
```

After testing your agent, you can view the traces in the MLflow UI by clicking on the "Traces" tab.

If you have registered a `@stream` function, you can send a streaming request to the server by passing in `"stream": true`:

```bash
curl -X POST http://localhost:8000/invocations \
  -H "Content-Type: application/json" \
  -d '{
    "input": [{"role": "user", "content": "What is the 14th Fibonacci number?"}],
    "stream": true
  }'
```
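
If you have not yet written one, a streaming function registered in `agent.py` might look roughly like the sketch below. This is a hedged example rather than the definitive API: it assumes `@stream` registers an async generator and that the server accepts MLflow's `ResponsesAgentStreamEvent` objects, and it emits each completed output item as a single `response.output_item.done` event instead of token-level deltas:

```python
# Added to agent.py, alongside the @invoke function above (agent, Runner,
# ResponsesAgentRequest, and stream are already imported or defined there).
# Sketch only: the exact stream-event payload expected by the Agent Server
# is an assumption based on MLflow's ResponsesAgentStreamEvent type.
from mlflow.types.responses import ResponsesAgentStreamEvent


@stream()
async def streaming(request: ResponsesAgentRequest):
    msgs = [i.model_dump() for i in request.input]
    result = await Runner.run(agent, msgs)
    # Emit each finished output item as one stream event
    for item in result.new_items:
        yield ResponsesAgentStreamEvent(
            type="response.output_item.done",
            item=item.to_input_item(),
        )
```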

## Evaluating Your Agent

You can use `mlflow.genai.evaluate()` to evaluate your agent. See the Evaluating Agents guide and Scorer documentation for more information.

- Define a file like `eval_agent.py` to evaluate your agent:

```python
import asyncio

import mlflow
# need to import agent for our @invoke-registered function to be found
from agent import agent # noqa: F401
from mlflow.genai.agent_server import get_invoke_function
from mlflow.genai.scorers import RelevanceToQuery, Safety
from mlflow.types.responses import ResponsesAgentRequest, ResponsesAgentResponse
eval_dataset = [
    {
        "inputs": {
            "request": {
                "input": [
                    {"role": "user", "content": "What's the 15th Fibonacci number"}
                ]
            }
        },
        "expected_response": "The 15th Fibonacci number is 610.",
    }
]
def sync_invoke_fn(request: dict) -> ResponsesAgentResponse:
    # Get the invoke function that was registered via the @invoke decorator in your agent
    invoke_fn = get_invoke_function()
    return asyncio.run(invoke_fn(ResponsesAgentRequest(**request)))
mlflow.genai.evaluate(
    data=eval_dataset,
    predict_fn=sync_invoke_fn,
    scorers=[RelevanceToQuery(), Safety()],
)
```

- Run the evaluation:

```bash
python3 eval_agent.py
```

You should see the evaluation results and MLflow run information in the console output. In the MLflow UI, you can find the resulting runs from the evaluation on the experiment page. Click the run name to view the aggregated metrics and metadata in the overview pane.