
Google ADK

Google Agent Development Kit (ADK) is an open-source framework from Google for building and evaluating AI agents. MLflow's Google ADK integration allows you to use ADK's deterministic evaluators as MLflow scorers for assessing tool call trajectories and response similarity.

Prerequisites

Google ADK scorers require the google-adk package:

```bash
pip install google-adk
```

Quick Start

You can call Google ADK scorers directly:

```python
from mlflow.genai.scorers.google_adk import ToolTrajectory

scorer = ToolTrajectory(match_type="EXACT", threshold=0.5)
feedback = scorer(
    inputs="Book a flight to Paris",
    outputs="Booked flight AA123 to Paris",
    expectations={
        "expected_tool_calls": [
            {"name": "search_flights", "args": {"destination": "Paris"}},
            {"name": "book_flight", "args": {"flight_id": "AA123"}},
        ],
        "actual_tool_calls": [
            {"name": "search_flights", "args": {"destination": "Paris"}},
            {"name": "book_flight", "args": {"flight_id": "AA123"}},
        ],
    },
)

print(feedback.value)  # "yes" or "no"
print(feedback.metadata["score"])  # 1.0
```

Or use them in mlflow.genai.evaluate:

```python
import mlflow
from mlflow.genai.scorers.google_adk import ToolTrajectory, ResponseMatch

eval_dataset = [
    {
        "inputs": {"query": "Book a flight to Paris"},
        "outputs": "Booked flight AA123 to Paris",
        "expectations": {
            "expected_tool_calls": [
                {"name": "search_flights", "args": {"destination": "Paris"}},
                {"name": "book_flight", "args": {"flight_id": "AA123"}},
            ],
            "actual_tool_calls": [
                {"name": "search_flights", "args": {"destination": "Paris"}},
                {"name": "book_flight", "args": {"flight_id": "AA123"}},
            ],
            "expected_response": "Successfully booked flight AA123 to Paris.",
        },
    },
]

results = mlflow.genai.evaluate(
    data=eval_dataset,
    scorers=[
        ToolTrajectory(match_type="EXACT", threshold=0.5),
        ResponseMatch(threshold=0.5),
    ],
)
```

Available Google ADK Scorers

Google ADK scorers provide deterministic evaluation without requiring an LLM judge:

| Scorer | What does it evaluate? | ADK Docs |
|---|---|---|
| ToolTrajectory | Does the agent call the expected tools, in the order required by match_type? | Link |
| ResponseMatch | How similar is the agent's response to a reference answer (ROUGE-1)? | Link |
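To make the ROUGE-1 score concrete, here is a minimal pure-Python sketch of a unigram-overlap F-measure. It is not ADK's implementation: the tokenizer (lowercased alphanumeric runs, no stemming) and the pass rule (score at or above the threshold) are assumptions, so real ResponseMatch scores may differ slightly.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase and keep alphanumeric runs; no stemming is applied.
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1_f(candidate: str, reference: str) -> float:
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    # Clipped unigram overlap: a token counts at most as often as it appears in both texts.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    # F-measure is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


score = rouge1_f(
    "Booked flight AA123 to Paris",
    "Successfully booked flight AA123 to Paris.",
)
print(round(score, 3))  # 0.909
passed = score >= 0.5  # hypothetical pass rule: score at or above the threshold
```

Here all 5 output tokens appear in the 6-token reference, so precision is 1.0, recall is 5/6, and the F-measure is 10/11.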

Creating Scorers by Name

You can also create Google ADK scorers dynamically using get_scorer:

```python
from mlflow.genai.scorers.google_adk import get_scorer

scorer = get_scorer(
    metric_name="ToolTrajectory",
    match_type="IN_ORDER",
    threshold=0.5,
)

feedback = scorer(
    inputs="Search for flights to Paris",
    outputs="Found 3 flights to Paris",
    expectations={
        "expected_tool_calls": [
            {"name": "search_flights", "args": {"destination": "Paris"}},
        ],
        "actual_tool_calls": [
            {"name": "search_flights", "args": {"destination": "Paris"}},
        ],
    },
)
```

Configuration

Google ADK scorers accept parameters that control evaluation behavior:

```python
from mlflow.genai.scorers.google_adk import ToolTrajectory, ResponseMatch

# ToolTrajectory supports three matching strategies:
# - "EXACT": tools must match in exact order and count (default)
# - "IN_ORDER": expected tools must appear in order, extra tools allowed
# - "ANY_ORDER": expected tools must all appear, order does not matter
trajectory_scorer = ToolTrajectory(
    match_type="IN_ORDER",
    threshold=0.5,
)

# ResponseMatch computes ROUGE-1 F-measure between output and reference
rouge_scorer = ResponseMatch(
    threshold=0.6,  # Minimum ROUGE-1 score to pass
)
```
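The three match_type strategies described in the comments above can be sketched in plain Python to show how they differ on the same trajectories. This is an illustrative re-implementation, not ADK's code. The assumptions are that two calls match when both name and args are equal, that a trajectory scores 1.0 on a match and 0.0 otherwise, and that "yes" means score at or above the threshold. The helper names and the get_weather tool are hypothetical.

```python
from typing import Any

ToolCall = dict[str, Any]


def same_call(a: ToolCall, b: ToolCall) -> bool:
    # Assumption: two calls match when both the tool name and the args are equal.
    return a["name"] == b["name"] and a.get("args", {}) == b.get("args", {})


def trajectory_matches(
    expected: list[ToolCall], actual: list[ToolCall], match_type: str = "EXACT"
) -> bool:
    if match_type == "EXACT":
        # Same calls, same order, same count.
        return len(expected) == len(actual) and all(
            same_call(e, a) for e, a in zip(expected, actual)
        )
    if match_type == "IN_ORDER":
        # Expected calls must form a subsequence of the actual calls; `any`
        # advances the shared iterator past each match, which enforces order.
        remaining = iter(actual)
        return all(any(same_call(e, a) for a in remaining) for e in expected)
    if match_type == "ANY_ORDER":
        # Each expected call must be matched by a distinct actual call, in any order.
        pool = list(actual)
        for e in expected:
            for i, a in enumerate(pool):
                if same_call(e, a):
                    del pool[i]
                    break
            else:
                return False
        return True
    raise ValueError(f"unknown match_type: {match_type!r}")


def trajectory_feedback(expected, actual, match_type="EXACT", threshold=0.5):
    # Hypothetical scoring rule: 1.0 on a match, 0.0 otherwise, then compare to threshold.
    score = 1.0 if trajectory_matches(expected, actual, match_type) else 0.0
    return ("yes" if score >= threshold else "no"), score


search = {"name": "search_flights", "args": {"destination": "Paris"}}
book = {"name": "book_flight", "args": {"flight_id": "AA123"}}
weather = {"name": "get_weather", "args": {"city": "Paris"}}  # hypothetical extra tool

expected = [search, book]
print(trajectory_feedback(expected, [search, weather, book], "EXACT"))  # ('no', 0.0)
print(trajectory_feedback(expected, [search, weather, book], "IN_ORDER"))  # ('yes', 1.0)
print(trajectory_feedback(expected, [book, search], "IN_ORDER"))  # ('no', 0.0)
print(trajectory_feedback(expected, [book, search], "ANY_ORDER"))  # ('yes', 1.0)
```

This page does not say whether ANY_ORDER tolerates extra calls or how repeated calls are counted. The sketch allows extras and requires a distinct actual call for each expected call.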

Refer to the Google ADK documentation for details on evaluation metrics.

Next Steps