Fix: Add optional detailed logging to AgentEvaluator #1619
+14
−0
This PR introduces a `log_detailed_results` parameter to `AgentEvaluator.evaluate()` and `AgentEvaluator.evaluate_eval_set()` in `src/google/adk/evaluation/agent_evaluator.py`.

Description
This feature lets users see detailed, per-invocation results during agent evaluations. When `log_detailed_results` is set to `True`, the evaluator logs the actual invocation, expected invocation, score, and evaluation status for each step.

Motivation
Without this change, users see only a final, aggregated evaluation result: the evaluation either succeeds or fails through an `assert`. When an evaluation fails, there is no breakdown of the results, which makes debugging difficult. Users are forced to "fly blind" or to fall back on lower-level evaluation methods to get more detail, which takes extra work.

The ADK UI already shows detailed results, but having the same capability directly in `pytest` runs is useful, especially in CI/CD environments where the UI is not available. This change improves the developer experience by letting developers debug evaluation tests directly from the logs.
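
To illustrate the behavior described under Description, here is a minimal, self-contained sketch of an opt-in per-invocation log. It is not the code added by this PR and does not use the ADK API: `InvocationResult`, `summarize`, the `threshold` argument, and the log line format are all hypothetical names chosen for illustration. Only the flag name `log_detailed_results` and the four logged fields (actual invocation, expected invocation, score, status) come from the description.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("eval_sketch")


@dataclass
class InvocationResult:
    """Hypothetical per-invocation record; not an ADK type."""
    actual: str
    expected: str
    score: float
    passed: bool


def summarize(results, threshold, log_detailed_results=False):
    """Return True if the mean score meets the threshold.

    When log_detailed_results is True, one line per invocation is logged
    before the aggregate is computed. Assumed behavior, for illustration only.
    """
    if not results:
        raise ValueError("results must not be empty")

    if log_detailed_results:
        for i, r in enumerate(results, start=1):
            status = "PASSED" if r.passed else "FAILED"
            logger.info(
                "invocation %d: actual=%r expected=%r score=%.2f status=%s",
                i, r.actual, r.expected, r.score, status,
            )

    mean_score = sum(r.score for r in results) / len(results)
    return mean_score >= threshold
```

In this sketch the details are emitted at `INFO` level, so they only appear in a `pytest` run when log output is enabled, for example with `--log-cli-level=INFO`. Whether the PR's implementation logs at the same level is not stated in the description.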