test(evals): add behavioral evals for memory manager subagent #23149
SandyTao520 wants to merge 1 commit into main
Hi @SandyTao520, thank you so much for your contribution to Gemini CLI! We really appreciate the time and effort you've put into this. We're making some updates to our contribution process to improve how we track and review changes. Please take a moment to review our recent discussion post: Improving Our Contribution Process & Introducing New Guidelines.
Key Update: Starting January 26, 2026, the Gemini CLI project will require all pull requests to be associated with an existing issue. Any pull requests not linked to an issue by that date will be automatically closed.
Thank you for your understanding and for being a part of our community!
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a comprehensive set of behavioral evaluations for the experimental memory manager subagent. The primary goal is to ensure that this subagent, which enhances how the system handles memory operations, functions as expected under various scenarios. This includes correctly saving user preferences and command aliases, while also appropriately disregarding temporary or workspace-specific information, thereby improving the reliability and intelligence of the memory management system.
Code Review
The pull request introduces new behavioral evaluations for the experimental memory manager subagent. These tests validate the save_memory subagent's functionality when experimental.memoryManager is enabled, covering various scenarios like saving preferences, personal facts, command aliases, and correctly ignoring transient information or workspace-specific file paths. The new tests are a valuable addition to ensure the subagent behaves as expected.
    if (fileExists) {
      const content = fs.readFileSync(globalGeminiMd, 'utf-8');
      expect(content.toLowerCase()).toContain('dark mode');
    }
    },
    });
Using fs.existsSync and fs.readFileSync directly in tests that utilize a TestRig can bypass the rig's intended file system management. If the TestRig provides its own methods for file system operations (e.g., rig.fileExists or rig.readFile), it's recommended to use those for consistency, proper test isolation, and reliable cleanup of temporary files. This ensures that tests interact with the controlled test environment rather than the real file system, which can prevent unexpected side effects or flakiness.
    const globalGeminiMd = path.join(rig.homeDir!, '.gemini', 'GEMINI.md');
    const fileExists = await rig.fileExists(globalGeminiMd);
    expect(fileExists, 'Expected global GEMINI.md to be created').toBe(true);
    if (fileExists) {
      const content = await rig.readFile(globalGeminiMd, 'utf-8');
      expect(content.toLowerCase()).toContain('dark mode');
    }

References
- Use asynchronous file system operations (e.g., fs.promises.readFile) instead of synchronous ones (e.g., fs.readFileSync) to avoid blocking the event loop.
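The asynchronous-read recommendation can be illustrated with plain Node.js APIs. This is a minimal sketch and assumes nothing about TestRig: `readFileIfExists` is a hypothetical helper name, not something from this PR. It folds the existence check and the read into one non-blocking call, which also avoids the gap between checking for a file and reading it.

```typescript
import { promises as fsp } from 'node:fs';

/**
 * Hypothetical helper (not part of the PR or of TestRig).
 * Reads a UTF-8 file without blocking the event loop.
 * Resolves to null when the file does not exist; rethrows any other error.
 */
async function readFileIfExists(filePath: string): Promise<string | null> {
  try {
    return await fsp.readFile(filePath, 'utf-8');
  } catch (err) {
    // ENOENT is the error code Node.js reports for a missing path.
    if ((err as NodeJS.ErrnoException).code === 'ENOENT') {
      return null;
    }
    throw err;
  }
}
```

In an eval assertion, the result could be checked for non-null first and then matched against the expected text, for example by lowercasing it and looking for 'dark mode' as the original test does.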
Size Change: -4 B (0%) Total Size: 26.3 MB
     * operations — adding, removing, de-duplicating, and organizing — across
     * both global (~/.gemini/GEMINI.md) and project-level (./GEMINI.md) files.
     */
    describe('memory_manager', () => {
Given the emphasis on this being quick, should we assert turn count and check only specific tools were used for each of these?
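One way such a check might look, as a rough sketch only: the `ToolCallRecord` shape, the `RunBudget` type, and `checkRunBudget` are all hypothetical and do not reflect the actual TestRig log format. The idea is to compare the recorded tool calls against a turn limit and a tool allowlist, and to report every violation rather than stopping at the first.

```typescript
/** Hypothetical shape of one recorded tool call; not the real TestRig log format. */
interface ToolCallRecord {
  turn: number; // 1-based index of the model turn that issued the call
  toolName: string;
}

/** Hypothetical per-eval limits. */
interface RunBudget {
  maxTurns: number;
  allowedTools: string[];
}

/** Returns human-readable violations; an empty array means the run stayed within budget. */
function checkRunBudget(calls: ToolCallRecord[], budget: RunBudget): string[] {
  const violations: string[] = [];

  // The highest turn index seen is the number of turns the run needed.
  let turnsUsed = 0;
  for (const call of calls) {
    if (call.turn > turnsUsed) turnsUsed = call.turn;
  }
  if (turnsUsed > budget.maxTurns) {
    violations.push(`used ${turnsUsed} turns, budget is ${budget.maxTurns}`);
  }

  // Report each tool outside the allowlist once, even if it was called repeatedly.
  const reported: string[] = [];
  for (const call of calls) {
    const allowed = budget.allowedTools.indexOf(call.toolName) !== -1;
    if (!allowed && reported.indexOf(call.toolName) === -1) {
      reported.push(call.toolName);
      violations.push(`unexpected tool: ${call.toolName}`);
    }
  }
  return violations;
}
```

An eval could then assert that the returned array is empty, which keeps the failure message informative when a run drifts over budget.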
    describe('memory_manager', () => {
      const MEMORY_MANAGER_SETTINGS = {
        experimental: { memoryManager: true },
        tools: { core: ['save_memory'] },
Is this disabling other tools?
Summary
Add behavioral evals for the experimental memory manager subagent introduced in #22726. These validate that the save_memory subagent correctly handles memory operations when experimental.memoryManager is enabled.

Details
6 new USUALLY_PASSES evals covering the subagent's key behaviors:
~/.gemini/GEMINI.md with file content verification

All tests use experimental: { memoryManager: true } in settings to exercise the subagent path. Existing save_memory.eval.ts tests are untouched and continue to test the default (memoryManager: false) path.

Related Issues
Related to #18007

How to Validate
npm run build && npm run bundle

Pre-Merge Checklist