A2A integration #625

TerminallyLazy · 2025-07-27T03:03:09Z

Fix A2A Subordinate Stability and Resource Management

Problem

The A2A subordinate system had several critical issues preventing reliable multi-agent coordination:

Duplicate Spawning: Multiple subordinates with same role due to poor status checking
Port Conflicts: All subordinates trying to use port 8100, causing connection failures
Memory Exhaustion: Subordinates killed by OOM without graceful handling
UI Pollution: Crashed subordinates leaving duplicate entries in agent management UI
Poor Error Reporting: Unclear error messages making debugging difficult

Solution

Subordinate Manager (`a2a_subordinate_manager.py`)

✅ Unique Port Allocation: Incremental port assignment (8100, 8101, 8102...)
✅ Duplicate Prevention: Role normalization + proper existing subordinate detection
✅ Resource Cleanup: Immediate port/context release on subordinate death
✅ Better Status Handling: Wait for "starting" subordinates, cleanup failed ones
✅ OOM Detection: Detect exit codes 9/137 and provide clear OOM error messages
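The manager-side behavior listed above could look roughly like the following. This is a minimal sketch, not the code from `a2a_subordinate_manager.py`: `PortAllocator`, `normalize_role`, and `describe_exit` are hypothetical names, and the normalization rule (lowercase, collapse whitespace to underscores) is an assumption. The `-9` case is included because Python's `subprocess` reports death by signal N as return code `-N`.

```python
class PortAllocator:
    """Hands out the lowest free port at or above base_port (hypothetical helper)."""

    def __init__(self, base_port: int = 8100) -> None:
        self.base_port = base_port
        self._in_use: set[int] = set()

    def allocate(self) -> int:
        port = self.base_port
        while port in self._in_use:
            port += 1
        self._in_use.add(port)
        return port

    def release(self, port: int) -> None:
        # Called as soon as a subordinate dies so the port can be reused.
        self._in_use.discard(port)


def normalize_role(role: str) -> str:
    """Assumed rule: 'Code  Reviewer ' and 'code reviewer' map to the same key."""
    return "_".join(role.strip().lower().split())


def describe_exit(returncode: int) -> str:
    """Turn a subordinate's exit code into a readable message."""
    # 9 and 137 (128 + SIGKILL) are the codes named in this PR;
    # -9 is how subprocess.Popen reports a SIGKILL on POSIX.
    if returncode in (9, 137, -9):
        return "Subordinate was killed, most likely out of memory (OOM)"
    if returncode == 0:
        return "Subordinate exited cleanly"
    return f"Subordinate exited with code {returncode}"
```

Keying the subordinate registry on `normalize_role(role)` is one way to keep "Researcher" and "researcher " from spawning two processes.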

Subordinate Runner (`a2a_subordinate_runner.py`)

✅ Memory Limits: Configurable RLIMIT_AS (default 8GB, via SUBORDINATE_RAM_GB)
✅ Graceful OOM: MemoryError handling instead of kernel SIGKILL
✅ Better Logging: Detailed error reporting for agent creation failures
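A rough sketch of how the runner might apply the limit, assuming a POSIX system where the standard `resource` module is available. The function names are hypothetical; only `SUBORDINATE_RAM_GB`, `RLIMIT_AS`, and the 8 GB default come from this PR.

```python
import os
import resource


def ram_limit_bytes(default_gb: float = 8.0) -> int:
    """Read SUBORDINATE_RAM_GB, falling back to the default on missing or bad input."""
    raw = os.environ.get("SUBORDINATE_RAM_GB", "")
    try:
        gb = float(raw) if raw else default_gb
    except ValueError:
        gb = default_gb
    return int(gb * 1024 ** 3)


def apply_memory_limit(limit_bytes: int) -> None:
    """Cap the address space so allocations fail with MemoryError
    instead of the kernel OOM killer sending SIGKILL."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard != resource.RLIM_INFINITY:
        # An unprivileged process cannot raise its soft limit above the hard limit.
        limit_bytes = min(limit_bytes, hard)
    resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, hard))
```

The runner would call `apply_memory_limit(ram_limit_bytes())` once at startup, before the agent is created, and wrap agent creation in a `try/except MemoryError` that logs and exits with a distinct code.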

Task Handler (`a2a_handler.py`)

✅ Memory Monitoring: Track memory usage during task execution
✅ OOM Recovery: Catch MemoryError during monologue and return error response
✅ Progress Tracking: Better timeout and progress monitoring
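The OOM recovery path might be sketched like this. The commit notes say the real handler uses psutil for monitoring; the sketch substitutes the standard-library `resource` module to stay dependency-free, and `run_task_safely`, the response shape, and the 300-second default are assumptions for illustration.

```python
import asyncio
import resource
import sys


def peak_rss_mb() -> float:
    """Peak resident memory of this process in MB.
    ru_maxrss is in kilobytes on Linux and in bytes on macOS."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


async def run_task_safely(task_id, monologue, timeout_s: float = 300.0) -> dict:
    """Run the agent's monologue coroutine and convert OOM or timeout
    into an error response instead of letting the process die."""
    try:
        result = await asyncio.wait_for(monologue(), timeout=timeout_s)
        return {"task_id": task_id, "state": "COMPLETED", "result": result}
    except MemoryError:
        return {
            "task_id": task_id,
            "state": "FAILED",
            "error": f"Out of memory (peak RSS {peak_rss_mb():.0f} MB)",
        }
    except asyncio.TimeoutError:
        return {
            "task_id": task_id,
            "state": "FAILED",
            "error": f"Timed out after {timeout_s} s",
        }
```

Returning a `FAILED` response rather than raising lets the parent agent decide whether to retry or respawn the subordinate.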

UI Context Management

✅ Duplicate Prevention: Check/remove existing proxy contexts before registration
✅ Crash Cleanup: Unregister contexts when subordinates crash or become unreachable
✅ Status Priority: Keep best-status context when duplicates detected
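One possible shape for the "keep best-status context" rule is sketched below. `ProxyContext`, the status names other than "starting", and the ranking itself are assumptions made for the example, not the PR's actual data model.

```python
from dataclasses import dataclass

# Hypothetical ranking: higher is better. Unknown statuses rank lowest.
STATUS_RANK = {"ready": 3, "starting": 2, "unreachable": 1, "failed": 0}


@dataclass
class ProxyContext:
    context_id: str
    role: str
    status: str


def dedupe_contexts(contexts):
    """Return (keep, drop): one best-status context per role, and the rest."""
    best = {}
    for ctx in contexts:
        current = best.get(ctx.role)
        if current is None or (
            STATUS_RANK.get(ctx.status, -1) > STATUS_RANK.get(current.status, -1)
        ):
            best[ctx.role] = ctx
    keep = list(best.values())
    kept_ids = {c.context_id for c in keep}
    drop = [c for c in contexts if c.context_id not in kept_ids]
    return keep, drop
```

The contexts in `drop` are the ones that would be unregistered from the agent management UI.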

Testing

✅ Simple 2-subordinate coordination works reliably
✅ Retry mechanism functions when subordinates crash
✅ No UI duplicates after subordinate respawn
✅ Clear error messages for OOM/connection issues
✅ Port allocation scales properly

Impact

Reliability: Subordinates now survive memory pressure and connection issues
Debuggability: Clear error messages for common failures
UI Cleanliness: No more duplicate/orphaned agent entries
Scalability: Multiple subordinates can run simultaneously without port conflicts
Resource Management: Proper cleanup prevents resource leaks

Breaking Changes

None - all changes are backwards compatible.

Configuration

Set SUBORDINATE_RAM_GB=16 (or higher) for memory-intensive workflows.

Add comprehensive Agent-to-Agent (A2A) Protocol support enabling Agent Zero to communicate with other A2A-compliant agents while maintaining full backward compatibility. Supports peer discovery, task delegation, and multiple interaction patterns (polling, SSE, webhook).

Key features:
- A2A Protocol v1.1.0 compliance with JSON-RPC 2.0
- AgentCard discovery via /.well-known/agent.json
- TaskState management (SUBMITTED, WORKING, COMPLETED, FAILED)
- Multiple interaction patterns: polling, SSE, webhook
- Enterprise authentication (Bearer, API Key, OAuth2)
- Dynamic tool discovery and wrapping
- Async/await architecture with proper event coordination
- Full backward compatibility with existing Agent Zero functionality

Implementation includes:
- Core A2A communication tools and handlers
- Starlette/FastAPI ASGI server with all required endpoints
- Robust client with retry logic and authentication
- Extended AgentContext and AgentConfig for A2A capabilities
- Peer-to-peer communication layer
- Tool registry for dynamic capability discovery
- Comprehensive test suite (50+ test cases)
- Multi-agent workflow examples
- A2A-specific prompt templates

Files added:
- python/tools/a2a_communication.py - A2A communication tool
- python/helpers/a2a_handler.py - Core A2A protocol handler
- python/helpers/a2a_server.py - A2A server implementation
- python/helpers/a2a_client.py - A2A client with proper async handling
- python/helpers/a2a_agent.py - Peer-to-peer communication layer
- python/helpers/a2a_tool_wrapper.py - Dynamic tool discovery
- examples/a2a_multi_agent_workflow.py - Multi-agent demo
- tests/test_a2a_integration.py - Comprehensive test suite
- prompts/default/agent.system.a2a.*.md - A2A prompt templates

Resolves enterprise requirements for agent collaboration and scalability.
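To make the TaskState and JSON-RPC 2.0 items concrete, here is a small sketch. The four state names come from the commit message and the envelope fields (`jsonrpc`, `method`, `params`, `id`) are standard JSON-RPC 2.0; the transition table and the helper names are assumptions, not taken from `a2a_handler.py`.

```python
import json
from enum import Enum


class TaskState(str, Enum):
    SUBMITTED = "SUBMITTED"
    WORKING = "WORKING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


# Assumed lifecycle: a task moves forward only, and terminal states have no exits.
ALLOWED = {
    TaskState.SUBMITTED: {TaskState.WORKING, TaskState.FAILED},
    TaskState.WORKING: {TaskState.COMPLETED, TaskState.FAILED},
    TaskState.COMPLETED: set(),
    TaskState.FAILED: set(),
}


def advance(current: TaskState, new: TaskState) -> TaskState:
    """Return the new state, or raise if the transition is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"Illegal transition {current.value} -> {new.value}")
    return new


def jsonrpc_request(method: str, params: dict, request_id: int) -> str:
    """Serialize a JSON-RPC 2.0 request envelope."""
    return json.dumps(
        {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}
    )
```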

…boration

Replace traditional hierarchical subordinates with A2A-based peer-to-peer communication enabling parallel processing, direct user interaction, and scalable multi-agent workflows while maintaining full backward compatibility.

Key Features:
- True parallel processing with independent subordinate processes
- Direct user communication with any subordinate agent
- Auto port allocation and process lifecycle management
- Agent hierarchy visualization and management
- Fault tolerance through process isolation
- Scalable architecture supporting distributed agent networks

Implementation:
- A2ASubordinateManager: Complete subordinate lifecycle management
- A2ASubordinate Tool: Enhanced tool replacing call_subordinate
- A2ASubordinateRunner: Independent process runner for subordinates
- Enhanced AgentContext: Multi-agent registry and message routing
- Extended AgentConfig: Subordinate-specific configuration options

Benefits over traditional subordinates:
- Parallel execution instead of sequential processing
- Direct user access to subordinates via A2A protocol
- Process isolation prevents cascading failures
- Horizontal scalability across multiple machines
- Rich interaction patterns between all participants

Files added:
- python/helpers/a2a_subordinate_manager.py - Subordinate lifecycle management
- python/helpers/a2a_subordinate_runner.py - Independent subordinate processes
- python/tools/a2a_subordinate.py - Enhanced subordinate communication tool
- examples/a2a_enhanced_subordinates_demo.py - Complex workflow demonstration
- tests/test_a2a_subordinates.py - Comprehensive test suite
- docs/a2a_subordinates.md - Complete documentation and migration guide
- prompts/default/agent.system.tool.a2a_subordinate.md - Tool documentation

Enables sophisticated multi-agent workflows while preserving Agent Zero's tool-based simplicity and maintaining backward compatibility.
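The "parallel execution instead of sequential processing" point can be illustrated with a short sketch. `delegate_parallel` and the `send_task` coroutine are hypothetical stand-ins for whatever the A2ASubordinate tool actually calls; the technique shown is plain `asyncio.gather` with `return_exceptions=True`, so one crashed subordinate does not cancel the others.

```python
import asyncio


async def delegate_parallel(send_task, assignments: dict) -> dict:
    """Send one message per role concurrently.

    send_task(role, message) is an assumed coroutine that talks to a subordinate.
    Failures are returned as exception objects next to successful results.
    """
    roles = list(assignments)
    outcomes = await asyncio.gather(
        *(send_task(role, assignments[role]) for role in roles),
        return_exceptions=True,
    )
    return dict(zip(roles, outcomes))
```

A caller can then check `isinstance(outcome, Exception)` per role and retry only the subordinates that failed.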

- Fix duplicate subordinate spawning via role normalization and better status checking
- Implement unique port allocation (8100, 8101, 8102...) instead of port reuse conflicts
- Add configurable memory limits (SUBORDINATE_RAM_GB) with graceful OOM handling
- Enhance memory monitoring during task execution using psutil
- Prevent UI duplicate contexts with proper cleanup on subordinate crash/failure
- Improve error reporting and logging for better debugging
- Add proper resource cleanup (ports, contexts) when subordinates exit prematurely

Fixes issues where subordinates would:
- Spawn duplicates due to poor existing subordinate detection
- Fail to connect due to port 8100 conflicts
- Crash with SIGKILL due to memory exhaustion
- Leave orphaned UI contexts after crashes

TerminallyLazy · 2025-07-27T03:03:59Z

I think I might need some help with this one... If anyone feels up to it.

Omni-NexusAI · 2025-07-27T16:00:06Z

Would this fix the ballooning memory usage of the agents? I noticed that the container's memory usage keeps increasing steadily, and presumably it would eventually reach the limit and crash. I have a lot of RAM, so I haven't managed to reach the max yet, but it is obviously a big problem for long-term usage.

Also, I was wondering if the agents can truly run in parallel, or if they are sequential. It appears that they are sequential as of now, but it would be useful if they could somehow run in parallel within the same instance, or if the agent could dynamically create new environments for its agents to perform operations in parallel with each other.

Either way, I've been looking into how these two things could be improved or implemented, and see what I can do.

- Added a2a_server_token to Settings TypedDict
- Updated default settings to auto-generate A2A tokens
- Added token clearing in sensitive settings removal
- Implemented token update logic with deferred tasks

2. DynamicA2AProxy Class (a2a_server.py:574-678):
- Similar to DynamicMcpProxy but for A2A protocol
- Supports token-based URL routing: /t-{token}/endpoint
- Thread-safe reconfiguration of routes
- ASGI-compatible proxy implementation

3. A2A Client Updates (a2a_client.py):
- Added url_token parameter to constructor
- _build_token_url() method for token-based URL construction
- Updated all client methods to use token-based URLs when available

4. Subordinate Integration:
- Updated subordinate manager to pass A2A tokens to clients
- Modified subordinate runner to use DynamicA2AProxy
- Added _start_uvicorn_with_proxy() method for token-based server startup

5. UI Settings Panel:
- Added complete A2A settings section with 6 configuration fields
- Token field is hidden (like MCP) but managed automatically
- Includes settings for port, subordinate management, and protocol options

Token-Based URL Structure:

Standard A2A URLs:
- /.well-known/agent.json
- /tasks/submit
- /message/stream

Token-Based URLs:
- /t-{token}/.well-known/agent.json
- /t-{token}/tasks/submit
- /t-{token}/message/stream

This implementation provides the same level of security and URL-based authentication that MCP uses, ensuring that A2A communications are properly authenticated via URL tokens while maintaining backward compatibility with standard A2A protocol endpoints.
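The token-based URL scheme described above can be sketched as a standalone function. This is an illustration of the `/t-{token}/endpoint` layout only; the real `_build_token_url()` in `a2a_client.py` is a method whose exact signature is not shown in this PR, so the function name and parameters here are assumptions.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def build_token_url(base_url: str, endpoint: str, url_token: Optional[str] = None) -> str:
    """Build an A2A endpoint URL, inserting /t-{token} when a token is set.

    With no token, the standard A2A path is returned unchanged, which keeps
    the client compatible with servers that do not use URL tokens.
    """
    parts = urlsplit(base_url)
    prefix = parts.path.rstrip("/")
    if url_token:
        prefix = f"{prefix}/t-{url_token}"
    path = f"{prefix}/{endpoint.lstrip('/')}"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```

For example, `build_token_url("http://localhost:8100", "/tasks/submit", "abc123")` yields `http://localhost:8100/t-abc123/tasks/submit`.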

- Subordinates inherit complete tool system from parent agent
- All tools (code execution, web search, file operations, etc.) are available
- MCP tools are properly inherited and configured
- Tool discovery and execution works identically to main agent

✅ Prompt Configuration Verified:
- Profile-based prompt system works correctly for subordinates
- Agent-specific prompts override defaults when available
- System prompts (behavior, communication, tools) are automatically loaded
- Template variables and placeholders are properly processed

✅ Architecture Integration:
- Token-based authentication system implemented
- Subordinate-to-subordinate communication via peer discovery
- Agent Management UI integration with activity drawer
- Complete flowchart workflow implemented in a2a_task_delegator.py
- Independent operation with separate context windows
- Results integration back to main conversation

…ystem as explicitly required by the user for Agent Zero framework development.

2. Identified Root Cause: The issue was that the Agent Zero Docker container was missing required Python dependencies and modules for the file handling APIs.

3. Implemented Comprehensive Solution:
- Installed Dependencies: Added essential Python packages (attrs, Flask, nest-asyncio, aiohttp) to the container
- Created Minimal Modules: Deployed lightweight file_info.py and get_work_dir_files.py modules at /a0/ in the container
- Enhanced RFC Handler: Modified /a0/python/helpers/rfc.py with fallback logic that:
  - First tries to import full module paths (python.api.file_info)
  - Falls back to simplified module names (file_info) when dependencies are missing
  - Ensures /a0 is in the Python path for module discovery

4. Verified Functionality:
- Both minimal modules work correctly in the container
- RFC fallback mechanism successfully routes calls to the appropriate modules
- File operations (browsing directories, getting file info) now function properly
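The import fallback described for the RFC handler might look something like this. It is a sketch of the three listed steps using `importlib`; `import_with_fallback` is a hypothetical name and the actual change in `rfc.py` may be structured differently.

```python
import importlib
import sys
from types import ModuleType


def import_with_fallback(full_name: str, search_root: str = "/a0") -> ModuleType:
    """Import full_name, falling back to its last path component.

    Example: 'python.api.file_info' is tried first; if that import fails,
    the simplified name 'file_info' is tried instead.
    """
    # Make sure modules deployed at the root are discoverable.
    if search_root not in sys.path:
        sys.path.insert(0, search_root)
    try:
        return importlib.import_module(full_name)
    except ImportError:
        short_name = full_name.rsplit(".", 1)[-1]
        return importlib.import_module(short_name)
```

Catching `ImportError` also covers `ModuleNotFoundError`, so the fallback triggers both when the package path is missing and when one of the full module's own dependencies cannot be imported.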

Kironkeys · 2025-08-23T05:43:10Z

this looks fire bro...im going to test it right now. I added a couple of your pulls too, they are badass appreciate it brotha fireee

TerminallyLazy added 6 commits July 23, 2025 15:18

Enhanced subordinate

5120df7

Merge branch 'feat-A2A-integration' into A2A-integration-clean

8a7391e

TerminallyLazy added 3 commits July 30, 2025 02:51
