Critical fixes:
- Add configurable retry_prompt parameter to Reflector class (English default)
- Add configurable retry_prompt parameter to Curator class (English default)
- Replace hardcoded Chinese retry prompts with configurable system
- All three roles (Generator, Reflector, Curator) now consistent
- Update .gitignore to exclude checkpoint and evaluation result JSON files

This completes the refactoring started in commit 087e2ed, where we fixed Generator's Chinese prompt. Now all three ACE roles use the same pattern.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
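The configurable retry-prompt pattern described in the commit above could be sketched as follows. Only the `retry_prompt` parameter name comes from the commit message; the default prompt text and the class shape are illustrative assumptions, not the actual ACE implementation.

```python
# Sketch of a configurable retry prompt with an English default.
# DEFAULT_RETRY_PROMPT and the Reflector shape are illustrative only.
DEFAULT_RETRY_PROMPT = (
    "Your previous response was not valid JSON. "
    "Please answer again, returning only valid JSON."
)


class Reflector:
    def __init__(self, retry_prompt: str = DEFAULT_RETRY_PROMPT):
        # English default; callers can localize or customize it.
        self.retry_prompt = retry_prompt
```

The same constructor parameter would then be repeated on Generator and Curator so all three roles stay consistent.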
Update 4 example files to use prompts_v2_1 instead of deprecated prompts_v2:
- examples/helicone/convex_training.py
- examples/advanced_prompts_v2.py
- examples/helicone/offline_training_replay.py
- examples/browser-use/ace_domain_checker.py

Note: compare_v1_v2_prompts.py and compare_v2_v2_1_prompts.py intentionally keep prompts_v2 imports since they explicitly compare prompt versions.

Examples now demonstrate current best practices.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Add comprehensive documentation for recent improvements:
- Configurable retry_prompt parameter (Generator, Reflector, Curator)
- Checkpoint saving during training (checkpoint_interval, checkpoint_dir)
- Prompt version guidance (v1.0 simple, v2.0 deprecated, v2.1 recommended)
- Feature detection utilities (ace/features.py)
- Updated test coverage section (mention integration tests)

Also update module structure to reflect:
- prompts_v2.py marked as DEPRECATED
- prompts_v2_1.py marked as RECOMMENDED
- New features.py module

CLAUDE.md now serves as a complete developer reference.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Fix Python version requirement in SETUP_GUIDE.md (3.9 → 3.11)
- Fix async test decorator in test_litellm_client.py
- Export DataLoader from benchmarks/loaders for API consistency
- Update examples to use recommended prompts_v2_1 instead of deprecated prompts_v2
- Remove unnecessary sys.path manipulation from all example files

All 79 tests passing. Resolves critical documentation inconsistencies and improves code quality across the examples and test suite.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Simplified README quickstart from 55 lines to ~35 lines
- Added built-in SimpleEnvironment class to make getting started easy
- Removed need for a custom environment class in the quickstart
- Made the quickstart more progressive: basic usage → learning
- Added SimpleEnvironment to ace exports
- Added links to full examples for users who want more

The new quickstart is much more approachable for beginners while still showing the core value of ACE (learning from examples).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Updates to core ACE components:
- Enhanced delta operations and playbook functionality
- Improved prompts v2.1 with better role implementations
- Updated browser automation examples for domain checking

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
feat: Enhance browser-use demos and core ACE framework
- Added new demo section showcasing ACE vs baseline browser automation
- Includes performance metrics: 30% → 100% success rate, 38.8 → 6.9 avg steps
- Added demo results image with detailed comparison data
- Shows ACE's autonomous learning and optimization capabilities

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Fixes a JSON serialization error when Sample objects are passed via kwargs to LLM completion calls. The 'sample' parameter is used by ReplayGenerator but cannot be serialized when LiteLLM attempts to log it to Opik tracing.

Changes:
- Generator._generate_impl(): Filter 'sample' from kwargs before llm.complete()
- Reflector._reflect_impl(): Filter 'sample' from kwargs before llm.complete()
- Curator.curate(): Filter 'sample' from kwargs before llm.complete()

This preserves ReplayGenerator functionality while preventing serialization errors when Opik observability is enabled. Based on LiteLLM best practices for handling custom metadata in kwargs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
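The filtering described in the commit above amounts to dropping non-serializable keys before forwarding kwargs to the LLM client. A minimal sketch of the pattern, assuming a helper name of our own invention (`filter_unserializable` is not from the ACE codebase):

```python
def filter_unserializable(kwargs: dict, keys=("sample",)) -> dict:
    """Drop keys (e.g. the ReplayGenerator 'sample' object) that downstream
    tracing such as Opik-via-LiteLLM cannot JSON-serialize."""
    return {k: v for k, v in kwargs.items() if k not in keys}


# Usage inside a role method, before calling the LLM:
# response = self.llm.complete(prompt, **filter_unserializable(kwargs))
```

The commit applies this same filter in all three roles so the tracing path only ever sees JSON-safe values.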
- Allows manual triggering of tests via GitHub UI or CLI
- Helps diagnose why automatic workflow triggers stopped after Oct 22
- Updates workflow registration with GitHub Actions
- Add proper type casts and annotations in delta.py
- Fix missing Any import in adaptation.py
- Add proper Optional type handling for playbook.py
- Fix None return types in prompts_v2.py and prompts_v2_1.py
- Fix optional dependency type annotations across all modules
- Add TYPE_CHECKING guards for conditional imports
- Fix decorator signature inconsistencies in roles.py
- Resolve dict.get() type issues
- Fix Router type assignments in litellm_client.py

All 46 mypy errors have been addressed.
- Fix no-redef errors by declaring type annotations before assignments
- Add missing List import in prompts_v2_1.py
- Fix Dict[str, Any] type annotation for the comparisons dict
- Add proper cast for int() in playbook.py
- Add Optional[Any] type annotation for router in litellm_client.py
- Use type: ignore[assignment] for conditional type assignments

All mypy errors should now be resolved.
- Check if OpikLogger is None before calling the constructor
- Add type: ignore[misc] for the instantiation
- Ensures mypy passes with all optional dependency scenarios

mypy now reports: Success: no issues found in 16 source files
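The None-guard described in the commit above is the standard optional-dependency pattern. A sketch under stated assumptions — `opik_tracing` is a hypothetical module name, and `make_logger` is an illustrative helper, neither taken from the real codebase:

```python
try:
    from opik_tracing import OpikLogger  # hypothetical optional dependency
except ImportError:
    OpikLogger = None  # type: ignore[assignment, misc]


def make_logger():
    # Guard against None before instantiating, so both mypy and runtime
    # behave correctly when the optional package is not installed.
    if OpikLogger is None:
        return None
    return OpikLogger()  # type: ignore[misc]
</imports>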
Set up automated code quality checks with pre-commit:
- Added pre-commit dependency to dev requirements
- Created .pre-commit-config.yaml with Black (formatter) and Mypy (type checker)
- Added Black and Mypy configuration to pyproject.toml
- Formatted all Python files with Black (42 files reformatted)

Pre-commit hooks now automatically:
- Format code with Black on every commit
- Type-check with Mypy (checking the ace/ directory only)

This ensures consistent code style and catches type errors before they reach CI/CD.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
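A `.pre-commit-config.yaml` along these lines would implement the hooks described; the `rev` pins are placeholders and the project's actual file may differ:

```yaml
repos:
  - repo: https://github.com/psf/black
    rev: 24.8.0        # placeholder; pin to the version the project uses
    hooks:
      - id: black
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.11.2       # placeholder pin
    hooks:
      - id: mypy
        files: ^ace/   # type-check the ace/ directory only
```

After `pre-commit install`, both hooks run automatically on every commit.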
- Rename common.py → shared.py with enhanced docs
- Rename utils.py → debug.py for clarity
- Create form-filler/form_utils.py for consistency
- Update all imports across examples
- Add template function documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add workflow diagram to README showing ACE data flow
- Simplify folder structure documentation
- Enhance TEMPLATE.py with better error handling and output capture
- Fix method name in ace_form_filler.py (to_file → save_to_file)
- Reduce test domains to 2 for faster testing

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- baseline-grocery-price-comparison.py: Full 3-store comparison (Migros, Coop, Aldi)
- test-baseline-grocery-price-comparison.py: Single-store test version (Migros only)

Features:
- Automated grocery shopping for 5 essential items across Swiss stores
- Price comparison with basket totals and item details
- Performance metrics tracking (steps, browser-use tokens)
- Regex parsing for structured agent output
- Console-only output following the domain-checker demo pattern
- Anthropic Claude 4.5 model integration

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
… raw logs passing
Changes:
- Pass only raw browser-use logs to the reflector (no analysis/metrics)
- Clean up execution log collection (remove commentary)
- Increase max_tokens to 8192 for all ACE roles (Generator, Reflector, Curator)
- Fix AttributeError: bullet.helpful_count → bullet.helpful
- Prevents JSON truncation errors with large browser automation logs

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Clean up online shopping demo by removing obsolete example files.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Remove old migros-specific demo files
- Add new consolidated ace-online-shopping.py and baseline-online-shopping.py demos
- Include results screenshot showing performance comparison

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- ACEBrowserUse → ACEAgent (actual class name)
- SimpleAgent → ACELiteLLM (actual class name)
- Add Out-of-Box Integrations section to README
- Remove broken OUT_OF_BOX_INTEGRATIONS.md link

Files updated:
- docs/INTEGRATION_GUIDE.md (2 fixes)
- docs/INTEGRATION_PATTERNS.md (1 fix + broken link)
- ace/integrations/base.py (2 fixes)
- ace/integrations/litellm.py (2 fixes)
- README.md (new section showcasing all 3 integrations)
Update all documentation to use "ACEAgent (browser-use)" naming for clarity. This makes the purpose immediately clear alongside ACELiteLLM and ACELangChain.

Changes:
- README.md: Updated out-of-box integrations section
- INTEGRATION_GUIDE.md: Updated decision tree and references
- INTEGRATION_PATTERNS.md: Updated see-also section
- ace/integrations/base.py: Updated module docstring
- ace/integrations/litellm.py: Updated class docstring references

No code changes, only documentation clarifications.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Add navigation READMEs to improve example discoverability:
- examples/README.md: Central index of all examples
- examples/starter-templates/README.md: Copy-paste templates guide
- examples/prompts/README.md: Prompt comparison guide

Update documentation to link to examples:
- README.md: Add specific example links in Documentation section
- INTEGRATION_GUIDE.md: Add Runnable Examples section

This solves the discoverability problem - users can now easily find the right example to adapt for their use case.

Changes:
- New: examples/README.md (navigation hub)
- New: examples/starter-templates/README.md
- New: examples/prompts/README.md
- Updated: README.md (Documentation section)
- Updated: docs/INTEGRATION_GUIDE.md (added Examples section)

Total: 3 new files, 2 updated files (~180 lines total)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Update seahorse_emoji_ace.py to use ACELiteLLM integration
- Remove deprecated starter templates (langchain, ollama)
- Update quickstart_litellm.py with modern ACELiteLLM approach
- Add new litellm/ and ollama/ example directories
- Align examples with v0.4+ integration patterns

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Delete 5 internal planning files (~2,500 lines)
  - TODO.md, DEMO_TODO.md, ACE_ROADMAP.md
  - FINAL_ACE_SUMMARY.md, ACE_V2_1_IMPROVEMENTS.md
  - Agents.md (duplicate of QUICK_START)
- Simplify core documentation (-549 lines net)
  - COMPLETE_GUIDE_TO_ACE: 360 → 196 lines (-46%)
  - QUICK_START: 296 → 207 lines (-30%)
  - SETUP_GUIDE: 464 → 246 lines (-47%)
  - TESTING_GUIDE: 629 → 413 lines (-34%)
- Merge INTEGRATION_PATTERNS into INTEGRATION_GUIDE
  - Add 8 detailed integration patterns
  - Remove duplicate section
  - Update cross-references
- Enhance API documentation
  - Add integrations section (ACELiteLLM, ACEAgent, ACELangChain)
  - Update prompts_v2 → prompts_v2_1 references
  - Mark v2.0 prompts as deprecated
  - Add version comparison table
- Fix broken cross-references
  - Update 3 links to deleted files
  - Verify all example file references

Total: -626 lines, cleaner structure, up-to-date content
- Delete RELEASE_NOTES.md (outdated, duplicates CHANGELOG)
- Delete generated artifacts (hn_expert.json, custom_agent_learned.json, ace_example_output.log)
- Delete research folders (comparison_analysis/, other_ace_repos/)
- Update .gitignore: add *.log pattern to prevent future commits
- Add ace/integrations/ module documentation (key pattern)
- Clarify dual architecture: Full ACE vs Integration Pattern
- Add TOON format context (16-62% token savings)
- Document pre-commit hooks (Black + MyPy auto-run)
- Add concrete benchmark command examples
- Streamline prompt version guidance (v1.0 vs v2.1)
- Remove deprecated v2.0 references
- Delete empty research folders (comparative_analysis, real_ace_analysis, true_ace_comparison)
- Delete example artifact files (learned playbooks, checkpoints)
- Delete internal dev guide (rework-demo-guide.md)
- Clean build artifacts (.pyc, __pycache__, .DS_Store)

All changes properly documented in CHANGELOG.md [Unreleased] section.
- ACELiteLLM and ACELangChain integrations
- Integration exports from ace package root
- Documentation cleanup and examples reorganization
- SimpleAgent renamed to ACELiteLLM
- Update Quick Start to use ACELiteLLM (17 lines vs 52)
- Remove internal cleanup items from CHANGELOG
- Focus on user-facing features only
Matches common LangChain prompt template patterns
- Remove remaining README.md from starter-templates directory
- Remove quickstart_litellm.py (replaced by litellm/ examples)
- Complete example restructuring aligned with v0.5+ patterns

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Update image reference to point to the correct location in the domain-checker folder.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Restore LMstudio starter template with updated configuration
- Add comprehensive README with setup and troubleshooting guide
- Include LM Studio integration examples for ACE framework

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Add comprehensive, production-ready agent examples using local Ollama models:

1. Code Review Agent
   - Reviews code for bugs, security issues, and best practices
   - SQL injection, resource leaks, exception handling
   - Learns team-specific coding patterns

2. Data Analysis Agent
   - Analyzes data and generates actionable insights
   - Sales trends, anomaly detection, business metrics
   - Learns domain-specific analysis patterns

3. SQL Query Generator
   - Natural language to SQL translation
   - Complex joins, aggregations, subqueries
   - Learns database-specific query patterns

4. Troubleshooting Assistant
   - Diagnoses system issues
   - Memory leaks, performance issues, network problems
   - Learns environment-specific issues

5. Technical Writer Agent
   - Converts code to documentation
   - API docs, README files, changelogs
   - Learns company documentation style

Features:
- Each agent includes 6+ training samples
- Custom TaskEnvironment for domain-specific evaluation
- Before/after learning comparisons
- Real-world test cases
- Persistent playbooks for knowledge reuse
- Comprehensive README with setup and best practices

All examples use ACE learning to improve over time, demonstrating:
- Offline training with evaluation
- Learned strategy persistence
- Model recommendations (qwen2.5:7b, llama3.1:8b)
- Production-ready error handling

Run all demos: uv run python examples/ollama/run_all_demos.py
Add 5 additional production-ready agent examples using Ollama:

**New Agents:**

1. Test Case Generator (test_generator_agent.py)
   - Generates unit tests
   - Edge case detection
   - Pytest patterns
   - Mocking strategies
   - Learns team testing conventions

2. Email/Ticket Classifier (email_classifier_agent.py)
   - Support automation
   - Priority classification
   - Department routing
   - Intent recognition
   - Learns routing rules

3. Bug Report Analyzer (bug_report_agent.py)
   - Issue triage
   - Severity classification
   - Component assignment
   - Duplicate detection
   - Required information extraction

4. Git Commit Message Generator (commit_message_agent.py)
   - Conventional commits
   - Semantic versioning
   - Scope detection
   - Breaking change identification
   - Learns project conventions

5. Security Log Analyzer (security_log_agent.py)
   - Threat detection
   - Attack pattern recognition
   - False positive reduction
   - Incident severity
   - Response procedures

**Updates:**
- README now organized by category (Dev/Ops/Data/Security/Docs)
- run_all_demos.py updated to run all 10 agents
- All agents include 6-7 training samples
- Custom evaluation environments for each domain
- Real-world test cases demonstrating practical usage

**Total Agent Collection:**
- 10 production-ready agents across 5 domains
- Comprehensive examples for different industries
- ACE learning patterns for various use cases
- Complete with persistent playbooks

Run all: uv run python examples/ollama/run_all_demos.py
Reviews code for security vulnerabilities and best practices:
- SQL injection detection
- Resource leak identification
- Exception handling analysis
- Mutable default arguments
- XSS vulnerabilities
- Race conditions

Learns: Team-specific coding patterns and security standards
Analyzes data and generates actionable insights:
- Sales trend analysis
- Anomaly detection
- User engagement metrics
- System performance monitoring
- Customer satisfaction patterns

Learns: What types of insights are valuable for different data domains
Translates natural language to SQL queries:
- Complex joins and aggregations
- Subqueries and window functions
- Query optimization patterns
- Database-specific syntax

Learns: Schema-specific patterns and business query requirements
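The core of such a schema-aware agent is the prompt it assembles before calling the model. A minimal sketch — the helper name, prompt wording, and example schema are all illustrative, not taken from the ACE examples:

```python
def build_sql_prompt(question: str, schema: str) -> str:
    """Assemble a schema-grounded NL-to-SQL prompt (illustrative only)."""
    return (
        "You translate natural language into SQL.\n"
        f"Schema:\n{schema}\n"
        f"Question: {question}\n"
        "Return only the SQL query."
    )


# Hypothetical usage with a made-up schema:
prompt = build_sql_prompt(
    "Total revenue per region last quarter",
    "orders(id, region, amount, created_at)",
)
```

Including the schema in every prompt is what lets the agent learn database-specific patterns instead of generic SQL.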
Diagnoses system issues from logs and symptoms:
- Memory leaks and resource exhaustion
- Network timeouts and connectivity issues
- Configuration problems
- Performance bottlenecks

Learns: Environment-specific issues and resolution patterns
Converts technical content to clear documentation:
- API documentation
- README files
- Changelog entries
- Configuration guides
- Tutorial introductions

Learns: Company documentation style and best practices
Development & DevOps (5 agents)
- 🔍 Code Review Agent - Security vulnerabilities and best practices
- 🧪 Test Case Generator ⭐ NEW - Comprehensive unit test generation
- 🗄️ SQL Query Generator - Natural language to SQL
- 📝 Git Commit Message Generator ⭐ NEW - Conventional commit messages
- 🔧 Troubleshooting Assistant - System diagnostics

Operations & Support (2 agents)
- 📧 Email/Ticket Classifier ⭐ NEW - Support ticket automation
- 🐛 Bug Report Analyzer ⭐ NEW - Issue triage and severity

Data & Analytics (1 agent)
- 📊 Data Analysis Agent - Data insights and patterns

Security (1 agent)
- 🔐 Security Log Analyzer ⭐ NEW - Threat detection and response

Documentation (1 agent)
- 📝 Technical Writer Agent - Code to documentation
📊 What Each New Agent Does
Learns:
Training: 6 code samples covering various scenarios
Learns:
Training: 7 real support tickets
Learns:
Training: 6 bug reports with various severities
Learns:
Training: 6 code diffs with commits
Learns:
Training: 7 security log scenarios
🎯 Quick Start
```shell
# Run individual agent
uv run python examples/ollama/test_generator_agent.py
uv run python examples/ollama/email_classifier_agent.py
uv run python examples/ollama/bug_report_agent.py
uv run python examples/ollama/commit_message_agent.py
uv run python examples/ollama/security_log_agent.py

# Or run all 10 agents sequentially (20-30 minutes)
uv run python examples/ollama/run_all_demos.py
```