Evidence-Based Multi-Agent Development: A SAFe Framework Implementation with Claude Code
🤖 LLM Context: Get the entire repository as LLM-ready context → GitIngest
Perfect for loading this methodology into Claude, ChatGPT, or any LLM to understand the complete SAFe multi-agent workflow.
A comprehensive methodology for software development using multi-agent orchestration with Claude Code's Task tool. Based on 5 months of production experience (169 issues, 9 cycles, 2,193 commits) implementing the Scaled Agile Framework (SAFe) with AI agents.
Key Innovation: Treating AI agents like specialized team members (11 roles: BSA, System Architect, Data Engineer, Backend Dev, Frontend Dev, QAS, RTE, DevOps, Security, Technical Writer, TDM) instead of "better autocomplete."
This isn't just "AI-assisted development" - it's a fundamentally different approach to human-AI collaboration:
Equal Voice for All Contributors - Human and AI input have equal weight in technical discussions. No hierarchy, just expertise.
- ✅ Mutual Respect: All perspectives valued, regardless of source
- ✅ Shared Responsibility: Everyone owns project success
- ✅ Transparent Decision-Making: Decisions made openly with input from all
- ✅ Constructive Disagreement: Disagreement welcomed when it leads to better solutions
Why This Matters: Traditional AI tools are "assistants" - this methodology treats AI as collaborative team members with agency and expertise.
AI Agents Can Halt Work - Any agent can exercise "stop-the-line" authority for architectural or security concerns.
- 🚨 Architectural Integrity: Flag issues that compromise system design
- 🔒 Security Concerns: Highlight potential vulnerabilities
- 📊 Performance Implications: Note potential bottlenecks
- 🔧 Maintainability Issues: Identify future maintenance problems
When Exercised:
- Agent clearly explains the concern with specific examples
- Proposes alternative approaches
- Documents decision in an ADR (Architecture Decision Record)
- Updates Linear ticket with architectural discussion
Real Example: System Architect blocked a 710-line deployment script (WOR-321) due to complexity concerns, leading to a complete redesign with proper error handling and rollback capabilities.
True Multi-Agent Delegation - Claude Code's Task tool enables one agent to delegate work to another while preserving context, maintaining quality gates, and enabling parallel development.
The Innovation:
```javascript
// Agent A delegates to Agent B with full context transfer
Task({
  targetAgent: "data-engineer",
  taskDescription: "Design migration validation pipeline",
  context: {
    linearTicket: "WOR-321",
    dependencies: ["existing migration scripts", "RLS patterns"],
    acceptance: ["migration safety verified", "SQL validation queries created"]
  },
  expectedArtifacts: ["validation scripts", "safety documentation"]
})
```
What This Enables:
- 🔄 Context Transfer: Full project context, ticket requirements, and dependencies passed between agents
- 🎯 Role Specialization: Each agent operates within its specialized expertise and tool access
- ⚡ Independent Execution: Agents work autonomously without blocking each other
- ✅ Quality Gates: Multiple specialized checkpoints catch different issue types
- 📋 Evidence Trail: Complete audit trail with artifacts at each stage
Real Example (WOR-321):
```
BSA → Planning Spec (45 min)
├── Data Engineer → Schema Design (1.5 hrs)
├── Backend Dev → CI/CD Implementation (2 hrs)
├── QAS → Test Validation (1 hr)
└── RTE → Production Delivery (30 min)
```
Why This Matters: Traditional AI tools are single-threaded (Developer → AI → Code → Review). This enables parallel, specialized workflows with multiple quality gates - like having a real team, not just an assistant.
The Secret: Treating AI agents like specialized team members with clear roles, handoff protocols, and quality checkpoints - not like "better autocomplete."
"Search First, Reuse Always, Create Only When Necessary" - MANDATORY before any implementation.
4-Step Discovery Process:
- Search Specs Directory: Find similar implementations in past specs
- Search Codebase: Look for existing patterns and helpers
- Search Pattern Library: Check `patterns_library/` for reusable patterns
- Propose to System Architect: Get approval before creating new patterns
Why This Works: Prevents reinventing the wheel, ensures consistency, and builds institutional knowledge over time.
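Under the hood, the first three discovery steps are plain text search over known directories. A minimal Python sketch (the `discover` helper and the directory layout are illustrative, not part of the shipped scripts):

```python
from pathlib import Path

# Hypothetical helper: search specs, the codebase, and the pattern
# library for prior art before writing anything new.
def discover(keyword: str, roots: list[str]) -> list[str]:
    """Return files under the given roots whose contents mention `keyword`."""
    needle = keyword.lower()
    hits = []
    for root in roots:
        for path in Path(root).rglob("*"):
            if path.is_file() and needle in path.read_text(errors="ignore").lower():
                hits.append(str(path))
    return sorted(hits)

# Typical invocation before implementing:
#   discover("rate limiting", ["specs/", "src/", "patterns_library/"])
```

Only if all three searches come back empty does step 4 (proposing a new pattern to the System Architect) apply.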
Explicit Knowledge Transfer - Three tags for passing context from planning to execution:
- `#PATH_DECISION`: Documents why a particular approach was chosen over alternatives
- `#PLAN_UNCERTAINTY`: Flags assumptions that require validation during implementation
- `#EXPORT_CRITICAL`: Highlights non-negotiable requirements (security, compliance, architecture)
Example:
```
#PATH_DECISION: Chose REST over GraphQL due to existing API patterns
#PLAN_UNCERTAINTY: Assumed field is optional - verify with POPM
#EXPORT_CRITICAL: MUST use withAdminContext for all operations
```
Impact: Execution agents understand not just what to build, but why decisions were made and what cannot be compromised.
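Because the tags are plain-text markers, an execution agent or a pre-flight script can extract them mechanically from a spec. A sketch, assuming specs are plain Markdown (the `extract_tags` helper is illustrative):

```python
# The three knowledge-transfer tags used in planning specs.
TAGS = ("#PATH_DECISION", "#PLAN_UNCERTAINTY", "#EXPORT_CRITICAL")

def extract_tags(spec_text: str) -> dict[str, list[str]]:
    """Map each tag to the list of annotations found in the spec text."""
    found = {tag: [] for tag in TAGS}
    for line in spec_text.splitlines():
        stripped = line.strip()
        for tag in TAGS:
            if stripped.startswith(tag):
                # Keep the note after "TAG:" (empty string for a bare tag line).
                found[tag].append(stripped[len(tag):].lstrip(": ").strip())
    return found
```

An execution agent could, for example, refuse to start until every `#PLAN_UNCERTAINTY` entry has been resolved with the POPM.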
Single Source of Truth - Every feature starts with a comprehensive spec following SAFe hierarchy:
```
Epic (Strategic Initiative)
└── Feature (Deliverable Capability)
    ├── User Story (User-Facing Functionality)
    └── Enabler (Technical Foundation)
```
Workflow:
- BSA creates spec with acceptance criteria and testing strategy
- System Architect validates architectural approach
- Implementation agents execute with pattern discovery
- QAS validates against acceptance criteria
- Evidence attached to Linear ticket before POPM review
Key Insight: Separation of planning (BSA) from execution (developers) ensures thorough upfront thinking and consistent implementation.
Each Agent Has Specific Capabilities - Not all agents can do everything:
- Planning Agents (Opus model): BSA, System Architect - Slower but thorough
- Execution Agents (Sonnet model): Developers, Engineers - Faster implementation
- Tool Restrictions: Each agent only has access to tools needed for their role
Example: QAS (Quality Assurance) can only Read, Bash, and Grep - cannot Write or Edit code. This enforces role boundaries and prevents scope creep.
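The same boundary can be expressed as a simple allow-list check. A sketch (the role-to-tool mapping here is illustrative; in practice Claude Code enforces this through the agent definitions, not application code):

```python
# Illustrative per-role tool allow-lists, mirroring the QAS example above.
ROLE_TOOLS = {
    "qas": {"Read", "Bash", "Grep"},
    "backend-dev": {"Read", "Write", "Edit", "Bash", "Grep"},
}

def can_use(role: str, tool: str) -> bool:
    """True only if the role's allow-list includes the tool."""
    return tool in ROLE_TOOLS.get(role, set())
```

Denying by default (unknown roles get an empty set) is what prevents scope creep: an agent cannot pick up a capability it was never granted.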
All Work Requires Verifiable Evidence - No "trust me, it works":
- ✅ Test Results: All tests must pass before PR
- ✅ Screenshots: Visual proof of UI changes
- ✅ Validation Output: Command output showing success
- ✅ Session IDs: Complete audit trail of agent work
Swimlane Workflow: Backlog → Ready → In Progress → Testing → Ready for Review → Done
POPM Approval: Product Owner/Product Manager has final approval on all deliverables with full evidence trail.
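"No trust me, it works" can be enforced mechanically: a ticket only advances when its evidence payload is complete. A minimal sketch (the field names are illustrative, not the actual Linear schema):

```python
# Evidence fields required before a ticket may move to "Ready for Review".
REQUIRED_EVIDENCE = ("test_results", "validation_output", "session_id")

def ready_for_review(evidence: dict) -> bool:
    """Advance only if every required field is present and non-empty."""
    return all(evidence.get(field) for field in REQUIRED_EVIDENCE)
```

The POPM then reviews the same payload, so approval and the audit trail rest on identical artifacts.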
Iterative Problem Solving - Agents follow a clear loop until success or blocked:
- Clear Goal - BSA defines with acceptance criteria
- Pattern Discovery - Search codebase and sessions
- Iterative Problem Solving:
- Implement approach
- Run validation command
- If fails β analyze error, adjust, repeat
- If blocked β escalate to TDM with context
- Evidence Attachment - Session ID + validation results in Linear
No Over-Engineering: No file locks, circuit breakers, or arbitrary retry limits. Agents iterate until success or blocked, then escalate with context.
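The loop above is straightforward control flow: no retry caps, just iterate until validation passes or the agent judges itself blocked. A sketch (the callback names are illustrative):

```python
from typing import Callable

def execute_until_done(
    attempt: Callable[[], bool],      # implement + run validation; True on pass
    is_blocked: Callable[[], bool],   # agent judges it cannot proceed further
    escalate: Callable[[str], None],  # hand off to TDM with context
) -> bool:
    """Iterate until validation passes; escalate (rather than retry-cap) when blocked."""
    while True:
        if attempt():
            return True               # evidence can now be attached in Linear
        if is_blocked():
            escalate("validation still failing; see session log for attempts")
            return False
```

The design choice is that termination comes from the agent's own blocked judgment plus escalation, not from an arbitrary retry counter.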
| Metric | Value | Source |
|---|---|---|
| Sprint Cycles | 9 cycles (5 months) | Linear |
| Issues Completed | 169 issues | Linear API |
| Velocity Growth | 14× improvement | Cycle 3 (3) → Cycle 8 (42) |
| Commits | 2,193 commits (10.3/day) | GitHub API |
| PR Merge Rate | 90.9% (159/175) | GitHub |
| Documentation | 136 docs, 36 specs, 208 Confluence pages | Repository |
All metrics are fully verifiable. See whitepaper/data/ for validation.
- Read: Executive Summary (5 min)
- Understand: Case Studies (15 min)
- Implement: Implementation Guide (30 min)
- Assess: Limitations (10 min)
- Data Validation: Real Production Data Synthesis
- Methodology: Background & Related Work
- Meta-Circular Validation: Validation Evidence
- Future Research: Open Questions
- ROI Analysis: Executive Summary
- Risk Assessment: Limitations
- Adoption Guide: Implementation Prerequisites
- Cost-Benefit: Cost Analysis
Want to use the 11-agent system in your project? Here's how to get started in 3 steps:
- Claude Code: https://docs.anthropic.com/claude/docs/claude-code
- Augment Code: https://www.augmentcode.com/
```bash
# Clone this repository
git clone https://github.com/ByBren-LLC/WTFB-SAFe-Agentic-Workflow
cd WTFB-SAFe-Agentic-Workflow

# Install agents (choose one)
./scripts/install-prompts.sh           # For Claude Code (user install)
./scripts/install-prompts.sh --team    # For team sharing (in-project)
./scripts/install-prompts.sh --augment # For Augment Code
```
```
@bsa Create a spec for a simple "Hello World" API endpoint
```
That's it! The BSA agent will create a user story with acceptance criteria and testing strategy.
Next Steps:
- 📖 Detailed Setup: Agent Setup Guide
- ✅ Day 1 Checklist: Complete First Workflow
- 🎯 Meta-Prompts: Copy-Paste Prompts for Common Tasks
- 📚 Agent Reference: AGENTS.md - All 11 agent roles
```
WTFB-SAFe-Agentic-Workflow/
├── whitepaper/         # Complete whitepaper (12 sections, ~270KB)
│   ├── data/           # Supporting data and metrics (6 files)
│   └── validation/     # Meta-circular validation evidence (19 files)
├── specs/              # Implementation specifications
├── examples/           # Coming in v1.1
├── patterns/           # Whitepaper patterns (see also patterns_library/)
├── templates/          # Coming in v1.1
├── patterns_library/   # Existing production patterns (11 patterns)
├── agent_providers/    # Claude Code & Augment configurations
├── project_workflow/   # SAFe workflow templates
└── specs_templates/    # Specification templates
```
Download: CITATION.bib | CITATION.cff
```
Graham, J. S., & WTFB Development Team. (2025). Evidence-based multi-agent
development: A SAFe framework implementation with Claude Code [White paper].
https://github.com/ByBren-LLC/WTFB-SAFe-Agentic-Workflow
```
This is version 1.0 of an emerging methodology, not a proven standard:
- Production use: 5 months tracked (June-October 2025), 2+ years methodology evolution
- Sample size: 169 issues, 2,193 commits, single-developer validation
- Context: Single-developer setting, so multi-team scalability is not yet validated
- Not universal: Only valuable for complex/high-risk work (see Section 7)
Honest limitations documented in Section 7.
We welcome contributions:
- Patterns: Share production-tested patterns
- Case Studies: Document your implementation experience
- Research: Explore open questions from Section 10
- Improvements: Suggest methodology enhancements
See CONTRIBUTING.md for guidelines.
MIT License - See LICENSE for details.
- Website: WordsToFilmBy.com
- Email: scott@wordstofilmby.com
- Author: J. Scott Graham (cheddarfox)
- Historical Context: Evolved from Auggie's Architect Handbook
This methodology was validated by itself: 7 SAFe agents performed meta-circular validation of the whitepaper and caught critical fabricated data before publication.
See whitepaper/validation/VALIDATION-SUMMARY.md for the complete story of how the methodology prevented academic fraud by validating its own documentation.
The methodology caught its own problems. That's the proof it works.
- Executive Summary
- Introduction
- Background & Related Work
- Innovation: Subagent Communication
- Architecture & Implementation
- Case Studies
- Limitations: Honest Assessment
- Agile Retrospective Advantage
- Implementation Guide
- Future Work & Community
- Conclusion
- Appendices
Version: 1.0 (October 2025)
Status: Production-validated, academically honest, publication-ready
💡 This repository contains both the whitepaper AND the complete working template for implementing the methodology!