Production-style autonomous agent framework in Python + React that plans multistep tasks, persists state with SQLite, and recovers from failures using exponential-backoff retry logic. Includes a real-time monitoring dashboard, a deterministic recovery demo, and an API for task orchestration.


MRASR — Multistep Reasoning Agent with State & Recovery

▶️ Watch the demo video on YouTube

A Python-based autonomous agent system that decomposes complex tasks into subtasks, executes them through mock agents, and provides real-time monitoring via a React dashboard.

Why it's different

Plans: Decomposes complex tasks into executable subtasks via PlanningAgent in agents/planning_agent.py:31

Remembers: Persists state across runs using SQLite backend in state/persistence.py:16 and in-memory context in state/state_manager.py:22

⚠️ Adapts (partial): Supports task pause/resume/cancel operations, but dynamic re-planning during execution is not implemented

Recovers: Exponential backoff retry logic in services/retry_handler.py:27 with configurable global retry limits

Architecture

```mermaid
flowchart LR
  UI[React Frontend] --> API[FastAPI Server/run_server.py]
  API --> ORCH[TaskOrchestrator/orchestrator.py]
  ORCH --> PLAN[PlanningAgent/agents/planning_agent.py]
  ORCH --> EXEC[ExecutionAgent/agents/execution_agent.py]
  EXEC --> MOCK[MockExecutionAgent/agents/mock_execution_agent.py]
  ORCH --> VERIFY[VerifierAgent/agents/verifier_agent.py]
  ORCH <---> STATE[(StateManager/state/state_manager.py)]
  STATE <---> PERSIST[(SQLite DB/state/persistence.py)]
  ORCH --> QUEUE[(SubtaskQueue/state/subtask_queue.py)]
  ORCH --> RETRY[RetryHandler/services/retry_handler.py]
  API --> SSE[SSE Broadcaster/services/sse_broadcaster.py]
```

Key Components:

  • State Manager: SQLite persistence (mrasr_state.db) + in-memory context with session tracking
  • Retry Handler: Exponential backoff (base 1.0s, max 300s) with ±10% jitter and global retry limits
  • Mock Agents: Demo-ready execution/verification agents using mock responses or Ollama LLM
  • Task Orchestrator: Coordinates execution pipeline with state updates and recovery
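The retry schedule described above (base 1.0s, max 300s, ±10% jitter) can be sketched in a few lines. This is an illustrative approximation, not the actual code in services/retry_handler.py:

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, max_delay: float = 300.0,
                  jitter: float = 0.10) -> float:
    """Exponential backoff: base * 2**attempt, capped at max_delay, with +/-10% jitter."""
    delay = min(base * (2 ** attempt), max_delay)
    return delay * (1 + random.uniform(-jitter, jitter))
```

With these defaults, attempts 0, 1, 2, ... yield roughly 1s, 2s, 4s, ... until the 300s cap, which matches the delays shown in the recovery demo.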

Quick Start

One-Command Startup (Recommended)

# Start everything with Docker (recommended)
# On Windows: Use Command Prompt or PowerShell, not Git Bash
make dev  # Linux/macOS
# OR for all platforms:
python safe_startup.py

# Start with local processes (if Docker unavailable)
python safe_startup.py --local

That's it! The script will:

  • ✅ Check port availability (3000, 8000)
  • ✅ Start Docker containers or local processes
  • ✅ Wait for services to be ready
  • ✅ Open browser to http://localhost:3000

Verify your setup: python scripts/verify_setup.py checks all dependencies and configuration

Prerequisites

Option A: Docker (Recommended)

  • Docker Desktop or Docker Engine
  • Docker Compose

Option B: Local Development

  • Python 3.9+
  • Node.js 16+
  • Optional: Ollama for LLM features (defaults to mock mode)

Manual Setup (if needed)

# Docker approach (Linux/macOS or with make installed)
make up          # Start services
make down        # Stop services  
make logs        # View logs
make restart     # Restart all

# Docker approach (Windows without make)
docker compose up -d        # Start services
docker compose down         # Stop services
docker compose logs -f      # View logs

# Local approach
python -m venv venv
venv\Scripts\activate  # Windows
source venv/bin/activate  # macOS/Linux
pip install -r requirements.txt

cd frontend && npm install

Health checks

Backend health: GET /health returns {"status":"healthy","service":"mrasr-api"}

Frontend: Queue Status should show connection indicator (green=SSE active, red=polling)
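A startup script or CI job can wait for the backend by polling the /health endpoint described above. A minimal stdlib-only sketch (the URL and payload come from this section; the helper itself is illustrative):

```python
import json
import time
import urllib.request

def wait_healthy(url: str = "http://localhost:8000/health", timeout: float = 30.0) -> bool:
    """Poll the health endpoint until it reports healthy or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                body = json.load(resp)
            if body.get("status") == "healthy":
                return True
        except OSError:
            pass  # server not up yet; retry
        time.sleep(1)
    return False
```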

Using the System

From the UI:

  1. Submit Task: Enter title/description in the Task Submission panel
  2. View Plan: Generated subtasks appear in Subtask Progress panel
  3. Monitor Progress: Queue Status tiles show pending/in-progress/completed/failed counts
  4. Manual Execution: Use "Run Next Subtask" button to process tasks step-by-step
  5. Recovery Demo: "Start Recovery Demo" creates a task that intentionally fails and retries
  6. Download Results: "Download Result" button generates comprehensive execution artifacts

Via API:

# Create a task
curl -X POST http://localhost:8000/api/v1/submit_task \
  -H "Content-Type: application/json" \
  -d '{"title":"Research X and produce summary","description":"Multi-step research task"}'

# Get task status + plan + steps
curl http://localhost:8000/api/v1/task_status/<TASK_ID>

# Get queue summary
curl http://localhost:8000/api/v1/queue/summary

# Stream real-time updates
curl http://localhost:8000/api/v1/stream/queue

Artifacts/Output: Task execution logs stored in SQLite mrasr_state.db. State persists across server restarts.

State location: SQLite database file mrasr_state.db in project root, no retention policy configured.

Verification: "Prove the four claims"

✅ STATUS: Core functionality verified, partial adaptation support

✅ VERIFIED: The system delivers on these capabilities

  • Plans: Task decomposition working ✅
  • Remembers: SQLite state persistence working ✅
  • Recovers: Exponential backoff retry logic working ✅
  • Adapts: Basic pause/resume working ⚠️ (dynamic re-planning not implemented)

Plans: Task Decomposition

# Submit a multi-part task
curl -X POST http://localhost:8000/api/v1/submit_task \
  -H "Content-Type: application/json" \
  -d '{"title":"Multi-step Research","description":"Research topic X, analyze results, create summary report"}'

# Expected: Response shows subtasks array with individual steps
# Check: GET /api/v1/task_status/<task_id> shows execution_order array

Remembers: State Persistence

# Run task, stop server with Ctrl+C, restart with python run_server.py
# Expected: Queue status maintains previous task states
# Check: Queue summary shows non-zero counts from previous session

Adapts: Task Control (Partial)

# Pause task mid-execution
curl -X POST http://localhost:8000/api/v1/tasks/<TASK_ID>/pause
# Resume task
curl -X POST http://localhost:8000/api/v1/tasks/<TASK_ID>/resume

# Expected: Task subtasks change status to BLOCKED/PENDING
# Note: Dynamic re-planning during execution not implemented

Recovers: Failure Handling

# Start recovery demo
curl -X POST http://localhost:8000/api/v1/demo/recovery

# Use "Run Next Subtask" repeatedly - step 2 will fail once then succeed
# Expected: Retry count increments, exponential backoff delay applied
# Check: Queue status shows failed->pending status transitions

Troubleshooting

API Connection Issues

404s on frontend: Wrong API base URL in frontend/src/api.js:3 - should be http://localhost:8000/api/v1

Queue Status red/polling: API server down or CORS issue. Check backend logs and curl http://localhost:8000/health

"Failed to refresh data" toast: Frontend polling failed. Check browser console for network errors.

CORS errors in browser:

  • Backend allows http://localhost:3000 by default
  • For custom frontend ports, update CORS origins in run_server.py:17
  • Restart backend after CORS changes

Port Conflicts

Port 8000 (backend) already in use:

# Windows
netstat -ano | findstr :8000
taskkill /PID <PID> /F

# Linux/macOS  
lsof -ti:8000 | xargs kill -9

# Or change port in run_server.py:57

Port 3000 (frontend) already in use:

# Set custom port
set PORT=3001 && npm start  # Windows
PORT=3001 npm start         # Linux/macOS

# Or edit frontend/package.json scripts

Windows-Specific Issues

Unicode/Emoji errors in startup script:

# Use Command Prompt or PowerShell, not Git Bash
# Or set encoding:
set PYTHONIOENCODING=utf-8
python safe_startup.py --local

Path issues with virtual environment:

# Use full paths on Windows if activation fails
C:\path\to\venv\Scripts\activate
# Or use Python directly:
C:\path\to\venv\Scripts\python.exe run_server.py

Docker on Windows:

  • Ensure Docker Desktop is running
  • Enable WSL2 backend if available
  • Use PowerShell as Administrator if permission issues

SSE (Server-Sent Events) Issues

Real-time updates not working:

  • Check if browser blocks SSE connections
  • Corporate firewalls may block streaming endpoints
  • Fallback: Frontend uses polling mode (red indicator)
  • Test: curl http://localhost:8000/api/v1/stream/queue

Development Environment

Missing dependencies:

# Python dependencies
pip install -r requirements.txt

# Frontend dependencies  
cd frontend && npm install

# Virtual environment not activated
# Windows:
venv\Scripts\activate
# Linux/macOS:
source venv/bin/activate

Ollama connectivity:

  • Test with python -c "import ollama; print('OK')"
  • System gracefully falls back to mock mode if unavailable
  • Set USE_MOCK_LLM=true in .env to force mock mode
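The fallback behavior described above can be approximated with a try-import pattern. This is a sketch of the documented behavior, not the project's actual selection logic:

```python
import os

def pick_llm_mode() -> str:
    """Choose 'mock' or 'ollama', mirroring the documented fallback rules."""
    if os.getenv("USE_MOCK_LLM", "false").lower() == "true":
        return "mock"        # forced via .env
    try:
        import ollama        # noqa: F401  (optional dependency)
        return "ollama"
    except ImportError:
        return "mock"        # graceful fallback when Ollama is unavailable
```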

Configuration Reference

| Variable | Required | Default | Purpose |
|----------|----------|---------|---------|
| MODEL_NAME | No | phi3:latest | Ollama model for LLM tasks |
| OLLAMA_BASE_URL | No | http://localhost:11434 | Ollama server URL |
| USE_MOCK_LLM | No | false | Force mock responses |
| MAX_GLOBAL_RETRIES | No | 10 | Global retry limit per subtask |
| RETRY_BASE_DELAY | No | 1.0 | Base retry delay in seconds |
| RETRY_MAX_DELAY | No | 300.0 | Maximum retry delay in seconds |
| LLM_TIMEOUT_SECONDS | No | 30 | LLM request timeout in seconds |
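The variables and defaults above can be loaded in one place. The names and defaults come from the table; the loader itself is an illustrative sketch, not the project's code:

```python
import os

def load_config(env=None) -> dict:
    """Read the documented environment variables, falling back to their defaults."""
    env = os.environ if env is None else env
    return {
        "model_name": env.get("MODEL_NAME", "phi3:latest"),
        "ollama_base_url": env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        "use_mock_llm": env.get("USE_MOCK_LLM", "false").lower() == "true",
        "max_global_retries": int(env.get("MAX_GLOBAL_RETRIES", "10")),
        "retry_base_delay": float(env.get("RETRY_BASE_DELAY", "1.0")),
        "retry_max_delay": float(env.get("RETRY_MAX_DELAY", "300.0")),
        "llm_timeout_seconds": int(env.get("LLM_TIMEOUT_SECONDS", "30")),
    }
```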

API Reference

| Endpoint | Method | Body | Response | Notes |
|----------|--------|------|----------|-------|
| /health | GET | - | {"status":"healthy"} | Health check |
| /api/v1/submit_task | POST | {"title":"X","description":"Y"} | Task ID + subtasks | Creates and queues task |
| /api/v1/task_status/{id} | GET | - | Task status + subtasks | Full task details |
| /api/v1/queue/summary | GET | - | {"pending":N,"in_progress":N,...} | Queue counts |
| /api/v1/run_next_subtask | POST | - | Execution result | Manual task processing |
| /api/v1/tasks/{id}/pause | POST | - | Success message | Pause task |
| /api/v1/tasks/{id}/resume | POST | - | Success message | Resume task |
| /api/v1/tasks/{id}/cancel | POST | - | Success message | Cancel task |
| /api/v1/demo/recovery | POST | - | Demo task ID | Recovery demo |
| /api/v1/stream/queue | GET | - | SSE event stream | Real-time updates |
| /api/v1/tasks/{id}/generate_artifact | POST | - | Artifact metadata | Generate execution report |
| /api/v1/artifacts/{id} | GET | - | JSON file download | Download task artifact |
| /api/v1/artifacts | GET | - | Artifacts list | List all available artifacts |
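The endpoints above can be driven from Python without extra dependencies. A minimal client sketch (paths and bodies taken from the table; the helper names are hypothetical):

```python
import json
import urllib.request

BASE = "http://localhost:8000/api/v1"

def build_request(path, payload=None):
    """Build a Request for an endpoint above; it becomes a POST when a body is given."""
    if payload is None:
        return urllib.request.Request(BASE + path)
    return urllib.request.Request(
        BASE + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

def call(path, payload=None):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(path, payload)) as resp:
        return json.load(resp)

# e.g. call("/submit_task", {"title": "X", "description": "Y"})
#      call("/queue/summary")
```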

Execution Artifacts

MRASR automatically generates comprehensive execution reports when tasks complete. These JSON artifacts contain:

Artifact Contents:

  • Task Overview: Title, description, overall status, completion metrics
  • Execution Summary: Success rates, retry statistics, timing analysis
  • Subtask Details: Individual execution results, attempt history, timestamps
  • Retry Analysis: Failure patterns, backoff progression, recovery metrics
  • Performance Metrics: Throughput, reliability scores, recovery times
  • System Information: Session data, version info, generation timestamp

Sample Artifact Structure:

{
  "artifact_metadata": {
    "task_id": "550e8400-e29b-41d4-a716-446655440000",
    "generated_at": "2025-08-21T15:30:45.123Z",
    "generator": "MRASR Artifacts Service v1.0"
  },
  "execution_summary": {
    "total_subtasks": 4,
    "success_rate": 100.0,
    "total_retries": 2,
    "total_execution_time": 145.67
  },
  "retry_analysis": {
    "total_retrying_subtasks": 1,
    "retry_distribution": { "2": 1 },
    "successful_recoveries": 1
  }
}
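Given the sample structure above, a downloaded artifact can be summarized in a couple of lines. Field names are taken from the sample; everything else is illustrative:

```python
import json

def summarize_artifact(raw: str) -> str:
    """One-line summary of an execution artifact, using the fields shown above."""
    art = json.loads(raw)
    s = art["execution_summary"]
    return (f"{s['total_subtasks']} subtasks, "
            f"{s['success_rate']:.0f}% success, "
            f"{s['total_retries']} retries in {s['total_execution_time']:.1f}s")
```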

Download: Click "Download Result" button when tasks complete, or use API endpoint /api/v1/artifacts/{task_id}

60-Second Demonstration: Watch deterministic failures, exponential backoff, and automatic recovery in action.

Demo Steps:

  1. Start services with make dev and open http://localhost:3000
  2. Click "Start Recovery Demo" - creates flaky task with predetermined failure
  3. Execute subtasks sequentially - first succeeds, second fails twice then recovers
  4. Watch retry behavior - exponential backoff (1s → 2s) with live countdown
  5. See successful recovery - third attempt succeeds, generates execution artifact
  6. Download results - comprehensive JSON report with retry analytics

What You'll Observe:

  • Deterministic failures: Second subtask always fails exactly twice
  • Exponential backoff: Delays double from 1s to 2s between the two retries
  • Real-time feedback: Live countdown timers and attempt tracking
  • State persistence: All execution data saved to SQLite
  • Comprehensive artifacts: Downloadable execution reports

Development

Tests: python -m pytest tests/ (unit tests), python test_mrasr_complete.py (integration)

Debug logging: Set DEBUG_MODE=true in .env

Repo layout:

  • agents/ - Task execution agents (planning, execution, verification, recovery)
  • api/ - FastAPI router and endpoints
  • services/ - Background processing, SSE, retry logic
  • state/ - State management and SQLite persistence
  • schemas/ - Pydantic models and core types
  • frontend/src/ - React dashboard components

License & Credits

Built by David Shableski - Senior CS/Math major focused on agentic workflows and LLM evaluation

MIT License (see LICENSE file)
