Production-style autonomous agent framework in Python + React that plans multistep tasks, persists state with SQLite, and recovers from failures using exponential-backoff retry logic. Includes a real-time monitoring dashboard, a deterministic recovery demo, and an API for task orchestration.


MRASR — Multistep Reasoning Agent with State & Recovery

▶️ Watch the demo video on YouTube

A Python-based autonomous agent system that decomposes complex tasks into subtasks, executes them through mock agents, and provides real-time monitoring via a React dashboard.

Why it's different

Plans: Decomposes complex tasks into executable subtasks via PlanningAgent in agents/planning_agent.py:31

Remembers: Persists state across runs using SQLite backend in state/persistence.py:16 and in-memory context in state/state_manager.py:22

⚠️ Adapts (partial): Supports task pause/resume/cancel operations, but dynamic re-planning during execution is not implemented

Recovers: Exponential backoff retry logic in services/retry_handler.py:27 with configurable global retry limits

Architecture

```mermaid
flowchart LR
  UI[React Frontend] --> API[FastAPI Server/run_server.py]
  API --> ORCH[TaskOrchestrator/orchestrator.py]
  ORCH --> PLAN[PlanningAgent/agents/planning_agent.py]
  ORCH --> EXEC[ExecutionAgent/agents/execution_agent.py]
  EXEC --> MOCK[MockExecutionAgent/agents/mock_execution_agent.py]
  ORCH --> VERIFY[VerifierAgent/agents/verifier_agent.py]
  ORCH <---> STATE[(StateManager/state/state_manager.py)]
  STATE <---> PERSIST[(SQLite DB/state/persistence.py)]
  ORCH --> QUEUE[(SubtaskQueue/state/subtask_queue.py)]
  ORCH --> RETRY[RetryHandler/services/retry_handler.py]
  API --> SSE[SSE Broadcaster/services/sse_broadcaster.py]
```

Key Components:

  • State Manager: SQLite persistence (mrasr_state.db) + in-memory context with session tracking
  • Retry Handler: Exponential backoff (base 1.0s, max 300s) with ±10% jitter and global retry limits
  • Mock Agents: Demo-ready execution/verification agents using mock responses or Ollama LLM
  • Task Orchestrator: Coordinates execution pipeline with state updates and recovery
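The retry schedule described above (base 1.0s, max 300s, ±10% jitter) can be sketched in a few lines. This is an illustrative approximation, not the actual code in services/retry_handler.py:

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, max_delay: float = 300.0,
                  jitter: float = 0.10) -> float:
    """Exponential backoff: base * 2**attempt, capped at max_delay, with +/-10% jitter."""
    delay = min(base * (2 ** attempt), max_delay)
    return delay * (1 + random.uniform(-jitter, jitter))
```

With these defaults, attempts 0, 1, 2, ... yield roughly 1s, 2s, 4s, ... until the 300s cap, which matches the delays shown in the recovery demo.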

Quick Start

One-Command Startup (Recommended)

# Start everything with Docker (recommended)
# On Windows: Use Command Prompt or PowerShell, not Git Bash
make dev  # Linux/macOS
# OR for all platforms:
python safe_startup.py

# Start with local processes (if Docker unavailable)
python safe_startup.py --local

That's it! The script will:

  • ✅ Check port availability (3000, 8000)
  • ✅ Start Docker containers or local processes
  • ✅ Wait for services to be ready
  • ✅ Open browser to http://localhost:3000

Verify your setup: python scripts/verify_setup.py checks all dependencies and configuration

Prerequisites

Option A: Docker (Recommended)

  • Docker Desktop or Docker Engine
  • Docker Compose

Option B: Local Development

  • Python 3.9+
  • Node.js 16+
  • Optional: Ollama for LLM features (defaults to mock mode)

Manual Setup (if needed)

# Docker approach (Linux/macOS or with make installed)
make up          # Start services
make down        # Stop services  
make logs        # View logs
make restart     # Restart all

# Docker approach (Windows without make)
docker compose up -d        # Start services
docker compose down         # Stop services
docker compose logs -f      # View logs

# Local approach
python -m venv venv
venv\Scripts\activate  # Windows
source venv/bin/activate  # macOS/Linux
pip install -r requirements.txt

cd frontend && npm install

Health checks

Backend health: GET /health returns {"status":"healthy","service":"mrasr-api"}

Frontend: Queue Status should show connection indicator (green=SSE active, red=polling)
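A startup script or CI job can wait for the backend by polling the /health endpoint described above. A minimal stdlib-only sketch (the URL and payload come from this section; the helper itself is illustrative):

```python
import json
import time
import urllib.request

def wait_healthy(url: str = "http://localhost:8000/health", timeout: float = 30.0) -> bool:
    """Poll the health endpoint until it reports healthy or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                body = json.load(resp)
            if body.get("status") == "healthy":
                return True
        except OSError:
            pass  # server not up yet; retry
        time.sleep(1)
    return False
```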

Using the System

From the UI:

  1. Submit Task: Enter title/description in the Task Submission panel
  2. View Plan: Generated subtasks appear in Subtask Progress panel
  3. Monitor Progress: Queue Status tiles show pending/in-progress/completed/failed counts
  4. Manual Execution: Use "Run Next Subtask" button to process tasks step-by-step
  5. Recovery Demo: "Start Recovery Demo" creates a task that intentionally fails and retries
  6. Download Results: "Download Result" button generates comprehensive execution artifacts

Via API:

# Create a task
curl -X POST http://localhost:8000/api/v1/submit_task \
  -H "Content-Type: application/json" \
  -d '{"title":"Research X and produce summary","description":"Multi-step research task"}'

# Get task status + plan + steps
curl http://localhost:8000/api/v1/task_status/<TASK_ID>

# Get queue summary
curl http://localhost:8000/api/v1/queue/summary

# Stream real-time updates
curl http://localhost:8000/api/v1/stream/queue

Artifacts/Output: Task execution logs stored in SQLite mrasr_state.db. State persists across server restarts.

State location: SQLite database file mrasr_state.db in project root, no retention policy configured.

Verification: "Prove the four claims"

✅ STATUS: Core functionality verified, partial adaptation support

✅ VERIFIED: The system delivers on these capabilities

  • Plans: Task decomposition working ✅
  • Remembers: SQLite state persistence working ✅
  • Recovers: Exponential backoff retry logic working ✅
  • Adapts: Basic pause/resume working ⚠️ (dynamic re-planning not implemented)

Plans: Task Decomposition

# Submit a multi-part task
curl -X POST http://localhost:8000/api/v1/submit_task \
  -H "Content-Type: application/json" \
  -d '{"title":"Multi-step Research","description":"Research topic X, analyze results, create summary report"}'

# Expected: Response shows subtasks array with individual steps
# Check: GET /api/v1/task_status/<task_id> shows execution_order array

Remembers: State Persistence

# Run task, stop server with Ctrl+C, restart with python run_server.py
# Expected: Queue status maintains previous task states
# Check: Queue summary shows non-zero counts from previous session

Adapts: Task Control (Partial)

# Pause task mid-execution
curl -X POST http://localhost:8000/api/v1/tasks/<TASK_ID>/pause
# Resume task
curl -X POST http://localhost:8000/api/v1/tasks/<TASK_ID>/resume

# Expected: Task subtasks change status to BLOCKED/PENDING
# Note: Dynamic re-planning during execution not implemented

Recovers: Failure Handling

# Start recovery demo
curl -X POST http://localhost:8000/api/v1/demo/recovery

# Use "Run Next Subtask" repeatedly - step 2 will fail once then succeed
# Expected: Retry count increments, exponential backoff delay applied
# Check: Queue status shows failed->pending status transitions

Troubleshooting

API Connection Issues

404s on frontend: Wrong API base URL in frontend/src/api.js:3 - should be http://localhost:8000/api/v1

Queue Status red/polling: API server down or CORS issue. Check backend logs and curl http://localhost:8000/health

"Failed to refresh data" toast: Frontend polling failed. Check browser console for network errors.

CORS errors in browser:

  • Backend allows http://localhost:3000 by default
  • For custom frontend ports, update CORS origins in run_server.py:17
  • Restart backend after CORS changes

Port Conflicts

Port 8000 (backend) already in use:

# Windows
netstat -ano | findstr :8000
taskkill /PID <PID> /F

# Linux/macOS  
lsof -ti:8000 | xargs kill -9

# Or change port in run_server.py:57

Port 3000 (frontend) already in use:

# Set custom port
set PORT=3001 && npm start  # Windows
PORT=3001 npm start         # Linux/macOS

# Or edit frontend/package.json scripts

Windows-Specific Issues

Unicode/Emoji errors in startup script:

# Use Command Prompt or PowerShell, not Git Bash
# Or set encoding:
set PYTHONIOENCODING=utf-8
python safe_startup.py --local

Path issues with virtual environment:

# Use full paths on Windows if activation fails
C:\path\to\venv\Scripts\activate
# Or use Python directly:
C:\path\to\venv\Scripts\python.exe run_server.py

Docker on Windows:

  • Ensure Docker Desktop is running
  • Enable WSL2 backend if available
  • Use PowerShell as Administrator if permission issues

SSE (Server-Sent Events) Issues

Real-time updates not working:

  • Check if browser blocks SSE connections
  • Corporate firewalls may block streaming endpoints
  • Fallback: Frontend uses polling mode (red indicator)
  • Test: curl http://localhost:8000/api/v1/stream/queue

Development Environment

Missing dependencies:

# Python dependencies
pip install -r requirements.txt

# Frontend dependencies  
cd frontend && npm install

# Virtual environment not activated
# Windows:
venv\Scripts\activate
# Linux/macOS:
source venv/bin/activate

Ollama connectivity:

  • Test with python -c "import ollama; print('OK')"
  • System gracefully falls back to mock mode if unavailable
  • Set USE_MOCK_LLM=true in .env to force mock mode
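The fallback behavior described above can be approximated with a try-import pattern. This is a sketch of the documented behavior, not the project's actual selection logic:

```python
import os

def pick_llm_mode() -> str:
    """Choose 'mock' or 'ollama', mirroring the documented fallback rules."""
    if os.getenv("USE_MOCK_LLM", "false").lower() == "true":
        return "mock"        # forced via .env
    try:
        import ollama        # noqa: F401  (optional dependency)
        return "ollama"
    except ImportError:
        return "mock"        # graceful fallback when Ollama is unavailable
```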

Configuration Reference

| Variable | Required | Default | Purpose |
|----------|----------|---------|---------|
| MODEL_NAME | No | phi3:latest | Ollama model for LLM tasks |
| OLLAMA_BASE_URL | No | http://localhost:11434 | Ollama server URL |
| USE_MOCK_LLM | No | false | Force mock responses |
| MAX_GLOBAL_RETRIES | No | 10 | Global retry limit per subtask |
| RETRY_BASE_DELAY | No | 1.0 | Base retry delay in seconds |
| RETRY_MAX_DELAY | No | 300.0 | Maximum retry delay in seconds |
| LLM_TIMEOUT_SECONDS | No | 30 | LLM request timeout in seconds |
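The variables and defaults above can be loaded in one place. The names and defaults come from the table; the loader itself is an illustrative sketch, not the project's code:

```python
import os

def load_config(env=None) -> dict:
    """Read the documented environment variables, falling back to their defaults."""
    env = os.environ if env is None else env
    return {
        "model_name": env.get("MODEL_NAME", "phi3:latest"),
        "ollama_base_url": env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        "use_mock_llm": env.get("USE_MOCK_LLM", "false").lower() == "true",
        "max_global_retries": int(env.get("MAX_GLOBAL_RETRIES", "10")),
        "retry_base_delay": float(env.get("RETRY_BASE_DELAY", "1.0")),
        "retry_max_delay": float(env.get("RETRY_MAX_DELAY", "300.0")),
        "llm_timeout_seconds": int(env.get("LLM_TIMEOUT_SECONDS", "30")),
    }
```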

API Reference

| Endpoint | Method | Body | Response | Notes |
|----------|--------|------|----------|-------|
| /health | GET | - | {"status":"healthy"} | Health check |
| /api/v1/submit_task | POST | {"title":"X","description":"Y"} | Task ID + subtasks | Creates and queues task |
| /api/v1/task_status/{id} | GET | - | Task status + subtasks | Full task details |
| /api/v1/queue/summary | GET | - | {"pending":N,"in_progress":N,...} | Queue counts |
| /api/v1/run_next_subtask | POST | - | Execution result | Manual task processing |
| /api/v1/tasks/{id}/pause | POST | - | Success message | Pause task |
| /api/v1/tasks/{id}/resume | POST | - | Success message | Resume task |
| /api/v1/tasks/{id}/cancel | POST | - | Success message | Cancel task |
| /api/v1/demo/recovery | POST | - | Demo task ID | Recovery demo |
| /api/v1/stream/queue | GET | - | SSE event stream | Real-time updates |
| /api/v1/tasks/{id}/generate_artifact | POST | - | Artifact metadata | Generate execution report |
| /api/v1/artifacts/{id} | GET | - | JSON file download | Download task artifact |
| /api/v1/artifacts | GET | - | Artifacts list | List all available artifacts |
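The endpoints above can be driven from Python without extra dependencies. A minimal client sketch (paths and bodies taken from the table; the helper names are hypothetical):

```python
import json
import urllib.request

BASE = "http://localhost:8000/api/v1"

def build_request(path, payload=None):
    """Build a Request for an endpoint above; it becomes a POST when a body is given."""
    if payload is None:
        return urllib.request.Request(BASE + path)
    return urllib.request.Request(
        BASE + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

def call(path, payload=None):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(path, payload)) as resp:
        return json.load(resp)

# e.g. call("/submit_task", {"title": "X", "description": "Y"})
#      call("/queue/summary")
```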

Execution Artifacts

MRASR automatically generates comprehensive execution reports when tasks complete. These JSON artifacts contain:

Artifact Contents:

  • Task Overview: Title, description, overall status, completion metrics
  • Execution Summary: Success rates, retry statistics, timing analysis
  • Subtask Details: Individual execution results, attempt history, timestamps
  • Retry Analysis: Failure patterns, backoff progression, recovery metrics
  • Performance Metrics: Throughput, reliability scores, recovery times
  • System Information: Session data, version info, generation timestamp

Sample Artifact Structure:

{
  "artifact_metadata": {
    "task_id": "550e8400-e29b-41d4-a716-446655440000",
    "generated_at": "2025-08-21T15:30:45.123Z",
    "generator": "MRASR Artifacts Service v1.0"
  },
  "execution_summary": {
    "total_subtasks": 4,
    "success_rate": 100.0,
    "total_retries": 2,
    "total_execution_time": 145.67
  },
  "retry_analysis": {
    "total_retrying_subtasks": 1,
    "retry_distribution": { "2": 1 },
    "successful_recoveries": 1
  }
}
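Given the sample structure above, a downloaded artifact can be summarized in a couple of lines. Field names are taken from the sample; everything else is illustrative:

```python
import json

def summarize_artifact(raw: str) -> str:
    """One-line summary of an execution artifact, using the fields shown above."""
    art = json.loads(raw)
    s = art["execution_summary"]
    return (f"{s['total_subtasks']} subtasks, "
            f"{s['success_rate']:.0f}% success, "
            f"{s['total_retries']} retries in {s['total_execution_time']:.1f}s")
```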

Download: Click "Download Result" button when tasks complete, or use API endpoint /api/v1/artifacts/{task_id}

60-Second Demonstration: Watch deterministic failures, exponential backoff, and automatic recovery in action.

Demo Steps:

  1. Start services with make dev and open http://localhost:3000
  2. Click "Start Recovery Demo" - creates flaky task with predetermined failure
  3. Execute subtasks sequentially - first succeeds, second fails twice then recovers
  4. Watch retry behavior - exponential backoff (1s → 2s) with live countdown
  5. See successful recovery - third attempt succeeds, generates execution artifact
  6. Download results - comprehensive JSON report with retry analytics

What You'll Observe:

  • Deterministic failures: Second subtask always fails exactly twice
  • Exponential backoff: Delays double from 1s to 2s between the two retries
  • Real-time feedback: Live countdown timers and attempt tracking
  • State persistence: All execution data saved to SQLite
  • Comprehensive artifacts: Downloadable execution reports

Development

Tests: python -m pytest tests/ (unit tests), python test_mrasr_complete.py (integration)

Debug logging: Set DEBUG_MODE=true in .env

Repo layout:

  • agents/ - Task execution agents (planning, execution, verification, recovery)
  • api/ - FastAPI router and endpoints
  • services/ - Background processing, SSE, retry logic
  • state/ - State management and SQLite persistence
  • schemas/ - Pydantic models and core types
  • frontend/src/ - React dashboard components

License & Credits

Built by David Shableski - Senior CS/Math major focused on agentic workflows and LLM evaluation

MIT License (see LICENSE file)
