
MATLAB / Simulink Real-Time Troubleshooter

πŸ† 3rd Place Winner - IIT Mandi Deep Learning Hackathon 2025
A Multi-Agent Hierarchical Corrective RAG Architecture for MATLAB & Simulink Documentation

Python 3.8+ Redis FAISS License: MIT

🌟 Overview

This repository contains a complete hierarchical corrective multi-agent Retrieval-Augmented Generation (RAG) stack designed to ingest MathWorks documentation and power an interactive troubleshooting assistant for MATLAB and Simulink users. The system provides full chain-of-thought transparency at every step of the reasoning process.

What makes this project unique:

  • Multi-Agent Architecture: Four distinct LLM calls orchestrated in a hierarchical pipeline
  • Explicit Chain-of-Thought: Full reasoning transparency at each processing stage
  • Novel Chunk Re-Ranking: LLM-driven semantic evaluation with explicit reasoning
  • Verification System: Adaptive verification thresholds with retry logic
  • Dual-Tier Memory: Short-term context and long-term Redis-backed memory

πŸ— Architecture

Architecture Diagram

4-Stage LLM Orchestration Pipeline

  1. Planner Agent (Llama-3-8B-8192)

    • Analyzes query to determine optimal k value for retrieval
    • Extracts salient domain-specific keywords
    • Filters off-topic questions outside MATLAB/Simulink domain
    • Produces planning chain-of-thought reasoning
  2. Chunk Re-Ranking Agent (Llama-3-8B-8192)

    • Re-scores vector-retrieved chunks for semantic relevance
    • Provides explicit reasoning for each ranking decision
    • Filters out irrelevant chunks despite vector similarity
  3. Writer Agent (DeepSeek-R1-Distill-Llama-70B)

    • Produces structured responses with three distinct sections:
      • THOUGHT: Internal reasoning summary
      • ACTION: Concrete troubleshooting steps
      • EVIDENCE: Citations with documentation links
  4. Verifier Agent (Llama-3.1-8B-Instant)

    • Validates writer output against original query
    • Ensures factual grounding in retrieved documentation
    • Implements adaptive verification thresholds with retry logic
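
To make the flow concrete, here is a minimal sketch of how the four stages could be chained with the Groq client. The prompts, the retrieve_chunks() stub, and the verdict handling are illustrative assumptions, not the repository's actual code:

import os
from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

def ask(model, prompt, temperature=0.0):
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return resp.choices[0].message.content

def retrieve_chunks(query, k):
    return "..."  # stands in for the FAISS search in retrieval.py

def answer(query):
    # Stage 1 -- Planner: choose k, extract keywords, reject off-topic queries.
    plan = ask("llama3-8b-8192", f"Plan retrieval for: {query}")
    # Stage 2 -- Re-ranker: LLM re-scores the vector-retrieved chunks.
    ranked = ask("llama3-8b-8192",
                 f"Re-rank these chunks for '{query}':\n{retrieve_chunks(query, 8)}")
    # Stage 3 -- Writer: structured THOUGHT/ACTION/EVIDENCE draft.
    draft = ask("deepseek-r1-distill-llama-70b",
                f"Answer '{query}' using only:\n{ranked}", temperature=0.3)
    # Stage 4 -- Verifier: a failing verdict triggers the adaptive retry loop.
    verdict = ask("llama-3.1-8b-instant", f"Is this grounded for '{query}'?\n{draft}")
    return draft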

Memory & Caching Architecture

  • Short-Term Memory: In-process deque of recent conversation turns
  • Long-Term Memory: Redis-backed storage with TTL-based eviction
  • Response Cache: Fast-path Redis cache (15-minute TTL) for frequent queries

Vector Search System

  • Embeddings: E5-small-v2 transformer model
  • Index: FAISS-HNSW (Hierarchical Navigable Small World)
  • Customizable Parameters: M (graph connectivity), efConstruction, efSearch
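
A minimal sketch of how such an index could be built and queried (E5 models expect "query: " and "passage: " prefixes; the sample passage and k are placeholders, while M, efConstruction, and efSearch mirror the defaults listed under Configuration Parameters):

import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/e5-small-v2")
passages = ["passage: Simulink reports an algebraic loop when ..."]
emb = model.encode(passages, normalize_embeddings=True)  # unit norm: inner product = cosine

dim = emb.shape[1]  # 384 for e5-small-v2
index = faiss.IndexHNSWFlat(dim, 32, faiss.METRIC_INNER_PRODUCT)  # M = 32
index.hnsw.efConstruction = 64
index.hnsw.efSearch = 128
index.add(emb.astype(np.float32))

query = model.encode(["query: how do I break an algebraic loop?"],
                     normalize_embeddings=True).astype(np.float32)
scores, ids = index.search(query, 4)  # k = 4 for a simple query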

🔬 Technical Details

Data Processing Pipeline

  1. Document Cleaning (prep_matlab_docs.py)

    • Removes boilerplate content and navigation elements
    • Deduplicates sentences and normalizes whitespace
    • Extracts MathWorks-specific technical tags and error codes
    • Applies language detection to filter non-English content
    • Outputs JSON with content hash for deduplication
  2. Document Chunking (chunk_docs.py)

    • Tokenizes content using model-specific tokenizers
    • Implements window-chunking with 128-token segments (see the sketch after this list)
    • Applies 32-token stride for overlapping contexts
    • Manages token counts with either HuggingFace tokenizers or tiktoken
    • Preserves metadata including URLs and technical tags
  3. Index Building (build_index.py)

    • Embeds chunks with E5-small-v2 model
    • Constructs FAISS-HNSW index with configurable parameters
    • Implements normalization for cosine similarity
    • Supports both CPU and GPU acceleration
    • Saves embedding cache for rapid rebuilding
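
As noted in step 2, chunking is plain window slicing over token IDs. A minimal sketch, assuming a HuggingFace tokenizer (the tokenizer name and the reading of stride as the overlap size are assumptions):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("intfloat/e5-small-v2")

def chunk_text(text, chunk_size=128, stride=32):
    ids = tokenizer.encode(text, add_special_tokens=False)
    step = chunk_size - stride  # advance 96 tokens, keep a 32-token overlap
    chunks = []
    for start in range(0, max(len(ids) - stride, 1), step):
        chunks.append(tokenizer.decode(ids[start:start + chunk_size]))
    return chunks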

LLM Chain Orchestration

  1. Query Planning (planner_agent.py)

    • Implements few-shot examples for consistent output schema
    • Balances k between 4 (simple queries) and 8+ (complex queries)
    • Provides public and private reasoning chains
    • Handles off-topic detection with explicit rejection
    • Re-ranks chunks with explicit reasoning for each decision
  2. Response Generation (writer_agent.py)

    • Structured output with THOUGHT/ACTION/EVIDENCE sections
    • Token-based streaming for responsive UI
    • Explicit section markers for reliable parsing
    • Numbered evidence with source URL citations
    • Temperature controls for creativity vs. precision balance
  3. Response Verification (verifier_agent.py)

    • Iterative verification with increasing leniency
    • Timeout-based retry logic (minimum 5 iterations or 60 seconds)
    • JSON-structured verification decision and reasoning
    • Graceful fallback to best attempt if no verification succeeds
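
A minimal sketch of the verification loop, assuming the verifier returns a JSON verdict; the threshold schedule and the exact reading of the "5 iterations or 60 seconds" rule are illustrative:

import json
import time

def verify_with_retries(query, draft_fn, verify_fn, min_iters=5, budget_s=60.0):
    start, attempt = time.monotonic(), 0
    best, best_score = None, -1.0
    # Retry while either the iteration floor or the time budget is unmet.
    while attempt < min_iters or time.monotonic() - start < budget_s:
        attempt += 1
        draft = draft_fn(query)
        verdict = json.loads(verify_fn(query, draft))     # {"pass": ..., "score": ...}
        threshold = max(0.9 - 0.1 * (attempt - 1), 0.5)   # increasingly lenient
        if verdict["score"] >= threshold:
            return draft, verdict
        if verdict["score"] > best_score:
            best, best_score = draft, verdict["score"]
    return best, {"pass": False, "score": best_score}     # graceful fallback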

Memory & Caching System

  1. Short-Term Memory (memory.py)

    • Configurable maximum turn count (default: 10)
    • In-process deque with O(1) operations
    • Role-based storage (user/assistant)
    • Timestamp tracking for recency
  2. Long-Term Memory (memory.py)

    • Redis sorted set scored by timestamp
    • TTL-based automatic eviction (default: 400 minutes)
    • Size-capped storage (default: 1000 entries maximum)
    • Content-hashed for deduplication (see the sketch after this list)
  3. Response Cache (cache.py)

    • Query-hashed lookup system
    • 15-minute TTL for freshness
    • Graceful fallback to in-memory cache on Redis failure
    • Minimal serialization overhead with direct JSON storage
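
A minimal sketch of the long-term memory store described in item 2, using redis-py; the key name and entry format are assumptions:

import hashlib
import time
import redis

r = redis.Redis(host="localhost", port=6379, db=1)
LTM_KEY = "ltm:turns"  # hypothetical key name
TTL_SECONDS = 24_000   # 400 minutes
MAX_ENTRIES = 1000

def remember(text):
    digest = hashlib.sha256(text.encode()).hexdigest()[:16]  # content hash for dedup
    r.zadd(LTM_KEY, {f"{digest}:{text}": time.time()})       # score = timestamp
    r.zremrangebyrank(LTM_KEY, 0, -(MAX_ENTRIES + 1))        # drop oldest beyond cap
    r.expire(LTM_KEY, TTL_SECONDS)                           # TTL-based eviction

def recall(n=5):
    return r.zrevrange(LTM_KEY, 0, n - 1)                    # most recent first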

Frontend & UI

  1. Gradio Interface (frontend.py), with a minimal launch sketch after this list

    • Responsive chat-like interface with custom CSS
    • Collapsible panels for memory, chain-of-thought, and detailed logs
    • JSON logging for complete transparency
    • Memory and cache management controls
    • Section parsing and display formatting
  2. CLI Interface (chatbot_dep.py)

    • Terminal-based interactive chat
    • Complete pipeline visibility with stage-by-stage output
    • Raw JSON diagnostic output
    • Memory query intent detection
  3. Batch Evaluation (batch_chatbot_demo.py)

    • Automated testing of predefined queries
    • Structured output formatting
    • Performance logging
    • Results aggregation in text format
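
For the Gradio interface, a minimal launch sketch; answer() stands in for the pipeline entry point in chatbot_dep.py and is an assumption:

import gradio as gr

def respond(message, history):
    return answer(message)  # hypothetical pipeline entry point

demo = gr.ChatInterface(respond, title="MATLAB/Simulink Troubleshooter")
demo.launch()  # pass share=True for a public demo link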

📂 Directory Structure

/ (root)
├── agents/                        # LLM agent implementations
│   ├── planner_agent.py           # Query planning, chunk scoring
│   ├── writer_agent.py            # Response generation
│   └── verifier_agent.py          # Solution verification
│
├── data preprocessing/            # Preprocessing pipeline
│   ├── prep_matlab_docs.py        # Document cleaning, tag extraction
│   └── chunk_docs.py              # Tokenization, chunking, strides
│
├── index_tools_build_and_retrieve/ # Embedding and retrieval
│   ├── build_index.py             # FAISS index construction
│   └── retrieval.py               # Vector search + semantic reranking
│
├── stores_mem_and_cache/          # Storage subsystems
│   ├── memory.py                  # STM/LTM implementation
│   └── cache.py                   # Response caching
│
├── data (json+index+raw csv)/     # Data storage (gitignored)
│   ├── raw_data.csv               # Source MathWorks documentation CSV
│   ├── clean_docs.jsonl           # Clean, deduplicated documents
│   ├── docs_chunks.jsonl          # Chunked documents with metadata
│   ├── faiss.index                # FAISS vector index
│   ├── metadata.jsonl             # Chunk metadata for retrieval
│   └── embeddings.npy             # Cached embeddings for fast rebuilding
│
├── results/                       # Evaluation outputs
│   └── batch_results.txt          # Results from batch testing
│
├── sample questions/              # Test data
│   └── simulink_questions.txt     # Curated test questions
│
├── test scripts/                  # Diagnostics and validation
│   ├── test_memory.py             # Memory system diagnostics
│   └── test_verifier.py           # Verification system tests
│
├── frontend.py                    # Gradio UI implementation
├── chatbot_dep.py                 # Core orchestration logic
├── batch_chatbot_demo.py          # Batch evaluation runner
└── requirements.txt               # Dependencies

🧩 Prerequisites & Dependencies

System Requirements

  • Python 3.8+ (3.10+ recommended)
  • 8GB+ RAM (16GB+ recommended for larger indices)
  • CUDA-compatible GPU (optional, for faster embeddings)
  • Redis server 6.0+ (for memory and caching)

Core Dependencies

# LLM & HTTP clients
groq>=0.8.0          # Groq API client for LLM access
httpx>=0.23.0        # Async HTTP client

# Caching & memory
redis>=4.5.0         # Redis client for distributed memory/cache
langdetect>=1.0.9    # Language detection for preprocessing

# Retrieval & embeddings
faiss-cpu>=1.7.4     # Vector search (or faiss-gpu)
sentence-transformers>=2.2.2  # Embedding models

# Deep learning backends
torch>=2.0.0         # PyTorch for embeddings
tensorflow>=2.19.0   # TensorFlow (optional for some models)

# Frontend
gradio>=5.29.0       # UI framework

# Utilities
numpy>=1.24.0        # Numerical operations

API Keys

  • Groq API Key: Required for LLM access
    • Set as environment variable GROQ_API_KEY
    • Alternatively, it can be hardcoded in the agent files for demo purposes (not recommended for production)

🔧 Installation & Setup

1. Clone Repository

git clone https://github.com/ThePredictiveDev/IITMandiHackathon-Group54.git
cd IITMandiHackathon-Group54

2. Create Virtual Environment

# Using venv
python -m venv venv
source venv/bin/activate  # Linux/Mac
venv\Scripts\activate     # Windows

# Or using conda
conda create -n matlab-troubleshooter python=3.10
conda activate matlab-troubleshooter

3. Install Dependencies

pip install -r requirements.txt

# For GPU acceleration (optional)
pip uninstall -y faiss-cpu
pip install faiss-gpu

4. Set Up Redis

# Install Redis
# On Ubuntu/Debian
sudo apt-get install redis-server

# On macOS
brew install redis

# On Windows
# Download from https://github.com/microsoftarchive/redis/releases

# Start Redis server
redis-server

5. Set Environment Variables

# Linux/Mac
export GROQ_API_KEY="your_groq_api_key_here"
export REDIS_HOST="localhost"
export REDIS_PORT=6379
export REDIS_DB_MEMORY=1

# Windows
set GROQ_API_KEY=your_groq_api_key_here
set REDIS_HOST=localhost
set REDIS_PORT=6379
set REDIS_DB_MEMORY=1

6. Prepare Data & Build Index

# 1. Clean raw MathWorks documentation
python "data preprocessing/prep_matlab_docs.py" \
  --csv "data (json+index+raw csv)/raw_data.csv" \
  --out "data (json+index+raw csv)/clean_docs.jsonl" \
  --min-tokens 50

# 2. Chunk documents
python "data preprocessing/chunk_docs.py" \
  --input "data (json+index+raw csv)/clean_docs.jsonl" \
  --output "data (json+index+raw csv)/docs_chunks.jsonl" \
  --chunk-size 128 \
  --stride 32

# 3. Build FAISS index
python "index_tools_build_and_retrieve/build_index.py" \
  --chunks "data (json+index+raw csv)/docs_chunks.jsonl" \
  --index "data (json+index+raw csv)/faiss.index" \
  --meta "data (json+index+raw csv)/metadata.jsonl" \
  --cache "data (json+index+raw csv)/embeddings.npy" \
  --M 32 \
  --ef-constr 64 \
  --ef-search 128

🚀 Running the Application

Interactive CLI Mode

python chatbot_dep.py

The CLI interface allows interactive querying with detailed visibility into each pipeline stage:

  • Planner reasoning and query analysis
  • Chunk selection with scoring rationale
  • Writer drafting with THOUGHT/ACTION/EVIDENCE structure
  • Verification process and decisions

Enter questions at the prompt and type "exit" or "quit" to terminate.

Gradio Web Interface

python frontend.py

The web interface offers:

  • Chat-like interaction with the troubleshooter
  • Collapsible panels for memory inspection
  • Full chain-of-thought visibility
  • Detailed JSON logs for pipeline analysis
  • Memory and cache management controls

By default, the interface launches at http://localhost:7860.

To share publicly (e.g., for demos):

python frontend.py --share=True

Batch Evaluation

python batch_chatbot_demo.py

Runs the system through 15 diverse Simulink/MATLAB queries and outputs results to batch_results.txt:

  • Standardized output format for consistent evaluation
  • Raw reasoning chains for comparison
  • ACTION steps for troubleshooting validation
  • EVIDENCE with citations to verify accuracy

⚙️ Configuration Parameters

Memory System

Parameter         Default           Description
STM_MAX_TURNS     10                Maximum number of turns in short-term memory
LTM_TTL_SECONDS   24000 (400 min)   Time-to-live for long-term memory entries
LTM_MAX_ENTRIES   1000              Maximum number of entries in long-term memory

Index Parameters

Parameter        Default   Description
M                32        Graph connectivity in HNSW (higher = more accurate, slower)
efConstruction   64        Index build quality (higher = better index, slower build)
efSearch         128       Search quality (higher = more accurate, slower search)

LLM Configuration

Agent      Model                           Temperature   Max Tokens   Purpose
Planner    llama3-8b-8192                  0.0           512          Query analysis, chunk scoring
Writer     deepseek-r1-distill-llama-70b   0.3           2048         Response generation
Verifier   llama-3.1-8b-instant            0.0           512          Solution verification

Chunking Parameters

Parameter    Default   Description
chunk_size   128       Token length of each chunk
stride       32        Overlap between consecutive chunks

🔍 Troubleshooting

Common Issues

Redis Connection Failures

[cache] Redis error (Error 111 connecting to localhost:6379. Connection refused.); falling back to local cache

Solution:

# Check if Redis is running
redis-cli ping

# If not, start Redis
redis-server --daemonize yes

FAISS Index Loading Errors

OSError: Error loading FAISS index: <reason>

Solutions:

  • Ensure the index path is correct
  • Rebuild the index with compatible parameters
  • Check for GPU/CPU mismatch in FAISS installation

LLM API Rate Limiting

Error from Groq API: rate_limit_exceeded

Solutions:

  • Implement exponential backoff retry logic
  • Reduce batch sizes for evaluation
  • Use a different API key with higher limits
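
For the first suggestion, a minimal backoff sketch; the broad except is illustrative and should be narrowed to the Groq client's actual rate-limit exception:

import random
import time

def with_backoff(call, max_retries=5, base_delay=1.0):
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:  # narrow this to the rate-limit error in practice
            if attempt == max_retries - 1:
                raise
            # 1s, 2s, 4s, ... plus jitter
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.5))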

Vector Dimension Mismatch

RuntimeError: FAISS index has dimension X but embeddings have dimension Y

Solution: Rebuild the index with the same model used for query embeddings
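
To confirm which side is wrong, inspect the index dimension directly (384 is e5-small-v2's output size):

import faiss

index = faiss.read_index("data (json+index+raw csv)/faiss.index")
print("index dimension:", index.d)  # must match the query embedding dimension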

Diagnostic Tools

# Test memory system
python "test scripts/test_memory.py"

# Test verification logic
python "test scripts/test_verifier.py"

# Test Redis connection
redis-cli ping

🚄 Performance Optimization

CPU vs. GPU Acceleration

The system supports both CPU and GPU modes:

# In build_index.py and retrieval.py:
EMBED_DEVICE = "cuda" if faiss.get_num_gpus() > 0 else "cpu"

For GPU acceleration:

pip uninstall -y faiss-cpu
pip install faiss-gpu

Memory Usage Considerations

  • Index Size: Scales linearly with document count and embedding dimension
  • Redis Memory: Configure maxmemory and eviction policies in redis.conf
  • Embedding Cache: Can be disabled for low-memory environments

Throughput Optimization

  • Batch Size: Adjust embedding batch size based on available memory
  • M Parameter: Lower values trade accuracy for speed in HNSW
  • Caching: Hot-path response cache significantly improves repeat query performance
  • Index Quantization: For very large indices, consider INT8 quantization
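
For the last point, a minimal sketch using FAISS's scalar-quantized HNSW variant, which stores vectors as INT8 for roughly 4x less memory than float32; the output path is an assumption:

import faiss
import numpy as np

embeddings = np.load("data (json+index+raw csv)/embeddings.npy").astype(np.float32)
index = faiss.IndexHNSWSQ(embeddings.shape[1], faiss.ScalarQuantizer.QT_8bit, 32)
index.hnsw.efConstruction = 64
index.train(embeddings)  # the scalar quantizer needs a training pass
index.add(embeddings)
faiss.write_index(index, "data (json+index+raw csv)/faiss_int8.index")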

👨‍💻 Contribution Guidelines

Code Style

  • Follow PEP 8 guidelines
  • Use type hints where appropriate
  • Document functions with docstrings
  • Use meaningful variable names
  • Implement proper error handling

Adding New Features

  1. Create a feature branch: git checkout -b feature/your-feature-name
  2. Implement your changes with tests
  3. Update documentation as needed
  4. Submit a pull request with a clear description

Testing

  • Add unit tests for new functionality
  • Ensure compatibility with the existing pipeline
  • Validate with real-world MATLAB/Simulink queries

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Competition: IIT Mandi Deep Learning Hackathon 2025
  • Model Providers: Groq for API access to powerful LLMs
  • Libraries: FAISS, Sentence Transformers, Redis, Gradio
  • Documentation: MathWorks for MATLAB/Simulink documentation corpus

Project ideated and designed by ThePredictiveDev and coded with vibes.
