Retrieval Augmented Generation (RAG) chatbots for electrical codes and standards, supporting both OpenAI and Google Gemini APIs.

Norbera0/RAG-Chatbot-Suite-for-Philippine-Electrical-Code

RAG Chatbot Suite for Electrical Codes

A comprehensive suite of Retrieval Augmented Generation (RAG) chatbots for electrical codes and standards, supporting both OpenAI and Google Gemini APIs.

📁 Project Structure

RAG/
├── json-chunked_rag/                  # JSON-based RAG system
│   ├── data/                          # JSON data files
│   │   ├── pec_text_chunks_prod.json  # Text chunks (4.8MB)
│   │   ├── PEC_tables.json            # Table data (0.3MB)
│   │   └── PEC_tables_row_chunks.json # Table rows (1.0MB)
│   ├── src/                           # Source code
│   │   ├── json_rag_chatbot.py        # Streamlit web app
│   │   ├── simple_json_rag.py         # Command-line interface
│   │   ├── data_inspector.py          # Data analysis utility
│   │   └── ai_provider.py             # AI provider abstraction
│   ├── config.py                      # Configuration management
│   ├── requirements.txt               # Dependencies
│   ├── setup.py                       # Setup script
│   └── README.md                      # JSON RAG documentation
├── pdf-chunked_rag/                   # PDF-based RAG system
│   ├── data/                          # PDF and related files
│   │   ├── PEC_Content_1-4_combined.pdf # Main PDF (56MB)
│   │   ├── pec_text_chunks_prod.json  # Pre-processed chunks
│   │   └── ... (other data files)
│   ├── src/                           # Source code
│   │   ├── chatbot_rag.py             # Streamlit web app
│   │   ├── simple_rag.py              # Command-line interface
│   │   └── ai_provider.py             # AI provider abstraction
│   ├── config.py                      # Configuration management
│   ├── requirements.txt               # Dependencies
│   ├── setup.py                       # Setup script
│   └── README.md                      # PDF RAG documentation
├── env_template.txt                   # Environment setup template
└── README.md                          # This file

🚀 Features

Dual AI Provider Support

  • 🤖 OpenAI GPT Models - GPT-3.5-turbo, GPT-4, etc.
  • 🔮 Google Gemini - Gemini-1.5-flash and other models
  • 🔄 Easy switching between providers via configuration

Two Complete RAG Systems

  1. JSON RAG (chunked_app/) - Optimized for pre-processed JSON data
  2. PDF RAG (pdf_rag/) - Direct PDF processing with text extraction

Multiple Interfaces

  • 💬 Web Interface - Streamlit applications
  • 🖥️ Command Line - Simple CLI for testing and automation
  • 📊 Data Tools - Analysis and inspection utilities

⚡ Quick Start

1. Choose Your System

For JSON-based RAG (Recommended):

cd chunked_app

For PDF-based RAG:

cd pdf_rag

2. Setup Environment

Create a .env file in your chosen directory:

# Copy the template
cp ../env_template.txt .env
# Edit with your preferred editor
notepad .env  # or nano, vim, etc.

Example .env configuration:

AI_PROVIDER=openai
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_MODEL=gpt-3.5-turbo
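
To make the file format concrete, here is a minimal sketch of how KEY=VALUE lines like the ones above can be read. It is only an illustration: `parse_env` is a hypothetical helper, not a function from this repository, and the handling of inline `#` comments is an assumption. Projects commonly delegate this to a library such as python-dotenv instead.

```python
def parse_env(text):
    """Parse KEY=VALUE lines into a dict (hypothetical helper, stdlib only)."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, full-line comments, and lines without '='.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Assumption: anything after " #" is an inline comment.
        value = value.split(" #", 1)[0].strip()
        # Drop one layer of surrounding quotes, if present.
        values[key.strip()] = value.strip('"').strip("'")
    return values


sample = """
# Copy of the example above
AI_PROVIDER=openai
OPENAI_MODEL=gpt-3.5-turbo
CHUNK_SIZE=1000   # Text chunk size
"""
settings = parse_env(sample)
print(settings["AI_PROVIDER"], settings["CHUNK_SIZE"])
```

All values come back as strings, so numeric settings would still need an explicit `int(...)` conversion.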

3. Install Dependencies

python setup.py
# or manually:
pip install -r requirements.txt

4. Run the Application

Web Interface:

streamlit run src/json_rag_chatbot.py  # or src/chatbot_rag.py for PDF

Command Line:

python src/simple_json_rag.py  # or src/simple_rag.py for PDF

🔧 Configuration

AI Provider Selection

Set AI_PROVIDER in your .env file:

| Provider | Model Options | API Key Variable |
|----------|---------------|------------------|
| openai | gpt-3.5-turbo, gpt-4, gpt-4o | OPENAI_API_KEY |
| gemini | gemini-1.5-flash, gemini-pro | GEMINI_API_KEY |
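
The provider-to-key mapping above can be sketched as a small lookup with validation. This is an assumption-laden illustration, not the contents of `config.py` or `ai_provider.py`: the `PROVIDERS` dict, the `resolve_provider` function, and the default of `openai` when `AI_PROVIDER` is unset are all hypothetical.

```python
import os

# Hypothetical mapping that mirrors the table above; the repo's config.py may differ.
PROVIDERS = {
    "openai": {"key_var": "OPENAI_API_KEY", "label": "OpenAI"},
    "gemini": {"key_var": "GEMINI_API_KEY", "label": "Gemini"},
}


def resolve_provider(env=None):
    """Return (provider_name, api_key), or raise ValueError with a readable message."""
    env = os.environ if env is None else env
    # Assumption: fall back to "openai" when AI_PROVIDER is not set.
    name = env.get("AI_PROVIDER", "openai").strip().lower()
    if name not in PROVIDERS:
        raise ValueError(
            f"Unknown AI_PROVIDER {name!r}; expected one of {sorted(PROVIDERS)}"
        )
    spec = PROVIDERS[name]
    key = env.get(spec["key_var"], "").strip()
    if not key:
        raise ValueError(f"{spec['key_var']} must be set when using {spec['label']}")
    return name, key


# Usage with an explicit dict instead of the real environment:
print(resolve_provider({"AI_PROVIDER": "gemini", "GEMINI_API_KEY": "demo-key"}))
```

Passing the environment as a plain dict keeps the check easy to test without touching real credentials.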

System Parameters

CHUNK_SIZE=1000              # Text chunk size
CHUNK_OVERLAP=200            # Overlap between chunks
MAX_RETRIEVAL_CHUNKS=5       # Number of chunks to retrieve
EMBEDDING_MODEL=all-MiniLM-L6-v2  # Sentence transformer model
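
To show how CHUNK_SIZE and CHUNK_OVERLAP interact, here is a minimal sliding-window sketch. It assumes both values are measured in characters, which this README does not state, and `chunk_text` is a hypothetical function rather than the repository's actual splitter.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


document = "".join(str(i % 10) for i in range(2600))
pieces = chunk_text(document)
# With the defaults, each window starts 800 characters after the previous one.
print(len(pieces), [len(p) for p in pieces])
```

A larger overlap repeats more text between neighbouring chunks, which can help when a relevant passage straddles a boundary, at the cost of more chunks to embed and store.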

📊 System Comparison

| Feature | JSON RAG | PDF RAG |
|---------|----------|---------|
| Data Source | Pre-processed JSON | Direct PDF processing |
| Setup Speed | Fast (data ready) | Slower (PDF extraction) |
| Search Quality | High (optimized chunks) | Good (raw extraction) |
| Memory Usage | Lower | Higher |
| Customization | High | Medium |
| Data Size | ~6MB JSON files | 56MB PDF + chunks |

🎯 Use Cases

JSON RAG - Best for:

  • ✅ Production deployments
  • ✅ High-performance search
  • ✅ Pre-processed data
  • ✅ Custom data structures

PDF RAG - Best for:

  • ✅ Quick prototyping
  • ✅ Direct PDF analysis
  • ✅ Simple setup
  • ✅ Document exploration

🔍 Example Queries

Both systems can answer questions like:

  • General: "What are the main electrical safety requirements?"
  • Specific: "What should be the distance between receptacles in walls?"
  • Technical: "Show me motor starting current calculations"
  • Codes: "What does NEC 210.12 require for AFCI protection?"

🐛 Troubleshooting

Common Issues

  1. API Key Not Configured

    Error: OPENAI_API_KEY must be set when using OpenAI
    
    • Create .env file with your API keys
    • Restart the application
  2. Module Import Errors

    ModuleNotFoundError: No module named 'openai'
    
    • Run pip install -r requirements.txt
    • Check you're in the correct directory
  3. Data Files Missing

    Error: File not found in data/
    
    • Verify data files are in the correct folders
    • Run setup script to check file locations
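
For the missing-data case, a quick check along these lines can confirm which files are absent. It is a sketch only: `missing_data_files` is a hypothetical helper, not the repository's `setup.py`, and the file list is taken from the JSON RAG entries in the project structure.

```python
from pathlib import Path

# File names taken from the JSON RAG data/ folder in the project structure.
EXPECTED_FILES = [
    "pec_text_chunks_prod.json",
    "PEC_tables.json",
    "PEC_tables_row_chunks.json",
]


def missing_data_files(data_dir="data", expected=EXPECTED_FILES):
    """Return the expected file names that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in expected if not (root / name).is_file()]


missing = missing_data_files("data")
if missing:
    print("Missing from data/:", ", ".join(missing))
else:
    print("All expected data files found.")
```

Run it from inside the system folder so that the relative `data` path resolves to the right directory.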

Performance Tips

  • Memory: Use JSON RAG for better memory efficiency
  • Speed: Pre-process data for faster startup
  • Accuracy: Adjust MAX_RETRIEVAL_CHUNKS for better context
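
The effect of MAX_RETRIEVAL_CHUNKS can be illustrated with a toy top-k search over embedding vectors. This sketch uses plain Python and hand-written vectors; it does not reproduce the repository's retrieval code, which presumably relies on a vector store and a sentence-transformer model, and `cosine` and `top_k` are hypothetical names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, chunk_vecs, k=5):
    """Return indices of the k chunks most similar to the query, best first."""
    scored = [(cosine(query_vec, vec), idx) for idx, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy 2-dimensional "embeddings"; real ones have hundreds of dimensions.
chunks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(top_k([1.0, 0.0], chunks, k=2))
```

Raising k passes more context to the model but also more loosely related text, so the best value depends on how focused the typical question is.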

🔐 Security

  • ✅ API keys stored in .env files (not in code)
  • ✅ Local processing (no data sent except to chosen AI provider)
  • ✅ ChromaDB local vector storage
  • ✅ No hardcoded credentials

📈 Performance Metrics

JSON RAG Performance

  • Chunks: ~3,006 searchable chunks
  • Startup: <2 minutes (first run)
  • Query Response: 2-5 seconds
  • Memory: ~500MB RAM

PDF RAG Performance

  • Chunks: ~1,430 text chunks
  • Startup: 3-10 minutes (PDF processing)
  • Query Response: 2-5 seconds
  • Memory: ~800MB RAM

🤝 Contributing

  1. Choose the appropriate system folder
  2. Follow the existing code structure
  3. Update configuration as needed
  4. Test with both AI providers
  5. Update documentation

📄 License

This project is for educational and research purposes. Please comply with the terms of service for your chosen AI provider (OpenAI/Google).


Get Started: Navigate to either chunked_app/ or pdf_rag/ and follow their respective README files for detailed setup instructions.
