A comprehensive suite of Retrieval Augmented Generation (RAG) chatbots for electrical codes and standards, supporting both OpenAI and Google Gemini APIs.
```
RAG/
├── json-chunked_rag/                   # JSON-based RAG system
│   ├── data/                           # JSON data files
│   │   ├── pec_text_chunks_prod.json   # Text chunks (4.8MB)
│   │   ├── PEC_tables.json             # Table data (0.3MB)
│   │   └── PEC_tables_row_chunks.json  # Table rows (1.0MB)
│   ├── src/                            # Source code
│   │   ├── json_rag_chatbot.py         # Streamlit web app
│   │   ├── simple_json_rag.py          # Command-line interface
│   │   ├── data_inspector.py           # Data analysis utility
│   │   └── ai_provider.py              # AI provider abstraction
│   ├── config.py                       # Configuration management
│   ├── requirements.txt                # Dependencies
│   ├── setup.py                        # Setup script
│   └── README.md                       # JSON RAG documentation
├── pdf-chunked_rag/                    # PDF-based RAG system
│   ├── data/                           # PDF and related files
│   │   ├── PEC_Content_1-4_combined.pdf  # Main PDF (56MB)
│   │   ├── pec_text_chunks_prod.json     # Pre-processed chunks
│   │   └── ...                           # (other data files)
│   ├── src/                            # Source code
│   │   ├── chatbot_rag.py              # Streamlit web app
│   │   ├── simple_rag.py               # Command-line interface
│   │   └── ai_provider.py              # AI provider abstraction
│   ├── config.py                       # Configuration management
│   ├── requirements.txt                # Dependencies
│   ├── setup.py                        # Setup script
│   └── README.md                       # PDF RAG documentation
├── env_template.txt                    # Environment setup template
└── README.md                           # This file
```
- OpenAI GPT models: gpt-3.5-turbo, gpt-4, etc.
- Google Gemini: gemini-1.5-flash and other models
- Easy switching between providers via configuration
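Provider switching can be sketched as a small factory keyed on the `AI_PROVIDER` environment variable. This is an illustrative sketch only; the class names here are hypothetical, and the actual `ai_provider.py` may be structured differently:

```python
import os

class OpenAIProvider:
    """Wraps calls to the OpenAI chat API (client code elided)."""
    def __init__(self, model: str):
        self.model = model

    def name(self) -> str:
        return f"openai:{self.model}"

class GeminiProvider:
    """Wraps calls to the Google Gemini API (client code elided)."""
    def __init__(self, model: str):
        self.model = model

    def name(self) -> str:
        return f"gemini:{self.model}"

def get_provider():
    """Pick a provider based on the AI_PROVIDER environment variable."""
    provider = os.getenv("AI_PROVIDER", "openai").lower()
    if provider == "openai":
        return OpenAIProvider(os.getenv("OPENAI_MODEL", "gpt-3.5-turbo"))
    if provider == "gemini":
        return GeminiProvider(os.getenv("GEMINI_MODEL", "gemini-1.5-flash"))
    raise ValueError(f"Unknown AI_PROVIDER: {provider}")
```

Because the rest of the code only talks to the returned object, switching providers never requires touching application logic, only the `.env` file.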
- JSON RAG (`json-chunked_rag/`): optimized for pre-processed JSON data
- PDF RAG (`pdf-chunked_rag/`): direct PDF processing with text extraction
- Web interface: Streamlit applications
- Command line: simple CLI for testing and automation
- Data tools: analysis and inspection utilities
For JSON-based RAG (Recommended):

```shell
cd json-chunked_rag
```

For PDF-based RAG:

```shell
cd pdf-chunked_rag
```
Create a `.env` file in your chosen directory:

```shell
# Copy the template
cp ../env_template.txt .env

# Edit with your preferred editor
notepad .env  # or nano, vim, etc.
```

Example `.env` configuration:

```
AI_PROVIDER=openai
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_MODEL=gpt-3.5-turbo
```
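Reading and validating these variables can be sketched as follows (an illustrative sketch, not the project's actual `config.py`; note the error message mirrors the one shown under Troubleshooting):

```python
import os

def load_config() -> dict:
    """Read provider settings from the environment; fail fast if the key is missing."""
    provider = os.getenv("AI_PROVIDER", "openai").lower()
    key_vars = {"openai": "OPENAI_API_KEY", "gemini": "GEMINI_API_KEY"}
    if provider not in key_vars:
        raise ValueError(f"Unknown AI_PROVIDER: {provider}")
    api_key = os.getenv(key_vars[provider])
    if not api_key:
        raise RuntimeError(f"{key_vars[provider]} must be set when using {provider}")
    return {"provider": provider, "api_key": api_key}
```

Validating at startup means a missing key surfaces as one clear error rather than a failed API call mid-conversation.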
Install the dependencies:

```shell
python setup.py
# or manually:
pip install -r requirements.txt
```
Web Interface:

```shell
streamlit run src/json_rag_chatbot.py  # or src/chatbot_rag.py for PDF
```

Command Line:

```shell
python src/simple_json_rag.py  # or src/simple_rag.py for PDF
```
Set `AI_PROVIDER` in your `.env` file:

| Provider | Model Options | API Key Variable |
|---|---|---|
| `openai` | `gpt-3.5-turbo`, `gpt-4`, `gpt-4o` | `OPENAI_API_KEY` |
| `gemini` | `gemini-1.5-flash`, `gemini-pro` | `GEMINI_API_KEY` |
```
CHUNK_SIZE=1000                   # Text chunk size
CHUNK_OVERLAP=200                 # Overlap between chunks
MAX_RETRIEVAL_CHUNKS=5            # Number of chunks to retrieve
EMBEDDING_MODEL=all-MiniLM-L6-v2  # Sentence transformer model
```
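The chunk-size and overlap settings behave like a sliding window: each chunk starts `CHUNK_SIZE - CHUNK_OVERLAP` characters after the previous one, so adjacent chunks share their edges and no sentence is lost at a boundary. A minimal sketch (not the project's actual chunker):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size characters; consecutive
    windows overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]
```

Larger overlap improves recall across chunk boundaries at the cost of more (and more redundant) chunks to embed and store.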
| Feature | JSON RAG | PDF RAG |
|---|---|---|
| Data source | Pre-processed JSON | Direct PDF processing |
| Setup speed | Fast (data ready) | Slower (PDF extraction) |
| Search quality | High (optimized chunks) | Good (raw extraction) |
| Memory usage | Lower | Higher |
| Customization | High | Medium |
| Data size | ~6MB JSON files | 56MB PDF + chunks |
JSON RAG is the better fit for:

- Production deployments
- High-performance search
- Pre-processed data
- Custom data structures

PDF RAG is the better fit for:

- Quick prototyping
- Direct PDF analysis
- Simple setup
- Document exploration
Both systems can answer questions like:
- General: "What are the main electrical safety requirements?"
- Specific: "What should be the distance between receptacles in walls?"
- Technical: "Show me motor starting current calculations"
- Codes: "What does NEC 210.12 require for AFCI protection?"
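For any of these questions, both systems follow the same retrieval pattern: embed the query, score it against every stored chunk, and pass the top `MAX_RETRIEVAL_CHUNKS` results to the model as context. A toy sketch using word overlap in place of real embeddings (the actual systems use a sentence-transformer model with ChromaDB):

```python
def retrieve(query: str, chunks: list[str], k: int = 5) -> list[str]:
    """Return the k chunks sharing the most words with the query.
    Word overlap stands in for embedding similarity in this sketch."""
    q_words = set(query.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(q_words & set(c.lower().split())),
                    reverse=True)
    return scored[:k]
```

Raising `k` (i.e. `MAX_RETRIEVAL_CHUNKS`) gives the model more context per answer but increases token usage and can dilute relevance.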
- API Key Not Configured

  ```
  Error: OPENAI_API_KEY must be set when using OpenAI
  ```

  - Create a `.env` file with your API keys
  - Restart the application

- Module Import Errors

  ```
  ModuleNotFoundError: No module named 'openai'
  ```

  - Run `pip install -r requirements.txt`
  - Check you're in the correct directory

- Data Files Missing

  ```
  Error: File not found in data/
  ```

  - Verify data files are in the correct folders
  - Run the setup script to check file locations
- Memory: use JSON RAG for better memory efficiency
- Speed: pre-process data for faster startup
- Accuracy: adjust `MAX_RETRIEVAL_CHUNKS` for better context
- API keys stored in `.env` files (not in code)
- Local processing (no data sent except to chosen AI provider)
- ChromaDB local vector storage
- No hardcoded credentials
JSON RAG:

- Chunks: ~3,006 searchable chunks
- Startup: <2 minutes (first run)
- Query response: 2-5 seconds
- Memory: ~500MB RAM

PDF RAG:

- Chunks: ~1,430 text chunks
- Startup: 3-10 minutes (PDF processing)
- Query response: 2-5 seconds
- Memory: ~800MB RAM
- Choose the appropriate system folder
- Follow the existing code structure
- Update configuration as needed
- Test with both AI providers
- Update documentation
This project is for educational and research purposes. Please comply with the terms of service for your chosen AI provider (OpenAI/Google).
Get Started: Navigate to either `json-chunked_rag/` or `pdf-chunked_rag/` and follow the respective README for detailed setup instructions.