**OpenAI Open Model Hackathon 2025 Categories**
- For Humanity
- Best Local Agent
- Most Useful Fine-Tune
- Wildcard
Seva Agent is an autonomous AI agent that listens to live Sikh prayer services and displays synchronized Punjabi verses with English meanings, aiming to create an immersive spiritual experience for the 30M+ Sikh devotees worldwide.
Younger generations attending Gurdwara (Sikh temple) services understand spoken Punjabi but struggle with:
- Reading Punjabi text in Gurmukhi script
- Understanding authentic spiritual meanings
- Active participation in 2-3 hour prayer services
Result: Passive listening without full spiritual engagement or language learning.
Seva Agent transforms prayer experiences by:
- Real-time ASR: Listens to live Gurbani recitation
- Autonomous Display: Synchronizes projector with original Punjabi text + English meanings
- Zero Operator: Eliminates need for manual control during services
- Educational Impact: Enhances Punjabi literacy while deepening spiritual connection
Live Audio → ASR Engine → Ensemble Matching → Desktop Control → Synchronized Display
Component | Technology | Purpose |
---|---|---|
ASR Engine | State-of-the-art ASR models fine-tuned on religious texts | Gurmukhi speech recognition |
Verse Matching | Ensemble algorithms | Robust real-time alignment |
Desktop Control | OCR + Socket.IO | Autonomous SikhiToTheMax integration |
Navigation | Anchor/Paath modes | Smart positioning & drift detection |
- Python 3.8+
- macOS (for SikhiToTheMax integration)
- SikhiToTheMax Desktop App
- Microphone access
- Clone Repository
git clone https://github.com/yourusername/sttm-agent.git
cd sttm-agent
- Install Dependencies
pip install -r requirements.txt
- Download Models
python build_index.py # Builds local verse database
- Environment Setup
cp .env.example .env
# Add your HuggingFace token for model access
echo "HF_TOKEN=your_huggingface_token" >> .env
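For reference, a script can read the token written above without extra dependencies. The sketch below is a minimal stdlib-only parser. `load_env_file` and `get_hf_token` are hypothetical helpers, not part of this repository, and the project itself may load `.env` differently (for example with a library such as python-dotenv).

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines; skip blanks and '#' comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_hf_token(path=".env"):
    """Prefer an exported HF_TOKEN, then fall back to the .env file."""
    return os.environ.get("HF_TOKEN") or load_env_file(path).get("HF_TOKEN")
```

Checking the environment first means a token exported in the shell overrides the file, which is convenient when testing with a different account.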
# Run the full autonomous agent
python orchestrator.py --mode agent
# Or run standalone sync mode for testing
python orchestrator.py --mode sync
# Direct agent execution
python agent_full.py
# Test UI automation
python sttm_ui_controller.py
- Fine-tuning: a curated Gurbani dataset of 60+ hours, trained for 10+ epochs
- Custom Tokenizer and Vocabulary: Gurmukhi Unicode (U+0A00-U+0A7F)
- Real-time Processing: 16kHz, 2-second sliding windows, 1-second overlap
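The windowing described above (16 kHz audio, 2-second windows, 1-second overlap) can be sketched as follows. This is an offline illustration over an in-memory sample buffer, not the agent's actual streaming code. The function name `sliding_windows` and the choice to drop a trailing partial window are assumptions.

```python
from typing import Iterator, Sequence

SAMPLE_RATE = 16_000     # Hz, from the list above
WINDOW_SECONDS = 2.0     # window length
OVERLAP_SECONDS = 1.0    # audio shared between consecutive windows


def sliding_windows(samples: Sequence[float],
                    sample_rate: int = SAMPLE_RATE,
                    window_s: float = WINDOW_SECONDS,
                    overlap_s: float = OVERLAP_SECONDS) -> Iterator[Sequence[float]]:
    """Yield full-length windows; a trailing partial window is dropped."""
    window = int(window_s * sample_rate)
    hop = window - int(overlap_s * sample_rate)
    if hop <= 0:
        raise ValueError("overlap must be shorter than the window")
    for start in range(0, len(samples) - window + 1, hop):
        yield samples[start:start + window]
```

With these values each window holds 32,000 samples and a new window starts every 16,000 samples, so every second of audio (apart from the first and last) is seen by two consecutive windows.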
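import difflib
from rapidfuzz import fuzz

def ensemble_score(asr_text, ground_truth):
    """Blend three similarity measures into a single 0-100 score."""
    return (
        0.4 * fuzz.partial_ratio(asr_text, ground_truth)
        + 0.3 * fuzz.token_set_ratio(asr_text, ground_truth)
        # SequenceMatcher.ratio() is 0-1, so scale it to rapidfuzz's 0-100 range
        + 0.3 * 100 * difflib.SequenceMatcher(None, asr_text, ground_truth).ratio()
    )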
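To illustrate how a similarity score of this kind might be used, the sketch below picks the best candidate verse for one ASR transcript. `best_match` and `difflib_score` are hypothetical names. The default scorer uses only `difflib` as a stand-in so the example runs without rapidfuzz, and any function returning a 0-100 score could be passed instead.

```python
import difflib
from typing import Callable, List, Optional, Tuple


def difflib_score(a: str, b: str) -> float:
    """Stand-in scorer on a 0-100 scale (stdlib only)."""
    return 100.0 * difflib.SequenceMatcher(None, a, b).ratio()


def best_match(asr_text: str,
               candidates: List[str],
               scorer: Callable[[str, str], float] = difflib_score
               ) -> Optional[Tuple[int, float]]:
    """Return (index, score) of the highest-scoring candidate, or None if empty."""
    if not candidates:
        return None
    scores = [scorer(asr_text, c) for c in candidates]
    index = max(range(len(scores)), key=scores.__getitem__)
    return index, scores[index]
```

Passing the scorer as a parameter keeps the selection logic independent of the particular blend of metrics, which makes it easy to compare weightings on the same candidates.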
- Latency: <300 ms for ASR per audio chunk, <100 ms for verse identification
- Accuracy: 99%+ on the domain test set
- Throughput: near real-time alignment
- Autonomous Operation: Zero human intervention required
- Real-time Sync: Sub-second verse identification and display
- Drift Detection: Automatic recovery from positioning errors
- Leading Prediction: Anticipates verses for seamless transitions
- Cultural Preservation: Maintains authentic sacred text integrity
- Educational Value: Enhances Punjabi literacy and spiritual engagement
SAMPLE_RATE = 16000
CHUNK_DURATION = 2.0
OVERLAP = 1.0
SLIDING_WORDS = 24
CONF_THRESHOLD = 72
PERSISTENCE_REQUIRED = 2
ANCHOR_STRONG_SCORE = 75
LEADING_TRIGGER_SCORE = 55
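One plausible reading of `CONF_THRESHOLD` and `PERSISTENCE_REQUIRED` is that a verse is only shown once it has scored at or above the threshold in several consecutive windows. The source does not define these semantics, so the sketch below is an assumption, and `PersistenceGate` is a hypothetical class rather than the agent's code.

```python
from typing import Optional

CONF_THRESHOLD = 72        # value from the configuration above
PERSISTENCE_REQUIRED = 2   # value from the configuration above


class PersistenceGate:
    """Confirm a verse only after it wins several consecutive windows."""

    def __init__(self, threshold: float = CONF_THRESHOLD,
                 required: int = PERSISTENCE_REQUIRED) -> None:
        self.threshold = threshold
        self.required = required
        self._candidate: Optional[int] = None
        self._streak = 0

    def update(self, verse_id: int, score: float) -> Optional[int]:
        """Feed one window's best match; return verse_id once confirmed, else None."""
        if score < self.threshold:
            # Low-confidence window: forget the current candidate.
            self._candidate, self._streak = None, 0
            return None
        if verse_id == self._candidate:
            self._streak += 1
        else:
            # A different verse starts a new streak of length one.
            self._candidate, self._streak = verse_id, 1
        return verse_id if self._streak >= self.required else None
```

A gate like this trades a little latency (one extra window with the default of 2) for stability, since a single noisy transcript cannot move the display.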
sttm-agent/
├── agent_full.py              # Main ASR engine
├── orchestrator.py            # System coordinator
├── sttm_ui_controller.py      # Desktop app automation
├── sttm_sync_client.py        # STTM integration wrapper
├── sttm_socketio.py           # Socket.IO communication
├── verse_dataset.py           # Verse-to-shabad mapping
├── build_index.py             # Local database builder
├── fb_mms_1b_fine_tuning.py   # Fine-tunes the ASR model
├── local_banidb/              # Verse database
│   ├── line_store.json        # Verse content
│   └── inverted.json          # Search index
├── requirements.txt           # Dependencies
└── README.md                  # This file
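The pairing of a line store with an inverted index suggests a token-to-line lookup used to shortlist candidate verses before scoring. The sketch below shows that general technique. The actual schema of `line_store.json` and `inverted.json` is not documented here, so the shapes used (line id to text, token to list of line ids) and the function names are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def build_inverted_index(lines: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each whitespace-separated token to the sorted ids of lines containing it."""
    index = defaultdict(set)
    for line_id, text in lines.items():
        for token in text.split():
            index[token].add(line_id)
    return {token: sorted(ids) for token, ids in index.items()}


def candidate_lines(query: str, index: Dict[str, List[str]], top_k: int = 5) -> List[str]:
    """Rank line ids by how many distinct query tokens they share."""
    hits = Counter()
    for token in set(query.split()):
        hits.update(index.get(token, []))
    # Sort by descending hit count, then by id, for a deterministic order.
    ranked = sorted(hits.items(), key=lambda item: (-item[1], item[0]))
    return [line_id for line_id, _ in ranked[:top_k]]
```

Shortlisting by shared tokens keeps the more expensive fuzzy scoring limited to a handful of lines instead of the whole database.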
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
# Fine-tune the ASR model
python3 fb_mms_1b_fine_tuning.py
- Global Reach: Serving 30M+ Sikh devotees worldwide
- Cultural Preservation: Digitizing and democratizing sacred text access
- Educational Value: Improving Punjabi literacy in younger generations
- Community Building: Creating inclusive spiritual experiences
- Technical Innovation: Advancing low-resource language ASR
- Mobile app integration
- Edge optimization for limited compute environments
- Federated learning across global deployments
- Multi-language translation (10+ languages)
- Custom ChatGPTs for personalized religious conversations
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI: For the Open Model Hackathon opportunity
- NVIDIA: For the GPUs used for ASR model fine-tuning
- HuggingFace: For model hosting and datasets platform
- Khalis Foundation: For SikhiToTheMax desktop application
- Sikh Community: For inspiration and cultural guidance
- Project Lead: Jaspal Singh Saluja
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Built with ❤️ and AI
Seva (selfless service) through technology