jsaluja/sttm-agent

Seva Agent: Real-Time Autonomous Prayer Assistant

OpenAI Hackathon Python PyTorch License

🏆 OpenAI Open Model Hackathon 2025 Categories

  • For Humanity
  • Best Local Agent
  • Most Useful Fine-Tune
  • Wildcard

An AI agent that listens to live Sikh prayer services and autonomously displays synchronized Punjabi verses with English meanings, creating an immersive spiritual experience for a global community of 30M+ devotees.

🎯 Problem Statement

Younger generations attending Gurdwara (Sikh temple) services understand spoken Punjabi but struggle with:

  • Reading Punjabi text in Gurmukhi script
  • Understanding authentic spiritual meanings
  • Participating actively in 2-3 hour prayer services

Result: Passive listening without full spiritual engagement or language learning.

🚀 Solution

Seva Agent transforms prayer experiences by:

  • Real-time ASR: Listens to live Gurbani recitation
  • Autonomous Display: Synchronizes projector with original Punjabi text + English meanings
  • Zero Operator: Eliminates the need for manual control during services
  • Educational Impact: Enhances Punjabi literacy while deepening spiritual connection

🏗️ Architecture

🎤 Live Audio → 🧠 ASR Engine → 🔍 Ensemble Matching → 🖥️ Desktop Control → 📺 Synchronized Display

Core Components

| Component | Technology | Purpose |
|---|---|---|
| ASR Engine | SOTA ASR models fine-tuned on religious texts | Gurmukhi speech recognition |
| Verse Matching | Ensemble algorithms | Robust real-time alignment |
| Desktop Control | OCR + Socket.IO | Autonomous SikhiToTheMax integration |
| Navigation | Anchor/Paath modes | Smart positioning & drift detection |

🛠️ Installation

Prerequisites

Setup

  1. Clone Repository
git clone https://github.com/yourusername/sttm-agent.git
cd sttm-agent
  2. Install Dependencies
pip install -r requirements.txt
  3. Download Models
python build_index.py  # Builds local verse database
  4. Environment Setup
cp .env.example .env
# Add your HuggingFace token for model access
echo "HF_TOKEN=your_huggingface_token" >> .env
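To check that the token written to .env can be read back from Python, a small standard-library sketch like the one below may help. The `load_env` helper is hypothetical and not part of this repository; the agent itself may load the file differently (for example through a dotenv package).

```python
import os

def load_env(path=".env"):
    """Parse simple KEY=value lines from a .env file (hypothetical helper)."""
    values = {}
    if not os.path.exists(path):
        return values
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            # Skip blanks, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip("\"'")
    return values

# Prefer the .env entry, fall back to the process environment.
token = load_env().get("HF_TOKEN") or os.environ.get("HF_TOKEN")
print("HF_TOKEN found" if token else "HF_TOKEN missing")
```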

🎮 Usage

Quick Start

# Run the full autonomous agent
python orchestrator.py --mode agent

# Or run standalone sync mode for testing
python orchestrator.py --mode sync

Manual Control

# Direct agent execution
python agent_full.py

# Test UI automation
python sttm_ui_controller.py

📊 Technical Details

ASR Pipeline

  • Fine-tuning: 60+ hours of curated Gurbani audio, 10+ epochs
  • Custom Tokenizer and Vocabulary: Gurmukhi Unicode block (U+0A00-U+0A7F)
  • Real-time Processing: 16 kHz audio, 2-second sliding windows, 1-second overlap
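The windowing numbers above (16 kHz, 2-second windows, 1-second overlap) can be illustrated with a minimal sketch. The `sliding_windows` function is an assumption made for illustration, not the agent's actual capture code; it works on any sliceable sequence of samples, such as a list or a NumPy array.

```python
SAMPLE_RATE = 16000      # Hz
CHUNK_DURATION = 2.0     # seconds per window
OVERLAP = 1.0            # seconds shared by consecutive windows

def sliding_windows(audio, sample_rate=SAMPLE_RATE,
                    chunk_duration=CHUNK_DURATION, overlap=OVERLAP):
    """Yield (start_sample, window) pairs over a 1-D sequence of samples."""
    window = int(chunk_duration * sample_rate)           # 32000 samples
    hop = int((chunk_duration - overlap) * sample_rate)  # 16000 samples
    if hop <= 0:
        raise ValueError("overlap must be smaller than chunk_duration")
    for start in range(0, len(audio) - window + 1, hop):
        yield start, audio[start:start + window]

# Five seconds of silence stand in for microphone input.
audio = [0.0] * (5 * SAMPLE_RATE)
starts = [start for start, _ in sliding_windows(audio)]
print(starts)
```

With these settings, each new window starts one second after the previous one, so every second of audio is seen by two consecutive windows.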

Ensemble Matching

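import difflib
from rapidfuzz import fuzz

def ensemble_score(asr_text, ground_truth):
    """Weighted blend of three similarity scores on a 0-100 scale."""
    # SequenceMatcher.ratio() returns 0-1; rescale it to match rapidfuzz's 0-100.
    seq_ratio = difflib.SequenceMatcher(None, asr_text, ground_truth).ratio() * 100
    return (
        0.4 * fuzz.partial_ratio(asr_text, ground_truth)
        + 0.3 * fuzz.token_set_ratio(asr_text, ground_truth)
        + 0.3 * seq_ratio
    )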
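As a usage illustration, a scorer of this kind can be applied to a handful of candidate lines to pick the best one. The sketch below is an assumption about how that selection might look. To stay dependency-free, it uses a plain `difflib` scorer as a stand-in for `ensemble_score`, and the romanized sample lines are placeholders rather than entries from the real verse database.

```python
import difflib

def similarity(a, b):
    """Stand-in scorer on a 0-100 scale (difflib only, not the full ensemble)."""
    return difflib.SequenceMatcher(None, a, b).ratio() * 100

def best_match(asr_text, candidates, scorer=similarity):
    """Return (index, score) of the highest-scoring candidate line."""
    scored = [(scorer(asr_text, line), i) for i, line in enumerate(candidates)]
    score, index = max(scored)
    return index, score

# Romanized placeholder lines used only for illustration.
lines = [
    "ik onkar sat nam karta purakh",
    "nirbhau nirvair akal murat",
    "ajuni saibhang gur prasad",
]
index, score = best_match("nirbhau nirvair akal", lines)
print(index, round(score, 1))
```

Passing `scorer=ensemble_score` would swap in the weighted blend without changing the selection logic.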

Performance Metrics

  • Latency: <300 ms for ASR per chunk, <100 ms for verse identification
  • Accuracy: 99%+ on the in-domain test set
  • Throughput: Near real-time alignment

🎯 Key Features

  • ✅ Autonomous Operation: Zero human intervention required
  • ✅ Real-time Sync: Sub-second verse identification and display
  • ✅ Drift Detection: Automatic recovery from positioning errors
  • ✅ Leading Prediction: Anticipates verses for seamless transitions
  • ✅ Cultural Preservation: Maintains authentic sacred text integrity
  • ✅ Educational Value: Enhances Punjabi literacy and spiritual engagement

🔧 Configuration

Audio Settings

SAMPLE_RATE = 16000
CHUNK_DURATION = 2.0
OVERLAP = 1.0
SLIDING_WORDS = 24
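One plausible reading of `SLIDING_WORDS` is the number of most recent ASR words kept as the query for verse matching. The sketch below illustrates that interpretation with a bounded buffer. `WordBuffer` is a hypothetical name, and the real agent may handle the setting differently.

```python
from collections import deque

SLIDING_WORDS = 24

class WordBuffer:
    """Keep only the most recent `max_words` words of ASR output."""

    def __init__(self, max_words=SLIDING_WORDS):
        self._words = deque(maxlen=max_words)

    def extend(self, transcript):
        # Older words fall off the left side once the buffer is full.
        self._words.extend(transcript.split())

    def text(self):
        return " ".join(self._words)

# A tiny buffer makes the eviction behaviour easy to see.
buf = WordBuffer(max_words=3)
buf.extend("a b")
buf.extend("c d")
print(buf.text())
```

Because consecutive audio windows overlap by one second, raw transcripts may repeat words. A real pipeline would likely deduplicate before appending, which this sketch does not attempt.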

Matching Thresholds

CONF_THRESHOLD = 72
PERSISTENCE_REQUIRED = 2
ANCHOR_STRONG_SCORE = 75
LEADING_TRIGGER_SCORE = 55
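The interaction between `CONF_THRESHOLD` and `PERSISTENCE_REQUIRED` can be sketched as a small gate that accepts a verse only after it has scored at or above the threshold in the required number of consecutive windows. This is an assumed interpretation of the two settings, and `PersistenceGate` is a hypothetical class, not the repository's code.

```python
CONF_THRESHOLD = 72
PERSISTENCE_REQUIRED = 2

class PersistenceGate:
    """Accept a verse only after repeated confident matches (assumed behaviour)."""

    def __init__(self, threshold=CONF_THRESHOLD, required=PERSISTENCE_REQUIRED):
        self.threshold = threshold
        self.required = required
        self._candidate = None
        self._streak = 0

    def update(self, verse_id, score):
        """Return verse_id once it persists long enough, otherwise None."""
        if score < self.threshold:
            # A weak match breaks the streak entirely.
            self._candidate, self._streak = None, 0
            return None
        if verse_id == self._candidate:
            self._streak += 1
        else:
            self._candidate, self._streak = verse_id, 1
        return verse_id if self._streak >= self.required else None

gate = PersistenceGate()
observations = [(10, 80), (11, 90), (11, 74), (11, 60), (11, 95)]
results = [gate.update(verse, score) for verse, score in observations]
print(results)
```

Requiring persistence trades a window of latency for fewer spurious jumps on the projector when a single chunk happens to match the wrong verse.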

📁 Project Structure

sttm-agent/
├── agent_full.py              # Main ASR engine
├── orchestrator.py            # System coordinator
├── sttm_ui_controller.py      # Desktop app automation
├── sttm_sync_client.py        # STTM integration wrapper
├── sttm_socketio.py           # Socket.IO communication
├── verse_dataset.py           # Verse-to-shabad mapping
├── build_index.py             # Local database builder
├── fb_mms_1b_fine_tuning.py   # Fine-tune ASR model
├── local_banidb/              # Verse database
│   ├── line_store.json        # Verse content
│   └── inverted.json          # Search index
├── requirements.txt           # Dependencies
└── README.md                  # This file
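The split between a line store and an inverted search index suggests a token-to-line lookup for narrowing down candidates before scoring. The sketch below shows the general technique on an in-memory dictionary. The data layout, function names, and romanized sample lines are all assumptions, since the actual JSON schemas of `line_store.json` and `inverted.json` are not documented here.

```python
from collections import Counter, defaultdict

def build_inverted(line_store):
    """Map each token to the set of line ids containing it."""
    inverted = defaultdict(set)
    for line_id, text in line_store.items():
        for token in text.split():
            inverted[token].add(line_id)
    return inverted

def candidates(query, inverted, top_k=3):
    """Rank line ids by how many distinct query tokens they contain."""
    votes = Counter()
    for token in set(query.split()):
        for line_id in inverted.get(token, ()):
            votes[line_id] += 1
    return [line_id for line_id, _ in votes.most_common(top_k)]

# Hypothetical line store with romanized placeholder text.
line_store = {
    "1": "ik onkar sat nam",
    "2": "karta purakh nirbhau nirvair",
    "3": "akal murat ajuni saibhang",
}
inverted = build_inverted(line_store)
top = candidates("purakh nirbhau", inverted)
print(top)
```

A lookup like this keeps the expensive fuzzy scoring limited to a few candidate lines instead of the whole database.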

🎬 Demo

🎥 Watch Demo Video

🤝 Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

ASR Model Fine-Tuning (Optional)

python3 fb_mms_1b_fine_tuning.py

📈 Impact

  • Global Reach: Serving 30M+ Sikh devotees worldwide
  • Cultural Preservation: Digitizing and democratizing sacred text access
  • Educational Value: Improving Punjabi literacy in younger generations
  • Community Building: Creating inclusive spiritual experiences
  • Technical Innovation: Advancing low-resource language ASR

🔮 Future Roadmap

  • Mobile app integration
  • Edge optimization for limited compute environments
  • Federated learning across global deployments
  • Multi-language translation (10+ languages)
  • Custom ChatGPTs for personalized religious conversations

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • OpenAI: For the Open Model Hackathon opportunity
  • NVIDIA: For the GPUs used in ASR model fine-tuning
  • HuggingFace: For model hosting and datasets platform
  • Khalis Foundation: For SikhiToTheMax desktop application
  • Sikh Community: For inspiration and cultural guidance

📞 Contact


Built with ❤️ and AI

Seva (selfless service) through technology
