A comprehensive deep learning project that implements and compares multiple neural network architectures for real-time sign language gesture classification using MediaPipe hand keypoints. The project includes a GUI-based real-time analysis tool for testing and evaluation.
- Multiple Model Architectures: GRU+Attention, BiGRU+Attention, TCN, Lightweight Transformer
- Real-Time Analysis: GUI application with webcam integration for live testing
- MediaPipe Integration: Automatic hand keypoint extraction from video streams
- Comprehensive Evaluation: Training, validation, and testing with detailed metrics
- ONNX Export: Model optimization for deployment
- Result Logging: Save and analyze test results
- Cross-Platform: Works on Windows, macOS, and Linux
Based on training results with 63,676 samples across 29 ASL classes:
| Model | Test Accuracy | Test F1 Score | Inference Time (ms) |
|---|---|---|---|
| BiGRU + Attention | 98.90% | 98.90% | 0.03 |
| GRU + Attention | 98.62% | 98.62% | 0.02 |
| TCN | 98.69% | 98.69% | 0.05 |
| Transformer | 98.48% | 98.48% | 0.04 |
```
real-time-sign-language-translation/
├── models/                  # Model architectures
│   ├── gru_attention.py     # GRU + Attention model
│   ├── bigru_attention.py   # BiGRU + Attention model
│   ├── tcn.py               # Temporal Convolutional Network
│   └── transformer.py       # Lightweight Transformer
├── data_utils.py            # Data loading and preprocessing
├── train_all_models.py      # Training and evaluation script
├── real_time_analysis.py    # GUI application for real-time testing
├── extract_keypoints.py     # MediaPipe keypoint extraction
├── deploy_onnx.py           # ONNX export and inference
├── requirements.txt         # Python dependencies
├── setup.py                 # Package setup
└── README.md                # This file
```
- Python 3.8+ (recommended: Python 3.10)
- Webcam for real-time analysis
- CUDA-compatible GPU (optional, for faster training)
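If you plan to train on a GPU, a quick sanity check (not part of the project scripts) confirms that PyTorch can see it:

```python
import torch

# True only if a CUDA-capable GPU and a matching PyTorch build are present
print(torch.cuda.is_available())
```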
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/real-time-sign-language-translation.git
  cd real-time-sign-language-translation
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Download the ASL dataset (optional; synthetic data is provided):

  ```python
  import kagglehub

  path = kagglehub.dataset_download("grassknoted/asl-alphabet")
  ```
Option A: Use synthetic data (recommended for testing)

```bash
python generate_synthetic_data.py
python train_all_models.py
```

Option B: Use the real ASL dataset (keypoint extraction is sketched below)

```bash
python extract_keypoints.py
python train_all_models.py
```
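The actual extraction code lives in extract_keypoints.py; as a hedged sketch, MediaPipe Hands produces the 63-dimensional feature vectors (21 landmarks × 3 coordinates) roughly like this. The function name and parameter values here are illustrative, not the project's exact code:

```python
import cv2
import mediapipe as mp
import numpy as np

def frame_to_keypoints(frame, hands):
    """Return a flat (63,) array of x, y, z for 21 hand landmarks, or None."""
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # MediaPipe expects RGB input
    result = hands.process(rgb)
    if not result.multi_hand_landmarks:
        return None
    lm = result.multi_hand_landmarks[0].landmark
    return np.array([[p.x, p.y, p.z] for p in lm], dtype=np.float32).ravel()

with mp.solutions.hands.Hands(static_image_mode=False, max_num_hands=1,
                              min_detection_confidence=0.5) as hands:
    cap = cv2.VideoCapture(0)
    ok, frame = cap.read()
    if ok:
        keypoints = frame_to_keypoints(frame, hands)  # 21 landmarks * 3 = 63
    cap.release()
```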
Launch the GUI application for real-time testing:

```bash
python real_time_analysis.py
```

GUI Features:
- Model selection dropdown
- Live webcam feed with hand landmark visualization
- Real-time prediction display
- Recording functionality for result logging
- Save/load test results
Export the best model for deployment:
```bash
python deploy_onnx.py --model bigru_attention \
    --model_path outputs/best_bigru_attention.pth \
    --onnx_path outputs/bigru_attention.onnx \
    --data_type npy --data_path asl_keypoints.npy \
    --labels_path asl_labels.npy --num_classes 29
```
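Once exported, the model can be exercised with onnxruntime. A minimal sketch follows; the input layout (batch, seq_len, 63) and the sequence length of 30 are assumptions based on the keypoint features, not confirmed by this README:

```python
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("outputs/bigru_attention.onnx")
input_name = session.get_inputs()[0].name

# One batch of one keypoint sequence; the sequence length (30) is illustrative
x = np.random.rand(1, 30, 63).astype(np.float32)

logits = session.run(None, {input_name: x})[0]
print("predicted class index:", int(np.argmax(logits, axis=-1)[0]))
```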
- Launch the application:

  ```bash
  python real_time_analysis.py
  ```

- Select a model from the dropdown menu:
  - BiGRU + Attention (recommended for best accuracy)
  - GRU + Attention (fastest inference)
  - TCN (good balance)
  - Transformer (most complex)

- Start the camera by clicking "Start Camera"

- Position your hand in front of the camera:
  - Ensure good lighting
  - Keep your hand clearly visible
  - Maintain a consistent distance

- Make ASL gestures and observe the real-time predictions:
  - Start recording to log predictions
  - Perform various gestures for comprehensive testing
  - Stop recording when finished
  - Save results to a JSON file for analysis (see the sketch below)
  - Clear results to start fresh
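As an example of downstream analysis, a saved results file could be inspected like this. Note that the file name test_results.json and the prediction field are assumptions about the GUI's output format, not documented behavior:

```python
import json
from collections import Counter

# Hypothetical file name and schema; adjust to the GUI's actual output
with open("test_results.json") as f:
    results = json.load(f)

# Tally how often each gesture was predicted during the session
print(Counter(entry["prediction"] for entry in results))
```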
The system recognizes 29 ASL classes:
- Letters A-Z
- Space
- Nothing (no gesture)
- Delete
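As a quick illustration, the 29-class label list above can be built in a few lines; the exact label spellings (e.g. "del" vs. "delete") are assumptions about the dataset, not confirmed here:

```python
import string

# 26 letters plus three special classes; label strings are assumptions
CLASSES = list(string.ascii_uppercase) + ["space", "nothing", "del"]
assert len(CLASSES) == 29
```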
Edit model configurations in the respective model files:

```python
# Example: models/gru_attention.py
class GRUAttentionModel(nn.Module):
    def __init__(self, input_dim=63, hidden_dim=128,
                 num_layers=2, num_classes=29, dropout=0.2):
        # Adjust parameters as needed
        ...
```
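For orientation, here is a minimal, hedged sketch of what a GRU-with-attention classifier over (batch, seq_len, 63) keypoint sequences can look like. This illustrates the architecture family, not the exact code in models/gru_attention.py:

```python
import torch
import torch.nn as nn

class GRUAttention(nn.Module):
    """Illustrative GRU encoder with additive attention pooling over time."""
    def __init__(self, input_dim=63, hidden_dim=128, num_layers=2,
                 num_classes=29, dropout=0.2):
        super().__init__()
        self.gru = nn.GRU(input_dim, hidden_dim, num_layers=num_layers,
                          batch_first=True, dropout=dropout)
        self.attn = nn.Linear(hidden_dim, 1)   # one score per timestep
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):                       # x: (batch, seq_len, 63)
        h, _ = self.gru(x)                      # (batch, seq_len, hidden_dim)
        w = torch.softmax(self.attn(h), dim=1)  # attention weights over time
        context = (w * h).sum(dim=1)            # weighted temporal pooling
        return self.fc(context)                 # (batch, num_classes) logits

logits = GRUAttention()(torch.randn(4, 30, 63))  # smoke test: shape (4, 29)
```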
Modify training settings in train_all_models.py (a minimal training-loop sketch follows the tips below):

```python
# Training configuration
epochs = 20
batch_size = 64
learning_rate = 1e-3
```

- Use CPU-optimized models: All models are optimized for CPU inference
- Adjust camera resolution: Lower the resolution for faster processing (see the OpenCV sketch below)
- Model selection: Choose based on accuracy vs. speed requirements
- Batch processing: Process multiple frames for better accuracy
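As an example of lowering the capture resolution, with OpenCV the request looks like this (the values are requests; the camera driver snaps to the nearest mode it supports):

```python
import cv2

cap = cv2.VideoCapture(0)
# Request a lower capture resolution for faster per-frame processing
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

ok, frame = cap.read()
print(frame.shape if ok else "no frame captured")
cap.release()
```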
- GPU acceleration: Use CUDA for faster training
- Data augmentation: Increase dataset size with synthetic data
- Hyperparameter tuning: Experiment with model architectures
- Transfer learning: Use pre-trained models for better performance
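To make the training configuration above concrete, here is a self-contained sketch of the kind of loop train_all_models.py presumably runs. The model and data below are stand-ins, not the project's code:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

class TinySeqClassifier(nn.Module):
    """Stand-in for the project's models: GRU encoder + linear head."""
    def __init__(self, input_dim=63, hidden_dim=128, num_classes=29):
        super().__init__()
        self.gru = nn.GRU(input_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        _, h = self.gru(x)          # h: (num_layers, batch, hidden_dim)
        return self.fc(h[-1])       # classify from the last hidden state

# Purely illustrative data; real sequences come from extract_keypoints.py
X = torch.randn(1024, 30, 63)       # (samples, seq_len, 63 keypoint features)
y = torch.randint(0, 29, (1024,))   # 29 class indices

loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)
model = TinySeqClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for epoch in range(20):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
```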
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
- MediaPipe for hand landmark detection
- PyTorch for deep learning framework
- ASL Alphabet Dataset for training data
- OpenCV for computer vision utilities
For questions, issues, or contributions:
- Create an issue on GitHub
- Contact: chakrabortyanirban832@gmail.com
- Documentation: see this README
- v1.0.0: Initial release with all model architectures and real-time GUI
- v0.9.0: Beta version with basic functionality
- v0.8.0: Alpha version with core models
Note: This project is designed for research and educational purposes. For production use, additional testing and validation are recommended.