A comprehensive deep learning project that implements and compares multiple neural network architectures for real-time sign language gesture classification using MediaPipe hand keypoints. The project includes a GUI-based real-time analysis tool for testing and evaluation.
- Multiple Model Architectures: GRU+Attention, BiGRU+Attention, TCN, Lightweight Transformer
- Real-Time Analysis: GUI application with webcam integration for live testing
- MediaPipe Integration: Automatic hand keypoint extraction from video streams
- Comprehensive Evaluation: Training, validation, and testing with detailed metrics
- ONNX Export: Model optimization for deployment
- Result Logging: Save and analyze test results
- Cross-Platform: Works on Windows, macOS, and Linux
Based on training results with 63,676 samples across 29 ASL classes:
| Model | Test Accuracy | Test F1 Score | Inference Time (ms) |
|---|---|---|---|
| BiGRU + Attention | 98.90% | 98.90% | 0.03 |
| GRU + Attention | 98.62% | 98.62% | 0.02 |
| TCN | 98.69% | 98.69% | 0.05 |
| Transformer | 98.48% | 98.48% | 0.04 |
```
real-time-sign-language-translation/
├── models/                  # Model architectures
│   ├── gru_attention.py     # GRU + Attention model
│   ├── bigru_attention.py   # BiGRU + Attention model
│   ├── tcn.py               # Temporal Convolutional Network
│   └── transformer.py       # Lightweight Transformer
├── data_utils.py            # Data loading and preprocessing
├── train_all_models.py      # Training and evaluation script
├── real_time_analysis.py    # GUI application for real-time testing
├── extract_keypoints.py     # MediaPipe keypoint extraction
├── deploy_onnx.py           # ONNX export and inference
├── requirements.txt         # Python dependencies
├── setup.py                 # Package setup
└── README.md                # This file
```
- Python 3.8+ (recommended: Python 3.10)
- Webcam for real-time analysis
- CUDA-compatible GPU (optional, for faster training)
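If you plan to train on a GPU, a quick sanity check (not part of the project scripts) confirms that PyTorch can see it:

```python
import torch

# True only if a CUDA-capable GPU and a matching PyTorch build are present
print(torch.cuda.is_available())
```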
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/real-time-sign-language-translation.git
  cd real-time-sign-language-translation
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Download the ASL dataset (optional; synthetic data is provided):

  ```python
  import kagglehub

  path = kagglehub.dataset_download("grassknoted/asl-alphabet")
  ```
Option A: Use synthetic data (recommended for testing)

```bash
python generate_synthetic_data.py
python train_all_models.py
```

Option B: Use the real ASL dataset (keypoint extraction is sketched below)

```bash
python extract_keypoints.py
python train_all_models.py
```
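The actual extraction code lives in extract_keypoints.py; as a hedged sketch, MediaPipe Hands produces the 63-dimensional feature vectors (21 landmarks × 3 coordinates) roughly like this. The function name and parameter values here are illustrative, not the project's exact code:

```python
import cv2
import mediapipe as mp
import numpy as np

def frame_to_keypoints(frame, hands):
    """Return a flat (63,) array of x, y, z for 21 hand landmarks, or None."""
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # MediaPipe expects RGB input
    result = hands.process(rgb)
    if not result.multi_hand_landmarks:
        return None
    lm = result.multi_hand_landmarks[0].landmark
    return np.array([[p.x, p.y, p.z] for p in lm], dtype=np.float32).ravel()

with mp.solutions.hands.Hands(static_image_mode=False, max_num_hands=1,
                              min_detection_confidence=0.5) as hands:
    cap = cv2.VideoCapture(0)
    ok, frame = cap.read()
    if ok:
        keypoints = frame_to_keypoints(frame, hands)  # 21 landmarks * 3 = 63
    cap.release()
```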
Launch the GUI application for real-time testing:

```bash
python real_time_analysis.py
```

GUI Features:
- Model selection dropdown
- Live webcam feed with hand landmark visualization
- Real-time prediction display
- Recording functionality for result logging
- Save/load test results
Export the best model for deployment:
```bash
python deploy_onnx.py --model bigru_attention \
    --model_path outputs/best_bigru_attention.pth \
    --onnx_path outputs/bigru_attention.onnx \
    --data_type npy --data_path asl_keypoints.npy \
    --labels_path asl_labels.npy --num_classes 29
```
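Once exported, the model can be exercised with onnxruntime. A minimal sketch follows; the input layout (batch, seq_len, 63) and the sequence length of 30 are assumptions based on the keypoint features, not confirmed by this README:

```python
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("outputs/bigru_attention.onnx")
input_name = session.get_inputs()[0].name

# One batch of one keypoint sequence; the sequence length (30) is illustrative
x = np.random.rand(1, 30, 63).astype(np.float32)

logits = session.run(None, {input_name: x})[0]
print("predicted class index:", int(np.argmax(logits, axis=-1)[0]))
```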
- Launch the application:

  ```bash
  python real_time_analysis.py
  ```

- Select a model from the dropdown menu:
  - BiGRU + Attention (recommended for best accuracy)
  - GRU + Attention (fastest inference)
  - TCN (good balance)
  - Transformer (most complex)

- Start the camera by clicking "Start Camera"

- Position your hand in front of the camera:
  - Ensure good lighting
  - Keep your hand clearly visible
  - Maintain a consistent distance

- Make ASL gestures and observe the real-time predictions:
  - Start recording to log predictions
  - Perform various gestures for comprehensive testing
  - Stop recording when finished
  - Save results to a JSON file for analysis (see the sketch below)
  - Clear results to start fresh
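As an example of downstream analysis, a saved results file could be inspected like this. Note that the file name test_results.json and the prediction field are assumptions about the GUI's output format, not documented behavior:

```python
import json
from collections import Counter

# Hypothetical file name and schema; adjust to the GUI's actual output
with open("test_results.json") as f:
    results = json.load(f)

# Tally how often each gesture was predicted during the session
print(Counter(entry["prediction"] for entry in results))
```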
The system recognizes 29 ASL classes:
- Letters A-Z
- Space
- Nothing (no gesture)
- Delete
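As a quick illustration, the 29-class label list above can be built in a few lines; the exact label spellings (e.g. "del" vs. "delete") are assumptions about the dataset, not confirmed here:

```python
import string

# 26 letters plus three special classes; label strings are assumptions
CLASSES = list(string.ascii_uppercase) + ["space", "nothing", "del"]
assert len(CLASSES) == 29
```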
Edit model configurations in the respective model files:

```python
# Example: models/gru_attention.py
class GRUAttentionModel(nn.Module):
    def __init__(self, input_dim=63, hidden_dim=128,
                 num_layers=2, num_classes=29, dropout=0.2):
        # Adjust parameters as needed
        ...
```
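For orientation, here is a minimal, hedged sketch of what a GRU-with-attention classifier over (batch, seq_len, 63) keypoint sequences can look like. This illustrates the architecture family, not the exact code in models/gru_attention.py:

```python
import torch
import torch.nn as nn

class GRUAttention(nn.Module):
    """Illustrative GRU encoder with additive attention pooling over time."""
    def __init__(self, input_dim=63, hidden_dim=128, num_layers=2,
                 num_classes=29, dropout=0.2):
        super().__init__()
        self.gru = nn.GRU(input_dim, hidden_dim, num_layers=num_layers,
                          batch_first=True, dropout=dropout)
        self.attn = nn.Linear(hidden_dim, 1)   # one score per timestep
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):                       # x: (batch, seq_len, 63)
        h, _ = self.gru(x)                      # (batch, seq_len, hidden_dim)
        w = torch.softmax(self.attn(h), dim=1)  # attention weights over time
        context = (w * h).sum(dim=1)            # weighted temporal pooling
        return self.fc(context)                 # (batch, num_classes) logits

logits = GRUAttention()(torch.randn(4, 30, 63))  # smoke test: shape (4, 29)
```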
Modify training settings in train_all_models.py (a minimal training-loop sketch follows the tips below):

```python
# Training configuration
epochs = 20
batch_size = 64
learning_rate = 1e-3
```

- Use CPU-optimized models: All models are optimized for CPU inference
- Adjust camera resolution: Lower the resolution for faster processing (see the OpenCV sketch below)
- Model selection: Choose based on accuracy vs. speed requirements
- Batch processing: Process multiple frames for better accuracy
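As an example of lowering the capture resolution, with OpenCV the request looks like this (the values are requests; the camera driver snaps to the nearest mode it supports):

```python
import cv2

cap = cv2.VideoCapture(0)
# Request a lower capture resolution for faster per-frame processing
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

ok, frame = cap.read()
print(frame.shape if ok else "no frame captured")
cap.release()
```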
- GPU acceleration: Use CUDA for faster training
- Data augmentation: Increase dataset size with synthetic data
- Hyperparameter tuning: Experiment with model architectures
- Transfer learning: Use pre-trained models for better performance
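To make the training configuration above concrete, here is a self-contained sketch of the kind of loop train_all_models.py presumably runs. The model and data below are stand-ins, not the project's code:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

class TinySeqClassifier(nn.Module):
    """Stand-in for the project's models: GRU encoder + linear head."""
    def __init__(self, input_dim=63, hidden_dim=128, num_classes=29):
        super().__init__()
        self.gru = nn.GRU(input_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        _, h = self.gru(x)          # h: (num_layers, batch, hidden_dim)
        return self.fc(h[-1])       # classify from the last hidden state

# Purely illustrative data; real sequences come from extract_keypoints.py
X = torch.randn(1024, 30, 63)       # (samples, seq_len, 63 keypoint features)
y = torch.randint(0, 29, (1024,))   # 29 class indices

loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)
model = TinySeqClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for epoch in range(20):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
```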
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
- MediaPipe for hand landmark detection
- PyTorch for deep learning framework
- ASL Alphabet Dataset for training data
- OpenCV for computer vision utilities
For questions, issues, or contributions:
- Create an issue on GitHub
- Contact: chakrabortyanirban832@gmail.com
- Documentation: see this README
- v1.0.0: Initial release with all model architectures and real-time GUI
- v0.9.0: Beta version with basic functionality
- v0.8.0: Alpha version with core models
Note: This project is designed for research and educational purposes. For production use, additional testing and validation are recommended.