A powerful Gradio-based interface built on Facebook’s SeamlessM4T-v2 model for multilingual speech ↔ text translation and generation.
- Multilingual speech & text translation
- Upload or record audio
- Convert speech to speech, speech to text, text to speech, and text to text
- Automatic 16 kHz audio preprocessing
- Supports GPU acceleration via PyTorch
- Clean Gradio UI with task-based flow
- Audio output saved as .wav
pip install torch torchvision torchaudio
pip install transformers soundfile librosa gradio numpy
- ASR (Audio → Text):
  - Upload audio
  - Click Run
  - Output displays recognized text
- S2TT (Speech → Translated Text):
  - Upload audio
  - Select target language
  - Produces translated text
- S2ST (Speech → Speech Translation):
  - Upload audio
  - Choose target language
  - Generates translated .wav output
- T2ST (Text → Speech Translation):
  - Provide text
  - Choose source & target language
  - Generates spoken audio output
- T2TT (Text → Text Translation):
  - Enter text
  - Select target language
  - Produces translated text
- English → eng
- Hindi → hin
- Tamil → tam
- Telugu → tel
- Malayalam → mal
- Spanish → spa
- French → fra
- German → deu
- Chinese → zho
- Japanese → jpn
- Arabic → ara
- Russian → rus
You can extend this list anytime.
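In code, the mapping above is just a dictionary from UI display name to SeamlessM4T language code (`LANG_CODES` is an illustrative name); extending it is a one-line change:

```python
# Mapping from UI display name to SeamlessM4T language code (extend as needed).
LANG_CODES = {
    "English": "eng", "Hindi": "hin", "Tamil": "tam", "Telugu": "tel",
    "Malayalam": "mal", "Spanish": "spa", "French": "fra", "German": "deu",
    "Chinese": "zho", "Japanese": "jpn", "Arabic": "ara", "Russian": "rus",
}

print(LANG_CODES["Hindi"])  # → hin
```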
- Audio is always resampled to 16 kHz using librosa.
- Model outputs either:
  - text tokens → decoded via processor.decode
  - audio tensors → saved as .wav
- GPU is automatically used if available (torch.cuda.is_available())
- Slow processing → use a GPU runtime (especially in Colab)
- CUDA OOM → reduce input length or switch to CPU
- Audio not playing → ensure the file is written correctly at 16 kHz
- Wrong translation → verify the tgt_lang code
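For the GPU-related items, the usual PyTorch device fallback looks like this (a sketch; `model` stands in for the loaded SeamlessM4Tv2Model):

```python
import torch

# Prefer CUDA when present; otherwise fall back to CPU (slower, but avoids OOM).
device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)

# model = model.to(device)                                 # move the model once
# inputs = {k: v.to(device) for k, v in inputs.items()}    # then its inputs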