
Gowthamtj17/Gowthamtj17-SeamlessSpeech-Multilingual-Speech-Text-Translator


🎙️ SeamlessSpeech-Multilingual-Speech-Text-Translator

A Gradio-based interface built on Facebook's (Meta AI's) SeamlessM4T-v2 model for multilingual speech ↔ text translation and generation.

[Screenshot: application interface]

📝 This project supports:

1) ASR — Automatic Speech Recognition (Audio → Text)

[Screenshot: ASR example]

2) S2TT — Speech → Translated Text

[Screenshot: S2TT example]

3) S2ST — Speech → Speech Translation

[Screenshot: S2ST example]

4) T2ST — Text → Speech Generation

[Screenshot: T2ST example]

5) T2TT — Text → Text Translation

[Screenshot: T2TT example]

🚀 Features

  1. Multilingual speech & text translation
  2. Upload or record audio
  3. Convert speech to speech, speech to text, text to speech, and text to text
  4. Automatic 16 kHz audio preprocessing
  5. Supports GPU acceleration via PyTorch
  6. Task-based Gradio UI: pick a task, then supply the matching audio or text input
  7. Audio output saved as .wav
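A minimal sketch of how such a task-based Gradio flow could be wired, assuming Gradio 4.x (where `gr.Audio` takes a `sources` list for upload and microphone input). The names `TASKS`, `missing_input`, `build_demo`, and the `handler` callback are hypothetical and not taken from this repository; `handler` stands in for whatever function actually runs the model and returns `(output_text, output_wav_path)`.

```python
TASKS = ["ASR", "S2TT", "S2ST", "T2ST", "T2TT"]
AUDIO_TASKS = {"ASR", "S2TT", "S2ST"}  # tasks that take speech input
TEXT_TASKS = {"T2ST", "T2TT"}          # tasks that take text input


def missing_input(task, audio_path, text):
    """Return a user-facing message if the task's required input is missing, else None."""
    if task in AUDIO_TASKS and not audio_path:
        return f"{task} needs an uploaded or recorded audio clip."
    if task in TEXT_TASKS and not (text and text.strip()):
        return f"{task} needs input text."
    return None


def build_demo(handler, language_codes):
    """Build a task-based Gradio UI around a hypothetical model-running callback.

    handler(task, audio_path, text, src_lang, tgt_lang) -> (output_text, output_wav_path);
    either output may be None.
    """
    import gradio as gr  # imported lazily so the helpers above work without Gradio

    def on_run(task, audio_path, text, src_lang, tgt_lang):
        problem = missing_input(task, audio_path, text)
        if problem:
            raise gr.Error(problem)  # shown as an error message in the UI
        return handler(task, audio_path, text, src_lang, tgt_lang)

    with gr.Blocks(title="SeamlessSpeech") as demo:
        task = gr.Dropdown(choices=TASKS, value="S2TT", label="Task")
        # Gradio 4.x: accept both file uploads and microphone recordings
        audio_in = gr.Audio(sources=["upload", "microphone"], type="filepath", label="Input audio")
        text_in = gr.Textbox(label="Input text")
        src = gr.Dropdown(choices=language_codes, value=language_codes[0], label="Source language")
        tgt = gr.Dropdown(choices=language_codes, value=language_codes[0], label="Target language")
        run = gr.Button("Run")
        text_out = gr.Textbox(label="Output text")
        audio_out = gr.Audio(type="filepath", label="Output audio (.wav)")
        run.click(on_run, inputs=[task, audio_in, text_in, src, tgt], outputs=[text_out, audio_out])
    return demo
```

With a real handler, `build_demo(my_handler, ["eng", "hin", "tam"]).launch()` would start the app locally. Validating inputs before calling the model keeps a missing upload from surfacing as an opaque model error.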

📦 Install Dependencies

    pip install torch torchvision torchaudio
    pip install transformers sentencepiece soundfile librosa gradio numpy
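To confirm the installation, a small check like the following can report which of the packages above are importable and whether PyTorch sees a CUDA GPU. `check_environment` is a hypothetical helper, not part of the project; it relies only on `importlib.util.find_spec` and `torch.cuda.is_available()`.

```python
import importlib.util

# Import names of the packages installed by the pip commands above.
REQUIRED = ["torch", "torchvision", "torchaudio", "transformers",
            "soundfile", "librosa", "gradio", "numpy"]


def check_environment(packages=REQUIRED):
    """Return ({package: importable?}, cuda_available).

    find_spec checks importability without importing each package;
    torch is imported only to query CUDA.
    """
    found = {name: importlib.util.find_spec(name) is not None for name in packages}
    cuda_available = False
    if importlib.util.find_spec("torch") is not None:
        import torch  # only imported when it is installed
        cuda_available = bool(torch.cuda.is_available())
    return found, cuda_available


if __name__ == "__main__":
    found, cuda_available = check_environment()
    for name, ok in found.items():
        print(f"{name:<13} {'ok' if ok else 'MISSING'}")
    print("CUDA GPU available:", cuda_available)
```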

📘 Supported Tasks:

  1. ASR (Audio → Text):

    1. Upload audio

    2. Click Run

    3. Output displays recognized text

  2. S2TT (Speech → Translated Text):

    1. Upload audio

    2. Select target language

    3. Produces translated text

  3. S2ST (Speech → Speech Translation):

    1. Upload audio

    2. Choose target language

    3. Generates translated .wav output

  4. T2ST (Text → Speech Translation):

    1. Provide text

    2. Choose source & target language

    3. Generates spoken audio output

  5. T2TT (Text → Text Translation):

    1. Enter text

    2. Select target language

    3. Produces translated text
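All five tasks can map onto one `generate()` call that differs only in its input (audio or text) and in whether speech is requested. The sketch below assumes the Hugging Face transformers `SeamlessM4Tv2Model` and `AutoProcessor`, follows the call pattern from the transformers SeamlessM4T-v2 documentation, and treats ASR as "translation" into the input's own language. `TASK_SPECS`, `generate_kwargs`, and `run_task` are illustrative names, not the repository's code; newer transformers releases may expect `audio=` instead of `audios=` in the processor call.

```python
# Which input each task takes and whether SeamlessM4T-v2 should return speech.
TASK_SPECS = {
    "ASR":  {"input": "audio", "speech_out": False},
    "S2TT": {"input": "audio", "speech_out": False},
    "S2ST": {"input": "audio", "speech_out": True},
    "T2ST": {"input": "text",  "speech_out": True},
    "T2TT": {"input": "text",  "speech_out": False},
}


def generate_kwargs(task, src_lang, tgt_lang):
    """Keyword arguments for model.generate(); ASR targets the source language itself."""
    return {
        "tgt_lang": src_lang if task == "ASR" else tgt_lang,
        "generate_speech": TASK_SPECS[task]["speech_out"],
    }


def run_task(model, processor, task, src_lang, tgt_lang, audio=None, text=None, device="cpu"):
    """Run one task with an already-loaded model (on `device`) and its processor.

    audio: 1-D float array already resampled to 16 kHz; text: input string.
    Returns decoded text, or a 1-D numpy waveform for speech-output tasks.
    """
    spec = TASK_SPECS[task]
    if spec["input"] == "audio":
        inputs = processor(audios=audio, sampling_rate=16_000, return_tensors="pt")
    else:
        inputs = processor(text=text, src_lang=src_lang, return_tensors="pt")
    inputs = inputs.to(device)
    outputs = model.generate(**inputs, **generate_kwargs(task, src_lang, tgt_lang))
    if spec["speech_out"]:
        # First element holds the generated waveform batch
        return outputs[0].cpu().numpy().squeeze()
    # First element holds the text token ids; decode the first sequence
    return processor.decode(outputs[0].tolist()[0], skip_special_tokens=True)
```

For example, assuming the public checkpoint is loaded with `AutoProcessor.from_pretrained("facebook/seamless-m4t-v2-large")` and `SeamlessM4Tv2Model.from_pretrained("facebook/seamless-m4t-v2-large")`, `run_task(model, processor, "T2TT", "eng", "hin", text="Hello")` would return Hindi text.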

🌍 Available Languages

	English     → eng
	Hindi       → hin
	Tamil       → tam
	Telugu      → tel
	Malayalam   → mal
	Spanish     → spa
	French      → fra
	German      → deu
	Chinese     → cmn
	Japanese    → jpn
	Arabic      → arb
	Russian     → rus  

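You can extend this list with any other language code that SeamlessM4T-v2 supports. The model generates speech in fewer languages than it translates text into, so check that a target language supports speech output before using it with S2ST or T2ST.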
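If the language list is kept in a Python dictionary, extending it is a one-line change. The sketch below is hypothetical (`LANGUAGES`, `add_language`, and `to_code` are illustrative names) and shows only a few entries from the list above. Its format check accepts three-letter codes with an optional script suffix (for example `cmn_Hant`), but it cannot tell whether the model actually supports a given code.

```python
import re

# A few entries from the list above: display name -> language code.
LANGUAGES = {
    "English": "eng",
    "Hindi": "hin",
    "Tamil": "tam",
    "Spanish": "spa",
    "French": "fra",
}

# Three lowercase letters, optionally followed by a script tag such as "_Hant".
_CODE_PATTERN = re.compile(r"[a-z]{3}(_[A-Z][a-z]{3})?")


def add_language(name, code, table=LANGUAGES):
    """Register a language; only the code's format is checked, not model support."""
    if not _CODE_PATTERN.fullmatch(code):
        raise ValueError(f"unexpected language code format: {code!r}")
    table[name] = code


def to_code(name, table=LANGUAGES):
    """Look up the code for a display name, with a helpful error for unknown names."""
    if name not in table:
        raise ValueError(f"unknown language {name!r}; known: {sorted(table)}")
    return table[name]


# Example: extend the list with Korean.
add_language("Korean", "kor")
```

Catching a malformed code such as `"English"` at registration time is cheaper than discovering it when `generate()` rejects the `tgt_lang`.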

⚙️ How It Works

  1. Audio is always resampled to 16 kHz using librosa.
  2. Model outputs either:
    1. text tokens → decoded via processor.decode
    2. audio tensors → saved as .wav
  3. GPU is automatically used if available (torch.cuda.is_available())
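The preprocessing and output steps above can be sketched as follows. This is an illustration under assumptions, not the project's code: `load_audio_16k` uses `librosa.load` with `sr=16000`, `pick_device` mirrors the `torch.cuda.is_available()` check, and `save_wav` writes 16-bit PCM with the standard-library `wave` module as a dependency-free stand-in for soundfile. The 16 kHz default output rate matches SeamlessM4T-v2's speech output.

```python
import sys
import wave
from array import array

TARGET_SR = 16_000  # SeamlessM4T-v2 takes 16 kHz input and its speech output is 16 kHz


def load_audio_16k(path):
    """Load an audio file as mono float32 samples resampled to 16 kHz (needs librosa)."""
    import librosa  # lazy import so the other helpers work without librosa

    samples, _ = librosa.load(path, sr=TARGET_SR, mono=True)
    return samples


def pick_device():
    """Use the GPU when PyTorch can see one, else fall back to CPU."""
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


def save_wav(path, samples, sample_rate=TARGET_SR):
    """Write float samples in [-1, 1] as a 16-bit mono .wav (path or binary file object)."""
    # Clip to [-1, 1] and scale to signed 16-bit integers.
    pcm = array("h", (int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV stores samples little-endian
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())
```

A waveform returned by `model.generate(..., generate_speech=True)` would first be converted with `.cpu().numpy().squeeze()` before being passed to `save_wav`.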

🧪 Troubleshooting

Issue → Fix

  1. Slow processing → Use a GPU runtime (especially in Colab)
  2. CUDA out of memory (OOM) → Reduce the input length or switch to CPU
  3. Audio not playing → Make sure the output .wav file was written correctly at 16 kHz
  4. Wrong translation → Verify the tgt_lang code
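For the CUDA out-of-memory case, two common mitigations are shortening each input and retrying on CPU. The sketch below is hypothetical: `chunk_audio` splits 16 kHz audio into fixed windows (the 20-second default is an arbitrary assumption, and fixed cuts can split words, which may hurt translation quality), and `generate_with_fallback` catches `torch.cuda.OutOfMemoryError` (PyTorch 1.13 and later) and reruns generation on CPU.

```python
SAMPLE_RATE = 16_000


def chunk_audio(samples, max_seconds=20.0, sample_rate=SAMPLE_RATE):
    """Split 1-D audio into consecutive chunks of at most max_seconds each."""
    size = int(max_seconds * sample_rate)
    if size <= 0:
        raise ValueError("max_seconds must be positive")
    return [samples[i:i + size] for i in range(0, len(samples), size)]


def generate_with_fallback(model, inputs, **gen_kwargs):
    """Run model.generate on its current device; on CUDA OOM, retry once on CPU.

    inputs is the processor output (a BatchFeature/BatchEncoding, which has .to()).
    Note: after a fallback the model stays on CPU for later calls.
    """
    import torch

    try:
        return model.generate(**inputs, **gen_kwargs)
    except torch.cuda.OutOfMemoryError:
        torch.cuda.empty_cache()  # release cached GPU blocks before retrying
        model.to("cpu")
        return model.generate(**inputs.to("cpu"), **gen_kwargs)
```

Each chunk would then be processed separately, for example joining the decoded texts with spaces for ASR or S2TT.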

About

Multilingual speech & text translator built with SeamlessM4T-v2 and Gradio. Supports ASR, S2ST, T2ST, S2TT, T2TT.
