【Interspeech'2025 🎧】RESOUND: Speech Reconstruction from Silent Videos via Acoustic–Semantic Decomposed Modeling

Fsoft-AIC/RESOUND

Official repository for RESOUND, which reconstructs intelligible, expressive speech from silent talking-face videos via acoustic–semantic decomposed modeling.


📌 Citation

If you find this useful, please star 🌟 the repo and cite 📑:

@article{resound2025,
  title   = {RESOUND: Speech Reconstruction from Silent Videos via Acoustic–Semantic Decomposed Modeling},
  author  = {Pham, Long-Khanh and Tran, Thanh V. T. and Pham, Minh-Tan and Nguyen, Van},
  journal = {Interspeech 2025},
  year    = {2025},
  url     = {https://arxiv.org/abs/2505.22024v1}
}

📕 Overview

RESOUND splits speech generation into two paths: an acoustic path that takes prosody and timbre from a short speaker prompt, and a semantic path that takes linguistic content from visual cues. The model then decodes mel-spectrograms and discrete units, which a vocoder converts to a waveform. This disentanglement is designed to improve both naturalness and intelligibility.
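The data flow described above can be pictured with a toy sketch. Everything in it is hypothetical: the function names, the `DecoderOutput` type, and the arithmetic are placeholders chosen for illustration, not code or APIs from this repository. It only shows which input feeds which stage.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DecoderOutput:
    mel: List[List[float]]  # one mel frame (list of bins) per time step
    units: List[int]        # one discrete unit id per time step


def acoustic_path(speaker_prompt: List[float]) -> float:
    """Placeholder acoustic encoder: summarise the prompt as one value."""
    return sum(speaker_prompt) / len(speaker_prompt)


def semantic_path(video_frames: List[List[float]]) -> List[float]:
    """Placeholder visual encoder: one content value per video frame."""
    return [sum(frame) / len(frame) for frame in video_frames]


def decode(content: List[float], speaker: float,
           n_mels: int = 4, n_units: int = 10) -> DecoderOutput:
    """Placeholder decoder: combine both paths into mel frames and units."""
    mel = [[c + speaker] * n_mels for c in content]
    units = [int(abs(c) * n_units) % n_units for c in content]
    return DecoderOutput(mel=mel, units=units)


def vocode(out: DecoderOutput, samples_per_frame: int = 2) -> List[float]:
    """Placeholder vocoder: upsample frames to samples.

    This stand-in reads only the mel frames; a real vocoder may also
    consume the discrete units.
    """
    return [frame[0] for frame in out.mel for _ in range(samples_per_frame)]


# Toy inputs: three silent-video frames and a short speaker prompt.
video = [[0.1, 0.3], [0.5, 0.7], [0.2, 0.2]]
prompt = [1.0, 3.0]

out = decode(semantic_path(video), acoustic_path(prompt))
wave = vocode(out)
```

The point of the sketch is the separation: the speaker prompt never passes through the visual encoder, and the video never influences the speaker summary until the decoder combines them.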


⚙️ Set up the code environment

conda create -n resound python=3.10 -y
conda activate resound
pip install -r requirements.txt

📂 Data Preparation

Please follow the official pipeline from lip2speech-unit:
https://github.com/choijeongsoo/lip2speech-unit

This repository reuses the same directory structure, manifests, and features. No additional instructions are provided here.


🚀 Train

bash encoder/scripts/lrs3/train_avhubert_lrs3.sh

🔊 Inference

bash encoder/scripts/lrs3/inference_avhubert_lrs3.sh
bash vocoder/scripts/lrs3/inference.sh
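
If you prefer to drive both inference stages from Python, the following is a minimal sketch rather than part of the repository. It assumes the commands are run from the repository root, that the first script name ends in `.sh`, and that the vocoder stage depends on the encoder stage having finished. `INFERENCE_STEPS` and `run_inference` are hypothetical names introduced here.

```python
import subprocess
from pathlib import Path

# Script paths as listed in this README, relative to the repository root.
# The ".sh" suffix on the first script is an assumption.
INFERENCE_STEPS = [
    "encoder/scripts/lrs3/inference_avhubert_lrs3.sh",
    "vocoder/scripts/lrs3/inference.sh",
]


def run_inference(repo_root="."):
    """Run each inference stage in order, stopping at the first failure."""
    root = Path(repo_root)
    for step in INFERENCE_STEPS:
        if not (root / step).is_file():
            raise FileNotFoundError(f"missing script: {root / step}")
        # check=True raises CalledProcessError on a non-zero exit status,
        # so the vocoder stage is skipped if the encoder stage fails.
        subprocess.run(["bash", step], check=True, cwd=root)
```

Calling `run_inference()` from the repository root runs the encoder script and then the vocoder script. Any paths or checkpoints the scripts expect still have to be configured inside the scripts themselves.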

🎗️ Acknowledgments

This repository builds on Fairseq, AV-HuBERT, ESPnet, and speech-resynthesis. We thank the authors for open-sourcing these projects.
