Official repository for RESOUND, which reconstructs intelligible, expressive speech from silent talking-face videos via acoustic–semantic decomposed modeling.
If you find this useful, please star 🌟 the repo and cite 📑:
```bibtex
@inproceedings{resound2025,
  title     = {RESOUND: Speech Reconstruction from Silent Videos via Acoustic–Semantic Decomposed Modeling},
  author    = {Pham, Long-Khanh and Tran, Thanh V. T. and Pham, Minh-Tan and Nguyen, Van},
  booktitle = {Interspeech 2025},
  year      = {2025},
  url       = {https://arxiv.org/abs/2505.22024v1}
}
```
RESOUND separates an acoustic path (prosody and timbre, captured from a short speaker prompt) from a semantic path (linguistic content, derived from visual cues), then predicts mel-spectrograms and discrete speech units that a vocoder converts to a waveform. This disentanglement improves both naturalness and intelligibility.
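Conceptually, the decomposition amounts to two parallel encoders whose outputs are fused before decoding. The sketch below only illustrates that idea and is not the repository's actual model code; every module name, input size, and fusion choice here is a hypothetical stand-in for the design described in the paper.

```python
import torch
import torch.nn as nn

class DecomposedSpeechModel(nn.Module):
    """Conceptual sketch of acoustic-semantic decomposed modeling.

    All module names and dimensions are hypothetical illustrations.
    """
    def __init__(self, dim=256, n_mels=80, n_units=200, visual_dim=512):
        super().__init__()
        # Acoustic path: a speaker embedding from a short audio prompt.
        self.acoustic_enc = nn.GRU(n_mels, dim, batch_first=True)
        # Semantic path: linguistic content from per-frame visual features.
        self.semantic_enc = nn.GRU(visual_dim, dim, batch_first=True)
        # Two decoder heads: mel-spectrogram frames and discrete-unit logits.
        self.mel_head = nn.Linear(2 * dim, n_mels)
        self.unit_head = nn.Linear(2 * dim, n_units)

    def forward(self, prompt_mel, visual_feats):
        # Summarize the speaker prompt into a single acoustic embedding.
        _, acoustic = self.acoustic_enc(prompt_mel)        # (1, B, dim)
        semantic, _ = self.semantic_enc(visual_feats)      # (B, T, dim)
        # Broadcast the acoustic embedding across the semantic time axis.
        acoustic = acoustic.transpose(0, 1).expand(-1, semantic.size(1), -1)
        fused = torch.cat([semantic, acoustic], dim=-1)    # (B, T, 2*dim)
        return self.mel_head(fused), self.unit_head(fused)
```

A toy call like `model(torch.randn(2, 50, 80), torch.randn(2, 75, 512))` returns per-frame mel predictions and unit logits over the visual sequence length.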
```bash
conda create -n resound python=3.10 -y
conda activate resound
pip install -r requirements.txt
```
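A quick import check after installation can confirm that the heavy dependencies resolved; this assumes requirements.txt installs torch and fairseq, which the AV-HuBERT-based encoder relies on:

```python
# Sanity check (assumes requirements.txt installs torch and fairseq).
import fairseq
import torch

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("fairseq", fairseq.__version__)
```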
For data preparation, please follow the official pipeline from lip2speech-unit:

https://github.com/choijeongsoo/lip2speech-unit

This repository reuses the same directory structure, manifests, and features, so no additional preparation instructions are provided here.
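Before launching training, it can help to verify that the expected manifests are in place. The snippet below is a minimal sketch: the data root and manifest filenames are placeholders, since the authoritative layout is defined by the lip2speech-unit pipeline linked above.

```python
from pathlib import Path

# Placeholder root; point this at your lip2speech-unit-style data directory.
data_root = Path("datasets/lrs3")

# Illustrative manifest names; the linked pipeline defines the real ones.
for name in ("train.tsv", "valid.tsv", "test.tsv"):
    path = data_root / name
    print(f"{path}: {'found' if path.exists() else 'MISSING'}")
```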
```bash
bash encoder/scripts/lrs3/train_avhubert_lrs3.sh
bash encoder/scripts/lrs3/inference_avhubert_lrs3.sh
bash vocoder/scripts/lrs3/inference.sh
```
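After inference, the synthesized waveforms can be spot-checked with a few lines of Python; the output directory below is a hypothetical example, and the actual location is set by the inference script's configuration.

```python
import soundfile as sf  # pip install soundfile
from pathlib import Path

# Hypothetical output directory; the inference script's config sets the real one.
out_dir = Path("vocoder/outputs/lrs3")

for wav_path in sorted(out_dir.glob("*.wav"))[:5]:
    audio, sr = sf.read(wav_path)
    print(f"{wav_path.name}: {len(audio) / sr:.2f}s at {sr} Hz")
```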
This repository builds on Fairseq, AV-HuBERT, ESPnet, and speech-resynthesis. We thank the authors of these projects for open-sourcing their work.