Implementation of "FastSpeech: Fast, Robust and Controllable Text to Speech"
- Set `data_path` in `hparams.py` to the LJSpeech folder
- Set `teacher_dir` in `hparams.py` to the data directory where the alignments and mel-spectrogram targets are saved (see the config sketch below)
- Put a checkpoint of the pre-trained Transformer-TTS in place (the weights of its embedding/encoder layers are used; a loading sketch follows the training command)
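The exact contents of `hparams.py` are not reproduced here; a minimal sketch of the two entries mentioned above might look like the following (the paths are placeholders, replace them with your own):

```python
# hparams.py -- minimal sketch; only the entries mentioned above are shown.
# The real file in this repo contains many more options.

data_path = '/path/to/LJSpeech-1.1'       # root folder of the LJSpeech dataset
teacher_dir = '/path/to/teacher_outputs'  # alignments + mel-spectrogram targets from Transformer-TTS
```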
`python train.py`
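As a rough illustration of how the embedding/encoder weights of the pre-trained Transformer-TTS checkpoint could be copied into the student model (the checkpoint layout and module names below are assumptions, not this repo's actual code):

```python
import torch

def load_teacher_encoder(model, checkpoint_path):
    """Copy embedding/encoder weights from a Transformer-TTS checkpoint.

    `model` is assumed to be a FastSpeech-style nn.Module whose embedding and
    encoder submodules share parameter names with the teacher network; only
    matching keys are restored, everything else keeps its fresh initialization.
    """
    checkpoint = torch.load(checkpoint_path, map_location='cpu')
    teacher_state = checkpoint.get('model', checkpoint)  # handle either layout

    own_state = model.state_dict()
    transferred = {
        k: v for k, v in teacher_state.items()
        if (k.startswith('embedding') or k.startswith('encoder'))
        and k in own_state and own_state[k].shape == v.shape
    }
    own_state.update(transferred)
    model.load_state_dict(own_state)
    return list(transferred.keys())  # which parameters were actually reused
```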
The size of the training dataset differs because the Transformer-TTS teacher trained on phoneme inputs shows more diagonal attention.
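More diagonal teacher attention means more utterances yield usable alignments. One simple way to score how diagonal an attention matrix is, which could be used to decide whether an utterance is kept when extracting alignments, is a band-around-the-diagonal metric (a sketch only; not necessarily the exact criterion used in this repo):

```python
import numpy as np

def diagonality_score(attention, band_width=0.2):
    """Fraction of attention mass that lies near the diagonal.

    `attention` is a (decoder_steps, encoder_steps) matrix whose rows sum to 1.
    A score close to 1 means almost all attention falls within a band of
    relative width `band_width` around the ideal diagonal alignment.
    """
    T, N = attention.shape
    mass = 0.0
    for t in range(T):
        center = t / max(T - 1, 1) * (N - 1)   # expected encoder position at step t
        radius = band_width * N                # half-width of the diagonal band
        lo = max(int(np.floor(center - radius)), 0)
        hi = min(int(np.ceil(center + radius)) + 1, N)
        mass += attention[t, lo:hi].sum()
    return mass / T

# Example: keep an utterance only if the teacher attention is sufficiently diagonal.
# attn = ...  # attention matrix extracted from Transformer-TTS
# if diagonality_score(attn) > 0.8:
#     ...  # extract durations / mel targets for this utterance
```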
You can listen to the audio samples here.