Implementation of "FastSpeech: Fast, Robust and Controllable Text to Speech"
- Set `data_path` in `hparams.py` to the LJSpeech folder
- Set `teacher_dir` in `hparams.py` to the data directory where the alignments and mel-spectrogram targets are saved (see the config sketch below)
- Put a checkpoint of the pre-trained Transformer-TTS in place (the weights of its embedding/encoder layers are used; a loading sketch follows the training command)
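The exact contents of `hparams.py` are not reproduced here; a minimal sketch of the two entries mentioned above might look like the following (the paths are placeholders, replace them with your own):

```python
# hparams.py -- minimal sketch; only the entries mentioned above are shown.
# The real file in this repo contains many more options.

data_path = '/path/to/LJSpeech-1.1'       # root folder of the LJSpeech dataset
teacher_dir = '/path/to/teacher_outputs'  # alignments + mel-spectrogram targets from Transformer-TTS
```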
`python train.py`
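As a rough illustration of how the embedding/encoder weights of the pre-trained Transformer-TTS checkpoint could be copied into the student model (the checkpoint layout and module names below are assumptions, not this repo's actual code):

```python
import torch

def load_teacher_encoder(model, checkpoint_path):
    """Copy embedding/encoder weights from a Transformer-TTS checkpoint.

    `model` is assumed to be a FastSpeech-style nn.Module whose embedding and
    encoder submodules share parameter names with the teacher network; only
    matching keys are restored, everything else keeps its fresh initialization.
    """
    checkpoint = torch.load(checkpoint_path, map_location='cpu')
    teacher_state = checkpoint.get('model', checkpoint)  # handle either layout

    own_state = model.state_dict()
    transferred = {
        k: v for k, v in teacher_state.items()
        if (k.startswith('embedding') or k.startswith('encoder'))
        and k in own_state and own_state[k].shape == v.shape
    }
    own_state.update(transferred)
    model.load_state_dict(own_state)
    return list(transferred.keys())  # which parameters were actually reused
```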
The size of the training dataset differs because the Transformer-TTS teacher trained on phoneme inputs shows more diagonal attention.
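More diagonal teacher attention means more utterances yield usable alignments. One simple way to score how diagonal an attention matrix is, which could be used to decide whether an utterance is kept when extracting alignments, is a band-around-the-diagonal metric (a sketch only; not necessarily the exact criterion used in this repo):

```python
import numpy as np

def diagonality_score(attention, band_width=0.2):
    """Fraction of attention mass that lies near the diagonal.

    `attention` is a (decoder_steps, encoder_steps) matrix whose rows sum to 1.
    A score close to 1 means almost all attention falls within a band of
    relative width `band_width` around the ideal diagonal alignment.
    """
    T, N = attention.shape
    mass = 0.0
    for t in range(T):
        center = t / max(T - 1, 1) * (N - 1)   # expected encoder position at step t
        radius = band_width * N                # half-width of the diagonal band
        lo = max(int(np.floor(center - radius)), 0)
        hi = min(int(np.ceil(center + radius)) + 1, N)
        mass += attention[t, lo:hi].sum()
    return mass / T

# Example: keep an utterance only if the teacher attention is sufficiently diagonal.
# attn = ...  # attention matrix extracted from Transformer-TTS
# if diagonality_score(attn) > 0.8:
#     ...  # extract durations / mel targets for this utterance
```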
You can listen to the audio samples here.