
loss issue? #135

@ZephyrGo

Description


Thank you for this amazing project! I'm training a voice model, but several loss values suddenly become NaN partway through training. I observe the following loss pattern:

Normal training state:
INFO:mi-test:loss_disc=3.322, loss_gen=3.670, loss_fm=10.674, loss_mel=24.860, loss_kl=9.000

After some steps, losses suddenly become NaN:
INFO:mi-test:loss_disc=nan, loss_gen=nan, loss_fm=nan, loss_mel=23.440, loss_kl=9.000
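For reference, here is a minimal sketch of how one might catch the first non-finite step as it happens, rather than noticing it in the logs later. This assumes a generic PyTorch training loop; the loss names below just mirror the log above and are not this repo's actual variables.

```python
from typing import Dict
import torch

def losses_are_finite(losses: Dict[str, torch.Tensor]) -> bool:
    """Return True only if every loss tensor is finite (no NaN/Inf)."""
    bad = {k: float(v) for k, v in losses.items() if not torch.isfinite(v).all()}
    if bad:
        print(f"non-finite losses detected: {bad}")
    return not bad

# Example mimicking the log pattern above:
losses = {
    "loss_disc": torch.tensor(float("nan")),
    "loss_mel": torch.tensor(23.44),
}
if not losses_are_finite(losses):
    # In a real loop: skip optimizer.step() for this batch,
    # or stop training and roll back to the last good checkpoint.
    pass
```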

Is this a training collapse? Should I roll back to the previous checkpoint?
Could this be caused by training data quality issues?
What's the recommended solution? Should I (see the sketch after the list below):

  • Lower the learning rate?
  • Reduce batch size?
  • Disable FP16 training?
  • Clean the training dataset?
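Not an official recommendation, just a sketch of where each of those knobs lives in a typical PyTorch AMP training loop; the model, optimizer, and loss below are stand-ins, not this project's actual code.

```python
import torch
from torch.cuda.amp import GradScaler, autocast

device = "cuda" if torch.cuda.is_available() else "cpu"
use_fp16 = False  # disabling FP16 is one of the options above

model = torch.nn.Linear(80, 80).to(device)                   # stand-in for the real networks
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)   # a lower learning rate is another option
scaler = GradScaler(enabled=use_fp16 and device == "cuda")

x = torch.randn(4, 80, device=device)                        # a smaller batch is another option
with autocast(enabled=use_fp16 and device == "cuda"):
    loss = model(x).pow(2).mean()

scaler.scale(loss).backward()
scaler.unscale_(optimizer)                                    # unscale before clipping
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
scaler.step(optimizer)   # with FP16 on, GradScaler skips the step when grads are inf/NaN
scaler.update()
optimizer.zero_grad(set_to_none=True)
```

Gradient clipping plus the GradScaler's built-in inf/NaN skip usually contains isolated spikes; if the NaNs persist with FP16 off and a lower learning rate, the dataset (silent/clipped/corrupt clips) is the next thing I would check.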
