
loss issue? #135

@ZephyrGo

Description


Thank you for this amazing project! I'm training a voice model, but several loss values suddenly become NaN partway through training. I observe the following loss pattern:

Normal training state:
INFO:mi-test:loss_disc=3.322, loss_gen=3.670, loss_fm=10.674, loss_mel=24.860, loss_kl=9.000

After some steps, losses suddenly become NaN:
INFO:mi-test:loss_disc=nan, loss_gen=nan, loss_fm=nan, loss_mel=23.440, loss_kl=9.000
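For reference, here is a minimal sketch of how one might catch the first non-finite step as it happens, rather than noticing it in the logs later. This assumes a generic PyTorch training loop; the loss names below just mirror the log above and are not this repo's actual variables.

```python
from typing import Dict
import torch

def losses_are_finite(losses: Dict[str, torch.Tensor]) -> bool:
    """Return True only if every loss tensor is finite (no NaN/Inf)."""
    bad = {k: float(v) for k, v in losses.items() if not torch.isfinite(v).all()}
    if bad:
        print(f"non-finite losses detected: {bad}")
    return not bad

# Example mimicking the log pattern above:
losses = {
    "loss_disc": torch.tensor(float("nan")),
    "loss_mel": torch.tensor(23.44),
}
if not losses_are_finite(losses):
    # In a real loop: skip optimizer.step() for this batch,
    # or stop training and roll back to the last good checkpoint.
    pass
```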

Is this a training collapse? Should I roll back to the previous checkpoint?
Could this be caused by training data quality issues?
What's the recommended solution? Should I (see the sketch after the list below):

  • Lower the learning rate?
  • Reduce batch size?
  • Disable FP16 training?
  • Clean the training dataset?
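Not an official recommendation, just a sketch of where each of those knobs lives in a typical PyTorch AMP training loop; the model, optimizer, and loss below are stand-ins, not this project's actual code.

```python
import torch
from torch.cuda.amp import GradScaler, autocast

device = "cuda" if torch.cuda.is_available() else "cpu"
use_fp16 = False  # disabling FP16 is one of the options above

model = torch.nn.Linear(80, 80).to(device)                   # stand-in for the real networks
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)   # a lower learning rate is another option
scaler = GradScaler(enabled=use_fp16 and device == "cuda")

x = torch.randn(4, 80, device=device)                        # a smaller batch is another option
with autocast(enabled=use_fp16 and device == "cuda"):
    loss = model(x).pow(2).mean()

scaler.scale(loss).backward()
scaler.unscale_(optimizer)                                    # unscale before clipping
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
scaler.step(optimizer)   # with FP16 on, GradScaler skips the step when grads are inf/NaN
scaler.update()
optimizer.zero_grad(set_to_none=True)
```

Gradient clipping plus the GradScaler's built-in inf/NaN skip usually contains isolated spikes; if the NaNs persist with FP16 off and a lower learning rate, the dataset (silent/clipped/corrupt clips) is the next thing I would check.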
