
Questions about training process #8


Open
isjwdu opened this issue Mar 19, 2025 · 1 comment

Comments


isjwdu commented Mar 19, 2025

Hi,

Thank you for your excellent work!

I am currently trying to reproduce ESC-base-no-adv but have encountered some issues.

  1. According to the README, training a base ESC model on 4×RTX 4090 GPUs takes approximately 12 hours for 250k steps using 3-second speech clips with a batch size of 36. Based on the provided config, the batch size per GPU should be 9, totaling 36 across 4 GPUs.
  • 1.a. I am training ESC-base-no-adv on 2×A6000 (48 GB). To keep the total batch size at 36, I set 18 per GPU (18 × 2 GPUs = 9 × 4 GPUs). However, training appears roughly 11× slower:
<<<<Experimental Setup: esc-base-non-adv>>>>
   BatchSize_per_Device: Train 18 Test 4    LearningRate: 0.0001
   Total_Training_Steps: 5000*50=250000
   Pre-Training_Steps: 5000*15=75000
   Optimizer: AdamW    Scheduler: constant
   Quantization_Dropout: 0.75
   Model #Parameters: 8.74M
TQDM estimate: 23:29 elapsed < 129:24:34 remaining

I know the A6000's performance differs from the 4090's, but I would not expect such a large slowdown (I guess?).

  • 1.b. I noticed that the train_data_path in config differs from the dns_training dataset you provided (it matches the one for ESC-large instead). Did you use a different dataset for ESC-base-no-adv?
  2. I plan to do some research based on your ESC repo and may have follow-up questions. It would be even better if you were willing to share your personal contact information (e.g., WeChat). My email is isjiawei.du@gmail.com.

Thank you again for your work; I look forward to your reply.

@yzGuu830 (Owner) commented:

Hello @isjwdu,

Thank you for your questions!

1.a. I tested on my end using 4 RTX4090, and the estimated training time should be around 16 hours (so there is a mistake in the README). The accelerate configuration is set to the default.
I suspect the CPU also plays a role (mine is an Intel Xeon Platinum 8352V), since audio I/O can be quite time-consuming. You might try increasing the num_workers parameter of the DataLoader to speed things up. Alternatively, you could try a smaller batch_size; it should not affect the results much.
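To illustrate the knobs mentioned above, here is a minimal PyTorch sketch of a DataLoader configured for I/O-bound audio training. The dataset class and its parameters are illustrative stand-ins, not code from the ESC repo; they mimic 3-second clips at an assumed 16 kHz sample rate.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class RandomClips(Dataset):
    """Stand-in for an audio dataset: 3-second clips at 16 kHz (hypothetical)."""
    def __len__(self):
        return 1000

    def __getitem__(self, idx):
        return torch.randn(3 * 16000)

loader = DataLoader(
    RandomClips(),
    batch_size=9,             # per-GPU batch size; 9 x 4 GPUs = 36 total
    shuffle=True,
    num_workers=4,            # more workers overlap audio I/O with GPU compute
    pin_memory=True,          # speeds up host-to-GPU transfers
    persistent_workers=True,  # avoid re-spawning workers every epoch
)

batch = next(iter(loader))
print(batch.shape)  # torch.Size([9, 48000])
```

With slow disk or CPU-side decoding, `num_workers=0` (the default) forces data loading onto the main process and can leave the GPUs idle between steps, which matches the kind of slowdown described above.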

1.b. All models in this work were trained on the same dataset. The difference in path names is due to ESC-large and ablation models being added in a later version, during which we reorganized the dataset folder.

Hope this helps!
