I am currently trying to reproduce ESC-base-no-adv but have encountered some issues.
According to the README, training a base ESC model on 4×RTX 4090 GPUs takes approximately 12 hours for 250k steps using 3-second speech clips with a batch size of 36. Based on the provided config, the batch size per GPU should be 9, totaling 36 across 4 GPUs.
1.a. I am using 2×A6000 (48GB) to train ESC-base-no-adv. To match the total batch size, I set the per-GPU batch size to 18 (18 × 2 GPUs = 9 × 4 GPUs = 36). However, training appears to be significantly slower (roughly 11x).
I know the A6000 performs differently from the 4090, but I would not expect the training speed to drop this much.
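For reference, the batch-size matching described above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function name `per_gpu_batch_size` is made up for illustration and is not part of the ESC repo.

```python
def per_gpu_batch_size(global_batch_size: int, num_gpus: int) -> int:
    """Split a global batch size evenly across GPUs (hypothetical helper)."""
    if global_batch_size % num_gpus != 0:
        raise ValueError("global batch size must be divisible by the GPU count")
    return global_batch_size // num_gpus


# README setup: 36 clips per step across 4 GPUs.
print(per_gpu_batch_size(36, 4))  # 9
# Setup in this issue: the same 36 clips per step across 2 GPUs.
print(per_gpu_batch_size(36, 2))  # 18
```

Both setups process 36 clips per optimizer step, so the number of steps stays comparable, but each GPU does twice the work per step in the 2-GPU case.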
1.b. I noticed that the train_data_path in config differs from the dns_training dataset you provided (it matches the one for ESC-large instead). Did you use a different dataset for ESC-base-no-adv?
I anticipate that I will try to do some research on your ESC repo and may have some follow-up questions. It would be even better if you would be willing to email me your personal contact information (e.g. WeChat). My email is isjiawei.du@gmail.com.
Thanks again for your work. I look forward to your reply.
1.a. I tested on my end using 4×RTX 4090, and the estimated training time is around 16 hours, so the 12-hour figure in the README is a mistake. The accelerate configuration is set to the default.
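To make the comparison concrete, the 16-hour and 250k-step figures can be converted into a per-step time that is easy to compare against a progress bar. The sketch below is plain arithmetic; the function names are hypothetical, and the 11x factor is used only to illustrate what such a slowdown would imply if measured against the 16-hour estimate.

```python
def seconds_per_step(total_hours: float, total_steps: int) -> float:
    """Average wall-clock seconds per training step."""
    return total_hours * 3600 / total_steps


def projected_hours(sec_per_step: float, total_steps: int, slowdown: float = 1.0) -> float:
    """Total training hours if every step takes `slowdown` times longer."""
    return sec_per_step * slowdown * total_steps / 3600


reference = seconds_per_step(16, 250_000)
print(f"{reference:.4f} s/step")  # 0.2304 s/step

# Illustration only: an 11x slowdown relative to the 16-hour estimate.
print(f"{projected_hours(reference, 250_000, slowdown=11):.0f} hours")  # 176 hours
```

If the observed seconds per step are far above the reference value, the gap is more likely to come from data loading or configuration than from the GPU model alone.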
I guess the CPU also plays a role (mine is an Intel Xeon Platinum 8352V), as audio I/O can be quite time-consuming. You might try adjusting the num_workers parameter in the DataLoader to speed things up. You could also try a smaller batch_size, as it should not affect the results much.
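One way to check whether data loading is the bottleneck is to time batch fetching alone, without running the model, for several num_workers values. The sketch below is framework-agnostic and uses only the standard library; `time_batches`, `sweep_num_workers`, and the `make_loader` factory are hypothetical names, not part of the ESC repo.

```python
import time
from typing import Callable, Dict, Iterable, Sequence


def time_batches(loader: Iterable, max_batches: int = 50) -> float:
    """Return the mean seconds spent fetching one batch (no model compute)."""
    iterator = iter(loader)  # created before the timer so one-off setup is not counted
    fetched = 0
    start = time.perf_counter()
    for _ in iterator:
        fetched += 1
        if fetched >= max_batches:
            break
    elapsed = time.perf_counter() - start
    return elapsed / max(fetched, 1)


def sweep_num_workers(
    make_loader: Callable[[int], Iterable],
    candidates: Sequence[int] = (0, 2, 4, 8),
    max_batches: int = 50,
) -> Dict[int, float]:
    """Time batch fetching for each candidate worker count."""
    return {w: time_batches(make_loader(w), max_batches) for w in candidates}


# Assumed usage with PyTorch (dataset is a placeholder for your own Dataset):
#   from torch.utils.data import DataLoader
#   results = sweep_num_workers(
#       lambda w: DataLoader(dataset, batch_size=18, num_workers=w)
#   )
#   for workers, secs in results.items():
#       print(f"num_workers={workers}: {secs:.3f} s/batch")
```

If the fetch time per batch is close to the full training step time, the GPUs are likely waiting on audio I/O, and raising num_workers should help more than changing the batch size.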
1.b. All models in this work were trained on the same dataset. The difference in path names is due to ESC-large and ablation models being added in a later version, during which we reorganized the dataset folder.