
Questions about training process #8


Open
isjwdu opened this issue Mar 19, 2025 · 1 comment

Comments


isjwdu commented Mar 19, 2025

Hi,

Thank you for your excellent work!

I am currently trying to reproduce ESC-base-no-adv but have encountered some issues.

  1. According to the README, training a base ESC model on 4×RTX 4090 GPUs takes approximately 12 hours for 250k steps using 3-second speech clips with a batch size of 36. Based on the provided config, the batch size per GPU should be 9, totaling 36 across 4 GPUs.
  • 1.a. I am training ESC-base-no-adv on 2×A6000 (48 GB). To keep the total batch size at 36, I set 18 per GPU (18 × 2 GPUs = 9 × 4 GPUs). However, training appears roughly 11× slower:
<<<<Experimental Setup: esc-base-non-adv>>>>
   BatchSize_per_Device: Train 18 Test 4    LearningRate: 0.0001
   Total_Training_Steps: 5000*50=250000
   Pre-Training_Steps: 5000*15=75000
   Optimizer: AdamW    Scheduler: constant
   Quantization_Dropout: 0.75
   Model #Parameters: 8.74M
TQDM estimate: 23:29 elapsed < 129:24:34 remaining

I know the A6000's performance differs from the 4090's, but I would not expect such a large slowdown (I guess?).

  • 1.b. I noticed that the train_data_path in config differs from the dns_training dataset you provided (it matches the one for ESC-large instead). Did you use a different dataset for ESC-base-no-adv?
  2. I plan to do some research based on your ESC repo and may have follow-up questions. It would be even better if you were willing to share your personal contact information (e.g., WeChat). My email is isjiawei.du@gmail.com.

Thank you again for your work; I look forward to your reply.

@yzGuu830 (Owner) commented:

Hello @isjwdu,

Thank you for your questions!

1.a. I tested on my end using 4 RTX4090, and the estimated training time should be around 16 hours (so there is a mistake in the README). The accelerate configuration is set to the default.
I suspect the CPU also plays a role (mine is an Intel Xeon Platinum 8352V), since audio I/O can be quite time-consuming. You might try increasing the num_workers parameter of the DataLoader to speed things up. Alternatively, you could try a smaller batch_size; it should not affect the results much.
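To illustrate the knobs mentioned above, here is a minimal PyTorch sketch of a DataLoader configured for I/O-bound audio training. The dataset class and its parameters are illustrative stand-ins, not code from the ESC repo; they mimic 3-second clips at an assumed 16 kHz sample rate.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class RandomClips(Dataset):
    """Stand-in for an audio dataset: 3-second clips at 16 kHz (hypothetical)."""
    def __len__(self):
        return 1000

    def __getitem__(self, idx):
        return torch.randn(3 * 16000)

loader = DataLoader(
    RandomClips(),
    batch_size=9,             # per-GPU batch size; 9 x 4 GPUs = 36 total
    shuffle=True,
    num_workers=4,            # more workers overlap audio I/O with GPU compute
    pin_memory=True,          # speeds up host-to-GPU transfers
    persistent_workers=True,  # avoid re-spawning workers every epoch
)

batch = next(iter(loader))
print(batch.shape)  # torch.Size([9, 48000])
```

With slow disk or CPU-side decoding, `num_workers=0` (the default) forces data loading onto the main process and can leave the GPUs idle between steps, which matches the kind of slowdown described above.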

1.b. All models in this work were trained on the same dataset. The difference in path names is due to ESC-large and ablation models being added in a later version, during which we reorganized the dataset folder.

Hope this helps!
