Hi,
Thanks a lot for releasing this great project.
I have a question about the SubTransformer sampling process in a distributed training environment. I see that you sample a random SubTransformer before each train step as shown below. In a multi-GPU scenario, does each GPU get the same random SubTransformer, or does each GPU sample a different random SubNetwork? Does reset_rand_seed force all GPUs to sample the same random SubTransformer from the SuperNet? And is trainer.get_num_updates() the same on every GPU at each train step?
configs = [utils.sample_configs(utils.get_all_choices(args), reset_rand_seed=True, rand_seed=trainer.get_num_updates(), super_decoder_num_layer=args.decoder_layers)]
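For context, here is a minimal sketch of the behavior being asked about. The helper and the choice space below are hypothetical stand-ins, not the actual utils.sample_configs implementation; it only illustrates that if reset_rand_seed reseeds the RNG with rand_seed, and trainer.get_num_updates() returns the same value on every GPU at a given step (which is part of the question), then all workers would draw the identical SubTransformer:

```python
import random

def sample_sub_config(choices, rand_seed=None, reset_rand_seed=False):
    # Hypothetical stand-in for utils.sample_configs: when reset_rand_seed is
    # True, a fresh RNG is seeded with rand_seed before sampling, so any two
    # workers passing the same seed draw the same SubTransformer config.
    rng = random.Random(rand_seed) if reset_rand_seed else random
    return {name: rng.choice(options) for name, options in choices.items()}

# Illustrative search space (not the real HAT choice space).
choices = {
    "encoder_embed_dim": [512, 640],
    "encoder_layer_num": [6],
    "decoder_layer_num": [1, 2, 3, 4, 5, 6],
}

# Stands in for trainer.get_num_updates(); assumed identical on all GPUs here.
num_updates = 100
config_gpu0 = sample_sub_config(choices, rand_seed=num_updates, reset_rand_seed=True)
config_gpu1 = sample_sub_config(choices, rand_seed=num_updates, reset_rand_seed=True)
assert config_gpu0 == config_gpu1  # same SubTransformer sampled on every worker
```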
Thanks a lot for your help.