Hi,
Thanks a lot for releasing this great project.
I have a question about the SubTransformer sampling process in a distributed training environment. I see that you sample a random SubTransformer before each train step as shown below. In a multi-GPU scenario, does each GPU get the same random SubTransformer, or does each GPU sample a different random SubNetwork? Does reset_rand_seed force all GPUs to sample the same random SubTransformer from the SuperNet? And is trainer.get_num_updates() the same on every GPU at each train step?
configs = [utils.sample_configs(utils.get_all_choices(args), reset_rand_seed=True, rand_seed=trainer.get_num_updates(), super_decoder_num_layer=args.decoder_layers)]
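For context, here is a minimal sketch of the behavior being asked about. The helper and the choice space below are hypothetical stand-ins, not the actual utils.sample_configs implementation; it only illustrates that if reset_rand_seed reseeds the RNG with rand_seed, and trainer.get_num_updates() returns the same value on every GPU at a given step (which is part of the question), then all workers would draw the identical SubTransformer:

```python
import random

def sample_sub_config(choices, rand_seed=None, reset_rand_seed=False):
    # Hypothetical stand-in for utils.sample_configs: when reset_rand_seed is
    # True, a fresh RNG is seeded with rand_seed before sampling, so any two
    # workers passing the same seed draw the same SubTransformer config.
    rng = random.Random(rand_seed) if reset_rand_seed else random
    return {name: rng.choice(options) for name, options in choices.items()}

# Illustrative search space (not the real HAT choice space).
choices = {
    "encoder_embed_dim": [512, 640],
    "encoder_layer_num": [6],
    "decoder_layer_num": [1, 2, 3, 4, 5, 6],
}

# Stands in for trainer.get_num_updates(); assumed identical on all GPUs here.
num_updates = 100
config_gpu0 = sample_sub_config(choices, rand_seed=num_updates, reset_rand_seed=True)
config_gpu1 = sample_sub_config(choices, rand_seed=num_updates, reset_rand_seed=True)
assert config_gpu0 == config_gpu1  # same SubTransformer sampled on every worker
```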
Thanks a lot for your help.