Hi
I am quite confused about how gradient accumulation, the loss passed to self.log, and the logged step relate to each other. First, here is my understanding:
For gradient accumulation, we need the gradients normalized by the number of accumulated batches before the optimizer step. In plain PyTorch, we need something like this:
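(A minimal sketch of manual accumulation in plain PyTorch, since the original snippet is not shown here; the toy model, data, and K = 4 are placeholders.)

```python
import torch
from torch import nn

K = 4  # hypothetical number of accumulation steps
model = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
dataloader = [(torch.randn(8, 10), torch.randint(0, 2, (8,))) for _ in range(16)]

optimizer.zero_grad()
for i, (x, y) in enumerate(dataloader):
    loss = criterion(model(x), y)
    (loss / K).backward()          # divide by K so the accumulated gradient matches the big-batch average
    if (i + 1) % K == 0:
        optimizer.step()
        optimizer.zero_grad()
```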
For PL, the following is my code:
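(The original code block did not come through; below is a minimal reconstruction based on the names used in the questions, i.e. `self.criterion`, `logits`, and the `self.log` call, with a placeholder model.)

```python
import pytorch_lightning as pl
import torch
from torch import nn

class LitClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.model = nn.Linear(10, 2)        # placeholder model
        self.criterion = nn.CrossEntropyLoss()

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self.model(x)
        loss = self.criterion(logits, y)     # per-batch (unnormalized) loss
        # reduce_fx="mean" averages the logged value across the logging window
        self.log("train_loss", loss, on_step=True, prog_bar=True, reduce_fx="mean")
        return loss                          # this returned value is what Lightning backpropagates

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)
```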
As discussed here, reduce_fx will average the loss that we pass to self.log, and the real loss we see (printed on the progress bar) is self.value / self.cumulated_batch_size. So which one is used for backward: loss = self.criterion(logits, y), or self.value / self.cumulated_batch_size?
The loss I get from loss = self.criterion(logits, y) gives unnormalized gradients. So does PL normalize the gradients automatically? Do we only need to compute the per-batch loss and pass it to self.log? That is, if I set trainer = Trainer(accumulate_grad_batches=K), will PL automatically normalize the gradients used for backward, even if I just return the per-batch loss?
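(Sketch of the Trainer setup this question refers to; `K = 4` is just an example value, and the fit call assumes the `LitClassifier` sketched above.)

```python
from pytorch_lightning import Trainer

K = 4  # example accumulation factor
trainer = Trainer(accumulate_grad_batches=K, max_epochs=1)
# trainer.fit(LitClassifier(), train_dataloaders=...)  # dataloader omitted here
```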
As discussed here, the step in W&B is the optimization step, which should equal self.global_step. The loss value corresponding to self.global_step should then be calculated as loss * K. Is it OK that we just pass the per-batch loss to self.log?
For LearningRateMonitor, is the step in logging_interval also self.global_step?
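(Sketch of the logging setup the last two questions refer to; the project name is made up, and both the WandbLogger and the LearningRateMonitor are attached to the same Trainer.)

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import LearningRateMonitor
from pytorch_lightning.loggers import WandbLogger

wandb_logger = WandbLogger(project="grad-accum-demo")        # hypothetical project name
lr_monitor = LearningRateMonitor(logging_interval="step")    # log the learning rate per step

trainer = Trainer(
    logger=wandb_logger,
    callbacks=[lr_monitor],
    accumulate_grad_batches=4,
)
```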