Bug description
Utilising automated logging with `self.log` and `self.log_dict`, as described in the documentation, results in a shift of the logging frequency after varying numbers of steps. This is observed in all train metrics, but only in their `_step` versions. It may be specific to the `WandbLogger`, but it has already been observed to some extent with the `TensorBoardLogger`, as reported in #13525 and more generally in #10436.
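For reference, the automated logging calls in question look roughly like this inside a `LightningModule`. This is an illustrative sketch only; the attribute `self.model` and the metric names are placeholders, not the actual code from my project:

```python
def training_step(self, batch, batch_idx):
    x, y = batch
    preds = self.model(x)  # hypothetical model attribute
    loss = torch.nn.functional.mse_loss(preds, y)
    # Log a single metric both per step and aggregated per epoch.
    self.log("train_loss", loss, on_step=True, on_epoch=True)
    # Or log several metrics at once with the same flags.
    self.log_dict({"train_mae": (preds - y).abs().mean()}, on_step=True, on_epoch=True)
    return loss
```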
How to reproduce the bug
Call `self.log(..., on_step=True, on_epoch=True)` in `training_step` and let it run for more than 15k steps (in my case). The logging rate is initially equal to `log_every_n_steps=50` for some iterations, but jumps around wildly for others.
- `batch_size=10` (specified via `self.log(batch_size=10)`)
- `test_subjects=20`
- `samples_per_subject=10`
This equals 200 samples per epoch and 20 steps per epoch. Even with `log_every_n_steps=50`, this should then log precisely every 100 steps (as the 50-step mark is never reached within an epoch, according to my understanding) and not jump around from every 2 to every 200 steps.
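A minimal sketch of this setup, assuming a random `TensorDataset`, a trivial linear model, and a hypothetical W&B project name in place of my actual pipeline:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl
from pytorch_lightning.loggers import WandbLogger


class ReproModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(8, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.mse_loss(self.layer(x), y)
        # Per-step and per-epoch logging with an explicit batch_size, as described above.
        self.log("train_loss", loss, on_step=True, on_epoch=True, batch_size=10)
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=1e-2)


# 20 test subjects x 10 samples per subject = 200 samples; batch_size=10 -> 20 steps per epoch.
dataset = TensorDataset(torch.randn(200, 8), torch.randn(200, 1))
loader = DataLoader(dataset, batch_size=10)

trainer = pl.Trainer(
    max_epochs=1000,  # 1000 epochs x 20 steps = 20k steps, well past the ~15k mark
    log_every_n_steps=50,
    logger=WandbLogger(project="logging-frequency-repro"),  # hypothetical project name
)
trainer.fit(ReproModule(), loader)
```

With this configuration the `train_loss` step metric should appear at a fixed interval in the W&B run, which is where the shifting logging frequency becomes visible.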
Environment
- Lightning Component: Trainer / LightningModule
- Python 3.9
- PyTorch Lightning 1.9.1 (installed with pip)
- PyTorch 1.13.1
- CUDA / cuDNN: CUDA 11.6, cuDNN 8
- OS: Linux (Kernel 3.10.0-1160)
- Running environment: server