Bug description
I'm training a model based on a number of iterations instead of a number of epochs. The same model trains on datasets of different sizes, so one epoch corresponds to a different number of iterations per dataset. Say I want to train a model for 900 iterations, which corresponds to 90 epochs on one of the datasets, and I want a stepwise LR scheduler to step at iterations 300 and 600. To my understanding this is not natively possible in the PyTorch Lightning environment.
I know that I can change the lr scheduler interval to "step" and then set the frequency, like so:
'lr_scheduler': {"scheduler": sched, "interval": "step", "frequency": 300}
However, this only applies steps within one epoch. If I set the frequency to a value larger than the number of iterations per epoch, no scheduler step is ever applied. I would expect scheduler.step() to be called every n steps across epoch boundaries, i.e. based on the global step count.
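For illustration, the check I would expect for interval "step" is roughly the following (a sketch of the expected semantics only, not Lightning's actual internals):

    # Hypothetical: step the scheduler based on the global step count,
    # independent of epoch boundaries.
    if trainer.global_step > 0 and trainer.global_step % frequency == 0:
        scheduler.step()

whereas the current behavior seems to reset the counter at each epoch boundary.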
What version are you seeing the problem on?
v2.0
How to reproduce the bug
import os

import torch
from torch.utils.data import DataLoader, Dataset
from pytorch_lightning import LightningModule, Trainer


class RandomDataset(Dataset):
    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len


class BoringModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        loss = self(batch).sum()
        # Log the current learning rate at every step so the scheduler
        # behavior is visible in the progress bar.
        for param_group in self.optimizers().optimizer.param_groups:
            lr = param_group["lr"]
        self.log("lr", lr, prog_bar=True, on_step=True, on_epoch=False)
        return {"loss": loss}

    def configure_optimizers(self):
        opt = torch.optim.SGD(self.layer.parameters(), lr=0.1)
        scheduler = torch.optim.lr_scheduler.StepLR(opt, 1)
        return {"optimizer": opt,
                "lr_scheduler": {"scheduler": scheduler,
                                 "interval": "step",
                                 "frequency": 10}}


def run():
    # 32 samples with batch_size=8 -> 4 steps per epoch, so a frequency
    # of 10 is never reached within a single epoch and the scheduler
    # never steps.
    train_data = DataLoader(RandomDataset(32, 32), batch_size=8)
    model = BoringModel()
    trainer = Trainer(
        accelerator="cpu",
        default_root_dir=os.getcwd(),
        num_sanity_val_steps=0,
        max_epochs=-1,
        max_steps=30,
        log_every_n_steps=1,
    )
    trainer.fit(model, train_dataloaders=train_data)


if __name__ == "__main__":
    run()
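As a workaround I can step the scheduler manually every n global steps. A minimal sketch, assuming automatic optimization; the subclass name and the _scheduler attribute are just illustrative, and the scheduler is deliberately not registered with Lightning so its interval/frequency logic is bypassed:

class ManualStepModel(BoringModel):
    def configure_optimizers(self):
        opt = torch.optim.SGD(self.layer.parameters(), lr=0.1)
        # Kept as a plain attribute instead of being returned in the
        # "lr_scheduler" dict, so Lightning never steps it automatically.
        self._scheduler = torch.optim.lr_scheduler.StepLR(opt, 1)
        return opt

    def on_train_batch_end(self, outputs, batch, batch_idx):
        # global_step counts optimizer steps across all epochs.
        step = self.trainer.global_step
        if step > 0 and step % 10 == 0:
            self._scheduler.step()

This sidesteps the scheduler configuration entirely, which is why I would expect the "frequency" option to handle this case natively.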