loading from checkpoint but modifying the lr_schedulers #18387
YoelShoshan started this conversation in General
Hi!
I want to load a checkpoint saved by pytorch-lightning and continue training from that point, and it's important that I can modify the lr_scheduler.
What I do is:
```python
lightning_module = SomeLightningModule(...)  # SomeLightningModule inherits from pl.LightningModule
trainer.fit(lightning_module, datamodule=lightning_data, ckpt_path=checkpoint_path)
```
However, regardless of how I instantiate my lightning_module, its lr_scheduler gets overridden by the lr_scheduler stored in the checkpoint.
Loading the checkpoint shows that it indeed contains lr_scheduler state, which makes sense if you want to continue training exactly as it was originally configured.
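For reference, a quick way to check this (a sketch, assuming a standard Lightning `.ckpt` file; the exact keys can vary by version):

```python
import torch

ckpt = torch.load(checkpoint_path, map_location="cpu")
print(ckpt.keys())            # typically includes "state_dict", "optimizer_states", "lr_schedulers", "global_step", ...
print(ckpt["lr_schedulers"])  # the scheduler state that gets restored on resume
```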
Is there a way to limit what pytorch-lightning loads from the checkpoint? Can I keep the checkpoint's lr_schedulers from overriding the ones in the lightning module I've created? I still want it to load the iteration number and act accordingly.
A possible hack could be to save a modified version of the checkpoint without the lr_schedulers, but I'm hoping for a less hacky solution.
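For completeness, that checkpoint-stripping hack would look roughly like the sketch below (the `"lr_schedulers"` key name is my assumption about the checkpoint layout; verify it against your own checkpoint):

```python
import torch

# Load the original Lightning checkpoint, drop the scheduler state, and save a copy.
ckpt = torch.load(checkpoint_path, map_location="cpu")
ckpt.pop("lr_schedulers", None)  # assumed key; check ckpt.keys() for your Lightning version
torch.save(ckpt, "checkpoint_no_lr_schedulers.ckpt")
# Then pass ckpt_path="checkpoint_no_lr_schedulers.ckpt" to trainer.fit(...)
```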
Any ideas?
Any help will be highly appreciated :)
Update:
I was able to solve this but in a pretty hacky way - any less hacky solutions/suggestions are welcome :)
Step 1:
Before calling `trainer.fit(...)` I do:

```python
pl.trainer.connectors.checkpoint_connector.CheckpointConnector.restore_lr_schedulers = passthrough  # monkey patch it
```

to disable restoring the lr_schedulers state from the checkpoint.
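The `passthrough` function isn't defined in the snippet above; end to end, this step could look roughly like the following sketch (the no-op definition is my assumption, and the exact module path / class name can differ between Lightning versions):

```python
import pytorch_lightning as pl

def passthrough(*args, **kwargs):
    # Assumed no-op definition (not shown in the original post): replaces
    # CheckpointConnector.restore_lr_schedulers so the scheduler state stored
    # in the checkpoint is simply ignored on resume.
    return None

# Apply the monkey patch before calling trainer.fit(...)
pl.trainer.connectors.checkpoint_connector.CheckpointConnector.restore_lr_schedulers = passthrough

trainer = pl.Trainer()  # configure the Trainer however you normally would
trainer.fit(lightning_module, datamodule=lightning_data, ckpt_path=checkpoint_path)
```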
Step 2:
In my lightning module, I make sure the lr_scheduler state is stepped forward `global_step` times, so the freshly created schedulers match the resumed iteration.
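The exact code for this step isn't shown; one way to do it is to fast-forward the schedulers in a hook, as in the sketch below (it assumes schedulers that are stepped once per optimizer step; `on_train_start` and `trainer.lr_scheduler_configs` are standard Lightning APIs, the rest is my assumption, not necessarily the original solution):

```python
import pytorch_lightning as pl

class SomeLightningModule(pl.LightningModule):
    # ... __init__, training_step, configure_optimizers, etc. ...

    def on_train_start(self):
        # The checkpoint's scheduler state was not restored (Step 1), so advance the
        # freshly created schedulers to match the number of optimizer steps already taken.
        for cfg in self.trainer.lr_scheduler_configs:
            for _ in range(self.trainer.global_step):
                cfg.scheduler.step()
```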