loading from checkpoint but modifying the lr_schedulers #18387
YoelShoshan started this conversation in General
Hi!
I want to load a checkpoint saved by pytorch-lightning and continue training from that point, and it's important that I can modify the lr_scheduler.
What I do is:
```python
lightning_module = SomeLightningModule(...)  # SomeLightningModule inherits from pl.LightningModule
trainer.fit(lightning_module, datamodule=lightning_data, ckpt_path=checkpoint_path)
```
However, regardless of how I instantiate my lightning_module, its lr_scheduler gets overridden by the lr_scheduler stored in the checkpoint.
Loading the checkpoint shows that it indeed contains lr_scheduler state, which makes sense if you want to continue training exactly as it was originally configured.
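For reference, a quick way to check this (a sketch, assuming a standard Lightning `.ckpt` file; the exact keys can vary by version):

```python
import torch

ckpt = torch.load(checkpoint_path, map_location="cpu")
print(ckpt.keys())            # typically includes "state_dict", "optimizer_states", "lr_schedulers", "global_step", ...
print(ckpt["lr_schedulers"])  # the scheduler state that gets restored on resume
```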
Is there a way to limit what pytorch-lightning loads from the checkpoint? Can I keep the checkpoint's lr_schedulers from overriding the ones in the lightning module I've created? I still want it to load the iteration number and act accordingly.
A possible hack could be to save a modified version of the checkpoint without the lr_schedulers, but I'm hoping for a less hacky solution.
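For completeness, that checkpoint-stripping hack would look roughly like the sketch below (the `"lr_schedulers"` key name is my assumption about the checkpoint layout; verify it against your own checkpoint):

```python
import torch

# Load the original Lightning checkpoint, drop the scheduler state, and save a copy.
ckpt = torch.load(checkpoint_path, map_location="cpu")
ckpt.pop("lr_schedulers", None)  # assumed key; check ckpt.keys() for your Lightning version
torch.save(ckpt, "checkpoint_no_lr_schedulers.ckpt")
# Then pass ckpt_path="checkpoint_no_lr_schedulers.ckpt" to trainer.fit(...)
```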
Any ideas?
Any help will be highly appreciated :)
Update:
I was able to solve this but in a pretty hacky way - any less hacky solutions/suggestions are welcome :)
Step 1:
Before calling `trainer.fit(...)` I do:

```python
pl.trainer.connectors.checkpoint_connector.CheckpointConnector.restore_lr_schedulers = passthrough  # monkey patch it
```

to disable restoring the lr_schedulers state from the checkpoint.
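The `passthrough` function isn't defined in the snippet above; end to end, this step could look roughly like the following sketch (the no-op definition is my assumption, and the exact module path / class name can differ between Lightning versions):

```python
import pytorch_lightning as pl

def passthrough(*args, **kwargs):
    # Assumed no-op definition (not shown in the original post): replaces
    # CheckpointConnector.restore_lr_schedulers so the scheduler state stored
    # in the checkpoint is simply ignored on resume.
    return None

# Apply the monkey patch before calling trainer.fit(...)
pl.trainer.connectors.checkpoint_connector.CheckpointConnector.restore_lr_schedulers = passthrough

trainer = pl.Trainer()  # configure the Trainer however you normally would
trainer.fit(lightning_module, datamodule=lightning_data, ckpt_path=checkpoint_path)
```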
Step 2:
In my lightning module, I make sure the lr_scheduler state is stepped forward `global_step` times, so the freshly created schedulers match the resumed iteration.
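The exact code for this step isn't shown; one way to do it is to fast-forward the schedulers in a hook, as in the sketch below (it assumes schedulers that are stepped once per optimizer step; `on_train_start` and `trainer.lr_scheduler_configs` are standard Lightning APIs, the rest is my assumption, not necessarily the original solution):

```python
import pytorch_lightning as pl

class SomeLightningModule(pl.LightningModule):
    # ... __init__, training_step, configure_optimizers, etc. ...

    def on_train_start(self):
        # The checkpoint's scheduler state was not restored (Step 1), so advance the
        # freshly created schedulers to match the number of optimizer steps already taken.
        for cfg in self.trainer.lr_scheduler_configs:
            for _ in range(self.trainer.global_step):
                cfg.scheduler.step()
```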