
Make asyncio checkpointing work if validate/fit is called more than once #20952


Open. Wants to merge 1 commit into base: master


@jjh42 commented Jul 1, 2025

What does this PR do?

Currently, when async checkpointing is used, calling fit or validate more than once crashes, because the thread pool is shut down after the first run and never re-created.
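The failure mode and one possible fix can be sketched with the standard library alone. This is a minimal illustration, not Lightning's actual AsyncCheckpointIO code or this PR's diff: AsyncCheckpointWriter, save_checkpoint, and teardown are hypothetical names. It assumes the crash comes from submitting work to a ThreadPoolExecutor after shutdown(), which raises RuntimeError. The sketch avoids that by creating the executor lazily and discarding it on teardown, so a second run gets a fresh pool.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Dict, List, Optional


class AsyncCheckpointWriter:
    """Hypothetical async checkpoint writer (not Lightning's API).

    The executor is created on first use and dropped in teardown(), so a
    second fit/validate run re-creates it instead of submitting to a
    pool that has already been shut down.
    """

    def __init__(self) -> None:
        self._executor: Optional[ThreadPoolExecutor] = None
        self._futures: List[Future] = []
        self.saved: Dict[str, Dict[str, Any]] = {}  # stand-in for files on disk

    def _ensure_executor(self) -> ThreadPoolExecutor:
        # Create the pool if it was never created or was shut down by teardown().
        if self._executor is None:
            self._executor = ThreadPoolExecutor(max_workers=1)
        return self._executor

    def _write(self, path: str, checkpoint: Dict[str, Any]) -> None:
        # Stand-in for the real (slow) disk write done in the background thread.
        self.saved[path] = checkpoint

    def save_checkpoint(self, checkpoint: Dict[str, Any], path: str) -> None:
        # Shallow-copy so later mutation by the caller does not race the write.
        future = self._ensure_executor().submit(self._write, path, dict(checkpoint))
        self._futures.append(future)

    def teardown(self) -> None:
        if self._executor is None:
            return
        # Wait for pending writes, then forget the pool so the next run builds a new one.
        self._executor.shutdown(wait=True)
        self._executor = None
        futures, self._futures = self._futures, []
        for future in futures:
            future.result()  # re-raise any exception from a background write


if __name__ == "__main__":
    writer = AsyncCheckpointWriter()
    # Simulate two separate runs, e.g. trainer.fit() followed by trainer.validate().
    for run in range(2):
        writer.save_checkpoint({"epoch": run}, f"run{run}.ckpt")
        writer.teardown()
    print(sorted(writer.saved))  # ['run0.ckpt', 'run1.ckpt']
```

If teardown() only called shutdown() and kept the old executor, the second save_checkpoint() call would raise RuntimeError. An alternative design re-creates the executor eagerly at the start of each run; lazy creation has the advantage that no idle pool exists between runs.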

  • This PR modifies the existing test to reproduce the crash and fixes the underlying bug.

    No.

  • Was this discussed/agreed via a GitHub issue? (not for typos and docs)

    No, this is just a bugfix, not a behavior change. Should I create an issue?

  • Did you read the contributor guideline, Pull Request section?

    Yes

  • Did you make sure your PR does only one thing, instead of bundling different changes together?

    Yes

  • Did you make sure to update the documentation with your changes? (if necessary)

    N/A

  • Did you write any new necessary tests? (not for typos and docs)

    Yes

  • Did you verify new and existing tests pass locally with your changes?

    As best I could. I'm not clear on the recommended setup for testing PyTorch Lightning locally, so I was only able to run the test I modified.

  • Did you list all the breaking changes introduced by this pull request?

    N/A

  • Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)

    Yes


📚 Documentation preview 📚: https://pytorch-lightning--20952.org.readthedocs.build/en/20952/

github-actions bot added the pl label (Generic label for PyTorch Lightning package) Jul 1, 2025
@Borda changed the title from "Make asyncio checkpointing work if validate/fit is called more than o…" to "Make asyncio checkpointing work if validate/fit is called more than once" Jul 14, 2025
@jjh42 (Author) commented Jul 15, 2025

Let me know if this looks OK, and then I'll fix the mypy errors.

Labels: pl (Generic label for PyTorch Lightning package)
Projects: None yet
Participants: 1