Open
Labels: checkpointing, feature, help wanted, strategy: fsdp
Description
Bug description
`FSDPStrategy.load_checkpoint` casts `checkpoint_path` to a `pathlib.Path`. This breaks URIs such as cloud checkpoint paths, e.g. `s3://...`, because `Path` collapses the `//` that follows the scheme.
Example:
from pathlib import Path
checkpoint_path = "s3://asd/asd"
# Fails: Path collapses the "//", so as_posix() returns "s3:/asd/asd"
assert Path(checkpoint_path).as_posix() == checkpoint_path
NOTE: I am reporting this merely by looking at the source code; I have yet to confirm this with a test.
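One possible direction for a fix, sketched under assumptions: leave anything that looks like a URI untouched and only convert plain local paths to `Path`. The helper name `normalize_checkpoint_path` and the `"://"` check are hypothetical and are not part of Lightning's API. A real fix would probably rely on whatever filesystem abstraction the strategy already uses.

```python
from pathlib import Path
from typing import Union


def normalize_checkpoint_path(checkpoint_path: Union[str, Path]) -> Union[str, Path]:
    """Hypothetical helper: keep URIs as strings, convert local paths to Path.

    Assumption: any string containing "://" is a remote URI (s3://, gs://, ...).
    """
    text = str(checkpoint_path)
    if "://" in text:
        # Do not wrap in Path: that would collapse "//" into "/".
        return text
    return Path(text)


# What the unconditional cast does to a URI today:
mangled = Path("s3://asd/asd").as_posix()

# What the sketch returns for a URI and for a local path:
remote = normalize_checkpoint_path("s3://asd/asd")
local = normalize_checkpoint_path("/tmp/checkpoint")
```

Note that a URI that has already been wrapped in `Path` upstream cannot be recovered this way, since the scheme separator is lost at construction time. The check has to happen before the first cast.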
What version are you seeing the problem on?
master
How to reproduce the bug
from lightning.pytorch.strategies import FSDPStrategy
FSDPStrategy(...).load_checkpoint("s3://my/checkpoint")
Error messages and logs
I believe this exception will be raised.
Environment
Current environment
#- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow):
#- PyTorch Lightning Version (e.g., 1.5.0):
#- Lightning App Version (e.g., 0.5.2):
#- PyTorch Version (e.g., 2.0):
#- Python version (e.g., 3.9):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source):
#- Running environment of LightningApp (e.g. local, cloud):
More info
No response
awaelchli