Description
System Info
- `Accelerate` version: 1.6.0
- Platform: Linux-5.10.0-34-cloud-amd64-x86_64-with-glibc2.31
- `accelerate` bash location: /xxx/training/.venv/bin/accelerate
- Python version: 3.12.4
- Numpy version: 1.26.4
- PyTorch version (GPU?): 2.6.0+cu124 (True)
- PyTorch XPU available: False
- PyTorch NPU available: False
- PyTorch MLU available: False
- PyTorch SDAA available: False
- PyTorch MUSA available: False
- System RAM: 1842.60 GB
- GPU type: NVIDIA H100 80GB HBM3
- `Accelerate` default config:
- compute_environment: LOCAL_MACHINE
- distributed_type: MULTI_GPU
- mixed_precision: bf16
- use_cpu: False
- debug: False
- num_processes: 8
- machine_rank: 0
- num_machines: 1
- gpu_ids: all
- rdzv_backend: static
- same_network: True
- main_training_function: main
- enable_cpu_affinity: False
- downcast_bf16: no
- tpu_use_cluster: False
- tpu_use_sudo: False
- tpu_env: []
Information
- The official example scripts
- My own modified scripts
Tasks
- One of the scripts in the examples/ folder of Accelerate or an officially supported `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)
- My own task or dataset (give details below)
Reproduction
Error Message:
```
[rank7]: _pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
[rank7]: (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
[rank7]: (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
[rank7]: WeightsUnpickler error: Unsupported global: GLOBAL omegaconf.listconfig.ListConfig was not an allowed global by default. Please use `torch.serialization.add_safe_globals([ListConfig])` or the `torch.serialization.safe_globals([ListConfig])` context manager to allowlist this global if you trust this class/function.
```
Reproduce:
- Save a state using `Accelerator.save_state()` with an optimizer whose state includes custom objects (such as `omegaconf.listconfig.ListConfig`)
- Try to load the state using `Accelerator.load_state()` on PyTorch 2.6+ (a minimal sketch follows)
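A minimal sketch of the failing sequence, assuming a toy model and using an extra `param_groups` entry to get the `ListConfig` into the optimizer's `state_dict()` (exactly how the object enters the state does not matter for the failure):

```python
import torch
from accelerate import Accelerator
from omegaconf.listconfig import ListConfig

accelerator = Accelerator()
model = torch.nn.Linear(4, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
model, optimizer = accelerator.prepare(model, optimizer)

# Any non-tensor custom object in the optimizer state reproduces the issue;
# extra param_group keys are carried through optimizer.state_dict().
optimizer.param_groups[0]["extra_cfg"] = ListConfig([0.9, 0.999])

accelerator.save_state("ckpt")  # torch.save pickles the ListConfig into the optimizer file
accelerator.load_state("ckpt")  # PyTorch >= 2.6: _pickle.UnpicklingError (weights_only=True)
```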
Current behavior:
When calling `load_state()`, it fails with an unpickling error because `torch.load()` defaults to `weights_only=True` as of PyTorch 2.6, which refuses to unpickle custom classes such as `ListConfig`.
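Until `load_state()` exposes such a knob, the allowlisting route the traceback itself suggests works as a caller-side workaround, since it does not require touching Accelerate's internal `torch.load` call. Continuing the sketch above (further classes may need allowlisting if the unpickler then reports them):

```python
import torch.serialization
from omegaconf.listconfig import ListConfig

# Option A: allowlist the class process-wide, as the traceback suggests.
torch.serialization.add_safe_globals([ListConfig])
accelerator.load_state("ckpt")

# Option B: scope the allowlist to just this load.
with torch.serialization.safe_globals([ListConfig]):
    accelerator.load_state("ckpt")
```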
Expected behavior
The `load_state()` method should provide a way to pass custom keyword arguments to the underlying `torch.load()` calls, specifically to set `weights_only=False` or to register safe globals when needed.
Proposed solution
Add parameters to `load_state()` that forward keyword arguments to `torch.load()` for each component (sketched after this list):
- `optimizer_load_kwargs`: passed to the optimizer's load function
- `scheduler_load_kwargs`: passed to the scheduler's load function
- `dataloader_load_kwargs`: passed to the dataloader's load function
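A hypothetical call shape for this proposal; these keyword arguments do not exist in Accelerate 1.6.0 and are named here only to illustrate the pass-through:

```python
# Hypothetical API sketch, not the current Accelerate signature:
accelerator.load_state(
    "ckpt",
    optimizer_load_kwargs={"weights_only": False},   # only for trusted checkpoints
    scheduler_load_kwargs={"weights_only": False},
    dataloader_load_kwargs={"weights_only": False},
)
```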