Skip to content

When running merge_lora.sh I get 'RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory' #165

@GiilDe

Description

@GiilDe
[2025-07-16 00:38:57,576] [INFO] [real_accelerator.py:239:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
Loading Qwen2-VL from base model...
Fetching 5 files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 66365.57it/s]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:02<00:00,  2.41it/s]
Loading additional Qwen2-VL weights...
Traceback (most recent call last):
  File "/qwen_finetune/src/merge_lora_weights.py", line 22, in <module>
    merge_lora(args)
  File "/qwen_finetune/src/merge_lora_weights.py", line 6, in merge_lora
    processor, model = load_pretrained_model(model_path=args.model_path, model_base=args.model_base,
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/qwen_finetune/src/utils.py", line 61, in load_pretrained_model
    non_lora_trainables = torch.load(os.path.join(model_path, 'non_lora_state_dict.bin'), map_location='cpu')
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda/envs/train/lib/python3.11/site-packages/torch/serialization.py", line 1432, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda/envs/train/lib/python3.11/site-packages/torch/serialization.py", line 763, in __init__
    super().__init__(torch._C.PyTorchFileReader(name_or_buffer))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

For some loras it worked after trying a few times, but I got bad results for them so not sure if related. For others it is persisting. Also I had to manually copy the config.json file since it wasn't present in the checkpoint folders.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions