Describe the bug
The error occurs at the stage of saving the quantized model:
FileNotFoundError: Offloaded tensor 'code2wav.pre_transformer.layers.0.self_attn.q_proj.weight' not found in offload directory './gptqmodel_offload/splenopathy-unconstellated/'
I did some debugging around this line: https://github.com/ModelCloud/GPTQModel/blob/744fed8a49967ef82ff13df16f7e4503c75ec58c/gptqmodel/utils/model.py#L1297C25-L1297C29
The index_path that I printed in my code is:
"./gptqmodel_offload/splenopathy-unconstellated/code2wav.pre_transformer.layers.0.self_attn.q_proj/index.json"
but my local offload directory looks like this:
rectangled-aerophagist/
├── code2wav
│   ├── index.json
│   └── module.safetensors
├── embed_tokens
│   ├── index.json
│   └── module.safetensors
├── thinker.model.layers.0.mlp.experts.0.down_proj
│   ├── index.json
│   └── module.safetensors
├── thinker.model.layers.0.mlp.experts.0.gate_proj
│   ├── index.json
│   └── module.safetensors
...
So there is no "code2wav.pre_transformer.layers.0.self_attn.q_proj" directory, only "code2wav". I think "code2wav" is handled differently from "thinker" during offloading, and that difference leads to this bug.
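To make the mismatch concrete, here is a minimal sketch of a fallback lookup that walks up the dotted module path until it finds an existing offload subdirectory. This is not GPTQModel's actual code, just an illustration of how the path from the error could resolve against the directory layout above; the index.json schema is an assumption.

import json
import os

# Hypothetical fallback: try the full dotted module name first, then
# progressively shorter prefixes, e.g. fall back from
# 'code2wav.pre_transformer.layers.0.self_attn.q_proj' to 'code2wav'.
def find_offload_index(offload_dir, module_name):
    parts = module_name.split(".")
    for end in range(len(parts), 0, -1):
        index_path = os.path.join(offload_dir, ".".join(parts[:end]), "index.json")
        if os.path.isfile(index_path):
            with open(index_path) as f:
                return index_path, json.load(f)
    raise FileNotFoundError(f"no offload index for {module_name!r} under {offload_dir!r}")

# With the layout above, this resolves to .../code2wav/index.json
# instead of the missing per-projection directory.
path, index = find_offload_index(
    "./gptqmodel_offload/rectangled-aerophagist",
    "code2wav.pre_transformer.layers.0.self_attn.q_proj",
)
print(path, list(index)[:5])

Checking whether the full tensor name actually appears as a key inside code2wav's index.json would confirm that the whole code2wav module was offloaded as one unit.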
GPU Info
GPU 0: NVIDIA RTX A3000 Laptop GPU, 5902 MiB / 6144 MiB VRAM in use
Software Info
OS: Ubuntu 22.04 (WSL)
Python: 3.10
GPTQModel: 5.1.0.dev0
torch: 2.9.1+cu126
transformers: 4.57.1
accelerate: 1.10.1
triton: 3.5.1
To Reproduce
I loaded only one layer of the Qwen/Qwen3-Omni-30B-A3B-Instruct model (with every num_layers set to 1) to reproduce this bug quickly. I had previously loaded the whole model and hit the same error, so the reduced layer count does not matter.
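For reference, this is roughly what I mean by "set every num_layers=1": a sketch that rewrites config.json before loading. The exact key names inside Qwen3-Omni's nested config are an assumption here.

import json

# Hypothetical helper: recursively set every *num_hidden_layers /
# *num_layers field in the nested config to 1 so the bug reproduces fast.
def shrink_layers(node, value=1):
    if isinstance(node, dict):
        for key, child in node.items():
            if key.endswith(("num_hidden_layers", "num_layers")) and isinstance(child, int):
                node[key] = value
            else:
                shrink_layers(child, value)
    elif isinstance(node, list):
        for child in node:
            shrink_layers(child, value)

with open("qwen3-omni-layer1/config.json") as f:
    cfg = json.load(f)
shrink_layers(cfg)
with open("qwen3-omni-layer1/config.json", "w") as f:
    json.dump(cfg, f, indent=2)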
My quant script:
from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig
model_id = "qwen3-omni-layer1"
quant_path = "qwen3-omni-layer1-GPTQ-4bit"
calibration_dataset = load_dataset(
    "json",
    data_files="c4-train.00001-of-01024.json.gz",
    split="train",
)
calibration_dataset = calibration_dataset.filter(lambda x: len(x["text"]) <= 1024)
calibration_dataset = calibration_dataset.select(range(1))["text"]
quant_config = QuantizeConfig(bits=4, group_size=128, vram_strategy="balanced")
model = GPTQModel.load(model_id, quant_config)
# increase `batch_size` to match gpu/vram specs to speed up quantization
model.quantize(calibration_dataset, batch_size=1)
model.save(quant_path)
Expected behavior
The quantized model should save successfully.
Model/Datasets
model: https://huggingface.co/Qwen/Qwen3-Omni-30B-A3B-Instruct
dataset: https://huggingface.co/datasets/allenai/c4/blob/main/en/c4-train.00001-of-01024.json.gz