
[BUG] Quant qwen3-omni failed at save stage #2197

@allerou4

Description

Describe the bug

The failure happens while saving the quantized model:
FileNotFoundError: Offloaded tensor 'code2wav.pre_transformer.layers.0.self_attn.q_proj.weight' not found in offload directory './gptqmodel_offload/splenopathy-unconstellated/'

I did some debugging around https://github.com/ModelCloud/GPTQModel/blob/744fed8a49967ef82ff13df16f7e4503c75ec58c/gptqmodel/utils/model.py#L1297C25-L1297C29
The index_path that I print there is:
"./gptqmodel_offload/splenopathy-unconstellated/code2wav.pre_transformer.layers.0.self_attn.q_proj/index.json"

but my local offload directory (from a different run, hence the different randomized name) actually contains:
rectangled-aerophagist/
├── code2wav
│ ├── index.json
│ └── module.safetensors
├── embed_tokens
│ ├── index.json
│ └── module.safetensors
├── thinker.model.layers.0.mlp.experts.0.down_proj
│ ├── index.json
│ └── module.safetensors
├── thinker.model.layers.0.mlp.experts.0.gate_proj
│ ├── index.json
│ └── module.safetensors
...

So there is no "code2wav.pre_transformer.layers.0.self_attn.q_proj" directory, only "code2wav". It looks like "code2wav" is offloaded as one whole module, while the "thinker" submodules each get their own per-module directory; the save path is built from the full module name, so the lookup misses the "code2wav" layout, which leads to this bug.

GPU Info
0: NVIDIA RTX A3000 Laptop GPU, 6144 MiB VRAM (5902 MiB in use), 55C, 19 W / 80 W, Default compute mode

Software Info

OS: Ubuntu 22.04 (WSL)
Python: 3.10
GPTQModel: 5.1.0.dev0
torch: 2.9.1+cu126
transformers: 4.57.1
accelerate: 1.10.1
triton: 3.5.1

To Reproduce

To reproduce quickly, I load a truncated one-layer copy of the Qwen/Qwen3-Omni-30B-A3B-Instruct model (every num_layers set to 1). I had already hit the same bug earlier with the full model, so the truncation does not matter. The truncation is sketched below.

My quant script:

from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "qwen3-omni-layer1"
quant_path = "qwen3-omni-layer1-GPTQ-4bit"

calibration_dataset = load_dataset(
    "json",
    data_files="c4-train.00001-of-01024.json.gz",
    split="train",
)

calibration_dataset = calibration_dataset.filter(lambda x: len(x["text"]) <= 1024)
calibration_dataset = calibration_dataset.select(range(1))["text"]

quant_config = QuantizeConfig(bits=4, group_size=128, vram_strategy="balanced")

model = GPTQModel.load(model_id, quant_config)

# increase `batch_size` to match gpu/vram specs to speed up quantization
model.quantize(calibration_dataset, batch_size=1)

model.save(quant_path)

Expected behavior

The quantized model is saved successfully.

Model/Datasets

model: https://huggingface.co/Qwen/Qwen3-Omni-30B-A3B-Instruct
dataset: https://huggingface.co/datasets/allenai/c4/blob/main/en/c4-train.00001-of-01024.json.gz
