Describe the bug
The error occurs at the stage of saving the quantized model:
FileNotFoundError: Offloaded tensor 'code2wav.pre_transformer.layers.0.self_attn.q_proj.weight' not found in offload directory './gptqmodel_offload/splenopathy-unconstellated/'
I did some debugging around this line: https://github.com/ModelCloud/GPTQModel/blob/744fed8a49967ef82ff13df16f7e4503c75ec58c/gptqmodel/utils/model.py#L1297C25-L1297C29
The index_path that I printed in my code is:
"./gptqmodel_offload/splenopathy-unconstellated/code2wav.pre_transformer.layers.0.self_attn.q_proj/index.json"
but my local offload directory looks like this:
rectangled-aerophagist/
├── code2wav
│   ├── index.json
│   └── module.safetensors
├── embed_tokens
│   ├── index.json
│   └── module.safetensors
├── thinker.model.layers.0.mlp.experts.0.down_proj
│   ├── index.json
│   └── module.safetensors
├── thinker.model.layers.0.mlp.experts.0.gate_proj
│   ├── index.json
│   └── module.safetensors
...
So there is no "code2wav.pre_transformer.layers.0.self_attn.q_proj" directory, only "code2wav". I think "code2wav" is handled differently from "thinker" during offloading, and that difference leads to this bug.
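To make the mismatch concrete, here is a minimal sketch of a fallback lookup that walks up the dotted module path until it finds an existing offload subdirectory. This is not GPTQModel's actual code, just an illustration of how the path from the error could resolve against the directory layout above; the index.json schema is an assumption.

import json
import os

# Hypothetical fallback: try the full dotted module name first, then
# progressively shorter prefixes, e.g. fall back from
# 'code2wav.pre_transformer.layers.0.self_attn.q_proj' to 'code2wav'.
def find_offload_index(offload_dir, module_name):
    parts = module_name.split(".")
    for end in range(len(parts), 0, -1):
        index_path = os.path.join(offload_dir, ".".join(parts[:end]), "index.json")
        if os.path.isfile(index_path):
            with open(index_path) as f:
                return index_path, json.load(f)
    raise FileNotFoundError(f"no offload index for {module_name!r} under {offload_dir!r}")

# With the layout above, this resolves to .../code2wav/index.json
# instead of the missing per-projection directory.
path, index = find_offload_index(
    "./gptqmodel_offload/rectangled-aerophagist",
    "code2wav.pre_transformer.layers.0.self_attn.q_proj",
)
print(path, list(index)[:5])

Checking whether the full tensor name actually appears as a key inside code2wav's index.json would confirm that the whole code2wav module was offloaded as one unit.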
GPU Info
GPU 0: NVIDIA RTX A3000 Laptop GPU, 5902 MiB / 6144 MiB VRAM in use
Software Info
OS: Ubuntu 22.04 (WSL)
Python: 3.10
GPTQModel: 5.1.0.dev0
torch: 2.9.1+cu126
transformers: 4.57.1
accelerate: 1.10.1
triton: 3.5.1
To Reproduce
I loaded only one layer of the Qwen/Qwen3-Omni-30B-A3B-Instruct model (with every num_layers set to 1) to reproduce this bug quickly. I had previously loaded the whole model and hit the same error, so the reduced layer count does not matter.
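For reference, this is roughly what I mean by "set every num_layers=1": a sketch that rewrites config.json before loading. The exact key names inside Qwen3-Omni's nested config are an assumption here.

import json

# Hypothetical helper: recursively set every *num_hidden_layers /
# *num_layers field in the nested config to 1 so the bug reproduces fast.
def shrink_layers(node, value=1):
    if isinstance(node, dict):
        for key, child in node.items():
            if key.endswith(("num_hidden_layers", "num_layers")) and isinstance(child, int):
                node[key] = value
            else:
                shrink_layers(child, value)
    elif isinstance(node, list):
        for child in node:
            shrink_layers(child, value)

with open("qwen3-omni-layer1/config.json") as f:
    cfg = json.load(f)
shrink_layers(cfg)
with open("qwen3-omni-layer1/config.json", "w") as f:
    json.dump(cfg, f, indent=2)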
My quant script:
from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig
model_id = "qwen3-omni-layer1"
quant_path = "qwen3-omni-layer1-GPTQ-4bit"
calibration_dataset = load_dataset(
    "json",
    data_files="c4-train.00001-of-01024.json.gz",
    split="train",
)
calibration_dataset = calibration_dataset.filter(lambda x: len(x["text"]) <= 1024)
calibration_dataset = calibration_dataset.select(range(1))["text"]
quant_config = QuantizeConfig(bits=4, group_size=128, vram_strategy="balanced")
model = GPTQModel.load(model_id, quant_config)
# increase `batch_size` to match gpu/vram specs to speed up quantization
model.quantize(calibration_dataset, batch_size=1)
model.save(quant_path)
Expected behavior
The quantized model should save successfully.
Model/Datasets
model: https://huggingface.co/Qwen/Qwen3-Omni-30B-A3B-Instruct
dataset: https://huggingface.co/datasets/allenai/c4/blob/main/en/c4-train.00001-of-01024.json.gz