GPT-QModel v5.2.0
Notable Changes:
- Minimax M2, Granite Nano, Qwen3-VL, Brumby model support
- AWQ quantization is now out of beta and fully integrated into the life cycle
- New VramStrategy.Balanced property to spread MoE modules across different GPUs
- New pure torch AWQ kernel
- New calibration_concat_separator property
- Fixed HF bug that did not save mtp layers for GLM 4.5/4.6 (Air) models
- Fixed multi-GPU CUDA asserts due to stream/sync
What's Changed
- try not adding mem guards for marlin kernel launch protection by @Qubitium in #2108
- MoE vram by @Qubitium in #2110
- Fix GLM 4.5/4.6 and Air not saving mtp layer after save (HF bug) by @LRL2-ModelCloud in #2109
- torchao 0.14.1 update by @Qubitium in #2111
- Test refactor by @Qubitium in #2113
- Bump the github-actions group with 2 updates by @dependabot[bot] in #2120
- [FIX] xpu unit test by @ZX-ModelCloud in #2122
- modular by @Qubitium in #2123
- update scores by @Qubitium in #2124
- Fp8 dequant by @Qubitium in #2125
- Model dequant by @Qubitium in #2126
- Fp4 e2m1 by @Qubitium in #2127
- [FIX] ovis2, compatible with transformers v4.57.1 by @ZX-ModelCloud in #2129
- fix cols padding by @LRL2-ModelCloud in #2130
- [FIX] ovis_1_6 quantization by @ZX-ModelCloud in #2131
- Minimax m2 by @Qubitium in #2128
- Fix awq marlin kernel for bf16 by @Qubitium in #2135
- [FIX] incorrect AWQ NODES by @ZX-ModelCloud in #2133
- add support_offload_to_disk check by @LRL2-ModelCloud in #2134
- Add Awq torch kernel by @Qubitium in #2137
- Marin by @Qubitium in #2139
- Marin scores by @Qubitium in #2141
- Fix triton version detection in nogil patcher by @amd-vlarakic in #2144
- Fix qwen2 omni by @LRL2-ModelCloud in #2140
- [MODEL] Add GraniteMoEHybrid by @ZX-ModelCloud in #2142
- Fold AWQ into proper Looper/Layer/Subset Lifecycle by @Qubitium in #2138
- Refine GPT-QModel description in README by @Qubitium in #2145
- fix device_map by @LRL2-ModelCloud in #2146
- [MODEL] Add Qwen3-VL by @techshoww in #2136
- Add calibration_concat_separator by @Qubitium in #2148
- add test_qwen3_vl.py by @LRL2-ModelCloud in #2147
- Fix triton monkeypatch by @Qubitium in #2149
- [MODEL] Add Brumby by @Qubitium in #2150
- Dedup/Cleanup by @Qubitium in #2151
- Prep for 5.2 release by @Qubitium in #2152
- Dedup3 by @Qubitium in #2153
- add missing file by @Qubitium in #2154
- GPTAQ rename by @Qubitium in #2155
- fix ci test by @Qubitium in #2158
- fix setup license by @Qubitium in #2160
- Fix snapshot_download receiving unsupported kwargs by @Qubitium in #2162
- Retry partial.to to fix accelerate invalid argument error for first moe layer for >4 GPU setups by @avtc in #2163
- Comments + Sync by @Qubitium in #2164
- Stats/Logs by @Qubitium in #2165
New Contributors
- @amd-vlarakic made their first contribution in #2144
- @techshoww made their first contribution in #2136
Full Changelog: v5.0.0...v5.2.0