Commit 21de42f
Enable GPTQModel (#2064)
* align gptq check with transformers to support cpu
* fix comment
* gptqmodel
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* compatible with auto-gptq
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix compatibility with auto-gptq
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix compatibility with auto-gptq linear
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* revert unrelated changes
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* gptqmodel needs to use checkpoint_format (#1)
* need checkpoint_format
* default value of checkpoint_format is gptq
* fix quantize
* fix quantize
* fix quantize
* Update quantizer.py
* need to convert to v1 before gptqmodel save
* set checkpoint_format back to gptq after convert
* cleanup code
* sym=False is not supported with auto-gptq
* add comments
* cleanup code
* Update quantizer.py
* always convert v2 to v1 if checkpoint_format = "gptq"
* Update quantizer.py
---------
Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
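The sub-series above (#1) pins down the save rule: GPTQModel quantizes in its internal gptq_v2 format, converts back to v1 before saving, and restores checkpoint_format to "gptq" so auto-gptq loaders can still read the file; sym=False has no v1 encoding (auto-gptq does not support it), so those checkpoints stay gptq_v2. A minimal, self-contained sketch of that rule follows; convert_v2_to_v1 and prepare_checkpoint are hypothetical stand-ins, not the real optimum/GPTQModel helpers.

```python
def convert_v2_to_v1(tensors: dict) -> dict:
    """Hypothetical stand-in for the real v2 -> v1 conversion helper."""
    # The real conversion rewrites packed zero-points (see the sketch after
    # the next sub-series); here we only model the control flow.
    return tensors

def prepare_checkpoint(tensors: dict, sym: bool) -> tuple:
    # always convert v2 to v1 if checkpoint_format will be "gptq" ...
    if sym:
        return convert_v2_to_v1(tensors), "gptq"
    # ... but sym=False is not supported with auto-gptq, so keep gptq_v2
    return tensors, "gptq_v2"

tensors, fmt = prepare_checkpoint({"qzeros": None}, sym=True)
assert fmt == "gptq"
```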
* Mod backend code (#2)
* keep gptq_v2 if sym is false
* use hf_convert_gptq_v1_to_v2_format, hf_convert_gptq_v2_to_v1_format, and hf_gptqmodel_post_init
* no need to check backend
* use device_map
* cleanup
* Update quantizer.py
* move import
---------
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
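Sub-series #2 routes format conversion through GPTQModel's hf_convert_gptq_v1_to_v2_format / hf_convert_gptq_v2_to_v1_format helpers. As I understand the two formats, the on-disk difference is the zero-point bias: v1 (the auto-gptq "gptq" format) packs each zero-point minus one, while v2 stores it unbiased, so v1 -> v2 adds 1 to every packed field and v2 -> v1 subtracts it. The NumPy sketch below demonstrates that offset for 4-bit packing; it illustrates the format difference and is not the library's actual conversion code.

```python
import numpy as np

BITS = 4
PACK = 32 // BITS  # 8 zero-points per packed int32 word

def unpack_zeros(qzeros: np.ndarray) -> np.ndarray:
    """Split packed int32 words into individual 4-bit zero-points."""
    shifts = np.arange(PACK, dtype=np.uint32) * BITS
    return (qzeros[..., None].astype(np.uint32) >> shifts) & 0xF

def pack_zeros(zeros: np.ndarray) -> np.ndarray:
    """Pack 4-bit zero-points back into int32 words."""
    packed = np.zeros(zeros.shape[:-1], dtype=np.uint32)
    for k in range(PACK):
        packed |= (zeros[..., k].astype(np.uint32) & 0xF) << np.uint32(k * BITS)
    return packed

# Symmetric 4-bit quantization uses zero-point 8; v1 stores it biased as 7.
qzeros_v1 = pack_zeros(np.full((1, PACK), 7))
# v1 -> v2 adds the bias back (v2 -> v1 is the inverse, subtracting 1).
qzeros_v2 = pack_zeros(unpack_zeros(qzeros_v1) + 1)
assert unpack_zeros(qzeros_v2).tolist() == [[8] * PACK]
```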
* fix format and log
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix version check
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* enable gptqmodel tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* update quant type check
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* Fix optimum compat (#3)
* add meta info
* cleanup
* cleanup
* The value of quantizer should be an array
* Update quantizer.py
* If is_auto_gptq_available(), also write "auto_gptq:version" to "quantizer"
* If is_auto_gptq_available(), also write "auto_gptq:version" to "quantizer"
* Update quantizer.py
* cleanup
* comment on meta
* hf_select_quant_linear now passes checkpoint_format
* add todo fix
* move convert code to quantizer.save()
* Update quantizer.py
* Optimize hf_convert_gptq_v2_to_v1_format()
* Optimize hf_convert_gptq_v1_to_v2_format()
* fix GPTQTestCUDA
* hf_select_quant_linear() now always sets pack=True
* gptqmodel.hf_select_quant_linear() now does not select ExllamaV2
* gptqmodel.hf_select_quant_linear() now does not select ExllamaV2
* GPTQQuantizer add backend
* lower checkpoint_format and backend
* cleanup
* move backend to bottom
* no need to check gptqmodel version for ipex support
* Update import_utils.py
* Update quantizer.py
* fix UnboundLocalError: cannot access local variable 'version' where it is not associated with a value
* make version var short
* Update import_utils.py
* fix unittest
* use assertLessEqual
---------
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
Co-authored-by: LRL <lrl@lbx.dev>
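Sub-series #3 also records provenance in the saved quantization config: a meta block whose "quantizer" value is an array, listing each producing library as a "name:version" string (optimum plus gptqmodel, and auto_gptq when it is available). A hedged sketch of building that block follows; the key names come from the bullets above, but the exact layout and the version strings used here are assumptions.

```python
from typing import Dict, List, Optional

def build_meta(optimum_version: str,
               gptqmodel_version: Optional[str] = None,
               auto_gptq_version: Optional[str] = None) -> Dict[str, List[str]]:
    """Record which libraries produced the checkpoint, as "name:version"."""
    quantizer = [f"optimum:{optimum_version}"]
    if gptqmodel_version is not None:
        quantizer.append(f"gptqmodel:{gptqmodel_version}")
    if auto_gptq_version is not None:
        # per the bullets above: if is_auto_gptq_available(), also write
        # "auto_gptq:version" to "quantizer"
        quantizer.append(f"auto_gptq:{auto_gptq_version}")
    return {"quantizer": quantizer}

# Placeholder versions, purely illustrative.
print(build_meta("1.24.0", gptqmodel_version="1.4.0"))
# {'quantizer': ['optimum:1.24.0', 'gptqmodel:1.4.0']}
```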
* fix format and convert v2 to v1
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* [Fix] all tensors not on the same device (#5)
* fix device error
* update gptqmodel version
* fix test
* fix format
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* add gptqmodel tests which cover cpu
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix all auto-gptq tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* revert tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* rm gptqmodel yaml
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix comment
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* enable real cpu tests with fp32
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix test model name
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* keep the original device setting when using auto-gptq
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* Update optimum/gptq/quantizer.py
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
* Update optimum/gptq/quantizer.py
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: LRL-ModelCloud <165116337+LRL-ModelCloud@users.noreply.github.com>
Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
Co-authored-by: ZX-ModelCloud <165115237+ZX-ModelCloud@users.noreply.github.com>
Co-authored-by: LRL <lrl@lbx.dev>
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
4 files changed: +227, −61 lines
- optimum
  - gptq
  - utils