v0.3.0
Highlights
- AIU support: a new example for model conversion for AIU (see the `examples/AIU_CONVERSION` folder) and new add-ons for `fms`: a triton kernel for specialized matmul HW simulation and verification (a conceptual sketch follows this list)
- microscaling format support by integrating functionality from the Microsoft `mx` package (see `examples/MX` for more details; a block-scaling sketch follows this list)
- other upgrades and improvements:
  - `qmodel_prep` tracing speed improvement, e.g., for Llama3-70B tracing time has been reduced from ~20 min to ~2 min (a hedged usage sketch follows this list)
  - upgraded base dependencies to `torch 2.5` and `python 3.12`, and migrated from `auto_gptq` to `gptqmodel` (a migration sketch follows this list)
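The matmul HW-simulation kernel models what a specialized accelerator actually computes, including accumulator-width effects (see also #82 and #120 below). The snippet is a conceptual sketch in plain PyTorch, not the fms-mo triton kernel itself; `msb_keep` and `lsb_drop` are illustrative parameters, not real kernel arguments.

```python
import torch

def int8_matmul_hw_sim(a_q: torch.Tensor, b_q: torch.Tensor,
                       msb_keep: int = 24, lsb_drop: int = 0) -> torch.Tensor:
    """Int8 x int8 matmul with a width-limited accumulator.

    Specialized HW may not keep a full 32-bit accumulator; dropping
    low-order bits (lsb truncation) and saturating to a narrower width
    (msb side) approximates that for simulation and verification.
    """
    acc = a_q.to(torch.int32) @ b_q.to(torch.int32)  # exact int32 reference
    acc = acc >> lsb_drop                            # drop low-order bits
    hi = (1 << (msb_keep - 1)) - 1
    return acc.clamp(-hi - 1, hi)                    # saturate to msb_keep bits

a = torch.randint(-128, 128, (64, 64), dtype=torch.int8)
b = torch.randint(-128, 128, (64, 64), dtype=torch.int8)
out = int8_matmul_hw_sim(a, b, msb_keep=24, lsb_drop=2)
```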
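Microscaling (MX) formats store each block of values with one shared power-of-two scale plus narrow per-element encodings. Below is a minimal sketch of that block-scaling idea, assuming an INT8 element format and block size 32 for simplicity; it is not the `mx` package API.

```python
import torch

def mx_block_quantize(x: torch.Tensor, block: int = 32, bits: int = 8):
    """Quantize a 1-D tensor in blocks that share a power-of-two scale."""
    assert x.numel() % block == 0, "pad to a multiple of the block size"
    xb = x.reshape(-1, block)
    qmax = 2 ** (bits - 1) - 1                            # 127 for INT8
    amax = xb.abs().amax(dim=1, keepdim=True).clamp(min=1e-12)
    # one shared per-block scale, restricted to powers of two as in MX
    scale = torch.exp2(torch.ceil(torch.log2(amax / qmax)))
    q = torch.clamp(torch.round(xb / scale), -qmax, qmax)
    return q.to(torch.int8), scale                        # ints + block scales

def mx_block_dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return (q.to(torch.float32) * scale).reshape(-1)

x = torch.randn(4096)
q, s = mx_block_quantize(x)
max_err = (x - mx_block_dequantize(q, s)).abs().max()
```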
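For reference, a hedged sketch of the `qmodel_prep` call whose tracing was sped up; the entry points and signature below are assumptions based on the project README, so check the fms-mo docs for your version.

```python
# All fms-mo names below (qconfig_init, qmodel_prep and their signatures)
# are assumptions; verify against the fms-mo documentation.
from transformers import AutoModelForCausalLM
from fms_mo import qconfig_init, qmodel_prep

model = AutoModelForCausalLM.from_pretrained("ibm-granite/granite-3.0-2b-base")
qcfg = qconfig_init()              # default quantization config (assumed)
sample = model.dummy_inputs        # example inputs drive the fx tracing
qmodel_prep(model, sample, qcfg)   # trace model and swap in quantized modules
```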
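And a minimal sketch of the `auto_gptq` to `gptqmodel` move; the `GPTQModel.load`/`quantize`/`save` entry points follow gptqmodel's README at the time of writing and may differ across versions.

```python
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "facebook/opt-125m"                             # any HF causal LM
calibration = ["fms-mo quantizes large language models."]  # toy calibration set

# auto_gptq's AutoGPTQForCausalLM.from_pretrained(...) becomes:
model = GPTQModel.load(model_id, QuantizeConfig(bits=4, group_size=128))
model.quantize(calibration)            # run GPTQ calibration passes
model.save("opt-125m-gptq-4bit")       # write quantized checkpoint
```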
What's Changed
- Add spell checker to alleviate spelling errors by @hickeyma in #32
- chore: Remove Makefile by @hickeyma in #37
- chore: Add GitHub badges to project README by @hickeyma in #38
- ci: Replace coverage with pytest-cov plugin for code coverage by @hickeyma in #39
- fix: Update the quantization notebook tutorial by @hickeyma in #41
- Small updates to the docs by @hickeyma in #40
- fix: Error in quantization notebook tutorial when retrieving image by @hickeyma in #42
- OptArguments by @tharapalanivel in #43
- Add logging and tests for run_quant.py by @tharapalanivel in #44
- Aiu addons by @andrea-fasoli in #46
- fix Qbmm tracing issue by @chichun-charlie-liu in #47
- Add build backend by @tharapalanivel in #50
- Add mypy static checker tool by @hickeyma in #49
- [utils] check if folder exists before attempting to create directory by @kcirred in #52
- fms_mo docker image by @tharapalanivel in #48
- Lint for fx by @tharapalanivel in #54
- Update accelerate requirement from !=0.34,<1.1,>=0.20.3 to >=0.20.3,!=0.34,<1.4 by @dependabot in #56
- Update transformers requirement from <4.48,>=4.45 to >=4.45,<4.49 by @dependabot in #55
- Add FP/INT triton kernels and unit tests, also update QAT example by @chichun-charlie-liu in #58
- ci: Add workflow for PR labels by @tharapalanivel in #57
- fix: Fix labelpr workflow by @tharapalanivel in #63
- feat: added granite support; fixed adapters to ignore model_config by @JRosenkranz in #53
- fix: Triton kernel bug fix by @chichun-charlie-liu in #61
- feat: Support for int8 smoothquant by @andrea-fasoli in #65
- test: Unit test int8 by @andrea-fasoli in #62
- fix: bug fix and minor changes on triton kernel by @chichun-charlie-liu in #69
- fix: handle linear_type callable at int8 linear instantiation by @andrea-fasoli in #68
- fix: multiple bug fixes by @chichun-charlie-liu in #70
- feat: improve transformers tracing for last layers by @chichun-charlie-liu in #72
- fix: in DQ example, when nbits_kvcache=8, context manager will detect incorrect frame by @chichun-charlie-liu in #74
- fix: Fix build and check packages flow by @tharapalanivel in #79
- fix: make triton optional for systems without GPUs by @chichun-charlie-liu in #78
- fix: a bug that prevented dynamo from working with PT 2.5.1 has been fixed by @chichun-charlie-liu in #81
- test: int8 unit tests for aiu add-ons by @iqbal-saraf in #77
- feat: confirmed py3.12 with pt2.5.1 by @chichun-charlie-liu in #83
- fix: finish missed items for upgrading to python 3.12 by @chichun-charlie-liu in #84
- fix: minor fix from last PR regarding py3.12 upgrades by @chichun-charlie-liu in #85
- feat: Update accelerate requirement from !=0.34,<1.4,>=0.20.3 to >=0.20.3,!=0.34,<1.7 by @dependabot in #86
- feat: Update transformers requirement from <4.49,>=4.45 to >=4.45,<4.51 by @dependabot in #80
- feat: triton matmul kernel adjusted, now is closer to HW behavior by @chichun-charlie-liu in #82
- fix: fix QBmm detection and default behavior by @chichun-charlie-liu in #87
- feat: expand detection of data types in model size estimation by @andrea-fasoli in #88
- fix: Fix push to pypi flow by @tharapalanivel in #90
- feat: int8 granite addon by @andrea-fasoli in #92
- feat: INT8 LLM TP>1 enablement by @andrea-fasoli in #94
- dependencies: Update transformers requirement from <4.51,>=4.45 to >=4.45,<4.52 by @dependabot in #91
- dependencies: Update triton requirement from <3.2,>=3.0 to >=3.0,<3.4 by @dependabot in #93
- feat: Update syntax of custom torch ops by @andrea-fasoli in #96
- feat: add granite architecture support for DQ with smoothquant by @andrea-fasoli in #101
- feat: trimming config save by @BrandonGroth in #103
- feat: Add int8 sd conversion function for aiu by @andrea-fasoli in #95
- fix: Config save cleanup by @BrandonGroth in #113
- feat: add verbosity to smoothquant during conversion for AIU by @andrea-fasoli in #115
- feat: Conversion example by @andrea-fasoli in #118
- feat: adjust int8 triton to enable msb/lsb truncation by @chichun-charlie-liu in #120
- feat: mx integration by @chichun-charlie-liu in #110
- feat: GPTQModel Migration by @tharapalanivel in #102
- fix: disable granite in custom gptq as gptqmodel already supports it, fix … by @chichun-charlie-liu in #130
- test: Add tests for save_for_aiu functionality w/ tiny models by @BrandonGroth in #126
- fix: Update GPTQ example README.md for typo by @chichun-charlie-liu in #132
- docs: Fix README typo by @tharapalanivel in #135
- build: Update test/verification section of PR template by @tharapalanivel in #136
- fix: Fix versioning by @tharapalanivel in #137
New Contributors
- @chichun-charlie-liu made their first contribution in #47
- @kcirred made their first contribution in #52
- @JRosenkranz made their first contribution in #53
- @iqbal-saraf made their first contribution in #77
- @BrandonGroth made their first contribution in #103
Full Changelog: v0.2.0...v0.3.0