[bc-breaking] Generalize FakeQuantizeConfig beyond intx #2628
Conversation
**Summary:** The existing `FakeQuantizeConfig` performs only intx quantization, but we plan to extend QAT to other dtypes such as fp8 and nvfp4 in the near future. This is the necessary refactor before that. Specifically:

```
# New abstract class FakeQuantizeConfigBase
# Rename FakeQuantizeConfig -> IntxFakeQuantizeConfig
```

In the future, we will have other types of `FakeQuantizeConfigBase` for float dtypes that users can pass in instead of the existing intx one.

**BC-breaking notes:** For BC, we keep the old names around as references to the new ones. However, this commit is still BC-breaking in the sense that a few APIs now accept the abstract `FakeQuantizeConfigBase` instead. For the most part, this abstract class will be hidden from the user.

Before:

```
activation_config = FakeQuantizeConfig(torch.int8, "per_token", is_symmetric=False)
weight_config = FakeQuantizeConfig(torch.int4, group_size=32)
```

After:

```
activation_config = IntxFakeQuantizeConfig(torch.int8, "per_token", is_symmetric=False)
weight_config = IntxFakeQuantizeConfig(torch.int4, group_size=32)
```

**Test Plan:**

```
python test/quantization/test_qat.py
```
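The abstract-base refactor described above can be sketched roughly as follows. This is only an illustration of the pattern: the field names and class bodies are hypothetical, and strings stand in for `torch.dtype` to keep the sketch dependency-free.

```python
from abc import ABC
from dataclasses import dataclass
from typing import Optional


class FakeQuantizeConfigBase(ABC):
    """Abstract base for all fake-quantize configs (intx, fp8, nvfp4, ...)."""


@dataclass
class IntxFakeQuantizeConfig(FakeQuantizeConfigBase):
    """Integer fake-quantize config; fields here are illustrative only."""
    dtype: str  # stands in for torch.dtype in this sketch
    granularity: str = "per_channel"
    group_size: Optional[int] = None
    is_symmetric: bool = True


# Old name kept as an alias for backward compatibility
FakeQuantizeConfig = IntxFakeQuantizeConfig


def quantize_with(config: FakeQuantizeConfigBase) -> str:
    # APIs accept the abstract base, so future float configs also fit
    return f"fake-quantizing with {type(config).__name__}"


weight_config = FakeQuantizeConfig("int4", group_size=32)
print(quantize_with(weight_config))
```

Because the alias points at the subclass, old call sites keep working while new APIs can be typed against `FakeQuantizeConfigBase`.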
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2628. Note: links to docs will display an error until the docs builds have completed. ❌ 2 new failures as of commit 8245cee with merge base 2f8fd69.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
Just to confirm: we are renaming `FakeQuantizeConfig` to `IntxFakeQuantizeConfig`, but we are also keeping the `FakeQuantizeConfig` name around as an alias for `IntxFakeQuantizeConfig`? And then in two releases we will remove it?
Yes, keeping it around.
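One common way to keep a deprecated name around for a couple of releases is an alias class that constructs the new type but emits a `DeprecationWarning`. This is only a sketch of the pattern, with hypothetical class bodies, not necessarily how torchao wires it up.

```python
import warnings


class IntxFakeQuantizeConfig:
    """The new name (sketch); fields are illustrative only."""
    def __init__(self, dtype, granularity="per_channel"):
        self.dtype = dtype
        self.granularity = granularity


def _make_deprecated_alias(new_cls, old_name):
    """Subclass the new class so the old name still constructs it, with a warning."""
    class _Alias(new_cls):
        def __init__(self, *args, **kwargs):
            warnings.warn(
                f"'{old_name}' is deprecated, use '{new_cls.__name__}' instead",
                DeprecationWarning,
                stacklevel=2,
            )
            super().__init__(*args, **kwargs)
    _Alias.__name__ = old_name
    return _Alias


# Old name kept around, slated for removal in a future release
FakeQuantizeConfig = _make_deprecated_alias(IntxFakeQuantizeConfig, "FakeQuantizeConfig")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    config = FakeQuantizeConfig("int8")
```

Since the alias subclasses the new class, `isinstance` checks against `IntxFakeQuantizeConfig` still pass for old call sites.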
```
@dataclass
```
I don't think you need this `dataclass` decorator here.
```
def __init__(
    self,
    dtype: Union[torch.dtype, TorchAODType],
```
I wonder if you can type this as `Literal[...]` so that it only allows int inputs.
We actually allow a lot of dtypes, like all of int2-8 and uint2-8; that will be too verbose I think.
Just define an `AllowedTypes` alias above and use it.
Hmm, I just tried it but didn't really like it. I think I prefer a simpler signature, like just `torch.dtype` (we can drop `TorchAODType` soon; it is only needed for PyTorch 2.5 and before), and do the validation in `__init__`.
```
self.eps = eps

# Validate dtype
all_dtypes = [torch.int8, torch.uint8]
```
similar to this
**Summary:** Similar to #2628, but for `FakeQuantizer`. It is cleaner to isolate the logic of each quantizer in separate classes, e.g. intx vs nvfp4 vs fp8. Naming change:

```
FakeQuantizer -> IntxFakeQuantizer
```

**BC-breaking notes:** This is technically not BC-breaking yet, since we are just deprecating the old APIs while keeping them around. It will be when we remove the old APIs in the future according to #2630.

Before:

```
config = IntxFakeQuantizeConfig(torch.int8, "per_channel")
FakeQuantizer(config)
```

After:

```
config = IntxFakeQuantizeConfig(torch.int8, "per_channel")
IntxFakeQuantizer(config)
# or FakeQuantizerBase.from_config(config)
```

**Test Plan:**

```
python test/quantization/test_qat.py
```
ghstack-source-id: 3867fab
Pull Request resolved: #2714
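The `FakeQuantizerBase.from_config(...)` construction mentioned above suggests a factory that dispatches on the config type, so each quantizer's logic stays in its own class. A minimal sketch of that pattern, with hypothetical class bodies:

```python
class FakeQuantizerBase:
    """Abstract base for fake quantizers (sketch)."""

    @staticmethod
    def from_config(config):
        # Dispatch on the config type; a real implementation might
        # use a registry instead of isinstance chains
        if isinstance(config, IntxFakeQuantizeConfig):
            return IntxFakeQuantizer(config)
        raise ValueError(f"Unknown config type: {type(config).__name__}")


class IntxFakeQuantizeConfig:
    """Config sketch; strings stand in for torch.dtype."""
    def __init__(self, dtype, granularity="per_channel"):
        self.dtype = dtype
        self.granularity = granularity


class IntxFakeQuantizer(FakeQuantizerBase):
    """Quantizer sketch; the intx-specific logic would live here."""
    def __init__(self, config):
        self.config = config


config = IntxFakeQuantizeConfig("int8", "per_channel")
quantizer = FakeQuantizerBase.from_config(config)
```

Adding an fp8 or nvfp4 quantizer then only requires a new config/quantizer pair and one more dispatch branch, without touching the intx path.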