
[CPU] Enable DA8W4 on CPU #2128


Merged
29 commits merged into pytorch:main on Jun 25, 2025

Conversation

@Xia-Weiwen (Collaborator) commented Apr 25, 2025

Summary
This PR enables DA8W4 (dynamic 8-bit activation quantization with 4-bit weights) on CPU.

  • It adds a new layout, Int8DynamicActInt4WeightCPULayout, and its implementation
  • It adds two custom ops:
    • da8w4_linear_prepack_cpu for weight packing
    • da8w4_linear_cpu for the A8W4 GEMM
  • It adds C++ kernels for the two new custom ops

The feature supports both symmetric and asymmetric quantization of activations.
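
Once registered, the schemas of the two ops can be inspected from Python (a minimal sketch; it assumes a build where the C++ kernels are available, as described below):

import torch
import torchao  # noqa: F401 -- importing torchao registers the custom ops when the C++ kernels were built

print(torch.ops.torchao.da8w4_linear_prepack_cpu.default._schema)
print(torch.ops.torchao.da8w4_linear_cpu.default._schema)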

The ops and kernels are only available when all of the following hold:

  • torchao is built from source with USE_CPP_KERNELS=1 on Linux on an x86 CPU with AVX512
  • torchao is run on Linux on an x86 CPU with AVX512
  • the PyTorch version is >= 2.7

To get the best performance, one needs a CPU with AMX support.
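
A runtime availability probe might look like this (a sketch; probing via hasattr is an assumption, not an API this PR provides):

import torch
import torchao  # noqa: F401

# torch.ops namespaces raise AttributeError for unregistered ops,
# so hasattr doubles as an availability check.
da8w4_available = hasattr(torch.ops.torchao, "da8w4_linear_cpu")
print("DA8W4 CPU kernels available:", da8w4_available)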

Implementation details

  • The weight-packing kernel is implemented with AVX512 intrinsics if available; otherwise, a reference path is used.
  • The GEMM kernel uses the at::cpublas brgemm utilities from PyTorch core if available.
  • In the GEMM kernel, if M is large (> 4):
    • if brgemm is available, brgemm is used;
    • otherwise, it falls back to the reference implementation.
  • In the GEMM kernel, if M is small (<= 4):
    • if AVX512_VNNI is available, the kernel uses AVX512_VNNI intrinsics;
    • otherwise, it takes the same path as for large M (see the sketch after this list).
  • All utility functions used in the kernel are implemented with AVX512 if available; otherwise, they fall back to the reference implementation.
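
The dispatch logic above, rendered as a Python sketch (illustrative only; the actual kernels are C++, and the function and flag names here are assumptions):

def select_da8w4_gemm_path(m: int, has_brgemm: bool, has_avx512_vnni: bool) -> str:
    # Mirrors the rules described above; not the real C++ dispatch code.
    if m <= 4 and has_avx512_vnni:
        return "avx512_vnni"  # small-M fast path using AVX512_VNNI intrinsics
    if has_brgemm:
        return "brgemm"  # at::cpublas brgemm utilities from PyTorch core
    return "reference"  # portable fallback implementation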

Usage

# Import paths below are typical but may vary across torchao versions.
from torchao.quantization import quantize_, int8_dynamic_activation_int4_weight
from torchao.quantization.quant_primitives import MappingType
from torchao.dtypes import Int8DynamicActInt4WeightCPULayout

quantize_(
    model,
    int8_dynamic_activation_int4_weight(
        group_size=32,  # or 64, 128
        layout=Int8DynamicActInt4WeightCPULayout(),
        act_mapping_type=MappingType.SYMMETRIC,  # or MappingType.ASYMMETRIC
    ),
)

Test plan

pytest test/quantization/test_quant_api.py -k test_8da4w_cpu

pytorch-bot bot commented Apr 25, 2025

🔗 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2128

✅ No failures as of commit e3731f7 with merge base 4ebc9c0.

@facebook-github-bot added the CLA Signed label Apr 25, 2025
@Xia-Weiwen added the cpu, quantize, and topic: new feature labels Apr 25, 2025
@Xia-Weiwen Xia-Weiwen marked this pull request as ready for review April 29, 2025 02:01
@Xia-Weiwen Xia-Weiwen requested a review from jerryzh168 April 29, 2025 03:16
@Xia-Weiwen Xia-Weiwen marked this pull request as draft May 7, 2025 01:17
@Xia-Weiwen (Collaborator, Author) commented:

@leslie-fang-intel This PR is updated to use a new layout. Please review again. Thanks.

@Xia-Weiwen changed the title from "[CPU] enable int8_dynamic_activation_int4_weight with Int4CPULayout" to "[CPU] enable int8_dynamic_activation_int4_weight on CPU" May 16, 2025
@Xia-Weiwen Xia-Weiwen marked this pull request as ready for review May 16, 2025 05:59
@Xia-Weiwen changed the title from "[CPU] enable int8_dynamic_activation_int4_weight on CPU" to "[CPU] Add a new layout for int8_dynamic_activation_int4_weight on CPU" May 16, 2025
@Xia-Weiwen (Collaborator, Author) commented:

Hi @jerryzh168 Could you please review this PR? Thanks.

(2 similar review-ping comments from @Xia-Weiwen followed.)

@Xia-Weiwen Xia-Weiwen marked this pull request as draft May 21, 2025 02:57
@Xia-Weiwen (Collaborator, Author) commented:

Hi @leslie-fang-intel Please review this PR again. I have also added the kernel code in this PR. It showed reasonable performance in internal benchmarks. Thanks.

@leslie-fang-intel (Collaborator) left a comment:

Please also describe how we choose different implementations based on the CPU Info.

@Xia-Weiwen (Collaborator, Author) commented:

> Please also describe how we choose different implementations based on the CPU Info.

I have added more details in the description. Thanks.

@Xia-Weiwen Xia-Weiwen requested a review from jerryzh168 June 6, 2025 01:49
@Xia-Weiwen Xia-Weiwen marked this pull request as ready for review June 6, 2025 01:49
@Xia-Weiwen (Collaborator, Author) commented:

Hi @jerryzh168 Could you please review this PR? Thanks. It's changed a lot since your last review.

@Xia-Weiwen (Collaborator, Author) commented:

Hi @jerryzh168 Could you please review this PR? Thanks.



@dataclass(frozen=True)
class Int8DynamicActInt4WeightCPULayout(Layout):
Contributor commented:
it looks like you can just reuse Int4CPULayout

Contributor commented:
can you move the layout and impl to a separate file?

@Xia-Weiwen (Collaborator, Author) commented:
Sure. Done.



@register_layout(Int8DynamicActInt4WeightCPULayout)
class DA8W4CPUAQTTensorImpl(Int4CPUAQTTensorImpl):
Contributor commented:
Oh I see. OK, if you need a separate impl then it makes sense to have a separate layout.

@Xia-Weiwen (Collaborator, Author) commented:
Yes. We need a different impl from W16W4 because the ISA (AMX and VNNI) requires different weight memory formats for computation in BF16 vs. INT8. Thanks.
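
For intuition (an illustrative sketch only, not this PR's packing code): VNNI int8 instructions consume 4 int8 values along K per 32-bit lane, while BF16 paths consume pairs of bf16, so the blocked weight layouts differ:

# Toy sizes below are assumptions for illustration only.
N, K, block_n = 64, 128, 16
int8_vnni_block_shape = (N // block_n, K // 4, block_n, 4)  # 4 int8 along K per lane
bf16_amx_block_shape = (N // block_n, K // 2, block_n, 2)  # 2 bf16 along K per lane
print(int8_vnni_block_shape, bf16_amx_block_shape)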

Comment on lines 435 to 441
int_data = (int_data + 8).to(torch.uint8)  # shift int4 values from [-8, 7] to unsigned [0, 15]
if scale.dim() == 1:
    scale.unsqueeze_(-1)  # add a trailing dim so scale broadcasts over groups
scale = scale.to(torch.float)
if zero_point.dim() == 1:
    zero_point.unsqueeze_(-1)  # add a trailing dim so zero_point broadcasts over groups
zero_point = zero_point.to(torch.int8) + 8  # apply the same +8 shift to zero points
Contributor commented:
can you configure dtypes of int_data, scale, zero_point and shapes in the call to to_affine_quantized_intx?

@Xia-Weiwen (Collaborator, Author) commented:
Thanks for the suggestion. I have improved this part.

@Xia-Weiwen Xia-Weiwen requested a review from jerryzh168 June 15, 2025 11:32
assert "torch.ops.torchao.da8w4_linear_cpu.default" in code[0]
quantize_(
    m2,
    int8_dynamic_activation_int4_weight(
Contributor commented:

nit: can you use the new API: Int8DynamicActivationInt4WeightConfig instead of int8_dynamic_activation_int4_weight?

@Xia-Weiwen (Collaborator, Author) commented:
Thanks. Done.
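
For reference, a sketch of the config-style call (it assumes Int8DynamicActivationInt4WeightConfig accepts the same parameters as the functional API; import paths may vary by torchao version):

from torchao.quantization import quantize_, Int8DynamicActivationInt4WeightConfig

quantize_(
    m2,
    Int8DynamicActivationInt4WeightConfig(
        group_size=32,
        layout=Int8DynamicActInt4WeightCPULayout(),
        act_mapping_type=MappingType.SYMMETRIC,
    ),
)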

@@ -728,9 +761,17 @@ def _int8_dynamic_activation_int4_weight_transform(
    quant_min = -8
    quant_max = 7

    if isinstance(layout, Int8DynamicActInt4WeightCPULayout):
@jerryzh168 (Contributor) commented Jun 23, 2025:
can this happen in kernel? we have dtype conversions like this:

w_vals_int8_t.to(input_tensor.dtype),

@Xia-Weiwen (Collaborator, Author) commented:
Thanks for the comment. I have moved this to _linear_int8_act_int4_weight_cpu_impl.

@Xia-Weiwen Xia-Weiwen merged commit 8b57afe into pytorch:main Jun 25, 2025
35 checks passed
Labels
CLA Signed, cpu, quantize, topic: new feature
4 participants