[RemoveLayout] Remove convert layout op for any layout if the user is tt.store with block pointer #4751
base: main
Conversation
Pull Request Overview
This PR removes layout conversions whose only user is a tt.store through a block pointer. The change eliminates conditional checks that previously prevented layout conversion removal in specific scenarios involving DPAS encoding, making the optimization more aggressive by always preferring a direct store to memory without a layout conversion.
- Removes DPAS encoding validation checks for convert operations
- Eliminates the restriction on forwarding DPAS encodings when the output type already has a DPAS encoding
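As a rough sketch of the intended effect in Triton GPU IR (the layouts mirror the test below; the value names and store attributes are illustrative, not copied from the actual diff):
// Before: the value is converted to #blocked1 only so that it matches the block pointer's layout.
%v1 = ttg.convert_layout %v : tensor<64x256xf16, #blocked> -> tensor<64x256xf16, #blocked1>
tt.store %ptr1, %v1 {boundaryCheck = array<i32: 0, 1>} : !tt.ptr<tensor<64x256xf16, #blocked1>>
// After: the convert_layout is dropped and the block pointer/store are rewritten to use the producer's layout directly.
tt.store %ptr0, %v {boundaryCheck = array<i32: 0, 1>} : !tt.ptr<tensor<64x256xf16, #blocked>>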
third_party/intel/lib/TritonIntelGPUTransforms/RemoveLayoutConversions.cpp
Missing test.
Can you ensure there is no performance regression?
Force-pushed from 42fa047 to ca1a49d.
…block pointer. It is always lower cost to store the value to the memory referred to by the block pointer without a layout conversion. Signed-off-by: Lu,Chengjun <chengjun.lu@intel.com>
I don't know why the launch has been in the queue for so long. Here are the new runs:
Possible degradation in FlexAttention: need to rebase the PR and rerun. New run: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/16578249148
#blocked = #ttg.blocked<{sizePerThread = [1, 1], threadsPerWarp = [1, 16], warpsPerCTA = [2, 2], order = [1, 0]}>
#blocked1 = #ttg.blocked<{sizePerThread = [1, 1], threadsPerWarp = [1, 16], warpsPerCTA = [1, 4], order = [1, 0]}>
module attributes {"ttg.num-ctas" = 1 : i32, "ttg.num-warps" = 4 : i32, "ttg.threads-per-warp" = 16 : i32, "ttig.support_sg_2d_block"} {
  tt.func public @matmul_kernel_with_block_pointers(%arg0: !tt.ptr<f16>, %arg1: !tt.ptr<f16>, %arg2: !tt.ptr<f16>, %arg3: i32, %arg4: i32, %arg5: i32, %arg6: i32, %arg7: i32, %arg8: i32) {
None of the arguments are used by the kernel. Remove them.
except %arg2
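For illustration, a trimmed signature that keeps only the output pointer could look like this (hypothetical; the argument name follows the test above):
  tt.func public @matmul_kernel_with_block_pointers(%arg2: !tt.ptr<f16>) {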
    %c1_i64 = arith.constant 1 : i64
    %c256_i64 = arith.constant 256 : i64
    %cst = arith.constant dense<0.000000e+00> : tensor<64x256xf16, #blocked>
    %25 = ttg.convert_layout %cst : tensor<64x256xf16, #blocked> -> tensor<64x256xf16, #blocked1>
Add a CHECK-NOT to ensure the generated code doesn't contain a convert layout operation.
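For example (the label is assumed to match the function in the lit test above):
// CHECK-LABEL: tt.func public @matmul_kernel_with_block_pointers
// CHECK-NOT: ttg.convert_layout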
Flex Attention performance is good with the latest run.
Yup, it is. No performance degradation in the microbenchmarks due to this PR.
It is always lower cost to store the value directly to the memory referred to by the block pointer, without a layout conversion.