Description
In today's pass pipeline, a load op feeding a dot op is tagged with the `ttig.block_io` attribute during the `MaterializeBlockPointer` pass, after the DPAS layout has been applied by `AccelerateMatmul`. The load op's layout is then changed to a `ttg.dot_op` operand layout during `RemoveLayoutConversions`.
IR before `RemoveLayoutConversions`:
```mlir
%10 = tt.make_tensor_ptr %arg0, [%c1024_i64, %c5120_i64], [%c5120_i64, %c1_i64], [%9, %c0_i32] {order = array<i32: 1, 0>} : <tensor<256x32xf16, #blocked1>> loc(#loc12)
%11 = arith.muli %8, %c256_i32 : i32 loc(#loc13)
%12 = tt.make_tensor_ptr %arg1, [%c5120_i64, %c4096_i64], [%c4096_i64, %c1_i64], [%c0_i32, %11] {order = array<i32: 1, 0>} : <tensor<32x256xf16, #blocked2>> loc(#loc14)
%13:3 = scf.for %arg3 = %c0_i32 to %c5120_i32 step %c32_i32 iter_args(%arg4 = %cst, %arg5 = %10, %arg6 = %12) -> (tensor<256x256xf32, #blocked>, !tt.ptr<tensor<256x32xf16, #blocked1>>, !tt.ptr<tensor<32x256xf16, #blocked2>>) : i32 {
  %17 = tt.load %arg5 {boundaryCheck = array<i32: 0, 1>, ttig.block_io = "row_major"} : !tt.ptr<tensor<256x32xf16, #blocked1>> loc(#loc16)
  %18 = tt.load %arg6 {boundaryCheck = array<i32: 0, 1>, ttig.block_io = "row_major"} : !tt.ptr<tensor<32x256xf16, #blocked2>> loc(#loc17)
  %19 = ttg.convert_layout %17 : tensor<256x32xf16, #blocked1> -> tensor<256x32xf16, #ttg.dot_op<{opIdx = 0, parent = #blocked}>> loc(#loc16)
  %20 = ttg.convert_layout %18 : tensor<32x256xf16, #blocked2> -> tensor<32x256xf16, #ttg.dot_op<{opIdx = 1, parent = #blocked}>> loc(#loc17)
  %21 = ttg.convert_layout %arg4 : tensor<256x256xf32, #blocked> -> tensor<256x256xf32, #mma> loc(#loc1)
  %22 = ttg.convert_layout %19 : tensor<256x32xf16, #ttg.dot_op<{opIdx = 0, parent = #blocked}>> -> tensor<256x32xf16, #ttg.dot_op<{opIdx = 0, parent = #mma, kWidth = 1}>> loc(#loc16)
  %23 = ttg.convert_layout %20 : tensor<32x256xf16, #ttg.dot_op<{opIdx = 1, parent = #blocked}>> -> tensor<32x256xf16, #ttg.dot_op<{opIdx = 1, parent = #mma, kWidth = 2}>> loc(#loc17)
  %24 = tt.dot %22, %23, %21, inputPrecision = tf32 : tensor<256x32xf16, #ttg.dot_op<{opIdx = 0, parent = #mma, kWidth = 1}>> * tensor<32x256xf16, #ttg.dot_op<{opIdx = 1, parent = #mma, kWidth = 2}>> -> tensor<256x256xf32, #mma> loc(#loc18)
```
Note the `ttig.block_io` attribute on both loads, and the subsequent conversion to a `ttg.dot_op` layout with a blocked parent.
After `RemoveLayoutConversions`, the blocked layouts have been removed and the load ops now carry `ttg.dot_op` layouts with the DPAS parent:
```mlir
%17 = tt.load %arg5 {boundaryCheck = array<i32: 0, 1>, ttig.block_io = "row_major"} : !tt.ptr<tensor<256x32xf16, #ttg.dot_op<{opIdx = 0, parent = #mma, kWidth = 1}>>> loc(#loc16)
%18 = tt.load %arg6 {boundaryCheck = array<i32: 0, 1>, ttig.block_io = "row_major"} : !tt.ptr<tensor<32x256xf16, #ttg.dot_op<{opIdx = 1, parent = #mma, kWidth = 2}>>> loc(#loc17)
%19 = tt.dot %17, %18, %arg4, inputPrecision = tf32 : tensor<256x32xf16, #ttg.dot_op<{opIdx = 0, parent = #mma, kWidth = 1}>> * tensor<32x256xf16, #ttg.dot_op<{opIdx = 1, parent = #mma, kWidth = 2}>> -> tensor<256x256xf32, #mma> loc(#loc18)
```
The Subgroup2DBlockIO layout should be applied during this process. I propose adding a new pass that runs after `MaterializeBlockPointer` but before `RemoveLayoutConversions`. The new pass would apply the subgroup layout and modify downstream layout conversions to use the new layout. `MaterializeBlockPointer` would still be responsible for applying the `ttig.block_io` tag to the `LoadOp`, and the new pass would use that tag as the signal to apply the layout conversion.
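The resulting pipeline ordering would look roughly like this. The flag spellings below are approximate, and the name of the new pass is a placeholder, not a final choice:

```
// Sketch of the proposed pass ordering (names illustrative):
tritonintelgpu-accelerate-matmul            // applies the DPAS layout to dot ops
tritonintelgpu-materialize-block-pointer    // tags qualifying loads with ttig.block_io
tritonintelgpu-optimize-block-io-encoding   // NEW: applies the Subgroup2DBlockIO encoding
tritongpu-remove-layout-conversions         // folds remaining layout conversions away
```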
Note that we could probably shift the decision-making about when to apply the `block_io` tag and when to use the Subgroup2DBlock layout into the new pass as well. But I think it is easier to introduce the new pass in stages, giving it more responsibility after we have demonstrated that it works as expected within the existing pipeline.
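To make the intended rewrite concrete, here is a toy Python model of the core logic the new pass would perform. It uses plain dicts as a stand-in for MLIR ops, and the encoding name `subgroup_2d_block` is a placeholder; everything here is illustrative rather than the actual C++ implementation:

```python
def apply_block_io_encoding(ops):
    """Toy model: rewrite the result encoding of every tt.load that is
    tagged with ttig.block_io and (transitively, through convert_layout
    ops) feeds a tt.dot. Returns the SSA names of the rewritten loads.

    Each op is a dict: {"name", "results", "operands", "attrs", "encoding"}.
    """
    # Map each SSA value to its defining op.
    def_of = {r: op for op in ops for r in op["results"]}

    def feeds_dot(value):
        # Follow convert_layout chains to see whether `value` reaches a tt.dot.
        for op in ops:
            if value in op["operands"]:
                if op["name"] == "tt.dot":
                    return True
                if op["name"] == "ttg.convert_layout" and feeds_dot(op["results"][0]):
                    return True
        return False

    rewritten = []
    for op in ops:
        if (op["name"] == "tt.load"
                and "ttig.block_io" in op["attrs"]
                and feeds_dot(op["results"][0])):
            # Placeholder for the real Subgroup2DBlockIO encoding attribute.
            op["encoding"] = "subgroup_2d_block"
            rewritten.append(op)

    # Downstream convert_layouts from rewritten loads now convert *from* the
    # new encoding; RemoveLayoutConversions can later fold them away.
    for op in ops:
        if op["name"] == "ttg.convert_layout":
            src = def_of.get(op["operands"][0])
            if src in rewritten:
                op["attrs"]["src_encoding"] = src["encoding"]

    return [op["results"][0] for op in rewritten]
```

The real pass would of course operate on MLIR ops and attributes, but the shape of the logic is the same: the `block_io` tag is the trigger, the dot-op use chain is the filter, and the downstream conversions are updated to keep the IR consistent.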