Commit d54fb85

RandySheriff authored and facebook-github-bot committed
Boost CMF 100x QPS from 176k to 178k on mi300x (#4867)
Summary:
Pull Request resolved: #4867

X-link: facebookresearch/FBGEMM#1888

As titled. With this config, the Triton FP8 GEMM is on par with its torch counterpart, yielding 178k QPS.

Reviewed By: pranavsharma

Differential Revision: D82276002

fbshipit-source-id: ce1d755eb6f89e6feb8847ac09d520079fbdc510
1 parent 6ef0978 commit d54fb85

1 file changed, +14 -0 lines changed

fbgemm_gpu/experimental/gemm/triton_gemm/fp8_gemm.py

Lines changed: 14 additions & 0 deletions
@@ -3866,6 +3866,20 @@ def get_full_non_persistent_tuning_space():
             num_warps=8,
             num_stages=2,
         ),
+        triton.Config(
+            {
+                "BLOCK_M": 256,
+                "BLOCK_N": 256,
+                "BLOCK_K": 64,
+                "GROUP_M": 2,
+                "SPLIT_K": 1,
+                "waves_per_eu": 2,
+                "matrix_instr_nonkdim": 32,
+                "kpack": 2,
+            },
+            num_warps=8,
+            num_stages=2,
+        ),
         triton.Config(
             {
                 "BLOCK_M": 256,
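
To give a rough feel for what the added config implies, here is a minimal sketch in plain Python (no Triton required). It assumes the common non-persistent matmul layout of one program per BLOCK_M x BLOCK_N output tile per K split, and one byte per FP8 element; neither assumption is confirmed by this commit. The helper names and the 4096 x 4096 problem shape are hypothetical and not taken from FBGEMM.

```python
# Block sizes copied from the triton.Config added in this commit.
NEW_CONFIG = {
    "BLOCK_M": 256,
    "BLOCK_N": 256,
    "BLOCK_K": 64,
    "GROUP_M": 2,
    "SPLIT_K": 1,
}


def cdiv(a: int, b: int) -> int:
    """Ceiling division for positive integers."""
    return -(-a // b)


def launch_grid_size(m: int, n: int, cfg: dict) -> int:
    """Program count, ASSUMING one program per output tile per K split.

    Hypothetical helper: the real FBGEMM kernel may compute its grid differently.
    """
    tiles_m = cdiv(m, cfg["BLOCK_M"])
    tiles_n = cdiv(n, cfg["BLOCK_N"])
    return tiles_m * tiles_n * cfg["SPLIT_K"]


def operand_tile_bytes(cfg: dict, bytes_per_element: int = 1) -> int:
    """Bytes in one A tile plus one B tile for a single K step.

    ASSUMES FP8 operands at one byte per element.
    """
    a_tile = cfg["BLOCK_M"] * cfg["BLOCK_K"]
    b_tile = cfg["BLOCK_K"] * cfg["BLOCK_N"]
    return (a_tile + b_tile) * bytes_per_element


if __name__ == "__main__":
    # Hypothetical problem shape, chosen only for illustration.
    print(launch_grid_size(4096, 4096, NEW_CONFIG))  # 256
    print(operand_tile_bytes(NEW_CONFIG))            # 32768
```

Under these assumptions, large 256 x 256 tiles keep the program count low for big shapes, while each K step moves 32 KiB of FP8 operand data per program. Whether that is the reason this config wins in autotuning is not stated in the commit.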
