Commit 9c5b36c

fix CMS for 224x128x64 NT bf16 (#4434)
Fixes issue #4422 by reducing the dscnt (follow-up from #4137).

No impact to performance in Tensile.

Tested with tensile-client

```
- ProblemSizes:
  - Exact: [3584, 2048, 1, 8192]
  - Range: [[224], [128], [1], [1, 16, 64]]
  - Range: [[224], [128], [1], [32, 64, 256]]
```

hipblaslt-test also passes

```
[==========] 21891 tests from 12 test suites ran. (1441926 ms total)
[  PASSED  ] 21891 tests.
```

## Submission Checklist

- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
1 parent c68fe0a commit 9c5b36c

1 file changed, +1 -1 lines changed

projects/hipblaslt/tensilelite/Tensile/Components/CustomSchedule.py

Lines changed: 1 addition & 1 deletion
@@ -1741,7 +1741,7 @@ def _get_schedule_224x128x64_16bit(kernel, useLDSTr, TLDS):
 3, SWaitCnt(dscnt=2, vlcnt=-1, vscnt=-1, comment="Wait for prior LRA1/LRB1 for the remaining main loop"),
 
 # GRB must wait for LRB0 (interleave LRA0 + GRB safely)
-15, SWaitCnt(dscnt=6, vlcnt=-1, vscnt=-1, comment="Wait for LRB0 to complete to start GRB"),
+15, SWaitCnt(dscnt=5, vlcnt=-1, vscnt=-1, comment="Wait for LRB0 to complete to start GRB"),
 15, SBarrier(comment=""),
 
 # GRA must wait for LRA0; LRB1 can be interleaved with GRA after this fence
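
A rough illustration of why lowering `dscnt` tightens the fence, assuming the usual wait-counter semantics (the wait returns once at most `dscnt` LDS operations are still outstanding, and LDS operations retire in issue order). The helper `guaranteed_complete`, the op names, and the count of eight in-flight reads are hypothetical and are not taken from `CustomSchedule.py`:

```python
def guaranteed_complete(issued, dscnt):
    """Return the ops known to be finished once the counter is <= dscnt.

    Assumption: ops retire in issue order, so the oldest ones finish first
    and at most `dscnt` of the newest ones may still be in flight.
    """
    still_in_flight = min(dscnt, len(issued))
    return issued[: len(issued) - still_in_flight]


# Hypothetical stream of eight LDS reads, oldest first.
ops = [f"LR{i}" for i in range(8)]

before = guaranteed_complete(ops, dscnt=6)  # old fence
after = guaranteed_complete(ops, dscnt=5)   # new fence

print("dscnt=6 guarantees:", before)
print("dscnt=5 guarantees:", after)
```

Under these assumptions, going from `dscnt=6` to `dscnt=5` makes the wait cover one more of the oldest outstanding reads before the barrier. That would be consistent with the commit's stated intent of ensuring LRB0 has completed before GRB starts.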
