[CK_TILE] Extend SplitK support to ABQuantGrouped mode #4438
base: develop
- Add AQ offset calculation in `SplitKBatchOffset` for ABQuant mode
- Update mode restriction to support ABQuant alongside BQuant
- Add AQ alignment validation for split-K batches
- Apply AQ pointer offsetting based on tensor layout
- Update block window creation to handle AQ group offsets
- Requires non-preshuffle mode for both A and B quantization
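The offset and alignment steps listed above can be sketched in isolation. This is a minimal host-side illustration, not the code from `gemm_quant_kernel.hpp`: the struct `SplitKOffsetsSketch`, the functions `is_split_aligned` and `make_splitk_offsets`, and the assumption that every split covers the same `k_per_batch` elements are all hypothetical. It assumes the AQ scale tensor has one scale per `group_size_k` elements along K and is stored either row-major (K-groups contiguous) or column-major (M contiguous).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration only; not the CK Tile SplitKBatchOffset struct.
struct SplitKOffsetsSketch
{
    std::int64_t k_start;         // first K index handled by this split
    std::int64_t aq_group_offset; // first AQ scale group (along K) for this split
    std::int64_t aq_ptr_offset;   // element offset into the AQ scale tensor
};

// A split boundary must not cut through a quantization group, otherwise
// one scale would be shared by two split-K batches.
inline bool is_split_aligned(std::int64_t k_per_batch, std::int64_t group_size_k)
{
    return group_size_k > 0 && k_per_batch > 0 && k_per_batch % group_size_k == 0;
}

// Assumes every split covers exactly k_per_batch elements of K.
inline SplitKOffsetsSketch make_splitk_offsets(std::int64_t batch_id,
                                               std::int64_t k_per_batch,
                                               std::int64_t group_size_k,
                                               std::int64_t M,
                                               bool aq_row_major)
{
    assert(batch_id >= 0 && M > 0);
    assert(is_split_aligned(k_per_batch, group_size_k));

    const std::int64_t k_start = batch_id * k_per_batch;
    const std::int64_t group   = k_start / group_size_k;

    // Row-major AQ [M, K/group]: neighbouring groups are 1 element apart.
    // Column-major AQ: neighbouring groups are M elements apart.
    const std::int64_t ptr = aq_row_major ? group : group * M;

    return SplitKOffsetsSketch{k_start, group, ptr};
}
```

With `k_per_batch = 256` and `group_size_k = 128`, the second split (`batch_id = 1`) starts at scale group 2, so the pointer offset is 2 elements for a row-major AQ tensor and `2 * M` for a column-major one.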
- Create `test_gemm_quant_abquant_splitk_decode.cpp` for decode workloads
- Create `test_gemm_quant_abquant_splitk_prefill.cpp` for prefill workloads
- Add `k_batch` parameter to ABQuant test fixture `run_test_with_validation`
- Fix missing `c_m_n_dev_buf.SetZero()` in ABQuant fixture (critical for split-K atomic operations)
- Register new test targets in `CMakeLists.txt`
- Tests cover split_k values: 2, 3, 4, 5
- All 16 tests passing (8 FP8, 8 BF8)
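Why the `SetZero()` fix matters can be shown with a small host-side model of split-K accumulation. This is a sketch under stated assumptions, not the fixture code: `splitk_dot_accumulate` is a hypothetical helper, a single output element stands in for the C matrix, and a plain `+=` stands in for the atomic add a GPU kernel would use.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of split-K: the K dimension is cut into k_batch
// slices and every slice ADDS its partial sum into the output `c`.
// Because each slice accumulates rather than overwrites, whatever value
// `c` held beforehand ends up in the result.
inline void splitk_dot_accumulate(const std::vector<float>& a,
                                  const std::vector<float>& b,
                                  std::size_t k_batch,
                                  float& c)
{
    assert(a.size() == b.size() && k_batch > 0);
    const std::size_t K           = a.size();
    const std::size_t k_per_batch = (K + k_batch - 1) / k_batch;

    for(std::size_t batch = 0; batch < k_batch; ++batch)
    {
        const std::size_t begin = batch * k_per_batch;
        const std::size_t end   = std::min(begin + k_per_batch, K);

        float partial = 0.0f;
        for(std::size_t k = begin; k < end; ++k)
        {
            partial += a[k] * b[k];
        }
        c += partial; // stands in for an atomic add on the device
    }
}
```

Starting from `c = 0` the call returns the full dot product. Starting from a stale value, the stale value is added to the result, which is the failure mode a non-zeroed output buffer would produce.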
Pull request overview
Extends Split-K support in CK Tile quantized GEMM to cover ABQuantGrouped mode (in addition to existing BQuantGrouped), and adds Split-K ABQuant test coverage for decode/prefill shapes.
Changes:
- Add Split-K AQ/BQ group offset and pointer offset handling for `ABQuantGrouped` in `QuantGemmKernel` (with added alignment validation).
- Update ABQuant test fixture to accept `k_batch` and zero-initialize C for Split-K atomic accumulation.
- Add new ABQuant Split-K gtests (decode + prefill) and wire them into the test CMake.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| projects/composablekernel/test/ck_tile/gemm_block_scale/test_gemm_quant_fixtures.hpp | Add k_batch parameter to ABQuant fixture and zero C buffer before Split-K runs |
| projects/composablekernel/test/ck_tile/gemm_block_scale/test_gemm_quant_abquant_splitk_prefill.cpp | New ABQuantGrouped Split-K tests for prefill shape |
| projects/composablekernel/test/ck_tile/gemm_block_scale/test_gemm_quant_abquant_splitk_decode.cpp | New ABQuantGrouped Split-K tests for decode shape |
| projects/composablekernel/test/ck_tile/gemm_block_scale/CMakeLists.txt | Build new ABQuant Split-K test executables |
| projects/composablekernel/include/ck_tile/ops/gemm_quant/kernel/gemm_quant_kernel.hpp | Implement Split-K support for ABQuantGrouped (AQ offsets, constraints, block windows) |
projects/composablekernel/include/ck_tile/ops/gemm_quant/kernel/gemm_quant_kernel.hpp
projects/composablekernel/test/ck_tile/gemm_block_scale/test_gemm_quant_fixtures.hpp
- Add missing `aq_group_offset` param to `MakeAQBlockWindow` call
- Use actual `k_batch` for tolerance calculation in split-K tests
@AviralGoelAMD Could we add support for preshuffled BQuant?