make cache op support non-contiguous num_blocks dim #2772

ganyi1996ppo wants to merge 1 commit into main from …
Conversation
…ontiguous block dim

Signed-off-by: ganyi <ygan@amd.com>
Force-pushed from c75e1dd to 0b7d21d
Pull request overview
Updates KV-cache write kernels to support a strided (non-contiguous) num_blocks dimension by using the cache tensors’ stride(0) when computing target indices.
Changes:
- Add `key_cache_stride0`/`value_cache_stride0` (and per-token variants) to the relevant kernel signatures and launches.
- Update the key/value cache linear index calculations to use `block_idx * stride(0)` instead of assuming dense packing for dim 0.
- Thread the `stride(0)` values from the host (`key_cache.stride(0)`, `value_cache.stride(0)`) into the affected kernel launches.
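To make the indexing change concrete, here is a minimal sketch (NumPy standing in for torch; the names `big`, `cache`, and `flat` are hypothetical) of why `block_idx * stride(0)` is needed once dim 0 is strided:

```python
import numpy as np

# Hypothetical layout: a big buffer packs two layers along dim 1, so each
# per-layer view has a strided num_blocks dim (stride(0) = 2 * inner product).
num_blocks, num_layers, inner = 4, 2, 8 * 16 * 16
big = np.arange(num_blocks * num_layers * inner, dtype=np.int64)
cache = big.reshape(num_blocks, num_layers, inner)[:, 0]  # strided view

flat = big  # the raw storage a kernel would index into
block_idx = 3
stride0 = cache.strides[0] // cache.itemsize  # stride(0) in elements
# Indexing with the dense inner product lands in the wrong block's storage:
assert flat[block_idx * inner] != cache[block_idx, 0]
# Indexing with the actual stride(0) reaches the right element:
assert flat[block_idx * stride0] == cache[block_idx, 0]
```

The same arithmetic holds inside the kernels: the dense product only happens to equal `stride(0)` when the cache tensor is contiguous in dim 0.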
Comments suppressed due to low confidence (1)
csrc/kernels/cache_kernels.cu:189
`reshape_and_cache_kernel` now supports a non-contiguous `num_blocks` dim via `key_cache_stride0`/`value_cache_stride0`, but the kernel still hardcodes the inner-dimension layout (it assumes the remaining dims are densely packed/contiguous). Please add input validation (e.g., a `TORCH_CHECK` on the expected strides for dims 1..end) or extend the kernel to use full per-dim strides; otherwise, passing a tensor that is non-contiguous in other dims will silently write to the wrong locations.
```cuda
    const int64_t key_cache_stride0,
    const int64_t value_cache_stride0,
    const int num_heads,
    const int head_size,
    const int block_size,
    const int x,
```
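The suggested validation could look like the following sketch (plain Python over size/stride lists; the helper name is hypothetical). The real host code would wrap the same condition in a `TORCH_CHECK` against the cache tensors' strides:

```python
def inner_dims_contiguous(sizes, strides):
    """Check that dims 1..end are densely packed (row-major); only dim 0 may
    carry an arbitrary stride. A real implementation would express this as a
    TORCH_CHECK on key_cache/value_cache before launching the kernel."""
    expected = 1
    # Walk the inner dims from last to second, accumulating the dense stride.
    for size, stride in zip(sizes[:0:-1], strides[:0:-1]):
        if stride != expected:
            return False
        expected *= size
    return True

# [num_blocks, num_heads, head_size, block_size] = [4, 8, 16, 16]
assert inner_dims_contiguous([4, 8, 16, 16], [2048, 256, 16, 1])      # dense
assert inner_dims_contiguous([4, 8, 16, 16], [4096, 256, 16, 1])      # strided dim 0: OK
assert not inner_dims_contiguous([4, 8, 16, 16], [2048, 512, 16, 1])  # bad inner dim
```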
```cpp
int64_t k_cache_stride0 = key_cache.stride(0);
int64_t v_cache_stride0 = value_cache.stride(0);
```
The PR title suggests cache ops broadly support a non-contiguous `num_blocks` dim, but only the `reshape_and_cache` / per-token-quant paths were updated to use `stride(0)`. Other cache paths in this file (e.g., the block-quant kernels) still compute `block_idx * num_heads * head_size * block_size` and will remain broken for a strided `num_blocks` dim. Either extend the same `stride(0)` handling there as well, or narrow/clarify the PR scope in the title/description.
```cpp
int64_t k_cache_stride0 = key_cache.stride(0);
int64_t v_cache_stride0 = value_cache.stride(0);
```
There’s no test coverage exercising the new “non-contiguous `num_blocks`” behavior (e.g., `key_cache = big_cache[layer_idx]` so that `stride(0)` differs from the dense product of the inner dims). Please add a unit/integration test (see `op_tests/test_kvcache.py`) that constructs strided views for `key_cache`/`value_cache` and verifies `reshape_and_cache{,_with_pertoken_quant}` writes to the correct blocks.
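A sketch of such a test setup (NumPy standing in for torch; the names `big_cache` and `layer_idx` are hypothetical, and slicing a layer-packed buffer on a non-leading dim is one way to obtain a view whose `stride(0)` differs from the dense inner product):

```python
import numpy as np

# Hypothetical buffer packing all layers: [num_blocks, num_layers, ...].
num_blocks, num_layers, num_heads, head_size, block_size = 4, 2, 8, 16, 16
big_cache = np.zeros(
    (num_blocks, num_layers, num_heads, head_size, block_size), dtype=np.float16
)

layer_idx = 1
# Per-layer view: [num_blocks, num_heads, head_size, block_size], dim 0 strided.
key_cache = big_cache[:, layer_idx]

dense = num_heads * head_size * block_size
stride0 = key_cache.strides[0] // key_cache.itemsize  # stride(0) in elements
assert stride0 == num_layers * dense  # stride(0) != dense product of inner dims
assert stride0 != dense
# A real test would now call reshape_and_cache{,_with_pertoken_quant} on
# key_cache/value_cache and compare the result against writes into a
# contiguous copy of the same data.
```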
```cuda
    const int64_t k_cache_stride0,
    const int64_t v_cache_stride0,
```
Parameter naming is inconsistent between kernels (`key_cache_stride0`/`value_cache_stride0` vs `k_cache_stride0`/`v_cache_stride0`). Please standardize the naming across these kernels and the corresponding host variables to make the API easier to follow and reduce the chance of wiring the wrong stride into a launch.
Suggested change:

```diff
-    const int64_t k_cache_stride0,
-    const int64_t v_cache_stride0,
+    const int64_t key_cache_stride0,
+    const int64_t value_cache_stride0,
```
Motivation
Technical Details
Test Plan
Test Result
Submission Checklist