🚀 The feature, motivation and pitch
This issue tracks PRs related to AITER (https://github.com/ROCm/aiter).
AITER is AMD's centralized repository of high-performance AI operators for accelerating AI workloads. It provides a single, unified place for customer operator-level requests, so it can serve different customers' needs: developers can focus on the operators, and customers can integrate this op collection into their own private or public frameworks.
Note: This issue description is organized from newest to oldest.
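For orientation only (not taken from any PR above): the AITER paths tracked here are generally gated behind the VLLM_ROCM_USE_AITER environment flag, which is off by default. The snippet below is a minimal sketch under that assumption; the model name is arbitrary, and the flag should be verified against vllm/envs.py.

```python
# Minimal sketch: opting in to AITER kernels when running vLLM on ROCm.
# VLLM_ROCM_USE_AITER is assumed to be the master on/off switch added by the
# PRs tracked in this issue; set it before vLLM reads its environment config.
import os

os.environ["VLLM_ROCM_USE_AITER"] = "1"  # AITER ops are disabled by default

from vllm import LLM, SamplingParams

# Any ROCm-supported model works here; Llama is just an illustrative choice.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
outputs = llm.generate(["Hello from ROCm"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```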
Based on AITER commit (12 July 2025): 916bf3c
- [V1] [ROCm] [AITER] Upgrade AITER to commit 916bf3c and bugfix APIs #20880
- [FEAT] [ROCm] [AITER]: Add AITER HIP block quant kernel #21242
- [ROCm][AITER] Support AITER Rope ops in RotaryEmbedding Module. #22521
- [ROCm][Aiter] Add triton fp8 bmm kernel for mla #22759
Based on AITER commit: 636a9f0d56c202040e93b9560c296441b7f77233
- Add weight preshuffled PTPC FP8 GEMM ([ROCm][FEAT] Integrate AITER gemm w8a8 ptpc #19417)
Based on AITER commit: 648764942e552a8bb5fe16026703716a81f05374
- AITER MHA V1 ([Hardware][AMD] integrate aiter chunked prefill into vllm #18596) ([Hardware][AMD] integrate aiter into vllm #17710)
- Patch for new AITER commit ([ROCm] [AITER] [Bugfix] Patch for AITER commit 648764942e552a8bb5fe16026703716a81f05374 #18990)
- [Bugfix][V1][ROCm] Fix AITER Flash Attention Backend (Fix API Break and Local Attention Logic: affecting Llama4) #19904
- [ROCm][FEAT] Enable Full Graph Mode in AITER MLA V1 Attn Backend (Decode Phase only) #20254
- [V1] [ROCm] Enable EP with AITER Fused MoE #20270
Enhancement
- Bugfix to enable PP with AITER MLA ([Bugfix] Enable PP with AITER+V1 #19822)
- Add padding to weights to use block-scaled fused MoE on Qwen3-235B TP4 ([Bugfix] Add padding for block-scale fused-moe weights for AITER lib #19234)
- [Bugfix][V1][ROCm] Fix AITER Flash Attention Backend (Fix API Break and Local Attention Logic: affecting Llama4) #19904
Based on AITER commit: c1debd87ce0391aa27438d9e07e76e4fea7c4b70
- Fix MLA Backend v0 due to AITER API change in newer version ([BugFix][AMD] Compatible patch for latest AITER(05/07/2025) #17864)
- It has been reverted (Revert "[BugFix][AMD] Compatible patch for latest AITER(05/07/2025)" #17910) as it introduced new properties that caused pre-commit to fail. The follow-up bugfix PR is ([BugFix][AMD] Compatible patch for AITER lib after 04/20 #17912)
- Use AITER fused moe external API ([FEAT] [ROCm] Upgrade AITER Fused MoE kernels. #18271)
- [FEAT][ROCm] Upgrade AITER MLA v1 backend #18338
- [FEAT][ROCm] Add AITER grouped topk for DeepSeekV2 #18825
- Enable full context length of DeepSeekV3 ([ROCm] Remove unnecessary assertion of max_model_len in ROCM_AITER_MLA attention backend. #18938)
Based on AITER commit: 5a77249
The kernels from #14007 have been broken down into the following PRs for ease of review (see the toggle sketch after this list):
- AITER Linear ([FEAT] [ROCm]: Support AITER Linear #14916)
- AITER RMS Norm ([FEAT] [ROCm]: Add AITER RMS Norm (Layer Norm) Feature #14959)
- AITER Fused MoE + Block Scaled Fused MoE ([FEAT][ROCm] Integrate Fused MoE Kernels from AITER #14967)
- AITER Block Scaled A8W8 GEMM ([FEAT] [ROCm]: Add AITER Block-Scaled GEMM Feature #14968)
- AITER Paged Attention ([FEAT][ROCm] Integrate Paged Attention Kernel from AITER #15001)
- AITER INT8 a8w8 GEMM kernel ([FEAT] [ROCm] Add AITER int8 scaled gemm kernel #15433)
- AITER MLA ([FEAT][ROCm]: Support AITER MLA #15893)
- AITER Tkw1 for Llama4 FP8 ([ROCm] Add aiter tkw1 kernel for Llama4 fp8 #16727) ([ROCm] (Deprecated) Enable AITER Tkw1 kernel #16418)
- AITER CK_MoE for Llama4 BF16 ([ROCM] enable aiter fused moe kernel for llama4 bf16 checkpoints #16674)
- Enable AITER Fused MoE in V1 Engine ([FEAT] [ROCm]: AITER Fused MOE V1 Support #16752). To be merged after:
  - AITER Tkw1 ([ROCm] Add aiter tkw1 kernel for Llama4 fp8 #16727)
  - AITER CK_MoE for Llama4 ([ROCM] enable aiter fused moe kernel for llama4 bf16 checkpoints #16674)
- AITER 2Stage CK MoE ([FEAT] [ROCm]: Add AITER CK 2 Stages MoE support #17110)
- AITER MLA V1 ([FEAT][ROCm]: Support AITER MLA on V1 Engine #17523)
- AITER biased group topk ([FEAT] [ROCm] [V1]: Add AITER biased group topk for DeepSeekV3 #17955)
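Sketch of per-operator toggles corresponding to the kernel PRs above. The flag names are assumptions based on these PRs and may have changed since; vllm/envs.py is the authoritative list, and the per-op flags only matter once the master VLLM_ROCM_USE_AITER switch is on.

```python
# Sketch only: fine-grained AITER toggles matching the kernels listed above.
# Flag names and defaults are assumptions; check vllm/envs.py for the current set.
import os

os.environ["VLLM_ROCM_USE_AITER"] = "1"             # master switch for all AITER ops
os.environ["VLLM_ROCM_USE_AITER_LINEAR"] = "1"      # AITER Linear / GEMM paths (#14916)
os.environ["VLLM_ROCM_USE_AITER_RMSNORM"] = "1"     # AITER RMS Norm (#14959)
os.environ["VLLM_ROCM_USE_AITER_MOE"] = "1"         # fused MoE / block-scaled fused MoE (#14967)
os.environ["VLLM_ROCM_USE_AITER_MLA"] = "1"         # AITER MLA backends (#15893, #17523)
os.environ["VLLM_ROCM_USE_AITER_PAGED_ATTN"] = "1"  # AITER paged attention (#15001)
```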
Enhancement
- Restrict Fused MoE to models that actually use the kernel ([Misc][ROCm] Restrict Aiter moe to specific models. #16435)
Bugfix
- [BugFix] [ROCm]: Bugfix and handle addition case of input for rocm_aiter_rms_norm #17857
Archived on 2025-05-14
The kernels from #14007 have been broken down into the following PRs for ease of review:
- AITER Linear ([FEAT] [ROCm]: Support AITER Linear #14916)
- AITER RMS Norm ([FEAT] [ROCm]: Add AITER RMS Norm (Layer Norm) Feature #14959)
- AITER Fused MoE + Block Scaled Fused MoE ([FEAT][ROCm] Integrate Fused MoE Kernels from AITER #14967)
- AITER Block Scaled A8W8 GEMM ([FEAT] [ROCm]: Add AITER Block-Scaled GEMM Feature #14968)
- AITER Paged Attention ([FEAT][ROCm] Integrate Paged Attention Kernel from AITER #15001)
- AITER INT8 a8w8 GEMM kernel ([FEAT] [ROCm] Add AITER int8 scaled gemm kernel #15433)
- AITER MLA ([FEAT][ROCm]: Support AITER MLA #15893)
- AITER Tkw1 for Llama4 FP8 ([ROCm] Add aiter tkw1 kernel for Llama4 fp8 #16727) ([ROCm] (Deprecated) Enable AITER Tkw1 kernel #16418)
- AITER CK_MoE for Llama4 BF16 ([ROCM] enable aiter fused moe kernel for llama4 bf16 checkpoints #16674)
- Enable AITER Fused MoE in V1 Engine ([FEAT] [ROCm]: AITER Fused MOE V1 Support #16752). To be merged after:
  - AITER Tkw1 ([ROCm] Add aiter tkw1 kernel for Llama4 fp8 #16727)
  - AITER CK_MoE for Llama4 ([ROCM] enable aiter fused moe kernel for llama4 bf16 checkpoints #16674)
- AITER 2Stage CK MoE ([FEAT] [ROCm]: Add AITER CK 2 Stages MoE support #17110)
- AITER MLA V1 ([FEAT][ROCm]: Support AITER MLA on V1 Engine #17523)
- Fix MLA Backend v0 due to AITER API change in newer version ([BugFix][AMD] Compatible patch for latest AITER(05/07/2025) #17864)
- It has been reverted (Revert "[BugFix][AMD] Compatible patch for latest AITER(05/07/2025)" #17910) as it introduced new properties that caused pre-commit to fail. The follow-up bugfix PR is ([BugFix][AMD] Compatible patch for AITER lib after 04/20 #17912)
- AITER MHA V1 ([Hardware][AMD] integrate aiter into vllm #17710)
- AITER biased group topk ([FEAT] [ROCm] [V1]: Add AITER biased group topk for DeepSeekV3 #17955)
Enhancement
- Restrict Fused MoE to models that actually use the kernel ([Misc][ROCm] Restrict Aiter moe to specific models. #16435)
Bugfix
- [BugFix] [ROCm]: Bugfix and handle addition case of input for rocm_aiter_rms_norm #17857
Alternatives
No response
Additional context
No response