[torchair]remove aicpu op #2640
Conversation
Signed-off-by: zhangdepeng <zhangdepeng2@huawei.com>
Code Review
This pull request introduces a targeted optimization within the Ascend attention backend. The change modifies the calculation of `block_size` used for KV cache indexing by adding a zero-value NPU tensor. This technique is likely intended to influence the operator dispatch mechanism, ensuring that the subsequent integer division and modulo operations execute on NPU kernels rather than falling back to slower AICPU operations. The change is localized and appears to be a correct, albeit subtle, platform-specific performance enhancement. I have not identified any critical or high-severity issues with this implementation.
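The trick the review describes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's actual diff: the function name `compute_block_indices` and its arguments are hypothetical, and running it on Ascend hardware assumes a `torch_npu` install so that `positions` lives on an `npu` device.

```python
import torch
# import torch_npu  # on Ascend hosts, importing this registers the "npu" device


def compute_block_indices(positions: torch.Tensor, block_size: int):
    """Map token positions to (block_id, block_offset) for KV cache lookup.

    Adding a zero-valued tensor that already lives on the same device as
    `positions` promotes `block_size` from a Python int to an on-device
    tensor, so the // and % below dispatch to device kernels instead of
    falling back to AICPU in torchair mode.
    """
    zero = torch.zeros(1, dtype=positions.dtype, device=positions.device)
    block_size_t = block_size + zero  # zero-add: value unchanged, device pinned
    block_ids = positions // block_size_t
    block_offsets = positions % block_size_t
    return block_ids, block_offsets
```

The zero-add changes no values; it only promotes the Python-int divisor to a device tensor, which is what steers the division and modulo toward NPU kernels rather than an AICPU fallback.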
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
- If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
…into main_829

* 'main_829' of https://github.com/raindaywhu/vllm-ascend:
  - [torchair]remove aicpu op (vllm-project#2640)
  - bugfix for torchair graph (vllm-project#2639)
  - [CI] fix UT error. (vllm-project#2644)
  - [3/N][Feat][Graph] Support `all-to-all` and quantized models with ACL Graph (vllm-project#2614)
  - [Bugfix] Fix mc2 operator error in aclgraph + ep<16 scenario (vllm-project#2609)
### What this PR does / why we need it?
Remove the AICPU op for torchair mode.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- vLLM version: v0.10.1.1
- vLLM main: vllm-project/vllm@05d839c
- vLLM main: vllm-project/vllm@67c1490

Signed-off-by: zhangdepeng <zhangdepeng2@huawei.com>
Co-authored-by: zhangdepeng <zhangdepeng2@huawei.com>
Signed-off-by: lijiaojiao <lijiaojiao990304@163.com>