
Conversation

@panchao-hub panchao-hub commented Aug 29, 2025

What this PR does / why we need it?

Remove the AICPU op for torchair mode.

Does this PR introduce any user-facing change?

No

How was this patch tested?

vLLM version: v0.10.1.1
vLLM main: vllm-project/vllm@05d839c

Signed-off-by: zhangdepeng <zhangdepeng2@huawei.com>

@gemini-code-assist bot left a comment


Code Review

This pull request introduces a targeted optimization within the Ascend attention backend. The change modifies the calculation of block_size used for KV cache indexing by adding a zero-value NPU tensor. This technique is likely intended to influence the operator dispatching mechanism, ensuring that the subsequent integer division and modulo operations are executed on NPU kernels rather than falling back to slower AICPU operations. The change is localized and appears to be a correct, albeit subtle, platform-specific performance enhancement. I have not identified any critical or high-severity issues with this implementation.
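The dispatch trick described above can be sketched as follows. This is an illustrative reconstruction, not the actual vllm-ascend code: the function and variable names are assumptions, and the example runs on CPU to show the numerics, while on Ascend the point of promoting `block_size` to a device tensor is that the `//` and `%` then dispatch to NPU kernels instead of falling back to AICPU.

```python
import torch

def kv_cache_indices(slot_indices: torch.Tensor, block_size: int):
    """Map flat KV-cache slot indices to (block id, offset within block).

    Hypothetical sketch of the optimization: adding a zero-valued tensor
    on the same device promotes the Python int `block_size` to a device
    tensor, so the subsequent integer division and modulo are dispatched
    as device (NPU) kernels rather than scalar/AICPU fallbacks.
    """
    zero = torch.zeros(1, dtype=slot_indices.dtype, device=slot_indices.device)
    block_size_t = block_size + zero          # int + tensor -> device tensor
    block_ids = slot_indices // block_size_t  # which cache block
    block_offsets = slot_indices % block_size_t  # offset inside that block
    return block_ids, block_offsets

ids, offs = kv_cache_indices(torch.arange(8), 4)
print(ids.tolist(), offs.tolist())
```

The behavior is numerically identical to using the plain Python int; only the operator dispatch path changes, which is why the review calls it a subtle, platform-specific enhancement.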


👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write a clear commit message and fill in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@panchao-hub (Contributor, Author) attached an image.

@wangxiyuan wangxiyuan changed the title remove aicpu op [torchair]remove aicpu op Aug 30, 2025
@wangxiyuan wangxiyuan merged commit 20ae712 into vllm-project:main Aug 30, 2025
22 of 26 checks passed
845473182 pushed a commit to raindaywhu/vllm-ascend that referenced this pull request Sep 1, 2025
…into main_829

* 'main_829' of https://github.com/raindaywhu/vllm-ascend:
  [torchair]remove aicpu op (vllm-project#2640)
  bugfix for torchair graph (vllm-project#2639)
  [CI] fix UT error. (vllm-project#2644)
  [3/N][Feat][Graph] Support `all-to-all` and quantized models with ACL Graph (vllm-project#2614)
  [Bugfix] Fix mc2 operator error in aclgraph + ep<16 scenario (vllm-project#2609)
wenba0 pushed a commit to wenba0/vllm-ascend that referenced this pull request Sep 5, 2025
### What this PR does / why we need it?
remove aicpu op for torchair mode
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- vLLM version: v0.10.1.1
- vLLM main: vllm-project/vllm@05d839c
- vLLM main: vllm-project/vllm@67c1490

Signed-off-by: zhangdepeng <zhangdepeng2@huawei.com>
Co-authored-by: zhangdepeng <zhangdepeng2@huawei.com>
Signed-off-by: lijiaojiao <lijiaojiao990304@163.com>
wangxiaoteng888 pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Sep 25, 2025
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Sep 26, 2025
2 participants