[V0.9.1] optimize rope in qwen3 #1719

Merged: 2 commits, Jul 14, 2025

@David9857 (Contributor) commented Jul 10, 2025

Use `npu_apply_rotary_pos_emb` when `head_size` is 128 and the rotary embedding is NeoX-style.
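For readers unfamiliar with the NeoX convention, here is a minimal plain-Python sketch of the dispatch condition described above and of the per-head rotation the NeoX-style path computes. It assumes that `noex_style` in the original description means NeoX-style rotation. `use_fused_rope`, `cos_sin_for_position` and `neox_rope` are hypothetical helpers, not vllm-ascend or torch_npu APIs, and the fused NPU kernel itself is not shown. The base of 10000 is only an illustrative default, not necessarily the `rope_theta` Qwen3 is configured with.

```python
import math
from typing import List, Tuple


def use_fused_rope(head_size: int, is_neox_style: bool) -> bool:
    # Hypothetical predicate mirroring the PR description: take the fused
    # npu_apply_rotary_pos_emb path only for 128-dim, NeoX-style heads.
    return head_size == 128 and is_neox_style


def cos_sin_for_position(pos: int, head_size: int,
                         base: float = 10000.0) -> Tuple[List[float], List[float]]:
    # One rotation angle per rotated pair: theta_i = pos * base^(-2i/d).
    half = head_size // 2
    angles = [pos * base ** (-2 * i / head_size) for i in range(half)]
    return [math.cos(a) for a in angles], [math.sin(a) for a in angles]


def neox_rope(x: List[float], cos: List[float], sin: List[float]) -> List[float]:
    # NeoX style rotates element i together with element i + d/2
    # (the "rotate_half" layout), not adjacent pairs as in GPT-J style.
    half = len(x) // 2
    out = [0.0] * len(x)
    for i in range(half):
        x1, x2 = x[i], x[i + half]
        out[i] = x1 * cos[i] - x2 * sin[i]
        out[i + half] = x2 * cos[i] + x1 * sin[i]
    return out


# Usage: a 128-dim NeoX-style head qualifies for the fused path. The NPU
# kernel would produce this rotation; here it is computed in Python.
if use_fused_rope(head_size=128, is_neox_style=True):
    cos, sin = cos_sin_for_position(pos=5, head_size=128)
    q = [0.01 * k for k in range(128)]
    q_rot = neox_rope(q, cos, sin)
```

Because each (i, i + d/2) pair is rotated by an angle, the transform preserves the vector's norm, and position 0 leaves the input unchanged.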

What this PR does / why we need it?

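Optimize RoPE by hoisting `index_select` out of the individual decoder layers into the model's forward pass, so it runs once per forward pass instead of once per layer. This removes (layer_num - 1) * 2 Gather ops in each prefill/decode step.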
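A minimal sketch of where the (layer_num - 1) * 2 saving comes from follows. It assumes that each layer previously issued two `index_select` calls, one for cos and one for sin. Real caches are 2-D tensors gathered by position on the NPU; here they are simplified to 1-D Python lists. `GatherCounter`, `forward_per_layer` and `forward_hoisted` are hypothetical names used only for illustration.

```python
from typing import List, Tuple


class GatherCounter:
    """Hypothetical stand-in for index_select on the cos/sin cache; counts Gathers."""

    def __init__(self) -> None:
        self.calls = 0

    def index_select(self, cache: List[float], positions: List[int]) -> List[float]:
        self.calls += 1
        return [cache[p] for p in positions]


def forward_per_layer(num_layers: int, cos_cache: List[float], sin_cache: List[float],
                      positions: List[int], g: GatherCounter) -> List[Tuple[List[float], List[float]]]:
    # Before: each decoder layer gathers cos and sin for the current positions itself.
    used = []
    for _ in range(num_layers):
        cos = g.index_select(cos_cache, positions)
        sin = g.index_select(sin_cache, positions)
        used.append((cos, sin))  # the layer would apply RoPE with these
    return used


def forward_hoisted(num_layers: int, cos_cache: List[float], sin_cache: List[float],
                    positions: List[int], g: GatherCounter) -> List[Tuple[List[float], List[float]]]:
    # After: the model gathers once per forward pass and every layer reuses the result.
    cos = g.index_select(cos_cache, positions)
    sin = g.index_select(sin_cache, positions)
    return [(cos, sin) for _ in range(num_layers)]


num_layers = 4
cos_cache = [1.0, 0.9, 0.7, 0.4]  # toy 1-D caches indexed by position
sin_cache = [0.0, 0.4, 0.7, 0.9]
positions = [2, 0, 3]

before, after = GatherCounter(), GatherCounter()
used_before = forward_per_layer(num_layers, cos_cache, sin_cache, positions, before)
used_after = forward_hoisted(num_layers, cos_cache, sin_cache, positions, after)
saved = before.calls - after.calls  # 2 * num_layers - 2 == (num_layers - 1) * 2
```

The hoisting is safe because every layer in a given step sees the same positions, so the gathered cos/sin values are identical across layers. The cost is that the layer forward must now accept the precomputed cos/sin from the model.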

Does this PR introduce any user-facing change?

NA

How was this patch tested?

NA

@David9857 force-pushed the pr-rope branch 2 times, most recently from 1874f93 to efc563b (July 11, 2025 01:11)
@ganyi1996ppo (Collaborator) commented:

Please add a unit test for this RoPE implementation.

Signed-off-by: David9857 <985700846@qq.com>

use npu_apply_rotary_pos_emb when head_size is 128 and is noex_style

Signed-off-by: David9857 <985700846@qq.com>
Signed-off-by: David9857 <985700846@qq.com>
@David9857 (Contributor, Author) replied:

> Please add unittest for this rope impl

Added.

@ganyi1996ppo merged commit f08283a into vllm-project:v0.9.1-dev on Jul 14, 2025 (16 checks passed).
David9857 added a commit to rjg-lyh/vllm-ascend that referenced this pull request Jul 14, 2025
use npu_apply_rotary_pos_emb when head_size is 128 and is noex_style

### What this PR does / why we need it?

Optimize rope by extracting index_select from layers into model, which
can reduce (layer_num -1) * 2 Gather ops in each prefill/decode stage.

### Does this PR introduce _any_ user-facing change?

NA

### How was this patch tested?

NA

---------

Signed-off-by: David9857 <985700846@qq.com>
3 participants