[V0.9.1] optimize rope in qwen3 #1719

David9857 · 2025-07-10T06:42:25Z

Use npu_apply_rotary_pos_emb when head_size is 128 and is_neox_style is true.
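For readers unfamiliar with the neox-style layout, the sketch below shows the reference rotation in plain Python, plus the dispatch condition stated above. It is only an illustration under stated assumptions: `rotate_half`, `apply_rope_neox`, and `can_use_fused_rope` are hypothetical names, not functions from this PR, and the fused `npu_apply_rotary_pos_emb` kernel itself is not called here.

```python
import math


def rotate_half(x):
    """Neox-style helper: split x into halves (x1, x2) and return (-x2, x1)."""
    half = len(x) // 2
    return [-v for v in x[half:]] + list(x[:half])


def apply_rope_neox(x, cos, sin):
    """Reference neox-style rotation: x * cos + rotate_half(x) * sin."""
    rotated = rotate_half(x)
    return [xi * c + ri * s for xi, ri, c, s in zip(x, rotated, cos, sin)]


def can_use_fused_rope(head_size, is_neox_style):
    # Hypothetical predicate mirroring the condition in the PR description:
    # the fused path is taken only for head_size == 128 with neox-style layout.
    return head_size == 128 and is_neox_style


# Rotate a 2-element vector by 90 degrees as a sanity check.
theta = math.pi / 2
rotated_vec = apply_rope_neox([1.0, 0.0], [math.cos(theta)] * 2, [math.sin(theta)] * 2)
```

Any other head size or layout would fall back to the existing rotary path.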

What this PR does / why we need it?

Optimize RoPE by moving the index_select out of the individual layers and into the model, so the lookup runs once per forward pass instead of once per layer. This removes (layer_num - 1) * 2 Gather ops in each prefill/decode stage.
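The saving can be illustrated with a small counting sketch in plain Python. It assumes the two gathers per layer are the cos and sin lookups by position, which the description implies but does not state. `GatherCounter`, `rope_per_layer`, and `rope_hoisted` are hypothetical names used only for this illustration.

```python
class GatherCounter:
    """Stands in for index_select and counts how often it is invoked."""

    def __init__(self):
        self.calls = 0

    def index_select(self, table, positions):
        self.calls += 1
        return [table[p] for p in positions]


def rope_per_layer(counter, cos_table, sin_table, positions, num_layers):
    # Before: every layer gathers cos and sin for the same positions.
    for _ in range(num_layers):
        cos = counter.index_select(cos_table, positions)
        sin = counter.index_select(sin_table, positions)
    return cos, sin


def rope_hoisted(counter, cos_table, sin_table, positions, num_layers):
    # After: the model gathers once and every layer reuses the result.
    cos = counter.index_select(cos_table, positions)
    sin = counter.index_select(sin_table, positions)
    for _ in range(num_layers):
        pass  # each layer would consume the shared cos/sin here
    return cos, sin


num_layers = 4
cos_table = [1.0, 0.5, 0.0, -0.5]  # toy values, assumed for illustration
sin_table = [0.0, 0.9, 1.0, 0.9]
positions = [2, 0, 3]

before, after = GatherCounter(), GatherCounter()
out_before = rope_per_layer(before, cos_table, sin_table, positions, num_layers)
out_after = rope_hoisted(after, cos_table, sin_table, positions, num_layers)
saved = before.calls - after.calls
```

Both variants return the same cos/sin values; only the number of gathers differs, and the difference matches (layer_num - 1) * 2.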

Does this PR introduce any user-facing change?

NA

How was this patch tested?

NA

ganyi1996ppo · 2025-07-11T01:58:33Z

Please add a unit test for this rope impl.

David9857 · 2025-07-14T03:00:59Z

> Please add a unit test for this rope impl.

Added.

use npu_apply_rotary_pos_emb when head_size is 128 and is neox_style

### What this PR does / why we need it?
Optimize rope by extracting index_select from layers into model, which can reduce (layer_num - 1) * 2 Gather ops in each prefill/decode stage.

### Does this PR introduce _any_ user-facing change?
NA

### How was this patch tested?
NA

---------

Signed-off-by: David9857 <985700846@qq.com>

github-actions bot added the module:ops label Jul 10, 2025

David9857 force-pushed the pr-rope branch 2 times, most recently from 1874f93 to efc563b Compare July 11, 2025 01:11

optimize rope in qwen3

71db15b

use npu_apply_rotary_pos_emb when head_size is 128 and is neox_style

Signed-off-by: David9857 <985700846@qq.com>

David9857 force-pushed the pr-rope branch from efc563b to 139aa82 Compare July 11, 2025 06:23

github-actions bot added the module:tests label Jul 11, 2025

David9857 force-pushed the pr-rope branch 2 times, most recently from 5b30648 to e568126 Compare July 11, 2025 07:27

add ut for npu_apply_rotary_pos_emb

3b20948

Signed-off-by: David9857 <985700846@qq.com>

David9857 force-pushed the pr-rope branch from e568126 to 3b20948 Compare July 14, 2025 02:04

ganyi1996ppo approved these changes Jul 14, 2025

ganyi1996ppo merged commit f08283a into vllm-project:v0.9.1-dev Jul 14, 2025
16 checks passed

wangxiyuan added the no-main label Jul 14, 2025

David9857 mentioned this pull request Jul 15, 2025

[WIP][perf] Replace _npu_rotary_embedding with npu_mrope #1195

Closed
