[0.9.1]optmize rope in qwen2 #1782

David9857 · 2025-07-14T09:40:58Z

What this PR does / why we need it?

Does this PR introduce any user-facing change?

How was this patch tested?

Signed-off-by: rjg-lyh <1318825571@qq.com>

github-actions · 2025-07-14T09:50:47Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.

ApsarasX · 2025-07-14T09:54:59Z

vllm_ascend/ops/rotary_embedding.py

+            # TODO: Remove the contiguous in the future.
+            query = query.contiguous().view(query.shape[0], -1)
+            key = key.contiguous().view(key.shape[0], -1)
+            torch_npu._npu_rotary_embedding(


Are you considering using the npu_mrope operator here?

David9857 · 2025-07-15 (edited)

I'm not sure whether npu_mrope behaves exactly the same as _npu_rotary_embedding, so I kept this part unchanged.

Signed-off-by: David9857 <985700846@qq.com>

use npu_apply_rotary_pos_emb when head_size is 128 and is neox_style

### What this PR does / why we need it?
Optimize rope by extracting index_select from the layers into the model, which removes (layer_num - 1) * 2 Gather ops in each prefill/decode stage.

### Does this PR introduce _any_ user-facing change?
NA

### How was this patch tested?
NA

---------

Signed-off-by: David9857 <985700846@qq.com>
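The Gather-op saving described in the commit message can be illustrated with a minimal plain-Python sketch. It is not the code from this PR: `RopeCache`, `forward_per_layer`, and `forward_hoisted` are hypothetical names, lists stand in for tensors, and a counter stands in for the NPU Gather ops. It assumes every layer uses the same positions and the same cos/sin cache, so the lookup can be done once at model level and reused.

```python
import math


class RopeCache:
    """Hypothetical stand-in for a rotary embedding cos/sin cache."""

    def __init__(self, head_size, max_pos, base=10000.0):
        half = head_size // 2
        inv_freq = [base ** (-2.0 * i / head_size) for i in range(half)]
        self.cos = [[math.cos(p * f) for f in inv_freq] for p in range(max_pos)]
        self.sin = [[math.sin(p * f) for f in inv_freq] for p in range(max_pos)]
        self.gather_calls = 0  # counts simulated Gather ops

    def gather(self, positions):
        # One lookup for cos and one for sin: two Gather ops per call.
        self.gather_calls += 2
        cos = [self.cos[p] for p in positions]
        sin = [self.sin[p] for p in positions]
        return cos, sin


def apply_rope(x, cos, sin):
    """Neox-style rotation: the first half of each vector pairs with the second half."""
    out = []
    for vec, c, s in zip(x, cos, sin):
        half = len(vec) // 2
        x1, x2 = vec[:half], vec[half:]
        first = [a * ci - b * si for a, b, ci, si in zip(x1, x2, c, s)]
        second = [b * ci + a * si for a, b, ci, si in zip(x1, x2, c, s)]
        out.append(first + second)
    return out


def forward_per_layer(cache, positions, x, num_layers):
    # Before: every layer gathers cos/sin for the same positions.
    for _ in range(num_layers):
        cos, sin = cache.gather(positions)
        x = apply_rope(x, cos, sin)
    return x


def forward_hoisted(cache, positions, x, num_layers):
    # After: the model gathers once and passes cos/sin down to each layer.
    cos, sin = cache.gather(positions)
    for _ in range(num_layers):
        x = apply_rope(x, cos, sin)
    return x
```

With `num_layers` layers, the per-layer variant performs `num_layers * 2` lookups and the hoisted variant performs 2, a difference of `(num_layers - 1) * 2`, while both produce the same rotated values.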

[V0.9.1] Add support for flashcomm_v1 in Qwen2.5

137810d

Signed-off-by: rjg-lyh <1318825571@qq.com>

github-actions bot added module:tests module:ops module:core merge-conflicts labels Jul 14, 2025

ApsarasX reviewed Jul 14, 2025

David9857 force-pushed the qwen2-rope branch from 48965fb to d5f8b5f Compare July 14, 2025 12:04

David9857 and others added 2 commits July 14, 2025 20:07

qwen2 optimize rope

7c4cad1

Signed-off-by: David9857 <985700846@qq.com>

David9857 force-pushed the qwen2-rope branch from d5f8b5f to f3bf3e4 Compare July 14, 2025 12:07

wangxiyuan changed the title ~~optmize rope in qwen2~~ [0.9.1]optmize rope in qwen2 Jul 15, 2025
