Skip to content

Conversation

David9857
Copy link
Contributor

What this PR does / why we need it?

Replace the original interface(_npu_rotary_embedding) with a new interface(npu_mrope) with better performance.

Does this PR introduce any user-facing change?

NA

How was this patch tested?

Signed-off-by: David9857 <985700846@qq.com>
Signed-off-by: David9857 <985700846@qq.com>
@Yikun Yikun added long-term-test enable long term test for PR accuracy-test enable all accuracy test for PR performance-test enable performance test for PR ready-for-test start test by label for PR and removed ready-for-test start test by label for PR labels Jun 15, 2025
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
@Yikun Yikun added the ready-for-test start test by label for PR label Jun 15, 2025
query = query.contiguous().view(query.shape[0], -1)
key = key.contiguous().view(key.shape[0], -1)
torch_npu._npu_rotary_embedding(
query, key = torch_npu.npu_mrope(
Copy link
Collaborator

@Yikun Yikun Jun 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

quick question: which torch_npu version supports the npu_mrope operator?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for graph mode, no release version supports npu_mrope yet

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you combine #1231 together

@David9857 David9857 changed the title [perf] Replace _npu_rotary_embedding with npu_mrope [WIP][perf] Replace _npu_rotary_embedding with npu_mrope Jun 16, 2025
Copy link

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@David9857
Copy link
Contributor Author

for models using GPT-NeoX style Rope, here's an alternative solution with lower latency. #1719

@David9857 David9857 closed this Jul 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
accuracy-test enable all accuracy test for PR long-term-test enable long term test for PR merge-conflicts module:ops module:tests performance-test enable performance test for PR ready-for-test start test by label for PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants