[long_seq_optim] BSND to TND and FA_UPDATE replacement #3778
Conversation
Signed-off-by: pichangping <1337510399@qq.com>
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request refactors the attention mechanism to leverage native TND layout support and a new npu_attention_update kernel on Ascend NPUs, removing the manual BSND packing/unpacking and update logic. While this simplifies the code and likely improves performance, I've identified several critical issues related to potential memory leaks in graph capture mode and a bug with a hardcoded value that could cause errors with models that have large context windows. There is also a high-severity concern about unnecessary type casting that could impact performance.
```diff
 max_seq_len = max(seq_lens, default=0)
 pcp_prefill_mask = torch.triu(
-    torch.full((num_prefills, max_seq_len, max_seq_len),
+    torch.full((2048, 2048),
```
What is the reason for hardcoding this to (2048, 2048)?
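For context, here is a minimal sketch (not the PR's code) contrasting the two mask constructions in the excerpt above. The fill value, dtype, and example lengths are assumptions; my understanding is that Ascend fused-attention kernels can accept a fixed upper-triangular "compressed" causal mask, which would decouple the mask size from the actual sequence lengths, but the PR authors should confirm the intent.

```python
import torch

# Hypothetical example values; num_prefills and seq_lens mirror the names in
# the diff above, while the fill value and dtype are assumptions.
seq_lens = [4, 7, 3]
num_prefills = len(seq_lens)
max_seq_len = max(seq_lens, default=0)

# Old construction: one causal mask per prefill, sized to the longest sequence,
# so the mask grows with num_prefills and max_seq_len.
dynamic_mask = torch.triu(
    torch.full((num_prefills, max_seq_len, max_seq_len), True, dtype=torch.bool),
    diagonal=1)

# New construction: a single fixed 2048 x 2048 upper-triangular mask whose size
# no longer depends on the batch or the sequence length.
fixed_mask = torch.triu(
    torch.full((2048, 2048), True, dtype=torch.bool),
    diagonal=1)

print(dynamic_mask.shape, fixed_mask.shape)
```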
What this PR does / why we need it?
We optimized long-sequence performance in two ways. First, we changed the input data format for attention computation: the logic that converted between TND and BSND is removed and the TND format is used directly. The packed TND input can be reused as-is, which shortens the data path; converting to BSND was an unnecessary intermediate step (see the sketch below).
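As an illustration only (shapes, lengths, and the scatter/gather loop are assumptions, not the PR's actual code), the removed BSND round trip looks roughly like this:

```python
import torch

# B = batch, S = padded sequence length, N = heads, D = head dim,
# T = sum of real (unpadded) sequence lengths.
B, S, N, D = 2, 8, 4, 16
seq_lens = [8, 5]

tnd = torch.randn(sum(seq_lens), N, D)  # packed TND activations

# Old path: scatter the packed TND tensor into a padded BSND buffer ...
bsnd = torch.zeros(B, S, N, D)
offset = 0
for b, length in enumerate(seq_lens):
    bsnd[b, :length] = tnd[offset:offset + length]
    offset += length

# ... run attention on BSND, then gather the output back into TND afterwards.
restored = torch.cat(
    [bsnd[b, :length] for b, length in enumerate(seq_lens)], dim=0)
assert torch.equal(restored, tnd)

# New path: keep the packed TND tensor end to end and pass it to the attention
# kernel directly, avoiding the pad/unpad copies above.
```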
Second, the attention output update, previously assembled from a chain of small operators, is replaced with the npu_attention_update fusion operator to improve performance. A hedged sketch of the generic update pattern follows.
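The sketch below shows the standard log-sum-exp merge that chains of small operators typically implement when partial attention outputs are combined; it illustrates the pattern only and is not the exact semantics or signature of npu_attention_update.

```python
import torch

def merge_partial_attention(o1, lse1, o2, lse2):
    """Combine two partial attention outputs using their log-sum-exp terms."""
    max_lse = torch.maximum(lse1, lse2)
    w1 = torch.exp(lse1 - max_lse).unsqueeze(-1)  # [T, N, 1]
    w2 = torch.exp(lse2 - max_lse).unsqueeze(-1)
    return (o1 * w1 + o2 * w2) / (w1 + w2)

# Toy shapes for illustration: T tokens, N heads, D head dim.
T, N, D = 13, 4, 16
o1, lse1 = torch.randn(T, N, D), torch.randn(T, N)
o2, lse2 = torch.randn(T, N, D), torch.randn(T, N)
merged = merge_partial_attention(o1, lse1, o2, lse2)
```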
Does this PR introduce any user-facing change?
How was this patch tested?