
Commit 9ca9c6f

[BUGFIX][v0.9.1] fix torchair bug when DP is enabled (#1727)
### What this PR does / why we need it?
[BUGFIX][v0.9.1] Fix a torchair bug that occurs when DP is enabled: in the MTP torchair + PD scenario, the last element of `actual_seq_q_lens` must equal the padded token count.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
Tested on DP4, TP4, EP16 with and without the MTP scenario.

Signed-off-by: xuyexiong <xuyexiong@huawei.com>
1 parent 8e42f71 commit 9ca9c6f
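For context, here is a minimal, standalone sketch of how the padded `actual_seq_q_lens` list touched by this fix gets assembled. The request counts, `tokens_per_req`, and the runner's padded cumulative lengths below are illustrative assumptions, not values taken from the actual model runner.

```python
# Minimal sketch of how the padded actual_seq_q_lens is built (illustrative
# values; the real construction lives in build() in
# vllm_ascend/attention/mla_v1.py).
import torch

num_reqs = 3            # real decode requests on this DP rank
num_reqs_pad_size = 1   # dummy requests added so the torchair graph shape matches
tokens_per_req = 2      # e.g. 1 accepted token + 1 MTP draft token per request

# Cumulative query offsets for the real requests: tensor([0, 2, 4, 6])
query_start_loc = torch.arange(0, (num_reqs + 1) * tokens_per_req, tokens_per_req)

# Padded cumulative lengths kept by the runner (assumed here); the entries for
# the padding requests simply continue the cumulative sequence.
runner_actual_seq_q_lens = [2, 4, 6, 8]

# Same construction as in build(): real cumulative lengths plus the padded tail.
actual_seq_q_lens = query_start_loc[1:].tolist() + \
    runner_actual_seq_q_lens[num_reqs:num_reqs + num_reqs_pad_size]

# slot_mapping holds one slot per padded token, so its length is the padded total.
num_padded_token_size = (num_reqs + num_reqs_pad_size) * tokens_per_req

print(actual_seq_q_lens)       # [2, 4, 6, 8]
print(num_padded_token_size)   # 8 -- the last element must equal this value
```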

File tree

1 file changed: +4 -4 lines changed


vllm_ascend/attention/mla_v1.py

Lines changed: 4 additions & 4 deletions
@@ -538,11 +538,11 @@ def build(
                 actual_seq_q_lens = query_start_loc[1:].tolist(
                 ) + self.runner.actual_seq_q_lens[num_reqs:num_reqs +
                                                   num_reqs_pad_size]
-                # mtp torchair + PD scenario, last element of actual_seq_q_lens must equal to num_reqs_pad_size
+                # mtp torchair + PD scenario, last element of actual_seq_q_lens must equal to num_padded_token_size
                 num_padded_token_size = slot_mapping.size(0)
-                if actual_seq_q_lens[-1] != num_padded_token_size:
-                    actual_seq_q_lens.append(num_padded_token_size)
-                    seq_lens_list.append(0)
+                if actual_seq_q_lens[-1] != num_padded_token_size \
+                        and self.runner.attn_state == AscendAttentionState.SpecDecoding:
+                    actual_seq_q_lens[-1] = num_padded_token_size
             else:
                 seq_lens_list = seq_lens.tolist()
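To make the behaviour change concrete, here is a standalone sketch of the old append logic versus the new in-place overwrite. The helper functions and input values are hypothetical, and `is_spec_decoding` stands in for the real check against `self.runner.attn_state == AscendAttentionState.SpecDecoding`.

```python
# Standalone sketch of the behaviour change; inputs are hypothetical.

def old_build(actual_seq_q_lens, seq_lens_list, num_padded_token_size):
    # Before the fix: append an extra entry (and a 0 seq_len) whenever the last
    # cumulative length differs from the padded token count.
    if actual_seq_q_lens[-1] != num_padded_token_size:
        actual_seq_q_lens.append(num_padded_token_size)
        seq_lens_list.append(0)
    return actual_seq_q_lens, seq_lens_list


def new_build(actual_seq_q_lens, seq_lens_list, num_padded_token_size,
              is_spec_decoding):
    # After the fix: overwrite the last entry in place, and only while
    # speculative decoding, so the list length stays at num_reqs + pad size.
    if actual_seq_q_lens[-1] != num_padded_token_size and is_spec_decoding:
        actual_seq_q_lens[-1] = num_padded_token_size
    return actual_seq_q_lens, seq_lens_list


# MTP torchair + PD scenario: padded token count (8) exceeds the last offset (6).
print(old_build([2, 4, 6], [1, 1, 1], 8))
# -> ([2, 4, 6, 8], [1, 1, 1, 0])  lists grow past the padded request count
print(new_build([2, 4, 6], [1, 1, 1], 8, is_spec_decoding=True))
# -> ([2, 4, 8], [1, 1, 1])        last offset fixed up, list lengths unchanged
```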

0 commit comments
