
Commit 9ca9c6f

[BUGFIX][v0.9.1] fix torchair bug when DP is enabled (#1727)
### What this PR does / why we need it?
[BUGFIX][v0.9.1] Fix a torchair bug that occurs when DP is enabled: in the MTP torchair + PD scenario, the last element of `actual_seq_q_lens` must equal the padded token count.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
Tested on DP4, TP4, EP16 with and without the MTP scenario.

Signed-off-by: xuyexiong <xuyexiong@huawei.com>
1 parent 8e42f71 commit 9ca9c6f
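For context, here is a minimal, standalone sketch of how the padded `actual_seq_q_lens` list touched by this fix gets assembled. The request counts, `tokens_per_req`, and the runner's padded cumulative lengths below are illustrative assumptions, not values taken from the actual model runner.

```python
# Minimal sketch of how the padded actual_seq_q_lens is built (illustrative
# values; the real construction lives in build() in
# vllm_ascend/attention/mla_v1.py).
import torch

num_reqs = 3            # real decode requests on this DP rank
num_reqs_pad_size = 1   # dummy requests added so the torchair graph shape matches
tokens_per_req = 2      # e.g. 1 accepted token + 1 MTP draft token per request

# Cumulative query offsets for the real requests: tensor([0, 2, 4, 6])
query_start_loc = torch.arange(0, (num_reqs + 1) * tokens_per_req, tokens_per_req)

# Padded cumulative lengths kept by the runner (assumed here); the entries for
# the padding requests simply continue the cumulative sequence.
runner_actual_seq_q_lens = [2, 4, 6, 8]

# Same construction as in build(): real cumulative lengths plus the padded tail.
actual_seq_q_lens = query_start_loc[1:].tolist() + \
    runner_actual_seq_q_lens[num_reqs:num_reqs + num_reqs_pad_size]

# slot_mapping holds one slot per padded token, so its length is the padded total.
num_padded_token_size = (num_reqs + num_reqs_pad_size) * tokens_per_req

print(actual_seq_q_lens)       # [2, 4, 6, 8]
print(num_padded_token_size)   # 8 -- the last element must equal this value
```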

File tree

1 file changed: +4 -4 lines changed


vllm_ascend/attention/mla_v1.py

Lines changed: 4 additions & 4 deletions
@@ -538,11 +538,11 @@ def build(
                 actual_seq_q_lens = query_start_loc[1:].tolist(
                 ) + self.runner.actual_seq_q_lens[num_reqs:num_reqs +
                                                   num_reqs_pad_size]
-                # mtp torchair + PD scenario, last element of actual_seq_q_lens must equal to num_reqs_pad_size
+                # mtp torchair + PD scenario, last element of actual_seq_q_lens must equal to num_padded_token_size
                 num_padded_token_size = slot_mapping.size(0)
-                if actual_seq_q_lens[-1] != num_padded_token_size:
-                    actual_seq_q_lens.append(num_padded_token_size)
-                    seq_lens_list.append(0)
+                if actual_seq_q_lens[-1] != num_padded_token_size \
+                        and self.runner.attn_state == AscendAttentionState.SpecDecoding:
+                    actual_seq_q_lens[-1] = num_padded_token_size
             else:
                 seq_lens_list = seq_lens.tolist()
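To make the behaviour change concrete, here is a standalone sketch of the old append logic versus the new in-place overwrite. The helper functions and input values are hypothetical, and `is_spec_decoding` stands in for the real check against `self.runner.attn_state == AscendAttentionState.SpecDecoding`.

```python
# Standalone sketch of the behaviour change; inputs are hypothetical.

def old_build(actual_seq_q_lens, seq_lens_list, num_padded_token_size):
    # Before the fix: append an extra entry (and a 0 seq_len) whenever the last
    # cumulative length differs from the padded token count.
    if actual_seq_q_lens[-1] != num_padded_token_size:
        actual_seq_q_lens.append(num_padded_token_size)
        seq_lens_list.append(0)
    return actual_seq_q_lens, seq_lens_list


def new_build(actual_seq_q_lens, seq_lens_list, num_padded_token_size,
              is_spec_decoding):
    # After the fix: overwrite the last entry in place, and only while
    # speculative decoding, so the list length stays at num_reqs + pad size.
    if actual_seq_q_lens[-1] != num_padded_token_size and is_spec_decoding:
        actual_seq_q_lens[-1] = num_padded_token_size
    return actual_seq_q_lens, seq_lens_list


# MTP torchair + PD scenario: padded token count (8) exceeds the last offset (6).
print(old_build([2, 4, 6], [1, 1, 1], 8))
# -> ([2, 4, 6, 8], [1, 1, 1, 0])  lists grow past the padded request count
print(new_build([2, 4, 6], [1, 1, 1], 8, is_spec_decoding=True))
# -> ([2, 4, 8], [1, 1, 1])        last offset fixed up, list lengths unchanged
```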

0 commit comments
