Commit 741a8cf

[BUGFIX][0.9.1] FIX ring_mla input ‘query_lens’ to cpu (vllm-project#2170)
### What this PR does / why we need it?

[BUGFIX][0.9.1] FIX ring_mla input ‘query_lens’ to cpu

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Signed-off-by: xuyexiong <xuyexiong@huawei.com>
1 parent 2ef3d98 commit 741a8cf

1 file changed, +1 -1 lines changed

vllm_ascend/worker/mtp_proposer_v1.py

Lines changed: 1 addition & 1 deletion
@@ -218,7 +218,7 @@ def propose(
         self.hidden_states[:num_tokens] = target_hidden_states
 
         if attn_metadata.prefill is not None:
-            attn_metadata.prefill.query_lens = query_lens
+            attn_metadata.prefill.query_lens = query_lens.cpu()
             attn_metadata.prefill.input_positions = target_positions
 
         if not self.runner.torchair_graph_enabled:
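
The effect of the one-line change can be illustrated with a minimal sketch. It is not the project's code: `StubTensor`, `set_prefill_inputs`, and the `SimpleNamespace` metadata object are hypothetical stand-ins, used so the example runs without PyTorch or an NPU. The only behavior borrowed from PyTorch is that `Tensor.cpu()` returns a host-side copy, or the tensor itself when it is already on the CPU. The example assumes, as the commit title suggests, that the `ring_mla` path wants `query_lens` on the host while other inputs stay on the device.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class StubTensor:
    """Hypothetical stand-in for torch.Tensor: tracks only values and device."""
    values: list
    device: str = "npu"

    def cpu(self):
        # Mirrors torch.Tensor.cpu(): return self if already on the host,
        # otherwise return a host-side copy.
        if self.device == "cpu":
            return self
        return StubTensor(list(self.values), device="cpu")


def set_prefill_inputs(attn_metadata, query_lens, target_positions):
    """Mirror of the patched branch: only query_lens is moved to the host."""
    if attn_metadata.prefill is not None:
        attn_metadata.prefill.query_lens = query_lens.cpu()
        attn_metadata.prefill.input_positions = target_positions


# Hypothetical metadata object holding only the fields the diff touches.
attn_metadata = SimpleNamespace(
    prefill=SimpleNamespace(query_lens=None, input_positions=None)
)
query_lens = StubTensor([3, 5, 2], device="npu")
target_positions = StubTensor([0, 1, 2, 0, 1, 2, 3, 4, 0, 1], device="npu")

set_prefill_inputs(attn_metadata, query_lens, target_positions)
print(attn_metadata.prefill.query_lens.device)       # cpu
print(attn_metadata.prefill.input_positions.device)  # npu
```

The caller's `query_lens` is left untouched on the device; only the copy stored in the prefill metadata lives on the host.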
