
Commit fa4a5d9

[Bugfix] Remove redundant tensor creation and unused code (#656)
### What this PR does / why we need it?

Eliminated duplicate `block_table` tensor initialization and cleaned up unused code segments. This resolves an issue where the second creation was overwriting the first, potentially leading to unexpected behavior.

Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
1 parent ba3d8aa commit fa4a5d9

File tree

1 file changed: +0 −16 lines changed


vllm_ascend/attention/attention.py

Lines changed: 0 additions & 16 deletions
```diff
@@ -599,14 +599,6 @@ def build(
         max_query_len = max(query_lens)
         max_prefill_seq_len = max(self.prefill_seq_lens, default=0)
         max_decode_seq_len = max(self.curr_seq_lens, default=0)
-
-        if self.num_prefills > 0:
-            self.attn_mask = AscendMetadataBuilder._attn_mask_builder.get_attn_mask(  # type: ignore
-                max_prefill_seq_len,
-                self.input_builder.runner.model_config.dtype,
-                self.input_builder.runner.device)
-        else:
-            self.attn_mask = None
         num_decode_tokens = self.num_decode_tokens

         if self.num_prefills == 0 and use_torchair_graph:
@@ -630,14 +622,6 @@ def build(
                     self.input_builder.runner.device)
             else:
                 self.attn_mask = None
-        num_decode_tokens = self.num_decode_tokens
-
-        block_tables = make_tensor_with_pad(
-            self.block_tables,
-            pad=0,
-            dtype=torch.int32,
-            device=device,
-        )

         assert max_query_len > 0, "query_lens: {}".format(query_lens)
```
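For context on what the removed call was doing: `make_tensor_with_pad` (a vLLM utility) right-pads a list of variable-length rows, such as per-sequence block tables, into a rectangular tensor. Below is a minimal, list-based sketch of that padding behavior; it is a simplified stand-in, not the real vLLM implementation, which returns a `torch.Tensor` on a target device with a given dtype.

```python
def make_tensor_with_pad(rows, pad, max_len=None):
    """Right-pad variable-length rows into a rectangular 2-D list.

    Simplified stand-in for vLLM's tensor helper: the real function
    accepts dtype/device arguments and returns a torch.Tensor.
    """
    width = max_len if max_len is not None else max(
        (len(r) for r in rows), default=0)
    return [list(r) + [pad] * (width - len(r)) for r in rows]


# Example: block tables of unequal length, padded with block id 0.
block_tables = [[3, 7, 9], [5], [2, 4]]
padded = make_tensor_with_pad(block_tables, pad=0)
# padded == [[3, 7, 9], [5, 0, 0], [2, 4, 0]]
```

Since the builder already constructs `block_tables` once earlier in `build`, the second call removed by this commit produced an identical tensor that silently replaced the first, which is why dropping it is behavior-preserving.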