Commit 4ad3ffc

panchao-hub and zhangdepeng authored and committed

[torchair] remove aicpu op (vllm-project#2640)
### What this PR does / why we need it?

Remove aicpu op for torchair mode.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

- vLLM version: v0.10.1.1
- vLLM main: vllm-project/vllm@05d839c
- vLLM version: v0.10.1.1
- vLLM main: vllm-project/vllm@67c1490

Signed-off-by: zhangdepeng &lt;zhangdepeng2@huawei.com&gt;
Co-authored-by: zhangdepeng &lt;zhangdepeng2@huawei.com&gt;
1 parent ab5aeac commit 4ad3ffc

File tree: 1 file changed (+2, -1)


vllm_ascend/torchair/torchair_attention.py

Lines changed: 2 additions & 1 deletion
```diff
@@ -304,6 +304,7 @@ def __init__(
         self.num_queries_per_kv = self.num_heads // self.num_kv_heads
         self.key_cache = None
         self.value_cache = None
+        self.scale_tensor = torch.zeros((), device='npu', dtype=torch.int32)

     def forward(
         self,
@@ -366,7 +367,7 @@ def forward(
         key_cache, value_cache = kv_cache[0], kv_cache[1]
         slots = attn_metadata.slot_mapping

-        block_size = key_cache.shape[1]
+        block_size = self.scale_tensor + key_cache.shape[1]
         slots_indices = slots.reshape(-1, 1)
         block_indices = slots_indices // block_size
         slots_indices = slots_indices % block_size
```
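For context, the lines this patch touches implement the usual slot-to-block index arithmetic for a paged KV cache: a flat slot index is split into a block index and an offset within that block. A minimal sketch in plain Python (the `split_slots` helper is hypothetical, not part of the PR; the patch itself only changes whether `block_size` is a Python int or a device tensor, so that the division stays a tensor op under torchair graph mode):

```python
def split_slots(slots, block_size):
    # Each flat slot maps to (slot // block_size, slot % block_size):
    # which KV-cache block it lives in, and its offset inside that block.
    return [(s // block_size, s % block_size) for s in slots]

# With block_size=128, slot 300 lands in block 2 at offset 44.
print(split_slots([0, 127, 300], 128))  # → [(0, 0), (0, 127), (2, 44)]
```

Adding the zero-valued `scale_tensor` to `key_cache.shape[1]` promotes `block_size` from a Python int to an NPU tensor without changing its value, which (per the PR's stated goal) keeps the `//` and `%` operations on-device instead of falling back to an AICPU op.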

0 commit comments

Comments
 (0)