
Commit 05a700d

[Bugfix] Fix async copy bug under single expert scenario (#3005)
Add the missing barrier when no implicit synchronization by `repeat_interleave` is available. Otherwise, the `non_blocking=True` copy of `output_splits` and `input_splits` from the NPU may fail to complete before the later `async_all_to_all` uses them.

### What this PR does / why we need it?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.10.2
- vLLM main: vllm-project/vllm@ef7eefe

Signed-off-by: sdmyzlp <lrwei2@petalmail.com>
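For illustration, here is a minimal sketch of the race this commit closes. It assumes an Ascend environment with `torch` and `torch_npu` installed; the tensor names and values are hypothetical, while the `non_blocking=True` device-to-host copy and `torch.npu.synchronize()` come from the patch itself.

```python
import torch
import torch_npu  # noqa: F401  # registers the torch.npu backend

device = torch.device("npu:0")

# Per-expert token counts computed on the NPU (hypothetical values).
num_tokens_per_expert = torch.tensor([5, 0, 3, 7], device=device)

# Asynchronous device-to-host copy: the CPU tensor may still hold stale
# data when the host next reads it.
input_splits = torch.empty(num_tokens_per_expert.shape,
                           dtype=num_tokens_per_expert.dtype,
                           device="cpu", pin_memory=True)
input_splits.copy_(num_tokens_per_expert, non_blocking=True)

# Reading `input_splits` here (e.g. to build all-to-all split sizes) races
# with the copy unless something has synchronized first, either implicitly
# (a later device op such as `repeat_interleave` that forces a sync) or
# explicitly, as the patch does in the branch that has no implicit sync:
torch.npu.synchronize()
splits = input_splits.tolist()  # safe: the copy is guaranteed to be done
```

The diff below adds exactly this explicit barrier to the branch where the implicit `repeat_interleave` synchronization never happens.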
1 parent 2a87b4c commit 05a700d

File tree

1 file changed: +4 −0 lines changed


vllm_ascend/ops/moe/token_dispatcher.py

Lines changed: 4 additions & 0 deletions
@@ -639,6 +639,10 @@ def _preprocess(self, topk_ids: torch.Tensor) -> torch.Tensor:
             self.global_input_tokens_local_experts_indices = torch.repeat_interleave(
                 self.expert_ids_per_ep_rank,
                 self.num_global_tokens_per_local_expert.ravel())
+        else:
+            # TODO: This full synchronization can be a performance bottleneck.
+            # A more granular sync (e.g., blocking D2H copies) should be investigated.
+            torch.npu.synchronize()
 
         return num_tokens_per_local_expert

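As a follow-up to the TODO in the hunk above, here is a sketch of two more granular alternatives to a device-wide `torch.npu.synchronize()`. The helper names are hypothetical, neither variant is part of this patch, and the event-based option assumes the Ascend build of PyTorch mirrors the `torch.cuda` stream/event API under `torch.npu` (`Event`, `current_stream`).

```python
import torch
import torch_npu  # noqa: F401  # registers the torch.npu backend


def copy_splits_blocking(splits_npu: torch.Tensor) -> torch.Tensor:
    # Option 1 (the "blocking D2H copies" mentioned in the TODO): only this
    # copy waits, rather than the whole device.
    return splits_npu.to("cpu", non_blocking=False)


def copy_splits_with_event(splits_npu: torch.Tensor):
    # Option 2: keep the copy asynchronous, but record an event so the wait
    # can be deferred until the host values are actually consumed.
    splits_cpu = torch.empty(splits_npu.shape, dtype=splits_npu.dtype,
                             device="cpu", pin_memory=True)
    splits_cpu.copy_(splits_npu, non_blocking=True)
    done = torch.npu.Event()
    done.record(torch.npu.current_stream())
    return splits_cpu, done  # call done.synchronize() just before .tolist()
```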