fix dp of has_unfinished_dp #2966
base: main
Conversation
Signed-off-by: zhaowe1936 <zhaowei6@huawei.com>
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
- If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request introduces Ascend-specific implementations for data parallel group initialization and for checking for unfinished sequences across data parallel workers. This is achieved by monkey-patching methods on vllm.config.ParallelConfig. The changes appear functionally correct, but I have one suggestion to make the implementation of ascend_has_unfinished_dp more idiomatic and clearer by using boolean tensors and logical operators, which is better practice for this kind of distributed check.
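For context, the monkey-patch wiring typically looks something like the sketch below. This is an assumed illustration, not the PR's verbatim code: ascend_has_unfinished_dp is the name mentioned above, while the attribute being replaced on ParallelConfig is assumed to be the upstream has_unfinished_dp staticmethod.

```python
# Sketch of the patching pattern described above (assumed wiring).
import torch
from torch.distributed import ReduceOp

from vllm.config import ParallelConfig


def ascend_has_unfinished_dp(dp_group, has_unfinished: bool) -> bool:
    # Reduce the per-rank "any unfinished sequences?" flag across the DP
    # group; MAX over {0, 1} behaves as a logical OR.
    tensor = torch.tensor([has_unfinished], dtype=torch.int32, device="npu")
    torch.distributed.all_reduce(tensor, op=ReduceOp.MAX, group=dp_group)
    return bool(tensor.item())


# Replace the upstream implementation with the Ascend-specific one
# (assumed attribute name; the PR may patch additional methods as well).
ParallelConfig.has_unfinished_dp = staticmethod(ascend_has_unfinished_dp)
```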
tensor = torch.tensor([has_unfinished], dtype=torch.int32, device="npu")
# dp rank 0: has_unfinished_seqs=True
# dp rank 1: has_unfinished_seqs=False
# aggregated: has_unfinished_seqs=True
# so this is an OR operation, i.e. MAX in integers
torch.distributed.all_reduce(tensor, op=ReduceOp.MAX, group=dp_group)
aggregated_has_unfinished = bool(tensor.item())
return aggregated_has_unfinished
For improved clarity and correctness, it's better to use a boolean tensor with a logical OR operation (ReduceOp.LOR) for this check. The current implementation uses an integer tensor with ReduceOp.MAX, which works but is less direct and requires comments to explain the logic. Using torch.bool and ReduceOp.LOR is more idiomatic for boolean reductions and makes the code's intent self-evident, removing the need for extra casting and explanatory comments.
Suggested change:

tensor = torch.tensor([has_unfinished], dtype=torch.bool, device="npu")
torch.distributed.all_reduce(tensor, op=ReduceOp.LOR, group=dp_group)
return tensor.item()
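As a reference point, here is a self-contained, CPU-only sketch of the aggregation under discussion (single-process gloo group; the port and setup values are assumptions). It keeps the integer MAX reduction from the current snippet, which is supported across backends and behaves as a logical OR over 0/1 flags; whether ReduceOp.LOR is exposed should be verified against torch.distributed.ReduceOp in the PyTorch version in use.

```python
# CPU-only illustration of the aggregation above (assumed setup: a
# single-process gloo group; in the PR the group is the DP group on NPU).
import torch
import torch.distributed as dist
from torch.distributed import ReduceOp


def has_unfinished_dp(dp_group, has_unfinished: bool) -> bool:
    # Encode the local flag as 0/1; MAX across ranks is then a logical OR.
    tensor = torch.tensor([has_unfinished], dtype=torch.int32)
    dist.all_reduce(tensor, op=ReduceOp.MAX, group=dp_group)
    return bool(tensor.item())


if __name__ == "__main__":
    dist.init_process_group(
        backend="gloo",
        init_method="tcp://127.0.0.1:29501",  # assumed free local port
        rank=0,
        world_size=1,
    )
    group = dist.group.WORLD
    # With a single rank the result is just the local flag; across ranks,
    # any rank reporting True makes the aggregated result True.
    print(has_unfinished_dp(group, True))   # True
    print(has_unfinished_dp(group, False))  # False
    dist.destroy_process_group()
```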
What this PR does / why we need it?
Does this PR introduce any user-facing change?
How was this patch tested?