Conversation

zhaowei1936
@zhaowei1936 zhaowei1936 commented Sep 16, 2025

What this PR does / why we need it?

Does this PR introduce any user-facing change?

How was this patch tested?

Signed-off-by: zhaowe1936 <zhaowei6@huawei.com>
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces Ascend-specific implementations for data parallel group initialization and for checking for unfinished sequences across data parallel workers. This is achieved by monkey-patching methods on vllm.config.ParallelConfig. The changes appear functionally correct, but I have one suggestion to improve the implementation of ascend_has_unfinished_dp to be more idiomatic and clearer by using boolean tensors and logical operators, which is a better practice for this kind of distributed check.
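The monkey-patching approach described above can be sketched in a few lines. Note this is an illustration of the pattern only: the class and method bodies here are simplified stand-ins, not the actual `vllm.config.ParallelConfig` API or the PR's real patch.

```python
# Hedged sketch of the monkey-patching pattern: replace a method on an
# upstream class so every instance picks up the platform-specific version.
# `ParallelConfig` and the method names below are simplified stand-ins.

class ParallelConfig:
    def has_unfinished_dp(self, dp_group, has_unfinished):
        raise NotImplementedError("upstream default, not usable on this platform")

def ascend_has_unfinished_dp(self, dp_group, has_unfinished):
    # Placeholder logic for illustration; the real patch would perform a
    # collective reduction across the data parallel group.
    return bool(has_unfinished)

# The patch: rebind the method on the class at import time so all
# existing and future instances use the Ascend implementation.
ParallelConfig.has_unfinished_dp = ascend_has_unfinished_dp

cfg = ParallelConfig()
print(cfg.has_unfinished_dp(None, True))  # True
```

Because the method is rebound on the class rather than on an instance, the patch takes effect everywhere `ParallelConfig` is used, which is why such patches are typically applied once at import time.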

Comment on lines +59 to +66
tensor = torch.tensor([has_unfinished], dtype=torch.int32, device="npu")
# dp rank 0: has_unfinished_seqs=True
# dp rank 1: has_unfinished_seqs=False
# aggregated: has_unfinished_seqs=True
# so this is an OR operation, i.e. MAX in integers
torch.distributed.all_reduce(tensor, op=ReduceOp.MAX, group=dp_group)
aggregated_has_unfinished = bool(tensor.item())
return aggregated_has_unfinished
Contributor

high

For improved clarity and correctness, it's better to use a boolean tensor with a logical OR operation (ReduceOp.LOR) for this check. The current implementation uses an integer tensor with ReduceOp.MAX, which works but is less direct and requires comments to explain the logic. Using torch.bool and ReduceOp.LOR is more idiomatic for boolean reductions and makes the code's intent self-evident, removing the need for extra casting and explanatory comments.

Suggested change
- tensor = torch.tensor([has_unfinished], dtype=torch.int32, device="npu")
- # dp rank 0: has_unfinished_seqs=True
- # dp rank 1: has_unfinished_seqs=False
- # aggregated: has_unfinished_seqs=True
- # so this is an OR operation, i.e. MAX in integers
- torch.distributed.all_reduce(tensor, op=ReduceOp.MAX, group=dp_group)
- aggregated_has_unfinished = bool(tensor.item())
- return aggregated_has_unfinished
+ tensor = torch.tensor([has_unfinished], dtype=torch.bool, device="npu")
+ torch.distributed.all_reduce(tensor, op=ReduceOp.LOR, group=dp_group)
+ return tensor.item()
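The equivalence the reviewer relies on can be illustrated without a distributed setup: reducing per-rank integer flags with MAX yields the same result as reducing the corresponding booleans with logical OR. The rank values below are hypothetical examples, not taken from the PR.

```python
# Illustration only: simulate the per-rank flags that an all_reduce
# would aggregate across the data parallel group.
rank_flags = [True, False, False]  # hypothetical has_unfinished per DP rank

# Original approach: encode flags as ints and reduce with MAX.
as_ints = [int(f) for f in rank_flags]
aggregated_max = bool(max(as_ints))

# Suggested approach: keep booleans and reduce with logical OR.
aggregated_or = any(rank_flags)

# Both reductions answer the same question: does any rank still have
# unfinished sequences?
assert aggregated_max == aggregated_or  # True in this example
```

The suggestion is therefore a readability improvement rather than a behavioral change: `ReduceOp.LOR` on a `torch.bool` tensor states the intent directly, while `ReduceOp.MAX` on integers needs the explanatory comments to justify why it computes an OR.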
