fix dp of has_unfinished_dp #2966
base: main
Conversation
Signed-off-by: zhaowe1936 <zhaowei6@huawei.com>
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
- If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request introduces Ascend-specific implementations for data parallel group initialization and for checking for unfinished sequences across data parallel workers. This is achieved by monkey-patching methods on vllm.config.ParallelConfig. The changes appear functionally correct, but I have one suggestion to make the implementation of ascend_has_unfinished_dp more idiomatic and clearer by using boolean tensors and logical operators, which is better practice for this kind of distributed check.
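For context, the monkey-patch wiring typically looks something like the sketch below. This is an assumed illustration, not the PR's verbatim code: ascend_has_unfinished_dp is the name mentioned above, while the attribute being replaced on ParallelConfig is assumed to be the upstream has_unfinished_dp staticmethod.

```python
# Sketch of the patching pattern described above (assumed wiring).
import torch
from torch.distributed import ReduceOp

from vllm.config import ParallelConfig


def ascend_has_unfinished_dp(dp_group, has_unfinished: bool) -> bool:
    # Reduce the per-rank "any unfinished sequences?" flag across the DP
    # group; MAX over {0, 1} behaves as a logical OR.
    tensor = torch.tensor([has_unfinished], dtype=torch.int32, device="npu")
    torch.distributed.all_reduce(tensor, op=ReduceOp.MAX, group=dp_group)
    return bool(tensor.item())


# Replace the upstream implementation with the Ascend-specific one
# (assumed attribute name; the PR may patch additional methods as well).
ParallelConfig.has_unfinished_dp = staticmethod(ascend_has_unfinished_dp)
```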
tensor = torch.tensor([has_unfinished], dtype=torch.int32, device="npu")
# dp rank 0: has_unfinished_seqs=True
# dp rank 1: has_unfinished_seqs=False
# aggregated: has_unfinished_seqs=True
# so this is an OR operation, i.e. MAX in integers
torch.distributed.all_reduce(tensor, op=ReduceOp.MAX, group=dp_group)
aggregated_has_unfinished = bool(tensor.item())
return aggregated_has_unfinished
For improved clarity and correctness, it's better to use a boolean tensor with a logical OR operation (ReduceOp.LOR) for this check. The current implementation uses an integer tensor with ReduceOp.MAX, which works but is less direct and requires comments to explain the logic. Using torch.bool and ReduceOp.LOR is more idiomatic for boolean reductions and makes the code's intent self-evident, removing the need for extra casting and explanatory comments.
Suggested change:

tensor = torch.tensor([has_unfinished], dtype=torch.bool, device="npu")
torch.distributed.all_reduce(tensor, op=ReduceOp.LOR, group=dp_group)
return tensor.item()
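As a reference point, here is a self-contained, CPU-only sketch of the aggregation under discussion (single-process gloo group; the port and setup values are assumptions). It keeps the integer MAX reduction from the current snippet, which is supported across backends and behaves as a logical OR over 0/1 flags; whether ReduceOp.LOR is exposed should be verified against torch.distributed.ReduceOp in the PyTorch version in use.

```python
# CPU-only illustration of the aggregation above (assumed setup: a
# single-process gloo group; in the PR the group is the DP group on NPU).
import torch
import torch.distributed as dist
from torch.distributed import ReduceOp


def has_unfinished_dp(dp_group, has_unfinished: bool) -> bool:
    # Encode the local flag as 0/1; MAX across ranks is then a logical OR.
    tensor = torch.tensor([has_unfinished], dtype=torch.int32)
    dist.all_reduce(tensor, op=ReduceOp.MAX, group=dp_group)
    return bool(tensor.item())


if __name__ == "__main__":
    dist.init_process_group(
        backend="gloo",
        init_method="tcp://127.0.0.1:29501",  # assumed free local port
        rank=0,
        world_size=1,
    )
    group = dist.group.WORLD
    # With a single rank the result is just the local flag; across ranks,
    # any rank reporting True makes the aggregated result True.
    print(has_unfinished_dp(group, True))   # True
    print(has_unfinished_dp(group, False))  # False
    dist.destroy_process_group()
```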
What this PR does / why we need it?
Does this PR introduce any user-facing change?
How was this patch tested?