-
Notifications
You must be signed in to change notification settings - Fork 467
[Feat] communication optimization for mc2 ops on A2 #2752
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request enables hierarchical communication for mc2 operations on A2 hardware by adding checks for the Ascend A2 SOC version and specific environment variables. While the changes correctly add the necessary logic, I've identified a critical bug in token_dispatcher.py
where variables are unpacked incorrectly, which will likely lead to runtime errors. Additionally, there is significant code duplication across three files that should be refactored to improve maintainability. Addressing these points will improve the correctness and quality of the code.
expand_x, dynamic_scale, assist_info_for_combine, expert_token_nums, \ | ||
ep_recv_counts, _, expand_scales = self.output[0:7] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variables assist_info_for_combine
and ep_recv_counts
are unpacked as local variables, but they are later used as instance attributes (self.assist_info_for_combine
and self.ep_recv_counts
) in get_combine_mc_kwargs
. This will cause them to have their initial None
value from __init__
, likely leading to a NoneType
error or incorrect behavior. You should assign the unpacked values directly to the instance attributes.
expand_x, dynamic_scale, self.assist_info_for_combine, expert_token_nums, self.ep_recv_counts, _, expand_scales = self.output[0:7]
self.a2_need_extra_args = (get_ascend_soc_version() == AscendSocVersion.A2 | ||
and os.getenv("HCCL_INTRA_ROCE_ENABLE", "") == "0" | ||
and os.getenv("HCCL_INTRA_PCIE_ENABLE", "") == "1") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic to determine if hierarchical communication for A2 is enabled is duplicated in three files: vllm_ascend/ops/moe_dispatcher/token_dispatcher.py
, vllm_ascend/torchair/ops/torchair_fused_moe.py
, and vllm_ascend/torchair/quantization/torchair_w8a8_dynamic.py
. This duplication harms maintainability and makes future changes error-prone.
Consider refactoring this into a helper function in a shared utility module, such as vllm_ascend/utils.py
. For example:
# in vllm_ascend/utils.py
def is_a2_hierarchical_comm_enabled():
return (get_ascend_soc_version() == AscendSocVersion.A2
and os.getenv("HCCL_INTRA_ROCE_ENABLE", "") == "0"
and os.getenv("HCCL_INTRA_PCIE_ENABLE", "") == "1")
You can then call this function from all three locations to remove the duplicated code.
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according Contributing and Testing. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2752 +/- ##
==========================================
+ Coverage 74.76% 74.95% +0.18%
==========================================
Files 150 150
Lines 20891 20925 +34
==========================================
+ Hits 15620 15684 +64
+ Misses 5271 5241 -30
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
1b5b021
to
8a1c041
Compare
d4ffa13
to
43b522d
Compare
# NOTE: When in A2, setting the environment variables HCCL_INTRA_PCIE_ENABLE=1 and | ||
# HCCL_INTRA_ROCE_ENABLE=0 can reduce cross-machine communication traffic and significantly | ||
# improve communication performance. | ||
a2_need_extra_args = (get_ascend_soc_version() == AscendSocVersion.A2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be more appropriate to add the os variable in the script.
43b522d
to
fd780e1
Compare
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
Signed-off-by: realliujiaxu <realliujiaxu@163.com>
Signed-off-by: realliujiaxu <realliujiaxu@163.com>
fd780e1
to
5f92012
Compare
What this PR does / why we need it?
Currently, when in A2, setting the environment variables
HCCL_INTRA_PCIE_ENABLE=1
andHCCL_INTRA_ROCE_ENABLE=0
can reduce cross-machine communication traffic and significantly improve communication performance.For more details, please refer to document
Does this PR introduce any user-facing change?
Nope
How was this patch tested?