

@lyj-jjj lyj-jjj commented Jun 14, 2025

What this PR does / why we need it?

support fused_moe_allgather_ep

Does this PR introduce any user-facing change?

~~

How was this patch tested?

~~

lyj-jjj added 2 commits June 14, 2025 14:30
Signed-off-by: lyj-jjj <liuyingjun5@huawei.com>
Signed-off-by: lyj-jjj <liuyingjun5@huawei.com>
Signed-off-by: lyj-jjj <liuyingjun5@huawei.com>

Yikun commented Jun 14, 2025

  • Please update the commit message (especially "How was this patch tested?", which should describe an E2E step to reproduce).
  • Add a UT or E2E test to avoid this being broken again after merge.

@jianzs jianzs requested a review from Copilot June 15, 2025 07:39

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds support for a new fused MoE allgather expert-parallel (EP) path (fused_moe_allgather_ep) in both the dynamic quantization workflow and the DeepSeek model.

  • Introduces fused_experts_with_allgather operator in w8a8_dynamic.py
  • Registers and checks a new VLLM_ENABLE_FUSED_EXPERTS_ALLGATHER_EP env var and etp_group
  • Updates DeepSeek V2 to conditionally prefer allgather-based MoE routing over all-to-all
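The gating described above can be sketched as follows. This is a hypothetical illustration, not the PR's actual code: only the env var name VLLM_ENABLE_FUSED_EXPERTS_ALLGATHER_EP comes from the PR; the helper names and the string values are assumptions.

```python
import os

def fused_experts_allgather_ep_enabled() -> bool:
    # Hypothetical helper: treat "1" in the env var as "enabled".
    # The real vllm_ascend/envs.py parsing may differ.
    return os.environ.get("VLLM_ENABLE_FUSED_EXPERTS_ALLGATHER_EP", "0") == "1"

def select_moe_path() -> str:
    # Sketch of the DeepSeek V2 routing choice: prefer the
    # allgather-based MoE path over all-to-all when the flag is set.
    if fused_experts_allgather_ep_enabled():
        return "allgather"
    return "alltoall"
```

The point of an env-var gate like this is that the new path stays opt-in, so existing all-to-all deployments are unaffected unless the flag is exported.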

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

  • vllm_ascend/quantization/w8a8_dynamic.py: Added fused_experts_with_allgather, imported get_etp_group
  • vllm_ascend/models/deepseek_v2.py: Added fused_experts_allgather_ep_enabled flag and logic tweaks
  • vllm_ascend/envs.py: Defined the VLLM_ENABLE_FUSED_EXPERTS_ALLGATHER_EP environment variable
Comments suppressed due to low confidence (2)

vllm_ascend/quantization/w8a8_dynamic.py:349

  • This new public function lacks a docstring explaining its purpose, parameters, and return value. Please add a clear docstring to improve maintainability and usability.
def fused_experts_with_allgather(hidden_states: torch.Tensor,
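A docstring along the lines Copilot requests might look like the stub below. Everything beyond the hidden_states parameter is an assumption for illustration; the real signature in w8a8_dynamic.py is abbreviated in this review snippet.

```python
def fused_experts_with_allgather(hidden_states: "torch.Tensor", **kwargs):
    """Run fused MoE experts using allgather-based expert-parallel routing.

    Args:
        hidden_states: Input activations, shape (num_tokens, hidden_size).
        **kwargs: Expert weights, quantization scales, and routing tensors
            (the actual parameter list is abbreviated here).

    Returns:
        Combined expert outputs with the same shape as ``hidden_states``.
    """
    raise NotImplementedError  # stub for illustration only
```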

vllm_ascend/models/deepseek_v2.py:332

  • [nitpick] The flag enable_alltoall_ep is overloaded to control both all-to-all and allgather behaviors. Consider renaming it to something like alltoall_ep_enabled for clarity.
        else:

Signed-off-by: lyj-jjj <liuyingjun5@huawei.com>
@jianzs jianzs added the ready-for-test start test by label for PR label Jun 16, 2025

lyj-jjj commented Jun 16, 2025

  • Please update the commit message (especially "How was this patch tested?", which should describe an E2E step to reproduce).
  • Add a UT or E2E test to avoid this being broken again after merge.

@lyj-jjj lyj-jjj closed this Jun 16, 2025
@lyj-jjj lyj-jjj reopened this Jun 16, 2025
lyj-jjj added 2 commits June 16, 2025 20:56
Signed-off-by: lyj-jjj <liuyingjun5@huawei.com>
Signed-off-by: lyj-jjj <liuyingjun5@huawei.com>

jianzs commented Jun 16, 2025

This PR can be merged once the CI is happy. cc @Yikun @wangxiyuan

@jianzs jianzs requested a review from realliujiaxu June 16, 2025 15:06
Signed-off-by: lyj-jjj <liuyingjun5@huawei.com>

@wangxiyuan wangxiyuan left a comment


This PR is fine, but I'd like to merge it after #1229. What do you think? @jianzs @Yikun

@lyj-jjj
Copy link
Contributor Author

lyj-jjj commented Jun 17, 2025

I think my PR can be merged first: it is already complete, thoroughly tested with unit tests, and has been waiting to merge for several days. The other PR involves significant refactoring with no clear merge timeline, so I would rather not have this one blocked by it. @wangxiyuan @Yikun @jianzs


This pull request has conflicts; please resolve them before we can evaluate the pull request.

Signed-off-by: lyj-jjj <liuyingjun5@huawei.com>
Signed-off-by: lyj-jjj <liuyingjun5@huawei.com>
Signed-off-by: lyj-jjj <liuyingjun5@huawei.com>
@lyj-jjj lyj-jjj closed this Jun 18, 2025
@lyj-jjj lyj-jjj reopened this Jun 18, 2025
lyj-jjj added 3 commits June 18, 2025 23:33
Signed-off-by: lyj-jjj <liuyingjun5@huawei.com>
Signed-off-by: lyj-jjj <liuyingjun5@huawei.com>
Signed-off-by: lyj-jjj <liuyingjun5@huawei.com>
@lyj-jjj lyj-jjj closed this Jun 20, 2025
@lyj-jjj lyj-jjj reopened this Jun 20, 2025
@lyj-jjj lyj-jjj closed this Jun 21, 2025
@lyj-jjj lyj-jjj deleted the lyj-main-allgather-update branch July 1, 2025 11:11
