support fused_moe_allgather_ep #1220
Conversation
Pull Request Overview
This PR adds support for a new fused MoE allgather expert-parallel path (fused_moe_allgather_ep) in both the dynamic quantization workflow and the DeepSeek model.
- Introduces a fused_experts_with_allgather operator in w8a8_dynamic.py (see the dispatch sketch after this list)
- Registers and checks a new VLLM_ENABLE_FUSED_EXPERTS_ALLGATHER_EP env var and etp_group
- Updates DeepSeek V2 to conditionally prefer allgather-based MoE routing over all-to-all
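As a rough caller-side illustration of the points above, the dispatch between the new operator and the existing all-to-all path could look roughly like this. This is a minimal sketch, not the PR's actual code: the placeholder bodies, the fused_experts_with_all2all name, and dispatch_fused_experts are assumptions for illustration only.

```python
import torch


def fused_experts_with_allgather(hidden_states: torch.Tensor) -> torch.Tensor:
    # Placeholder for the new allgather-based operator added by this PR.
    return hidden_states


def fused_experts_with_all2all(hidden_states: torch.Tensor) -> torch.Tensor:
    # Placeholder for the existing all-to-all expert-parallel operator.
    return hidden_states


def dispatch_fused_experts(hidden_states: torch.Tensor,
                           allgather_ep_enabled: bool) -> torch.Tensor:
    """Choose the MoE communication path based on the new env flag
    (hypothetical helper; the real selection happens inside the model/quant code)."""
    if allgather_ep_enabled:
        return fused_experts_with_allgather(hidden_states)
    return fused_experts_with_all2all(hidden_states)
```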
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
File | Description
---|---
vllm_ascend/quantization/w8a8_dynamic.py | Added fused_experts_with_allgather, imported get_etp_group
vllm_ascend/models/deepseek_v2.py | Added fused_experts_allgather_ep_enabled flag and logic tweaks
vllm_ascend/envs.py | Defined the VLLM_ENABLE_FUSED_EXPERTS_ALLGATHER_EP environment variable (registration sketch below)
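For context on the envs.py change, such a flag is usually registered as a lazily evaluated lambda keyed by its name. The snippet below is a sketch that assumes vllm_ascend/envs.py follows the env_variables dict pattern used by vLLM's envs.py; the exact layout in this PR may differ.

```python
import os
from typing import Any, Callable, Dict

# Sketch: declare the new flag and parse "0"/"1" into a bool.
env_variables: Dict[str, Callable[[], Any]] = {
    "VLLM_ENABLE_FUSED_EXPERTS_ALLGATHER_EP":
    lambda: bool(int(os.getenv("VLLM_ENABLE_FUSED_EXPERTS_ALLGATHER_EP", "0"))),
}


def __getattr__(name: str) -> Any:
    # Resolve attribute access (e.g. envs.VLLM_ENABLE_FUSED_EXPERTS_ALLGATHER_EP)
    # against the table above.
    if name in env_variables:
        return env_variables[name]()
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```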
Comments suppressed due to low confidence (2)
vllm_ascend/quantization/w8a8_dynamic.py:349
- This new public function lacks a docstring explaining its purpose, parameters, and return value. Please add a clear docstring to improve maintainability and usability.
def fused_experts_with_allgather(hidden_states: torch.Tensor,
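A docstring along these lines would address the comment. This is only a sketch: the parameter list after hidden_states is abbreviated and the wording is illustrative, not taken from the PR.

```python
import torch


def fused_experts_with_allgather(hidden_states: torch.Tensor, *args, **kwargs):
    """Run fused MoE experts using allgather-based expert parallelism.

    Gathers tokens across the expert-parallel group, dispatches them to the
    local experts, and combines the expert outputs, avoiding the all-to-all
    exchange used by the default path.

    Args:
        hidden_states: Input activations of shape (num_tokens, hidden_size).
        *args, **kwargs: Remaining weight/scale/top-k arguments (omitted here;
            see the actual signature in w8a8_dynamic.py).

    Returns:
        torch.Tensor with the same shape as ``hidden_states``.
    """
    ...
```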
vllm_ascend/models/deepseek_v2.py:332
- [nitpick] The flag enable_alltoall_ep is overloaded to control both all-to-all and allgather behaviors. Consider renaming it to something like alltoall_ep_enabled for clarity.
else:
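One way to act on this nitpick is to keep two explicitly named booleans instead of overloading a single flag. The sketch below uses assumed names (ep_size, allgather_ep); the PR itself may resolve the naming differently.

```python
def moe_comm_flags(ep_size: int, allgather_ep: bool) -> dict:
    """Sketch of the suggested rename: two explicitly named booleans rather
    than one overloaded enable_alltoall_ep flag (names are assumptions)."""
    return {
        "alltoall_ep_enabled": ep_size > 1 and not allgather_ep,
        "allgather_ep_enabled": ep_size > 1 and allgather_ep,
    }


# Usage: moe_comm_flags(ep_size=4, allgather_ep=True)
# -> {"alltoall_ep_enabled": False, "allgather_ep_enabled": True}
```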
This PR can be merged once the CI is happy. cc @Yikun @wangxiyuan
I think my PR can be merged first: it is already ready and thoroughly tested, with complete unit tests, and it has been waiting to be merged for several days. The other PR involves significant refactoring and it's unclear when it can be merged, so I don't want my PR to be blocked by it. @wangxiyuan @Yikun @jianzs
This pull request has conflicts, please resolve those before we can evaluate the pull request.
What this PR does / why we need it?
support fused_moe_allgather_ep
Does this PR introduce any user-facing change?
~~
How was this patch tested?
~~