[Refactor] Support gatingtopk operator generalization and remove row_idx #3265
base: main
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request refactors the MoE expert selection logic to generalize the gatingtopk operator and remove the row_idx tensor, which simplifies the codebase across numerous files. The removal of row_idx appears to be handled correctly and consistently. However, the generalization of _select_experts_with_fusion_ops in vllm_ascend/ops/moe/experts_selector.py introduces a critical issue: the new logic incorrectly handles grouped top-k, which will likely lead to incorrect behavior and potential crashes. I have provided a detailed comment with a suggested fix for this issue.
    if scoring_func == "softmax":
        norm_type = 0
        topk_group = 1
        num_expert_group = 1
    else:
        norm_type = 1
The current logic for determining topk_group and num_expert_group is incorrect. It unconditionally overwrites these parameters to 1 when scoring_func is "softmax", effectively disabling grouped top-k for softmax, which is not the intended behavior when use_grouped_topk is True. Additionally, for scoring_func="sigmoid", if use_grouped_topk is False, topk_group and num_expert_group would be None, which could lead to a crash in the torch_npu.npu_moe_gating_top_k operator.

The logic should be updated to respect the use_grouped_topk parameter for all scoring functions: when use_grouped_topk is False, topk_group and num_expert_group should be set to 1 to disable grouping; otherwise, the provided values should be used.
Suggested change (original, then replacement):

Original:
    if scoring_func == "softmax":
        norm_type = 0
        topk_group = 1
        num_expert_group = 1
    else:
        norm_type = 1

Suggested:
    if scoring_func == "softmax":
        norm_type = 0
    else:
        norm_type = 1
    if not use_grouped_topk:
        topk_group = 1
        num_expert_group = 1
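As an illustration only, here is a minimal sketch of the corrected parameter handling factored into a standalone helper. The function name _normalize_gating_params is hypothetical and not part of this PR or of vllm_ascend; it simply encodes the rule described above: norm_type follows scoring_func, and the group parameters are forced to 1 only when use_grouped_topk is False, so the fused gating operator is never handed None.

    from typing import Optional, Tuple

    def _normalize_gating_params(
        scoring_func: str,
        use_grouped_topk: bool,
        topk_group: Optional[int],
        num_expert_group: Optional[int],
    ) -> Tuple[int, Optional[int], Optional[int]]:
        # Hypothetical helper sketching the suggested logic (not the PR's actual code).
        # norm_type: 0 selects softmax scoring, 1 selects sigmoid scoring.
        norm_type = 0 if scoring_func == "softmax" else 1
        # Disable grouping only when grouped top-k is not requested, so softmax
        # with use_grouped_topk=True keeps the caller-provided group values.
        if not use_grouped_topk:
            topk_group = 1
            num_expert_group = 1
        return norm_type, topk_group, num_expert_group

    # sigmoid scoring without grouped top-k no longer yields None group sizes
    assert _normalize_gating_params("sigmoid", False, None, None) == (1, 1, 1)
    # softmax scoring with grouped top-k keeps the provided group configuration
    assert _normalize_gating_params("softmax", True, 4, 8) == (0, 4, 8)

Keeping the softmax/sigmoid choice independent of the grouping decision is the core of the suggested change.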
b990378 to c73dd8f
Signed-off-by: 1092626063 <1092626063@qq.com>
Signed-off-by: CaranLic <740821011@qq.com>
What this PR does / why we need it?
Does this PR introduce any user-facing change?
How was this patch tested?