
Conversation

@Liccol Liccol (Contributor) commented Sep 29, 2025

What this PR does / why we need it?

Does this PR introduce any user-facing change?

How was this patch tested?


👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the MoE expert selection logic to generalize the gatingtopk operator and remove the row_idx tensor, which simplifies the codebase across numerous files. The removal of row_idx appears to be handled correctly and consistently. However, the generalization of _select_experts_with_fusion_ops in vllm_ascend/ops/moe/experts_selector.py introduces a critical issue. The new logic incorrectly handles grouped top-k, which will likely lead to incorrect behavior and potential crashes. I have provided a detailed comment with a suggested fix for this issue.
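
For context, here is a minimal PyTorch sketch of what grouped top-k expert selection computes (the mechanism at issue in the comment below). The function name and signature are illustrative, not the PR's actual API, and it assumes num_experts is divisible by num_expert_group:

import torch

def grouped_topk(scores, top_k, num_expert_group, topk_group):
    # scores: (num_tokens, num_experts) router scores after softmax/sigmoid.
    num_tokens, num_experts = scores.shape
    # Score each expert group by its best expert.
    group_scores = scores.view(num_tokens, num_expert_group,
                               -1).max(dim=-1).values
    # Keep only the topk_group best groups per token.
    group_idx = group_scores.topk(topk_group, dim=-1).indices
    group_mask = torch.zeros_like(group_scores).scatter_(1, group_idx, 1.0)
    # Mask experts outside the selected groups, then take the final top-k.
    expert_mask = group_mask.unsqueeze(-1).expand(
        num_tokens, num_expert_group,
        num_experts // num_expert_group).reshape(num_tokens, num_experts)
    masked = scores.masked_fill(expert_mask == 0, float("-inf"))
    topk_weights, topk_ids = masked.topk(top_k, dim=-1)
    return topk_weights, topk_ids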

Comment on lines +171 to +176
if scoring_func == "softmax":
    norm_type = 0
    topk_group = 1
    num_expert_group = 1
else:
    norm_type = 1

critical

The current logic for determining topk_group and num_expert_group is incorrect. It unconditionally overwrites these parameters to 1 when scoring_func is "softmax", effectively disabling grouped top-k for softmax, which is not the intended behavior when use_grouped_topk is True. Additionally, for scoring_func="sigmoid", if use_grouped_topk is False, topk_group and num_expert_group would be None, which could lead to a crash in the torch_npu.npu_moe_gating_top_k operator.

The logic should be updated to respect the use_grouped_topk parameter for all scoring functions. When use_grouped_topk is False, topk_group and num_expert_group should be set to 1 to disable grouping. Otherwise, the provided values should be used.

Suggested change
-if scoring_func == "softmax":
-    norm_type = 0
-    topk_group = 1
-    num_expert_group = 1
-else:
-    norm_type = 1
+if scoring_func == "softmax":
+    norm_type = 0
+else:
+    norm_type = 1
+if not use_grouped_topk:
+    topk_group = 1
+    num_expert_group = 1
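
As a sanity check on the suggested fix: with topk_group = num_expert_group = 1 there is a single group that is always selected, so grouped selection degenerates to an ordinary top-k. The snippet below (hypothetical test code, reusing the grouped_topk sketch above) illustrates why forcing both parameters to 1 safely disables grouping instead of passing None into the fused operator:

import torch

scores = torch.randn(4, 8).softmax(dim=-1)
w_plain, ids_plain = scores.topk(2, dim=-1)
w_grouped, ids_grouped = grouped_topk(scores, top_k=2,
                                      num_expert_group=1, topk_group=1)
# Grouping with a single, always-selected group changes nothing.
assert torch.equal(ids_plain, ids_grouped)
assert torch.equal(w_plain, w_grouped)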

1092626063 and others added 2 commits September 30, 2025 14:44
Signed-off-by: 1092626063 <1092626063@qq.com>
Signed-off-by: CaranLic <740821011@qq.com>
@Liccol Liccol reopened this Sep 30, 2025
@Liccol Liccol changed the title Support gatingtopk operator generalization and remove row idx [Refactor]Support gatingtopk operator generalization and remove row idx Sep 30, 2025