
Conversation

jianzs
Collaborator

@jianzs jianzs commented Jun 13, 2025

Pass expert_scale and expand_scale args to the dispatch and combine functions.

BTW, I think the current code for MC2 only works when graph mode is enabled.

Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
@jianzs jianzs added the ready label Jun 13, 2025

@Copilot Copilot AI left a comment


Pull Request Overview

This PR ensures that the newly computed scaling factors are correctly passed into the MC2 dispatch and combine routines for dynamic expert routing.

  • Pass expert_scales into the initial npu_moe_distribute_dispatch call
  • Unpack and capture expand_scales from the dispatch output
  • Provide expand_scales to the subsequent npu_moe_combine invocation
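The data flow these three changes establish can be sketched with hypothetical stand-ins for the NPU kernels (the real `npu_moe_distribute_dispatch`/`npu_moe_combine` ops live in torch_npu and take many more arguments; this only illustrates how the scale tensors thread through):

```python
# Hypothetical stand-ins, NOT the real torch_npu kernels: they only
# show expert_scales flowing into dispatch and the resulting
# expand_scales flowing into combine.
def dispatch_stub(hidden_states, expert_scales):
    # Dispatch routes tokens to experts and now also returns the
    # per-token expand_scales derived from expert_scales.
    expand_scales = [float(s) for s in expert_scales]
    return hidden_states, expand_scales

def combine_stub(expert_out, expand_scales):
    # Combine weighs each expert output by its expand_scale.
    return [x * s for x, s in zip(expert_out, expand_scales)]

tokens = [1.0, 2.0, 3.0]
scales = [0.5, 0.25, 1.0]
routed, expand_scales = dispatch_stub(tokens, scales)
combined = combine_stub(routed, expand_scales)  # → [0.5, 0.5, 3.0]
```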
Comments suppressed due to low confidence (4)

vllm_ascend/quantization/w8a8_dynamic.py:132

  • The function’s docstring should be updated to document the new expert_scales and expand_scales parameters so their roles are clear to future maintainers.
def fused_experts_with_mc2(

vllm_ascend/quantization/w8a8_dynamic.py:164

  • [nitpick] Unpacking with a placeholder _ and a magic slice range can be hard to follow—consider naming the sixth element or adding a comment to clarify what it represents.
expand_x, dynamic_scale, expand_idx, expert_token_nums, ep_recv_counts, _, expand_scales = output[0:7]
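One way to address this nitpick, sketched with a hypothetical name for the sixth element (its actual meaning is not documented in this diff, so `_tp_recv_counts` is only a guess):

```python
# Give every unpacked element a name so the slice is self-documenting.
# `_tp_recv_counts` is a hypothetical name for the unused sixth element.
output = tuple(range(8))  # stand-in for the dispatch result tuple
(expand_x, dynamic_scale, expand_idx, expert_token_nums,
 ep_recv_counts, _tp_recv_counts, expand_scales) = output[0:7]
```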

vllm_ascend/quantization/w8a8_dynamic.py:135

  • There should be unit or integration tests verifying that expert_scales and expand_scales are correctly applied in the MC2 routines to prevent regressions.
"expert_scales": topk_weights.to(torch.float32),

vllm_ascend/quantization/w8a8_dynamic.py:135

  • [nitpick] Calling .to(torch.float32) inside the hot path may incur extra device transfers; consider precomputing or caching the float32 conversion if this function is called repeatedly.
"expert_scales": topk_weights.to(torch.float32),

@jianzs jianzs merged commit afc8edb into vllm-project:main Jun 17, 2025
21 checks passed
@Yikun Yikun added this to the v0.9.1 milestone Jun 23, 2025
shiyuan680 pushed a commit to raindaywhu/vllm-ascend that referenced this pull request Jul 7, 2025
Pass `expert_scale` and `expand_scale` args to the dispatch and combine
functions.

Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>

4 participants