
Conversation

jianzs
Collaborator

@jianzs jianzs commented Jun 13, 2025

Pass expert_scale and expand_scale args to the dispatch and combine functions.

BTW, I think the current code for MC2 only works when graph mode is enabled.

Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
@jianzs jianzs added the ready label Jun 13, 2025

@Copilot Copilot AI left a comment


Pull Request Overview

This PR ensures that the newly computed scaling factors are correctly passed into the MC2 dispatch and combine routines for dynamic expert routing.

  • Pass expert_scales into the initial npu_moe_distribute_dispatch call
  • Unpack and capture expand_scales from the dispatch output
  • Provide expand_scales to the subsequent npu_moe_combine invocation
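The data flow these three changes establish can be sketched with hypothetical stand-ins for the NPU kernels (the real `npu_moe_distribute_dispatch`/`npu_moe_combine` ops live in torch_npu and take many more arguments; this only illustrates how the scale tensors thread through):

```python
# Hypothetical stand-ins, NOT the real torch_npu kernels: they only
# show expert_scales flowing into dispatch and the resulting
# expand_scales flowing into combine.
def dispatch_stub(hidden_states, expert_scales):
    # Dispatch routes tokens to experts and now also returns the
    # per-token expand_scales derived from expert_scales.
    expand_scales = [float(s) for s in expert_scales]
    return hidden_states, expand_scales

def combine_stub(expert_out, expand_scales):
    # Combine weighs each expert output by its expand_scale.
    return [x * s for x, s in zip(expert_out, expand_scales)]

tokens = [1.0, 2.0, 3.0]
scales = [0.5, 0.25, 1.0]
routed, expand_scales = dispatch_stub(tokens, scales)
combined = combine_stub(routed, expand_scales)  # → [0.5, 0.5, 3.0]
```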
Comments suppressed due to low confidence (4)

vllm_ascend/quantization/w8a8_dynamic.py:132

  • The function’s docstring should be updated to document the new expert_scales and expand_scales parameters so their roles are clear to future maintainers.
def fused_experts_with_mc2(

vllm_ascend/quantization/w8a8_dynamic.py:164

  • [nitpick] Unpacking with a placeholder _ and a magic slice range can be hard to follow—consider naming the sixth element or adding a comment to clarify what it represents.
expand_x, dynamic_scale, expand_idx, expert_token_nums, ep_recv_counts, _, expand_scales = output[0:7]
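One way to address this nitpick, sketched with a hypothetical name for the sixth element (its actual meaning is not documented in this diff, so `_tp_recv_counts` is only a guess):

```python
# Give every unpacked element a name so the slice is self-documenting.
# `_tp_recv_counts` is a hypothetical name for the unused sixth element.
output = tuple(range(8))  # stand-in for the dispatch result tuple
(expand_x, dynamic_scale, expand_idx, expert_token_nums,
 ep_recv_counts, _tp_recv_counts, expand_scales) = output[0:7]
```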

vllm_ascend/quantization/w8a8_dynamic.py:135

  • There should be unit or integration tests verifying that expert_scales and expand_scales are correctly applied in the MC2 routines to prevent regressions.
"expert_scales": topk_weights.to(torch.float32),

vllm_ascend/quantization/w8a8_dynamic.py:135

  • [nitpick] Calling .to(torch.float32) inside the hot path may incur extra device transfers; consider precomputing or caching the float32 conversion if this function is called repeatedly.
"expert_scales": topk_weights.to(torch.float32),

@jianzs jianzs merged commit afc8edb into vllm-project:main Jun 17, 2025
21 checks passed
@Yikun Yikun added this to the v0.9.1 milestone Jun 23, 2025
shiyuan680 pushed a commit to raindaywhu/vllm-ascend that referenced this pull request Jul 7, 2025
Pass `expert_scale` and `expand_scale` args to the dispatch and combine
functions.

Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>

4 participants