
add qwen3-moe operation #1709


Closed
shiyuan680 wants to merge 1 commit from the qwen3_update branch

Conversation

@shiyuan680 shiyuan680 commented Jul 9, 2025

What this PR does / why we need it?

The original qwen3_moe implementation was missing the all-to-all operation, which led to incorrect results. This PR reuses some of the optimizations from the DeepSeek implementation.
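
For context, here is a minimal sketch of the expert-parallel all-to-all token dispatch the description refers to. It is illustrative only: the function name dispatch_tokens, the ep_group argument, and the equal-split exchange are assumptions, not the vllm-ascend implementation.

import torch
import torch.distributed as dist
from typing import Optional


# Illustrative sketch (assumed names): move each token toward the rank that owns
# its selected expert. A real MoE dispatch computes per-rank split sizes from
# topk_ids; the equal-split form is shown here for brevity.
def dispatch_tokens(hidden_states: torch.Tensor,
                    topk_ids: torch.Tensor,
                    ep_group: Optional[dist.ProcessGroup] = None) -> torch.Tensor:
    # hidden_states: [num_tokens, hidden_size]
    # topk_ids:      [num_tokens, top_k] expert indices chosen by the router
    if ep_group is None or not dist.is_initialized():
        # Single-rank fallback: all experts are local, nothing to exchange.
        return hidden_states
    # Exchange equal-sized chunks of tokens across the expert-parallel group.
    output = torch.empty_like(hidden_states)
    dist.all_to_all_single(output, hidden_states, group=ep_group)
    return output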

Does this PR introduce any user-facing change?

No user-facing change.

How was this patch tested?

e2e tests and unit tests.

@shiyuan680 shiyuan680 force-pushed the qwen3_update branch 5 times, most recently from aa65366 to a63e381 Compare July 12, 2025 03:30
Signed-off-by: yangcheng (AJ) <y00806874@china.huawei.com>
@shiyuan680 shiyuan680 changed the title add moe operation add qwen3-moe operation Jul 12, 2025
@@ -33,3 +49,85 @@ class CustomQwen3MoeForCausalLM(Qwen3MoeForCausalLM):
"experts":
["experts.0.gate_proj", "experts.0.up_proj", "experts.0.down_proj"],
}


class AscendQwen3MoeSparseMoeBlock(nn.Module):
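
Below is a hedged, plain-PyTorch sketch of what a Qwen3-style sparse MoE block like the AscendQwen3MoeSparseMoeBlock shown above computes, illustrating the gate_proj / up_proj / down_proj expert layout named in the mapping. The class name SimpleSparseMoeBlock and its per-expert loop are assumptions for illustration; they do not reflect the fused Ascend kernels added in this PR.

import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleSparseMoeBlock(nn.Module):
    def __init__(self, hidden_size: int, moe_intermediate_size: int,
                 num_experts: int, top_k: int):
        super().__init__()
        self.top_k = top_k
        # Router scoring each token against every expert.
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)
        # Each expert is a gated MLP, mirroring gate_proj / up_proj / down_proj.
        self.experts = nn.ModuleList(
            nn.ModuleDict({
                "gate_proj": nn.Linear(hidden_size, moe_intermediate_size, bias=False),
                "up_proj": nn.Linear(hidden_size, moe_intermediate_size, bias=False),
                "down_proj": nn.Linear(moe_intermediate_size, hidden_size, bias=False),
            }) for _ in range(num_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [num_tokens, hidden_size]
        weights, topk_ids = torch.topk(
            F.softmax(self.gate(x), dim=-1), self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        # Dense loop over experts for clarity; production code fuses this.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = topk_ids[:, k] == e
                if mask.any():
                    h = x[mask]
                    h = expert["down_proj"](
                        F.silu(expert["gate_proj"](h)) * expert["up_proj"](h))
                    out[mask] += weights[mask, k].unsqueeze(-1) * h
        return out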
A collaborator commented:

What is the issue that this PR aims to address?

@shiyuan680 shiyuan680 closed this Jul 22, 2025