Conversation

@yifeilin1015 commented Sep 19, 2025

### What this PR does / why we need it?
Support w8a8_dynamic quantization for the Step3 model.

### Does this PR introduce any user-facing change?
No user-facing change.

### How was this patch tested?
This patch was tested with the following datasets:

  • MMMU: 73.11
  • MMSTAR: 70.68
  • AIME25: 80.00
  • GPQA-diamond: 72.73

### Modified points:

  • add a step3_text.py file with a new load_weights method
  • compared to the original method, load_weights only changes how MoE expert weights are loaded (a rough sketch of the idea is given below)
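
For illustration, here is a minimal sketch of the remapping idea behind the new load_weights. This is a hypothetical helper, not the PR's actual code in step3_text.py; the regex and the w1/w3/w2 shard labels assume vLLM's usual FusedMoE naming, where gate_proj and up_proj load into the fused w13 weight and down_proj into w2:

    import re

    # Hypothetical helper (not the PR's actual code): decide where a
    # per-expert checkpoint weight should be loaded.
    _EXPERT_RE = re.compile(r"experts\.(\d+)\.(gate_proj|up_proj|down_proj)")

    # Assumed FusedMoE shard convention: w1 = gate_proj, w3 = up_proj, w2 = down_proj.
    _SHARD = {"gate_proj": "w1", "up_proj": "w3", "down_proj": "w2"}

    def classify_expert_weight(name: str):
        """Return (expert_id, shard_id) for an MoE expert weight, else None."""
        m = _EXPERT_RE.search(name)
        if m is None:
            return None  # not an expert weight; the default loading path applies
        return int(m.group(1)), _SHARD[m.group(2)]

    # Example:
    # classify_expert_weight("model.layers.0.moe.experts.3.up_proj.weight") -> (3, "w3")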

vLLM version: v0.10.2

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@gemini-code-assist bot left a comment

Code Review

This pull request adds support for the Step3 model with W8A8 dynamic quantization. The changes include registering the new model and implementing a custom load_weights method to handle the specific weight structure of MoE experts in this model. My main feedback is to avoid hardcoding the number of experts to improve the model's reusability.

Comment on lines +10 to +31
experts_ = [f"experts.{i}.{proj}" for i in range(48) for proj in ("down_proj", "gate_proj", "up_proj")]

packed_modules_mapping = {
"qkv_proj": [
"q_proj",
"k_proj",
"v_proj",
],
"gate_up_proj":[
"gate_proj",
"up_proj",
],
"experts": experts_
}

def __init__(
self,
*,
vllm_config: VllmConfig,
prefix: str = "",
):
super().__init__(vllm_config=vllm_config, prefix="model")

Severity: high

The number of experts is hardcoded to 48, which limits this implementation to a specific model configuration. To support Step3Text models with a varying number of experts, experts_ and packed_modules_mapping should be initialized as instance attributes within __init__. This allows dynamically setting the number of experts from the model configuration (self.model.config.moe_num_experts).

    def __init__(
        self,
        *,
        vllm_config: VllmConfig,
        prefix: str = "",
    ):
        super().__init__(vllm_config=vllm_config, prefix="model")
        num_experts = self.model.config.moe_num_experts
        self.experts_ = [f"experts.{i}.{proj}" for i in range(num_experts) for proj in ("down_proj", "gate_proj", "up_proj")]
        self.packed_modules_mapping = {
            "qkv_proj": [
                "q_proj",
                "k_proj",
                "v_proj",
            ],
            "gate_up_proj": [
                "gate_proj",
                "up_proj",
            ],
            "experts": self.experts_,
        }
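
As a quick sanity check of the dynamic mapping (hypothetical snippet, assuming a vllm_config built elsewhere; attribute names follow the suggestion above):

    # Each expert contributes three projections (gate_proj, up_proj, down_proj),
    # so the packed "experts" entry should hold 3 * moe_num_experts names.
    model = CustomStep3TextForCausalLM(vllm_config=vllm_config)
    num_experts = model.model.config.moe_num_experts
    assert len(model.packed_modules_mapping["experts"]) == 3 * num_experts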

    class CustomStep3TextForCausalLM(Step3TextForCausalLM):
        experts_ = [f"experts.{i}.{proj}" for i in range(48) for proj in ("down_proj", "gate_proj", "up_proj")]

        packed_modules_mapping = {