[BugFix] Add additional check for mc2 on different hardware. #3429
base: main
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request introduces a valuable hardware-specific check for the MC2 communication method in MoE models, preventing potential runtime issues by enforcing token limits on different Ascend hardware versions. The logic for calculating token capacity and raising an informative error is sound. The change to restrict a related profiling run to only MoE models is also appropriate. I have one suggestion to improve the maintainability of the hardware limit implementation.
limit = None
if soc_version in {AscendSocVersion.A3}:
    limit = 512
elif soc_version in {AscendSocVersion.A2}:
    limit = 256
The use of an if/elif chain to determine the token limit for different SoC versions is functional but can be improved for better maintainability and extensibility. As more hardware variants are supported in the future, this chain will grow, increasing complexity and the chance of errors. A dictionary mapping SoC versions to their limits would be a cleaner and more scalable approach.
Suggested change:
- limit = None
- if soc_version in {AscendSocVersion.A3}:
-     limit = 512
- elif soc_version in {AscendSocVersion.A2}:
-     limit = 256
+ limit_map = {
+     AscendSocVersion.A3: 512,
+     AscendSocVersion.A2: 256,
+ }
+ limit = limit_map.get(soc_version)
    limit = 256

if limit is not None and num_tokens_per_tp_rank > limit:
    raise ValueError(
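For context, here is a minimal sketch of how the hardware-specific check could look once the dictionary suggestion is applied. The names soc_version, num_tokens_per_tp_rank, AscendSocVersion, and the import path are taken from the diff above; the error message and the helper name check_mc2_token_limit are illustrative, not the PR's actual code.

from vllm_ascend.utils import AscendSocVersion  # import path assumed for illustration

# Per-SoC upper bound on tokens per TP rank that the MC2 kernel accepts.
MC2_TOKEN_LIMITS = {
    AscendSocVersion.A3: 512,
    AscendSocVersion.A2: 256,
}

def check_mc2_token_limit(soc_version, num_tokens_per_tp_rank):
    """Fail fast when the MC2 kernel cannot handle this many tokens per TP rank."""
    limit = MC2_TOKEN_LIMITS.get(soc_version)  # None means no known restriction
    if limit is not None and num_tokens_per_tp_rank > limit:
        raise ValueError(
            f"MC2 supports at most {limit} tokens per TP rank on {soc_version}, "
            f"but got {num_tokens_per_tp_rank}; reduce max_num_seqs or "
            f"max_num_batched_tokens, or choose another MoE communication method.")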
Perhaps in this scenario it would be better to fall back to other MoE communication methods that support the current token size (such as all2all) rather than just reporting an error? I think it is weird to force users to use a smaller max_num_seqs because of a kernel restriction.
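A rough sketch of this fallback idea, reusing the MC2_TOKEN_LIMITS mapping from the sketch above and assuming the MoECommType enum has MC2 and ALLTOALL members (the member name ALLTOALL and the selection logic are assumptions, not what the PR implements):

def select_moe_comm_method(soc_version, num_tokens_per_tp_rank):
    """Prefer MC2, but fall back to all-to-all when the token count exceeds
    the per-SoC MC2 limit, instead of raising an error."""
    limit = MC2_TOKEN_LIMITS.get(soc_version)
    if limit is None or num_tokens_per_tp_rank <= limit:
        return MoECommType.MC2
    # MC2 cannot handle this many tokens on this SoC; all-to-all has no such
    # restriction, so switch methods instead of forcing a smaller max_num_seqs.
    return MoECommType.ALLTOALL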
It seems we can make a better selection here. But the problem is that aclgraph mode requires a unified communication schema, so we need to ensure that each communication op in a model stays unchanged across different capture sizes. cc @yiz-liu
This pull request has conflicts, please resolve those before we can evaluate the pull request.
        self.mc2_tokens_capacity,
        with_prefill=True) == MoECommType.MC2:
    self._dummy_run(self.mc2_tokens_capacity, with_prefill=True)
if self._is_moe_model():
You may replace this with model_config.get_num_experts().
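If model_config.get_num_experts() is available as suggested (I have not verified that this accessor exists in the vLLM version the PR targets), the check could look roughly like this; the defensive getattr fallback is my own addition:

def _is_moe_model(self):
    # Hypothetical: relies on the reviewer-suggested accessor. If this vLLM
    # version does not expose get_num_experts(), keep the existing check.
    get_num_experts = getattr(self.model_config, "get_num_experts", None)
    if get_num_experts is None:
        return False
    num_experts = get_num_experts()
    return num_experts is not None and num_experts > 0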
            kv_connector_output=kv_connector_output,
        )

    def _is_moe_model(self):
Use vllm_ascend.utils.is_moe_model instead.
# Since not all models have MoE modules and require MC2, we defer
# the initialization of MC2-related parameters until later.
self.mc2_tokens_capacity = 0
self.reserved_mc2_mask = None
nice change
if self._is_moe_model():
    self._initialize_mc2()
if self.max_num_tokens > self.mc2_tokens_capacity and \
        self._select_moe_comm_method(
_select_moe_comm_method has already been called in _initialize_mc2 and is called again in _dummy_run. You should consider merging them into one.
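One possible shape for that merge, reusing the method names visible in the diff (_initialize_mc2, _select_moe_comm_method, _dummy_run); _compute_mc2_tokens_capacity and the cached moe_comm_method attribute are hypothetical, and the real signatures in the PR may differ:

def _initialize_mc2(self):
    """Set up MC2 parameters and pick the comm method once, so callers such as
    _dummy_run can reuse the cached decision instead of re-selecting it."""
    self.mc2_tokens_capacity = self._compute_mc2_tokens_capacity()  # hypothetical helper
    self.moe_comm_method = self._select_moe_comm_method(
        self.mc2_tokens_capacity, with_prefill=True)
    if self.moe_comm_method == MoECommType.MC2:
        self._dummy_run(self.mc2_tokens_capacity, with_prefill=True)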
This pull request has conflicts, please resolve those before we can evaluate the pull request.
- def _init_mc2_tokens_capacity(self):
+ def _init_mc2(self):
+     """Initialize MC2-related parameters and verify their validity."""
Need to change _init_mc2_tokens_capacity in torchair_model_runner.py as well.
Sure, I will revert this change.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Signed-off-by: Angazenn <supperccell@163.com>
Signed-off-by: Angazenn <supperccell@163.com>
Signed-off-by: Angazenn <supperccell@163.com>
This pull request has conflicts, please resolve those before we can evaluate the pull request.
What this PR does / why we need it?
This is an extension of #3411. A hardware-specific restriction for MC2 is added. In addition, I limit the scenarios where this check takes effect to MoE models that might use MC2.
Does this PR introduce any user-facing change?
For situations where MC2 does not support an excessive number of input tokens, we now throw an error earlier in vLLM-Ascend.
How was this patch tested?