Conversation

@Angazenn (Collaborator) commented Oct 13, 2025

What this PR does / why we need it?

This is an extension of #3411. It adds a hardware-specific restriction for MC2. Besides, I also limit the scenarios where this check takes effect to MoE models that might use MC2.

Does this PR introduce any user-facing change?

For situations where MC2 does not support an excessive number of input tokens, vLLM-Ascend now raises an error earlier.

How was this patch tested?

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing, smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description, to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@gemini-code-assist bot left a comment


Code Review

This pull request introduces a valuable hardware-specific check for the MC2 communication method in MoE models, preventing potential runtime issues by enforcing token limits on different Ascend hardware versions. The logic for calculating token capacity and raising an informative error is sound. The change to restrict a related profiling run to only MoE models is also appropriate. I have one suggestion to improve the maintainability of the hardware limit implementation.

Comment on lines 480 to 484
limit = None
if soc_version in {AscendSocVersion.A3}:
    limit = 512
elif soc_version in {AscendSocVersion.A2}:
    limit = 256
Contributor

Severity: high

The use of an if/elif chain to determine the token limit for different SoC versions is functional but can be improved for better maintainability and extensibility. As more hardware variants are supported in the future, this chain will grow, increasing complexity and the chance of errors. A dictionary mapping SoC versions to their limits would be a cleaner and more scalable approach.

Suggested change
-limit = None
-if soc_version in {AscendSocVersion.A3}:
-    limit = 512
-elif soc_version in {AscendSocVersion.A2}:
-    limit = 256
+limit_map = {
+    AscendSocVersion.A3: 512,
+    AscendSocVersion.A2: 256,
+}
+limit = limit_map.get(soc_version)
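
For illustration, folding the suggested mapping into the surrounding check might read as follows (a sketch: the names come from the diff context, while the error message wording is an assumption):

limit_map = {
    AscendSocVersion.A3: 512,
    AscendSocVersion.A2: 256,
}
# SoCs without an entry impose no MC2 token limit.
limit = limit_map.get(soc_version)

if limit is not None and num_tokens_per_tp_rank > limit:
    raise ValueError(
        f"MC2 does not support more than {limit} tokens per TP rank "
        f"on this SoC, but got {num_tokens_per_tp_rank}. Consider "
        "lowering max_num_seqs or max_num_batched_tokens.")

With this layout, supporting a new SoC is a one-line addition to the dictionary rather than a new elif branch.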

limit = 256

if limit is not None and num_tokens_per_tp_rank > limit:
    raise ValueError(
Collaborator

Perhaps in this scenario it would be better to fall back to another MoE method that supports the current token size (such as all2all) rather than just reporting an error? It seems weird to force users to use a smaller max_num_seqs due to a kernel restriction.

@Angazenn (Collaborator, Author) commented Oct 14, 2025

It seems we could make a better selection here. The problem is that aclgraph mode requires a unified communication schema, so we need to ensure that each communication op in a model stays unchanged across different capture sizes. cc @yiz-liu
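
A minimal sketch of the fallback idea discussed above (enum members other than MoECommType.MC2 are assumptions, and this deliberately ignores the aclgraph constraint Angazenn describes):

def select_comm_method_with_fallback(num_tokens_per_tp_rank, limit):
    # Prefer MC2 when the kernel supports this token count;
    # otherwise fall back to a method without the hardware limit
    # instead of raising an error.
    if limit is None or num_tokens_per_tp_rank <= limit:
        return MoECommType.MC2
    return MoECommType.ALLTOALL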

@github-actions

This pull request has conflicts, please resolve those before we can evaluate the pull request.

        self.mc2_tokens_capacity,
        with_prefill=True) == MoECommType.MC2:
    self._dummy_run(self.mc2_tokens_capacity, with_prefill=True)
if self._is_moe_model():
Collaborator

Maybe replace this with model_config.get_num_experts().
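
Applied to the guard above, and assuming vLLM's model config exposes get_num_experts() as suggested (its existence and return type are assumptions here), the change would look like:

# Treat the model as MoE whenever it declares at least one expert.
if self.model_config.get_num_experts() > 0:
    self._dummy_run(self.mc2_tokens_capacity, with_prefill=True)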

        kv_connector_output=kv_connector_output,
    )

def _is_moe_model(self):
Collaborator

Use vllm_ascend.utils.is_moe_model instead.
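
With the shared helper (a sketch; the exact signature of the utility is an assumption), the per-runner method becomes unnecessary:

from vllm_ascend.utils import is_moe_model

# Reuse the shared utility instead of a model-runner-local helper.
if is_moe_model(self.vllm_config):
    self._initialize_mc2()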

# Since not all models have MoE modules and require MC2, we defer
# the initialization of MC2-related parameters until later.
self.mc2_tokens_capacity = 0
self.reserved_mc2_mask = None
Collaborator

nice change

if self._is_moe_model():
    self._initialize_mc2()
if self.max_num_tokens > self.mc2_tokens_capacity and \
        self._select_moe_comm_method(
Collaborator

_select_moe_comm_method has already been called in _initialize_mc2 and is called again in _dummy_run. You should consider merging them into one.
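
One way to merge them (a sketch; the cached attribute name _moe_comm_type is made up) is to select the communication method once and reuse the result:

# Select once during MC2 initialization and cache the result so that
# neither this check nor _dummy_run repeats the selection.
self._moe_comm_type = self._select_moe_comm_method(
    self.mc2_tokens_capacity, with_prefill=True)
if self.max_num_tokens > self.mc2_tokens_capacity and \
        self._moe_comm_type == MoECommType.MC2:
    self._dummy_run(self.mc2_tokens_capacity, with_prefill=True)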

@github-actions

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@wangxiyuan added the ready (read for review) and ready-for-test (start test by label for PR) labels on Oct 21, 2025
Comment on lines 530 to 533

-    def _init_mc2_tokens_capacity(self):
+    def _init_mc2(self):
         """Initialize MC2-related parameters and verify their validity."""

Contributor

Need to change _init_mc2_tokens_capacity in torchair_model_runner.py as well.

@Angazenn (Collaborator, Author)

Sure, I will revert this change.

@github-actions

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Signed-off-by: Angazenn <supperccell@163.com>
@github-actions

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@github-actions bot removed the ready (read for review) label on Oct 30, 2025