
Conversation

shen-shanshan
Collaborator

@shen-shanshan shen-shanshan commented Sep 24, 2025

What this PR does / why we need it?

Fix multistream_overlap_shared_expert test.

Main changes:

  1. Use a model with shared experts for the test (see the sketch below).
  2. Enable expert parallelism (EP).
  3. Move this test to the multicard directory.
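
A rough sketch of the resulting test setup (a minimal illustration only, assuming the VllmRunner test helper from the vllm-ascend e2e suite; the import path and model name below are placeholders, while data_parallel_size=2 and enable_expert_parallel=True match the reviewed diff):

# Placeholder import path; the real helper lives in the vllm-ascend test suite.
from tests.e2e.conftest import VllmRunner

# Placeholder model name: the actual test should use a model that really has shared experts.
MODEL_WITH_SHARED_EXPERTS = "some-org/moe-model-with-shared-experts"

def test_multistream_overlap_shared_expert():
    with VllmRunner(
            MODEL_WITH_SHARED_EXPERTS,
            max_model_len=1024,
            enforce_eager=True,
            data_parallel_size=2,
            enable_expert_parallel=True,
            additional_config={
                "multistream_overlap_shared_expert": True,
            },
    ) as runner:
        # Generate outputs with multistream overlap enabled and compare
        # them against a baseline run without the flag.
        ...

Because the test now uses two ranks for data/expert parallelism, it lives in the multicard directory, presumably so CI schedules it on a multi-card runner.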

Does this PR introduce any user-facing change?

How was this patch tested?

Signed-off-by: shen-shanshan <467638484@qq.com>

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing, smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@shen-shanshan shen-shanshan changed the title [CI] Fix multistream_overlap_shared_expert test [CI] Minor fix to multistream_overlap_shared_expert test Sep 24, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly fixes the multistream_overlap_shared_expert test. The changes involve using a more appropriate model with shared experts, enabling expert parallelism, and setting up the test for a multi-card environment. These modifications are well-aligned with the PR's objective. I have one suggestion to improve the test's maintainability by reducing code duplication, which will make it more robust against future changes.

Comment on lines +53 to +54
data_parallel_size=2,
enable_expert_parallel=True,
Contributor


Severity: high

These parameters are duplicated across the three VllmRunner instantiations in this test (see also lines 66-67 and 79-80). This code duplication makes the test harder to maintain and more fragile. If these parameters need to be updated in the future, it's easy to forget to update all instances, which could lead to incorrect test behavior and mask real bugs.

To improve robustness, I recommend refactoring to remove the duplication. You can define a dictionary with the common parameters and reuse it for each VllmRunner call.

Example:

common_args = {
    "model": model,
    "max_model_len": 1024,
    "data_parallel_size": 2,
    "enable_expert_parallel": True,
}

with VllmRunner(
    **common_args,
    enforce_eager=True,
    additional_config={
        "multistream_overlap_shared_expert": True,
    },
) as runner:
    # ...

with VllmRunner(
    **common_args,
    enforce_eager=False,
    additional_config={
        "multistream_overlap_shared_expert": True,
    },
) as runner:
    # ...

with VllmRunner(**common_args, enforce_eager=True) as runner:
    # ...

model,
max_model_len=1024,
enforce_eager=True,
data_parallel_size=2,
Collaborator


Does dp work in this case?

