[CI] Minor fix to multistream_overlap_shared_expert test #3157
base: main
Conversation
Signed-off-by: shen-shanshan <467638484@qq.com>
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request correctly fixes the multistream_overlap_shared_expert test. The changes involve using a more appropriate model with shared experts, enabling expert parallelism, and setting up the test for a multi-card environment. These modifications are well aligned with the PR's objective. I have one suggestion to improve the test's maintainability by reducing code duplication, which will make it more robust against future changes.
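For context, here is a minimal sketch of the test shape this review describes. It is illustrative only: the import path, model name, prompt, and the generate_greedy helper are assumptions, not the exact code in this PR.

```python
# Illustrative sketch only -- the import path, model name, and helper names
# below are assumptions; the real test in this PR may differ.
from tests.e2e.conftest import VllmRunner  # hypothetical import path


def test_multistream_overlap_shared_expert():
    # A MoE model with shared experts is assumed here.
    model = "deepseek-ai/DeepSeek-V2-Lite"
    prompts = ["Hello, my name is"]
    with VllmRunner(
            model,
            max_model_len=1024,
            enforce_eager=True,
            data_parallel_size=2,         # multi-card setup
            enable_expert_parallel=True,  # expert parallelism
            additional_config={
                "multistream_overlap_shared_expert": True,
            },
    ) as runner:
        # generate_greedy is an assumed helper name for running inference.
        outputs = runner.generate_greedy(prompts, max_tokens=16)
    assert outputs
```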
data_parallel_size=2,
enable_expert_parallel=True,
These parameters are duplicated across the three VllmRunner instantiations in this test (see also lines 66-67 and 79-80). This code duplication makes the test harder to maintain and more fragile. If these parameters need to be updated in the future, it's easy to forget to update all instances, which could lead to incorrect test behavior and mask real bugs.
To improve robustness, I recommend refactoring to remove the duplication. You can define a dictionary with the common parameters and reuse it for each VllmRunner call.
Example:

```python
common_args = {
    "model": model,
    "max_model_len": 1024,
    "data_parallel_size": 2,
    "enable_expert_parallel": True,
}

with VllmRunner(
    **common_args,
    enforce_eager=True,
    additional_config={
        "multistream_overlap_shared_expert": True,
    },
) as runner:
    ...  # run generation with multistream overlap, eager mode

with VllmRunner(
    **common_args,
    enforce_eager=False,
    additional_config={
        "multistream_overlap_shared_expert": True,
    },
) as runner:
    ...  # run generation with multistream overlap, graph mode

with VllmRunner(**common_args, enforce_eager=True) as runner:
    ...  # baseline run without the overlap flag
```
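As a design note, the same de-duplication could also be expressed with functools.partial instead of a dictionary of keyword arguments. A sketch under the same assumptions as above (VllmRunner and model are assumed names, not code from this PR):

```python
from functools import partial

# Bind the shared arguments once; each call site then only spells out what differs.
# VllmRunner and model are the assumed names from the example above.
make_runner = partial(
    VllmRunner,
    model,
    max_model_len=1024,
    data_parallel_size=2,
    enable_expert_parallel=True,
)

with make_runner(
    enforce_eager=True,
    additional_config={"multistream_overlap_shared_expert": True},
) as runner:
    ...  # run generation with multistream overlap enabled

with make_runner(enforce_eager=True) as runner:
    ...  # baseline run without the overlap flag
```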
model,
max_model_len=1024,
enforce_eager=True,
data_parallel_size=2,
Does DP work in this case?
What this PR does / why we need it?
Fix multistream_overlap_shared_expert test.
Main changes:
Does this PR introduce any user-facing change?
How was this patch tested?