[BugFix]Revise the EPLB feature guide content and optimize the EPLB algorithm. #3080
Conversation
Signed-off-by: offline0806 <z00858301@china.huawei.com>
# Conflicts:
#   vllm_ascend/worker/model_runner_v1.py
ec when using global_redundant_expert_num. Signed-off-by: offline0806 <z00858301@china.huawei.com>
# Conflicts:
#   docs/source/user_guide/configuration/additional_config.md
#   docs/source/user_guide/feature_guide/eplb_swift_balancer.md
#   vllm_ascend/eplb/core/eplb_utils.py
#   vllm_ascend/eplb/core/eplb_worker.py
#   vllm_ascend/ops/common_fused_moe.py
#   vllm_ascend/quantization/w8a8_dynamic.py
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request introduces several improvements to the EPLB (Expert Parallel Load Balancing) feature. It updates the documentation with new configuration parameters and provides clearer examples. A key change is the optimization of the generate_log2phy_map function, which is now vectorized for better performance. A correctness fix is also included in the EPLB worker. My review identifies a critical issue in one of the new documentation examples where the provided JSON is invalid, which could lead to user errors.
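To illustrate what "vectorizing generate_log2phy_map" can look like, here is a minimal NumPy sketch. The function name comes from this PR, but the array layout and the fallback policy below are assumptions for illustration only: `expert_map` is taken to be a `(num_ranks, num_logical_experts)` array holding each rank's local slot for a logical expert (or -1 if that rank does not host it), slot ids are assumed dense per rank, and ranks missing an expert borrow the copy from the first rank that hosts it.

```python
import numpy as np

def generate_log2phy_map(expert_map: np.ndarray) -> np.ndarray:
    """Hypothetical vectorized sketch (not the PR's actual implementation).

    expert_map: (num_ranks, num_logical) int array; entry >= 0 is the local
    physical slot hosting that logical expert on that rank, -1 if absent.
    Returns a (num_ranks, num_logical) array of global physical expert ids.
    Assumes every logical expert is hosted by at least one rank.
    """
    num_ranks, num_logical = expert_map.shape
    slots_per_rank = expert_map.max() + 1          # assumes dense slot ids
    present = expert_map >= 0                      # boolean hosting mask
    # Global physical id = rank * slots_per_rank + local slot
    # (entries where present is False are garbage and get replaced below).
    global_phys = expert_map + np.arange(num_ranks)[:, None] * slots_per_rank
    # For each logical expert, pick the first rank that hosts it as fallback.
    first_rank = np.argmax(present, axis=0)        # (num_logical,)
    fallback = global_phys[first_rank, np.arange(num_logical)]
    return np.where(present, global_phys, fallback[None, :])

# Two ranks, three logical experts, two slots per rank:
m = np.array([[0, 1, -1],
              [-1, 0, 1]])
print(generate_log2phy_map(m))  # [[0 1 3] [0 2 3]]
```

Replacing the per-expert Python loop with masked array operations like these is what typically makes such a mapping scale with the expert count.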
vllm serve Qwen/Qwen3-235B-A22 \
    --tensor-parallel-size 16 \
    --enable-expert-parallel \
    --additional-config '{ "expert_map_record_path": "/path/to/eplb.json", "init_redundancy_expert": 16, dynamic_eplb":true,"num_iterations_eplb_update":400, "gate_eplb":true, "num_wait_worker_iterations":30}'
The JSON string in this example is invalid: there is a missing double quote `"` before `dynamic_eplb`. This will cause a parsing error for users who copy and paste this command.
- --additional-config '{ "expert_map_record_path": "/path/to/eplb.json", "init_redundancy_expert": 16, dynamic_eplb":true,"num_iterations_eplb_update":400, "gate_eplb":true, "num_wait_worker_iterations":30}'
+ --additional-config '{ "expert_map_record_path": "/path/to/eplb.json", "init_redundancy_expert": 16, "dynamic_eplb":true,"num_iterations_eplb_update":400, "gate_eplb":true, "num_wait_worker_iterations":30}'
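A quick way to catch this class of error before shipping a docs example is to round-trip the `--additional-config` payload through a JSON parser. The snippet below checks the corrected string from the suggestion above (the key names are taken from this PR; the check itself is just standard-library `json`):

```python
import json

# Corrected payload from the suggested change (note the quote before dynamic_eplb).
payload = ('{ "expert_map_record_path": "/path/to/eplb.json", '
           '"init_redundancy_expert": 16, "dynamic_eplb":true,'
           '"num_iterations_eplb_update":400, "gate_eplb":true, '
           '"num_wait_worker_iterations":30}')

cfg = json.loads(payload)  # would raise json.JSONDecodeError on the broken variant
print(cfg["dynamic_eplb"], cfg["num_iterations_eplb_update"])  # True 400
```

Running `json.loads` on the original (unquoted `dynamic_eplb`) string raises `json.JSONDecodeError`, which is exactly the failure a user copying the broken command would hit.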
What this PR does / why we need it?
1. Revise the EPLB feature guide content and add the EPLB parameters to the Ascend config.
2. Optimize the EPLB algorithm.
Does this PR introduce any user-facing change?
How was this patch tested?
We ran vLLM online serving with quantized Qwen3-235B.
vLLM version: v0.10.2
vLLM main: vllm-project/vllm@c60e613