
Conversation

offline893
Contributor

@offline893 offline893 commented Sep 22, 2025

What this PR does / why we need it?

1. Revise the EPLB feature guide content and add the EPLB parameters to the Ascend config.
2. Optimize the EPLB algorithm.

Does this PR introduce any user-facing change?

How was this patch tested?

We ran vLLM online serving with a quantized Qwen3-235B model.
vLLM version: v0.10.2
vLLM main: vllm-project/vllm@c60e613

offline0806 added 30 commits September 16, 2025 14:46
Signed-off-by: offline0806 <z00858301@china.huawei.com>
# Conflicts:
#	vllm_ascend/worker/model_runner_v1.py
ec when using global_redundant_expert_num.
mercykid and others added 5 commits September 19, 2025 10:01
# Conflicts:
#	docs/source/user_guide/configuration/additional_config.md
#	docs/source/user_guide/feature_guide/eplb_swift_balancer.md
#	vllm_ascend/eplb/core/eplb_utils.py
#	vllm_ascend/eplb/core/eplb_worker.py
#	vllm_ascend/ops/common_fused_moe.py
#	vllm_ascend/quantization/w8a8_dynamic.py

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@github-actions github-actions bot added documentation Improvements or additions to documentation module:ops module:quantization labels Sep 22, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces several improvements to the EPLB (Expert Parallel Load Balancing) feature. It updates the documentation with new configuration parameters and provides clearer examples. A key change is the optimization of the generate_log2phy_map function, which is now vectorized for better performance. A correctness fix is also included in the EPLB worker. My review identifies a critical issue in one of the new documentation examples where the provided JSON is invalid, which could lead to user errors.
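For context on the vectorization the review mentions, a logical-to-physical expert map of this kind can be built without Python loops roughly as follows. This is a minimal sketch, not the actual `generate_log2phy_map` from vllm_ascend; the function name, array shapes, and the -1 "not hosted here" convention are assumptions for illustration:

```python
import numpy as np

def generate_log2phy_map_sketch(expert_map: np.ndarray) -> np.ndarray:
    """Vectorized sketch of a logical-to-physical expert map.

    expert_map has shape (num_ranks, num_logical_experts) and holds the
    local physical slot of each logical expert on each rank, or -1 where
    that rank does not host the expert.
    """
    num_ranks, num_experts = expert_map.shape
    # Offset each rank's local slots so every physical slot gets a
    # globally unique id; broadcasting replaces the per-rank loop.
    row_offset = (np.arange(num_ranks) * num_experts)[:, None]
    return np.where(expert_map != -1, expert_map + row_offset, -1)
```

The key idea is that `np.where` plus a broadcast offset turns an O(num_ranks × num_experts) Python loop into a handful of array operations.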

vllm serve Qwen/Qwen3-235B-A22 \
--tensor-parallel-size 16 \
--enable-expert-parallel \
--additional-config '{ "expert_map_record_path": "/path/to/eplb.json", "init_redundancy_expert": 16, dynamic_eplb":true,"num_iterations_eplb_update":400, "gate_eplb":true, "num_wait_worker_iterations":30}'
Contributor


high

The JSON string in this example is invalid. There is a missing double quote " before dynamic_eplb. This will cause a parsing error for users who copy and paste this command.

Suggested change
--additional-config '{ "expert_map_record_path": "/path/to/eplb.json", "init_redundancy_expert": 16, dynamic_eplb":true,"num_iterations_eplb_update":400, "gate_eplb":true, "num_wait_worker_iterations":30}'
--additional-config '{ "expert_map_record_path": "/path/to/eplb.json", "init_redundancy_expert": 16, "dynamic_eplb":true,"num_iterations_eplb_update":400, "gate_eplb":true, "num_wait_worker_iterations":30}'
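One way to catch a malformed `--additional-config` example like this before shipping it is to round-trip the string through a JSON parser. This is a generic check with illustrative shortened strings, not part of vLLM:

```python
import json

def is_valid_json(s: str) -> bool:
    """Return True if s parses as JSON, False otherwise."""
    try:
        json.loads(s)
        return True
    except json.JSONDecodeError:
        return False

# Shortened versions of the docs example: the quote missing before
# dynamic_eplb makes the first string invalid.
broken = '{"init_redundancy_expert": 16, dynamic_eplb":true}'
fixed = '{"init_redundancy_expert": 16, "dynamic_eplb":true}'
```

Running such a check in docs CI would have flagged the missing quote before users hit a parse error at serve time.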

@offline893 offline893 closed this Sep 22, 2025