Commit 41c5ad8

Author: offline0806

[EPLB]Fix eplb

Reformat eplb doc.
Signed-off-by: offline0806 <z00858301@china.huawei.com>
1 parent 20b3b85 commit 41c5ad8

1 file changed, +3 -3 lines changed

docs/source/user_guide/feature_guide/eplb_swift_balancer.md

Lines changed: 3 additions & 3 deletions
@@ -1,16 +1,16 @@
 # Swift Balancer

 ## Overview
-Experts rebalancing of MoE models for LLM serving is a mandatory option.Changing experts dynamically would have a negative impact on TTFT and TPOT while stop-the-world.
+Experts rebalancing of MoE models for LLM serving is a mandatory option.Changing experts dynamically would have a negative impact on TTFT and TPOT while stop-the-world.
 Asynchronously expert load balancing would be a better choice.
 We have launched SwiftBalancer to support dynamic experts load balancing with Zero-overhead experts movement.

-## Design
+## Design

 ![img.png](images/eplb_img.png)

 The overall workflow involves:
-1. Record experts distribution during forward. We using expert_token_num after disptach instead of topk_ids, thus we got much smaller tensor shape to reduce cost of hbm
+1. Record experts distribution during forward. We using expert_token_num after dispatch instead of topk_ids, thus we got much smaller tensor shape to reduce cost of hbm
 recording and add-operator.
 2. Do all-gather for experts distribution. Using all-gather instead of all-reduce as less traffic volume.
 3. Wake up eplb worker process with experts distribution when num_iterations comes. Run eplb algorithm in eplb worker.
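
The three workflow steps in the diff context above can be illustrated with a small single-process sketch. It is not the Swift Balancer implementation: `ExpertLoadRecorder`, `all_gather`, and `run_steps` are hypothetical names, the all-gather is simulated with plain lists rather than a collective call, and a `queue.Queue` stands in for waking the EPLB worker process. Only `expert_token_num` and `num_iterations` come from the document.

```python
import queue


class ExpertLoadRecorder:
    """Hypothetical per-rank recorder; not the Swift Balancer API."""

    def __init__(self, num_local_experts, num_iterations):
        self.load = [0] * num_local_experts
        self.num_iterations = num_iterations
        self.step = 0

    def record(self, expert_token_num):
        # Step 1: accumulate one count per local expert (the add operation).
        # The vector has num_local_experts entries, far fewer than a
        # topk_ids tensor with num_tokens * top_k entries.
        self.load = [a + b for a, b in zip(self.load, expert_token_num)]
        self.step += 1
        # True when num_iterations forward passes have been recorded.
        return self.step % self.num_iterations == 0


def all_gather(per_rank_loads):
    # Step 2 (simulated): every rank receives each rank's load vector.
    return [list(load) for load in per_rank_loads]


def run_steps(recorders, batches, worker_queue):
    # batches[i][r] is the expert_token_num vector of rank r at step i.
    for batch in batches:
        due = [rec.record(counts) for rec, counts in zip(recorders, batch)]
        if all(due):
            # Step 3: hand the gathered distribution to the EPLB worker.
            worker_queue.put(all_gather([rec.load for rec in recorders]))


recorders = [
    ExpertLoadRecorder(num_local_experts=2, num_iterations=2) for _ in range(2)
]
batches = [
    [[3, 1], [0, 4]],  # step 1: rank 0, rank 1
    [[2, 2], [1, 3]],  # step 2: rank 0, rank 1
]
worker_queue = queue.Queue()
run_steps(recorders, batches, worker_queue)
distribution = worker_queue.get_nowait()
print(distribution)  # [[5, 3], [1, 7]]
```

In this toy run each rank records two steps, so the worker queue receives exactly one gathered distribution, with one row per rank and one column per local expert.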
