docs/source/user_guide/feature_guide/eplb_swift_balancer.md
# Swift Balancer
## Overview
Rebalancing the experts of MoE models is essential for LLM serving. Moving experts dynamically in a stop-the-world fashion has a negative impact on TTFT and TPOT.
Asynchronous expert load balancing is a better choice.
We have launched SwiftBalancer to support dynamic expert load balancing with zero-overhead expert movement.
## Design

The overall workflow involves:
1. Record the expert distribution during the forward pass. We use `expert_token_num` after dispatch instead of `topk_ids`, which gives a much smaller tensor and reduces the HBM cost of recording and the add operator.
2. All-gather the expert distribution across ranks. We use all-gather instead of all-reduce because it generates less traffic for these small count tensors.
3. Wake up the eplb worker process with the gathered expert distribution once every `num_iterations` steps, and run the eplb algorithm in that worker.
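The steps above can be sketched in plain Python. This is a minimal illustration only; the class and function names (`ExpertLoadRecorder`, `all_gather`, `maybe_run_eplb`) are hypothetical and do not correspond to the actual vLLM Ascend API, and the all-gather is a single-process stand-in for the real collective.

```python
from collections import Counter

class ExpertLoadRecorder:
    """Step 1: record per-expert token counts after dispatch.

    Keeping one counter per expert is far cheaper than holding the
    full topk_ids tensor in HBM for recording.
    """
    def __init__(self, num_experts):
        self.counts = [0] * num_experts

    def record(self, expert_token_num):
        # expert_token_num[e] = tokens routed to expert e this step
        for e, n in enumerate(expert_token_num):
            self.counts[e] += n

def all_gather(per_rank_counts):
    """Step 2: stand-in for an all-gather across EP ranks.

    Every rank ends up with every rank's counters; gathering these
    small count vectors moves less data than an all-reduce over
    token-level routing tensors.
    """
    return [list(c) for c in per_rank_counts]

def maybe_run_eplb(step, num_iterations, gathered):
    """Step 3: every num_iterations steps, hand the gathered load to
    the asynchronous eplb worker, which computes a new placement."""
    if step % num_iterations != 0:
        return None
    total = Counter()
    for rank_counts in gathered:
        for e, n in enumerate(rank_counts):
            total[e] += n
    # Toy "algorithm": rank experts from hottest to coldest; a real
    # eplb policy would emit an expert-to-device placement instead.
    return {"hot_to_cold": [e for e, _ in total.most_common()]}
```

A real implementation would run step 3 in a separate worker process so the expert movement overlaps with serving, which is what makes the rebalancing effectively zero-overhead.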