
SwiftBalancer Zero Overhead Expert Movement #1855


Closed

wants to merge 48 commits
Changes from 4 commits
Commits
48 commits
6682deb
dynamic eplb
wanghanqingLYT Jul 9, 2025
431320f
fix
wanghanqingLYT Jul 17, 2025
abbb39a
fix is not None
YiYang-Eon Jul 17, 2025
8f9c3b7
update new algorithm
Jul 21, 2025
cc35a75
dynamic eplb
Jul 21, 2025
b02f517
Partially address review comments
Jul 21, 2025
c66e4ce
simplify eplb policy
wanghanqingLYT Jul 21, 2025
8866b9e
Merge branch 'raindaywhu:whq-v091-new' into whq-v091-new
845473182 Jul 22, 2025
d60b96c
fix is not None
YiYang-Eon Jul 17, 2025
cea6967
Merge remote-tracking branch 'origin/eplb' into eplb
Jul 22, 2025
1868d2c
code quality maintenance
Jul 22, 2025
bf708cd
code quality maintenance
Jul 22, 2025
e77d7bb
code quality maintenance
Jul 22, 2025
48343f3
address review comments
Jul 22, 2025
b3417f6
simplify eplb policy
wanghanqingLYT Jul 21, 2025
3fcc55f
Merge branch 'whq-v091-new' of https://github.com/raindaywhu/vllm-asc…
raindaywhu Jul 22, 2025
a0cd659
fix bugs
845473182 Jul 22, 2025
c4fbab0
add swift balancer doc
raindaywhu Jul 22, 2025
72e22ee
Merge pull request #118 from raindaywhu/cy_0722
raindaywhu Jul 22, 2025
d28bd52
change the registration order of self.ascend_config
Jul 22, 2025
ee7937e
Merge pull request #117 from 845473182/whq-v091-new
raindaywhu Jul 22, 2025
c453b59
code quality maintenance
Jul 22, 2025
5c260bb
code quality maintenance
Jul 22, 2025
f3296db
code quality maintenance
Jul 22, 2025
86984da
change the registration order of self.ascend_config
Jul 22, 2025
bf066aa
Merge remote-tracking branch 'origin/eplb' into eplb
Jul 22, 2025
3f5f792
fix is not None
YiYang-Eon Jul 17, 2025
9f15780
code quality maintenance changes
Jul 22, 2025
f6e7651
code quality maintenance
Jul 22, 2025
b5867fd
code quality maintenance
Jul 22, 2025
7690863
code quality maintenance
Jul 22, 2025
ea9e7f1
Merge remote-tracking branch 'origin/eplb' into eplb
Jul 22, 2025
90396ad
Merge pull request #116 from sadatama/eplb
raindaywhu Jul 22, 2025
26e483e
fix registration reference error
Jul 22, 2025
77c31c0
fix registration reference error
Jul 22, 2025
b8a3096
fix param bug
Jul 22, 2025
056c359
fix param bug
Jul 22, 2025
b0836d1
Merge pull request #119 from sadatama/eplb
raindaywhu Jul 22, 2025
8fd696c
fix import
raindaywhu Jul 22, 2025
e175550
Merge pull request #120 from raindaywhu/cy_0722
raindaywhu Jul 22, 2025
9ea2232
fix lint errors
845473182 Jul 22, 2025
699dc7b
Merge branch 'whq-v091-new' of https://github.com/845473182/vllm-asce…
845473182 Jul 22, 2025
8603f7b
fix bugs for dynamic eplb without assigning expert json
wanghanqingLYT Jul 22, 2025
e6d9dfb
Merge branch 'raindaywhu:whq-v091-new' into whq-v091-new
845473182 Jul 22, 2025
9e51e41
Merge pull request #122 from 845473182/whq-v091-new
raindaywhu Jul 22, 2025
ca34eb8
fix doc title
raindaywhu Jul 22, 2025
53cf807
fix doc title
raindaywhu Jul 22, 2025
56152ee
add eplb_swift_balancer to doc tree
raindaywhu Jul 22, 2025
1 change: 1 addition & 0 deletions vllm_ascend/ascend_config.py
@@ -37,6 +37,7 @@ def __init__(self, vllm_config):
ascend_scheduler_config)

self.expert_map_path = additional_config.get("expert_map_path", None)
self.dynamic_eplb = additional_config.get("dynamic_eplb", False)
wangxiyuan (Collaborator) commented on Jul 21, 2025:

can we use the vllm eplb config enable_eplb instead of adding a new config?

self.chunked_prefill_for_mla = additional_config.get(
"chunked_prefill_for_mla", False)
self.enable_weight_nz_layout = additional_config.get(
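For readers trying the feature out, here is a hedged sketch of how the new flag would be switched on through vllm-ascend's additional_config, which the AscendConfig constructor above reads. Only dynamic_eplb and expert_map_path come from this diff; the model name, parallel sizes, and file path are placeholders.

```python
from vllm import LLM

# Minimal sketch (not from this PR): enable SwiftBalancer's dynamic EPLB via
# additional_config, which ascend_config.py reads above. Everything except
# "dynamic_eplb" and "expert_map_path" is a placeholder.
llm = LLM(
    model="deepseek-ai/DeepSeek-V3",   # placeholder model
    tensor_parallel_size=16,           # placeholder parallel layout
    enable_expert_parallel=True,
    additional_config={
        "dynamic_eplb": True,
        # optional: start from a pre-computed placement instead of the default
        "expert_map_path": "/path/to/expert_map.json",
    },
)
```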
39 changes: 39 additions & 0 deletions vllm_ascend/eplb/adaptor/abstract_adaptor.py
@@ -0,0 +1,39 @@
#
wangxiyuan (Collaborator) commented on Jul 21, 2025:

vllm_ascend/eplb/__init__.py is missing


fixed

# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is a part of the vllm-ascend project.
#

from abc import ABC, abstractmethod

class EplbAdaptor(ABC):
Collaborator commented:

what is this abstract class used for?

Author replied:

for the SGLang/vLLM abstraction


def __init__(self, **args):
pass

@abstractmethod
def get_rank_expert_workload(self, num_moe_layers):
raise NotImplementedError

@abstractmethod
def get_init_expert_map(self):
raise NotImplementedError

@abstractmethod
def do_update_expert_map(self):
raise NotImplementedError

@abstractmethod
def do_update_expert_weight(self):
raise NotImplementedError
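The author's reply above says this class exists as an abstraction point over SGLang/vLLM. As a hedged illustration of how a concrete adaptor plugs into it, here is a hypothetical in-memory subclass (not part of this PR; all names and shapes are invented, and the real vLLM adaptor follows in the next file):

```python
import torch

from vllm_ascend.eplb.adaptor.abstract_adaptor import EplbAdaptor


class DummyEplbAdaptor(EplbAdaptor):
    """Hypothetical in-memory adaptor, e.g. for unit-testing an EPLB policy."""

    def __init__(self, num_moe_layers=2, num_experts=8, **args):
        super().__init__(**args)
        # one workload counter and one identity expert map per MoE layer
        self.workload = torch.zeros(num_moe_layers, num_experts)
        self.expert_map = torch.arange(num_experts).repeat(num_moe_layers, 1)

    def get_rank_expert_workload(self):
        return self.workload

    def get_init_expert_map(self):
        return self.expert_map

    def do_update_expert_map(self, layer_id, updated_expert_map):
        self.expert_map[layer_id] = updated_expert_map

    def do_update_expert_weight(self, layer_id, local_expert_to_replace,
                                buffer_tensor_id):
        # nothing to move in this dummy; a real adaptor copies weights here
        pass
```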
209 changes: 209 additions & 0 deletions vllm_ascend/eplb/adaptor/vllm_adaptor.py
@@ -0,0 +1,209 @@
#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is a part of the vllm-ascend project.
#
import os
import json
import torch
import random
import torch.distributed as dist
import numpy as np

from vllm_ascend.eplb.adaptor.abstract_adaptor import EplbAdaptor
from vllm.logger import logger



class VllmEplbAdaptor(EplbAdaptor):

def __init__(self, model, **args):
super().__init__(**args)
self.model = model
self.rank_id = dist.get_rank()
self.world_size = dist.get_world_size()
self.param_dict = dict(self.model.named_parameters())
self.num_dense_layers = self.model.config.first_k_dense_replace
self.num_moe_layers = self.model.config.num_hidden_layers - self.num_dense_layers
self.global_expert_num = self.model.config.n_routed_experts


# TODO: init self.expert_weight_names depending on different model types, only deepseek v3 w8a8 is supported here
self.expert_weight_names = ["w13_weight", "w2_weight", "w13_weight_scale", "w13_weight_offset",
"w2_weight_scale", "w2_weight_offset"]

self.expert_map_per_layer = dict() # reference to expert map on device for expert map update
self.expert_map_per_layer_cpu = dict() # copy of expert map on CPU to avoid device synchronize frequently
for layer_idx in range(self.num_moe_layers):
self.expert_map_per_layer[self.num_dense_layers + layer_idx] =\
self.model.get_expert_map(self.num_dense_layers + layer_idx)

# TODO: here we set the number of buffer tensors equal to the number of experts in each layer, which can be improved
num_buffer_tensor = torch.where(self.expert_map_per_layer[self.num_dense_layers] != -1)[0].numel()
self.buffer_tensor_list = [[] for _ in range(num_buffer_tensor)]
self.init_buffer_tensor(num_buffer_tensor)

self.expert_param_per_layer = dict()
self.init_expert_param_per_layer()

self.log2phy_map_per_layer = dict()
for layer_idx in range(self.num_moe_layers):
self.log2phy_map_per_layer[self.num_dense_layers + layer_idx] =\
self.model.get_log2phy_map(self.num_dense_layers + layer_idx)

self.all_topk_ids = []

def init_buffer_tensor(self, num_buffer_tensor):
for name in self.expert_weight_names:
complete_name = "model.layers." + str(self.num_dense_layers) + ".mlp.experts." + name
expert_tensor = self.param_dict[complete_name].data[0:num_buffer_tensor]
buffer_tensors = torch.empty_like(expert_tensor)
for buffer_id in range(num_buffer_tensor):
self.buffer_tensor_list[buffer_id].append(buffer_tensors[buffer_id])

def init_expert_param_per_layer(self):
num_local_expert = self.param_dict["model.layers." + str(self.num_dense_layers) +\
".mlp.experts." + self.expert_weight_names[0]].data.shape[0]
for moe_layer_id in range(self.num_moe_layers):
layer_idx = self.num_dense_layers + moe_layer_id
self.expert_param_per_layer[layer_idx] = list()
for local_expert_id in range(num_local_expert):
self.expert_param_per_layer[layer_idx].append(
[self.param_dict["model.layers." + str(layer_idx) + ".mlp.experts." + name].data[local_expert_id]
for name in self.expert_weight_names]
)

# def collect_topk_ids(self, dummy_run=False):
Collaborator commented:

remove the commented-out code


done

# if dummy_run:
# return
# self.all_topk_ids.append(self.model.get_all_topk_ids(self.num_moe_layers))

def get_rank_expert_workload(self) -> torch.Tensor:
self.moe_load = self.model.get_all_moe_loads()
return self.moe_load

def get_init_expert_map(self, num_moe_layers):
expert_map = self.model.get_all_expert_map(num_moe_layers)
if dist.is_initialized():
world_size = dist.get_world_size()
rank = dist.get_rank()

gathered = torch.empty((world_size, *expert_map.shape), # [W, L, E]
dtype=expert_map.dtype,
device=expert_map.device)

dist.all_gather_into_tensor(gathered, expert_map)
all_maps = gathered.permute(1, 0, 2)
all_expert_maps = all_maps.cpu()

for layer_idx in range(num_moe_layers):
self.expert_map_per_layer_cpu[self.num_dense_layers + layer_idx] = \
all_expert_maps[layer_idx][self.rank_id]

return all_expert_maps

def get_init_expert_map_from_file(self, num_moe_layers, expert_map_path):

try:
expert_map_tensor, layers_num, ranks_num = self._expert_file_to_tensor(expert_map_path)
expert_map_all = self.local2global(expert_map_tensor)
except (TypeError, FileNotFoundError, OSError):
expert_map_all = self.determine_expert_map_all()

for layer_idx in range(num_moe_layers):
self.expert_map_per_layer_cpu[layer_idx+3] = \
expert_map_all[layer_idx][self.rank_id]
return expert_map_all

def _expert_file_to_tensor(self, expert_map_path: str):
    try:
        with open(expert_map_path, "r") as f:
            data = json.load(f)
        layers_num = data["moe_layer_count"]
        gpus_num = data["layer_list"][0]["device_count"]

        tensor_data = []
        for layer in data["layer_list"]:
            device_data = []
            for device in layer["device_list"]:
                device_data.append(device["device_expert"])
            tensor_data.append(device_data)
        expert_map_tensor = torch.tensor(tensor_data, dtype=torch.int32)
        return expert_map_tensor, layers_num, gpus_num
    except (KeyError, json.JSONDecodeError):
        # log and return None; the caller catches the resulting TypeError
        # and falls back to determine_expert_map_all()
        logger.error(f"failed to read expert_map_path: {expert_map_path}")
        return None

def do_update_expert_map(self, layer_id, updated_expert_map):
self.expert_map_per_layer[layer_id].copy_(updated_expert_map)
self.expert_map_per_layer_cpu[layer_id].copy_(updated_expert_map)

def do_update_expert_weight(self, layer_id, local_expert_to_replace, buffer_tensor_id):
for expert_tensor, buffer_tensor in zip(
self.expert_param_per_layer[layer_id][local_expert_to_replace],
self.buffer_tensor_list[buffer_tensor_id]
):
expert_tensor.copy_(buffer_tensor)

def do_update_log2phy_map(self, layer_id, updated_log2phy_map):
if self.log2phy_map_per_layer[layer_id] is not None:
self.log2phy_map_per_layer[layer_id].copy_(updated_log2phy_map)

def local2global(self,
placement_local: torch.Tensor
) -> torch.Tensor:

L, G, E_local = placement_local.shape
device = placement_local.device

max_id = torch.max(placement_local)
E_global = (max_id + 1).item() if max_id >= 0 else 0

if E_global == 0:
return torch.empty((L, G, 0), dtype=torch.long, device=device)

placement_global = torch.full((L, G, E_global),
fill_value=-1,
dtype=torch.long,
device=device)

valid = placement_local >= 0
l_idx, g_idx, slot_idx = valid.nonzero(as_tuple=True)
gid_idx = placement_local[l_idx, g_idx, slot_idx]

placement_global[l_idx, g_idx, gid_idx] = slot_idx

return placement_global

def determine_expert_map_all(self):

local_num_experts = self.global_expert_num // self.world_size

expert_map_all = torch.full(
(self.num_moe_layers, self.world_size, self.global_expert_num),
-1,
dtype=torch.int32
)

for r in range(self.world_size):
if r < self.world_size - 1:
start = r * local_num_experts
end = (r + 1) * local_num_experts
local_count = local_num_experts
else:
start = r * local_num_experts
end = self.global_expert_num
local_count = self.global_expert_num - r * local_num_experts

local_ids = torch.arange(local_count, dtype=torch.int32)
expert_map_all[:, r, start:end] = local_ids.unsqueeze(0).expand(self.num_moe_layers, -1)

return expert_map_all
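Since no sample expert_map_path file appears in this view, here is a hedged reconstruction of the JSON layout that _expert_file_to_tensor appears to expect, based only on the keys it reads (moe_layer_count, layer_list, device_count, device_list, device_expert); the sizes and expert ids below are made up:

```python
import json

# Hypothetical placement: 1 MoE layer, 2 devices, 8 global experts,
# 4 local slots per device; device_expert lists the global expert id
# stored in each local slot of that device.
expert_map = {
    "moe_layer_count": 1,
    "layer_list": [
        {
            "device_count": 2,
            "device_list": [
                {"device_expert": [0, 1, 2, 3]},
                {"device_expert": [4, 5, 6, 7]},
            ],
        }
    ],
}

with open("expert_map.json", "w") as f:
    json.dump(expert_map, f, indent=2)
```

_expert_file_to_tensor turns this into a [layers, devices, local_slots] int32 tensor, and local2global then expands it to a [layers, devices, global_experts] map holding the local slot index of each hosted expert and -1 elsewhere.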
24 changes: 24 additions & 0 deletions vllm_ascend/eplb/core/loader/abstract_loader.py
@@ -0,0 +1,24 @@
#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is a part of the vllm-ascend project.
#

from abc import ABC, abstractmethod

class ExpertWeightLoader(ABC):
Collaborator commented:

ditto, the abstract class seems useless


fixed


@abstractmethod
def load_impl(self, old_expert_table, new_expert_table):
raise NotImplementedError
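As with the adaptor above, a small hypothetical implementation can make the contract concrete. This sketch is not from the PR and assumes old_expert_table / new_expert_table are flat sequences of global expert ids per local slot, which the interface here leaves open:

```python
from vllm_ascend.eplb.core.loader.abstract_loader import ExpertWeightLoader


class LoggingExpertWeightLoader(ExpertWeightLoader):
    """Hypothetical loader that only reports which local slots would change."""

    def load_impl(self, old_expert_table, new_expert_table):
        changed = [
            slot for slot, (old, new)
            in enumerate(zip(old_expert_table, new_expert_table))
            if old != new
        ]
        # a real loader would copy the expert weights for these slots
        print(f"local slots needing new expert weights: {changed}")
```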