Commit 07f4710

[BugFix] Fix dummy_run memory explosion in eager mode (#3132)
### What this PR does / why we need it?

This is a quick bugfix for a memory explosion issue; a fuller refactoring is still required. In eager mode, dummy_run may run out of memory (OOM) because `hidden_states` are not released in time. This PR temporarily resolves the issue by manually clearing the cache, and further refactoring will follow.

Before the modification, memory accumulated during dummy_run:

<img width="1796" height="207" alt="image" src="https://github.com/user-attachments/assets/05e2b04c-2f99-4085-9eda-c78b7d9a57b0" />

After the modification, memory is released promptly. It was also verified that the model responds normally to a single data input.

- vLLM version: v0.10.2
- vLLM main: vllm-project/vllm@b106890

---------

Signed-off-by: chenmenglong <chenmenglong1@huawei.com>
1 parent 72f64c1 commit 07f4710

1 file changed, +10 -0 lines changed

vllm_ascend/ops/moe/fused_moe_prepare_and_finalize.py

Lines changed: 10 additions & 0 deletions
@@ -183,6 +183,11 @@ def finalize(self, hidden_states: torch.Tensor,
                                 self.moe_config.tp_group.device_group)
                 hidden_states = torch.cat(self.split_hidden_states, dim=0)

+                # TODO: It is a quick bugfix for the memory explosion issue in eager mode.
+                # If the cache is not cleared after `self.split_hidden_states` is created,
+                # it can lead to the memory explosion in eager mode.
+                del self.split_hidden_states
+
             # Unpad if necessary
             if self.num_tokens < hidden_states.shape[0]:
                 hidden_states = hidden_states[:self.num_tokens]
@@ -267,6 +272,11 @@ def finalize(self, hidden_states: torch.Tensor,
                                 self.moe_config.tp_group.device_group)
                 hidden_states = torch.cat(self.split_hidden_states, dim=0)

+                # TODO: It is a quick bugfix for the memory explosion issue in eager mode.
+                # If the cache is not cleared after `self.split_hidden_states` is created,
+                # it can lead to the memory explosion in eager mode.
+                del self.split_hidden_states
+
             if self.num_tokens < hidden_states.shape[0]:
                 hidden_states = hidden_states[:self.num_tokens]
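The mechanism behind the fix can be illustrated without any accelerator: an object cached on `self` stays alive for as long as the attribute exists, and `del self.<attr>` drops that reference so the memory can be reclaimed. The following is a minimal sketch under stated assumptions, not the vllm-ascend implementation. `Buffer`, `Finalizer`, `release_cache`, and `chunks_alive_after_finalize` are hypothetical names invented for illustration, and a `bytearray` stands in for a tensor.

```python
import gc
import weakref


class Buffer:
    """Stand-in for a large tensor (hypothetical; not a torch.Tensor)."""

    def __init__(self, nbytes: int) -> None:
        self.data = bytearray(nbytes)


class Finalizer:
    """Hypothetical class that caches chunks on `self` between two calls."""

    def __init__(self, release_cache: bool) -> None:
        self.release_cache = release_cache

    def prepare(self) -> None:
        # Cache the chunks on the instance so finalize() can use them later.
        self.split_hidden_states = [Buffer(1024) for _ in range(4)]

    def finalize(self) -> Buffer:
        # Build a merged buffer from the cached chunks (like torch.cat).
        total = sum(len(chunk.data) for chunk in self.split_hidden_states)
        merged = Buffer(total)
        if self.release_cache:
            # Drop the attribute so the chunks can be freed immediately.
            del self.split_hidden_states
        return merged


def chunks_alive_after_finalize(release_cache: bool) -> bool:
    """Return True if a cached chunk is still reachable after finalize()."""
    finalizer = Finalizer(release_cache)
    finalizer.prepare()
    # A weak reference observes the chunk without keeping it alive.
    probe = weakref.ref(finalizer.split_hidden_states[0])
    finalizer.finalize()
    gc.collect()
    return probe() is not None
```

Calling `chunks_alive_after_finalize(False)` models the behavior before the fix, where the cached chunks outlive `finalize()`; `chunks_alive_after_finalize(True)` models the behavior with the `del`. On a real device, the framework's allocator decides when freed blocks are actually reused, so this sketch only shows the reference-lifetime part of the problem.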
