
Commit 2ee9046

Fix e2e data parallel test: add resource release code (#1881)
### What this PR does / why we need it?
Fix the e2e data parallel test: add resource release code and give the engines more time to pause their processing loops before exiting.

### Does this PR introduce _any_ user-facing change?
No.

- vLLM version: v0.9.2
- vLLM main: vllm-project/vllm@5895afd

Signed-off-by: leo-pony <nengjunma@outlook.com>
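The resource release this commit adds (`cleanup_env_and_memory()`) runs a fixed sequence of teardown steps, with the process-group destruction wrapped in `contextlib.suppress(AssertionError)` so that an assertion failure in that one step does not stop the memory cleanup after it. A minimal mock of that ordering, runnable without vLLM or an NPU, is sketched here. Every `fake_*` function is a hypothetical stand-in for the real call named in its label, and the `AssertionError` is simulated; this sketch does not claim when the real `torch.distributed.destroy_process_group()` raises.

```python
import contextlib
import gc

# Hypothetical stand-ins for the real vLLM / torch calls. Each one only
# records its label so the execution order can be inspected afterwards.
calls = []


def fake_destroy_model_parallel():
    calls.append("destroy_model_parallel")


def fake_destroy_distributed_environment():
    calls.append("destroy_distributed_environment")


def fake_destroy_process_group():
    calls.append("destroy_process_group")
    # Simulated failure: the kind of error the commit chooses to tolerate.
    raise AssertionError("simulated: no default process group")


def fake_empty_cache():
    calls.append("empty_cache")


def fake_reset_peak_memory_stats():
    calls.append("reset_peak_memory_stats")


def cleanup_env_and_memory_sketch():
    """Same step order as cleanup_env_and_memory(), using the fakes above."""
    fake_destroy_model_parallel()
    fake_destroy_distributed_environment()
    # An AssertionError here is swallowed, so the steps below still run.
    with contextlib.suppress(AssertionError):
        fake_destroy_process_group()
    gc.collect()  # real call: reclaims unreachable Python objects
    fake_empty_cache()
    fake_reset_peak_memory_stats()


cleanup_env_and_memory_sketch()
print(calls)
```

Because only `AssertionError` is suppressed, any other exception from the process-group step would still propagate and skip the cache cleanup.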
1 parent b824525 commit 2ee9046


2 files changed (+18 -4 lines changed)


examples/offline_data_parallel.py

Lines changed: 18 additions & 3 deletions
@@ -56,14 +56,19 @@
 
 import os
 from time import sleep
+import contextlib
+import gc
+
+import torch
 
 from vllm import LLM, SamplingParams
 from vllm.utils import get_open_port
+from vllm.distributed.parallel_state import ( # noqa E402
+    destroy_distributed_environment, destroy_model_parallel)
 
 os.environ["VLLM_USE_MODELSCOPE"] = "True"
 os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"
 
-
 def parse_args():
     import argparse
 
@@ -110,6 +115,15 @@ def parse_args():
     return parser.parse_args()
 
 
+def cleanup_env_and_memory():
+    destroy_model_parallel()
+    destroy_distributed_environment()
+    with contextlib.suppress(AssertionError):
+        torch.distributed.destroy_process_group()
+    gc.collect()
+    torch.npu.empty_cache()
+    torch.npu.reset_peak_memory_stats()
+
 def main(
     model,
     dp_size,
@@ -185,8 +199,9 @@ def start(rank):
               f"Generated text: {generated_text!r}")
 
     # Give engines time to pause their processing loops before exiting.
-    sleep(1)
-
+    sleep(5)
+    del llm
+    cleanup_env_and_memory()
 
 if __name__ == "__main__":
     args = parse_args()
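In the updated teardown, `del llm` runs before `cleanup_env_and_memory()`. The `gc.collect()` inside that helper can only reclaim objects that are no longer reachable, and the local name `llm` keeps the engine reachable until it is deleted. A minimal sketch of why that order matters, using a hypothetical `FakeEngine` that holds a reference cycle (whether the real `LLM` object has such cycles is an assumption, not something this commit states):

```python
import gc
import weakref


class FakeEngine:
    """Hypothetical stand-in for an object such as `llm` that owns resources."""

    def __init__(self):
        # A self-reference forms a cycle, so reference counting alone cannot
        # free this object; only the cyclic garbage collector can.
        self.me = self


gc.disable()  # stop automatic collection so each step below is deterministic

llm = FakeEngine()
probe = weakref.ref(llm)  # observes the object without keeping it alive

del llm  # drop the script's own reference, as the commit does
alive_after_del = probe() is not None  # still kept alive by its own cycle

gc.collect()  # explicit collection reclaims the now-unreachable cycle
collected = probe() is None

gc.enable()
print(alive_after_del, collected)  # True True
```

Calling `gc.collect()` while `llm` is still bound would leave the object alive, which is why the deletion comes first.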

tests/e2e/multicard/test_data_parallel.py

Lines changed: 0 additions & 1 deletion
@@ -30,7 +30,6 @@
 MODELS = ["Qwen/Qwen2.5-0.5B-Instruct"]
 
 
-@pytest.mark.skipif(True, reason="TODO: fix dp timeout error in ci")
 @pytest.mark.parametrize("model", MODELS)
 @pytest.mark.parametrize("max_tokens", [32])
 @patch.dict(os.environ, {"ASCEND_RT_VISIBLE_DEVICES": "0,1"})
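With the unconditional `skipif` removed, the test actually runs under the remaining decorators. `unittest.mock.patch.dict` sets `ASCEND_RT_VISIBLE_DEVICES` only while the decorated function executes and restores `os.environ` afterwards. A small standard-library sketch of that scoping, with a hypothetical `run_case` in place of the real test function:

```python
import os
from unittest.mock import patch

KEY = "ASCEND_RT_VISIBLE_DEVICES"


@patch.dict(os.environ, {KEY: "0,1"})
def run_case():
    """Hypothetical test body; reads the variable while the patch is active."""
    return os.environ.get(KEY)


before = os.environ.get(KEY)  # value from the surrounding shell, possibly None
inside = run_case()           # patched value is visible only during the call
after = os.environ.get(KEY)   # patch.dict has restored the original state

print(inside)  # 0,1
```

Because the environment is restored after each call, the device setting does not leak into other tests in the same session.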
