[Bugfix] fix sleepmode level2 e2e test #4019

wangx700 · 2025-11-06T03:20:19Z

What this PR does / why we need it?

enable sleepmode level2 e2e test and add the check logic to ensure the nz is not enabled.

Does this PR introduce any user-facing change?

no

How was this patch tested?

use e2e tests

vLLM version: v0.11.0
vLLM main: vllm-project/vllm@83f478b

gemini-code-assist

Code Review

This pull request fixes a failing E2E test for sleep_mode_level=2 by disabling the VLLM_ASCEND_ENABLE_NZ optimization for that specific test. While this makes the test pass, it appears to be a workaround for an underlying incompatibility between sleep mode level 2's weight reloading mechanism and the FRACTAL_NZ weight format. My review points out this issue and suggests that the root cause should be addressed to ensure full test coverage for this feature combination. I have also recommended adding a code comment to explain the workaround.

gemini-code-assist · 2025-11-06T03:21:12Z

tests/e2e/multicard/test_external_launcher.py

    assert proc.returncode == 0


+@patch.dict(os.environ, {"VLLM_ASCEND_ENABLE_NZ": "0"})


Disabling VLLM_ASCEND_ENABLE_NZ makes this test pass, but it appears to be a workaround for a deeper issue. This change effectively prevents testing the sleep_mode_level=2 functionality when the FRACTAL_NZ weight format optimization is enabled, reducing test coverage.

The root cause seems to be that when sleep_mode_level=2 is used, the test manually reloads model weights from disk (see examples/offline_external_launcher.py, line 250). This reloading process does not account for the FRACTAL_NZ format that weights are converted to when VLLM_ASCEND_ENABLE_NZ=1. Consequently, the model receives weights in an unexpected format, leading to failure.

While this fix may be acceptable to unblock CI, the underlying incompatibility should be addressed. Possible long-term solutions include:

Updating the weight reloading logic in the test to correctly handle the FRACTAL_NZ format.

If sleep_mode_level=2 is fundamentally incompatible with VLLM_ASCEND_ENABLE_NZ, this should be documented and ideally enforced with a runtime check.

As an immediate step, please add a code comment above this decorator to explain why this environment variable is being disabled for this specific test. This will improve code clarity and help future developers understand the context.

github-actions · 2025-11-06T03:25:00Z

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:‌‌

A PR should do only one thing, smaller PRs enable faster reviews.
Every PR should include unit tests and end-to-end tests ‌to ensure it works and is not broken by other future PRs.
Write the commit message by fulfilling the PR description to help reviewer and future developers understand.

If CI fails, you can run linting and testing checks locally according Contributing and Testing.

leo-pony · 2025-11-06T03:41:30Z

It look goods to me.
This is a test case for the RL scenario. In the RL scenario, NZ is not used, so there is no issue with the code. Here, we just need to ensure that the RL scenario is properly handled.

Signed-off-by: wangx700 <wangxin700@huawei.com>

gemini-code-assist bot reviewed Nov 6, 2025

View reviewed changes

github-actions bot added the module:tests label Nov 6, 2025

wangx700 force-pushed the fix_e2e branch from d843f29 to e734bb4 Compare November 6, 2025 03:49

leo-pony added ready read for review ready-for-test start test by label for PR and removed ready read for review ready-for-test start test by label for PR labels Nov 6, 2025

wangx700 force-pushed the fix_e2e branch from e734bb4 to e68e7ae Compare November 6, 2025 08:14

wangxiyuan added ready read for review ready-for-test start test by label for PR labels Nov 6, 2025

wangx700 force-pushed the fix_e2e branch 4 times, most recently from cd244da to 46f1b64 Compare November 7, 2025 02:38

fix sleepmode level2 e2e test

c47576c

Signed-off-by: wangx700 <wangxin700@huawei.com>

wangx700 force-pushed the fix_e2e branch from 46f1b64 to c47576c Compare November 7, 2025 02:40

weijinqian0 approved these changes Nov 8, 2025

View reviewed changes

weijinqian0 merged commit 24d6314 into vllm-project:main Nov 8, 2025
24 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Bugfix] fix sleepmode level2 e2e test #4019

[Bugfix] fix sleepmode level2 e2e test #4019

wangx700 commented Nov 6, 2025 •

edited by github-actions bot

Loading

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Nov 6, 2025

Uh oh!

github-actions bot commented Nov 6, 2025

Uh oh!

leo-pony commented Nov 6, 2025 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

		assert proc.returncode == 0


		@patch.dict(os.environ, {"VLLM_ASCEND_ENABLE_NZ": "0"})

[Bugfix] fix sleepmode level2 e2e test #4019

[Bugfix] fix sleepmode level2 e2e test #4019

Conversation

wangx700 commented Nov 6, 2025 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this PR does / why we need it?

Does this PR introduce any user-facing change?

How was this patch tested?

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Nov 6, 2025

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Nov 6, 2025

Uh oh!

leo-pony commented Nov 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

wangx700 commented Nov 6, 2025 •

edited by github-actions bot

Loading

leo-pony commented Nov 6, 2025 •

edited

Loading