
Conversation

wangxiyuan
Collaborator

@wangxiyuan wangxiyuan commented Aug 8, 2025

Refactor E2E CI to make it clearer and faster

  1. Remove some useless e2e tests.
  2. Remove some useless functions.
  3. Make sure all tests run with VllmRunner to avoid OOM errors.
  4. Make sure all ops tests end with torch.npu.empty_cache() to avoid OOM errors (a sketch follows this list).
  5. Run the tests one by one to avoid resource-limit errors.
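For illustration, points 4 and 5 could look like the following conftest sketch. This is not the PR's actual code: the fixture name is invented, and it assumes torch_npu exposes torch.npu.empty_cache(), the Ascend counterpart of torch.cuda.empty_cache().

# conftest.py -- hypothetical sketch, not the PR's actual implementation.
import gc

import pytest
import torch


@pytest.fixture(autouse=True)
def release_device_memory():
    """Free cached device memory after every ops test to avoid OOM errors."""
    yield
    gc.collect()
    if hasattr(torch, "npu"):  # torch_npu patches an `npu` module onto torch
        torch.npu.empty_cache()

Point 5 then amounts to invoking pytest once per test file in the workflow, so a crash or memory spike in one file cannot take down the rest of the run.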


github-actions bot commented Aug 8, 2025

This pull request has conflicts; please resolve them before we can evaluate the pull request.


github-actions bot commented Aug 8, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@github-actions github-actions bot added the documentation and module:tests labels Aug 8, 2025
@github-actions github-actions bot removed the merge-conflicts and documentation labels Aug 8, 2025
@wangxiyuan wangxiyuan force-pushed the refactor_e2e branch 2 times, most recently from 1ff0e9e to 46d8efa on August 8, 2025 08:15
@wangxiyuan wangxiyuan changed the title [CI] Refactor e2e CI [CI] [1/2] Refactor e2e CI - singlecard Aug 8, 2025

codecov bot commented Aug 8, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.49%. Comparing base (0df059f) to head (95b11ff).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2276   +/-   ##
=======================================
  Coverage   73.49%   73.49%           
=======================================
  Files         151      151           
  Lines       21927    21927           
=======================================
  Hits        16116    16116           
  Misses       5811     5811           
Flag        Coverage Δ
unittests   73.49% <ø> (ø)

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.


Comment on lines -79 to -94
def test_deepseek_raises_error(monkeypatch: pytest.MonkeyPatch) -> None:
    with monkeypatch.context() as m:
        m.setenv("VLLM_USE_MODELSCOPE", "True")
        with pytest.raises(NotImplementedError) as excinfo:
            VllmRunner("deepseek-ai/DeepSeek-V2-Lite-Chat",
                       max_model_len=1024,
                       enforce_eager=False)
        assert "ACL Graph does not support deepseek" in str(excinfo.value)


@pytest.mark.parametrize("model", MODELS)
def test_ray_backend_sets_no_compilation(model: str) -> None:
    runner = VllmRunner(model,
                        enforce_eager=False,
                        distributed_executor_backend="ray")
    assert runner.model.llm_engine.vllm_config.compilation_config.level == 0
Collaborator


Needs to move to unit tests.
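As a rough illustration of this suggestion, the unit-test version could call the platform's config check directly instead of constructing a full runner. Everything below is an assumption rather than verified vllm-ascend API: the import path, the check_and_update_config entry point, and the mocked config fields.

# Hypothetical unit-test sketch -- paths and fields are assumptions.
from unittest.mock import MagicMock

import pytest

from vllm_ascend.platform import NPUPlatform  # assumed module path


def test_deepseek_acl_graph_not_supported() -> None:
    vllm_config = MagicMock()
    vllm_config.model_config.hf_config.model_type = "deepseek_v2"  # assumed field
    with pytest.raises(NotImplementedError) as excinfo:
        NPUPlatform.check_and_update_config(vllm_config)  # assumed entry point
    assert "ACL Graph does not support deepseek" in str(excinfo.value)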

temperature=0.0,
)

vllm_model = LLM(model, long_prefill_token_threshold=4, enforce_eager=True)
Collaborator


Seems we need to keep this one: #1172
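The snippet above exercises long_prefill_token_threshold, the chunked-prefill knob under which prompts longer than the threshold are treated as long prefills by the scheduler. A minimal sketch of that kind of check, where the model name and prompt are placeholders rather than the test's real values:

# Hypothetical sketch -- model name and prompt are placeholders.
from vllm import LLM, SamplingParams

# With a tiny threshold, even short prompts count as "long" prefills,
# exercising the chunked-prefill scheduling path.
llm = LLM("Qwen/Qwen2.5-0.5B-Instruct",
          long_prefill_token_threshold=4,
          enforce_eager=True)
out = llm.generate(["The capital of France is"],
                   SamplingParams(temperature=0.0, max_tokens=8))
print(out[0].outputs[0].text)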

assert torch.all(tensor == pynccl_comm.world_size).cpu().item()


def test_pyhccl():
Collaborator


Better to keep the pyhccl e2e test if it doesn't add much time; we can remove it once pyhccl itself is removed.
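For reference, the invariant the e2e test asserts above is the standard all-reduce property: after summing a ones tensor across ranks, every element equals the world size. A generic torch.distributed sketch of the same shape, assuming a process group has already been initialized (the real test goes through the pyhccl communicator instead):

# Generic all-reduce invariant -- assumes dist.init_process_group() was
# already called on every rank; the real e2e test uses pyhccl instead.
import torch
import torch.distributed as dist


def check_all_reduce(device: str = "cpu") -> None:
    tensor = torch.ones(16, device=device)
    dist.all_reduce(tensor)  # default reduce op is SUM across all ranks
    assert torch.all(tensor == dist.get_world_size()).cpu().item()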

@wangxiyuan wangxiyuan force-pushed the refactor_e2e branch 5 times, most recently from be01f15 to 0e2b140 on August 10, 2025 11:02
@wangxiyuan wangxiyuan force-pushed the refactor_e2e branch 3 times, most recently from b6a3d08 to 0a8fe5f on August 10, 2025 13:09
@wangxiyuan wangxiyuan force-pushed the refactor_e2e branch 6 times, most recently from de360ff to 6907a94 on August 11, 2025 01:13
@wangxiyuan wangxiyuan force-pushed the refactor_e2e branch 4 times, most recently from e523103 to 65767d7 on August 29, 2025 08:57

This pull request has conflicts; please resolve them before we can evaluate the pull request.


This pull request has conflicts; please resolve them before we can evaluate the pull request.

@zzhx1
Contributor

zzhx1 commented Sep 1, 2025

It seems the error in this e2e, #2675, can be resolved.

@wangxiyuan wangxiyuan force-pushed the refactor_e2e branch 4 times, most recently from ba08c9e to 3045a1a on September 1, 2025 22:53
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
@wangxiyuan
Collaborator Author

CI has been blocked for quite a long time. Let's merge this first and then fix the failing tests ASAP.

@wangxiyuan wangxiyuan merged commit fef18b6 into vllm-project:main Sep 2, 2025
11 checks passed
zhangxinyuehfad pushed a commit to zhangxinyuehfad/vllm-ascend that referenced this pull request Sep 2, 2025
Refactor E2E CI to make it clearer and faster
1. Remove some useless e2e tests.
2. Remove some useless functions.
3. Make sure all tests run with VllmRunner to avoid OOM errors.
4. Make sure all ops tests end with torch.npu.empty_cache() to avoid OOM errors.
5. Run the tests one by one to avoid resource-limit errors.

- vLLM version: v0.10.1.1
- vLLM main:
vllm-project/vllm@a344a5a

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
@wangxiyuan wangxiyuan deleted the refactor_e2e branch September 4, 2025 06:52
offline893 pushed a commit to offline893/vllm-ascend that referenced this pull request Sep 16, 2025
Refactor E2E CI to make it clearer and faster
1. Remove some useless e2e tests.
2. Remove some useless functions.
3. Make sure all tests run with VllmRunner to avoid OOM errors.
4. Make sure all ops tests end with torch.npu.empty_cache() to avoid OOM errors.
5. Run the tests one by one to avoid resource-limit errors.

- vLLM version: v0.10.1.1
- vLLM main:
vllm-project/vllm@a344a5a

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: offline0806 <z00858301@china.huawei.com>
wangxiaoteng888 pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Sep 25, 2025
Refactor E2E CI to make it clearer and faster
1. Remove some useless e2e tests.
2. Remove some useless functions.
3. Make sure all tests run with VllmRunner to avoid OOM errors.
4. Make sure all ops tests end with torch.npu.empty_cache() to avoid OOM errors.
5. Run the tests one by one to avoid resource-limit errors.


- vLLM version: v0.10.1.1
- vLLM main:
vllm-project/vllm@a344a5a

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Sep 26, 2025
Refactor E2E CI to make it clearer and faster
1. Remove some useless e2e tests.
2. Remove some useless functions.
3. Make sure all tests run with VllmRunner to avoid OOM errors.
4. Make sure all ops tests end with torch.npu.empty_cache() to avoid OOM errors.
5. Run the tests one by one to avoid resource-limit errors.


- vLLM version: v0.10.1.1
- vLLM main:
vllm-project/vllm@a344a5a

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>