36 changes: 36 additions & 0 deletions tests/e2e/vllm_interface/singlecard/test_sampler.py
@@ -0,0 +1,36 @@
#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
# This file is a part of the vllm-ascend project.
# Adapted from vllm/tests/entrypoints/llm/test_guided_generate.py
# Copyright 2023 The vLLM team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
from vllm import SamplingParams

from tests.e2e.conftest import VllmRunner


def test_models_topk() -> None:
    example_prompts = [
        "The capital of France is",
    ]
    sampling_params = SamplingParams(max_tokens=10,
                                     temperature=0.0,
                                     top_k=10,
                                     top_p=0.9)

    with VllmRunner("Qwen/Qwen3-0.6B",
                    max_model_len=4096,
                    gpu_memory_utilization=0.7) as runner:
        runner.generate(example_prompts, sampling_params)
Contributor
high

This test case currently lacks assertions to verify the output of the runner.generate call. A test without assertions can only catch crashes but won't detect incorrect behavior or regressions in the model's output. Please add assertions to validate the generated text.

Also, note that with temperature=0.0, the sampling is greedy, and the top_k and top_p parameters will have no effect. If the goal is to test top_k/top_p functionality, you should use a temperature > 0.

Suggested change
-        runner.generate(example_prompts, sampling_params)
+        outputs = runner.generate(example_prompts, sampling_params)
+        assert outputs, "The model should generate output."
+        assert len(outputs) == len(example_prompts), "Should get one output for each prompt."
+        # With temperature=0.0, output is deterministic. It's highly recommended
+        # to assert on the exact expected output for a robust test.
+        generated_text = outputs[0][1][0]
+        assert len(generated_text) > len(example_prompts[0]), "The model should generate new text."
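
As a concrete illustration of the second point above, here is a minimal sketch of a variant that actually exercises top_k/top_p by using a temperature above 0. It assumes the same VllmRunner helper from tests/e2e/conftest and the (token_ids, texts) return shape implied by the suggested change; since sampling is stochastic at temperature > 0, it asserts structural properties rather than exact text.

from vllm import SamplingParams

from tests.e2e.conftest import VllmRunner


def test_models_topk_sampling() -> None:
    example_prompts = [
        "The capital of France is",
    ]
    # temperature > 0 so that top_k/top_p actually influence sampling.
    sampling_params = SamplingParams(max_tokens=10,
                                     temperature=0.8,
                                     top_k=10,
                                     top_p=0.9)

    with VllmRunner("Qwen/Qwen3-0.6B",
                    max_model_len=4096,
                    gpu_memory_utilization=0.7) as runner:
        outputs = runner.generate(example_prompts, sampling_params)
        assert len(outputs) == len(example_prompts)
        # Each entry is assumed to be (token_ids, texts), as in the
        # suggested change above; the exact text varies run to run.
        _token_ids, texts = outputs[0]
        assert texts and texts[0], "The model should generate non-empty text."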

2 changes: 2 additions & 0 deletions tests/e2e/vllm_interface/vllm_test.cfg
@@ -0,0 +1,2 @@
# Base docker image used to build the vllm-ascend e2e test image, which is built in the vLLM repository
BASE_IMAGE_NAME="quay.io/ascend/cann:8.3.rc1.alpha002-910b-ubuntu22.04-py3.11"
Comment on lines +1 to +2
Contributor

critical

Based on the PR description, this file is intended to be downloaded and executed via source in a CI/CD pipeline. Sourcing shell scripts from a URL, especially one pointing to a personal fork (https://raw.githubusercontent.com/leo-pony/vllm-ascend/...), is a critical security vulnerability that can lead to arbitrary code execution. An attacker who gains control of the source repository can inject malicious commands into this file, which would then be executed by the build pipeline.

This configuration should be managed securely: for example, it could live in the main repository, or be fetched from secure, trusted artifact storage. Please reconsider this approach.
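
One possible shape for that, sketched here under stated assumptions: instead of source-ing the fetched file, pin the URL to an immutable commit SHA, verify a known checksum, and parse the KEY="value" pairs as data rather than executing them. The URL and digest below are illustrative placeholders, not values from this PR.

import hashlib
import re
import urllib.request

# Hypothetical pinned location: a full commit SHA, never a mutable branch.
CFG_URL = ("https://raw.githubusercontent.com/vllm-project/vllm-ascend/"
           "<pinned-commit-sha>/tests/e2e/vllm_interface/vllm_test.cfg")
# Placeholder for the SHA-256 of the reviewed file, recorded in the main repo.
EXPECTED_SHA256 = "<known-good-digest>"


def load_cfg(url: str = CFG_URL) -> dict:
    raw = urllib.request.urlopen(url).read()
    digest = hashlib.sha256(raw).hexdigest()
    if digest != EXPECTED_SHA256:
        raise RuntimeError(f"vllm_test.cfg checksum mismatch: {digest}")
    # Parse simple KEY="value" lines as data instead of executing the file.
    cfg = {}
    for line in raw.decode().splitlines():
        match = re.match(r'^(\w+)="([^"]*)"$', line.strip())
        if match:
            cfg[match.group(1)] = match.group(2)
    return cfg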
