
[AscendScheduler][Bugfix] Remove num_draft_tokens while allocating slots #1718


Merged · 2 commits into vllm-project:main on Jul 10, 2025

Conversation

MengqingCao (Collaborator) commented on Jul 10, 2025

What this PR does / why we need it?

There is no longer any need to calculate num_draft_tokens when allocating slots.

This PR follows the corresponding change in vLLM: vllm-project/vllm#20701
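
For context, the change amounts to dropping the num_draft_tokens computation and its keyword argument from the allocate_slots call in the Ascend scheduler, mirroring upstream. A rough before/after sketch (names follow vLLM's v1 scheduler; the exact call site in vllm-ascend may differ):

    # Before: derive the number of draft (speculative) tokens and pass it
    # to the KV-cache manager when allocating slots.
    num_draft_tokens = max(
        num_new_tokens + request.num_computed_tokens - request.num_tokens, 0)
    new_blocks = self.kv_cache_manager.allocate_slots(
        request,
        num_new_tokens,
        num_draft_tokens=num_draft_tokens,
        num_lookahead_tokens=self.num_lookahead_tokens)

    # After: allocate_slots no longer takes num_draft_tokens; the
    # num_lookahead_tokens argument already accounts for speculative decoding.
    new_blocks = self.kv_cache_manager.allocate_slots(
        request,
        num_new_tokens,
        num_lookahead_tokens=self.num_lookahead_tokens)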

Does this PR introduce any user-facing change?

N/A

How was this patch tested?

CI passed with the existing tests.

@@ -282,15 +282,10 @@ def skip_cur_request():
         req_index += 1
         continue

-        num_draft_tokens = max(
A collaborator commented:

Please use vllm_version_is to keep this working on both 0.9.2 and main.
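
A minimal sketch of such a gate, assuming vllm_version_is from vllm_ascend.utils returns True when the installed vLLM matches the given version string:

    from vllm_ascend.utils import vllm_version_is

    if vllm_version_is("0.9.2"):
        # vLLM 0.9.2 still expects num_draft_tokens here.
        num_draft_tokens = max(
            num_new_tokens + request.num_computed_tokens - request.num_tokens,
            0)
        new_blocks = self.kv_cache_manager.allocate_slots(
            request,
            num_new_tokens,
            num_draft_tokens=num_draft_tokens,
            num_lookahead_tokens=self.num_lookahead_tokens)
    else:
        # On main the argument has been removed upstream.
        new_blocks = self.kv_cache_manager.allocate_slots(
            request,
            num_new_tokens,
            num_lookahead_tokens=self.num_lookahead_tokens)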

MengqingCao (author) replied:
done

github-actions bot added the documentation (Improvements or additions to documentation) label on Jul 10, 2025

codecov bot commented on Jul 10, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 54.57%. Comparing base (c30ddb8) to head (5eacc1b).
Report is 107 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1718       +/-   ##
===========================================
+ Coverage   27.39%   54.57%   +27.18%     
===========================================
  Files          56       80       +24     
  Lines        6191     9968     +3777     
===========================================
+ Hits         1696     5440     +3744     
- Misses       4495     4528       +33     
Flag        Coverage Δ
unittests   54.57% <ø> (+27.18%) ⬆️

Flags with carried forward coverage won't be shown.


Signed-off-by: MengqingCao <cmq0113@163.com>
MODELS = [
"Qwen/Qwen2.5-0.5B-Instruct",
# TODO: REVERT ME when oom is fixed
# "vllm-ascend/Qwen3-30B-A3B-Puring"
A collaborator commented:

Just tested locally; the test passed. I guess it's a resource-release problem on the CI system.

MengqingCao (author) replied:

Maybe; I will raise another PR to fix it.

wangxiyuan merged commit cc210f4 into vllm-project:main on Jul 10, 2025
22 checks passed
Labels: documentation (Improvements or additions to documentation), module:tests

3 participants