[AscendScheduler][Bugfix] Remove num_draft_tokens while allocating slots #1718
@@ -282,15 +282,10 @@ def skip_cur_request():
req_index += 1
continue

num_draft_tokens = max(
Please use vllm_version_is so that this works on both 0.9.2 and main.
done
Signed-off-by: MengqingCao <cmq0113@163.com>
d5051ff to 06be93e
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1718       +/-   ##
===========================================
+ Coverage   27.39%   54.57%   +27.18%
===========================================
  Files          56       80       +24
  Lines        6191     9968     +3777
===========================================
+ Hits         1696     5440     +3744
- Misses       4495     4528       +33
MODELS = [
    "Qwen/Qwen2.5-0.5B-Instruct",
    # TODO: REVERT ME when oom is fixed
    # "vllm-ascend/Qwen3-30B-A3B-Puring"
I just tested this locally and the test passed. I suspect it is a resource-release problem on the CI system.
Maybe. I will raise another PR to fix it.
What this PR does / why we need it?
There is no longer any need to calculate num_draft_tokens when allocating slots. This PR follows the changes in vllm: vllm-project/vllm#20701
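For readers skimming the thread, here is a rough sketch of the kind of version gate requested in review: pass num_draft_tokens only on 0.9.2 and omit it on main. It is not the code from this PR. LegacyManager, CurrentManager, version_is, and allocate_slots_compat are hypothetical stand-ins, and the allocate_slots signatures are assumptions made for illustration only.

```python
# Illustrative sketch only. Every class and function here is a hypothetical
# stand-in; none of it is the real vllm or vllm-ascend API.


class LegacyManager:
    """Stand-in for a 0.9.2-style manager that still accepts num_draft_tokens."""

    def allocate_slots(self, request_id, num_new_tokens,
                       num_draft_tokens=0, num_lookahead_tokens=0):
        # Record the arguments received so the call path can be inspected.
        return {
            "num_new_tokens": num_new_tokens,
            "num_draft_tokens": num_draft_tokens,
            "num_lookahead_tokens": num_lookahead_tokens,
        }


class CurrentManager:
    """Stand-in for a main-style manager that no longer takes num_draft_tokens."""

    def allocate_slots(self, request_id, num_new_tokens, num_lookahead_tokens=0):
        return {
            "num_new_tokens": num_new_tokens,
            "num_lookahead_tokens": num_lookahead_tokens,
        }


def version_is(installed, target):
    """Hypothetical stand-in for a version check such as vllm_version_is."""
    return installed == target


def allocate_slots_compat(manager, installed_version, request_id,
                          num_new_tokens, num_draft_tokens, num_lookahead_tokens):
    """Call allocate_slots with the arguments the installed version expects."""
    if version_is(installed_version, "0.9.2"):
        # Older release: the draft-token count is still passed through.
        return manager.allocate_slots(
            request_id,
            num_new_tokens,
            num_draft_tokens=num_draft_tokens,
            num_lookahead_tokens=num_lookahead_tokens,
        )
    # Newer code: the draft-token count is simply not passed.
    return manager.allocate_slots(
        request_id,
        num_new_tokens,
        num_lookahead_tokens=num_lookahead_tokens,
    )


legacy_result = allocate_slots_compat(LegacyManager(), "0.9.2", "req-0", 4, 2, 1)
current_result = allocate_slots_compat(CurrentManager(), "main", "req-0", 4, 2, 1)
```

In this sketch the gate lives in a single helper, so the legacy branch can be deleted in one place once support for 0.9.2 is dropped.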
Does this PR introduce any user-facing change?
N/A
How was this patch tested?
CI passed with the existing tests.