feat: add mtp ut and fix some bugs #2453
Conversation
Code Review

This pull request enables a unit test for MTP speculative decoding and includes several bug fixes and refactorings to support it. The changes span testing, the attention mechanism, the model definition, and worker logic. The test's correctness check is improved by comparing token IDs instead of text. A critical bug in the attention metadata creation for decode requests is fixed. A new load_weights method is added for the DeepSeek MTP model. My review focuses on the new load_weights implementation, where I've identified a potentially buggy condition for handling expert weights. The rest of the changes appear correct and improve the codebase.
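As a rough illustration of the token-ID based correctness check mentioned above (the variable names, output structure, and matching threshold are assumptions for this sketch, not the PR's actual test code):

```python
# Hedged sketch: compare generated token IDs rather than decoded text.
# `ref_outputs` / `spec_outputs` are assumed to follow vLLM's
# RequestOutput shape; the 2/3 threshold is purely illustrative.
matches = 0
for ref_output, spec_output in zip(ref_outputs, spec_outputs):
    # Comparing token IDs avoids false mismatches caused by
    # detokenization differences between the two runs.
    if ref_output.outputs[0].token_ids == spec_output.outputs[0].token_ids:
        matches += 1

# Speculative decoding should be (nearly) lossless, so most prompts
# are expected to produce identical token sequences.
assert matches > int(0.66 * len(ref_outputs))
```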
vllm_ascend/models/deepseek_mtp.py
Outdated
if (("mlp.experts." in name) and name not in params_dict): | ||
continue |
The condition name not in params_dict is likely incorrect or at least very confusing. name is the key from the checkpoint, and it's not expected to be in params_dict before the name is transformed. This makes the condition almost always true, so the check is effectively if "mlp.experts." in name:.

A more robust implementation, as seen in other parts of vLLM, would be to check whether the transformed name exists in params_dict. This ensures that you only skip weights that are truly not part of the current model's non-expert MLP layers.

Consider changing the condition to be similar to the deepseek_v2 implementation in vLLM for clarity and correctness.
if (("mlp.experts." in name) and name not in params_dict): | |
continue | |
if "mlp.experts." in name and name.replace(weight_name, param_name) not in params_dict: | |
continue |
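For context, here is a minimal sketch of the deepseek_v2-style load_weights loop this suggestion assumes. The stacked_params_mapping entries and the surrounding structure are illustrative, based on vLLM's usual pattern, and are not copied from this PR:

```python
from collections.abc import Iterable

import torch
from vllm.model_executor.model_loader.weight_utils import default_weight_loader


def load_weights(self, weights: Iterable[tuple[str, torch.Tensor]]):
    # (param_name, weight_name, shard_id) — illustrative subset only.
    stacked_params_mapping = [
        ("gate_up_proj", "gate_proj", 0),
        ("gate_up_proj", "up_proj", 1),
    ]
    params_dict = dict(self.named_parameters())
    for name, loaded_weight in weights:
        for param_name, weight_name, shard_id in stacked_params_mapping:
            if weight_name not in name:
                continue
            # The reviewed check: skip expert weights whose *transformed*
            # name is not a parameter of this (non-expert) mapping.
            if ("mlp.experts." in name
                    and name.replace(weight_name, param_name) not in params_dict):
                continue
            name = name.replace(weight_name, param_name)
            param = params_dict[name]
            param.weight_loader(param, loaded_weight, shard_id)
            break
        else:
            # Fall through to the plain (non-stacked) loading path.
            param = params_dict[name]
            weight_loader = getattr(param, "weight_loader", default_weight_loader)
            weight_loader(param, loaded_weight)
```

Checking the transformed name, as in the suggestion, keeps expert weights from being silently dropped while still skipping names that genuinely have no matching parameter.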
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
    },
    max_model_len=256,
    enforce_eager=True)
spec_llm = LLM(
Use VllmRunner; it can clean up the memory automatically.
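A minimal sketch of what the reviewer is asking for, assuming the VllmRunner helper from this repository's test conftest (the import path, model name, generate_greedy call, and speculative_config fields are assumptions and may differ from the actual test code):

```python
# Hedged sketch: exact import path and constructor arguments of
# VllmRunner may differ in this repository.
from tests.conftest import VllmRunner  # assumed location


def test_mtp_with_vllm_runner():
    with VllmRunner(
            "deepseek-ai/DeepSeek-V2-Lite",  # placeholder model
            max_model_len=256,
            enforce_eager=True,
            speculative_config={
                "method": "deepseek_mtp",       # assumed field names
                "num_speculative_tokens": 1,
            },
    ) as runner:
        runner.generate_greedy(["Hello, my name is"], 32)
    # Exiting the context manager tears down the engine and frees
    # device memory automatically, unlike a bare LLM(...) instance.
```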
vllm_ascend/models/deepseek_mtp.py
Outdated
                                  inputs_embeds, spec_step_idx)
    return hidden_states

def load_weights(self, weights: Iterable[tuple[str,
Please check whether vLLM v0.10.1 and main have this problem; if not, please remove this override.
vllm_ascend/models/deepseek_mtp.py
Outdated
                                  inputs_embeds, spec_step_idx)
    return hidden_states

def load_weights(self, weights: Iterable[tuple[str,
If this function is needed, please also add it in ./torchair/models/torchair_deepseek_mtp.py.
force-pushed from 66d0a18 to ab667d7
Codecov Report

❌ Patch coverage is

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2453      +/-   ##
==========================================
+ Coverage   77.70%   78.04%   +0.33%
==========================================
  Files         132      132
  Lines       17521    17557      +36
==========================================
+ Hits        13615    13702      +87
+ Misses       3906     3855      -51

Flags with carried forward coverage won't be shown.
☔ View full report in Codecov by Sentry.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
force-pushed from c1a3799 to 19b5a4b
Signed-off-by: 赵江江 <zhaojiangjiang1@h-partners.com>
### What this PR does / why we need it?
Fix mtp mode ut

### Does this PR introduce _any_ user-facing change?
Nothing

### How was this patch tested?
This can be tested in the same way as a unit test.

- vLLM version: v0.10.0
- vLLM main: vllm-project/vllm@5341565

Signed-off-by: 赵江江 <zhaojiangjiang1@h-partners.com>
Co-authored-by: 赵江江 <zhaojiangjiang1@h-partners.com>
What this PR does / why we need it?
Fix the MTP mode unit test.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
This is tested by running it as a unit test.