[3/N][Refactor][Quantization]remove packed_modules_mapping from models #3021
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request refactors the quantization logic by centralizing the packed_modules_mapping from individual model files into a single dictionary in vllm_ascend/quantization/utils.py. This is a good improvement for maintainability. However, I've found a critical issue: the mapping for the qwen2_5_vl model was removed but not added to the new centralized map, which will likely break quantization for that model. I've also suggested a cleanup in get_quant_method to remove a now-unused parameter, improving code clarity.
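For orientation, here is a minimal sketch of what such a centralized map in vllm_ascend/quantization/utils.py could look like, keyed by the Hugging Face model_type. The qkv_proj/gate_up_proj entries below are typical vLLM packed-module names used purely for illustration, not necessarily the exact contents of this PR:

# Sketch only: keyed by HF model_type; each value is the packed_modules_mapping
# that the corresponding custom model class used to define.
packed_modules_model_mapping = {
    "qwen3_next": {
        # Illustrative entries: fused module -> list of original sub-projections.
        "qkv_proj": ["q_proj", "k_proj", "v_proj"],
        "gate_up_proj": ["gate_proj", "up_proj"],
    },
    # Per the review note above, "qwen2_5_vl" was dropped from its model file
    # but is missing here, which is the flagged issue.
}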
"experts": | ||
["experts.0.gate_proj", "experts.0.up_proj", "experts.0.down_proj"] | ||
}, | ||
"qwen3_next": { |
Once the model files of qwen3-next and qwen2.5-vl are removed from vllm-ascend, this mapping can be removed as well.
cc @wxsIcey please clean up this qwen3-next one as well.
                         prefix: str) -> Optional["QuantizeMethodBase"]:
        vllm_config = get_current_vllm_config()
        model_type = vllm_config.model_config.hf_config.model_type
        if model_type in packed_modules_model_mapping:
-        if model_type in packed_modules_model_mapping:
+        if model_type in packed_modules_model_mapping.keys():

Maybe it should be this?
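As a side note on the suggestion above: in Python, `in` on a dict already tests key membership, so the two forms behave identically and the .keys() call is redundant. A quick check:

# Membership on a dict tests its keys, so both expressions are equivalent.
mapping = {"qwen3_next": {}}

assert ("qwen3_next" in mapping) == ("qwen3_next" in mapping.keys())   # both True
assert ("qwen2_5_vl" in mapping) == ("qwen2_5_vl" in mapping.keys())   # both False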
What this PR does / why we need it?
Some custom models in vllm-ascend define packed_modules_mapping, which prevents them from sharing the same model classes as the vLLM community. This PR moves these custom packed_modules_mapping entries into the quantization utils.py. After this PR, some custom model files can be removed.
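A rough sketch of the pattern this PR moves to, based on the diff excerpt above. The wrapper function name resolve_packed_modules_mapping is hypothetical; the real code lives in vllm_ascend/quantization/utils.py and may be structured differently:

from typing import Optional

from vllm.config import get_current_vllm_config

# Hypothetical sketch: one central table replaces the per-model attribute.
packed_modules_model_mapping = {
    "qwen3_next": {"gate_up_proj": ["gate_proj", "up_proj"]},  # illustrative entry
}


def resolve_packed_modules_mapping() -> Optional[dict]:
    """Look up the packed_modules_mapping for the currently loaded model, if any."""
    vllm_config = get_current_vllm_config()
    model_type = vllm_config.model_config.hf_config.model_type
    if model_type in packed_modules_model_mapping:
        return packed_modules_model_mapping[model_type]
    return None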
Does this PR introduce any user-facing change?
tested by CI
How was this patch tested?
tested by CI