
Conversation

@22dimensions (Collaborator) commented Sep 19, 2025

What this PR does / why we need it?

Some custom models in vllm-ascend define their own packed_modules_mapping, which prevents reusing the same model classes as the vLLM community. This PR moves these custom packed_modules_mapping definitions into the quantization utils.py. After this PR, some custom model files can be removed.
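For illustration, a minimal sketch of what the centralized mapping in vllm_ascend/quantization/utils.py could look like; the model key and sub-module lists below are placeholders, not the actual entries from the diff:

```python
# vllm_ascend/quantization/utils.py (illustrative sketch only)
# Maps a HuggingFace model_type to the packed_modules_mapping that the
# corresponding custom model class previously declared on itself.
packed_modules_model_mapping = {
    "some_model_type": {  # placeholder key; real keys include e.g. "qwen3_next"
        "qkv_proj": ["q_proj", "k_proj", "v_proj"],
        "gate_up_proj": ["gate_proj", "up_proj"],
    },
    # ... one entry per custom model that used to define its own mapping
}
```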

Does this PR introduce any user-facing change?

tested by CI

How was this patch tested?

tested by CI


👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description, to help reviewers and future developers understand.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request refactors the quantization logic by centralizing the packed_modules_mapping from individual model files into a single dictionary in vllm_ascend/quantization/utils.py. This is a good improvement for maintainability. However, I've found a critical issue where the mapping for the qwen2_5_vl model was removed but not added to the new centralized map, which will likely break quantization for that model. I've also suggested a cleanup in get_quant_method to remove a now-unused parameter, improving code clarity.
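For context, a rough sketch of how such a centralized lookup could be consumed from get_quant_method; the helper name resolve_packed_modules_mapping and the import of the dict from vllm_ascend.quantization.utils are assumptions for illustration, while the config lookup itself mirrors the lines shown in the diff further below:

```python
from typing import Optional

from vllm.config import get_current_vllm_config

# Assumption: the centralized dict is importable from the new utils module.
from vllm_ascend.quantization.utils import packed_modules_model_mapping


def resolve_packed_modules_mapping() -> Optional[dict]:
    """Return the packed_modules_mapping for the currently configured model,
    or None if no centralized mapping is registered for its model_type."""
    vllm_config = get_current_vllm_config()
    model_type = vllm_config.model_config.hf_config.model_type
    if model_type in packed_modules_model_mapping:
        return packed_modules_model_mapping[model_type]
    return None
```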

@22dimensions force-pushed the remove_packed_module branch 4 times, most recently from 09b8b4e to c693211, on September 19, 2025 03:42
@wangxiyuan added the ready (read for review) and ready-for-test (start test by label for PR) labels on Sep 19, 2025
"experts":
["experts.0.gate_proj", "experts.0.up_proj", "experts.0.down_proj"]
},
"qwen3_next": {
A collaborator commented:

Once the model files for qwen3-next and qwen2.5-vl are removed from vllm-ascend, this mapping can be removed as well.

cc @wxsIcey, please clean up the qwen3-next entry as well.

Diff context:

```python
prefix: str) -> Optional["QuantizeMethodBase"]:
    vllm_config = get_current_vllm_config()
    model_type = vllm_config.model_config.hf_config.model_type
    if model_type in packed_modules_model_mapping:
```
A collaborator commented:

Suggested change:

```diff
-if model_type in packed_modules_model_mapping:
+if model_type in packed_modules_model_mapping.keys():
```

Maybe it should be this?
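For reference, `key in mapping` and `key in mapping.keys()` are equivalent for Python dicts, since membership on a dict tests its keys, so the original form is already idiomatic. A toy check (placeholder dict contents):

```python
packed_modules_model_mapping = {"qwen3_next": {}}  # toy example

# Both expressions are equivalent; `in` on a dict checks the keys.
assert "qwen3_next" in packed_modules_model_mapping
assert "qwen3_next" in packed_modules_model_mapping.keys()
```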

Signed-off-by: 22dimensions <waitingwind@foxmail.com>
@wangxiyuan merged commit 0942d9a into vllm-project:main on Sep 19, 2025
17 checks passed
Labels: module:quantization, module:tests, ready (read for review), ready-for-test (start test by label for PR)