
Conversation

Contributor

@Pr0Wh1teGivee Pr0Wh1teGivee commented Aug 29, 2025

What this PR does / why we need it?

In a mixed-precision scenario, quant_config is not None, but MoE still needs to perform unquantized computation; currently, quantized computation is used instead. This PR therefore moves the with_quant logic into forward, avoiding this misjudgment in mixed-precision scenarios.
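A minimal, self-contained sketch of the idea behind the fix (class and method names here are illustrative stand-ins, not the actual vllm-ascend API): whether to take the quantized path is decided per layer at forward time, from the layer's own quant method, rather than inferred once from quant_config being set.

```python
class DummyQuantMethod:
    """Stand-in for a layer-level quantized compute method."""
    def apply(self, x):
        return [v * 2 for v in x]  # placeholder for a quantized kernel

class FusedMoELayer:
    def __init__(self, quant_method=None):
        # In mixed precision, quant_config may be set globally while this
        # particular MoE layer still has no quant method attached.
        self.quant_method = quant_method

    def unquantized_apply(self, x):
        return list(x)  # placeholder for the float kernel

    def forward(self, x):
        # Decide per call, from this layer's state, instead of trusting
        # quant_config alone.
        with_quant = self.quant_method is not None
        if with_quant:
            return self.quant_method.apply(x)
        return self.unquantized_apply(x)

print(FusedMoELayer().forward([1, 2]))                    # unquantized path
print(FusedMoELayer(DummyQuantMethod()).forward([1, 2]))  # quantized path
```

Under this sketch, an unquantized MoE layer in a mixed-precision model takes the float path even though quant_config is not None at the model level.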

Does this PR introduce any user-facing change?

no

How was this patch tested?

e2e & ut

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors how quantization is handled by removing the with_quant flag from the forward context and instead passing it as an explicit parameter down the call stack. This is a good improvement for code clarity and maintainability. The changes are consistent across the modified files. However, I found a critical syntax error due to a full-width comma being used instead of a standard one, which will break the code.
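The refactor described above can be sketched as follows (all names are hypothetical stand-ins, not the real vllm-ascend call stack): the with_quant flag moves from implicit shared state to an explicit parameter passed down the call stack.

```python
# Before: the flag is read from shared forward-context state.
_forward_context = {"with_quant": True}

def fused_experts_old(hidden_states):
    with_quant = _forward_context["with_quant"]  # hidden dependency
    return _apply(hidden_states, with_quant)

# After: the caller passes the flag explicitly.
def fused_experts_new(hidden_states, with_quant: bool):
    return _apply(hidden_states, with_quant)

def _apply(hidden_states, with_quant):
    # Stand-in for dispatching to quantized vs. floating-point kernels.
    scale = 2 if with_quant else 1
    return [h * scale for h in hidden_states]
```

The explicit parameter makes each call site's quantization behavior visible and testable in isolation, instead of depending on context set up elsewhere.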

shared_gate_up=shared_gate_up,
shared_dequant_scale=shared_dequant_scale,
mc2_mask=kwargs.get("mc2_mask", None))
Contributor

critical

There is a syntax error here. A full-width comma (，) is used instead of a standard comma (,), which will cause a SyntaxError.

Suggested change
mc2_mask=kwargs.get("mc2_mask", None)
mc2_mask=kwargs.get("mc2_mask", None),
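The reviewer's point is easy to verify directly: CPython rejects the full-width comma (U+FF0C) as an invalid character at parse time, while the ASCII comma parses fine. A small self-contained check:

```python
def parses(src: str) -> bool:
    """Return True if src is syntactically valid Python, False otherwise."""
    try:
        compile(src, "<snippet>", "eval")  # parse only; nothing is executed
        return True
    except SyntaxError:
        return False

ascii_comma = 'f(1, 2)'
fullwidth_comma = 'f(1\uff0c 2)'  # \uff0c is '，', FULLWIDTH COMMA

print(parses(ascii_comma))      # True
print(parses(fullwidth_comma))  # False
```

Full-width punctuation is easy to introduce when typing with a CJK input method, and it is visually close enough to ASCII punctuation that it often survives manual review; a linter or the interpreter catches it immediately.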


👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by fulfilling the PR description, to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@Pr0Wh1teGivee Pr0Wh1teGivee force-pushed the fix_with_quant branch 2 times, most recently from e0175d8 to 4129bc8 on August 29, 2025 04:32
@Pr0Wh1teGivee Pr0Wh1teGivee changed the title Fix with quant [main] [bugfix] Fix with quant Aug 29, 2025
Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>

codecov bot commented Aug 29, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.60%. Comparing base (600b08f) to head (8071af7).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2627      +/-   ##
==========================================
- Coverage   72.61%   72.60%   -0.01%     
==========================================
  Files         147      147              
  Lines       21805    21792      -13     
==========================================
- Hits        15833    15822      -11     
+ Misses       5972     5970       -2     
Flag Coverage Δ
unittests 72.60% <100.00%> (-0.01%) ⬇️

@Pr0Wh1teGivee
Contributor Author

The error from the e2e test is not introduced by this PR.

@Pr0Wh1teGivee Pr0Wh1teGivee changed the title [main] [bugfix] Fix with quant [main] [bugfix] Fix misjudging quantized/unquantized scenarios Aug 29, 2025
@wangxiyuan wangxiyuan merged commit 52aff9e into vllm-project:main Aug 29, 2025
31 of 35 checks passed
weijinqian0 pushed a commit to weijinqian0/vllm-ascend that referenced this pull request Aug 29, 2025
…project#2627)

### What this PR does / why we need it?
In a mixed-precision scenario, quant_config is not None, but MoE needs
to perform unquantized computation; however, quantized computation is
currently being used. Therefore, we put the with_quant logic into
forward, avoid misjudging in mix-precision scenarios.
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
e2e & ut

- vLLM version: v0.10.1.1
- vLLM main:
vllm-project/vllm@98ac0cb

Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>
weijinqian0 pushed a commit to weijinqian0/vllm-ascend that referenced this pull request Aug 29, 2025
…project#2627)
Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
MengqingCao pushed a commit that referenced this pull request Sep 2, 2025
### What this PR does / why we need it?
Fix MTP torchair bug caused by torchair refactor and moe refactor

Depends on PRs:
fused moe fix: #2627 
torchair multi DP fix:
#2626

### Does this PR introduce _any_ user-facing change?
when dp is enabled, to run mtp online server, need to disable server log
due to the current metrics does not support multi dp
`--disable-log-stats`
### How was this patch tested?


- vLLM version: v0.10.1.1
- vLLM main:
vllm-project/vllm@7c8271c

Signed-off-by: xuyexiong <xuyexiong@huawei.com>
wenba0 pushed a commit to wenba0/vllm-ascend that referenced this pull request Sep 5, 2025
…project#2627)
Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>
Signed-off-by: lijiaojiao <lijiaojiao990304@163.com>
offline893 pushed a commit to offline893/vllm-ascend that referenced this pull request Sep 16, 2025
Fix MTP torchair bug caused by torchair refactor and moe refactor
Signed-off-by: xuyexiong <xuyexiong@huawei.com>
Signed-off-by: offline0806 <z00858301@china.huawei.com>
wangxiaoteng888 pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Sep 25, 2025
…project#2627)
Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>
wangxiaoteng888 pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Sep 25, 2025
Fix MTP torchair bug caused by torchair refactor and moe refactor
Signed-off-by: xuyexiong <xuyexiong@huawei.com>
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Sep 26, 2025
…project#2627)
Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Sep 26, 2025
Fix MTP torchair bug caused by torchair refactor and moe refactor
Signed-off-by: xuyexiong <xuyexiong@huawei.com>