
[main] Use AddRmsNormQuant ops in the custom model to optimize Qwen3's performance #1806


Open · wants to merge 1 commit into base: main

Conversation

rjg-lyh (Contributor) commented Jul 15, 2025

What this PR does / why we need it?

Optimizes the performance of the Qwen3 quantization model by registering a custom model and adding the AddRmsNormQuant operation. Subsequent PRs will focus on performance optimizations based on this custom model.
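For context, AddRmsNormQuant fuses the residual add, RMS normalization, and activation quantization into a single step, so the intermediate normalized tensor never needs to be materialized separately before quantization. A minimal pure-Python sketch of the math such a fused op computes (the function name, the symmetric per-tensor int8 scheme, and the returned `(quantized, residual)` pair are illustrative assumptions, not the actual Ascend kernel's API):

```python
import math

def add_rms_norm_quant(x, residual, weight, scale, eps=1e-6):
    """Reference math for a fused add + RMSNorm + int8 quant step.

    x, residual, weight: lists of floats (one hidden vector);
    scale: per-tensor quantization scale (illustrative symmetric int8).
    Returns (int8 values, updated residual), as fused kernels typically
    hand the pre-norm sum back for the next layer's residual stream.
    """
    # 1. Residual add (the updated residual feeds the next layer).
    h = [a + b for a, b in zip(x, residual)]
    # 2. RMS normalization: h / rms(h), scaled by the learned weight.
    rms = math.sqrt(sum(v * v for v in h) / len(h) + eps)
    normed = [v / rms * w for v, w in zip(h, weight)]
    # 3. Symmetric int8 quantization of the normalized activations.
    q = [max(-128, min(127, round(v / scale))) for v in normed]
    return q, h

q, new_residual = add_rms_norm_quant(
    [1.0, -2.0], [0.5, 0.5], [1.0, 1.0], scale=0.01)
```

Doing these three steps in one kernel avoids two extra round-trips through device memory per layer, which is where the performance win comes from.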

Does this PR introduce any user-facing change?

No.

How was this patch tested?

CI passed with existing tests.

@rjg-lyh rjg-lyh force-pushed the pr-addrmsnorm-main branch 2 times, most recently from 6ee87ad to 3ee0a48 Compare July 15, 2025 10:17

codecov bot commented Jul 15, 2025

Codecov Report

Attention: Patch coverage is 49.38272% with 41 lines in your changes missing coverage. Please review.

Project coverage is 54.86%. Comparing base (bf25498) to head (3ee0a48).

Files with missing lines Patch % Lines
vllm_ascend/models/qwen3.py 48.43% 33 Missing ⚠️
vllm_ascend/ops/layernorm.py 27.27% 8 Missing ⚠️

❌ Your patch check has failed because the patch coverage (49.38%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1806      +/-   ##
==========================================
- Coverage   54.93%   54.86%   -0.07%     
==========================================
  Files          80       81       +1     
  Lines        9712     9789      +77     
==========================================
+ Hits         5335     5371      +36     
- Misses       4377     4418      +41     
Flag Coverage Δ
unittests 54.86% <49.38%> (-0.07%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.

@@ -0,0 +1,156 @@
from collections.abc import Iterable
wangxiyuan (Collaborator) commented Jul 16, 2025
Do not rewrite the model arch if the only change is AddRMSNormW8A8Quant.

Since 0.9.2, vLLM supports custom op overrides, so we can register our ops when setting up vllm-ascend; take #1647 for example.

…s performance

Signed-off-by: rjg-lyh <1318825571@qq.com>
@rjg-lyh rjg-lyh force-pushed the pr-addrmsnorm-main branch from b50de2c to 1b0e244 Compare July 19, 2025 07:43
2 participants