
[main] Use AddRmsNormQuant ops in the custom model to optimize Qwen3's performance #1806


Open · wants to merge 1 commit into base: main

Conversation

rjg-lyh (Contributor) commented Jul 15, 2025

What this PR does / why we need it?

Optimizes the performance of the Qwen3 quantization model by registering a custom model and adding the AddRmsNormQuant operation. Subsequent PRs will focus on performance optimizations based on this custom model.
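For context, AddRmsNormQuant fuses the residual add, RMS normalization, and activation quantization into a single step, so the intermediate normalized tensor never needs to be materialized separately before quantization. A minimal pure-Python sketch of the math such a fused op computes (the function name, the symmetric per-tensor int8 scheme, and the returned `(quantized, residual)` pair are illustrative assumptions, not the actual Ascend kernel's API):

```python
import math

def add_rms_norm_quant(x, residual, weight, scale, eps=1e-6):
    """Reference math for a fused add + RMSNorm + int8 quant step.

    x, residual, weight: lists of floats (one hidden vector);
    scale: per-tensor quantization scale (illustrative symmetric int8).
    Returns (int8 values, updated residual), as fused kernels typically
    hand the pre-norm sum back for the next layer's residual stream.
    """
    # 1. Residual add (the updated residual feeds the next layer).
    h = [a + b for a, b in zip(x, residual)]
    # 2. RMS normalization: h / rms(h), scaled by the learned weight.
    rms = math.sqrt(sum(v * v for v in h) / len(h) + eps)
    normed = [v / rms * w for v, w in zip(h, weight)]
    # 3. Symmetric int8 quantization of the normalized activations.
    q = [max(-128, min(127, round(v / scale))) for v in normed]
    return q, h

q, new_residual = add_rms_norm_quant(
    [1.0, -2.0], [0.5, 0.5], [1.0, 1.0], scale=0.01)
```

Doing these three steps in one kernel avoids two extra round-trips through device memory per layer, which is where the performance win comes from.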

Does this PR introduce any user-facing change?

No.

How was this patch tested?

CI passed with existing tests.

@rjg-lyh rjg-lyh force-pushed the pr-addrmsnorm-main branch 2 times, most recently from 6ee87ad to 3ee0a48 Compare July 15, 2025 10:17

codecov bot commented Jul 15, 2025

Codecov Report

Attention: Patch coverage is 49.38272% with 41 lines in your changes missing coverage. Please review.

Project coverage is 54.86%. Comparing base (bf25498) to head (3ee0a48).

Files with missing lines Patch % Lines
vllm_ascend/models/qwen3.py 48.43% 33 Missing ⚠️
vllm_ascend/ops/layernorm.py 27.27% 8 Missing ⚠️

❌ Your patch check has failed because the patch coverage (49.38%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1806      +/-   ##
==========================================
- Coverage   54.93%   54.86%   -0.07%     
==========================================
  Files          80       81       +1     
  Lines        9712     9789      +77     
==========================================
+ Hits         5335     5371      +36     
- Misses       4377     4418      +41     
Flag Coverage Δ
unittests 54.86% <49.38%> (-0.07%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.

@@ -0,0 +1,156 @@
from collections.abc import Iterable
wangxiyuan (Collaborator) commented Jul 16, 2025
Do not rewrite the model arch if the only change is AddRMSNormW8A8Quant.

Since 0.9.2, vLLM supports custom op overrides, so we can register our ops when setting up vllm-ascend; take #1647 for example.

…s performance

Signed-off-by: rjg-lyh <1318825571@qq.com>
@rjg-lyh rjg-lyh force-pushed the pr-addrmsnorm-main branch from b50de2c to 1b0e244 Compare July 19, 2025 07:43
2 participants