
Conversation

anon189Ty (Contributor) commented Aug 28, 2025

What this PR does / why we need it?

Currently, when the model's Linear layers execute in vLLM-Ascend, the weight input format is ND in both the unquantized case and the skipped-ascend case, which is slower than FRACTAL_NZ.
This PR supplements the execution logic for the Linear layer: when VLLM_ASCEND_ENABLE_MLP_OPTIMIZE=1 and the CANN version is 8.3, the Linear layer weights are converted to FRACTAL_NZ, in both the unquantized case and the skipped-ascend case.
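
As a rough sketch of the gating described above (the helper name and the ACL constant value are illustrative assumptions, not the PR's actual code):

```python
import os

import torch
import torch_npu  # Ascend adapter providing npu_format_cast

# Assumed ACL enum value for FRACTAL_NZ (29 in common Ascend usage).
ACL_FORMAT_FRACTAL_NZ = 29


def maybe_cast_to_nz(weight: torch.Tensor) -> torch.Tensor:
    """Return the weight cast to FRACTAL_NZ when the optimization applies."""
    enabled = os.environ.get("VLLM_ASCEND_ENABLE_MLP_OPTIMIZE") == "1"
    if enabled and torch.version.cann.startswith("8.3"):
        return torch_npu.npu_format_cast(weight, ACL_FORMAT_FRACTAL_NZ)
    return weight
```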

Does this PR introduce any user-facing change?

How was this patch tested?


👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request introduces an optimization for unquantized linear layers on Ascend hardware by converting weights to the FRACTAL_NZ format for CANN 8.3. This is implemented through a new AscendUnquantizedLinearMethod. The changes correctly propagate this new method to various linear layers, including a new AscendQKVParallelLinear for QKV projections, and the tests are updated accordingly. However, I've found a couple of critical typos in type hints that would lead to NameErrors, and an indentation error in a test file that would cause a SyntaxError. These issues need to be addressed before merging.

Comment on lines 101 to 103
self.quant_method: Qptional[
    QuantizeMethodBase] = AscendUnquantizedLinearMethod()
Contributor


critical

There is a typo in the type hint. Qptional should be Optional. This will cause a NameError at runtime as Qptional is not defined.

Suggested change
- self.quant_method: Qptional[
-     QuantizeMethodBase] = AscendUnquantizedLinearMethod()
+ self.quant_method: Optional[
+     QuantizeMethodBase] = AscendUnquantizedLinearMethod()

Comment on lines 174 to 176
self.quant_method: Qptional[
    QuantizeMethodBase] = AscendUnquantizedLinearMethod()
Contributor


critical

There is a typo in the type hint. Qptional should be Optional. This will cause a NameError at runtime as Qptional is not defined.

Suggested change
- self.quant_method: Qptional[
-     QuantizeMethodBase] = AscendUnquantizedLinearMethod()
+ self.quant_method: Optional[
+     QuantizeMethodBase] = AscendUnquantizedLinearMethod()

expect_data = torch_npu.npu_format_cast(
    expect_data, ACL_FORMAT_FRACTAL_NZ)
self.assertTrue(torch.equal(layer.weight.data, expect_data))

Contributor


critical

This line contains only whitespace, but it's indented, which will cause an IndentationError. Please remove this line.
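
For reference, the quoted fragment with the stray indented blank line removed and the continuation indented consistently (names such as expect_data, layer, and ACL_FORMAT_FRACTAL_NZ come from the surrounding test and are assumed here):

```python
# Same assertion as above, minus the whitespace-only line that broke parsing.
expect_data = torch_npu.npu_format_cast(expect_data, ACL_FORMAT_FRACTAL_NZ)
self.assertTrue(torch.equal(layer.weight.data, expect_data))
```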


This pull request has conflicts, please resolve those before we can evaluate the pull request.

@anon189Ty anon189Ty changed the title Unquantized linear nz support [Feat] Unquantized linear nz support Aug 29, 2025

def process_weights_after_loading(self, layer: torch.nn.Module) -> None:
    super().process_weights_after_loading(layer)
    if torch.version.cann.startswith("8.3"):
Collaborator


Is a cudagraph check necessary?
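
For context, a completed version of the quoted hook might look like this sketch (the ACL constant value and the in-place cast are assumptions based on torch_npu usage elsewhere in this thread, not the PR's exact diff; the graph-mode guard asked about above is left out):

```python
import torch
import torch_npu
from vllm.model_executor.layers.linear import UnquantizedLinearMethod

# Assumed ACL enum value for FRACTAL_NZ (29 in common Ascend usage).
ACL_FORMAT_FRACTAL_NZ = 29


class AscendUnquantizedLinearMethod(UnquantizedLinearMethod):
    """Unquantized linear method that prefers FRACTAL_NZ weights on CANN 8.3."""

    def process_weights_after_loading(self, layer: torch.nn.Module) -> None:
        super().process_weights_after_loading(layer)
        if torch.version.cann.startswith("8.3"):
            # Cast the ND-format weight to FRACTAL_NZ for faster matmuls.
            layer.weight.data = torch_npu.npu_format_cast(
                layer.weight.data, ACL_FORMAT_FRACTAL_NZ)
```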

@anon189Ty anon189Ty force-pushed the unquantized_linear_nz_support branch from 2ac497c to 24773fd on August 29, 2025 at 15:25

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@anon189Ty anon189Ty force-pushed the unquantized_linear_nz_support branch from 24773fd to 11e14dd on August 29, 2025 at 16:02
@anon189Ty anon189Ty force-pushed the unquantized_linear_nz_support branch 13 times, most recently from 864f9ae to c30eed9 on August 31, 2025 at 12:56
@anon189Ty anon189Ty force-pushed the unquantized_linear_nz_support branch 4 times, most recently from e5f110f to 66e6579 on September 3, 2025 at 02:26

github-actions bot commented Sep 7, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.


github-actions bot commented Sep 8, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Signed-off-by: anon189Ty <Stari_Falcon@outlook.com>
@anon189Ty anon189Ty force-pushed the unquantized_linear_nz_support branch from 53fdccd to 0bb2a32 on September 10, 2025 at 07:02
@MengqingCao MengqingCao added the ready (read for review) and ready-for-test (start test by label for PR) labels Sep 11, 2025
yiz-liu (Collaborator) commented Sep 11, 2025

@MengqingCao All clear, except for some known issues that are not introduced by this PR; please merge this ASAP.

@MengqingCao MengqingCao merged commit 7b2ecc1 into vllm-project:main Sep 11, 2025
34 of 35 checks passed
yiz-liu pushed a commit to linfeng-yuan/vllm-ascend that referenced this pull request Sep 12, 2025
Yikun added a commit to Yikun/vllm-ascend that referenced this pull request Sep 12, 2025
Yikun added a commit to Yikun/vllm-ascend that referenced this pull request Sep 12, 2025
This reverts commit 7b2ecc1.

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
anon189Ty added a commit to anon189Ty/vllm-ascend that referenced this pull request Sep 12, 2025
wangxiyuan pushed a commit that referenced this pull request Sep 12, 2025
### What this PR does / why we need it?
This reverts commit 7b2ecc1.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

- vLLM version: main
- vLLM main:
vllm-project/vllm@64d90c3

Closes: #2890
Closes: #2887
Closes: #2885

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
offline893 pushed a commit to offline893/vllm-ascend that referenced this pull request Sep 16, 2025
offline893 pushed a commit to offline893/vllm-ascend that referenced this pull request Sep 16, 2025
wangxiaoteng888 pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Sep 25, 2025
wangxiaoteng888 pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Sep 25, 2025
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Sep 26, 2025
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Sep 26, 2025