[Feat] Sage Attention Kernels Support for sm80, sm89, sm90 #9848

l1cacheDell · 2025-02-11T15:12:20Z

PR types

New features | Others: add new kernels to accelerate LLM prefilling.

PR changes

See file changes in csrc/gpu/sage_attn_kernels/*

Description

This PR adds Sage Attention kernels implemented for Paddle and PaddleNLP.

Sage Attention uses 8-bit quantization to accelerate attention inference. It is similar work to FA2 and FA3, but gives a 1.1x to 2.1x speedup over Flash Attention with even lower GPU memory allocation. See the official SageAttention repository and the paper (accepted at ICLR 2025) for more info.
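To illustrate the general idea of 8-bit attention scores, here is a minimal sketch in plain Python. It assumes simple per-row symmetric INT8 quantization of Q and K, which is a simplification chosen for illustration; it is not the CUDA kernels in this PR, and the official SageAttention implementation uses its own quantization granularity and extra steps. All function names below are hypothetical.

```python
# Sketch only: per-row symmetric INT8 quantization of Q and K before Q @ K^T.
# This is NOT the PR's kernel code; every name here is hypothetical.

def quantize_row(row):
    """Map a row of floats to integers in [-127, 127] plus one float scale."""
    max_abs = max(abs(x) for x in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(x / scale))) for x in row]
    return q, scale

def int8_scores(Q, K):
    """Approximate Q @ K^T with integer dot products, then dequantize."""
    q_rows = [quantize_row(r) for r in Q]
    k_rows = [quantize_row(r) for r in K]
    scores = []
    for q, q_scale in q_rows:
        out = []
        for k, k_scale in k_rows:
            acc = sum(a * b for a, b in zip(q, k))  # integer accumulation
            out.append(acc * q_scale * k_scale)     # dequantize with both scales
        scores.append(out)
    return scores

def float_scores(Q, K):
    """Reference Q @ K^T in floating point."""
    return [[sum(a * b for a, b in zip(q, k)) for k in K] for q in Q]

Q = [[0.5, -1.0, 0.25, 2.0], [1.5, 0.0, -0.75, 0.1]]
K = [[1.0, 0.5, -0.5, 0.2], [-0.3, 0.8, 1.2, -1.0]]

approx = int8_scores(Q, K)
exact = float_scores(Q, K)
max_err = max(abs(a - e) for ra, re in zip(approx, exact) for a, e in zip(ra, re))
```

On this toy input the quantized scores stay close to the floating-point reference, which is the property that lets an 8-bit kernel trade a small amount of precision for speed and memory.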

This PR is currently a work in progress.

Reviewing this PR may take a lot...

paddle-bot · 2025-02-11T15:12:26Z

Thanks for your contribution!

codecov · 2025-02-11T15:47:56Z

Codecov Report

Attention: Patch coverage is 0.32787% with 304 lines in your changes missing coverage. Please review.

Project coverage is 50.35%. Comparing base (a381674) to head (2004a8a).
Report is 311 commits behind head on develop.

Files with missing lines	Patch %	Lines
...ddlenlp/experimental/transformers/sageattention.py	0.00%	271 Missing ⚠️
...erimental/transformers/fused_transformer_layers.py	0.00%	33 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #9848      +/-   ##
===========================================
- Coverage    50.48%   50.35%   -0.13%     
===========================================
  Files          755      756       +1     
  Lines       121257   121559     +302     
===========================================
- Hits         61215    61214       -1     
- Misses       60042    60345     +303

paddlenlp/experimental/transformers/fused_transformer_layers.py

csrc/gpu/sage_attn_kernels/sageattn_fused.cu

yuanlehome

LGTM

add sage attn sm90 kernels

3e9fe77

paddle-bot bot added the contributor label Feb 11, 2025

paddle-bot bot assigned ZHUI Feb 11, 2025

l1cacheDell marked this pull request as draft February 11, 2025 16:13

l1cacheDell added 3 commits February 12, 2025 18:54

fix

ca60579

add ds sageattn kernel

7bb2c79

update kernels

0a52545

l1cacheDell marked this pull request as ready for review February 25, 2025 06:29

update setup_cuda.py

787011b

l1cacheDell changed the title ~~[Feat] Sage Attention Kernels Support for sm89, sm90~~ [Feat] Sage Attention Kernels Support for sm80, sm89, sm90 Feb 25, 2025

l1cacheDell added 10 commits February 25, 2025 17:20

update dsk MLA kernel

1b9e4e0

Merge branch 'develop' into sageattn_support_ds

ac29e3c

clean PR branch

c460cce

fix sa usage

452a9de

bugfix

ec505b6

modify, for static mode inference SA

17a6bd8

add license info

7035e1e

add license info for py file

38ea097

modify license info

c42d1f5

modify license info

22894d7

chang-wenbin approved these changes Mar 4, 2025

yuanlehome reviewed Mar 5, 2025

paddlenlp/experimental/transformers/fused_transformer_layers.py (outdated)

yuanlehome reviewed Mar 5, 2025

csrc/gpu/sage_attn_kernels/sageattn_fused.cu (outdated)

l1cacheDell added 5 commits March 6, 2025 12:19

bsz=1 assert

41f2900

fix kernel

44350a1

move to import line

490db5a

Merge remote-tracking branch 'nlpp/develop' into exp_sa_merge

7e9730d

merge develop & support wint8&fp8

2004a8a

yuanlehome approved these changes Mar 6, 2025

ZHUI merged commit b36d306 into PaddlePaddle:develop Mar 6, 2025
9 of 12 checks passed
