
[Feat] Sage Attention Kernels Support for sm80, sm89, sm90 #9848


Merged: 20 commits merged into PaddlePaddle:develop on Mar 6, 2025

Conversation

l1cacheDell
Contributor

PR types

New features | Others: add new kernels to accelerate LLM prefilling.

PR changes

See file changes in csrc/gpu/sage_attn_kernels/*

Description

This PR adds SageAttention kernels implemented for Paddle and PaddleNLP.

SageAttention provides 8-bit acceleration of attention inference. It is similar in spirit to FlashAttention-2 and FlashAttention-3, but delivers a 1.1x - 2.1x speedup over FlashAttention with even lower GPU memory allocation. See the SageAttention official repository and their paper (accepted at ICLR 2025) for more info.
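For reviewers unfamiliar with the method, below is a minimal NumPy sketch of the core SageAttention idea: smooth K, quantize Q and K to INT8, compute Q K^T in integer arithmetic, then dequantize and run softmax and P V in floating point. This is purely illustrative and uses per-tensor quantization and a single head for simplicity; the function names are hypothetical and are not this PR's API (the real implementation lives in the fused CUDA kernels under csrc/gpu/sage_attn_kernels/* and quantizes per block).

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization; returns int8 values and a scale."""
    scale = np.abs(x).max() / 127.0 + 1e-8
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def sage_attention_reference(q, k, v):
    """q, k, v: [seq_len, head_dim] float32 arrays for a single attention head."""
    # Smooth K: subtracting the per-channel mean of K over the sequence reduces
    # quantization error; it shifts every score in a row of Q @ K^T by the same
    # constant, so the softmax output is unchanged.
    k = k - k.mean(axis=0, keepdims=True)

    q_int8, q_scale = quantize_int8(q)
    k_int8, k_scale = quantize_int8(k)

    # Integer matmul (accumulated in int32), then dequantize with the scales.
    scores = q_int8.astype(np.int32) @ k_int8.astype(np.int32).T
    scores = scores.astype(np.float32) * (q_scale * k_scale)
    scores /= np.sqrt(q.shape[-1])

    # Softmax and the P @ V product stay in floating point
    # (the real kernels keep these stages in half precision).
    scores -= scores.max(axis=-1, keepdims=True)
    p = np.exp(scores)
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.standard_normal((128, 64), dtype=np.float32)
    k = rng.standard_normal((128, 64), dtype=np.float32)
    v = rng.standard_normal((128, 64), dtype=np.float32)
    print(sage_attention_reference(q, k, v).shape)  # (128, 64)
```

The K-smoothing step is lossless with respect to the attention output because softmax is invariant to adding a constant within each row of scores.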

This PR is currently a work in progress.

Reviewing this PR may take a while...


paddle-bot bot commented Feb 11, 2025

Thanks for your contribution!


codecov bot commented Feb 11, 2025

Codecov Report

Attention: Patch coverage is 0.32787% with 304 lines in your changes missing coverage. Please review.

Project coverage is 50.35%. Comparing base (a381674) to head (2004a8a).
Report is 311 commits behind head on develop.

Files with missing lines                                  Patch %   Lines
...ddlenlp/experimental/transformers/sageattention.py       0.00%   271 Missing ⚠️
...erimental/transformers/fused_transformer_layers.py       0.00%    33 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9848      +/-   ##
===========================================
- Coverage    50.48%   50.35%   -0.13%     
===========================================
  Files          755      756       +1     
  Lines       121257   121559     +302     
===========================================
- Hits         61215    61214       -1     
- Misses       60042    60345     +303     

☔ View full report in Codecov by Sentry.

@l1cacheDell l1cacheDell marked this pull request as draft February 11, 2025 16:13
@l1cacheDell l1cacheDell marked this pull request as ready for review February 25, 2025 06:29
@l1cacheDell l1cacheDell changed the title [Feat] Sage Attention Kernels Support for sm89, sm90 [Feat] Sage Attention Kernels Support for sm80, sm89, sm90 Feb 25, 2025
Collaborator

@yuanlehome yuanlehome left a comment


LGTM

@ZHUI ZHUI merged commit b36d306 into PaddlePaddle:develop Mar 6, 2025
9 of 12 checks passed
4 participants