Consider adding FlashAttention

1. Key points of a FlashAttention (v1) implementation

The blocked matmul and scaling steps are straightforward, so they are skipped here. The key to FlashAttention is the blocked softmax: the core idea is to defer the computation that depends on global statistics (the running row max and the softmax normalizer), as in the sketch below.
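Here is a minimal sketch of that deferred-normalization (online softmax) update, written in NumPy for readability rather than as a CUDA kernel. The function name, block size, and the verification code are illustrative choices, not part of this project.

```python
import numpy as np

def flash_attention_v1(Q, K, V, block_size=64):
    """Blocked attention forward pass in the FlashAttention v1 style (single head).

    The softmax row max and denominator are never materialized globally;
    they are updated online as each key/value block is processed, and the
    final normalization is applied only once at the end.
    """
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)

    O = np.zeros((n, d))      # running (rescaled, unnormalized) output
    l = np.zeros(n)           # running softmax denominator per query row
    m = np.full(n, -np.inf)   # running row max per query row

    for start in range(0, n, block_size):
        Kj = K[start:start + block_size]      # current key block
        Vj = V[start:start + block_size]      # current value block

        S = (Q @ Kj.T) * scale                # scores of all queries vs this block
        m_block = S.max(axis=1)               # block-local row max
        P = np.exp(S - m_block[:, None])      # block-local exponentials
        l_block = P.sum(axis=1)               # block-local denominator

        m_new = np.maximum(m, m_block)        # updated global row max
        alpha = np.exp(m - m_new)             # rescale factor for the old accumulators
        beta = np.exp(m_block - m_new)        # rescale factor for this block

        l = alpha * l + beta * l_block
        O = alpha[:, None] * O + beta[:, None] * (P @ Vj)
        m = m_new

    return O / l[:, None]                     # apply the deferred normalization


# Check against a naive reference implementation.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((256, 32)) for _ in range(3))
S = Q @ K.T / np.sqrt(32)
P = np.exp(S - S.max(axis=1, keepdims=True))
ref = (P / P.sum(axis=1, keepdims=True)) @ V
assert np.allclose(flash_attention_v1(Q, K, V), ref, atol=1e-6)
```

In an actual kernel the query/key/value blocks live in on-chip SRAM and the rescaling of `O`, `l`, and `m` happens in registers, which is what avoids materializing the full n×n score matrix in global memory.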
For a reference CUDA implementation, see:
https://github.com/tspeterkim/flash-attention-minimal