-[📖Toy-HGEMM Library⚡️⚡️](./kernels/hgemm) is a library that implements many HGEMM kernels from scratch using Tensor Cores with the WMMA, MMA PTX and CuTe APIs, and can achieve `98%~100%` of **cuBLAS** performance. The code here is sourced from 📖[CUDA-Learn-Notes](https://github.com/DefTruth/CUDA-Learn-Notes) and exported as a standalone library; please check out [CUDA-Learn-Notes](https://github.com/DefTruth/CUDA-Learn-Notes) for the latest updates. Feel free to 🌟👆🏻star this repo to support me, thanks ~ 🎉🎉
+[📖Toy-HGEMM Library⚡️⚡️](./kernels/hgemm) is a library that implements many HGEMM kernels from scratch using Tensor Cores with the WMMA, MMA PTX and CuTe APIs, and can achieve `98%~100%` of **cuBLAS** performance. The code here is sourced from 📖[CUDA-Learn-Notes](https://github.com/DefTruth/CUDA-Learn-Notes) and exported as a standalone library; please check out [CUDA-Learn-Notes](https://github.com/DefTruth/CUDA-Learn-Notes) for the latest updates. Feel free to 🌟👆🏻star this repo to support me, many thanks ~ 🎉🎉

 <div id="hgemm-sgemm"></div>

@@ -83,26 +83,34 @@ void hgemm_mma_stages_block_swizzle_tn_cute(torch::Tensor a, torch::Tensor b, to

 ## 📖 Contents

+- [📖 Prerequisites](#prerequisites)
 - [📖 Installation](#install)
-- [📖 Python/C++ Testing](#test)
+- [📖 Python Testing](#test)
+- [📖 C++ Testing](#test-cpp)
 - [📖 NVIDIA L20 bench](#perf-l20)
 - [📖 NVIDIA RTX 4090 bench](#perf-4090)
 - [📖 NVIDIA RTX 3080 Laptop bench](#perf-3080)
 - [📖 Docs](#opt-docs)
 - [📖 References](#ref)

+## 📖 Prerequisites
+<div id="prerequisites"></div>
+
+- PyTorch >= 2.0, CUDA >= 12.0
+- Recommended: PyTorch >= 2.5.1, CUDA >= 12.6
+
 ## 📖 Installation

 <div id="install"></div>

-The HGEMM implemented in this repo can be installed as a Python library, namely the `toy-hgemm` library (optional)
+The HGEMM implemented in this repo can be installed as a Python library, namely the `toy-hgemm` library (optional).