What's Changed
Fingerprint
- [Fingerprint] Unify fingerprint calculation by @jiazhihao in #171
Grace Hopper Support
- Grace Hopper: let users assign tasks to different warp groups by @xinhaoc in #165
- Set num_warp_groups and pipeline_stages with default value in generate_cuda_program() by @xinhaoc in #179
- Fix MMA Threadlayout issue by @xinhaoc in #197
- Hopper: Add bf16 and fix some corner cases by @xinhaoc in #198
Qwen2.5 Demo
- DeepSeek / Qwen demo by @jiazhihao in #192
- [Demo] DeepSeek Demo Part 2 by @jiazhihao in #212
- [Demo] DeepSeek Demo Part 3 by @jiazhihao in #215
New operators
Triton backend
- Triton warnings fix & Triton kernel added by @NorthmanPKU in #181
- Triton kernel for RoPE by @NorthmanPKU in #218
- [Triton] Move Triton runtime under triton_transpiler/ instead of transpiler/ by @NorthmanPKU in #227
Others
- Fix build issue caused by the previous PR by @xinhaoc in #169
- CI Tests by @yzhou442 in #203
- Profiler by @xinhaoc in #199
New Contributors
- @olivia111 made their first contribution in #186
- @Theorem411 made their first contribution in #208
Full Changelog: v0.2.3...v0.2.4