Commit b155b15

JieRen98 and XuehaiPan authored
perf(acc_op): further acceleration with CUDA unroll (#112)
Co-authored-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
1 parent 8ef1bea commit b155b15

3 files changed: 305 additions, 169 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -13,6 +13,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Added
 
+- Add unroll pragma for CUDA OPs by [@JieRen98](https://github.com/JieRen98) and [@XuehaiPan](https://github.com/XuehaiPan) in [#112](https://github.com/metaopt/torchopt/pull/112).
 - Add Python implementation of accelerated OP and pure-Python wheels by [@XuehaiPan](https://github.com/XuehaiPan) in [#67](https://github.com/metaopt/torchopt/pull/67).
 - Add `nan_to_num` hook and gradient transformation by [@XuehaiPan](https://github.com/XuehaiPan) in [#119](https://github.com/metaopt/torchopt/pull/119).
 - Add matrix inversion linear solver with neumann series approximation by [@Benjamin-eecs](https://github.com/Benjamin-eecs) and [@XuehaiPan](https://github.com/XuehaiPan) in [#98](https://github.com/metaopt/torchopt/pull/98).

src/adam_op/adam_op_impl_cpu.cpp

Lines changed: 1 addition & 1 deletion
@@ -135,7 +135,7 @@ void adamForwardNuCPUKernel(const scalar_t *__restrict__ updates_ptr,
     const scalar_t updates = updates_ptr[tid];
     const scalar_t nu = nu_ptr[tid];
 
-    const scalar_t nu_out = b2 * nu + (1 - b2) * pow(updates, 2);
+    const scalar_t nu_out = b2 * nu + (1 - b2) * updates * updates;
     nu_out_ptr[tid] = nu_out;
   }
 }
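
The hunk above replaces `pow(updates, 2)` with `updates * updates` in the second-moment update `nu_out = b2 * nu + (1 - b2) * updates^2`. The following is a minimal stand-alone sketch of that update, written to compare the two forms side by side. The function name `adam_nu_update`, the `std::vector` interface, and the `use_pow` flag are hypothetical and not part of the project's code; the actual kernel works on raw pointers.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the project's API): computes
//   nu_out[i] = b2 * nu[i] + (1 - b2) * updates[i]^2
// using either std::pow (old form) or a plain multiplication (new form).
template <typename scalar_t>
std::vector<scalar_t> adam_nu_update(const std::vector<scalar_t> &updates,
                                     const std::vector<scalar_t> &nu,
                                     scalar_t b2,
                                     bool use_pow) {
  assert(updates.size() == nu.size());
  std::vector<scalar_t> nu_out(nu.size());
  for (std::size_t tid = 0; tid < nu.size(); ++tid) {
    const scalar_t g = updates[tid];
    // Old form calls the general-purpose power routine;
    // new form squares the value with a single multiplication.
    const scalar_t sq =
        use_pow ? static_cast<scalar_t>(std::pow(g, 2)) : g * g;
    nu_out[tid] = b2 * nu[tid] + (1 - b2) * sq;
  }
  return nu_out;
}
```

Both forms should agree up to floating-point rounding. Whether the multiplication is measurably faster likely depends on the compiler and optimization flags, since some compilers already rewrite `pow(x, 2)` as `x * x`.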
