@yoyolicoris yoyolicoris commented Jan 24, 2025

C++ kernel for LPC, adapted from #10. I will close #10 after this is merged.
The kernel is compiled with OpenMP if available.
The following benchmark (order=20) compares it against v0.6.

[--------------------------  --------------------------]
                                |  numba_lpc  |  lpc_cpu
4 threads: ---------------------------------------------
      bs_1__n_16384__threads_4  |     341.0   |    307.1
      bs_2__n_16384__threads_4  |     343.5   |    392.9
      bs_4__n_16384__threads_4  |     374.7   |    423.4
      bs_8__n_16384__threads_4  |    2434.8   |   3099.4

Times are in microseconds (us).

The inner for-loop is not parallelised, though.
I tried some SIMD instructions with OpenMP but didn't get them to work.
That should be fine, since the runtime is only moderately slower than Numba (about 13-27% for batch sizes 2 to 8) and faster at batch size 1.
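
For readers unfamiliar with the structure being described, here is a minimal sketch of a batch-parallel LPC loop. It is not the kernel from this PR: the function name `lpc_sketch`, the flat row-major layout, and the recursion `y[t] = x[t] - sum_i a[t][i-1] * y[t-i]` with zero initial state are all assumptions made for illustration. It shows why the outer (batch) loop parallelises easily while the time loop does not: each output sample depends on the previous `order` outputs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration, not the code merged in this PR.
// x: (batch, n) row-major input; a: (batch, n, order) row-major coefficients.
// Assumed recursion: y[b][t] = x[b][t] - sum_{i=1..order} a[b][t][i-1] * y[b][t-i],
// with y treated as zero for negative time indices.
std::vector<float> lpc_sketch(const std::vector<float>& x,
                              const std::vector<float>& a,
                              std::ptrdiff_t batch,
                              std::ptrdiff_t n,
                              std::ptrdiff_t order) {
    assert(x.size() == static_cast<std::size_t>(batch * n));
    assert(a.size() == static_cast<std::size_t>(batch * n * order));
    std::vector<float> y(x.size());

    // Batch items are independent, so this loop can be split across threads.
    // Without OpenMP enabled the pragma is ignored and the loop runs serially.
#pragma omp parallel for
    for (std::ptrdiff_t b = 0; b < batch; ++b) {
        const float* xb = x.data() + b * n;
        const float* ab = a.data() + b * n * order;
        float* yb = y.data() + b * n;

        // The time loop is sequential: y[t] needs y[t-1], ..., y[t-order].
        for (std::ptrdiff_t t = 0; t < n; ++t) {
            float acc = xb[t];
            // Only use past outputs that exist (zero initial state).
            const std::ptrdiff_t taps = t < order ? t : order;
            for (std::ptrdiff_t i = 1; i <= taps; ++i) {
                acc -= ab[t * order + (i - 1)] * yb[t - i];
            }
            yb[t] = acc;
        }
    }
    return y;
}
```

With GCC or Clang, a sketch like this would typically be built with `-fopenmp` to enable the pragma. With only one thread-level loop, the usable parallelism is bounded by the batch size.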

@christhetree christhetree left a comment

Sorry for the delay. I left two comments, but the changes look good. Feel free to close my PR if you like. benchmark_forward.py from that PR might still be useful, but otherwise this covers everything.

@yoyolicoris yoyolicoris merged commit d372cee into main Jan 29, 2025
5 checks passed
@yoyolicoris yoyolicoris deleted the feat-lpc-cpu branch January 29, 2025 12:11