Similar to the `FastDense` optimizations in SciML/DiffEqFlux.jl#671, this library can definitely benefit from having pre-cached versions of the operations since the neural networks are generally small. In addition, the `plan_fft` part could be cached and reused for subsequent calls. Given the amount of reuse, direct control of the planning could be helpful:
> The `flags` argument is a bitwise-or of FFTW planner flags, defaulting to `FFTW.ESTIMATE`. e.g. passing `FFTW.MEASURE` or `FFTW.PATIENT` will instead spend several seconds (or more) benchmarking different possible FFT algorithms and picking the fastest one; see the FFTW manual for more information on planner flags. The optional `timelimit` argument specifies a rough upper bound on the allowed planning time, in seconds. Passing `FFTW.MEASURE` or `FFTW.PATIENT` may cause the input array `A` to be overwritten with zeros during plan creation.
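As a minimal sketch of that idea (plain FFTW.jl usage, not this library's existing code): build the plan once on a scratch array with explicit planner flags, then reuse it for every subsequent transform.

```julia
using FFTW

# Dummy array with the same shape/eltype as the layer's input; MEASURE/PATIENT
# may overwrite it with zeros during planning, so don't plan on live data.
x = randn(ComplexF64, 64, 64, 16)

# Plan the FFT along the first two dimensions with direct control of the
# planner flags and a rough time budget (both values are illustrative).
p = plan_fft(x, 1:2; flags = FFTW.MEASURE, timelimit = 5.0)

y  = p * x   # apply the cached plan
x2 = p \ y   # inverse transform via the same plan
```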
Note that the precaching only removes allocations in cases with a single forward pass before the reverse pass. A separate pointer-bumping method would be necessary to precache a whole batch of test inputs if multiple batches are used in one loss equation.
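A rough sketch of that precaching pattern (the `FFTCache` / `apply!` names are hypothetical, not an existing API): store the plan together with a preallocated output buffer and apply the transform in place, so a single forward pass allocates nothing.

```julia
using FFTW
using LinearAlgebra: mul!

# Hypothetical cache pairing a precomputed plan with a reusable output buffer.
struct FFTCache{P,B}
    plan::P   # precomputed plan from plan_fft
    buf::B    # preallocated output array
end

function FFTCache(x::AbstractArray{<:Complex}, dims = 1:ndims(x))
    p = plan_fft(x, dims)
    FFTCache(p, similar(x))
end

# In-place application: mul! writes into the cached buffer, so repeated calls
# with same-shaped inputs do not allocate. A whole-batch version would need
# one buffer per forward call (the pointer-bumping mentioned above).
apply!(c::FFTCache, x::AbstractArray) = mul!(c.buf, c.plan, x)
```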
I did toy around with FFT plans at an earlier point (#11, #14) but then put it off since it turned out to be too much hassle for me at the time. Does this cover something different?