-
Notifications
You must be signed in to change notification settings - Fork 123
[FEATURE REQUEST] Add DCT #702
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
People tend to run the same sized DCT many times in their code, so we can alternatively do def mkDCT(N):
L_N = 2 * np.exp(-1j*np.pi*np.arange(N)/(2*N))
def DCT_N(x):
y = np.zeros(x.shape)
y[:(N-1)//2+1] = x[::2]
if N % 2: # odd length
y[(N-1)//2+1:] = x[-2::-2]
else: # even length
y[(N-1)//2+1:] = x[::-2]
Y = np.fft.fft(y)
return (Y.real * L_N.real) - (Y.imag * L_N.imag)
return DCT_N So we can have memoization, and also have it garbage collected when it's not needed. This breaks correspondence with the scipy API though. |
If compatibility with Having said that, if you think that there is interest in this function, I would be happy to incorporate it here. If you want to boost the visibility of the issue, you can also post in https://github.com/v923z/micropython-ulab/discussions. |
It would be very nice if this can be even faster! I've been looking at the source and documentation spectrogram and FFT, and figuring out a way to not create the However, it seems most of the time is spent in FFT and in generating the
def mkDCT(N):
L_N = 2 * np.exp(-1j*np.pi*np.arange(N)/(2*N))
y = np.zeros(N)
@timer
def DCT_N(x):
y[:(N-1)//2+1] = x[::2]
if N % 2: # odd length
y[(N-1)//2+1:] = x[-2::-2]
else: # even length
y[(N-1)//2+1:] = x[::-2]
Y = np.fft.fft(y)
return (Y.real * L_N.real) - (Y.imag * L_N.imag)
return DCT_N |
I'm not quite sure I follow: in the FFT, you return a new array, because you want to leave micropython-ulab/code/utils/utils.c Lines 339 to 350 in 73ed8cc
On a microcrontoller, that's not the most efficient code: you calculate the exponential of a purely imaginary number, and then you also take the real and imaginary parts at the very end of your function. The same can be achieved by taking the Also note that the instruction
And after all this, you then take This is really a lot of hassle, when you actually want something like this: t = np.linspace(0, 0.5*np.pi, num=N)
...
Y.real *= np.cos(t)
Y.imag *= np.sin(t)
Y.real -= Y.imag
Y.real *= 2
return Y.real If you really want to, you can save one memory allocation by doing this: t = np.linspace(0, 0.5*np.pi, num=N)
...
scratchpad = np.zeros(N)
np.cos(t, out=scratchpad)
Y.real *= scratchpad
np.sint, out=scratchpad)
Y.imag *= scratchpad
Y.real -= Y.imag
Y.real *= 2
return Y.real I don't know where the most time is spent, but since the operations are all in-place, your RAM won't be fragmented. I expect this to be very processor-specific, thus, hard numbers won't probably mean too much. |
I thought I knew how to write performant code and, reading this reply, I'm just amazed. I've never seriously written for micro-controllers, but now I wish I had. I'm going to try out these ideas first thing in the morning! Thank you so much! |
As I said, gains are specific to the hardware, even to the compiler settings, so you might have to experiment. But here are a couple of comments: https://micropython-ulab.readthedocs.io/en/latest/ulab-tricks.html, and I would also highly recommend Damien George's talk at pyconau: https://www.youtube.com/watch?v=hHec4qL00x0. Part of the problem here is that the word |
Uh oh!
There was an error while loading. Please reload this page.
Add DCT-2 (Type-II Discrete Cosine Transform) functionality to ulab, similar to SciPy's implementation. DCT is widely used in image and audio processing applications.
Wikipedia: https://en.wikipedia.org/wiki/Discrete_cosine_transform
SciPy: https://docs.scipy.org/doc/scipy/reference/generated/scipy.fftpack.dct.html
Technical Details
Benefits
Benchmark
DCT_2N_M
is the mirrored versionDCT_2N_P
is the padded version.np.exp(...)
(refer to code below) , which can be memoized.I would recommend DCT_N (
DCT_FFTN
) in code with theL = 2 * np.exp(...)
memoized for values of N for the last few calls. FFT compute andL = 2 * np.exp(...)
cache memory occupied are the least for it.If there is interest and drive to implement this feature, I will help find FFT implementations for other forms of DCT. I would implement this and create a PR, but wanted to check the waters first, is there interest in the community for this?
Code
The text was updated successfully, but these errors were encountered: