-
-
Notifications
You must be signed in to change notification settings - Fork 3.1k
[mypyc] librt base64: use existing SIMD CPU dispatch by customizing build flags #20253
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
b611c27 to
067b1b8
Compare
This comment has been minimized.
This comment has been minimized.
0ce45ae to
cca1dc2
Compare
This comment has been minimized.
This comment has been minimized.
c584ff9 to
6e96889
Compare
This comment has been minimized.
This comment has been minimized.
6e96889 to
d270f09
Compare
This comment has been minimized.
This comment has been minimized.
d270f09 to
a0c90f5
Compare
This comment has been minimized.
This comment has been minimized.
|
According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅ |
|
I think there should also be a documented flag for setting the hardware floor (mostly useful for avx2 in order to avoid CPU dispatch if you know your hardware supports it) |
Thank you for the review @jhance ; is adding a flag for setting the hardware floor a blocker for merging? According to upstream, and confirmed by my review of the code, the codec choice (as a result of the CPU detection) is only done once and is saved for the lifetime of the program. mypy/mypyc/lib-rt/base64/lib.c Lines 12 to 15 in 094f66d
Prior to this PR, the CPU detection was already being run: on X86_64 systems mypy/mypyc/lib-rt/base64/codec_choose.c Lines 254 to 264 in 094f66d
In my opinion there is no performance advantage for bypassing the one-time CPU detection on X86_64 systems. |
|
Benchmarking results (n=100, AMD EPYC 9454 48-Core Processor) |
Fixes the current SSE4.2 requirement added in 1b6ebb1 / #20244
This PR fully enables the existing x86-64 CPU detection and dispatch code for SSSE3, SSE4.1, SSE4.2, AVX, and AVX2 in the base64 module.
To use the existing CPU dispatch from the upstream base64 code, one needs to compile the sources in each of the CPU specific codec directories with a specific compiler flag; alas this is difficult to do with setuptools, but I found a solution inspired by https://stackoverflow.com/a/68508804
Note that I did not enable the AVX512 path with this PR, as many intel CPUs that support AVX512 can come with a performance hit if AVX512 is sporadically used; the performance of the AVX512 (encoding) path need to be evaluated in the context of how mypyc uses base64 in various realistic scenarios. (There is no AVX512 accelerated decoding path in the upstream base64 codebase, it falls back to the avx2 decoder).
If there are additional performance concerns, then I suggest benchmarking with the openmp feature of base64 turned on, for multi-core processing.