@mr-c
Contributor

@mr-c mr-c commented Nov 18, 2025

Fixes the current SSE4.2 requirement added in 1b6ebb1 / #20244

This PR fully enables the existing x86-64 CPU detection and dispatch code for SSSE3, SSE4.1, SSE4.2, AVX, and AVX2 in the base64 module.

To use the existing CPU dispatch from the upstream base64 code, one needs to compile the sources in each of the CPU-specific codec directories with a specific compiler flag. Alas, this is difficult to do with setuptools, but I found a solution inspired by https://stackoverflow.com/a/68508804
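To illustrate why each codec directory needs its own flag, here is a minimal sketch of a source file whose SIMD path only exists when the file is built with AVX2 enabled. It assumes GCC or Clang, which predefine `__AVX2__` when `-mavx2` is passed. The names `codec_avx2_compiled_in` and `sum_bytes` are hypothetical stand-ins for a codec kernel and are not part of the upstream base64 code.

```c
#include <stddef.h>
#include <stdint.h>

/* __AVX2__ is predefined by GCC/Clang only when the file is compiled
 * with AVX2 enabled (for example with -mavx2). */
#if defined(__AVX2__)
#include <immintrin.h>
#define CODEC_AVX2_COMPILED 1
#else
#define CODEC_AVX2_COMPILED 0
#endif

/* Hypothetical helper: reports whether the AVX2 path was compiled in. */
int codec_avx2_compiled_in(void) { return CODEC_AVX2_COMPILED; }

/* Hypothetical stand-in for a codec kernel: sums the bytes of a buffer.
 * The AVX2 loop exists only when the file was built with the flag;
 * otherwise the scalar loop handles the whole buffer. */
uint64_t sum_bytes(const uint8_t *p, size_t n) {
    uint64_t total = 0;
    size_t i = 0;
#if CODEC_AVX2_COMPILED
    __m256i acc = _mm256_setzero_si256();
    for (; i + 32 <= n; i += 32) {
        __m256i v = _mm256_loadu_si256((const __m256i *)(p + i));
        /* Sum each group of 8 bytes into a 64-bit lane. */
        acc = _mm256_add_epi64(acc, _mm256_sad_epu8(v, _mm256_setzero_si256()));
    }
    uint64_t lanes[4];
    _mm256_storeu_si256((__m256i *)lanes, acc);
    total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    /* Scalar tail (or the whole buffer when AVX2 is not compiled in). */
    for (; i < n; i++) {
        total += p[i];
    }
    return total;
}
```

Building this one file with `-mavx2` and the rest of the project without it is what allows the binary to contain an AVX2 path while still running on older CPUs, provided the AVX2 function is only called after a runtime check.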

Note that I did not enable the AVX512 path with this PR, as many Intel CPUs that support AVX512 can incur a performance hit if AVX512 is used sporadically; the performance of the AVX512 (encoding) path needs to be evaluated in the context of how mypyc uses base64 in various realistic scenarios. (There is no AVX512-accelerated decoding path in the upstream base64 codebase; it falls back to the AVX2 decoder.)

If there are additional performance concerns, then I suggest benchmarking with the OpenMP feature of base64 turned on, for multi-core processing.

@mr-c mr-c force-pushed the librt_base64_simd_cpu_dispatch branch from b611c27 to 067b1b8 Compare November 18, 2025 12:24
@mr-c mr-c force-pushed the librt_base64_simd_cpu_dispatch branch from 0ce45ae to cca1dc2 Compare November 18, 2025 14:17
@mr-c mr-c force-pushed the librt_base64_simd_cpu_dispatch branch 2 times, most recently from c584ff9 to 6e96889 Compare November 18, 2025 15:00
@mr-c mr-c force-pushed the librt_base64_simd_cpu_dispatch branch from 6e96889 to d270f09 Compare November 18, 2025 16:42
@mr-c mr-c changed the title from "librt base64: use existing SIMD CPU dispatch by customizing build flags" to "[mypyc] librt base64: use existing SIMD CPU dispatch by customizing build flags" Nov 18, 2025
@mr-c mr-c force-pushed the librt_base64_simd_cpu_dispatch branch from d270f09 to a0c90f5 Compare November 18, 2025 17:27
@github-actions
Contributor

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅

@jhance
Collaborator

jhance commented Nov 19, 2025

I think there should also be a documented flag for setting the hardware floor (mostly useful for avx2 in order to avoid CPU dispatch if you know your hardware supports it)

@mr-c
Contributor Author

mr-c commented Nov 19, 2025

> I think there should also be a documented flag for setting the hardware floor (mostly useful for avx2 in order to avoid CPU dispatch if you know your hardware supports it)

Thank you for the review, @jhance. Is adding a flag for setting the hardware floor a blocker for merging?

According to upstream, and confirmed by my review of the code, the codec choice (as a result of the CPU detection) is only done once and is saved for the lifetime of the program.

// These static function pointers are initialized once when the library is
// first used, and remain in use for the remaining lifetime of the program.
// The idea being that CPU features don't change at runtime.
static struct codec codec = { NULL, NULL };
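The one-time selection described above can be sketched roughly as follows. This is a simplified illustration, not the upstream implementation: the function-pointer signature, `codec_choose`, `encoded_length`, and the `detect_calls` counter are all hypothetical, and the sketch is not thread-safe.

```c
#include <stddef.h>

/* Hypothetical, simplified signature; the upstream streaming API differs. */
typedef size_t (*codec_fn)(const char *src, size_t len);

struct codec {
    codec_fn enc;
    codec_fn dec;
};

/* Counts how often detection runs, for illustration only. */
static int detect_calls = 0;

/* Plain fallbacks that only compute output sizes, standing in for real codecs. */
static size_t encode_plain(const char *src, size_t len) {
    (void)src;
    return ((len + 2) / 3) * 4;  /* padded base64 output length */
}

static size_t decode_plain(const char *src, size_t len) {
    (void)src;
    return (len / 4) * 3;        /* upper bound on decoded length */
}

/* Stand-in for CPU detection: a real version would probe CPU features
 * and pick the fastest available codec. */
static void codec_choose(struct codec *c) {
    detect_calls++;
    c->enc = encode_plain;
    c->dec = decode_plain;
}

/* Initialized on first use and kept for the lifetime of the program. */
static struct codec codec = { NULL, NULL };

size_t encoded_length(const char *src, size_t len) {
    if (codec.enc == NULL) {
        codec_choose(&codec);    /* runs only on the first call */
    }
    return codec.enc(src, len);
}

int detection_count(void) { return detect_calls; }
```

After the first call, every later call pays only for a NULL check and an indirect function call, so `detection_count()` stays at 1 no matter how many times `encoded_length` is used.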

Prior to this PR, the CPU detection was already being run: on x86-64 systems BASE64_WITH_SSE42 was always defined, and therefore HAVE_SSE42 was always defined, so the following runtime check for SSE4.2 support was already compiled in:

#if HAVE_SSE42
    // Check for SSE42 support:
    if (max_level >= 1) {
        __cpuid(1, eax, ebx, ecx, edx);
        if (ecx & bit_SSE42) {
            codec->enc = base64_stream_encode_sse42;
            codec->dec = base64_stream_decode_sse42;
            return true;
        }
    }
#endif
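For readers who want to try the runtime check in isolation, here is a self-contained sketch of the same CPUID test. It assumes GCC or Clang on x86, where `<cpuid.h>` provides `__get_cpuid`; the function name `cpu_has_sse42` and the `SSE42_ECX_BIT` macro are hypothetical and chosen for this example. On other architectures the sketch simply reports no support.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang header providing __get_cpuid */
#define ON_X86 1
#else
#define ON_X86 0
#endif

/* CPUID leaf 1 reports SSE4.2 support in bit 20 of ECX. */
#define SSE42_ECX_BIT (1u << 20)

/* Hypothetical helper: returns 1 if the running CPU reports SSE4.2, else 0. */
int cpu_has_sse42(void) {
#if ON_X86
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid returns 0 when the requested leaf is not available. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        return (ecx & SSE42_ECX_BIT) != 0;
    }
#endif
    return 0;
}
```

A dispatcher would call a helper like this once and only then install the SSE4.2 function pointers, which is why the codec sources can be compiled with `-msse4.2` without making SSE4.2 a hard requirement for the whole binary.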

In my opinion, there is no performance advantage to bypassing the one-time CPU detection on x86-64 systems.

@mr-c
Contributor Author

mr-c commented Nov 19, 2025

Benchmarking results (n=100, AMD EPYC 9454 48-Core Processor)

master                         6.098s (0.0%)  | stdev 0.059s 
librt_base64_simd_cpu_dispatch 6.076s (-0.4%) | stdev 0.070s
