Optimize bg4 prediction #308

hoytak · 2025-05-10T00:59:52Z

This PR speeds up the bg4 prediction used in the compression scheme analysis by bypassing the per-byte popcnt call, which can be quite slow and was shown to be a bottleneck.

On ARM/NEON, this uses the vector popcnt intrinsic. On Intel, it uses a bit-twiddling method that computes the per-byte popcounts of a whole u128 at once, which is much more efficient than successive per-byte popcnt calls.
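For reviewers unfamiliar with the Intel-side trick, here is a minimal sketch of a SWAR ("SIMD within a register") per-byte popcount on a u128. It assumes the code base is Rust and that input is consumed 16 bytes at a time. The names `popcount_bytes_u128` and `per_byte_popcounts` are hypothetical and are not the PR's actual API, and the NEON path is not shown.

```rust
/// Hypothetical helper (not the PR's API): returns a u128 whose byte i
/// holds the popcount of byte i of `x`. This is the classic SWAR popcount,
/// stopped before the final horizontal sum so each byte keeps its own count.
fn popcount_bytes_u128(x: u128) -> u128 {
    const M1: u128 = 0x5555_5555_5555_5555_5555_5555_5555_5555; // 0b01 repeated
    const M2: u128 = 0x3333_3333_3333_3333_3333_3333_3333_3333; // 0b0011 repeated
    const M4: u128 = 0x0f0f_0f0f_0f0f_0f0f_0f0f_0f0f_0f0f_0f0f; // low nibble of each byte

    // Each 2-bit field now holds how many of its bits were set (0..=2).
    let x = x - ((x >> 1) & M1);
    // Each 4-bit field now holds the sum of its two 2-bit counts (0..=4).
    let x = (x & M2) + ((x >> 2) & M2);
    // Add each byte's high nibble into its low nibble (max 8, so no carry
    // leaves the byte), then mask off the high nibbles.
    (x + (x >> 4)) & M4
}

/// Hypothetical driver: per-byte popcounts for a whole buffer, processing
/// 16 bytes per u128 and handling the last len % 16 bytes one at a time.
fn per_byte_popcounts(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(data.len());
    let chunks = data.chunks_exact(16);
    let tail = chunks.remainder();
    for chunk in chunks {
        let mut buf = [0u8; 16];
        buf.copy_from_slice(chunk);
        // from_le_bytes / to_le_bytes round-trip keeps output byte i
        // aligned with input byte i.
        let counts = popcount_bytes_u128(u128::from_le_bytes(buf));
        out.extend_from_slice(&counts.to_le_bytes());
    }
    // Scalar fallback for the tail.
    out.extend(tail.iter().map(|b| b.count_ones() as u8));
    out
}

fn main() {
    let data = b"bg4 prediction sketch: per-byte popcounts";
    let counts = per_byte_popcounts(data);
    println!("{:?}", counts);
}
```

Each 16-byte chunk costs a handful of shifts, masks, and adds instead of sixteen separate popcount calls, and the scalar tail loop doubles as a reference to check the fast path against.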

hoytak · 2025-05-10T20:05:34Z

This method gives a 2x speedup on the bg4 prediction.

hoytak added 2 commits May 9, 2025 17:36

Added optimization to bg4 prediction to use SIMD-like processing.

8288b44

Added check for fallback method.

3ddfbbc

hoytak requested review from ylow and seanses May 10, 2025 00:59

hoytak added 2 commits May 10, 2025 12:12

Checkpoint; not that great.

dea560f

Added missing files.

9cd92ff
