perf: optimize instruction cache indexing to reduce local hotspots #459

quake · 2025-03-07T01:03:17Z

The previous index calculation relied heavily on the lower 8 bits of pc plus some higher bits (bit 12 and above). This worked for local operations, but it often produced local hotspots, where certain cache indices were accessed disproportionately often. (To verify this, we can dump all pc values and run a script to check the cache utilization.)
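As a rough illustration of the utilization check mentioned above, the sketch below counts how many accesses from a recorded pc trace land on each cache slot. The names `slot_histogram` and `summarize`, the trace format, and the pluggable `index_fn` are all assumptions made for illustration; none of them come from this PR or from the project's code.

```rust
/// Count how many accesses land on each cache slot for a recorded pc trace.
/// `index_fn` is whatever index calculation is being evaluated (hypothetical hook).
pub fn slot_histogram<F: Fn(u64) -> usize>(pcs: &[u64], slots: usize, index_fn: F) -> Vec<u64> {
    let mut hist = vec![0u64; slots];
    for &pc in pcs {
        // The modulo keeps the sketch safe even if index_fn returns an out-of-range value.
        hist[index_fn(pc) % slots] += 1;
    }
    hist
}

/// Summarize a histogram as (number of slots touched, hits on the busiest slot).
/// Few touched slots combined with a very busy slot suggests a hotspot.
pub fn summarize(hist: &[u64]) -> (usize, u64) {
    let used = hist.iter().filter(|&&n| n > 0).count();
    let busiest = hist.iter().copied().max().unwrap_or(0);
    (used, busiest)
}
```

Running both the old and the new index calculation through `slot_histogram` on the same trace would give a direct comparison of how evenly each one spreads accesses.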

This PR introduces bit shifts (>> 5 and << 1) and XOR (^) to improve cache index distribution:

pc >> 5 ensures that higher bits contribute to indexing, reducing excessive clustering in local address ranges.
pc << 1 spreads lower-bit information across a broader index range, improving cache efficiency.
^ (XOR) further disperses address patterns, minimizing cache collisions and improving hit rates.
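The three operations above can be combined roughly as in the following sketch. This is not the code from the commit: the exact expression, the function name `cache_index`, and the table size `CACHE_SIZE` are assumptions chosen only to show how the shifts and the XOR interact before the result is masked down to the table size.

```rust
/// Hypothetical cache size; it must be a power of two so that masking works.
pub const CACHE_SIZE: usize = 4096;

/// Sketch of an XOR-folded cache index (assumed form, not the actual commit).
/// `pc >> 5` brings higher bits into the index, `pc << 1` spreads the lower bits,
/// and XOR mixes the two before the result is masked to the table size.
pub fn cache_index(pc: u64) -> usize {
    let mixed = (pc >> 5) ^ (pc << 1);
    (mixed as usize) & (CACHE_SIZE - 1)
}
```

Under these assumptions, a change in almost any bit of pc affects the index, so addresses from the same local range are less likely to pile up on a handful of slots.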

I tested on 3 different machines and measured a 2%~4% improvement:

cargo bench "interpret secp256k1_bench via assembly" --features asm

interpret secp256k1_bench via assembly
                        time:   [3.5024 ms 3.5040 ms 3.5058 ms]
                        change: [-2.9627% -2.8742% -2.7923%] (p = 0.00 < 0.05)
                        Performance has improved.

mohanson · 2025-03-07T01:46:35Z

My local test results are consistent with the description of the PR 👏

xxuejie · 2025-03-07T02:59:52Z

@quake Can you also cherry-pick this to the develop branch?

perf: optimize instruction cache indexing to reduce local hotspots

5ee2237

quake requested review from xxuejie, XuJiandong and mohanson March 7, 2025 01:03

xxuejie approved these changes Mar 7, 2025


mohanson approved these changes Mar 7, 2025


xxuejie merged commit e768fd9 into release-0.24 Mar 7, 2025
22 checks passed

xxuejie deleted the quake/tweak-cache-key branch March 7, 2025 02:59

quake mentioned this pull request Mar 7, 2025

perf: optimize instruction cache indexing to reduce local hotspots #460

Merged
