@quake quake commented Mar 7, 2025

The prior approach relied heavily on the lower 8 bits of pc plus some higher bits (bit 12 and beyond). This worked for local operations, but it often produced local hotspots, where certain cache indices were accessed disproportionately. (To confirm this, we could dump all pc values and run a script to check cache utilization.)
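The utilization check suggested above could look roughly like the sketch below. It is not part of this PR: `measure_utilization`, the `Utilization` struct, and the `slots` parameter are hypothetical names, and the index function is passed in as a closure so that any candidate formula can be compared against the same recorded pc trace.

```rust
use std::collections::HashMap;

/// Summary of how a trace of pc values spreads over the cache slots.
#[derive(Debug, PartialEq)]
pub struct Utilization {
    /// Number of distinct slots touched at least once.
    pub used_slots: usize,
    /// Access count of the busiest slot (a rough hotspot indicator).
    pub max_hits: usize,
}

/// Counts slot usage for a recorded pc trace under a given index function.
/// `index_fn` and `slots` stand in for whatever the VM actually uses;
/// `slots` must be non-zero.
pub fn measure_utilization<F>(pcs: &[u64], slots: usize, index_fn: F) -> Utilization
where
    F: Fn(u64) -> usize,
{
    let mut hits: HashMap<usize, usize> = HashMap::new();
    for &pc in pcs {
        // Reduce into range defensively in case index_fn does not mask.
        *hits.entry(index_fn(pc) % slots).or_insert(0) += 1;
    }
    Utilization {
        used_slots: hits.len(),
        max_hits: hits.values().copied().max().unwrap_or(0),
    }
}
```

A low `used_slots` relative to `slots`, or a `max_hits` far above the average, would point to the kind of hotspot described above.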

This PR introduces bit shifts (>> 5 and << 1) and XOR (^) to improve cache index distribution:

  • pc >> 5 ensures that higher bits contribute to indexing, reducing excessive clustering in local address ranges.
  • pc << 1 spreads lower-bit information across a broader index range, improving cache efficiency.
  • ^ (XOR) further disperses address patterns, minimizing cache collisions and improving hit rates.
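As a rough illustration of the three points above, the two shifts can be XORed together and masked down to the cache size. This is only a sketch: the exact expression merged in this PR is not quoted in this conversation, and `cache_index` and `CACHE_SLOTS` (with its value of 8192) are assumed names and sizes chosen for the example.

```rust
/// Hypothetical number of cache slots; must be a power of two so that
/// `CACHE_SLOTS - 1` works as a bit mask.
const CACHE_SLOTS: usize = 8192;

/// Illustrative index function: XOR the pc shifted right by 5 (pulling
/// higher bits down) with the pc shifted left by 1 (spreading low bits
/// upward), then mask the result to the slot range.
pub fn cache_index(pc: u64) -> usize {
    (((pc >> 5) ^ (pc << 1)) as usize) & (CACHE_SLOTS - 1)
}
```

For example, under these assumptions `cache_index(4)` is 8 and `cache_index(32)` is 65, so two nearby instruction addresses land in slots that are far apart.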

I tested on 3 different machines and got a 2% to 4% improvement:

`cargo bench "interpret secp256k1_bench via assembly" --features asm`

interpret secp256k1_bench via assembly
                        time:   [3.5024 ms 3.5040 ms 3.5058 ms]
                        change: [-2.9627% -2.8742% -2.7923%] (p = 0.00 < 0.05)
                        Performance has improved.

@quake quake requested review from xxuejie, XuJiandong and mohanson March 7, 2025 01:03

mohanson commented Mar 7, 2025

My local test results are consistent with the description of the PR 👏

@xxuejie xxuejie merged commit e768fd9 into release-0.24 Mar 7, 2025
22 checks passed
@xxuejie xxuejie deleted the quake/tweak-cache-key branch March 7, 2025 02:59

xxuejie commented Mar 7, 2025

@quake Can you also cherry-pick this to the develop branch?
