perf: optimize instruction cache indexing to reduce local hotspots #459
The prior approach relied heavily on the lower 8 bits of `pc` and some higher bits (bit 12 and beyond). While this worked for local operations, it often resulted in local hotspots, where certain cache indices were accessed disproportionately. (This can be checked by logging all `pc` values and running a script to measure cache utilization.)

This PR introduces bit shifts (`>> 5` and `<< 1`) and XOR (`^`) to improve cache index distribution:

- `pc >> 5` ensures that higher bits contribute to indexing, reducing excessive clustering in local address ranges.
- `pc << 1` spreads lower-bit information across a broader index range, improving cache efficiency.
- `^` (XOR) further disperses address patterns, minimizing cache collisions and improving hit rates.

I tested on 3 different machines and measured a 2% to 4% improvement.
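The utilization check mentioned above could look roughly like the following. This is a minimal sketch, not the code in this PR: the 12-bit table size, the function names, and the exact way the two shifts are combined (`(pc >> 5) ^ (pc << 1)`) are assumptions for illustration, and `low_bits_index` is only a stand-in baseline, not the previous indexing scheme.

```python
from collections import Counter

# Assumed table size (4096 slots); the real cache size is not stated in the PR.
INDEX_BITS = 12
MASK = (1 << INDEX_BITS) - 1


def mixed_index(pc: int) -> int:
    """Hypothetical index: XOR the two shifted copies of pc, then mask."""
    return ((pc >> 5) ^ (pc << 1)) & MASK


def low_bits_index(pc: int) -> int:
    """Stand-in baseline for comparison: use the low bits of pc directly."""
    return pc & MASK


def utilization(pcs, index_fn):
    """Return (number of distinct slots used, hits on the busiest slot)."""
    counts = Counter(index_fn(pc) for pc in pcs)
    return len(counts), max(counts.values())


if __name__ == "__main__":
    # Synthetic trace: four code regions, 64 four-byte instructions each.
    # A real check would read the logged pc values instead.
    trace = [base + 4 * i
             for base in (0x10000, 0x20000, 0x30000, 0x40000)
             for i in range(64)]
    for name, fn in (("low bits", low_bits_index), ("shift + xor", mixed_index)):
        used, busiest = utilization(trace, fn)
        print(f"{name}: {used} slots used, busiest slot hit {busiest} times")
```

More distinct slots and a lower count on the busiest slot indicate a flatter distribution. A synthetic trace like this one only illustrates the method; the numbers that matter come from real `pc` logs.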