snapshot: non-temporal memcpy #4869

Open: ripatel-fd wants to merge 1 commit into main from ripatel/snapshot-nt-copy

Conversation

@ripatel-fd (Contributor) commented Apr 24, 2025

Closes #4815

@ripatel-fd ripatel-fd force-pushed the ripatel/snapshot-nt-copy branch 2 times, most recently from 126e399 to bff49fd Compare April 24, 2025 20:21
@ripatel-fd ripatel-fd force-pushed the ripatel/snapshot-nt-copy branch from bff49fd to d4f1189 Compare April 24, 2025 20:23
/* NT copy */
while( sz>=512 ) {
# if FD_HAS_AVX512
_mm512_stream_si512( (void *)( dst+ 0UL ), _mm512_loadu_si512( (void const *)( src+ 0UL ) ) );
Contributor

It's a bit faster to do the stores a few instructions after the loads. The secret is interleaving. This keeps the memory pipeline full.
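The ordering suggested above can be sketched as follows. This is a minimal illustration, not code from the PR: the function name `nt_copy_interleaved` is hypothetical, it uses SSE2 (`_mm_loadu_si128` / `_mm_stream_si128`) rather than the AVX-512 intrinsics in the diff so that it builds on any x86-64 compiler, and it falls back to `memcpy` when SSE2 is unavailable or `dst` is not 16-byte aligned. Each iteration issues a batch of loads into registers first and only then issues the non-temporal stores. Whether this is measurably faster than pairing each load with its store depends on the CPU and would need benchmarking.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper (not part of the PR): copy sz bytes from src to
   dst, issuing all loads of a 64-byte chunk before any of its
   non-temporal stores.  Regions must not overlap. */
static void
nt_copy_interleaved( uint8_t *       dst,
                     uint8_t const * src,
                     size_t          sz ) {
#if defined(__SSE2__)
  /* _mm_stream_si128 requires a 16-byte aligned destination */
  if( ( (uintptr_t)dst & 15UL )==0UL ) {
    while( sz>=64UL ) {
      /* Batch of loads first ... */
      __m128i r0 = _mm_loadu_si128( (__m128i const *)( src+ 0UL ) );
      __m128i r1 = _mm_loadu_si128( (__m128i const *)( src+16UL ) );
      __m128i r2 = _mm_loadu_si128( (__m128i const *)( src+32UL ) );
      __m128i r3 = _mm_loadu_si128( (__m128i const *)( src+48UL ) );
      /* ... then the matching non-temporal stores */
      _mm_stream_si128( (__m128i *)( dst+ 0UL ), r0 );
      _mm_stream_si128( (__m128i *)( dst+16UL ), r1 );
      _mm_stream_si128( (__m128i *)( dst+32UL ), r2 );
      _mm_stream_si128( (__m128i *)( dst+48UL ), r3 );
      dst += 64UL; src += 64UL; sz -= 64UL;
    }
    _mm_sfence(); /* order the streaming stores before later stores */
  }
#endif
  /* Tail bytes, or the whole copy on the fallback path */
  memcpy( dst, src, sz );
}
```

Note that the compiler is free to reorder these intrinsics, so inspecting the generated assembly is the only way to confirm the loads really are scheduled ahead of the stores.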

Contributor Author

Doesn't the CPU's op reorder buffer already do this optimization for us? I'm almost certain the CPU should be able to schedule additional loads whenever it's waiting for a load to complete.

Contributor

How deep is the speculative execution buffer?

@jumpsiegel jumpsiegel requested a review from asiegel-jt April 29, 2025 12:02
Development

Successfully merging this pull request may close these issues.

[Snapshot Performance] Investigate non-temporal memory copies
4 participants