pref: Optimize memory prefetch strategy by replacing prefetcht2 with prefetchnta #464
Open: quake wants to merge 1 commit into nervosnetwork:develop from quake:quake/prefetchnta
On my $ rm Cargo.lock; cargo bench
Running benches/bits_benchmark.rs (target/release/deps/bits_benchmark-2198b6531a9750c2)
Gnuplot not found, using plotters backend
roundup via remainder time: [0.0000 ps 0.0000 ps 0.0000 ps]
Found 13 outliers among 100 measurements (13.00%)
4 (4.00%) high mild
9 (9.00%) high severe
roundup via bit ops time: [0.0000 ps 0.0000 ps 0.0000 ps]
Found 11 outliers among 100 measurements (11.00%)
3 (3.00%) high mild
8 (8.00%) high severe
roundup via multication time: [0.0000 ps 0.0000 ps 0.0000 ps]
Found 12 outliers among 100 measurements (12.00%)
4 (4.00%) high mild
8 (8.00%) high severe
roundup via remainder #2
time: [0.0000 ps 0.0000 ps 0.0000 ps]
Found 13 outliers among 100 measurements (13.00%)
4 (4.00%) high mild
9 (9.00%) high severe
roundup via bit ops #2 time: [0.0000 ps 0.0000 ps 0.0000 ps]
Found 13 outliers among 100 measurements (13.00%)
5 (5.00%) high mild
8 (8.00%) high severe
roundup via multication #2
time: [0.0000 ps 0.0000 ps 0.0000 ps]
Found 12 outliers among 100 measurements (12.00%)
4 (4.00%) high mild
8 (8.00%) high severe
Running benches/vm_benchmark.rs (target/release/deps/vm_benchmark-fbea187d4c738a4c)
Gnuplot not found, using plotters backend
interpret secp256k1_bench
time: [6.0670 ms 6.0788 ms 6.0925 ms]
Found 20 outliers among 100 measurements (20.00%)
8 (8.00%) high mild
12 (12.00%) high severe
This PR:
Running benches/bits_benchmark.rs (target/release/deps/bits_benchmark-2198b6531a9750c2)
Gnuplot not found, using plotters backend
roundup via remainder time: [0.0000 ps 0.0000 ps 0.0000 ps]
change: [-47.895% -3.1511% +79.757%] (p = 0.92 > 0.05)
No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
5 (5.00%) high mild
8 (8.00%) high severe
roundup via bit ops time: [0.0000 ps 0.0000 ps 0.0000 ps]
change: [-77.445% -53.415% +14.119%] (p = 0.13 > 0.05)
No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
4 (4.00%) high mild
8 (8.00%) high severe
roundup via multication time: [0.0000 ps 0.0000 ps 0.0000 ps]
change: [-46.332% +2.2388% +92.464%] (p = 0.95 > 0.05)
No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
3 (3.00%) high mild
9 (9.00%) high severe
roundup via remainder #2
time: [0.0000 ps 0.0000 ps 0.0000 ps]
change: [-46.500% -0.8975% +87.882%] (p = 0.98 > 0.05)
No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
5 (5.00%) high mild
8 (8.00%) high severe
roundup via bit ops #2 time: [0.0000 ps 0.0000 ps 0.0000 ps]
change: [-42.725% +14.514% +133.60%] (p = 0.75 > 0.05)
No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
4 (4.00%) high mild
8 (8.00%) high severe
roundup via multication #2
time: [0.0000 ps 0.0000 ps 0.0000 ps]
change: [-48.268% +0.2155% +91.318%] (p = 0.99 > 0.05)
No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
5 (5.00%) high mild
8 (8.00%) high severe
Running benches/vm_benchmark.rs (target/release/deps/vm_benchmark-fbea187d4c738a4c)
Gnuplot not found, using plotters backend
interpret secp256k1_bench
time: [6.0170 ms 6.0249 ms 6.0342 ms]
change: [-1.1492% -0.8870% -0.6528%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 14 outliers among 100 measurements (14.00%)
1 (1.00%) high mild
13 (13.00%) high severe
Executing on develop:
Running benches/bits_benchmark.rs (target/release/deps/bits_benchmark-d81b136bca03814f)
Gnuplot not found, using plotters backend
Running benches/vm_benchmark.rs (target/release/deps/vm_benchmark-64854b411dd08e91)
Gnuplot not found, using plotters backend
Benchmarking interpret secp256k1_bench via assembly: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.2s, enable flat sampling, or reduce sample count to 50.
interpret secp256k1_bench via assembly
time: [1.6080 ms 1.6105 ms 1.6133 ms]
change: [-0.1044% +0.1182% +0.3405%] (p = 0.29 > 0.05)
No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
5 (5.00%) high mild
6 (6.00%) high severe
Benchmarking interpret secp256k1_bench via assembly mop: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.1s, enable flat sampling, or reduce sample count to 50.
interpret secp256k1_bench via assembly mop
time: [1.5962 ms 1.5991 ms 1.6025 ms]
change: [+0.0404% +0.2952% +0.5713%] (p = 0.05 > 0.05)
No change in performance detected.
Found 16 outliers among 100 measurements (16.00%)
3 (3.00%) high mild
13 (13.00%) high severe
Benchmarking interpret secp256k1_bench via assembly mop (memoized decoder): Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.8s, enable flat sampling, or reduce sample count to 60.
Benchmarking interpret secp256k1_bench via assembly mop (memoized decoder): Collecting 100 samples in estimated 6.8175 s (
interpret secp256k1_bench via assembly mop (memoized decoder)
time: [1.3446 ms 1.3472 ms 1.3502 ms]
change: [-0.8764% -0.2432% +0.3953%] (p = 0.49 > 0.05)
No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
6 (6.00%) high mild
5 (5.00%) high severe
Benchmarking interpret secp256k1_bench via assembly mop (memoized dynamic length decoder): Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.5s, enable flat sampling, or reduce sample count to 60.
Benchmarking interpret secp256k1_bench via assembly mop (memoized dynamic length decoder): Collecting 100 samples in estim
interpret secp256k1_bench via assembly mop (memoized dynamic length decoder)
time: [1.0916 ms 1.0938 ms 1.0966 ms]
Found 12 outliers among 100 measurements (12.00%)
7 (7.00%) high mild
5 (5.00%) high severe
This PR:
Running benches/bits_benchmark.rs (target/release/deps/bits_benchmark-d81b136bca03814f)
Gnuplot not found, using plotters backend
Running benches/vm_benchmark.rs (target/release/deps/vm_benchmark-64854b411dd08e91)
Gnuplot not found, using plotters backend
Benchmarking interpret secp256k1_bench via assembly: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.0s, enable flat sampling, or reduce sample count to 50.
interpret secp256k1_bench via assembly
time: [1.5726 ms 1.5750 ms 1.5777 ms]
change: [-2.4700% -2.2590% -2.0520%] (p = 0.00 < 0.05)
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
8 (8.00%) high severe
Benchmarking interpret secp256k1_bench via assembly mop: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.1s, enable flat sampling, or reduce sample count to 50.
interpret secp256k1_bench via assembly mop
time: [1.5896 ms 1.5922 ms 1.5951 ms]
change: [-0.9286% -0.6165% -0.3233%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 16 outliers among 100 measurements (16.00%)
7 (7.00%) high mild
9 (9.00%) high severe
Benchmarking interpret secp256k1_bench via assembly mop (memoized decoder): Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.8s, enable flat sampling, or reduce sample count to 60.
Benchmarking interpret secp256k1_bench via assembly mop (memoized decoder): Collecting 100 samples in estimated 6.7910 s (
interpret secp256k1_bench via assembly mop (memoized decoder)
time: [1.3417 ms 1.3441 ms 1.3469 ms]
change: [-0.8756% -0.2123% +0.4776%] (p = 0.55 > 0.05)
No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
6 (6.00%) high mild
4 (4.00%) high severe
Benchmarking interpret secp256k1_bench via assembly mop (memoized dynamic length decoder): Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.6s, enable flat sampling, or reduce sample count to 60.
Benchmarking interpret secp256k1_bench via assembly mop (memoized dynamic length decoder): Collecting 100 samples in estim
interpret secp256k1_bench via assembly mop (memoized dynamic length decoder)
time: [1.1151 ms 1.1174 ms 1.1202 ms]
change: [+1.2550% +2.1229% +3.0103%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
7 (7.00%) high mild
5 (5.00%) high severe
I created a bash script to run the benchmarks repeatedly on both branches:

    #!/usr/bin/env bash
    set -e
    for i in {0..20}; do
        echo git checkout to develop
        git checkout develop
        cargo bench "interpret secp256k1_bench via assembly" --features asm
        echo git checkout to quake/prefetchnta
        git checkout quake/prefetchnta
        cargo bench "interpret secp256k1_bench via assembly" --features asm
    done

The bench result log file: bench.log
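Because a single run can swing by a percent or two in either direction, it helps to aggregate the point estimates across all iterations in bench.log rather than read individual runs. The following is a minimal sketch of how that could be done, not the script used for this PR. It assumes the log keeps the `git checkout to <branch>` lines echoed by the script and Criterion's `time: [low mid high]` lines; the function names `parse_time_ms`, `collect_times`, and `mean` are hypothetical.

```rust
use std::collections::BTreeMap;

/// Parse the middle (point) estimate of a Criterion `[low mid high]` triple
/// and convert it to milliseconds. Returns None for anything else.
pub fn parse_time_ms(field: &str) -> Option<f64> {
    let inner = field.trim().strip_prefix('[')?.strip_suffix(']')?;
    let tokens: Vec<&str> = inner.split_whitespace().collect();
    if tokens.len() != 6 {
        return None;
    }
    let value: f64 = tokens[2].parse().ok()?;
    let scale = match tokens[3] {
        "ps" => 1e-9,
        "ns" => 1e-6,
        "us" | "µs" => 1e-3,
        "ms" => 1.0,
        "s" => 1e3,
        _ => return None,
    };
    Some(value * scale)
}

/// Group point estimates by (benchmark name, branch).
/// Assumption: the branch comes from the script's `git checkout to ...` echo,
/// and the benchmark name is either on the `time:` line or the line before it.
pub fn collect_times(log: &str) -> BTreeMap<(String, String), Vec<f64>> {
    let mut out: BTreeMap<(String, String), Vec<f64>> = BTreeMap::new();
    let mut branch = String::new();
    let mut prev = String::new();
    for line in log.lines().map(str::trim) {
        if let Some(b) = line.strip_prefix("git checkout to ") {
            branch = b.to_string();
        } else if let Some(pos) = line.find("time:") {
            let head = line[..pos].trim();
            let name = if head.is_empty() { prev.clone() } else { head.to_string() };
            if let Some(ms) = parse_time_ms(&line[pos + "time:".len()..]) {
                out.entry((name, branch.clone())).or_default().push(ms);
            }
        }
        if !line.is_empty() {
            prev = line.to_string();
        }
    }
    out
}

/// Arithmetic mean, or None for an empty slice.
pub fn mean(xs: &[f64]) -> Option<f64> {
    if xs.is_empty() {
        None
    } else {
        Some(xs.iter().sum::<f64>() / xs.len() as f64)
    }
}

fn main() {
    // Falls back to an empty log if bench.log is not present.
    let log = std::fs::read_to_string("bench.log").unwrap_or_default();
    for ((name, branch), times) in collect_times(&log) {
        if let Some(m) = mean(&times) {
            println!("{name} [{branch}]: mean {m:.4} ms over {} runs", times.len());
        }
    }
}
```

Comparing the per-branch means for each benchmark name gives a steadier picture than Criterion's `change:` line, which only compares a run against the one immediately before it.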
eval-exec approved these changes on Mar 11, 2025
The prefetchnta instruction is better suited for our trace data access pattern because:
Running the benchmark multiple times shows measurable improvements on two different x86 CPUs (low and medium spec).
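To make the difference between the two hints concrete, here is a small Rust sketch using the `_mm_prefetch` intrinsic from `std::arch::x86_64`. It is not the code changed by this PR; the `Hint` enum and the `prefetch` and `sum_streaming` functions are hypothetical names chosen for illustration. `_MM_HINT_T2` corresponds to `prefetcht2`, which asks for the line to be brought into the outer cache levels, while `_MM_HINT_NTA` corresponds to `prefetchnta`, a non-temporal hint intended to limit cache pollution for data that is read once and not reused soon. The exact behavior of both hints is implementation-specific, so any gain has to be confirmed by measurement.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{_mm_prefetch, _MM_HINT_NTA, _MM_HINT_T2};

/// Which prefetch hint to issue (illustrative type, not taken from the PR).
#[derive(Clone, Copy)]
pub enum Hint {
    /// Maps to `prefetcht2`.
    T2,
    /// Maps to `prefetchnta` (non-temporal).
    Nta,
}

/// Issue a prefetch hint for the cache line containing `p`.
#[cfg(target_arch = "x86_64")]
#[inline(always)]
fn prefetch(p: *const u8, hint: Hint) {
    // SAFETY: a prefetch is only a hint; it does not dereference `p`
    // architecturally and does not fault.
    unsafe {
        match hint {
            Hint::T2 => _mm_prefetch::<_MM_HINT_T2>(p as *const i8),
            Hint::Nta => _mm_prefetch::<_MM_HINT_NTA>(p as *const i8),
        }
    }
}

/// No-op fallback so the sketch still builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
#[inline(always)]
fn prefetch(_p: *const u8, _hint: Hint) {}

/// Sum a slice in one streaming pass, prefetching `distance` elements ahead.
/// The hint only affects cache behavior; the result is the same either way.
pub fn sum_streaming(data: &[u64], distance: usize, hint: Hint) -> u64 {
    let mut total = 0u64;
    for (i, value) in data.iter().enumerate() {
        // Only prefetch addresses that lie inside the slice.
        if let Some(ahead) = i.checked_add(distance).and_then(|j| data.get(j)) {
            prefetch(ahead as *const u64 as *const u8, hint);
        }
        total = total.wrapping_add(*value);
    }
    total
}

fn main() {
    let data: Vec<u64> = (0..1_000_000).collect();
    let with_t2 = sum_streaming(&data, 64, Hint::T2);
    let with_nta = sum_streaming(&data, 64, Hint::Nta);
    // Both hints must produce identical results; only timing may differ.
    assert_eq!(with_t2, with_nta);
    println!("sum = {with_nta}");
}
```

A toy loop like this is unlikely to reproduce the effect seen in the VM benchmarks; it only shows where the hint is selected and that swapping it does not change program results.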