-
Notifications
You must be signed in to change notification settings - Fork 31
Simplify gen_matrix() and poly_rej_uniform_x4() #1112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rod-chapman Can you elaborate on your motivation for making the change to poly_rej_uniform_x4()
? Is it enabling further work? We should not spend time micro-optimizing proofs that are already acceptably fast (which 20s is to me).
For me, it's just much more obviously correct this way. It removes the need to understand (and the reliance on) how arrays are laid out in memory and the need to write over more than one array element. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change of signature mlk_poly_rej_uniform_x4()
is fine to me if it helps further experimentation on #906. Please revert the other unrelated change -- I think it's a micro-optimization that hinders readability.
5b99285
to
575696b
Compare
I will disagree-and-commit on this issue. Reverted, squashed, commit message updated, and pushed. |
575696b
to
5831ec5
Compare
1. Simplify poly_rej_uniform_x4() API to explicitly take 4 objects to write, rather than a slice of a larger array. 2. Modify gen_matrix() in light of that. 3. Update test/bench_components_mlkem.c Signed-off-by: Rod Chapman <rodchap@amazon.com>
5831ec5
to
624f56d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mac Mini (M1, 2020) benchmarks
Benchmark suite | Current: 624f56d | Previous: 65e4512 | Ratio |
---|---|---|---|
ML-KEM-512 keypair |
12306 cycles |
12302 cycles |
1.00 |
ML-KEM-512 encaps |
14704 cycles |
14894 cycles |
0.99 |
ML-KEM-512 decaps |
19247 cycles |
19242 cycles |
1.00 |
ML-KEM-768 keypair |
21360 cycles |
21359 cycles |
1.00 |
ML-KEM-768 encaps |
23571 cycles |
23565 cycles |
1.00 |
ML-KEM-768 decaps |
30139 cycles |
30141 cycles |
1.00 |
ML-KEM-1024 keypair |
30353 cycles |
30338 cycles |
1.00 |
ML-KEM-1024 encaps |
34539 cycles |
34661 cycles |
1.00 |
ML-KEM-1024 decaps |
44338 cycles |
44158 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Intel Xeon 4th gen (c7i)
Benchmark suite | Current: 624f56d | Previous: 65e4512 | Ratio |
---|---|---|---|
ML-KEM-512 keypair |
9693 cycles |
9671 cycles |
1.00 |
ML-KEM-512 encaps |
10878 cycles |
10829 cycles |
1.00 |
ML-KEM-512 decaps |
15067 cycles |
15038 cycles |
1.00 |
ML-KEM-768 keypair |
16394 cycles |
16282 cycles |
1.01 |
ML-KEM-768 encaps |
17495 cycles |
17358 cycles |
1.01 |
ML-KEM-768 decaps |
23168 cycles |
23008 cycles |
1.01 |
ML-KEM-1024 keypair |
21623 cycles |
21745 cycles |
0.99 |
ML-KEM-1024 encaps |
23681 cycles |
23718 cycles |
1.00 |
ML-KEM-1024 decaps |
31481 cycles |
31460 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Intel Xeon 4th gen (c7i) (no-opt)
Benchmark suite | Current: 624f56d | Previous: 65e4512 | Ratio |
---|---|---|---|
ML-KEM-512 keypair |
28977 cycles |
28708 cycles |
1.01 |
ML-KEM-512 encaps |
34410 cycles |
34633 cycles |
0.99 |
ML-KEM-512 decaps |
44364 cycles |
44353 cycles |
1.00 |
ML-KEM-768 keypair |
48316 cycles |
48085 cycles |
1.00 |
ML-KEM-768 encaps |
56278 cycles |
55726 cycles |
1.01 |
ML-KEM-768 decaps |
68193 cycles |
67492 cycles |
1.01 |
ML-KEM-1024 keypair |
71819 cycles |
73152 cycles |
0.98 |
ML-KEM-1024 encaps |
84168 cycles |
84937 cycles |
0.99 |
ML-KEM-1024 decaps |
99666 cycles |
99269 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arm Cortex-A72 (Raspberry Pi 4) benchmarks
Benchmark suite | Current: 624f56d | Previous: 65e4512 | Ratio |
---|---|---|---|
ML-KEM-512 keypair |
52412 cycles |
52109 cycles |
1.01 |
ML-KEM-512 encaps |
60198 cycles |
59857 cycles |
1.01 |
ML-KEM-512 decaps |
76713 cycles |
76813 cycles |
1.00 |
ML-KEM-768 keypair |
89158 cycles |
89346 cycles |
1.00 |
ML-KEM-768 encaps |
96482 cycles |
97715 cycles |
0.99 |
ML-KEM-768 decaps |
119992 cycles |
120612 cycles |
0.99 |
ML-KEM-1024 keypair |
132324 cycles |
131604 cycles |
1.01 |
ML-KEM-1024 encaps |
145133 cycles |
144921 cycles |
1.00 |
ML-KEM-1024 decaps |
178034 cycles |
177617 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AMD EPYC 4th gen (c7a)
Benchmark suite | Current: 624f56d | Previous: 65e4512 | Ratio |
---|---|---|---|
ML-KEM-512 keypair |
11664 cycles |
11593 cycles |
1.01 |
ML-KEM-512 encaps |
13307 cycles |
13263 cycles |
1.00 |
ML-KEM-512 decaps |
18042 cycles |
18133 cycles |
0.99 |
ML-KEM-768 keypair |
20062 cycles |
20193 cycles |
0.99 |
ML-KEM-768 encaps |
21078 cycles |
21097 cycles |
1.00 |
ML-KEM-768 decaps |
28216 cycles |
28317 cycles |
1.00 |
ML-KEM-1024 keypair |
27317 cycles |
27042 cycles |
1.01 |
ML-KEM-1024 encaps |
29063 cycles |
29073 cycles |
1.00 |
ML-KEM-1024 decaps |
38734 cycles |
38561 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AMD EPYC 3rd gen (c6a)
Benchmark suite | Current: 624f56d | Previous: 65e4512 | Ratio |
---|---|---|---|
ML-KEM-512 keypair |
17248 cycles |
17106 cycles |
1.01 |
ML-KEM-512 encaps |
18991 cycles |
18845 cycles |
1.01 |
ML-KEM-512 decaps |
24425 cycles |
24238 cycles |
1.01 |
ML-KEM-768 keypair |
29361 cycles |
29373 cycles |
1.00 |
ML-KEM-768 encaps |
30528 cycles |
30558 cycles |
1.00 |
ML-KEM-768 decaps |
38370 cycles |
38399 cycles |
1.00 |
ML-KEM-1024 keypair |
43031 cycles |
42700 cycles |
1.01 |
ML-KEM-1024 encaps |
44881 cycles |
44927 cycles |
1.00 |
ML-KEM-1024 decaps |
55396 cycles |
55394 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AMD EPYC 4th gen (c7a) (no-opt)
Benchmark suite | Current: 624f56d | Previous: 65e4512 | Ratio |
---|---|---|---|
ML-KEM-512 keypair |
36681 cycles |
36203 cycles |
1.01 |
ML-KEM-512 encaps |
43137 cycles |
42835 cycles |
1.01 |
ML-KEM-512 decaps |
56101 cycles |
55853 cycles |
1.00 |
ML-KEM-768 keypair |
59829 cycles |
59952 cycles |
1.00 |
ML-KEM-768 encaps |
68131 cycles |
68249 cycles |
1.00 |
ML-KEM-768 decaps |
85739 cycles |
85678 cycles |
1.00 |
ML-KEM-1024 keypair |
87413 cycles |
87685 cycles |
1.00 |
ML-KEM-1024 encaps |
99563 cycles |
99600 cycles |
1.00 |
ML-KEM-1024 decaps |
121279 cycles |
121303 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Graviton2
Benchmark suite | Current: 624f56d | Previous: 65e4512 | Ratio |
---|---|---|---|
ML-KEM-512 keypair |
28869 cycles |
28853 cycles |
1.00 |
ML-KEM-512 encaps |
34131 cycles |
34023 cycles |
1.00 |
ML-KEM-512 decaps |
44556 cycles |
44635 cycles |
1.00 |
ML-KEM-768 keypair |
49208 cycles |
49201 cycles |
1.00 |
ML-KEM-768 encaps |
54279 cycles |
54287 cycles |
1.00 |
ML-KEM-768 decaps |
69129 cycles |
69150 cycles |
1.00 |
ML-KEM-1024 keypair |
71654 cycles |
71497 cycles |
1.00 |
ML-KEM-1024 encaps |
79879 cycles |
79875 cycles |
1.00 |
ML-KEM-1024 decaps |
99907 cycles |
100023 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Graviton4
Benchmark suite | Current: 624f56d | Previous: 65e4512 | Ratio |
---|---|---|---|
ML-KEM-512 keypair |
17912 cycles |
17917 cycles |
1.00 |
ML-KEM-512 encaps |
21015 cycles |
20978 cycles |
1.00 |
ML-KEM-512 decaps |
27649 cycles |
27615 cycles |
1.00 |
ML-KEM-768 keypair |
30898 cycles |
30900 cycles |
1.00 |
ML-KEM-768 encaps |
33555 cycles |
33566 cycles |
1.00 |
ML-KEM-768 decaps |
43132 cycles |
43170 cycles |
1.00 |
ML-KEM-1024 keypair |
44618 cycles |
44636 cycles |
1.00 |
ML-KEM-1024 encaps |
49604 cycles |
49587 cycles |
1.00 |
ML-KEM-1024 decaps |
62595 cycles |
62606 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AMD EPYC 3rd gen (c6a) (no-opt)
Benchmark suite | Current: 624f56d | Previous: 65e4512 | Ratio |
---|---|---|---|
ML-KEM-512 keypair |
38419 cycles |
38511 cycles |
1.00 |
ML-KEM-512 encaps |
46679 cycles |
46823 cycles |
1.00 |
ML-KEM-512 decaps |
60024 cycles |
60266 cycles |
1.00 |
ML-KEM-768 keypair |
63829 cycles |
63647 cycles |
1.00 |
ML-KEM-768 encaps |
74355 cycles |
74267 cycles |
1.00 |
ML-KEM-768 decaps |
92303 cycles |
92275 cycles |
1.00 |
ML-KEM-1024 keypair |
94273 cycles |
94093 cycles |
1.00 |
ML-KEM-1024 encaps |
108245 cycles |
107834 cycles |
1.00 |
ML-KEM-1024 decaps |
130849 cycles |
130382 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Intel Xeon 3rd gen (c6i)
Benchmark suite | Current: 624f56d | Previous: 65e4512 | Ratio |
---|---|---|---|
ML-KEM-512 keypair |
16089 cycles |
16066 cycles |
1.00 |
ML-KEM-512 encaps |
18258 cycles |
18228 cycles |
1.00 |
ML-KEM-512 decaps |
24727 cycles |
24691 cycles |
1.00 |
ML-KEM-768 keypair |
27733 cycles |
29892 cycles |
0.93 |
ML-KEM-768 encaps |
29418 cycles |
29462 cycles |
1.00 |
ML-KEM-768 decaps |
38796 cycles |
38836 cycles |
1.00 |
ML-KEM-1024 keypair |
36937 cycles |
37346 cycles |
0.99 |
ML-KEM-1024 encaps |
40217 cycles |
39511 cycles |
1.02 |
ML-KEM-1024 decaps |
52980 cycles |
52972 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Graviton3
Benchmark suite | Current: 624f56d | Previous: 65e4512 | Ratio |
---|---|---|---|
ML-KEM-512 keypair |
19078 cycles |
19079 cycles |
1.00 |
ML-KEM-512 encaps |
22313 cycles |
22323 cycles |
1.00 |
ML-KEM-512 decaps |
29558 cycles |
29568 cycles |
1.00 |
ML-KEM-768 keypair |
32615 cycles |
32597 cycles |
1.00 |
ML-KEM-768 encaps |
35639 cycles |
35682 cycles |
1.00 |
ML-KEM-768 decaps |
45969 cycles |
46008 cycles |
1.00 |
ML-KEM-1024 keypair |
46903 cycles |
46884 cycles |
1.00 |
ML-KEM-1024 encaps |
52137 cycles |
52150 cycles |
1.00 |
ML-KEM-1024 decaps |
66024 cycles |
66043 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Graviton4 (no-opt)
Benchmark suite | Current: 624f56d | Previous: 65e4512 | Ratio |
---|---|---|---|
ML-KEM-512 keypair |
35775 cycles |
35803 cycles |
1.00 |
ML-KEM-512 encaps |
40670 cycles |
40661 cycles |
1.00 |
ML-KEM-512 decaps |
52114 cycles |
52075 cycles |
1.00 |
ML-KEM-768 keypair |
59529 cycles |
59552 cycles |
1.00 |
ML-KEM-768 encaps |
66475 cycles |
66613 cycles |
1.00 |
ML-KEM-768 decaps |
81517 cycles |
81054 cycles |
1.01 |
ML-KEM-1024 keypair |
88515 cycles |
88499 cycles |
1.00 |
ML-KEM-1024 encaps |
98498 cycles |
98567 cycles |
1.00 |
ML-KEM-1024 decaps |
117306 cycles |
117295 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Intel Xeon 3rd gen (c6i) (no-opt)
Benchmark suite | Current: 624f56d | Previous: 65e4512 | Ratio |
---|---|---|---|
ML-KEM-512 keypair |
46399 cycles |
46317 cycles |
1.00 |
ML-KEM-512 encaps |
54832 cycles |
54579 cycles |
1.00 |
ML-KEM-512 decaps |
70291 cycles |
70062 cycles |
1.00 |
ML-KEM-768 keypair |
75918 cycles |
75201 cycles |
1.01 |
ML-KEM-768 encaps |
86789 cycles |
86339 cycles |
1.01 |
ML-KEM-768 decaps |
106772 cycles |
106477 cycles |
1.00 |
ML-KEM-1024 keypair |
111181 cycles |
110996 cycles |
1.00 |
ML-KEM-1024 encaps |
125003 cycles |
125147 cycles |
1.00 |
ML-KEM-1024 decaps |
150348 cycles |
150615 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Graviton2 (no-opt)
Benchmark suite | Current: 624f56d | Previous: 65e4512 | Ratio |
---|---|---|---|
ML-KEM-512 keypair |
59351 cycles |
59295 cycles |
1.00 |
ML-KEM-512 encaps |
67938 cycles |
67869 cycles |
1.00 |
ML-KEM-512 decaps |
86627 cycles |
86570 cycles |
1.00 |
ML-KEM-768 keypair |
100193 cycles |
98822 cycles |
1.01 |
ML-KEM-768 encaps |
110278 cycles |
109982 cycles |
1.00 |
ML-KEM-768 decaps |
134707 cycles |
135092 cycles |
1.00 |
ML-KEM-1024 keypair |
149548 cycles |
148315 cycles |
1.01 |
ML-KEM-1024 encaps |
164750 cycles |
163796 cycles |
1.01 |
ML-KEM-1024 decaps |
196178 cycles |
195279 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Graviton3 (no-opt)
Benchmark suite | Current: 624f56d | Previous: 65e4512 | Ratio |
---|---|---|---|
ML-KEM-512 keypair |
38862 cycles |
38951 cycles |
1.00 |
ML-KEM-512 encaps |
44668 cycles |
44913 cycles |
0.99 |
ML-KEM-512 decaps |
56479 cycles |
56689 cycles |
1.00 |
ML-KEM-768 keypair |
64398 cycles |
64407 cycles |
1.00 |
ML-KEM-768 encaps |
71585 cycles |
71672 cycles |
1.00 |
ML-KEM-768 decaps |
87533 cycles |
88074 cycles |
0.99 |
ML-KEM-1024 keypair |
95977 cycles |
95496 cycles |
1.01 |
ML-KEM-1024 encaps |
106223 cycles |
106325 cycles |
1.00 |
ML-KEM-1024 decaps |
126885 cycles |
126840 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arm Cortex-A55 (Snapdragon 888) benchmarks
Benchmark suite | Current: 624f56d | Previous: 65e4512 | Ratio |
---|---|---|---|
ML-KEM-512 keypair |
59514 cycles |
59457 cycles |
1.00 |
ML-KEM-512 encaps |
66792 cycles |
66622 cycles |
1.00 |
ML-KEM-512 decaps |
85337 cycles |
85254 cycles |
1.00 |
ML-KEM-768 keypair |
101306 cycles |
101378 cycles |
1.00 |
ML-KEM-768 encaps |
112204 cycles |
112384 cycles |
1.00 |
ML-KEM-768 decaps |
138953 cycles |
138766 cycles |
1.00 |
ML-KEM-1024 keypair |
153745 cycles |
153640 cycles |
1.00 |
ML-KEM-1024 encaps |
170807 cycles |
171122 cycles |
1.00 |
ML-KEM-1024 decaps |
207018 cycles |
206445 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rod-chapman. The change as it stands right now makes sense to me.
It aligns the poly_rej_uniform_x4
signature with the poly_getnoise_etaX_x4
signature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SpacemiT K1 8 (Banana Pi F3) benchmarks
Benchmark suite | Current: 624f56d | Previous: 65e4512 | Ratio |
---|---|---|---|
ML-KEM-512 keypair |
226703 cycles |
226252 cycles |
1.00 |
ML-KEM-512 encaps |
270691 cycles |
270334 cycles |
1.00 |
ML-KEM-512 decaps |
345000 cycles |
344528 cycles |
1.00 |
ML-KEM-768 keypair |
375749 cycles |
377767 cycles |
0.99 |
ML-KEM-768 encaps |
432806 cycles |
434624 cycles |
1.00 |
ML-KEM-768 decaps |
529256 cycles |
532038 cycles |
0.99 |
ML-KEM-1024 keypair |
556116 cycles |
556966 cycles |
1.00 |
ML-KEM-1024 encaps |
632570 cycles |
632956 cycles |
1.00 |
ML-KEM-1024 decaps |
754827 cycles |
754220 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arm Cortex-A76 (Raspberry Pi 5) benchmarks
Benchmark suite | Current: 624f56d | Previous: 65e4512 | Ratio |
---|---|---|---|
ML-KEM-512 keypair |
28832 cycles |
28848 cycles |
1.00 |
ML-KEM-512 encaps |
34044 cycles |
34093 cycles |
1.00 |
ML-KEM-512 decaps |
44647 cycles |
44556 cycles |
1.00 |
ML-KEM-768 keypair |
49185 cycles |
49194 cycles |
1.00 |
ML-KEM-768 encaps |
54269 cycles |
54242 cycles |
1.00 |
ML-KEM-768 decaps |
69126 cycles |
69158 cycles |
1.00 |
ML-KEM-1024 keypair |
71615 cycles |
71608 cycles |
1.00 |
ML-KEM-1024 encaps |
79961 cycles |
79843 cycles |
1.00 |
ML-KEM-1024 decaps |
99989 cycles |
99967 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Simplify gen_matrix() and poly_rej_uniform_x4()
Proof time of poly_rej_uniform_x4() reduces from 20s to 9s on Apple M1.
Proof time of gen_matrix() unchanged at 8s
Runtime performance benchmarks show no significant change.