@tarcieri has recently implemented `ChaCha*Rng` via the `chacha20` crate; docs are here. I'm breaking this out into a new issue since it isn't really the topic of #872.
This raises the question: should we prefer this over the current (@kazcw's) implementation?
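For context on what both crates implement: a ChaCha RNG runs the ChaCha block function over a state holding a key (the seed), a block counter and a stream id, then hands out each 64-byte output block word by word. ChaCha8, ChaCha12 and ChaCha20 differ only in the number of rounds. The sketch below is plain scalar Rust for illustration, not the optimised code of either `rand_chacha` or `chacha20`; `quarter_round`, `chacha_block` and `ChaChaSketch` are hypothetical names, and the 64-bit counter / 64-bit stream id word layout is an assumption.

```rust
/// One ChaCha quarter round on four words of the state.
fn quarter_round(s: &mut [u32; 16], a: usize, b: usize, c: usize, d: usize) {
    s[a] = s[a].wrapping_add(s[b]); s[d] ^= s[a]; s[d] = s[d].rotate_left(16);
    s[c] = s[c].wrapping_add(s[d]); s[b] ^= s[c]; s[b] = s[b].rotate_left(12);
    s[a] = s[a].wrapping_add(s[b]); s[d] ^= s[a]; s[d] = s[d].rotate_left(8);
    s[c] = s[c].wrapping_add(s[d]); s[b] ^= s[c]; s[b] = s[b].rotate_left(7);
}

/// ChaCha block function; `rounds` is 8, 12 or 20 (ChaCha8/12/20).
/// Each loop iteration is a "double round": a column round plus a diagonal round.
fn chacha_block(input: &[u32; 16], rounds: usize) -> [u32; 16] {
    assert!(rounds % 2 == 0, "ChaCha uses an even number of rounds");
    let mut s = *input;
    for _ in 0..rounds / 2 {
        // column round
        quarter_round(&mut s, 0, 4, 8, 12);
        quarter_round(&mut s, 1, 5, 9, 13);
        quarter_round(&mut s, 2, 6, 10, 14);
        quarter_round(&mut s, 3, 7, 11, 15);
        // diagonal round
        quarter_round(&mut s, 0, 5, 10, 15);
        quarter_round(&mut s, 1, 6, 11, 12);
        quarter_round(&mut s, 2, 7, 8, 13);
        quarter_round(&mut s, 3, 4, 9, 14);
    }
    // feed-forward: add the input state back into the permuted state
    for (out, inp) in s.iter_mut().zip(input.iter()) {
        *out = out.wrapping_add(*inp);
    }
    s
}

/// Hypothetical minimal RNG that buffers one 16-word block at a time.
/// Assumed layout: words 12..14 hold a 64-bit block counter,
/// words 14..16 a 64-bit stream id (both start at zero here).
struct ChaChaSketch {
    state: [u32; 16],
    buf: [u32; 16],
    idx: usize,
    rounds: usize,
}

impl ChaChaSketch {
    fn from_seed(seed: [u8; 32], rounds: usize) -> Self {
        let mut state = [0u32; 16];
        // the "expand 32-byte k" constants
        state[..4].copy_from_slice(&[0x6170_7865, 0x3320_646e, 0x7962_2d32, 0x6b20_6574]);
        // the seed becomes the key, loaded as little-endian words
        for (i, c) in seed.chunks_exact(4).enumerate() {
            state[4 + i] = u32::from_le_bytes([c[0], c[1], c[2], c[3]]);
        }
        // idx = 16 forces a refill on the first draw
        ChaChaSketch { state, buf: [0; 16], idx: 16, rounds }
    }

    fn refill(&mut self) {
        self.buf = chacha_block(&self.state, self.rounds);
        // increment the 64-bit counter in words 12 (low) and 13 (high)
        let ctr = ((u64::from(self.state[13]) << 32) | u64::from(self.state[12])).wrapping_add(1);
        self.state[12] = ctr as u32;
        self.state[13] = (ctr >> 32) as u32;
        self.idx = 0;
    }

    fn next_u32(&mut self) -> u32 {
        if self.idx == 16 {
            self.refill();
        }
        let x = self.buf[self.idx];
        self.idx += 1;
        x
    }

    fn next_u64(&mut self) -> u64 {
        // low word first, then high word
        let lo = u64::from(self.next_u32());
        let hi = u64::from(self.next_u32());
        (hi << 32) | lo
    }
}
```

With an all-zero seed and 20 rounds, the first output word should be `0xade0b876`, the start of the well-known all-zero-key ChaCha20 keystream.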
First things first: the `rand_chacha` crate has 44 reverse dependencies, is clearly a rand-family crate, and is recommended in our book. There is no plan to retire this crate.
Second, let's look at a few stats.
Unsafe
Here are the results of running `cargo geiger` on the current `rand_chacha` (after #931):
```
$ cargo geiger
<snip>
Metric output format: x/y
    x = unsafe code used by the build
    y = total unsafe code found in the crate

Symbols:
    :) = No `unsafe` usage found, declares #![forbid(unsafe_code)]
    ?  = No `unsafe` usage found, missing #![forbid(unsafe_code)]
    !  = `unsafe` usage found

Functions  Expressions  Impls  Traits  Methods  Dependency

0/0        0/0          0/0    0/0     0/0      ?  rand_chacha 0.2.2
2/2        557/626      0/0    0/0     14/22    !  ├── ppv-lite86 0.2.6
0/0        22/22        0/0    0/0     0/0      !  └── rand_core 0.5.1

2/2        579/648      0/0    0/0     14/22
```
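In the legend above, the `?` that `rand_chacha` itself gets only means its crate root does not declare `#![forbid(unsafe_code)]`; declaring that lint is what would turn it into `:)`. As a self-contained illustration, the sketch below applies the same lint at module scope and shows that the byte-to-word key loading a ChaCha implementation needs can be written without `unsafe` pointer casts. `safe_words` and `key_words` are hypothetical names.

```rust
// Module-scope equivalent of a crate root declaring `#![forbid(unsafe_code)]`:
// any `unsafe` block inside `safe_words` is a hard error that an inner
// `#[allow(unsafe_code)]` cannot override.
#[forbid(unsafe_code)]
mod safe_words {
    /// Load a 32-byte key as eight little-endian u32 words, using only
    /// safe conversions (no transmutes or raw pointer reads).
    pub fn key_words(key: &[u8; 32]) -> [u32; 8] {
        let mut words = [0u32; 8];
        for (w, c) in words.iter_mut().zip(key.chunks_exact(4)) {
            *w = u32::from_le_bytes([c[0], c[1], c[2], c[3]]);
        }
        words
    }
}
```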
And on `chacha20`:
```
$ cargo geiger --no-default-features --features rng
<snip>
Functions  Expressions  Impls  Traits  Methods  Dependency

7/14       199/427      0/0    0/0     1/2      !  chacha20 0.3.1
0/0        22/22        0/0    0/0     0/0      !  ├── rand_core 0.5.1
2/4        50/150       1/1    0/0     3/3      !  │   └── getrandom 0.1.14
0/0        0/0          0/0    0/0     0/0      ?  │       ├── cfg-if 0.1.10
2/2        73/95        0/0    0/0     5/11     !  │       └── libc 0.2.66
0/0        0/0          0/0    0/0     0/0      ?  └── stream-cipher 0.3.2
0/0        0/0          0/0    0/0     0/0      ?      ├── blobby 0.1.2
0/1        0/231        0/0    0/0     0/0      ?      │   └── byteorder 1.3.2
0/1        0/223        0/18   0/8     0/5      ?      └── generic-array 0.12.3
0/0        0/51         0/0    0/0     0/0      ?          └── typenum 1.11.2

11/22      344/1199     1/19   0/8     9/21
```
Something's wrong here: `stream-cipher` is only a dev-dependency, and `rand_core` is depended on in exactly the same way as in `rand_chacha`, so only the first line is relevant.
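To make the x/y arithmetic explicit, here is a small sketch (hypothetical helpers `parse_metric` and `sum_metrics`) that sums a column of geiger metrics the way the totals rows do. The inputs used to check it are the Expressions figures copied from the two runs above.

```rust
/// Parse a cargo-geiger "x/y" cell: x = unsafe used by the build,
/// y = total unsafe found in the crate.
fn parse_metric(s: &str) -> Option<(u32, u32)> {
    let (used, found) = s.split_once('/')?;
    Some((used.trim().parse().ok()?, found.trim().parse().ok()?))
}

/// Sum one column of "x/y" cells, as the totals row does.
/// Returns None if any cell fails to parse.
fn sum_metrics(cells: &[&str]) -> Option<(u32, u32)> {
    cells.iter().try_fold((0u32, 0u32), |(u, f), c| {
        let (du, df) = parse_metric(c)?;
        Some((u + du, f + df))
    })
}
```

Setting aside the shared `rand_core` row (and, for `chacha20`, everything below the first line), the per-implementation comparison is 557/626 unsafe expressions in `ppv-lite86` (since `rand_chacha` itself reports 0/0) against 199/427 in `chacha20`.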
Lines of code
Tokei output for `rand_chacha`:
```
$ tokei rand/rand_chacha/ cryptocorrosion/utils-simd/ppv-lite86/
-------------------------------------------------------------------------------
 Language            Files        Lines         Code     Comments       Blanks
-------------------------------------------------------------------------------
 Markdown                2           70           70            0            0
 Rust                    9         4285         3864          164          257
 TOML                    2           48           42            0            6
-------------------------------------------------------------------------------
 Total                  13         4403         3976          164          263
-------------------------------------------------------------------------------
```
(83% of this is from `ppv-lite86`.)
For `chacha20`:
```
$ tokei
-------------------------------------------------------------------------------
 Language            Files        Lines         Code     Comments       Blanks
-------------------------------------------------------------------------------
 Markdown                2          170          170            0            0
 Rust                   12         1617         1163          203          251
 TOML                    1           47           40            0            7
-------------------------------------------------------------------------------
 Total                  15         1834         1373          203          258
-------------------------------------------------------------------------------
```
There are no other dependencies we need to count, so that's it: about 1.4k lines of code against roughly 4k. A nice improvement.
Benchmarks
(64-bit Haswell)
Here's `rand_chacha`:
```
$ cargo bench --bench generators chacha
   Compiling ...
running 16 tests
test gen_bytes_chacha12 ... bench:     356,133 ns/iter (+/- 5,261) = 2875 MB/s
test gen_bytes_chacha20 ... bench:     539,102 ns/iter (+/- 8,237) = 1899 MB/s
test gen_bytes_chacha8  ... bench:     263,023 ns/iter (+/- 15,029) = 3893 MB/s
test gen_u32_chacha12   ... bench:       1,689 ns/iter (+/- 168) = 2368 MB/s
test gen_u32_chacha20   ... bench:       2,475 ns/iter (+/- 60) = 1616 MB/s
test gen_u32_chacha8    ... bench:       1,300 ns/iter (+/- 56) = 3076 MB/s
test gen_u64_chacha12   ... bench:       4,058 ns/iter (+/- 324) = 1971 MB/s
test gen_u64_chacha20   ... bench:       4,472 ns/iter (+/- 647) = 1788 MB/s
test gen_u64_chacha8    ... bench:       3,252 ns/iter (+/- 421) = 2460 MB/s
test init_chacha        ... bench:          30 ns/iter (+/- 7)
```
And here's `chacha20`:
```
running 16 tests
test gen_bytes_chacha12 ... bench:   1,184,161 ns/iter (+/- 75,680) = 864 MB/s
test gen_bytes_chacha20 ... bench:   1,894,244 ns/iter (+/- 16,309) = 540 MB/s
test gen_bytes_chacha8  ... bench:     824,644 ns/iter (+/- 52,944) = 1241 MB/s
test gen_u32_chacha12   ... bench:       4,607 ns/iter (+/- 97) = 868 MB/s
test gen_u32_chacha20   ... bench:       7,351 ns/iter (+/- 121) = 544 MB/s
test gen_u32_chacha8    ... bench:       3,257 ns/iter (+/- 407) = 1228 MB/s
test gen_u64_chacha12   ... bench:       9,396 ns/iter (+/- 1,104) = 851 MB/s
test gen_u64_chacha20   ... bench:      14,856 ns/iter (+/- 750) = 538 MB/s
test gen_u64_chacha8    ... bench:       6,431 ns/iter (+/- 625) = 1243 MB/s
test init_chacha        ... bench:          14 ns/iter (+/- 0)
```
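To summarise the gap, the sketch below computes the ratio of the two MB/s columns, with the figures copied from the two runs above (neither built with `target-cpu=native`). `RESULTS`, `speedup` and `speedup_range` are illustrative names.

```rust
/// (benchmark, rand_chacha MB/s, chacha20 MB/s), copied from the runs above.
const RESULTS: [(&str, u32, u32); 9] = [
    ("gen_bytes_chacha12", 2875, 864),
    ("gen_bytes_chacha20", 1899, 540),
    ("gen_bytes_chacha8", 3893, 1241),
    ("gen_u32_chacha12", 2368, 868),
    ("gen_u32_chacha20", 1616, 544),
    ("gen_u32_chacha8", 3076, 1228),
    ("gen_u64_chacha12", 1971, 851),
    ("gen_u64_chacha20", 1788, 538),
    ("gen_u64_chacha8", 2460, 1243),
];

/// How many times higher throughput `a` is than throughput `b`.
fn speedup(a_mb_s: u32, b_mb_s: u32) -> f64 {
    f64::from(a_mb_s) / f64::from(b_mb_s)
}

/// Smallest and largest speedup over all rows.
fn speedup_range(rows: &[(&str, u32, u32)]) -> (f64, f64) {
    rows.iter()
        .map(|&(_, a, b)| speedup(a, b))
        .fold((f64::INFINITY, f64::NEG_INFINITY), |(lo, hi), r| (lo.min(r), hi.max(r)))
}
```

By this measure `rand_chacha` is roughly 2x (gen_u64_chacha8) to 3.5x (gen_bytes_chacha20) faster in these runs.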
Hmm, looks like `chacha20` needs some help:
```
$ export RUSTFLAGS="-C target-cpu=native"
$ cargo bench --bench generators chacha
...
test gen_bytes_chacha12 ... bench:     500,699 ns/iter (+/- 28,157) = 2045 MB/s
test gen_bytes_chacha20 ... bench:     789,107 ns/iter (+/- 52,585) = 1297 MB/s
test gen_bytes_chacha8  ... bench:     357,024 ns/iter (+/- 23,501) = 2868 MB/s
test gen_u32_chacha12   ... bench:       2,139 ns/iter (+/- 56) = 1870 MB/s
test gen_u32_chacha20   ... bench:       3,260 ns/iter (+/- 116) = 1226 MB/s
test gen_u32_chacha8    ... bench:       1,571 ns/iter (+/- 32) = 2546 MB/s
test gen_u64_chacha12   ... bench:       4,472 ns/iter (+/- 55) = 1788 MB/s
test gen_u64_chacha20   ... bench:       6,714 ns/iter (+/- 421) = 1191 MB/s
test gen_u64_chacha8    ... bench:       3,338 ns/iter (+/- 46) = 2396 MB/s
```
Closer, but still behind `rand_chacha`, which gets a negligible boost from `target-cpu=native` thanks to CPU feature auto-detection.
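Auto-detection means the binary carries both a baseline and a SIMD-enabled code path and picks one on the running CPU, so building with `-C target-cpu=native` adds little. Below is a minimal std-only sketch of that general pattern using `is_x86_feature_detected!` and `#[target_feature]`; it is not the actual code in `rand_chacha` or `ppv-lite86`, and the summing kernel (`sum_scalar`, `sum_avx2`, `sum_dispatch`) is a hypothetical stand-in for the ChaCha rounds.

```rust
/// Portable fallback kernel: wrapping sum of the words.
#[inline]
fn sum_scalar(xs: &[u32]) -> u32 {
    xs.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}

/// The same kernel compiled with AVX2 enabled, so the optimiser may vectorise it.
/// Only sound to call once the running CPU is known to support AVX2.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(xs: &[u32]) -> u32 {
    sum_scalar(xs)
}

/// Pick an implementation at run time rather than at compile time
/// (`-C target-cpu=native` instead bakes the choice into the binary).
fn sum_dispatch(xs: &[u32]) -> u32 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just confirmed at run time.
            return unsafe { sum_avx2(xs) };
        }
    }
    sum_scalar(xs)
}
```

Both paths return the same result; only the instructions used to compute it differ.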
Running `chacha20`'s built-in benchmarks, I get around 6-6.2 cycles/byte without `target-cpu=native` and 2.5-2.7 with it. That is well short of the 1.4 cycles/byte @tarcieri claims, so something is off here (perhaps it's just CPU-specific optimisations).
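Comparing these cycles/byte figures with the MB/s numbers from `cargo bench` requires the clock frequency, which isn't stated above: cycles/byte = (cycles per second) / (bytes per second). A small sketch, with hypothetical helper names and MB taken as 10^6 bytes:

```rust
/// cycles/byte = clock frequency (cycles/s) / throughput (bytes/s).
fn cycles_per_byte(clock_ghz: f64, throughput_mb_s: f64) -> f64 {
    (clock_ghz * 1e9) / (throughput_mb_s * 1e6)
}

/// Inverse: the MB/s throughput implied by a cycles/byte figure.
fn mb_s_at(clock_ghz: f64, cpb: f64) -> f64 {
    (clock_ghz * 1e9) / cpb / 1e6
}
```

For example, at an assumed 3.5 GHz (not a measured figure), 1.4 cycles/byte corresponds to about 2500 MB/s, while 6 cycles/byte is about 583 MB/s.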
Of course, there's more to this than a few stats, and the number of `unsafe` usages is not a particularly useful comparison, since it says nothing about the size of the `unsafe` blocks. This is all I have time for right now. Thanks to all the authors (and notably @newpavlov).