Replacing rand_chacha with chacha20 #934

@dhardy

@tarcieri has recently implemented ChaCha*Rng via the chacha20 crate. Docs are here. Breaking this out into a new issue, since it isn't really the topic of #872.

This raises the question: should we prefer this over the current (@kazcw's) implementation?

First things first, the rand_chacha crate has 44 reverse dependencies, is clearly a rand family crate and is recommended in our book. There is no plan to retire this crate.

Second, let's look at a few stats.

Unsafe

Here are the results of running cargo geiger on the current rand_chacha (with rand_core after #931):

$ cargo geiger
<snip>
Metric output format: x/y
    x = unsafe code used by the build
    y = total unsafe code found in the crate

Symbols: 
    :) = No `unsafe` usage found, declares #![forbid(unsafe_code)]
    ?  = No `unsafe` usage found, missing #![forbid(unsafe_code)]
    !  = `unsafe` usage found

Functions  Expressions  Impls  Traits  Methods  Dependency

0/0        0/0          0/0    0/0     0/0      ?  rand_chacha 0.2.2
2/2        557/626      0/0    0/0     14/22    !  ├── ppv-lite86 0.2.6
0/0        22/22        0/0    0/0     0/0      !  └── rand_core 0.5.1

2/2        579/648      0/0    0/0     14/22  

And on chacha20:

$ cargo geiger --no-default-features --features rng
<snip>
Functions  Expressions  Impls  Traits  Methods  Dependency

7/14       199/427      0/0    0/0     1/2      !  chacha20 0.3.1
0/0        22/22        0/0    0/0     0/0      !  ├── rand_core 0.5.1
2/4        50/150       1/1    0/0     3/3      !  │   └── getrandom 0.1.14
0/0        0/0          0/0    0/0     0/0      ?  │       ├── cfg-if 0.1.10
2/2        73/95        0/0    0/0     5/11     !  │       └── libc 0.2.66
0/0        0/0          0/0    0/0     0/0      ?  └── stream-cipher 0.3.2
0/0        0/0          0/0    0/0     0/0      ?      ├── blobby 0.1.2
0/1        0/231        0/0    0/0     0/0      ?      │   └── byteorder 1.3.2
0/1        0/223        0/18   0/8     0/5      ?      └── generic-array 0.12.3
0/0        0/51         0/0    0/0     0/0      ?          └── typenum 1.11.2

11/22      344/1199     1/19   0/8     9/21   

Something's wrong here: stream-cipher is only a dev-dependency, and rand_core is depended on in exactly the same way as in rand_chacha. So only the first line is relevant.

Lines of code

Tokei output for rand_chacha:

$ tokei rand/rand_chacha/ cryptocorrosion/utils-simd/ppv-lite86/
-------------------------------------------------------------------------------
 Language            Files        Lines         Code     Comments       Blanks
-------------------------------------------------------------------------------
 Markdown                2           70           70            0            0
 Rust                    9         4285         3864          164          257
 TOML                    2           48           42            0            6
-------------------------------------------------------------------------------
 Total                  13         4403         3976          164          263
-------------------------------------------------------------------------------

(83% of this is from ppv-lite86).

For chacha20:

$ tokei
-------------------------------------------------------------------------------
 Language            Files        Lines         Code     Comments       Blanks
-------------------------------------------------------------------------------
 Markdown                2          170          170            0            0
 Rust                   12         1617         1163          203          251
 TOML                    1           47           40            0            7
-------------------------------------------------------------------------------
 Total                  15         1834         1373          203          258
-------------------------------------------------------------------------------

There are no dependencies we need, so that's it. A nice improvement.

Benchmarks

(64-bit Haswell)

Here's rand_chacha:

$ cargo bench --bench generators chacha
   Compiling ...

running 16 tests
test gen_bytes_chacha12      ... bench:     356,133 ns/iter (+/- 5,261) = 2875 MB/s
test gen_bytes_chacha20      ... bench:     539,102 ns/iter (+/- 8,237) = 1899 MB/s
test gen_bytes_chacha8       ... bench:     263,023 ns/iter (+/- 15,029) = 3893 MB/s
test gen_u32_chacha12        ... bench:       1,689 ns/iter (+/- 168) = 2368 MB/s
test gen_u32_chacha20        ... bench:       2,475 ns/iter (+/- 60) = 1616 MB/s
test gen_u32_chacha8         ... bench:       1,300 ns/iter (+/- 56) = 3076 MB/s
test gen_u64_chacha12        ... bench:       4,058 ns/iter (+/- 324) = 1971 MB/s
test gen_u64_chacha20        ... bench:       4,472 ns/iter (+/- 647) = 1788 MB/s
test gen_u64_chacha8         ... bench:       3,252 ns/iter (+/- 421) = 2460 MB/s
test init_chacha             ... bench:          30 ns/iter (+/- 7)

and here's chacha20:

running 16 tests
test gen_bytes_chacha12      ... bench:   1,184,161 ns/iter (+/- 75,680) = 864 MB/s
test gen_bytes_chacha20      ... bench:   1,894,244 ns/iter (+/- 16,309) = 540 MB/s
test gen_bytes_chacha8       ... bench:     824,644 ns/iter (+/- 52,944) = 1241 MB/s
test gen_u32_chacha12        ... bench:       4,607 ns/iter (+/- 97) = 868 MB/s
test gen_u32_chacha20        ... bench:       7,351 ns/iter (+/- 121) = 544 MB/s
test gen_u32_chacha8         ... bench:       3,257 ns/iter (+/- 407) = 1228 MB/s
test gen_u64_chacha12        ... bench:       9,396 ns/iter (+/- 1,104) = 851 MB/s
test gen_u64_chacha20        ... bench:      14,856 ns/iter (+/- 750) = 538 MB/s
test gen_u64_chacha8         ... bench:       6,431 ns/iter (+/- 625) = 1243 MB/s
test init_chacha             ... bench:          14 ns/iter (+/- 0)
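
As a cross-check, the MB/s column in both runs can be reproduced from the ns/iter column. The sketch below assumes the byte counts per iteration (1,024,000 for gen_bytes, 4,000 for gen_u32, 8,000 for gen_u64); these are inferred from the reported figures, not taken from the benchmark source, and the helper names are hypothetical.

```rust
// Bytes produced per benchmark iteration. ASSUMPTION: inferred from the
// reported ns/iter and MB/s figures, not read from the benchmark source.
const GEN_BYTES: u64 = 1_024_000;
const GEN_U32: u64 = 4_000;
const GEN_U64: u64 = 8_000;

/// Throughput in MB/s (10^6 bytes per second), truncated to an integer.
fn mb_per_s(bytes_per_iter: u64, ns_per_iter: u64) -> u64 {
    bytes_per_iter * 1000 / ns_per_iter
}

/// How many times slower `candidate_ns` is than `baseline_ns`.
fn slowdown(baseline_ns: u64, candidate_ns: u64) -> f64 {
    candidate_ns as f64 / baseline_ns as f64
}

fn main() {
    // ChaCha20 rows, rand_chacha first, then the chacha20 crate (default build).
    println!("gen_bytes: {} vs {} MB/s",
             mb_per_s(GEN_BYTES, 539_102), mb_per_s(GEN_BYTES, 1_894_244));
    println!("gen_u32:   {} vs {} MB/s",
             mb_per_s(GEN_U32, 2_475), mb_per_s(GEN_U32, 7_351));
    println!("gen_u64:   {} vs {} MB/s",
             mb_per_s(GEN_U64, 4_472), mb_per_s(GEN_U64, 14_856));
    println!("gen_bytes slowdown: {:.2}x", slowdown(539_102, 1_894_244));
}
```

By that measure the chacha20 crate is roughly 3.5 times slower than rand_chacha on gen_bytes_chacha20 in this default build.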

Hmm, looks like chacha20 needs some help:

$ export RUSTFLAGS="-C target-cpu=native"
$ cargo bench --bench generators chacha
...
test gen_bytes_chacha12      ... bench:     500,699 ns/iter (+/- 28,157) = 2045 MB/s
test gen_bytes_chacha20      ... bench:     789,107 ns/iter (+/- 52,585) = 1297 MB/s
test gen_bytes_chacha8       ... bench:     357,024 ns/iter (+/- 23,501) = 2868 MB/s
test gen_u32_chacha12        ... bench:       2,139 ns/iter (+/- 56) = 1870 MB/s
test gen_u32_chacha20        ... bench:       3,260 ns/iter (+/- 116) = 1226 MB/s
test gen_u32_chacha8         ... bench:       1,571 ns/iter (+/- 32) = 2546 MB/s
test gen_u64_chacha12        ... bench:       4,472 ns/iter (+/- 55) = 1788 MB/s
test gen_u64_chacha20        ... bench:       6,714 ns/iter (+/- 421) = 1191 MB/s
test gen_u64_chacha8         ... bench:       3,338 ns/iter (+/- 46) = 2396 MB/s

Closer, but still behind rand_chacha (which gets negligible boost from target-cpu=native thanks to auto-detection).
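
For readers unfamiliar with the auto-detection mentioned above, here is a minimal sketch of runtime CPU feature detection using only the standard library. It is not the code ppv-lite86 or rand_chacha actually uses; the function and the backend names are hypothetical and only illustrate why a crate that dispatches at runtime gains little from target-cpu=native.

```rust
/// Names the code path a runtime dispatcher might pick on this machine.
/// ASSUMPTION: the backend names are illustrative, not taken from any crate.
fn select_backend() -> &'static str {
    // The detection macro only exists on x86 targets, so gate it.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        // Checked when the program runs, independent of compile-time flags.
        if std::is_x86_feature_detected!("avx2") {
            return "avx2";
        }
        if std::is_x86_feature_detected!("sse2") {
            return "sse2";
        }
    }
    // Fallback for other architectures or CPUs without those features.
    "portable"
}

fn main() {
    println!("selected backend: {}", select_backend());
}
```

A crate that instead selects its implementation with compile-time cfg(target_feature = ...) checks only sees the features enabled by RUSTFLAGS, which would be consistent with the chacha20 numbers improving under target-cpu=native.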

Running chacha20's built-in benchmarks, I get around 6-6.2 cycles/byte without target-cpu=native, and 2.5-2.7 with; this is significantly short of the 1.4 cycles/byte @tarcieri claims, so something gives here (perhaps just CPU-specific optimisations).
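
To relate cycles/byte to the MB/s figures from cargo bench, the conversion below may help. The clock frequency is not stated anywhere in this issue, so the 3.0 GHz used here is purely an assumed value for illustration, and the two benchmarks measure different workloads, so the results need not agree exactly.

```rust
/// Cycles per byte for a given clock (GHz) and throughput (MB/s, 10^6 bytes/s).
fn cycles_per_byte(clock_ghz: f64, mb_per_s: f64) -> f64 {
    clock_ghz * 1000.0 / mb_per_s
}

/// Throughput in MB/s implied by a cycles/byte figure at a given clock (GHz).
fn mb_per_s_at(clock_ghz: f64, cycles_per_byte: f64) -> f64 {
    clock_ghz * 1000.0 / cycles_per_byte
}

fn main() {
    // ASSUMPTION: hypothetical clock speed; the real value is not in the issue.
    let clock_ghz = 3.0;

    // gen_bytes_chacha20 for the chacha20 crate with target-cpu=native.
    println!("1297 MB/s -> {:.2} cycles/byte",
             cycles_per_byte(clock_ghz, 1297.0));

    // Throughput the claimed 1.4 cycles/byte would imply at the same clock.
    println!("1.4 cycles/byte -> {:.0} MB/s", mb_per_s_at(clock_ghz, 1.4));
}
```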


Of course, there's more to this than a few stats, and number-of-unsafe-usages is not a particularly useful comparison (since it says nothing about the size of the unsafe blocks). This is all I have time for right now. Thanks to all authors (also significantly @newpavlov).
