perf(decode): optimize Vec alloc/resize#179
perf(decode): optimize Vec alloc/resize#179AaronO wants to merge 1 commit intomarshallpierce:masterfrom
Conversation
Avoid calloc overhead of initializing buffer we'll write into Unfortunately requires unsafe Vec::set_len so we can get a mutable ref to the uninit portion of the Vec's buffer
|
FYI this also fixes a "bug" in let mut buffer = Vec::<u8>::with_capacity(input.as_ref().len() * 4 / 3);This approximately allocates with a ratio of 4/3 instead of 3/4 (without accounting for padding, etc...) |
| buffer.reserve(len_estimate); | ||
| unsafe { | ||
| buffer.set_len(total_len_estimate); | ||
| } |
There was a problem hiding this comment.
The safety documentation of set_len is pretty clear that this is UB. Namely it states
The elements at
old_len..new_lenmust be initialized.
But total_len_estimate = starting_output_len + len_estimate, which means the last len_estimate elements are not properly initialized.
There was a problem hiding this comment.
@jonasbb It's not UB or unsafe in aggregate since write into that "unintialized" portion of the buffer and then truncate to the true length.
This avoids the overhead of callocing/initializing a buffer that we'll immediately overwrite
There was a problem hiding this comment.
The documentation mentions that as a safety requirement, and this code is not upholding it. Later, the code also creates a reference to this uninitialized memory, which is also not allowed.
Some functions like decode could also panic and unwind before truncate could set the correct length. Now, the drop implementation of Vec might potentially read the uninitialized values.
If you really need uninitialized memory you need to write to it via a pointer and update the length only after writing. Alternatively, using MaybeUninit is an option. If the buffer would be of type Vec<MaybeUninit<u8>>, then I think the set_len would be ok, but it isn't.
|
Thanks for this work. However, I'm not sure who the target user would be -- if you're ok with unsafe, the forthcoming AVX2 version will surely outperform it, and if you're not, you'd probably want the existing safe version. 🤔 Good catch on the over-allocation, too! |
|
Hi. There is another crate base64-simd for performance. It is highly unsafe but much faster than |
Avoid calloc overhead of initializing buffer we'll write into, improving
decode_small_input/decode/3by -33%Unfortunately requires unsafe Vec::set_len so we can get a mutable ref to the uninit portion of the Vec's buffer
Before
After