Skip to content

Refactor Diff.common_prefix and common_suffix #6

@pzingg

Description

@pzingg

Did some benchmarking on a pair of very long strings and found that the previously sped-up common_prefix could be made 4x faster and use 7x less memory by recursing over codepoints in the strings (e.g. a charlist) rather than Unicode graphemes. There is more logic involved in getting grapheme clusters than parsing out codepoints.

Benchmarks for common_prefix (200_000 character prefix):

Name                 ips        average  deviation         median         99th %
codepoints         87.01       11.49 ms    ±26.02%       11.40 ms       18.60 ms
graphemes          22.35       44.73 ms    ±48.00%       40.46 ms      174.91 ms

Comparison: 
codepoints         87.01
graphemes          22.35 - 3.89x slower +33.24 ms

Memory usage statistics:

Name          Memory usage
codepoints         8.12 MB
graphemes         54.93 MB - 6.77x memory usage +46.81 MB

Also applied the same charlist algorithm to common_suffix

Benchmarks for common_suffix (4_000 character suffix, the "old" benchmark used Enum.reduce and String.char_at instead of recursion).:

Name                 ips        average  deviation         median         99th %
codepoints        5.66 K      176.70 μs    ±45.23%      190.93 μs      365.56 μs
graphemes         1.18 K      844.28 μs    ±26.94%      757.29 μs     1577.57 μs
old            0.00200 K   500887.68 μs     ±4.16%   496024.38 μs   528829.38 μs

Comparison: 
codepoints        5.66 K
graphemes         1.18 K - 4.78x slower +667.58 μs
old            0.00200 K - 2834.64x slower +500710.98 μs

Memory usage statistics:

Name          Memory usage
codepoints      0.00015 GB
graphemes       0.00119 GB - 7.72x memory usage +0.00103 GB
old                1.43 GB - 9317.40x memory usage +1.43 GB

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions