Refactor Diff.common_prefix and common_suffix

Did some benchmarking on a pair of very long strings and found that the previously sped-up common_prefix could be made 4x faster and use 7x less memory by recursing over codepoints in the strings (e.g. a charlist) rather than Unicode graphemes. There is more logic involved in getting grapheme clusters than parsing out codepoints. 

Benchmarks for common_prefix (200_000 character prefix):

```
Name                 ips        average  deviation         median         99th %
codepoints         87.01       11.49 ms    ±26.02%       11.40 ms       18.60 ms
graphemes          22.35       44.73 ms    ±48.00%       40.46 ms      174.91 ms

Comparison: 
codepoints         87.01
graphemes          22.35 - 3.89x slower +33.24 ms

Memory usage statistics:

Name          Memory usage
codepoints         8.12 MB
graphemes         54.93 MB - 6.77x memory usage +46.81 MB
```

Also applied the same charlist algorithm to common_suffix 

Benchmarks for common_suffix (4_000 character suffix, the "old" benchmark used `Enum.reduce` and `String.char_at` instead of recursion).:

```
Name                 ips        average  deviation         median         99th %
codepoints        5.66 K      176.70 μs    ±45.23%      190.93 μs      365.56 μs
graphemes         1.18 K      844.28 μs    ±26.94%      757.29 μs     1577.57 μs
old            0.00200 K   500887.68 μs     ±4.16%   496024.38 μs   528829.38 μs

Comparison: 
codepoints        5.66 K
graphemes         1.18 K - 4.78x slower +667.58 μs
old            0.00200 K - 2834.64x slower +500710.98 μs

Memory usage statistics:

Name          Memory usage
codepoints      0.00015 GB
graphemes       0.00119 GB - 7.72x memory usage +0.00103 GB
old                1.43 GB - 9317.40x memory usage +1.43 GB
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Refactor Diff.common_prefix and common_suffix #6

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Refactor Diff.common_prefix and common_suffix #6

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions