Hi and thanks for the great package!
I found the following snippet (modified from `(dist::Partial)(s1, s2)`) useful, and I was wondering whether it would be worth adding to StringDistances. I could make a PR if so.
julia> using StringDistances

julia> function Base.findmin(s1, s2, dist::Partial; max_dist = 1.0)
           # Reorder so that s1 is the shorter string.
           s1, s2 = StringDistances.reorder(s1, s2)
           len1, len2 = length(s1), length(s2)
           len1 == len2 && return dist.dist(s1, s2, max_dist), firstindex(s2):lastindex(s2)
           len1 == 0 && return max_dist+1, 1:0
           out = max_dist+1
           out_idx = 0  # stays 0 if no window scores below the initial `out`
           # Slide a window of len1 characters over s2 and keep the best match.
           for (i, x) in enumerate(qgrams(s2, len1))
               curr = dist.dist(s1, x, max_dist)
               out_idx = ifelse(curr < out, i, out_idx)
               out = min(out, curr)
               max_dist = min(out, max_dist)
           end
           # Convert character positions to string (byte) indices.
           return out, nextind(s2, 0, out_idx):nextind(s2, 0, out_idx+len1-1)
       end
julia> findmin("βabc", "βadcacαaXXcαγ", Partial(DamerauLevenshtein()))
(0.25, 1:5)
julia> "βadcacαaXXcαγ"[1:5]
"βadc"
Also, I am new to working with Unicode strings, so it's possible I haven't used `nextind` correctly.
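To make the `nextind` question concrete, here is a minimal sketch of the index arithmetic as I understand it, using only Base. It relies on `nextind(s, 0, n)` returning the string index of the `n`-th character. The helper name `char_range` is hypothetical, made up for this illustration, and is not part of StringDistances.

```julia
s = "βadcacαaXXcαγ"

# Hypothetical helper (not in StringDistances): turn a character position and
# a character count into a range of string indices usable for slicing.
# nextind(s, 0, n) gives the index at which the n-th character of s starts.
char_range(s, first_char, nchars) =
    nextind(s, 0, first_char):nextind(s, 0, first_char + nchars - 1)

r = char_range(s, 1, 4)    # characters 1 through 4
println(r, " => ", s[r])   # β occupies two code units, so the range ends at 5

r2 = char_range(s, 7, 2)   # characters 7 and 8
println(r2, " => ", s[r2])
```

This matches the `1:5` range returned in the example above, which covers four characters because `β` takes two code units.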