Base.findmin(s1, s2, dist::Partial) #29

Open
@ericphanson

Hi and thanks for the great package!

I found the following snippet (modified from (dist::Partial)(s1, s2)) useful: besides the minimum distance, it also returns the index range of the best-matching substring. Would it be worth adding to StringDistances? I could make a PR if so.

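julia> function Base.findmin(s1, s2, dist::Partial; max_dist = 1.0)
    # Make s1 the shorter string; the returned range indexes into the longer one.
    s1, s2 = StringDistances.reorder(s1, s2)
    len1, len2 = length(s1), length(s2)
    # Equal lengths: the only window is all of s2.
    len1 == len2 && return dist.dist(s1, s2, max_dist), firstindex(s2):lastindex(s2)
    len1 == 0 && return max_dist+1, 1:0
    out = max_dist+1
    out_idx = 0  # character position (not byte index) of the best window so far
    # Slide a window of len1 characters over s2.
    for (i, x) in enumerate(qgrams(s2, len1))
        curr = dist.dist(s1, x, max_dist)
        out_idx = ifelse(curr < out, i, out_idx)
        out = min(out, curr)
        max_dist = min(out, max_dist)  # tighten the bound for later windows
    end
    # Convert character positions to the byte indices of the window's first and last characters.
    return out, nextind(s2, 0, out_idx):nextind(s2, 0, out_idx+len1-1)
end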

julia> findmin("βabc", "βadcacαaXXcαγ", Partial(DamerauLevenshtein()))
(0.25, 1:5)

julia> "βadcacαaXXcαγ"[1:5]
"βadc"

Also, I am new to working with Unicode strings, so it's possible I haven't used nextind correctly.
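
To check the nextind arithmetic on its own, here is a minimal sketch that uses only Base functions. The helper name char_range is hypothetical and not part of StringDistances. It relies on nextind(s, 0, n) returning the byte index of the n-th character, which is how Base documents it.

```julia
# Hypothetical helper (not part of StringDistances): convert a window of
# `len` characters starting at character number `i` into a byte-index range.
char_range(s::AbstractString, i::Integer, len::Integer) =
    nextind(s, 0, i):nextind(s, 0, i + len - 1)

s = "βadcacαaXXcαγ"
r = char_range(s, 1, 4)    # window covering the first four characters
@assert r == 1:5           # 'β' occupies two bytes in UTF-8, so the range ends at 5
@assert s[r] == "βadc"
@assert length(s[r]) == 4  # four characters, even though the range spans five bytes
```

This matches the 1:5 range in the output above: the range endpoints are the byte indices of the first and last characters of the window, so indexing s2 with it yields a four-character substring.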
