Open
Description
Some q-gram distances currently return NaN when one of the input strings is shorter than the q-gram length (q), while others do not:
julia> using StringDistances
julia> isnan(Cosine(2)("", "bb"))
true
julia> isnan(Cosine(2)("a", "bb"))
true
julia> Jaccard(2)("a", "bb")
1.0
julia> filter(d -> isnan(d(2)("", "bb")), [QGram, Cosine, Jaccard, Overlap, SorensenDice, MorisitaOverlap, NMD])
3-element Vector{DataType}:
Cosine
Overlap
MorisitaOverlap
Maybe it would be better to have consistent behaviour for such inputs? Throwing an ArgumentError might be the better option, since the caller then has to decide how to handle such situations.
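To illustrate the proposed behaviour, here is a minimal caller-side sketch. The name `checked_distance` is hypothetical and not part of StringDistances.jl; it assumes the distance is a callable such as `Cosine(2)` and that `q` is passed explicitly, and it only shows the kind of check the library could perform internally.

```julia
# Hypothetical helper, not part of StringDistances.jl.
# `dist` is any callable distance (e.g. Cosine(q)); `q` is the q-gram length.
function checked_distance(dist, q::Integer, s1::AbstractString, s2::AbstractString)
    # `length` counts characters, so an input with fewer than q characters has no q-grams.
    if length(s1) < q || length(s2) < q
        throw(ArgumentError("both inputs must have at least q = $q characters"))
    end
    return dist(s1, s2)
end
```

With StringDistances loaded, `checked_distance(Cosine(2), 2, "a", "bb")` would throw an ArgumentError instead of returning NaN, and the same would hold for `Jaccard(2)`, which currently returns 1.0 for these inputs.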