Optionally exclude some special characters from being filtered #19

@kivilahtio

For example, the text
15 § Rakennusjärjestyksen hyväksyminen

is tokenized to
15 rakennusjärjestys hyväksyä
which is correct.

However, it is impossible to search the index with the search term
§
because, I presume, it is removed during lowercasing and punctuation filtering.
Is it possible to configure how punctuation is filtered?

I guess we could add the character § here to include it in the tokenization, but that would cause problems when § is typed without a following space. For example,
15 §Rakennusjärjestyksen
could not be understood by the Voikko stemming rules.
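
To make the idea concrete, here is a minimal sketch in Python of punctuation filtering with a configurable set of preserved characters. It is not the plugin's code: the `tokenize` function and its `keep` parameter are hypothetical names used only for illustration, and no Voikko lemmatization is performed. Each preserved character is emitted as its own token, so the mistyped `§Rakennusjärjestyksen` would still hand a clean word to the stemmer.

```python
import re


def tokenize(text: str, keep: str = "§") -> list[str]:
    """Lowercase word tokens; punctuation is dropped except characters in `keep`.

    Hypothetical helper for illustration only, not part of the plugin.
    """
    # Escape the preserved characters so they are safe inside a character class.
    kept = "".join(re.escape(ch) for ch in keep)
    # A token is either a run of word characters or a single preserved character,
    # so a preserved character glued to a word is split off as its own token.
    pattern = rf"\w+|[{kept}]" if kept else r"\w+"
    return [tok.lower() for tok in re.findall(pattern, text)]


print(tokenize("15 § Rakennusjärjestyksen hyväksyminen"))
print(tokenize("15 §Rakennusjärjestyksen"))
```

With this approach the word tokens passed on to stemming never contain the preserved character, whether or not a space follows it.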

Have you encountered similar requests from your clients?
Any advice on how this could be achieved in such a way that the change could be incorporated in this plugin?
