Accept Callables as Tokenizers for InMemoryDocumentStore #4695
Closed
farhanhubble started this conversation in Ideas
Replies: 1 comment
-
Thanks for the feedback. I'm going to create a feature request from this discussion; contributions are welcome!
-
InMemoryDocumentStore currently only accepts a tokenizing pattern, through the argument bm25_tokenization_regex: str = r"(?u)\b\w\w+\b". The underlying BM25 supports a callable, though. Removing this restriction will enable correct tokenization of a larger variety of corpora. I ran into this limitation trying to index JSON documents that contain key-value pairs.
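
To make the limitation concrete, here is a minimal sketch using only the Python standard library. It is not Haystack code: `make_tokenizer`, `json_kv_tokenize`, and the sample document are hypothetical names and data made up for illustration, and the "key=value" token scheme is just one possible choice. It compares the default pattern quoted above with a callable tokenizer on a small JSON document.

```python
import json
import re
from typing import Callable, List, Union

# Default pattern quoted in the post: keeps only tokens of 2+ word characters.
DEFAULT_PATTERN = r"(?u)\b\w\w+\b"

Tokenizer = Callable[[str], List[str]]


def json_kv_tokenize(text: str) -> List[str]:
    """Hypothetical callable: emit one 'key=value' token per top-level pair."""
    obj = json.loads(text)
    return [f"{key}={value}" for key, value in obj.items()]


def make_tokenizer(spec: Union[str, Tokenizer]) -> Tokenizer:
    """Hypothetical helper: accept either a regex string or a callable."""
    if callable(spec):
        return spec
    pattern = re.compile(spec)
    return pattern.findall


# Made-up sample document, not taken from the original post.
doc = '{"id": 7, "size": "M", "color": "red"}'

regex_tokens = make_tokenizer(DEFAULT_PATTERN)(doc)
kv_tokens = make_tokenizer(json_kv_tokenize)(doc)

print(regex_tokens)  # ['id', 'size', 'color', 'red']
print(kv_tokens)     # ['id=7', 'size=M', 'color=red']
```

With the regex, the single-character values 7 and M are dropped and keys are indistinguishable from values. The callable keeps each pair together, so a query for size=M could match exactly. Accepting either form, as `make_tokenizer` does here, would keep the existing regex argument working while allowing the callable.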