-
Notifications
You must be signed in to change notification settings - Fork 99
Several improvements to token filter types #4291
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@JoshMock I'm having some doubts about this one, I think while it's good to have the full list of stopwords, some of the specific tokenizers only accept part of it? For example, this is the code for the JapaneseStopTokenFilter, where |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @JoshMock 🙂 Just a minor comment.
@@ -56,7 +56,7 @@ export class FingerprintAnalyzer { | |||
* | |||
* @server_default _none_ | |||
*/ | |||
stopwords?: StopWords | |||
stopwords?: StopWordLanguage | string[] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will cause inconveniences for users in statically typed languages.
Could we please continue using the StopWords
type here?
StopWords
is currently defined as:
export type StopWords = string | string[]
Changing that to:
export type StopWords = StopWordLanguage | string[]
while keeping this file "as is", should do the trick.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't comment on this because the java client simplifies enum | string
to string
, so I assumed other static clients would have something similar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For .NET I don’t do this since enums are way nicer to use.
In this case it’s as well string[]. Do you simplify that as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah it also works as enum | string[]
-> string[]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect now! Thank you @JoshMock
(cherry picked from commit f5dca08)
(cherry picked from commit f5dca08)
(cherry picked from commit f5dca08)
An issue on the JS repo was opened about a missing
cjk_bigram
token filter type. I did an audit of the token filter docs and found a few missing types, so I:Fixes #4251
Fixes elastic/elasticsearch-js#2726