-
Notifications
You must be signed in to change notification settings - Fork 25.4k
Abstracted how Text fields use Keyword fields inside of Text fields #132430
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
private final IndexMode indexMode; | ||
private final IndexSortConfig indexSortConfig; | ||
private final boolean hasDocValuesSkipper; | ||
private final String originalName; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was moved up to StringFieldType
as its a commonly used function
@@ -1349,7 +1377,7 @@ protected SyntheticSourceSupport syntheticSourceSupport() { | |||
return super.syntheticSourceSupport(); | |||
} | |||
|
|||
public SourceLoader.SyntheticFieldLoader syntheticFieldLoader(String fullFieldName, String leafFieldName) { | |||
public CompositeSyntheticFieldLoader syntheticFieldLoader(String fullFieldName, String leafFieldName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needed this to be more explicit in order to merge this CompositeSyntheticFieldLoader
with the one in TextFieldMapper
/MatchOnlyTextFieldMapper
@@ -238,11 +240,9 @@ private static FielddataFrequencyFilter parseFrequencyFilter(String name, Mappin | |||
public static class Builder extends FieldMapper.Builder { | |||
|
|||
private final IndexVersion indexCreatedVersion; | |||
private final Parameter<Boolean> store; | |||
|
|||
private final boolean isSyntheticSourceEnabled; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no longer used
this.withinMultiField = withinMultiField; | ||
this.store = Parameter.storeParam(m -> ((TextFieldMapper) m).store, () -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we no longer need to do this. TextFieldMapper
now decides whether we need an extra StoredField
during parsing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably a breaking change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but I wonder if there are any customers that rely on store = true
by default. We initially set it that way to support synthetic source. However, given thats done elsewhere now, we might be able to release this without too many concerns. I'll discuss with the team.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that depends on whether retrieving stored fields fail for requests like: GET my-index/_doc/1?stored_fields=message
? I suspect it would no longer retrieve fields?
Now technically that is breaking and we should open a breaking change issue for this.
* not running in synthetic _source or synthetic source doesn't need it. | ||
*/ | ||
private final KeywordFieldMapper.KeywordFieldType syntheticSourceDelegate; | ||
private final Optional<KeywordFieldMapper.KeywordFieldType> syntheticSourceDelegate; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with Optional
here, its now more clear that this delegate can be null. Which enforces null checks.
4f03296
to
511735f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good overall. It is not a small change but i think it makes control flow more clear in this case.
...per-extras/src/main/java/org/elasticsearch/index/mapper/extras/MatchOnlyTextFieldMapper.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/KeywordFieldMapper.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/KeywordFieldMapper.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/KeywordFieldMapper.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/CompositeSyntheticFieldLoader.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/CompositeSyntheticFieldLoader.java
Outdated
Show resolved
Hide resolved
this.withinMultiField = withinMultiField; | ||
this.store = Parameter.storeParam(m -> ((TextFieldMapper) m).store, () -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably a breaking change?
server/src/main/java/org/elasticsearch/index/mapper/TextFieldMapper.java
Outdated
Show resolved
Hide resolved
9c95662
to
a11f607
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a first review pass. My two main points are:
- I was hoping we could utilize ignored source as I mentioned in the comment in TextFieldmapper. With #132142 reading performance should improve, which was a bit of a concern before.
- Looks like this PR also changes to parse logic to read utf8 bytes. Let's do that in a separate PR.
@@ -84,28 +88,26 @@ public static class Builder extends FieldMapper.Builder { | |||
final Parameter<String> indexOptions = TextParams.textIndexOptions(m -> builder(m).indexOptions.getValue()); | |||
final Parameter<Boolean> norms = TextParams.norms(true, m -> builder(m).norms.getValue()); | |||
final Parameter<String> termVectors = TextParams.termVectors(m -> builder(m).termVectors.getValue()); | |||
private final Parameter<Boolean> store = Parameter.storeParam(m -> builder(m).store.getValue(), false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to add a name mapping parameter for this field type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How come? We had one before - see line 104 that was removed.
...ext/src/main/java/org/elasticsearch/index/mapper/annotatedtext/AnnotatedTextFieldMapper.java
Outdated
Show resolved
Hide resolved
this.withinMultiField = withinMultiField; | ||
this.store = Parameter.storeParam(m -> ((TextFieldMapper) m).store, () -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that depends on whether retrieving stored fields fail for requests like: GET my-index/_doc/1?stored_fields=message
? I suspect it would no longer retrieve fields?
Now technically that is breaking and we should open a breaking change issue for this.
var utfBytes = value.bytes(); | ||
var bytesRef = new BytesRef(utfBytes.bytes(), utfBytes.offset(), utfBytes.length()); | ||
final String fieldName = fieldType().syntheticSourceFallbackFieldName(true); | ||
context.doc().add(new StoredField(fieldName, bytesRef)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we just use ignored source here? I think falling back to ignored source is a more simplistic approach.
So instead something like:
context.addIgnoredField(new IgnoredSourceFieldMapper.NameValue(fullPath(), fullPath().lastIndexOf(leafName()), bytesRef, context.doc()));
This way we can reuse the fallback synthetic source block loader and synthetic source ignored source support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion. Now that I've seen how ignored source is implemented around synthetic source, it makes sense to reuse that code.
One thing we discussed is how "ignored source" is the wrong name here and may confuse others. I will add a simple wrapper function that makes this more clear. That should suffice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: ignored source will not work because all fields with the same name, under the same document, will need to be added to ignored source. This is because we skip using field loaders for fields that are in ignored source. This is fine for the fields that are genuinely added to ignored source, but not for fields that are not added.
For example, if our mappings include a text
field with a keyword
multi field + ignore_above, and we're given a document that contains two of such fields: one trips ignore_above and one doesn't. For the tripped one, we'll add the parent text
field into ignored source. However, for the other one, we'll rely on the keyword
multi field for synthetic source. Because the first field is already recorded in ignored source, we'll skip using the field loader for the second field and it will be missing from our result.
I will go back to the original code: creating StoredFields
directly in parseCreateField()
.
server/src/main/java/org/elasticsearch/index/mapper/TextFamilyFieldMapper.java
Show resolved
Hide resolved
…tic source delegate
This is a small refactor + bug for fix 131282. In the final PR, I will separate the refactor from the bug fix for clarity.
The refactor changes how
Text
andMatchOnlyText
fields useKeyword
multi fields for synthetic source. Currently, this is done via the hasSyntheticSourceCompatibleKeywordField argument, where we set a boolean flag to indicate whether there is a keyword multi field that is either stored or has doc values. This is not a good approach for addressing 131282 because we want to disable the following logic for multi fields. With that disabled, the parent fields will no longer have a multi field to use for synthetic source.We could designate one of the keyword fields as some kind of "synthetic source provider" for the parent. This way the field will always create a
StoredField
whenignore_above
is tripped. However, this is poor approach since it exposes how text fields are implemented to the keyword field. In my opinion, the parent field should be the one responsible for deciding what is and what isn't stored, as opposed to its multi fields.To achieve the above, I've removed
hasSyntheticSourceCompatibleKeywordField
and instead relied on thesyntheticSourceDelegate
. With the addition of a new methodisIgnored()
, which is called during parsing on the suppliedvalue
, we can determine whether a particular keyword multi field is a valid supporter of synthetic source. If it isn't, then the parent field can explicitly create aStoredField
for that.Next, I've changed how
SyntheticSourceSupport
works in the text family of fields. Since we no longer know whether a text field is truly stored or not, we now do something similar to what keyword fields do: create aSyntheticFieldLoader
with multiple layers. Each layer points to a potential source for synthetic source. For example, the first layer may contain the field loader for the parent field, while the next layer may contain the field loader for the keyword multi field.The following, additional, changes were also made to clean things up a bit:
TextFamilyFieldMapper
, which contains some common logic shared byText
,MatchOnlyText
, andAnnotatedText
when it comes to synthetic sourceMapperBuilderContext
isSyntheticSource
from Builders as that information comes fromMapperBuilderContext
anywayisIgnoreAboveSet()
for code readabilityoriginalFieldName
up toStringFieldType
and renamed it tosyntheticSourceFallbackFieldName()
. This removes a lot of duplicated code and clarifies what this "original" name even issyntheticSourceDelegate
in anOptional
to make it clear that it can be nullMatchOnlyTextFieldMapper