-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Fix Semantic Query Rewrite Interception Drops Boosts #129282
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Fix Semantic Query Rewrite Interception Drops Boosts #129282
Conversation
Hi @Samiul-TheSoccerFan, I've created a changelog YAML for you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great @Samiul-TheSoccerFan ! Can we please add some tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to see a solution based on copy constructors. Delegating the responsibility of generating a complete copy to the caller is error-prone and hard to maintain, which is how this class of bugs got introduced in the first place.
@elasticmachine update branch |
merge conflict between base and head |
@elasticmachine update branch |
@elasticmachine update branch |
@elasticmachine update branch |
@elasticmachine update branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation looks good! Just a couple of things to clean up and we're good to merge :)
...c/test/java/org/elasticsearch/index/query/SemanticKnnVectorQueryRewriteInterceptorTests.java
Outdated
Show resolved
Hide resolved
...c/test/java/org/elasticsearch/index/query/SemanticKnnVectorQueryRewriteInterceptorTests.java
Show resolved
Hide resolved
...e/src/test/java/org/elasticsearch/index/query/SemanticMatchQueryRewriteInterceptorTests.java
Outdated
Show resolved
Hide resolved
...est/java/org/elasticsearch/index/query/SemanticSparseVectorQueryRewriteInterceptorTests.java
Outdated
Show resolved
Hide resolved
...est/java/org/elasticsearch/index/query/SemanticSparseVectorQueryRewriteInterceptorTests.java
Show resolved
Hide resolved
|
||
- match: { hits.total.value: 1 } | ||
- match: { hits.hits.0._id: "doc_1" } | ||
- close_to: { hits.hits.0._score: { value: 0.9984111, error: 1e15 } } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error is still wrong here
@elasticmachine update branch |
@elasticmachine update branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice iterations @Samiul-TheSoccerFan - I've left some feedback, much of which is non blocking. I think it's very close to being ready to go though!
private MatchQueryBuilder createTestQueryBuilder() { | ||
return new MatchQueryBuilder(FIELD_NAME, VALUE); | ||
} | ||
|
||
private MatchQueryBuilder createTestQueryBuilderWithBoostAndQueryName() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-blocking nitpick: Since this is pretty simple and only used once, it doesn't need to be its own method. Maybe more readable to include the test query builder inside the test itself.
@@ -52,62 +52,68 @@ public void cleanup() { | |||
} | |||
|
|||
public void testSparseVectorQueryOnInferenceFieldIsInterceptedAndRewritten() throws IOException { | |||
float boost = randomFloatBetween(1, 10, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do a randomBoolean()
check to determine whether to set these boosts and names at all?
} | ||
|
||
public void testSparseVectorQueryOnInferenceFieldWithoutInferenceIdIsInterceptedAndRewritten() throws IOException { | ||
float boost = randomFloatBetween(1, 10, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same note on randomBoolean()
assertEquals(QUERY, sparseVectorQueryBuilder.getQuery()); | ||
original.boost(boost); | ||
original.queryName(queryName); | ||
testRewrittenInferenceQuery(context, original); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice consolidation 🙌
|
||
- do: | ||
indices.create: | ||
index: test-sparse-index-random |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did we name this random?
index: test-sparse-index-random | ||
body: | ||
settings: | ||
number_of_shards: 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get away with not setting shards/replicas here? I can understand why we need to do it when we're directly comparing scores, but if we don't absolutely need to do this, it's a more robust test without specifying it. For this particular test, since there's only one document, let's experiment with not needing it? (Suggestion applies to the other yaml tests as well).
If we do need it for score, perhaps we could add another test that doesn't compare scores, just names and successful searches.
return boolQueryBuilder; | ||
} | ||
|
||
@Override | ||
public String getQueryName() { | ||
return MatchQueryBuilder.NAME; | ||
} | ||
|
||
private MatchQueryBuilder copyMatchQueryBuilder(MatchQueryBuilder queryBuilder) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-blocking feedback: This PR has been around for a while, and I'm OK with this as written if we just want to get it in. However, future maintainability would be a concern of mine (what if we add a new param to match, such as prefiltering?). I'd almost suggest creating a new constructor in MatchQueryBuilder
instead of this private method.
Closes ##128696
Match query: boost only on inference field
Match query: boost on both semantic_text and text fields
KNN
Sparse