Fix Semantic Query Rewrite Interception Drops Boosts #129282

Samiul-TheSoccerFan · 2025-06-11T18:37:21Z

Match query: boost only on inference field

PUT my-index
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "author": {
        "type": "text"
      },
      "summary": {
        "type": "text",
        "copy_to": "semantic_summary"
      },
      "semantic_summary": {
        "type": "semantic_text"
      },
      "moral_lesson": {
        "type": "text",
        "copy_to": "semantic_moral"
      },
      "semantic_moral": {
        "type": "semantic_text"
      }
    }
  }
}

POST my-index/_doc/1
{
  "title": "Cinderella",
  "author": "Charles Perrault",
  "summary": "Cinderella is a young girl mistreated by her stepmother and stepsisters until she meets a fairy godmother.",
  "moral_lesson": "Goodness and kindness will always be rewarded."
}

POST my-index/_doc/2
{
  "title": "Little Red Riding Hood",
  "author": "Brothers Grimm",
  "summary": "A young girl encounters a cunning wolf on her way to visit her grandmother.",
  "moral_lesson": "Beware of strangers."
}

GET my-index/_search
{
  "query": {
    "match": {
      "semantic_moral": {
        "query": "danger",
        "boost": 2.0
      }
    }
  }
}

Match query: boost on both semantic_text and text fields

PUT normal-text-index/
{
  "mappings": {
    "properties": {
      "semantic1": {
        "type": "text"
      },
      "semantic2": {
        "type": "text"
      }
    }
  }
}

POST normal-text-index/_doc/1
{
  "semantic1": "Cinderella is a young girl mistreated by her stepmother and stepsisters until she meets a fairy godmother.",
  "semantic2": "Goodness and kindness will always be rewarded."
}

PUT semantic-text-index/
{
  "mappings": {
    "properties": {
      "semantic1": {
        "type": "semantic_text"
      },
      "semantic2": {
        "type": "semantic_text"
      }
    }
  }
}

POST semantic-text-index/_doc/1
{
  "semantic1": "A young girl encounters a cunning wolf on her way to visit her grandmother.",
  "semantic2": "Beware of strangers."
}

GET normal-text-index,semantic-text-index/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "semantic1": {
              "query": "wolf",
              "boost": 2.0
            }
          }
        },
        {
          "match": {
            "semantic2": {
              "query": "strangers",
              "boost": 1.0
            }
          }
        }
      ]
    }
  }
}

KNN

PUT _inference/text_embedding/my-e5-model
{
  "service": "elasticsearch",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1,
    "model_id": ".multilingual-e5-small"
  }
}

PUT test-knn-semantic-boost
{
  "mappings": {
    "properties": {
      "sem_field": {
        "type": "semantic_text",
        "inference_id": "my-e5-model"
      }
    }
  }
}

POST test-knn-semantic-boost/_doc/1?refresh=true
{
  "sem_field": "star wars droids"
}

POST test-knn-semantic-boost/_doc/2?refresh=true
{
  "sem_field": "fairy tale godmother"
}

// no boost given
GET test-knn-semantic-boost/_search
{
  "query": {
    "knn": {
      "field": "sem_field",
      "query_vector": [0.1, 0.2, 0.3],
      "k": 2,
      "num_candidates": 10
    }
  }
}

// boost preserved
GET test-knn-semantic-boost/_search
{
  "query": {
    "knn": {
      "field": "sem_field",
      "query_vector": [0.1, 0.2, 0.3], // this query is expected to throw an error, but the boost is still propagated correctly
      "k": 2,
      "num_candidates": 10,
      "boost": 5.0
    }
  }
}

Sparse

PUT test-sparse-semantic-boost
{
  "mappings": {
    "properties": {
      "sem_field": {
        "type": "semantic_text"
      }
    }
  }
}

POST test-sparse-semantic-boost/_doc/1?refresh=true
{
  "sem_field": "star wars droids"
}

POST test-sparse-semantic-boost/_doc/2?refresh=true
{
  "sem_field": "fairy tale godmother"
}

// sparse without boost
GET test-sparse-semantic-boost/_search
{
  "query": {
    "sparse_vector": {
      "field": "sem_field",
      "query": "droids"
    }
  }
}

// boost preserved
GET test-sparse-semantic-boost/_search
{
  "query": {
    "sparse_vector": {
      "field": "sem_field",
      "query": "droids",
      "boost": 5.0
    }
  }
}

elasticsearchmachine · 2025-06-11T18:39:06Z

Hi @Samiul-TheSoccerFan, I've created a changelog YAML for you.

kderusso

Looks great @Samiul-TheSoccerFan! Can we please add some tests?

Mikep86

I'd prefer to see a solution based on copy constructors. Delegating the responsibility of generating a complete copy to the caller is error-prone and hard to maintain, which is how this class of bugs got introduced in the first place.
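A minimal sketch of the copy-constructor idea, for illustration only. `SketchMatchQuery`, its fields, and the `semantic_moral.inference` field name are hypothetical stand-ins, not the Elasticsearch `MatchQueryBuilder` API. The point is that the class being copied owns the list of settings to carry over, so a caller that retargets the field cannot forget `boost` or `queryName`.

```java
import java.util.Objects;

// Hypothetical stand-in for a query builder; NOT the Elasticsearch class.
final class SketchMatchQuery {
    static final float DEFAULT_BOOST = 1.0f;

    private final String fieldName;
    private final String value;
    private float boost = DEFAULT_BOOST;
    private String queryName; // null when the query is unnamed

    SketchMatchQuery(String fieldName, String value) {
        this.fieldName = Objects.requireNonNull(fieldName);
        this.value = Objects.requireNonNull(value);
    }

    // Copy constructor: carries over every setting from `other`,
    // changing only the target field.
    SketchMatchQuery(SketchMatchQuery other, String newFieldName) {
        this(newFieldName, other.value);
        this.boost = other.boost;
        this.queryName = other.queryName;
    }

    SketchMatchQuery boost(float boost) { this.boost = boost; return this; }
    SketchMatchQuery queryName(String name) { this.queryName = name; return this; }

    float boost() { return boost; }
    String queryName() { return queryName; }
    String fieldName() { return fieldName; }
    String value() { return value; }
}

final class CopyConstructorDemo {
    public static void main(String[] args) {
        SketchMatchQuery original =
            new SketchMatchQuery("semantic_moral", "danger").boost(2.0f).queryName("q1");
        // The rewrite only names the new field; everything else is copied.
        SketchMatchQuery rewritten = new SketchMatchQuery(original, "semantic_moral.inference");
        System.out.println(rewritten.fieldName() + " boost=" + rewritten.boost()
            + " name=" + rewritten.queryName());
    }
}
```

If a new parameter is added to the query later, only the copy constructor needs updating, rather than every interceptor that builds a copy by hand.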

Samiul-TheSoccerFan · 2025-06-13T06:52:56Z

@elasticmachine update branch

elasticmachine · 2025-06-13T06:52:58Z

merge conflict between base and head

Samiul-TheSoccerFan · 2025-07-09T17:14:41Z

@elasticmachine update branch

Samiul-TheSoccerFan · 2025-07-10T12:09:10Z

@elasticmachine update branch

Samiul-TheSoccerFan · 2025-07-11T12:05:55Z

@elasticmachine update branch

Samiul-TheSoccerFan · 2025-07-11T16:02:01Z

@elasticmachine update branch

Mikep86

The implementation looks good! Just a couple of things to clean up and we're good to merge :)

...c/test/java/org/elasticsearch/index/query/SemanticKnnVectorQueryRewriteInterceptorTests.java

...e/src/test/java/org/elasticsearch/index/query/SemanticMatchQueryRewriteInterceptorTests.java

...est/java/org/elasticsearch/index/query/SemanticSparseVectorQueryRewriteInterceptorTests.java

Mikep86 · 2025-07-11T18:27:40Z

...n/inference/src/yamlRestTest/resources/rest-api-spec/test/inference/47_semantic_text_knn.yml

+
+  - match: { hits.total.value: 1 }
+  - match: { hits.hits.0._id: "doc_1" }
+  - close_to: { hits.hits.0._score: { value: 0.9984111, error: 1e15 } }


The error is still wrong here
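To illustrate why a tolerance of `1e15` makes the score check meaningless, here is a small sketch. It assumes `close_to` passes when the absolute difference is at most `error`; the `closeTo` helper and the `1e-5` tolerance are hypothetical, since the thread does not state the intended value.

```java
final class CloseToSketch {
    // Assumed semantics: pass when |actual - expected| <= error.
    static boolean closeTo(double actual, double expected, double error) {
        return Math.abs(actual - expected) <= error;
    }

    public static void main(String[] args) {
        double expected = 0.9984111;
        // With error = 1e15, a wildly wrong score still passes.
        System.out.println(closeTo(123456.0, expected, 1e15));  // true
        // With a small (hypothetical) tolerance, the same score fails...
        System.out.println(closeTo(123456.0, expected, 1e-5));  // false
        // ...while a score that differs only in the last digit passes.
        System.out.println(closeTo(0.9984112, expected, 1e-5)); // true
    }
}
```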

Samiul-TheSoccerFan · 2025-07-14T14:30:14Z

@elasticmachine update branch

Samiul-TheSoccerFan · 2025-07-14T17:31:52Z

@elasticmachine update branch

kderusso

Nice iterations @Samiul-TheSoccerFan - I've left some feedback, much of which is non-blocking. I think it's very close to being ready to go though!

kderusso · 2025-07-14T19:38:48Z

...e/src/test/java/org/elasticsearch/index/query/SemanticMatchQueryRewriteInterceptorTests.java

    private MatchQueryBuilder createTestQueryBuilder() {
        return new MatchQueryBuilder(FIELD_NAME, VALUE);
    }

+    private MatchQueryBuilder createTestQueryBuilderWithBoostAndQueryName() {


Non-blocking nitpick: Since this is pretty simple and only used once, it doesn't need to be its own method. Maybe more readable to include the test query builder inside the test itself.

kderusso · 2025-07-14T19:39:30Z

...est/java/org/elasticsearch/index/query/SemanticSparseVectorQueryRewriteInterceptorTests.java

@@ -52,62 +52,68 @@ public void cleanup() {
    }

    public void testSparseVectorQueryOnInferenceFieldIsInterceptedAndRewritten() throws IOException {
+        float boost = randomFloatBetween(1, 10, true);


Can we do a randomBoolean() check to determine whether to set these boosts and names at all?
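One way to read this suggestion, sketched with `java.util.Random` instead of the Elasticsearch test framework's `randomBoolean()` and `randomFloatBetween()`. The `Query` holder and the helper names are hypothetical. Each setting is applied only some of the time, so the test also exercises the default-boost and unnamed cases.

```java
import java.util.Objects;
import java.util.Random;

final class RandomizedBoostSketch {
    static final float DEFAULT_BOOST = 1.0f;

    // Minimal hypothetical query holder; NOT an Elasticsearch class.
    static final class Query {
        float boost = DEFAULT_BOOST;
        String name; // null when unnamed

        Query copy() {
            Query q = new Query();
            q.boost = boost;
            q.name = name;
            return q;
        }
    }

    // Sets boost and name only when the coin flip says so.
    static Query randomQuery(Random random) {
        Query q = new Query();
        if (random.nextBoolean()) {
            q.boost = 1.0f + random.nextFloat() * 9.0f; // roughly [1, 10)
        }
        if (random.nextBoolean()) {
            q.name = "query_" + random.nextInt(1000);
        }
        return q;
    }

    // The property under test: a rewrite must not change boost or name.
    static boolean preservesSettings(Query original, Query rewritten) {
        return original.boost == rewritten.boost && Objects.equals(original.name, rewritten.name);
    }

    public static void main(String[] args) {
        Random random = new Random();
        Query original = randomQuery(random);
        System.out.println(preservesSettings(original, original.copy()));
    }
}
```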

kderusso · 2025-07-14T19:39:59Z

...est/java/org/elasticsearch/index/query/SemanticSparseVectorQueryRewriteInterceptorTests.java

    }

    public void testSparseVectorQueryOnInferenceFieldWithoutInferenceIdIsInterceptedAndRewritten() throws IOException {
+        float boost = randomFloatBetween(1, 10, true);


Same note on randomBoolean()

kderusso · 2025-07-14T19:40:20Z

...est/java/org/elasticsearch/index/query/SemanticSparseVectorQueryRewriteInterceptorTests.java

-        assertEquals(QUERY, sparseVectorQueryBuilder.getQuery());
+        original.boost(boost);
+        original.queryName(queryName);
+        testRewrittenInferenceQuery(context, original);


Nice consolidation 🙌

kderusso · 2025-07-14T19:41:00Z

...inference/src/yamlRestTest/resources/rest-api-spec/test/inference/45_semantic_text_match.yml

+
+  - do:
+      indices.create:
+        index: test-sparse-index-random


Why did we name this random?

kderusso · 2025-07-14T19:42:43Z

...inference/src/yamlRestTest/resources/rest-api-spec/test/inference/45_semantic_text_match.yml

+        index: test-sparse-index-random
+        body:
+          settings:
+            number_of_shards: 1


Can we get away with not setting shards/replicas here? I can understand why we need to do it when we're directly comparing scores, but if we don't absolutely need to do this, it's a more robust test without specifying it. For this particular test, since there's only one document, let's experiment with not needing it? (Suggestion applies to the other yaml tests as well).

If we do need it for score, perhaps we could add another test that doesn't compare scores, just names and successful searches.

kderusso · 2025-07-14T19:47:53Z

...ain/java/org/elasticsearch/xpack/inference/queries/SemanticMatchQueryRewriteInterceptor.java

        return boolQueryBuilder;
    }

    @Override
    public String getQueryName() {
        return MatchQueryBuilder.NAME;
    }
+
+    private MatchQueryBuilder copyMatchQueryBuilder(MatchQueryBuilder queryBuilder) {


Non-blocking feedback: This PR has been around for a while, and I'm OK with this as written if we just want to get it in. However, future maintainability would be a concern of mine (what if we add a new param to match, such as prefiltering?). I'd almost suggest creating a new constructor in MatchQueryBuilder instead of this private method.

Samiul-TheSoccerFan added 5 commits June 11, 2025 14:29

fix boosting for knn

ccd64ae

Fixing for match query

9338cd5

fixing for match subquery

370931d

fix for sparse vector query boost

b85abda

fix linting issues

5db2686

elasticsearchmachine added the v9.1.0 label Jun 11, 2025

Samiul-TheSoccerFan added the >bug, auto-backport, v9.0.0, v8.18.0, v8.19.0, and :SearchOrg/Relevance labels Jun 11, 2025

Update docs/changelog/129282.yaml

2ce691e

github-actions bot deployed to docs-preview June 11, 2025 18:39

update changelog

4100200

github-actions bot deployed to docs-preview June 11, 2025 18:45

kderusso reviewed Jun 11, 2025


Mikep86 reviewed Jun 11, 2025


Samiul-TheSoccerFan added 7 commits June 12, 2025 15:59

Copy constructor with match query

3406ae1

util function to create sparseVectorBuilder for sparse query

d07952a

util function for knn query to support boost

f133632

adding unit tests for all intercepted query terms

a9048f0

Adding yaml test for match,sparse, and knn

5a1dab9

Adding queryname support for nested query

6cef441

fix code styles

faa35ea

github-actions bot deployed to docs-preview June 13, 2025 06:52

merge from main

675fb22

resolve conflicts from main

cde55d1

github-actions bot deployed to docs-preview July 9, 2025 15:47

[CI] Auto commit changes from spotless

8ddda3c

github-actions bot deployed to docs-preview July 9, 2025 16:02

Merge branch 'main' into fix-semantic-query-rewrite-boost-issue

873efdb

github-actions bot deployed to docs-preview July 9, 2025 17:15

Merge branch 'main' into fix-semantic-query-rewrite-boost-issue

44b8aa9

github-actions bot deployed to docs-preview July 10, 2025 12:09

Merge branch 'main' into fix-semantic-query-rewrite-boost-issue

104f16b

github-actions bot deployed to docs-preview July 11, 2025 12:06

Merge branch 'main' into fix-semantic-query-rewrite-boost-issue

2a96d52

github-actions bot deployed to docs-preview July 11, 2025 16:02

Mikep86 reviewed Jul 11, 2025


Samiul-TheSoccerFan added 2 commits July 11, 2025 17:11

fix unit tests

5dcfc1b

update yaml tests

469f598

github-actions bot deployed to docs-preview July 11, 2025 21:25

fix match yaml test

375ae36

github-actions bot deployed to docs-preview July 11, 2025 21:27

Merge branch 'main' into fix-semantic-query-rewrite-boost-issue

c81f184

github-actions bot deployed to docs-preview July 14, 2025 14:31

Merge branch 'main' into fix-semantic-query-rewrite-boost-issue

394f43a

github-actions bot deployed to docs-preview July 14, 2025 17:33

Samiul-TheSoccerFan requested a review from Mikep86 July 14, 2025 18:50

kderusso reviewed Jul 14, 2025

