Adding _source and schema merging to index_mappings #1101

Merged
merged 28 commits into opensearch-project:main from feat/index_mappings
May 13, 2025

Conversation

Contributor

@ahkcs ahkcs commented Apr 9, 2025

Description

This PR adds _source support and schema merging to index_mappings, so that customers can customize the index mapping of the OpenSearch index that stores the Flint index data created via Spark SQL.
The _source feature lets users choose whether _source is enabled in index_mappings.
For example:

index_mappings: '{ "_source": { "enabled": false } }'
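
Because the option value is a JSON string embedded in a SQL statement, an unbalanced brace is easy to miss. The following is a minimal sketch of a pre-flight check; `parse_index_mappings` is a hypothetical helper written for illustration and is not part of Flint.

```python
import json


def parse_index_mappings(option_value: str) -> dict:
    """Parse an index_mappings option string and check its top-level shape.

    Hypothetical helper for illustration only; not part of Flint.
    """
    try:
        mappings = json.loads(option_value)
    except json.JSONDecodeError as err:
        raise ValueError(f"index_mappings is not valid JSON: {err}") from err
    if not isinstance(mappings, dict):
        raise ValueError("index_mappings must be a JSON object")
    return mappings


# A balanced option value parses cleanly.
ok = parse_index_mappings('{ "_source": { "enabled": false } }')

# A value with a missing closing brace is rejected.
try:
    parse_index_mappings('{ "_source": { "enabled": false }')
    unbalanced_rejected = False
except ValueError:
    unbalanced_rejected = True
```

Running a check like this before issuing the CREATE statement surfaces malformed JSON early, rather than at index creation time.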

The schema merging feature lets users override mapping parameters on specific fields, for example disabling doc_values or, as shown here, index, to save space.

For example:

index_mappings: '{ "properties": { "test_field": {"index": false} } }'

CREATE MATERIALIZED VIEW mv_test
AS
SELECT ... 
WITH (
  index_mappings = '{ "_source": { "enabled": false }, "properties": { "test_field": {"index": false} } }'
)
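
To make the merging idea concrete, here is a minimal Python sketch of a recursive merge of user-supplied mappings into a generated schema. It is an illustration only: the actual implementation in this PR is written in Scala and may differ, `merge_mappings` is a hypothetical name, and the sketch assumes that user keys absent from the generated schema are simply added.

```python
import copy
import json


def merge_mappings(generated: dict, user: dict) -> dict:
    """Recursively merge user-supplied mappings into generated mappings.

    Dicts are merged key by key; for any other value the user's value wins.
    Neither input is modified. Illustrative sketch, not Flint's code.
    """
    merged = copy.deepcopy(generated)
    for key, user_value in user.items():
        if isinstance(user_value, dict) and isinstance(merged.get(key), dict):
            # Both sides are objects: descend and merge their keys.
            merged[key] = merge_mappings(merged[key], user_value)
        else:
            # Scalar, list, or key missing on the generated side: user wins.
            merged[key] = copy.deepcopy(user_value)
    return merged


# Schema as it would be derived from the query output (assumed shape).
generated = {"properties": {"event": {"type": "keyword"}, "cnt": {"type": "long"}}}

# The index_mappings option value, parsed from its JSON string.
user = json.loads(
    '{ "_source": { "enabled": false }, "properties": { "cnt": {"index": false} } }'
)

merged = merge_mappings(generated, user)
```

With these inputs, `merged` keeps the generated types and gains `"_source": {"enabled": false}` plus `"index": false` on `cnt`.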

Test Results

Before disabling _source:
Input:

GET flint_mys3_default_mv_event_count2/_search

Output:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "flint_mys3_default_mv_event_count2",
        "_id": "g1ZLJ5YB8KjDxFt7EYUZ",
        "_score": 1,
        "_source": {
          "event": "click",
          "cnt": 10000
        }
      }
    ]
  }
}

After disabling _source:
Input:

GET flint_mys3_default_mv_event_count3/_search 

Output:

{
  "took": 510,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "flint_mys3_default_mv_event_count3",
        "_id": "h1ZfJ5YB8KjDxFt7D4X5",
        "_score": 1
      }
    ]
  }
}
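
The difference between the two responses can also be checked programmatically. The sketch below uses trimmed copies of the two search responses above; `hits_missing_source` is a hypothetical helper, not an OpenSearch or Flint API.

```python
def hits_missing_source(response: dict) -> list:
    """Return the _id of every hit that carries no _source field."""
    return [hit["_id"] for hit in response["hits"]["hits"] if "_source" not in hit]


# Trimmed copy of the response from the index with _source enabled.
before = {
    "hits": {
        "hits": [
            {
                "_index": "flint_mys3_default_mv_event_count2",
                "_id": "g1ZLJ5YB8KjDxFt7EYUZ",
                "_score": 1,
                "_source": {"event": "click", "cnt": 10000},
            }
        ]
    }
}

# Trimmed copy of the response from the index with _source disabled.
after = {
    "hits": {
        "hits": [
            {
                "_index": "flint_mys3_default_mv_event_count3",
                "_id": "h1ZfJ5YB8KjDxFt7D4X5",
                "_score": 1,
            }
        ]
    }
}

missing_before = hits_missing_source(before)
missing_after = hits_missing_source(after)
```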

Before adding index_mappings: '{ "_source": { "enabled": false }, "properties": { "cnt": {"index": false} } }'

Input:

GET flint_mys3_default_mv_event_count2/_mappings

Output:

{
  "flint_mys3_default_mv_event_count2": {
    "mappings": {
      "_meta": {
        "kind": "mv",
        "indexedColumns": [
          {
            "columnType": "string",
            "columnName": "event"
          },
          {
            "columnType": "bigint",
            "columnName": "cnt"
          }
        ],
        "name": "mys3.default.mv_event_count2",
        "options": {
          "auto_refresh": "false",
          "incremental_refresh": "false"
        },
        "source": "SELECT event, COUNT(*) AS cnt FROM default.test_table GROUP BY event",
        "version": "1.0.0",
        "properties": {
          "sourceTables": [
            "mys3.default.test_table"
          ]
        }
      },
      "properties": {
        "cnt": {
          "type": "long"
        },
        "event": {
          "type": "keyword"
        }
      }
    }
  }
}

After adding index_mappings:

Input:

GET flint_mys3_default_mv_event_count4/_mappings

Output:

{
  "flint_mys3_default_mv_event_count4": {
    "mappings": {
      "_meta": {
        "kind": "mv",
        "indexedColumns": [
          {
            "columnType": "string",
            "columnName": "event"
          },
          {
            "columnType": "bigint",
            "columnName": "cnt"
          }
        ],
        "name": "mys3.default.mv_event_count4",
        "options": {
          "auto_refresh": "false",
          "incremental_refresh": "false",
          "index_mappings": "{ \"_source\": { \"enabled\": false }, \"properties\": { \"cnt\": {\"index\": false} } }"
        },
        "source": "SELECT event, COUNT(*) AS cnt FROM default.test_table GROUP BY event",
        "version": "1.0.0",
        "properties": {
          "sourceTables": [
            "mys3.default.test_table"
          ]
        }
      },
      "_source": {
        "enabled": false
      },
      "properties": {
        "cnt": {
          "type": "long",
          "index": false
        },
        "event": {
          "type": "keyword"
        }
      }
    }
  }
}
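
One way to confirm that only the requested settings changed is to diff the two mappings. The following sketch compares the non-_meta parts of the before and after mappings shown above; `mapping_diff` is a hypothetical helper written for this illustration.

```python
_MISSING = object()  # sentinel for keys absent from the "before" mapping


def mapping_diff(before: dict, after: dict, path: str = "") -> dict:
    """Return {dotted_path: value} for every leaf that is new or changed in `after`."""
    changes = {}
    for key, after_value in after.items():
        full_path = f"{path}.{key}" if path else key
        before_value = before.get(key, _MISSING)
        if isinstance(after_value, dict) and isinstance(before_value, dict):
            # Present on both sides: compare the nested objects.
            changes.update(mapping_diff(before_value, after_value, full_path))
        elif isinstance(after_value, dict) and before_value is _MISSING:
            # Whole subtree is new: report each of its leaves.
            changes.update(mapping_diff({}, after_value, full_path))
        elif after_value != before_value:
            changes[full_path] = after_value
    return changes


# Mappings from the two outputs above, with _meta left out for brevity.
before = {"properties": {"cnt": {"type": "long"}, "event": {"type": "keyword"}}}
after = {
    "_source": {"enabled": False},
    "properties": {
        "cnt": {"type": "long", "index": False},
        "event": {"type": "keyword"},
    },
}

diff = mapping_diff(before, after)
```

For these inputs the diff contains exactly two entries, `_source.enabled` and `properties.cnt.index`, which matches the index_mappings option that was supplied.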

Related Issues

Resolves #772
Pending implementations for "Support index mapping option in create index statement" (#772):

  1. Force certain field types (e.g., IP type) as a temporary workaround until SparkSQL supports them.

@ahkcs ahkcs force-pushed the feat/index_mappings branch from ebff6e3 to 20022d6 Compare April 9, 2025 21:29
@ahkcs ahkcs marked this pull request as draft April 9, 2025 21:37
@ahkcs ahkcs force-pushed the feat/index_mappings branch 2 times, most recently from c545ef9 to bfcaf5b Compare April 9, 2025 21:58
@ahkcs ahkcs marked this pull request as ready for review April 14, 2025 21:43
Collaborator

@dai-chen dai-chen left a comment

Please remove spark-warehouse. I think it's generated by the Spark tests and should normally be removed automatically after the tests complete.

@dai-chen dai-chen added the enhancement New feature or request label Apr 14, 2025
LantaoJin and others added 11 commits April 15, 2025 15:14
* Fix antlr4 parser issues

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Case insensitive lexer

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* revert useless change

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* remove tokens file

Signed-off-by: Lantao Jin <ltjin@amazon.com>

---------

Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
…ed unnecessary code

Signed-off-by: Kai Huang <ahkcs@amazon.com>
@ahkcs ahkcs requested review from dai-chen and noCharger May 5, 2025 23:35
ahkcs added 4 commits May 5, 2025 16:37
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Collaborator

@dai-chen dai-chen left a comment

Thanks for the changes!

As I recall, the current schema merge logic only supports a limited set of field names, right? Could you create a follow-up issue for improvements, or update the doc with this limitation?

@ahkcs ahkcs mentioned this pull request May 8, 2025
Signed-off-by: Kai Huang <ahkcs@amazon.com>
ahkcs added 2 commits May 12, 2025 11:10
Signed-off-by: Kai Huang <ahkcs@amazon.com>
@ahkcs ahkcs requested review from noCharger and dai-chen May 12, 2025 20:01
Signed-off-by: Kai Huang <ahkcs@amazon.com>
@ahkcs ahkcs requested a review from noCharger May 12, 2025 22:19
Collaborator

@noCharger noCharger left a comment

@ahkcs please check the failing e2e test.

@noCharger noCharger added backport 0.x Backport to 0.x branch (stable branch) backport 0.7 labels May 13, 2025
@dai-chen dai-chen merged commit 76d35e2 into opensearch-project:main May 13, 2025
6 of 7 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request May 13, 2025
* Fix antlr4 parser issues (#1094)

* Fix antlr4 parser issues

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Case insensitive lexer

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* revert useless change

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* remove tokens file

Signed-off-by: Lantao Jin <ltjin@amazon.com>

---------

Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* adding _source to index_mappings

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* syntax fix

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Apply scalafmt

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Added index_mapping as an option in index.md, applied scalafmtAll

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* improve readability

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Removed index_mappings from FlintMetaData.scala, Modified index.md

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* removed indexMappingsSourceEnabled from FlintMetadata.scala

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* removed indexMappingsSourceEnabled from FlintMetadata.scala

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* removed indexMappingsSourceEnabled from FlintMetadata.scala

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Removed indexMappingsSourceEnabled from FlintMetadata.scala and removed unnecessary code

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Added some test cases to test serialzie() and fixed some formatting issues

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Added some test cases for FlintOpenSearchIndexMetadataServiceSuite.scala

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Added schema merging to index_mappings, added some test cases

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* updated test cases

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Minor format fix

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* minor fixes

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* added nested schema merging logic, moved mergeSchema to serialize, updated test cases,  fixed some minor issues

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* updated some comments

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* fixed some formatting issues based on the comments

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* fixed syntax issue

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* syntax issue

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* syntax issue

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* fixed the FlintSparkSkippingIndexITSuite

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* fixing schema merging limitation

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* less scala/java conversion

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* style fix

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* fix unnecessary casting

Signed-off-by: Kai Huang <ahkcs@amazon.com>

---------

Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Co-authored-by: Lantao Jin <ltjin@amazon.com>
(cherry picked from commit 76d35e2)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
opensearch-trigger-bot bot pushed a commit that referenced this pull request May 13, 2025
noCharger pushed a commit that referenced this pull request May 14, 2025
(cherry picked from commit 76d35e2)

Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Lantao Jin <ltjin@amazon.com>
noCharger pushed a commit that referenced this pull request May 14, 2025
Labels
backport 0.x Backport to 0.x branch (stable branch) backport 0.7 enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

[FEATURE] Support index mapping option in create index statement
6 participants