-
Notifications
You must be signed in to change notification settings - Fork 178
Mvexpand feature #4675
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Mvexpand feature #4675
Conversation
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth29.9@gmail.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLMvExpandTest.java
Outdated
Show resolved
Hide resolved
docs/user/ppl/cmd/mvexpand.rst
Outdated
|
|
||
| Example 5: Large Arrays and Memory Limits | ||
| ---------------------------------------- | ||
| If an array exceeds configured memory/resource limits, mvexpand returns an error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this config already supported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no mvexpand-specific config. Resource limits are enforced by the engine/cluster (e.g. circuit breakers and SQL/PPL query size/timeouts). Use mvexpand's limit data to avoid hitting those limits; if you want I can add references to the exact cluster/SQL settings for our release.
Let me add the same to the documentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean SQL/PPL's circuit breaker or OpenSearch DSL's? I think our circuit breaker today only protect scan operator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I clarified in the mvexpand docs that resource limits are enforced by OpenSearch node/SQL-PPL limits (no mvexpand-specific config) and documented using limit + prefiltering as the recommended mitigation;
https://docs.opensearch.org/1.0/search-plugins/ppl/settings/
Referring to this:
plugins.query.memory_limit | Set heap memory usage limit. If a query crosses this limit, it’s terminated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I recall in our V3 (Calcite-based) engine, only index scan operator is protected. Circuit breaking other operators are under research. I think probably we just keep "To avoid failures when expanding large arrays:" section which is very helpful.
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
Show resolved
Hide resolved
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
|
#4675 (comment) |
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
Outdated
Show resolved
Hide resolved
Refactored as requested. |
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
dai-chen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes!
| context.relBuilder.variable(correlVariable::set); | ||
| // New generic helper: builds Uncollect + Correlate using a provided left node (so caller | ||
| // can ensure left rowType is fixed). | ||
| private void buildUnnestForLeft( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we no longer need this method as the logic is unified. We might want to extract some portion of buildExpandRelNode, but we should extract with fewer parameters in that case.
| } | ||
|
|
||
| @Override | ||
| public RelNode visitMvExpand(MvExpand node, CalcitePlanContext context) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Let's move this method next to visitExpand as it is closely related.
| arrayFieldRex = context.rexBuilder.makeInputRef(currentRowType, fld.getIndex()); | ||
| } else { | ||
| throw new IllegalArgumentException( | ||
| "buildExpandRelNode: expected RexInputRef or resolvable field name: " + arrayFieldName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this expected to be visible to the user? In that case, error message should be meaningful for the user.
| } else { | ||
| RelDataType currentRowType = context.relBuilder.peek().getRowType(); | ||
| RelDataTypeField fld = currentRowType.getField(arrayFieldName, false, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What kind of case falls into this branch? Which test case associated with this branch?
| bulkInsert( | ||
| INDEX, | ||
| "{\"username\":\"happy\",\"skills\":[{\"name\":\"python\"},{\"name\":\"java\"},{\"name\":\"sql\"}]}", | ||
| "{\"username\":\"single\",\"skills\":[{\"name\":\"go\"}]}", | ||
| "{\"username\":\"empty\",\"skills\":[]}", | ||
| "{\"username\":\"nullskills\",\"skills\":null}", | ||
| "{\"username\":\"noskills\"}", | ||
| "{\"username\":\"missingattr\",\"skills\":[{\"name\":\"c\"},{\"level\":\"advanced\"}]}", | ||
| "{\"username\":\"complex\",\"skills\":[{\"name\":\"ml\",\"level\":\"expert\"},{\"name\":\"ai\"},{\"level\":\"novice\"}]}", | ||
| "{\"username\":\"duplicate\",\"skills\":[{\"name\":\"dup\"},{\"name\":\"dup\"}]}", | ||
| "{\"username\":\"large\",\"skills\":[{\"name\":\"s1\"},{\"name\":\"s2\"},{\"name\":\"s3\"},{\"name\":\"s4\"},{\"name\":\"s5\"},{\"name\":\"s6\"},{\"name\":\"s7\"},{\"name\":\"s8\"},{\"name\":\"s9\"},{\"name\":\"s10\"}]}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like some records are not used in tests. Should we add test cases or remove unused records?
Description
This pull request adds native support for the mvexpand command in PPL to OpenSearch SQL, enabling users to expand multivalue fields (arrays) into separate rows directly within queries. This functionality is analogous to Splunk's mvexpand command and streamlines analytics, dashboarding, and data preparation involving arrays or multivalue fields.
Key features introduced:
Native mvexpand command for PPL queries to expand array fields into separate rows/events.
Optional limit parameter to restrict the number of expanded values per event/document.
Robust handling of empty/null arrays, large arrays (with memory/resource limits), and non-array fields.
Streaming/distributable execution for performance and scalability.
Comprehensive documentation and edge case coverage.
This feature makes OpenSearch SQL more powerful and user-friendly for log analytics, data exploration, and migration from platforms like Splunk.
Related Issues
Resolves #4439
#4439
Check List
--signoffor-s.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.