Skip to content

Conversation

@srikanthpadakanti
Copy link

Description

This pull request adds native support for the mvexpand command in PPL to OpenSearch SQL, enabling users to expand multivalue fields (arrays) into separate rows directly within queries. This functionality is analogous to Splunk's mvexpand command and streamlines analytics, dashboarding, and data preparation involving arrays or multivalue fields.

Key features introduced:

Native mvexpand command for PPL queries to expand array fields into separate rows/events.
Optional limit parameter to restrict the number of expanded values per event/document.
Robust handling of empty/null arrays, large arrays (with memory/resource limits), and non-array fields.
Streaming/distributable execution for performance and scalability.
Comprehensive documentation and edge case coverage.
This feature makes OpenSearch SQL more powerful and user-friendly for log analytics, data exploration, and migration from platforms like Splunk.

Related Issues

Resolves #4439
#4439

Check List

  • [ X] New functionality includes testing.
  • [ X] New functionality has been documented.
  • [ X] New functionality has javadoc added.
  • [ X] New functionality has a user manual doc added.
  • [ X] New PPL command checklist all confirmed.
  • [ X] API changes companion pull request created.
  • [ X] Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Srikanth Padakanti and others added 9 commits October 27, 2025 10:44
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Srikanth Padakanti added 3 commits October 30, 2025 15:46
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Srikanth Padakanti and others added 6 commits October 30, 2025 23:29
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth29.9@gmail.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
@srikanthpadakanti
Copy link
Author

srikanthpadakanti commented Nov 3, 2025

Hello @dai-chen There are some tests failed in Security Plugin IT - unrelated to my mvexpand changes.
@ykmr1224


Example 5: Large Arrays and Memory Limits
----------------------------------------
If an array exceeds configured memory/resource limits, mvexpand returns an error.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this config already supported?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no mvexpand-specific config. Resource limits are enforced by the engine/cluster (e.g. circuit breakers and SQL/PPL query size/timeouts). Use mvexpand's limit data to avoid hitting those limits; if you want I can add references to the exact cluster/SQL settings for our release.

Let me add the same to the documentation

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean SQL/PPL's circuit breaker or OpenSearch DSL's? I think our circuit breaker today only protect scan operator.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I clarified in the mvexpand docs that resource limits are enforced by OpenSearch node/SQL-PPL limits (no mvexpand-specific config) and documented using limit + prefiltering as the recommended mitigation;

https://docs.opensearch.org/1.0/search-plugins/ppl/settings/

Referring to this:
plugins.query.memory_limit | Set heap memory usage limit. If a query crosses this limit, it’s terminated.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I recall in our V3 (Calcite-based) engine, only index scan operator is protected. Circuit breaking other operators are under research. I think probably we just keep "To avoid failures when expanding large arrays:" section which is very helpful.

@ykmr1224
Copy link
Collaborator

ykmr1224 commented Nov 6, 2025

#4675 (comment)
From this comment, it looks like an improvement for existing expand command.
Why don't we improve expand at the same time, and consider mvexpand as and alias of expand?

@srikanthpadakanti
Copy link
Author

#4675 (comment) From this comment, it looks like an improvement for existing expand command. Why don't we improve expand at the same time, and consider mvexpand as and alias of expand?

Refactored as requested.
Improved the "expand" command and aliased the - "mvexpand" to it.

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Copy link
Collaborator

@dai-chen dai-chen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes!

context.relBuilder.variable(correlVariable::set);
// New generic helper: builds Uncollect + Correlate using a provided left node (so caller
// can ensure left rowType is fixed).
private void buildUnnestForLeft(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we no longer need this method as the logic is unified. We might want to extract some portion of buildExpandRelNode, but we should extract with fewer parameters in that case.

}

@Override
public RelNode visitMvExpand(MvExpand node, CalcitePlanContext context) {
Copy link
Collaborator

@ykmr1224 ykmr1224 Nov 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Let's move this method next to visitExpand as it is closely related.

arrayFieldRex = context.rexBuilder.makeInputRef(currentRowType, fld.getIndex());
} else {
throw new IllegalArgumentException(
"buildExpandRelNode: expected RexInputRef or resolvable field name: " + arrayFieldName);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this expected to be visible to the user? In that case, error message should be meaningful for the user.

Comment on lines +3402 to +3404
} else {
RelDataType currentRowType = context.relBuilder.peek().getRowType();
RelDataTypeField fld = currentRowType.getField(arrayFieldName, false, false);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kind of case falls into this branch? Which test case associated with this branch?

Comment on lines +35 to +45
bulkInsert(
INDEX,
"{\"username\":\"happy\",\"skills\":[{\"name\":\"python\"},{\"name\":\"java\"},{\"name\":\"sql\"}]}",
"{\"username\":\"single\",\"skills\":[{\"name\":\"go\"}]}",
"{\"username\":\"empty\",\"skills\":[]}",
"{\"username\":\"nullskills\",\"skills\":null}",
"{\"username\":\"noskills\"}",
"{\"username\":\"missingattr\",\"skills\":[{\"name\":\"c\"},{\"level\":\"advanced\"}]}",
"{\"username\":\"complex\",\"skills\":[{\"name\":\"ml\",\"level\":\"expert\"},{\"name\":\"ai\"},{\"level\":\"novice\"}]}",
"{\"username\":\"duplicate\",\"skills\":[{\"name\":\"dup\"},{\"name\":\"dup\"}]}",
"{\"username\":\"large\",\"skills\":[{\"name\":\"s1\"},{\"name\":\"s2\"},{\"name\":\"s3\"},{\"name\":\"s4\"},{\"name\":\"s5\"},{\"name\":\"s6\"},{\"name\":\"s7\"},{\"name\":\"s8\"},{\"name\":\"s9\"},{\"name\":\"s10\"}]}");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like some records are not used in tests. Should we add test cases or remove unused records?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request PPL Piped processing language

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add native support for mvexpand command in PPL

3 participants