Skip to content

Implement predicate pushdown for cidrmatch function #1110

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

ykmr1224
Copy link
Collaborator

@ykmr1224 ykmr1224 commented Apr 14, 2025

Description

  • Implement predicate pushdown for cidrmatch function
  • Ideally, IpEqual expression should be able to be directly considered to be predicate and pushed down, but it was not possible due to following reasons
    • To enable pushdown, we need to convert UDF to ApplyFunctionExpression within Catalyst optimization rule so V2ExpressionBuilder can convert it to V2Expression
    • ApplyFunctionExpression doesn't implement Predicate interface, and it cannot be considered as Predicate.
    • When we specify UDF returning boolean like WHERE cidrmatch(ip, '192.168.0.10/32'), it won't be considered as predicate due to above reason.
    • Even if we try to workaround like WHERE cidrmatch(ip, '192.168.0.10/32') = true, spark automatically convert it to WHERE cidrmatch(ip, '192.168.0.10/32') and fail with the same reason.
  • To workaround above blocker, I made Catalyst optimizer rule to convert cidrmatch(ip, '192.168.0.10/32') to ip_compare(ip, '192.168.0.10/32') = 0 so the condition would generate EqualTo predicate instead of ApplyFunctionExpression
  • Previous PR to implement IPAddress type: Support OpenSearch ip field type (UDT) #1100

Related Issues

#1034

Check List

  • Updated documentation (docs/ppl-lang/README.md)
  • Implemented unit tests
  • Implemented tests for combination with other commands
  • New added source code should include a copyright header
  • Commits are signed per the DCO using --signoff
  • Add backport 0.x label if it is a stable change which won't break existing feature

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@ykmr1224 ykmr1224 force-pushed the os-ds/ip-udt-pushdown branch 2 times, most recently from 233293c to 2708013 Compare April 17, 2025 16:32
@ykmr1224 ykmr1224 changed the title Implement predicate pushdown for ip_equal function Implement predicate pushdown for cidrmatch function Apr 17, 2025
@ykmr1224 ykmr1224 added the backport 0.x Backport to 0.x branch (stable branch) label Apr 17, 2025
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
@ykmr1224 ykmr1224 force-pushed the os-ds/ip-udt-pushdown branch from 2708013 to 838b617 Compare April 17, 2025 16:41
@ykmr1224 ykmr1224 marked this pull request as ready for review April 17, 2025 16:47
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
* pushdown logic. This is a workaround to convert to a EqualTo predicate. The converted predicate
* will be pushed down to OpenSearch query by {@link FlintQueryCompiler}.
*/
object OpenSearchCidrMatchConvertRule extends Rule[LogicalPlan] {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this is really complicated. With more UDF registered in PPL in the future, we have to convert all of them to Predicate, right? If this is the right approach, any chance to make this generic enough to work with any UDF?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This workaround is needed only for UDF which returns boolean. I think we need to fix Spark code to simplify this.

I think it can be somewhat generalized. Can we work on it as separate task?

case Some(keywordField) =>
Some(s"""{"term":{"$keywordField":{"value":$value}}}""")
case None => None // Return None for unsupported text fields
if (isIpCompare(p)) {
Copy link
Collaborator

@dai-chen dai-chen Apr 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to make each UDF an individual Predicate instead of deep nested conditional checks in this single case?

Copy link
Collaborator Author

@ykmr1224 ykmr1224 Apr 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was something I tried, but it was not possible due to current implementation of Spark. It is only for the case when we want to filter by UDF returning boolean, but Spark convert UDF to UserDefinedScalarFunc, and it cannot be considered as Predicate, and it won't be pushed down.

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 0.x Backport to 0.x branch (stable branch) backport 0.7 DataSource:OpenSearch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants