Implement predicate pushdown for cidrmatch function #1110
Conversation
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
...park-integration/src/main/scala/org/apache/spark/sql/flint/FlintPartitionReaderFactory.scala
flint-spark-integration/src/main/scala/org/apache/spark/sql/flint/FlintScanBuilder.scala
 * pushdown logic. This is a workaround to convert to an EqualTo predicate. The converted predicate
 * will be pushed down to the OpenSearch query by {@link FlintQueryCompiler}.
 */
object OpenSearchCidrMatchConvertRule extends Rule[LogicalPlan] {
I feel this is really complicated. With more UDFs registered in PPL in the future, we would have to convert all of them to Predicates, right? If this is the right approach, any chance to make this generic enough to work with any UDF?
This workaround is needed only for UDFs that return boolean. I think we need to fix the Spark code to simplify this.
I think it can be somewhat generalized. Can we work on it as a separate task?
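One way to generalize this, sketched below as a toy model: keep a table mapping each boolean-returning PPL UDF to an integer-returning counterpart, and rewrite any registered call into an equality. The mini-AST types (`Col`, `Lit`, `ScalarFunc`, `EqTo`) and the `booleanUdfToIntUdf` registry are illustrative stand-ins, not Spark's Catalyst classes or this PR's actual code:

```scala
// Toy expression AST standing in for Catalyst expressions (illustration only).
sealed trait Expr
case class Col(name: String) extends Expr
case class Lit(value: Any) extends Expr
case class ScalarFunc(name: String, args: List[Expr]) extends Expr
case class EqTo(left: Expr, right: Expr) extends Expr

object BooleanUdfRewrite {
  // Hypothetical registry: boolean UDF name -> int-returning counterpart
  // whose result is 0 exactly when the boolean UDF would return true.
  val booleanUdfToIntUdf: Map[String, String] = Map("cidrmatch" -> "ip_compare")

  // Rewrite f(args) into f'(args) = 0 so the filter condition becomes an
  // equality predicate (pushable) instead of a bare scalar function call.
  def rewrite(e: Expr): Expr = e match {
    case ScalarFunc(name, args) if booleanUdfToIntUdf.contains(name) =>
      EqTo(ScalarFunc(booleanUdfToIntUdf(name), args), Lit(0))
    case other => other
  }
}
```

Under this scheme, supporting another boolean UDF would only require adding an entry to the registry rather than a new conversion rule.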
case Some(keywordField) =>
  Some(s"""{"term":{"$keywordField":{"value":$value}}}""")
case None => None // Return None for unsupported text fields
if (isIpCompare(p)) { |
Is it possible to make each UDF an individual Predicate instead of deeply nested conditional checks in this single case?
That was something I tried, but it was not possible due to the current implementation of Spark. It only matters when we want to filter by a UDF returning boolean: Spark converts the UDF to a UserDefinedScalarFunc, which cannot be treated as a Predicate, so it won't be pushed down.
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Description
- When the filter is written as `WHERE cidrmatch(ip, '192.168.0.10/32')`, it won't be considered as a predicate due to the above reason.
- When it is written as `WHERE cidrmatch(ip, '192.168.0.10/32') = true`, Spark automatically converts it to `WHERE cidrmatch(ip, '192.168.0.10/32')` and fails for the same reason.
- As a workaround, convert `cidrmatch(ip, '192.168.0.10/32')` to `ip_compare(ip, '192.168.0.10/32') = 0` so the condition generates an `EqualTo` predicate instead of an `ApplyFunctionExpression`.
Related Issues
#1034
Check List
- Commits are signed per the DCO using `--signoff`
- Add `backport 0.x` label if it is a stable change which won't break existing feature

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.