-
Notifications
You must be signed in to change notification settings - Fork 178
Fix binning udf resolution / Add type coercion support for binning UDFs #4742
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix binning udf resolution / Add type coercion support for binning UDFs #4742
Conversation
1adc46f to
2c662ed
Compare
|
@ahkcs CI is failing. I retried but still failed. Could you double check? |
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
3ae31ab to
a93dcf8
Compare
Unit tests passed after rebase |
Signed-off-by: Kai Huang <ahkcs@amazon.com>
c2d3dd0 to
d0ddb19
Compare
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
docs/user/ppl/cmd/bin.rst
Outdated
| Example 20: Type coercion with string fields | ||
| ============================================== | ||
|
|
||
| The ``bin`` command automatically converts string fields containing valid numeric values to numeric types, enabling binning operations on keyword/text fields. Only strings that represent valid numbers (e.g., "25", "30.5", "100") are supported. Non-numeric strings will cause an error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to remove comments
docs/user/ppl/cmd/bin.rst
Outdated
| bin <field> [span=<interval>] [minspan=<interval>] [bins=<count>] [aligntime=(earliest | latest | <time-specifier>)] [start=<value>] [end=<value>] | ||
|
|
||
| * field: mandatory. The numeric field to bin. | ||
| * field: mandatory. The field to bin. Accepts numeric fields, time-based fields, or string fields containing valid numeric values (e.g., "25", "30.5"). String fields with numeric values are automatically coerced to numeric type. Non-numeric strings will result in an error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest remove this section.
Type conversion is a generic feature applied across all commands and functions, so it does not need to be explicitly mentioned in each one.
Additionally, enabling binning operations should target string-type fields, not keyword or text fields specifically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to remove this section
| throw new IllegalArgumentException( | ||
| "Cannot apply binning: field contains non-numeric string values. " | ||
| + "Only numeric types or string types with valid numeric values are supported."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should return a null value. For example, if only one document contains an invalid field, the entire query should not fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense, I have updated to remove the validation logic. I kept the validation logic in BinnableField to only verify if it's supported type(Reject truly unsupported types (e.g., BOOLEAN, ARRAY, MAP))
…Fs (#4742) * Fix binning udf resolution Signed-off-by: Kai Huang <ahkcs@amazon.com> * spotless fix Signed-off-by: Kai Huang <ahkcs@amazon.com> * update tes Signed-off-by: Kai Huang <ahkcs@amazon.com> * add comments Signed-off-by: Kai Huang <ahkcs@amazon.com> * removal Signed-off-by: Kai Huang <ahkcs@amazon.com> * update yaml test Signed-off-by: Kai Huang <ahkcs@amazon.com> * rerun Signed-off-by: Kai Huang <ahkcs@amazon.com> * update support for type coercion Signed-off-by: Kai Huang <ahkcs@amazon.com> * update doc Signed-off-by: Kai Huang <ahkcs@amazon.com> * enhance error handling and doc Signed-off-by: Kai Huang <ahkcs@amazon.com> * fixes Signed-off-by: Kai Huang <ahkcs@amazon.com> * fixes Signed-off-by: Kai Huang <ahkcs@amazon.com> * fixes Signed-off-by: Kai Huang <ahkcs@amazon.com> * Fix tests Signed-off-by: Kai Huang <ahkcs@amazon.com> --------- Signed-off-by: Kai Huang <ahkcs@amazon.com> (cherry picked from commit 20f2234) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Description
Fixes #4740 - Binning UDFs are now resolved through
PPLFuncImpTable.resolve()instead of being instantiated directly viamakeCall().Implemented automatic type coercion to allow string fields with numeric values to work seamlessly with all binning operations (bins, span, minspan, range).
Related Issues
PPLFuncImpTable#4740Type coercion support for binning UDFs #4356