-
Notifications
You must be signed in to change notification settings - Fork 1k
Description
Describe the bug
Decimal numbers with zero scale and some fractional digits in the input string are being incorrectly/inconsistently parsed. For instance parsing "1.0" with precision 3 and scale 0 will return 10 (instead of 1), and parsing "123.0" with precision 3 and scale 0 will panic with parse decimal overflow (123.0) (instead of returning 123).
To Reproduce
Add this set of test cases to test_parse_decimal_with_parameter in arrow-cast
let zero_scale_tests = [
("1.0", 1), // 10
("1.2", 1), // 12
("1.00", 1), // 100
("1.23", 1), // 123
("1.000", 1), // "parse decimal overflow (1.000)"
("1.123", 1), // "parse decimal overflow (1.123)"
("123.0", 123), // "parse decimal overflow (123.0)"
("123.4", 123), // "parse decimal overflow (123.4)"
("123.00", 123), // "parse decimal overflow (123.00)"
("123.45", 123), // "parse decimal overflow (123.45)"
];
for (s, i) in zero_scale_tests {
let result_128 = parse_decimal::<Decimal128Type>(s, 3, 0).unwrap();
assert_eq!(i, result_128);
}Expected behavior
None of the above cases returns the (in some cases maybe disputably) expected results. The comment besides each case shows what the returned value/err is.
Depending on how liberal/flexible parse_decimal is intended to be, I can see there being a couple of modalities when it comes to expected behavior of above cases (i.e. when there are decimal points but the scale is zero)
- error out in all such cases
- only numbers with a single zero in the decimal part can be parsed, others must error out
- only numbers with all zeros in the decimal part can be parsed, others must error out
- all above numbers can be parsed
Given that parse_decimal already does some truncation
arrow-rs/arrow-cast/src/parse.rs
Line 2529 in d519bb8
| ("123.1234", 123123i128), |
Additional context
One way this problem is manifested is when parsing min/max values in Delta tables. We've ran into a case where a decimal/numeric type with precision 10 and scale 0 has one of those values set as "1026600838.0", which in turn crashes the json reader when trying to parse the log file due to a overflow, so the table ends up broken. (cc @ion-elgreco, relatedly it is also somewhat unfortunate that delta-rs picked f64 to serialize the decimal type)