
Incorrect behavior of parse_decimal with zero scale #8699

@gruuya

Describe the bug
Decimal numbers with zero scale and fractional digits in the input string are parsed incorrectly or inconsistently. For instance, parsing "1.0" with precision 3 and scale 0 returns 10 (instead of 1), and parsing "123.0" with precision 3 and scale 0 panics with parse decimal overflow (123.0) (instead of returning 123).

To Reproduce
Add this set of test cases to test_parse_decimal_with_parameter in arrow-cast

        let zero_scale_tests = [
            ("1.0", 1),      // 10
            ("1.2", 1),      // 12
            ("1.00", 1),     // 100
            ("1.23", 1),     // 123
            ("1.000", 1),    // "parse decimal overflow (1.000)"
            ("1.123", 1),    // "parse decimal overflow (1.123)"
            ("123.0", 123),  // "parse decimal overflow (123.0)"
            ("123.4", 123),  // "parse decimal overflow (123.4)"
            ("123.00", 123), // "parse decimal overflow (123.00)"
            ("123.45", 123), // "parse decimal overflow (123.45)"
        ];
        for (s, i) in zero_scale_tests {
            let result_128 = parse_decimal::<Decimal128Type>(s, 3, 0).unwrap();
            assert_eq!(i, result_128);
        }
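One plausible explanation for the results above, sketched as standalone Rust (this is an illustration of a hypothesis, not the actual arrow-cast code; the function name and error strings are made up): every digit, including the fractional ones, is pushed into the accumulator, and only the total digit count is checked against the precision, so a target scale of 0 never causes the fractional digits to be dropped. This sketch reproduces all ten results in the comments above:

    // Hypothetical sketch: accumulate every digit and only check the digit
    // count against the precision, ignoring the requested scale entirely.
    fn parse_decimal_sketch(s: &str, precision: u8) -> Result<i128, String> {
        let mut value: i128 = 0;
        let mut digits = 0u8;
        for c in s.chars() {
            match c {
                // The decimal point itself is skipped, but the digits after it
                // are still accumulated as if the scale allowed them.
                '.' => continue,
                d @ '0'..='9' => {
                    digits += 1;
                    if digits > precision {
                        return Err(format!("parse decimal overflow ({s})"));
                    }
                    value = value * 10 + (d as u8 - b'0') as i128;
                }
                _ => return Err(format!("can't parse the string value {s} to the decimal type")),
            }
        }
        Ok(value)
    }

With precision 3 this returns 10 for "1.0" and 123 for "1.23", and errors with an overflow for any input with more than three digits in total, such as "123.0".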

Expected behavior
None of the above cases returns the (in some cases perhaps debatable) expected result. The comment beside each case shows the value or error that is actually returned.

Depending on how liberal/flexible parse_decimal is intended to be, I can see a few possible behaviors for the above cases (i.e. when the input contains fractional digits but the scale is zero):

  1. error out in all such cases
  2. only numbers with a single zero in the decimal part can be parsed, others must error out
  3. only numbers with all zeros in the decimal part can be parsed, others must error out
  4. all above numbers can be parsed

Given that parse_decimal already does some truncation, e.g. in this existing test case

    ("123.1234", 123123i128),

I'm inclined to think the last option might be fine. If not, then option 1 makes sense at a minimum.
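To make option 4 concrete, here is a rough standalone sketch of parsing with truncation of excess fractional digits. It is only an illustration under the assumption that silently dropping those digits is acceptable; the function name and error strings are made up, and sign handling, exponents, and rounding are deliberately omitted:

    // Hypothetical option-4 parser: drop fractional digits beyond the target
    // scale and only check precision against the digits that remain.
    // Assumes precision is small enough that 10^precision fits in i128.
    fn parse_decimal_truncating(s: &str, precision: u8, scale: u8) -> Result<i128, String> {
        let (int_part, frac_part) = s.split_once('.').unwrap_or((s, ""));
        // Keep only as many fractional digits as the target scale allows;
        // with scale 0 this drops the fractional part entirely.
        let frac_kept = &frac_part[..frac_part.len().min(scale as usize)];

        let mut value: i128 = 0;
        for c in int_part.chars().chain(frac_kept.chars()) {
            let d = c
                .to_digit(10)
                .ok_or_else(|| format!("can't parse the string value {s} to the decimal type"))?;
            value = value * 10 + d as i128;
        }
        // Pad with zeros if the input had fewer fractional digits than the scale.
        for _ in frac_kept.len()..scale as usize {
            value *= 10;
        }
        // Check the precision against the value that actually ends up stored.
        if value >= 10i128.pow(precision as u32) {
            return Err(format!("parse decimal overflow ({s})"));
        }
        Ok(value)
    }

With this behavior, all of the zero-scale cases above parse to 1 or 123 respectively, and a case like "123.1234" with scale 3 still truncates to 123123, matching the existing test case quoted above.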

Additional context
One way this problem manifests is when parsing min/max values in Delta tables. We've run into a case where a decimal/numeric type with precision 10 and scale 0 has one of those values set to "1026600838.0", which in turn crashes the JSON reader with an overflow when parsing the log file, so the table ends up broken. (cc @ion-elgreco; relatedly, it is also somewhat unfortunate that delta-rs picked f64 to serialize the decimal type)
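For a quick sanity check of why that particular value overflows (assuming the accumulation hypothesis sketched earlier): "1026600838.0" already has ten integer digits, so keeping the trailing fractional zero produces an eleven-digit value that no longer fits in precision 10:

    fn main() {
        let without_fraction: i128 = 1_026_600_838; // expected result for scale 0
        let with_fraction_kept = without_fraction * 10; // 10_266_008_380 if the ".0" digit is kept
        let precision_10_max: i128 = 10i128.pow(10) - 1; // 9_999_999_999
        assert!(without_fraction <= precision_10_max); // fits
        assert!(with_fraction_kept > precision_10_max); // overflows
    }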
