Skip to content

Conversation

@martin-augment
Copy link
Owner

@martin-augment martin-augment commented Oct 24, 2025

8702: To review by AI

Summary by CodeRabbit

  • Bug Fixes

    • Improved decimal number parsing to properly handle edge cases with scale 0, including better error reporting for unsupported formats.
  • Tests

    • Extended test coverage for decimal parsing edge cases and overflow scenarios.

@coderabbitai
Copy link

coderabbitai bot commented Oct 24, 2025

Walkthrough

Refactors decimal parsing by replacing a local iterator variable with an explicit iterator and updating loop traversals. Adds conditional logic to handle scale-0 decimal parsing: when fractional digits exist and input ends with ".0", the zero is folded into the integer part; otherwise a parse error is returned. Extends test coverage with overflow and zero-scale scenarios.

Changes

Cohort / File(s) Summary
Iterator refactoring and decimal scale-0 handling
arrow-cast/src/parse.rs
Replaces local iterator variable bs with bs_iter throughout decimal parsing loops. Adds explicit scale-0 handling for decimals with fractional digits: accepts ".0" suffix by folding into integer part, rejects other fractional cases.
Test coverage expansion
arrow-cast/src/parse.rs
Adds overflow test case "12345678900.0" and introduces zero-scale acceptance/rejection tests for Decimal128Type covering strings with trailing ".0" and various zero-scale parsing scenarios.

Sequence Diagram(s)

sequenceDiagram
    participant Parser as Decimal Parser
    participant Logic as Scale-0 Logic
    participant Result as Parse Result

    Parser->>Logic: is_e_notation == false AND scale == 0?
    alt Scale is 0 and fractional digits exist
        Logic->>Logic: Check if input ends with ".0"
        alt Input ends with ".0" (single fractional digit)
            Logic->>Logic: Fold zero into integer part
            Logic->>Logic: Clear fractional state
            Logic->>Result: Return success
        else Other fractional digits present
            Logic->>Result: Return parse error<br/>(cannot parse with scale 0)
        end
    else Scale is not 0 or no fractional digits
        Logic->>Result: Continue standard parsing
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

The changes involve careful logic modifications to decimal parsing with conditional branching, iterator pattern adjustments requiring verification across multiple code paths, and meaningful test coverage changes. Review demands understanding of decimal number parsing semantics and validation of the scale-0 edge case handling.

Poem

🐰 Hoppy parsing hops along,
Iterators now more strong,
Scale-zero decimals find their way,
Tests keep bugs at bay,
Numbers dance in perfect song!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The PR description "8702: To review by AI" is extremely minimal and does not follow the required template structure. The description is missing all critical sections specified in the template: it does not include which issue it closes (beyond the inference from the title), lacks any rationale for the change, provides no summary of the individual changes, does not confirm whether tests are included, and does not address user-facing changes or breaking changes. The description is largely incomplete and reads as a placeholder rather than a substantive PR description. The PR description must be substantially expanded to follow the required template. Please add the following sections: a clear statement of which issue this PR closes (e.g., "Closes apache#8702"), a rationale explaining why this strict parsing fix is necessary, a summary of the changes made, confirmation that tests have been added (including specifics about what scenarios are tested), and any information about user-facing changes or API impacts.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The pull request title "8702: fix: strict parsing of decimals with zero scale" clearly summarizes the main change in the changeset. According to the raw summary, the primary changes involve adding explicit handling for parsing decimals with scale 0 and extending test coverage for zero-scale scenarios. The title is concise, specific, and accurately reflects this core objective without vague language or unnecessary noise.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch pr-8702-2025-10-24-10-56-59

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions bot added the arrow label Oct 24, 2025
Copy link

@augmentcode augmentcode bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. No suggestions at this time.

@claude
Copy link

claude bot commented Oct 24, 2025

Pull Request Review

Summary

This PR fixes strict parsing of decimals with zero scale by adding validation to reject fractional values (except .0) when scale=0. The implementation is sound with good test coverage.


Code Quality and Best Practices ✅

Positive aspects:

  • Variable naming improvement: The rename from bs to bs_iter (line 878, 898) improves code clarity by making it explicit that we're working with an iterator.
  • Clear comments: The added comment (lines 967-969) explains the edge case logic well.
  • Consistent error messages: New error message follows the existing pattern in the codebase.

Implementation Analysis ✅

The core logic (lines 966-979):

The code checks if scale is 0 but fractional digits were parsed, and handles the special case of .0 suffix appropriately.

Analysis:

  1. ✅ Correctly identifies when scale is 0 but fractional digits were parsed
  2. ✅ Special case for .0 suffix is appropriate (allows common notation like 123.0)
  3. ✅ Uses div_wrapping to remove the parsed 0, maintaining consistency with the wrapping arithmetic
  4. ✅ Properly updates digits and fractionals counters after the division
  5. ✅ Returns clear error for other cases

Edge Cases and Testing ✅

Logic Verification:

  • For input 123.0 with scale=0 → fractionals=1, ends_with confirms digit is 0 ✅
  • For input 123.00 with scale=0 → fractionals=2, check fails, returns error ✅
  • For input 123.45 with scale=0 → fractionals=2, check fails, returns error ✅

Excellent test coverage added:

  1. Positive cases (lines 2772-2776): Tests 1.0 and 123.0
  2. Negative cases (lines 2778-2789): Comprehensive coverage including 1.2, 1.00, 1.23, 1.000, 1.123, 123.4, 123.00, 123.45
  3. Overflow test (line 2654): Ensures fix does not break overflow detection

Suggested additional tests (not blockers):

  • Negative numbers with .0 suffix like -1.0
  • Explicit positive sign like +1.0
  • Zero cases: 0.0 (should accept) and 0.00 (should reject)

Performance and Security ✅

Performance: Minimal O(1) overhead, no allocations, efficient operations

Security: No issues - strengthens input validation, uses safe Rust, no panics


Recommendations

Medium Priority:

  1. Add test cases for negative numbers and positive signs with .0 suffix
  2. Add test for 0.0 with scale 0

Low Priority:

  1. Document why .0 is specifically allowed
  2. Add comment explaining special scale==0 behavior around line 919-928

Final Verdict: ✅ LGTM with minor suggestions

This is a well-implemented fix that solves the problem correctly, maintains backward compatibility, has comprehensive test coverage, and follows existing patterns. Ready to merge.

File: arrow-cast/src/parse.rs:966-979
Type: Bug fix
Risk: Low

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
arrow-cast/src/parse.rs (1)

898-914: Fix E-notation scale-0 bypass by applying same strictness as non-E-notation path

The review comment correctly identifies an inconsistency. The non-E-notation path enforces scale-0 strictness (rejecting inputs like "1.2" but accepting "1.0") at lines 966-978, but the E-notation path bypasses this check entirely at lines 901 and 943. The parse_e_notation function lacks scale-0 validation, allowing "1.2e0" with scale 0 to parse when it should error.

Existing tests confirm the gap: scale-0 cases like ("1.234e16", "12340000000000000", 0) only test inputs without fractional parts before 'e'. No tests cover fractional E-notation with scale 0 (e.g., "1.2e0", "1.0e0").

Recommend Option B (allow only .0e...), as it mirrors the existing non-E-notation rule which accepts "1.0" with scale 0. Add the pre-validation check before both parse_e_notation calls (lines 901 and 943), and include test cases for "1.2e0", "1.0e0", and "1.0e-1" with scale 0 to encode the intended behavior.

🧹 Nitpick comments (3)
arrow-cast/src/parse.rs (3)

966-979: Good: strict scale-0 handling for non‑E decimals; tiny safety nit.

Logic to fold a trailing ".0" and error otherwise is correct and aligns with the stated intent. Consider a minor safety/readability tweak to avoid any chance of underflow on u8:

-                digits -= 1;
+                digits = digits.saturating_sub(1);

2771-2776: Zero-scale acceptance tests look good; add a few more edges.

Consider also asserting these cases:

  • ".0" => 0
  • "0.0" => 0
  • "-1.0" => -1

This tightens coverage around signs and missing integer parts.


2778-2789: Great rejection set; add E-notation counterparts to lock semantics.

Add cases to make the E-notation behavior explicit under scale 0:

  • Reject: "1.2e0", "1.00e0", "1.0e-1"
  • Decide on: "1.0e0" and "1.0e10" (accept or reject per chosen policy in code)

Example test additions:

@@
-        let zero_scale_err = [
+        let zero_scale_err = [
             "1.2", "1.00", "1.23", "1.000", "1.123", "123.4", "123.00", "123.45",
+            "1.2e0", "1.00e0", "1.0e-1",
         ];

If you choose Option A (strict disallow), also add to zero_scale_err: "1.0e0", "1.0e10".

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7b3b262 and 5a43c23.

📒 Files selected for processing (1)
  • arrow-cast/src/parse.rs (5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: MIRI
  • GitHub Check: Rustdocs are clean
  • GitHub Check: Test
  • GitHub Check: Check Compilation
  • GitHub Check: Test on Mac
  • GitHub Check: Pyarrow C Data Interface (stable, 16)
🔇 Additional comments (2)
arrow-cast/src/parse.rs (2)

878-885: Iterator refactor reads clean and preserves indices.

Using enumerate().skip(signed as usize) keeps indices aligned to the original byte slice (including the sign) and works with downstream uses (e.g., point_index). Looks good.


2655-2655: Nice overflow coverage addition.

Adding "12345678900.0" to overflow cases (scale 0, precision 10) is a useful regression guard, especially after folding ".0".

@github-actions
Copy link

Findings

  • arrow-json/src/writer/encoder.rs:346 – Blocking: the match arm for DataType::BinaryView has been removed, so LineDelimitedWriter/ArrayWriter will now reject BinaryViewArrays with “Unsupported data type” instead of emitting the prior hex JSON. This is a regression from current behaviour; please restore the BinaryView encoder (and its tests) so view-backed binary columns continue to work.
  • arrow-json/src/writer/encoder.rs:698 – Blocking: the map key guard no longer allows DataType::Utf8View. Existing callers that produce MapArrays with view-backed string keys (previously supported and covered by tests) will now get a JSON error. Please keep accepting Utf8View here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants