
PoC: Integer range analysis #15342


Draft: wants to merge 9 commits into master


RunDevelopment commented Jul 25, 2025

This PR is a proof of concept for using integer range analysis to improve existing lints. For now, I focused on the lints cast_possible_truncation, cast_possible_wrap, and cast_sign_loss, but range analysis is potentially useful for other existing lints as well.

The range analysis itself is rather simplistic right now. The possible values of an expression are represented as a closed interval using the IInterval type. This representation applies to all integer expressions and variables, even those we know nothing about except their type. E.g. a function parameter x: u8 will have the range 0..=255, and the expression x / 3 + 1 will have the range 1..=86.
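The x / 3 + 1 example can be reproduced with a minimal closed-interval sketch. Interval and its methods are hypothetical stand-ins, not the PR's IInterval API. Integer types are reduced to a bit width, and overflow of the target type is ignored.

```rust
/// Hypothetical stand-in for IInterval: a closed interval lo..=hi.
/// Bounds are stored as i128 so any value of a primitive integer type up to
/// 64 bits fits. The integer type itself is not tracked in this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Interval {
    lo: i128,
    hi: i128,
}

impl Interval {
    /// The single value `v`, e.g. a literal.
    fn exact(v: i128) -> Self {
        Interval { lo: v, hi: v }
    }

    /// All we know about an unsigned parameter is its full type range,
    /// e.g. `x: u8` gives 0..=255.
    fn full_unsigned(bits: u32) -> Self {
        Interval { lo: 0, hi: (1i128 << bits) - 1 }
    }

    /// Addition adds the lower bounds and the upper bounds. Overflow of the
    /// target type is ignored here; a real implementation must handle it.
    fn add(self, other: Interval) -> Self {
        Interval { lo: self.lo + other.lo, hi: self.hi + other.hi }
    }

    /// Truncating division by a positive constant is monotone in the
    /// dividend, so dividing both bounds gives the exact result range.
    fn div_const(self, c: i128) -> Self {
        assert!(c > 0, "this sketch only handles positive divisors");
        Interval { lo: self.lo / c, hi: self.hi / c }
    }
}
```

For x: u8, Interval::full_unsigned(8).div_const(3).add(Interval::exact(1)) yields 1..=86, matching the range stated above.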

The interval arithmetic needed for operations like addition, subtraction, and many std integer methods is currently implemented by the Arithmetic type. I have already implemented most of the std methods that return an integer or an Option<int_type>.
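The PR text does not show how Arithmetic models methods that return Option<int_type>. One possible approach for checked_add on u8 is sketched below. It is hypothetical, not the PR's code: it returns the range of values that can appear inside Some, together with a flag for whether None is reachable.

```rust
/// Hypothetical closed interval of u8 values.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct U8Range {
    lo: u8,
    hi: u8,
}

/// Models `x.checked_add(y)` for x in `a` and y in `b`.
/// Returns the range of values inside `Some(_)` (or `None` if every addition
/// overflows) and whether the result can be `None`.
fn checked_add(a: U8Range, b: U8Range) -> (Option<U8Range>, bool) {
    // Compute the exact sums in a wider type so they cannot overflow.
    let lo = a.lo as u16 + b.lo as u16;
    let hi = a.hi as u16 + b.hi as u16;
    let max = u8::MAX as u16;
    // `None` is reachable iff the largest sum overflows.
    let can_be_none = hi > max;
    // Every integer in lo..=hi is a reachable sum, so the `Some` values are
    // exactly the non-overflowing part lo..=min(hi, 255).
    let some = if lo > max {
        None
    } else {
        Some(U8Range { lo: lo as u8, hi: hi.min(max) as u8 })
    };
    (some, can_be_none)
}
```

For example, 200..=250 plus 10..=10 gives Some values in 210..=255, and None is possible.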

Example

While only a limited proof of concept, it is already enough to fix false positives in cast_possible_truncation such as #7486. Here is the code from that issue, annotated with the computed ranges:

```rust
fn issue7486(number: u64, other: u32) -> u32 {
    let other_u64 = other as u64;
    // ^ range: 0..=u32::MAX
    let result = number % other_u64;
    // ^ range: 0..=u32::MAX-1
    result as u32
}
```

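As we can see, the range analysis determines that the result variable can only hold values in the range 0..=u32::MAX-1 despite being of type u64. The remainder is always smaller than the divisor, and the divisor is at most u32::MAX (a divisor of 0 panics, so it contributes no value). This is enough to eliminate the false positive, since result as u32 can never truncate.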
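A simplified remainder rule is sketched below. It is hypothetical, handles unsigned operands only, and is not the PR's implementation. It uses the facts that a % b < b and a % b <= a, and that a divisor of 0 always panics.

```rust
/// Hypothetical closed interval over u128, wide enough for u64 operands.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct URange {
    lo: u128,
    hi: u128,
}

/// Result range of `a % b` for unsigned `a` in `x` and `b` in `y`.
/// Returns `None` (the empty interval) if `b` is always 0, because `% 0`
/// always panics and therefore produces no value.
fn rem_unsigned(x: URange, y: URange) -> Option<URange> {
    let max_divisor = y.hi;
    if max_divisor == 0 {
        return None;
    }
    // a % b < b <= max_divisor, and a % b <= a <= x.hi.
    // The lower bound is conservatively 0; a superset is allowed.
    Some(URange { lo: 0, hi: (max_divisor - 1).min(x.hi) })
}
```

With number in 0..=u64::MAX and other_u64 in 0..=u32::MAX, this gives 0..=u32::MAX-1, which fits in u32.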

Here's a slightly more complex example where I annotated all expressions:

```rust
fn unorm16_to_unorm8(x: u16) -> u8 {
    //               ^ 0..=65535
    ((x as u32 * 255 + 32895) >> 16) as u8
    //^~~~~~~~ 0..=65535
    //^~~~~~~~~~~~~~ 0..=16711425
    //^~~~~~~~~~~~~~~~~~~~~~ 32895..=16744320
    //^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 0..=255
}
```
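The annotated ranges can be reproduced with a small recursive evaluator in the spirit of the PR's IntervalCtxt. Expr, Range, and eval are hypothetical and much simpler than an evaluator over real compiler expressions. The lossless x as u32 widening is modeled by using x's range directly.

```rust
/// Hypothetical closed interval with u64 bounds (ample for this example).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Range {
    lo: u64,
    hi: u64,
}

/// A tiny hypothetical expression tree for unsigned arithmetic.
enum Expr {
    /// A variable whose only known fact is its range (e.g. from its type).
    Var(Range),
    Lit(u64),
    Mul(Box<Expr>, Box<Expr>),
    Add(Box<Expr>, Box<Expr>),
    Shr(Box<Expr>, Box<Expr>),
}

/// Recursively evaluates the range of `e`. Mul and Add are monotone
/// non-decreasing in both unsigned operands, so combining lower bounds and
/// upper bounds is exact. Overflow is not modeled, and shift amounts must
/// be below 64.
fn eval(e: &Expr) -> Range {
    match e {
        Expr::Var(r) => *r,
        Expr::Lit(v) => Range { lo: *v, hi: *v },
        Expr::Mul(a, b) => {
            let (a, b) = (eval(a), eval(b));
            Range { lo: a.lo * b.lo, hi: a.hi * b.hi }
        }
        Expr::Add(a, b) => {
            let (a, b) = (eval(a), eval(b));
            Range { lo: a.lo + b.lo, hi: a.hi + b.hi }
        }
        Expr::Shr(a, b) => {
            let (a, b) = (eval(a), eval(b));
            // Shifting right grows with `a` but shrinks with the shift amount.
            Range { lo: a.lo >> b.hi, hi: a.hi >> b.lo }
        }
    }
}
```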

A few details

I want to mention a few details to hopefully make what I implemented more understandable for potential reviewers.

  1. IInterval doesn't just represent an integer interval, but a typed interval. I omitted types in the above annotations, but all IIntervals know the underlying integer type.

  2. IIntervals can be empty. This is important for operations that can panic. E.g. 1_u32.strict_sub(2_u32) will always panic, so it will be assigned the empty range (u32).

  3. All interval operations guarantee that they return a superset of the actual result. E.g. for 1 + 1, the actual result interval is 2..=2, but the add implementation is allowed to return any superset of it, e.g. 0..=255. This is both necessary and useful.

    It's necessary because some operations don't produce clean intervals. E.g. u8::wrapping_add(value, 1) with value in 254..=255 outputs 255 or 0, respectively. The smallest interval that contains the set {0, 255} is 0..=255, which is a strict superset of the actual results.

    It's useful because it makes operations easier to implement. Some operations are very complex, so allowing any superset makes it possible to start with a simple implementation that returns a large superset and improve its precision later.

  4. Arithmetic has both methods and static functions. This is because the data Arithmetic carries represents compiler flags/options. If an operation depends on any flags/options, it's a method, and a static function otherwise. E.g. abs() may panic or wrap depending on whether overflow checks are enabled, so it's a method, while strict_abs() always panics, so it's a static function.

  5. IntervalCtxt is the type that implements recursively evaluating expression intervals (inspired by ConstEvalCtxt).
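Points 2 and 3 can be illustrated with a hypothetical sketch for u8 intervals, where an interval is a (lo, hi) pair and None stands for the empty interval. These functions are not the PR's Arithmetic API. They only show how strict_sub can produce an empty interval and why wrapping_add has to fall back to a superset.

```rust
/// Hypothetical u8 interval as (lo, hi); `None` is the empty interval.
type U8Interval = Option<(u8, u8)>;

/// Range of `x.strict_sub(y)` for x in `a` and y in `b`. Underflow always
/// panics, so only non-negative differences produce a value.
fn strict_sub(a: (u8, u8), b: (u8, u8)) -> U8Interval {
    // Largest possible difference; if even this underflows, every call panics.
    let hi = a.1 as i16 - b.0 as i16;
    if hi < 0 {
        return None;
    }
    // Smallest difference that does not panic.
    let lo = (a.0 as i16 - b.1 as i16).max(0);
    Some((lo as u8, hi as u8))
}

/// Range of `x.wrapping_add(y)` for x in `a` and y in `b`, as the smallest
/// enclosing interval (a superset of the actual results).
fn wrapping_add(a: (u8, u8), b: (u8, u8)) -> (u8, u8) {
    let lo = a.0 as u16 + b.0 as u16;
    let hi = a.1 as u16 + b.1 as u16;
    if hi <= 255 {
        // No sum wraps: the result is exact.
        (lo as u8, hi as u8)
    } else if lo > 255 {
        // Every sum wraps exactly once (the maximum sum is 510).
        ((lo - 256) as u8, (hi - 256) as u8)
    } else {
        // Some sums wrap and some don't, so results sit at both ends of u8
        // and the smallest enclosing interval is the full range.
        (0, 255)
    }
}
```

strict_sub((1, 1), (2, 2)) is the empty interval, and wrapping_add((254, 255), (1, 1)) widens to the full range 0..=255.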


I want to stress that this is a proof of concept. This PR is in no state to be merged right now. Before I invest more time into this, I would like to ask (1) whether there is interest in clippy doing this type of range analysis in the first place and (2) whether there are people who could help me with the clippy side of things.

The main thing I need help with is testing (I think). Specifically:

  • I used property tests and snapshot tests to ensure the correctness of Arithmetic and IInterval, but I'm not sure how best to integrate them. I say "used" because I initially developed IInterval and Arithmetic as their own crate and then copy-pasted the code for the sake of this PR, without copying the tests.

    I don't care whether this code ends up inside clippy or as its own crate, I just want well-tested working code to make clippy smarter. So I would like to ask (1) whether rinterval should be its own crate (probably not), and if not (2) what's the best way to add the snapshot tests I had?

  • I want to test the recursive expression evaluation too. I imagine snapshot tests similar to uitest, where I can define a few source files and get a snapshot in which the ranges of all expressions are annotated, like in the examples above. Is there any existing infrastructure I could use?
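The original property and snapshot tests are not part of this PR. As a dependency-free stand-in (a hypothetical sketch, not the author's test setup), the superset guarantee from point 3 can be checked exhaustively over a fixed set of interval bounds. The deliberately unsound naive_wrapping_add shows that the check catches a broken implementation.

```rust
/// Interesting bounds: the edges and the middle of u8, where overflow bugs
/// tend to show up.
const BOUNDS: [u8; 6] = [0, 1, 127, 128, 254, 255];

/// All intervals lo..=hi with both bounds taken from `BOUNDS`.
fn intervals() -> Vec<(u8, u8)> {
    let mut out = Vec::new();
    for &lo in &BOUNDS {
        for &hi in &BOUNDS {
            if lo <= hi {
                out.push((lo, hi));
            }
        }
    }
    out
}

/// Checks that `interval_op` returns a superset of every concrete result of
/// `op` for all pairs of test intervals. Returns the first counterexample as
/// (interval a, interval b, x, y), or None if the property holds.
fn find_counterexample(
    interval_op: fn((u8, u8), (u8, u8)) -> (u8, u8),
    op: fn(u8, u8) -> u8,
) -> Option<((u8, u8), (u8, u8), u8, u8)> {
    let all = intervals();
    for &a in &all {
        for &b in &all {
            let (lo, hi) = interval_op(a, b);
            for x in a.0..=a.1 {
                for y in b.0..=b.1 {
                    let r = op(x, y);
                    if r < lo || r > hi {
                        return Some((a, b, x, y));
                    }
                }
            }
        }
    }
    None
}

/// Sound: saturating_add is monotone in both arguments, so combining the
/// bounds pairwise is exact.
fn saturating_add(a: (u8, u8), b: (u8, u8)) -> (u8, u8) {
    (a.0.saturating_add(b.0), a.1.saturating_add(b.1))
}

/// Unsound on purpose: wrapping each bound separately breaks as soon as some
/// sums wrap and others don't.
fn naive_wrapping_add(a: (u8, u8), b: (u8, u8)) -> (u8, u8) {
    (a.0.wrapping_add(b.0), a.1.wrapping_add(b.1))
}
```

Typical use: find_counterexample(saturating_add, u8::saturating_add) returns None, while the naive wrapping version yields a counterexample.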

Of course, other feedback and suggestions are also appreciated.

Before this PR is ready, some work still needs to be done:

  1. Lints need to actually use integer ranges to reject false positives. Right now, the 3 lints I mentioned at the start just evaluate the intervals of all reported expressions and report them as a note, because I had no other way of seeing the ranges.
  2. The casting UI tests need to be rewritten, because many existing cases are false positives. E.g. 1u64 as isize should become u64::MAX as isize.
  3. Caching. Right now, the expression evaluation is entirely uncached. For the sake of performance, this should obviously be changed.
  4. usize. I currently use rustc_lint::context::LateContext::data_layout().ptr_sized_integer() to map usize to a fixed-width integer type, and the same for isize. This works, but it also means that the interval analysis can only compute intervals for one pointer width (e.g. 32-bit or 64-bit), not for both at once. This can potentially lead to false positives and false negatives in lints.
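To see why fixing usize to a single width matters, the hypothetical sketch below answers "can this cast truncate?" for a given pointer width. The helpers are illustrative only. If the analysis sees only one width, the answer for the other width is lost. For example, with a 64-bit usize, u64 as usize looks lossless even though it truncates on 32-bit targets.

```rust
/// Hypothetical helper: the largest `usize` value on a target whose
/// pointers are `ptr_bits` wide (e.g. 16, 32, or 64).
fn usize_max(ptr_bits: u32) -> u128 {
    (1u128 << ptr_bits) - 1
}

/// Can `x as u32` truncate for an arbitrary `x: usize` on this target?
fn usize_as_u32_may_truncate(ptr_bits: u32) -> bool {
    usize_max(ptr_bits) > u32::MAX as u128
}

/// Can `x as usize` truncate for an arbitrary `x: u64` on this target?
fn u64_as_usize_may_truncate(ptr_bits: u32) -> bool {
    (u64::MAX as u128) > usize_max(ptr_bits)
}
```

Each function gives opposite answers for 32-bit and 64-bit targets, which is exactly the information lost when usize is mapped to one fixed-width type.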

I also want to point out that I see what I did in this PR as an MVP. In the future, I would like to make the range analysis even smarter (e.g. control-flow-based narrowing) and add floating-point ranges. Of course, I also want to use range analysis to improve other existing rules (which may even be done in this PR, idk).

Please tell me what you think.


github-actions bot commented Jul 25, 2025

Lintcheck changes for 4a888fa

| Lint | Added | Removed | Changed |
|---|---|---|---|
| clippy::cast_possible_truncation | 0 | 0 | 1705 |
| clippy::cast_possible_wrap | 0 | 79 | 848 |
| clippy::cast_sign_loss | 0 | 26 | 1186 |

This comment will be updated if you push new changes
