Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR is a proof of concept for using integer range analysis to improve existing rules. I focused on the lints
cast_possible_truncation
,cast_possible_wrap
, andcast_sign_loss
for now, but range analysis is potentially useful in other existing rules as well.The range analysis itself is rather simplistic right now. The possible values of an expression are represented using a closed interval with the
IInterval
type. This representation is applicable to all integer expression and variables, even those we know nothing about except for the type. E.g. a function parameterx: u8
will have the range0..=255
, and the expressionx / 3 + 1
will have the range1..=86
.The interval arithmetic necessary to implement operations like addition, subtractions, and many std integer methods is currently implemented by the
Arithmetic
type. I already implemented most methods that return integers orOption<int_type>
.Example
While only a limited proof of concept, it is already enough to fix false positives in
cast_possible_truncation
such as #7486. In the following, I annotated the computed interval ranges for the code of the issue:As we can see, the range analysis is capable enough to determine that the
result
variable can only hold values in the range0..=u32::MAX-1
despite being of typeu64
. This is enough to eliminate the false positive.Here's a slightly more complex example where I annotated all expressions:
A few details
I want to mention a few details to hopefully make what I implemented more understandable for potential reviewers.
IInterval
doesn't just represent an integer interval, but a typed interval. I omitted types in the above annotations, but allIInterval
s know the underlying integer type.IInterval
s can be empty. This is important for operations that can panic. E.g.1_u32.strict_sub(2_u32)
will always panic, so it will be assigned the empty range (u32).All interval operations guarantee that they return a superset of the actual result. E.g. for
1 + 1
, the actual result interval is1..=1
, but the add implementation is allowed to return any superset of it, e.g.0..=255
. This is both necessary and useful.It's necessary because some operations don't output clean intervals. E.g.$\set{0,255}$ is
u8::wrapping_add(value, 1)
wherevalue
is254
or255
will result in the outputs255
or0
respectively. However, the smallest interval that contains the set0..=255
— a superset.It's useful, because it makes operations easier to implement. Some operations are very complex to implement, so allowing any superset makes it possible to start with a very simple implementation that returns a large superset and then later improve it.
Arithmetic
has both methods and static functions. This is because the dataArithmetic
carries represents compiler flags/options. If an operation depends on any flags/options, it's a method, and a static function otherwise. E.g.abs()
may panic or wrap depending on whether overflow checks are enabled, so it's a method, whilestrict_abs()
always panics, so it's a static function.IntervalCtxt
is the type that implements recursively evaluating expression intervals (inspired byConstEvalCtxt
).I want to stress that this is a proof of concept. This PR is in no state to be merged right now. Before I invest more time into this, I would like to ask (1) whether there is interest in clippy doing this type of range analysis in the first place and (2) whether there are people that could help me with the clippy side of things?
The main thing I need help with is testing (I think). Specifically:
I used property tests and snapshot tests to ensure the correctness of
Arithmetic
andIInterval
, but I'm not sure how to best integrate them. I say "used", because I initially worked onIInterval
andArithmetic
to be their own crate, but then I just copy-pasted the code for the sake of this PR. However, I didn't copy the tests.I don't care whether this code ends up inside clippy or as its own crate, I just want well-tested working code to make clippy smarter. So I would like to ask (1) whether
rinterval
should be its own crate (probably not), and if not (2) what's the best way to add the snapshot tests I had?I want to test the recursive expression evaluation too. I imagine snapshot tests similar to
uitest
, where I can just define a few source files and it will output a snapshot whether the ranges of all expressions are annotated, similar to what I did for the above examples. Is there any existing infrastructure I could use?Of course, other feedback and suggestions are also appreciated.
Before this PR is ready, some work still needs to be done:
1u64 as isize
->u64::MAX as isize
.usize
. I currently userustc_lint::context::LateContext::data_layout().ptr_sized_integer()
to mapusize
to a fixed-width integer type. Same forisize
. This works, but also means that the interval analysis can't be used to determine the intervals for 32-bit AND 64-bit system, only one is possible. This can potentially lead to false positives and false negatives in lints.I also want to point out that I see what I did in this PR as an MVP. In the future, I would like to make the range analysis even smarter (e.g. control-flow-based narrowing) and add floating-point ranges. Of course, I also want to use range analysis to improve other existing rules (which may even be done in this PR, idk).
Please tell me what you think.