-
Notifications
You must be signed in to change notification settings - Fork 170
More precise data model in task definitions? #1125
Description
Currently, the data_model
key in a task definition is allowed to have the values ILP32
or LP64
. This does not contain enough information to deduce the size of all possible C types.
For example, on 32-bit x86 Linux machines (a commonly known ILP32
platform), gcc
uses a format with 80 bit precision to implement long double
s. This is somewhat hidden in the documentation, one can for example deduce it from the sentence "Notice that neither of these options enable any extra precision over the x87 standard of 80 bits for a long double." in the description of the option -m96bit-long-double
in gcc's x86 docs, but one also confirm this by trying it out. clang
seems to do the same.
However, on 32-bit ARM Linux machines, long double
is defined as 64 bit size (source) although this is also an ILP32
platform.
The C standard does not say anything concrete about long double
, only that it needs to be at least as large as double
(which is fulfilled in both cases). Wikipedia has an overview over the precision of long double
on many platforms and operating systems (it can vary from 64 to 128 bit).
This actually affects results and safety of programs, there is even an existing example in this repository: https://github.com/sosy-lab/sv-benchmarks/blob/master/c/floats-esbmc-regression/digits_for.c
This program contains a variable x
that is declared as long double
, and the task is defined to have data model ILP32
. However, this task is only safe if long double
has more than 64 bits (not sure exactly how many bits are needed, 80 bits are enough). This can even be easily shown without the need for an ARM machine by compiling on x86, once with gcc -mlong-double-64
(program will violate assertion and crash) and once with gcc -mlong-double-80
(program will not violate assertion).
So given only the data model ILP32
and assuming the program to be in GNU C (according to the SV-COMP rules), one cannot claim that this program is safe.
A solution would be to use a more informative string for the data model, e.g., something like x86_64-linux
. Then it would be clear that long double
has a precision of 80 bit.
Another solution would be to exactly define the size and precision of each C type in the repo's documentation for both the ILP32
and LP64
data models.
Note that if we do the latter, we indeed need to define precision and size of a data type because these can be different. On 64-bit x86 Linux gcc
uses 80 bit precision but 128 bit size for long double
, on 32-bit x86 Linux gcc
uses 80 bit precision and 96 bit size for long double
.
A last possible solution would be to explicitly leave the size of long double
and similar data types unspecified and claim that the linked program is unsafe (because there exists an ILP32
platform where it violates the assertion). However, this would be extremely difficult for verifiers to implement (particularly but not only verifiers based on some compiler infrastructure). Basically this would be as difficult has having to verify a program for both ILP32
and LP64
at the same time, and we also do not require this.