Skip to content
This repository was archived by the owner on Oct 3, 2021. It is now read-only.
This repository was archived by the owner on Oct 3, 2021. It is now read-only.

More precise data model in task definitions? #1125

@PhilippWendler

Description

@PhilippWendler

Currently, the data_model key in a task definition is allowed to have the values ILP32 or LP64. This does not contain enough information to deduce the size of all possible C types.

For example, on 32-bit x86 Linux machines (a commonly known ILP32 platform), gcc uses a format with 80 bit precision to implement long doubles. This is somewhat hidden in the documentation, one can for example deduce it from the sentence "Notice that neither of these options enable any extra precision over the x87 standard of 80 bits for a long double." in the description of the option -m96bit-long-double in gcc's x86 docs, but one also confirm this by trying it out. clang seems to do the same.

However, on 32-bit ARM Linux machines, long double is defined as 64 bit size (source) although this is also an ILP32 platform.

The C standard does not say anything concrete about long double, only that it needs to be at least as large as double (which is fulfilled in both cases). Wikipedia has an overview over the precision of long double on many platforms and operating systems (it can vary from 64 to 128 bit).

This actually affects results and safety of programs, there is even an existing example in this repository: https://github.com/sosy-lab/sv-benchmarks/blob/master/c/floats-esbmc-regression/digits_for.c
This program contains a variable x that is declared as long double, and the task is defined to have data model ILP32. However, this task is only safe if long double has more than 64 bits (not sure exactly how many bits are needed, 80 bits are enough). This can even be easily shown without the need for an ARM machine by compiling on x86, once with gcc -mlong-double-64 (program will violate assertion and crash) and once with gcc -mlong-double-80 (program will not violate assertion).

So given only the data model ILP32 and assuming the program to be in GNU C (according to the SV-COMP rules), one cannot claim that this program is safe.

A solution would be to use a more informative string for the data model, e.g., something like x86_64-linux. Then it would be clear that long double has a precision of 80 bit.

Another solution would be to exactly define the size and precision of each C type in the repo's documentation for both the ILP32 and LP64 data models.

Note that if we do the latter, we indeed need to define precision and size of a data type because these can be different. On 64-bit x86 Linux gcc uses 80 bit precision but 128 bit size for long double, on 32-bit x86 Linux gcc uses 80 bit precision and 96 bit size for long double.

A last possible solution would be to explicitly leave the size of long double and similar data types unspecified and claim that the linked program is unsafe (because there exists an ILP32 platform where it violates the assertion). However, this would be extremely difficult for verifiers to implement (particularly but not only verifiers based on some compiler infrastructure). Basically this would be as difficult has having to verify a program for both ILP32 and LP64 at the same time, and we also do not require this.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions