## Verifying absence of integer overflow

One class of errors we need to guard against when writing critical software is arithmetic overflow. Before I go into detail, I invite you to consider the following program and decide what it prints:

```c
#include <stdio.h>

int main(int argc, char *argv[])
{
    unsigned int x = 42;
    long y = -10;
    printf("%s\n", (x > y ? "Hello, normal world!" : "Hello, strange world!"));
    return 0;
}
```

I’ll return to this example later. For now, consider the function for averaging a set of readings stored in an array that I examined in my previous two posts (the ArC-specific parts are the **array** keyword and the **pre**, **returns**, **keep** and **decrease** annotations):

```c
int16_t average(const int16_t * array readings, size_t numReadings)
pre(readings.lwb == 0; readings.upb == numReadings)
pre(numReadings != 0)
returns((+ over readings.all)/numReadings)
{
    int sum = 0;
    size_t i;
    for (i = 0; i < numReadings; ++i)
    keep(i <= numReadings)
    keep(sum == + over readings.all.take(i))
    decrease(numReadings - i)
    {
        sum += readings[i];
    }
    return (int16_t)(sum/numReadings);
}
```

I reported in the earlier post that ArC is able to verify this, apart from proving absence of integer overflow. Here are the verification warnings that ArC reports:

```
c:\escher\arctest\arctest14.c (17,13): Warning! Exceeded time limit proving: Arithmetic result of operator '+' is within limit of type 'int', suggestion available (see C:\Escher\ArcTest\arctest14_unproven.htm#9), did not prove: minof(int) <= (sum$loopstart_5398,5$ + readings[i$loopstart_5398,5$ - readings.lwb]).

c:\escher\arctest\arctest14.c (17,13): Warning! Exceeded time limit proving: Arithmetic result of operator '+' is within limit of type 'int', suggestion available (see C:\Escher\ArcTest\arctest14_unproven.htm#10), did not prove: (sum$loopstart_5398,5$ + readings[i$loopstart_5398,5$ - readings.lwb]) <= maxof(int).

c:\escher\arctest\arctest14.c (19,22): Warning! Unable to prove: Type constraint satisfied in implicit conversion from 'int' to 'unsigned int', suggestion available (see C:\Escher\ArcTest\arctest14_unproven.htm#18), did not prove: minof(unsigned int) <= sum$5398,5$.
```

Any time we perform an explicit or implicit conversion to a type that is not wider than the original type, or to a type defined by a constrained typedef, or we perform certain arithmetic operations, we need to guard against integer overflow. This example contains several such operations.

The first implicit type conversion is when we initialize the loop counter *i* to the constant 0. We declared *i* to be of type **size_t**, which is an unsigned type, whereas the constant 0 has type (signed) **int** because we didn’t use the U suffix. So there is an implicit conversion from **int** to **size_t** here, and ArC generates verification conditions that the constant 0 is in the range of **size_t**. Of course, the proof is trivial.

The other implicit type conversion is in the subexpression *sum/numReadings* in the **return** statement. Since *numReadings* has type **size_t** and *sum* has type **int**, the “usual arithmetic conversions” are applied. In practice, **size_t** is never smaller than **int**, so there is an implicit conversion of *sum* to **size_t**. ArC therefore generates the verification condition that the current value of *sum* is representable in **size_t**. As there is nothing to prevent *sum* from being negative, the proof fails – hence the third warning message listed above.
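To see what goes wrong when *sum* is negative, here is a small standalone illustration (the values are arbitrary, and the first result shown assumes a 64-bit **size_t**):

```c
#include <stdio.h>

int main(void)
{
    int sum = -6;
    size_t n = 3;

    /* The usual arithmetic conversions turn sum into a size_t, so the
       division operates on a huge unsigned value rather than on -6. */
    printf("%zu\n", sum / n);        /* e.g. 6148914691236517203, not -2 */
    printf("%d\n",  sum / (int)n);   /* -2, as intended                  */
    return 0;
}
```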

In fact, ArC – like most ordinary static analysers and some compilers – warns about the signed/unsigned mismatch even before starting the proofs. Avoiding signed/unsigned mismatches in the first place is probably a more efficient strategy than dealing with the verification failures that often occur when you don’t. Had I followed my own advice to avoid using unsigned types, the signed/unsigned mismatch would not have arisen.

To fix the error, we can either change the type of the parameter *numReadings* to **int**, or cast it to type **int** before using it as the divisor. If we do the latter, we’ll also need to add a function precondition that *numReadings* is not too large to fit in an **int**.

For the integer operators +, - and *, it can be the case that applying the “usual arithmetic conversions” to an operator expression involves no unsafe implicit type conversion, yet the arithmetic result is not representable in the result type. The C standard leaves the result undefined in this case if the promoted operands are signed, and defines it as the arithmetic result modulo some *N* if they are unsigned. ArC treats both cases as errors, because unsigned types are frequently used where modulo arithmetic is not the user’s intention; so it always generates verification conditions that the result is representable in the result type.
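As a reminder of the difference, here is a minimal illustration of what the standard says in each case (the values are arbitrary):

```c
#include <limits.h>

void overflow_demo(void)
{
    unsigned int u = UINT_MAX;
    int s = INT_MAX;

    u = u + 1u;      /* defined by the standard: wraps modulo 2^N, so u becomes 0 */

    /* s = s + 1; */ /* would be undefined behaviour: INT_MAX + 1 is not
                        representable in int, which is why ArC demands a proof
                        that such an addition cannot overflow */
    (void)u;
    (void)s;
}
```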

This is the reason for the first two verification failures that ArC reports for our original example: the expression `sum += readings[i]` could easily overflow, because the sum could be as high as *maxof(int16_t) × maxof(size_t)* or as low as *minof(int16_t) × maxof(size_t)*, both of which lie outside the range of **int**. To avoid overflow, we need to constrain either the minimum and maximum values of each reading, or the number of readings. Here’s our example with the precondition changed to limit the number of readings:

```c
int16_t average(const int16_t * array readings, size_t numReadings)
pre(readings.lwb == 0; readings.upb == numReadings)
pre(numReadings != 0; numReadings <= maxof(int)/(maxof(int16_t) + 1))
returns((+ over readings.all)/numReadings)
{
    int sum = 0;
    size_t i;
    for (i = 0; i < numReadings; ++i)
    keep(i <= numReadings)
    keep(sum == + over readings.all.take(i))
    decrease(numReadings - i)
    {
        sum += readings[i];
    }
    return (int16_t)(sum/(int)numReadings);
}
```

I’ve referred to the maximum value supported by type **int** as *maxof(int)*. ArC provides the type operators *maxof(T)* and *minof(T)* for your convenience, to save you from having to #include <limits.h> or similar just for specification purposes. I’ve also cast *numReadings* to **int** prior to using it in the division.

With the compiler parameters set to assume **int** is 32 bits and **short int** is 16 bits, ArC verifies this example completely. You may be wondering why the expression `(+ over readings.all)/numReadings` in the **returns** specification doesn’t also generate a signed/unsigned warning or verification failure. That’s because integer arithmetic in specifications is always carried out after promoting operands to type **integer**, the ArC integral type that has no bounds.

Finally, let’s return to the short program I invited you to consider at the start of this post. Is the world normal or strange? On my computer, compiling with Microsoft Visual C++ 2008, it’s strange. That’s because for this compiler, both (unsigned) **int** and **long** are 32 bits, so both operands of the comparison get converted to **unsigned long** – and -10 won’t fit, so it wraps round to a very large positive value. For the world to be normal, you’ll need a compiler for which **long** is wider than **int**, so that both operands get converted to (signed) **long** instead.
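If changing compilers isn’t an option, one way (among several) to avoid the surprise is to perform the comparison in a signed type wide enough for both operands; a sketch, assuming **long long** can represent every value involved:

```c
#include <stdio.h>

int main(void)
{
    unsigned int x = 42;
    long y = -10;

    /* Converting both operands to long long keeps the comparison signed,
       so the result no longer depends on whether long is wider than int. */
    printf("%s\n", ((long long)x > (long long)y ? "Hello, normal world!"
                                                : "Hello, strange world!"));
    return 0;
}
```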

David,

With regard to your tease about C’s integer promotion rules, does ArC understand these rules, and is it able to show when expectations are broken, just as your example illustrates? If so, is this for one particular integer-size setup, or can ArC help with producing fully portable code? Or is that an unrealistic dream for a C program!

Best Regards

Dave Banham

Dave,

Producing portable code is unfortunately very difficult in C. In developing the specification for C semantics in ArC, I found that there were more sources of non-portability than I had previously realised. Defining types such as ‘int16_t’ and ‘int32_t’ as MISRA suggests solves the problem of knowing that variables are big enough to hold the data you want to store, but unfortunately the semantics of integer operations on these types depends on whether they map to ‘short int’ and ‘int’, or to ‘int’ and ‘long int’, or to something else. This is compounded by the fact that the type of a constant such as 32768 depends on how many bits are in an ‘int’. So ArC requires you to configure the parameters of the compiler you are targeting, including the sizes of each integral type, so that it can determine the semantics on your target platform and warn you about broken expectations. If you intend to compile the same code for two platforms having different integral sizes, you need to run ArC verification separately for each of them.
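For example (a sketch, assuming int16_t maps to ‘short int’ on both targets), the same line of code can be exact on one platform and undefined on another:

```c
#include <stdint.h>

int32_t sum_pair(int16_t a, int16_t b)
{
    /* Where int is 32 bits, a and b are promoted to a 32-bit int, the
       addition cannot overflow, and the result is exact.
       Where int is 16 bits, a and b are promoted to a 16-bit int, and
       e.g. 30000 + 30000 overflows: undefined behaviour. */
    return a + b;
}
```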

If you want to produce maximally portable C code, you could take the approach we use in our C++ code generator in Perfect Developer:

1. Decide how many bits you need to represent any integral value you ever want to store and any intermediate result of a computation you will ever produce. Then pick a signed integral type on your target platform that is at least this big. For most embedded applications, I think a signed type with 32 bits should suffice.

2. Use a typedef to give this type a convenient name. In your program, declare all integral variables, parameters etc. to be of this type. Don’t use any other integral types. If you find yourself doing a sizeof() or otherwise getting handed a size_t or any other integral type, cast it immediately to your standard type before using it.

3. Declare a macro that you will use to surround every integer constant in your program in order to make sure it will have your chosen integral type. So instead of writing e.g. the constant 1 in your program, you will write e.g. INT(1), where INT(x) expands to x if your chosen type is equivalent to ‘int’, and xL if it is equivalent to ‘long int’.

If you do this, I think your program will be portable in respect of integer semantics, subject to defining your typedef and the INT() macro appropriately for each target platform.
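By way of illustration, here is a minimal sketch of what steps 2 and 3 might look like for a 32-bit choice; the names ‘myint_t’ and ‘INT’ are only examples, not part of ArC or Perfect Developer:

```c
#include <limits.h>

/* Step 2: one typedef used for every integral variable and parameter. */
#if INT_MAX >= 2147483647L
typedef int myint_t;           /* chosen type is 'int'                */
#define INT(x) (x)             /* Step 3: constants keep type int     */
#else
typedef long myint_t;          /* chosen type is 'long int'           */
#define INT(x) (x##L)          /* Step 3: constants get the L suffix  */
#endif

static myint_t buffer_words(myint_t bytes)
{
    /* Constants are wrapped in INT(), and the size_t produced by sizeof
       is cast to the standard type before use. */
    return (bytes + INT(3)) / (myint_t)sizeof(myint_t);
}
```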