Home > C and C++ in critical systems, Formal verification of C programs > Reasoning about null-terminated strings in C/C++

Reasoning about null-terminated strings in C/C++

February 23, 2010

In my last post I described how ArC supports reasoning about array access, by allowing you to refer to the bounds of an array in a specification. If the code itself needs to know the size of an array, then the size is provided by other means, for example by passing it as an extra parameter. However, when using arrays of characters, standard practice in C is not to pass the number of elements, but to use a null character to indicate the end. Indeed, C and C++ string literals are compiled to static arrays of characters with a terminating null; and many standard C library functions assume that their string arguments are null-terminated.

There are dangers here. Consider the following code:

#include "string.h"
 void foo(const char* msg)
{ char buf[20];
strncpy(buf, msg, 25);
...
...strlen(buf)...
}

The programmer thinks he has protected against overflowing buf by using strncpy instead of strcpy. However, a buffer size of 25 has been passed to strncpy, when the buffer is only 20 characters long. Maybe the buffer used to be 25 characters but was subsequently changed to 20. Using “magic numbers” like 20 and 25 in code is bad practice. We should instead declare const int fooBuflen = 20; and then use fooBuflen wherever we need the length of the buffer, thereby avoiding this kind of error.

Even when we do correct this error, there is another problem. strncpy doesn’t guarantee to include a null terminator when writing to the destination. So the subsequent strlen call can end up reading well beyond the end of buf, resulting in undefined behaviour.

Here’s the same example annotated for ArC:

#include "arc.h"
#include "arc_string.h"
#include "string.h"
void foo(const char* array msg)
pre(isNullTerminated(msg))
{ char buf[20];
strncpy(buf, msg, 25);
...
...strlen(buf)...
}

As well as specifying that msg is an array, I’ve included an extra header file arc_string.h and added precondition isNullTerminated(msg). Here’s part of file arc_string.h:

ghost bool isNullTerminated(const char* array src)
post(result == (exists i in 0..src.upb :- src[i] == (char)0));

We’ve provided the function isNullTerminated, so that in specifications we can tell whether a char* array pointer addresses a null-terminated string. It’s declared ghost to indicate that it has no code, so it can only be called from specifications. It’s postcondition states that it returns true if and only if there is some index in the range 0 to the upper bound of src such that the corresponding element of src is a null character. That’s what the exists expression means (the “:-” symbol can be read as “such that”).

Here’s another extract from arc_string.h:

spec size_t strlen(const char* array src)
pre(isNullTerminated(src))
post(result < src.lim)
post(src[result] == (char)0)
post(forall i in 0..(result - 1) :- src[i] != (char)0);

spec char* array strncpy(char* array dst, const char* array src, size_t len)
pre(isNullTerminated(src))
pre(len <= dst.lim)
post(forall i in 0..(len - 1)
:- dst[i] == (i <= strlen(src) ? src[i] : old dst[i]))
post(result == dst);

These provide ArC with specifications of the standard library functions strlen and strncpy. The spec keyword tells ArC that this is the specification for a function declared elsewhere in un-annotated C/C++, so that it doesn’t complain about conflicting declarations. We’ve specified in preconditions that both functions require null-terminated source parameters, and that the destination parameter of strncpy must be an array of at least len bytes.  We’ve also provided postconditions to described the effect of calling these functions, so that ArC can reason about the program state after they are called. The forall expression is similar to the exists expression described earlier, except that it returns true if the condition after the “:-” symbol is true for all the values in the specified range.

When we run ArC over the annotated example, it reports two verification problems as expected:

c:\escher\arctest\arctest6.c (15,5): Warning! Unable to prove: Precondition of ‘strncpy’ satisfied (defined at c:\escher\arctest\arc_string.h (16,14)) (see C:\Escher\ArcTest\arctest6_unproven.htm#17), cannot prove: 25 <= char* array {buf$uninitialised$}.lim().

c:\escher\arctest\arctest6.c (16,18): Warning! Unable to prove: Precondition of ‘strlen’ satisfied (defined at c:\escher\arctest\arc_string.h (8,10)) (see C:\Escher\ArcTest\arctest6_unproven.htm#18), cannot prove: isNullTerminated(char const* array {buf$15,5$}).

If the expressions in the warning messages look strange, that’s because they currently come out in a mixture of C and Perfect syntax. We’ll need to tidy these up before we release ArC.

%d bloggers like this: