The Taming of the Pointer – Part 2
In last Wednesday’s post I mentioned three ways in which pointers are troublesome in C/C++, and I introduced the null ArC keyword to mitigate one of them. Now I’ll turn to the second issue: the fact that given (say) a variable or parameter of type int*, the type does not allow us to determine whether it refers to a single int, or to an array of ints – nor, if it refers to an array, can we find how many elements the array contains.
There are two solutions to this. The best one is not to use arrays at all, except for string literals and static tables of constant data. If we are writing in C++, then we can use generic collection classes instead of arrays. I find the most useful classes are an Array class that mimics an array containing N elements of type T, and a Vector class that contains a variable number of up to N elements. [The Standard Template Library collection classes are more flexible, but rely on dynamic memory allocation, so are usually not appropriate in critical embedded systems.] If you’re new to C++, you might be interested to learn that arrays are rarely used by good C++ programmers (look up arrays are evil in a search engine if you want to know more).
I’ll say more about the Array and Vector classes in another post. If you’re stuck with C or with an “embedded C++” compiler that doesn’t support generics, or if you’re trying to implement the Array or Vector classes themselves, then you need a different solution.
ArC provides some features to tame array pointers. First, it requires them to be qualified with the keyword array. Here’s an example:
void copyError(const char * array msg, char * array dst, int dstSize) { ... }
The presence of the array keyword tells ArC that the msg and dst parameters point to an arrays rather than single values. If you leave it out, then ArC will not allow you to perform indexing or any other sort of pointer arithmetic on those parameters. When you compile the code, array becomes another macro with an empty expansion, so the compiler doesn’s notice it.
In this example, we’ve passed the size of the destination buffer in a separate int parameter, so that the code can limit how many characters it writes. However, in writing specifications, we often need to talk about the size of the array that a pointer points to even when we don’t have it available in a separate parameter. How do we do this? Well, ArC considers an array pointer to be a struct comprising three values: the pointer itself, the lower bound (i.e. index of the first element), and the limit (i.e. one plus the index of the last element). To refer to the lower bound or limit of dst we use the syntax dst.lwb or dst.lim respectively. We also allow dst.upb (for upper bound), which is defined as (dst.lim – 1). Of course, you can’t refer to these fields in code, but you can use them in specification constructs (e.g. preconditions, invariants, assertions) as much as you like. We call them ghost fields because they aren’t really there.
For example, let’s specify that when the copyError function is called, it assumes that dst points to an array with at least dstsize elements available. Here’s how we can specify that:
void copyError(const char * array msg, char * array dst, int dstSize) pre(dst.lim >= dstSize) { ... }
An array pointer in C/C++ may only address the first element of an array, or one-past-the-last element, or any element in between. If p addresses the first element of an array of N elements, then p.lwb == 0 and p.lim == N. If it addresses one-past-the-last element, then p.lwb = -N and p.lim = 0. So p has implicit invariant p.lwb <= 0 && p.lim >= 0.
Where does all of this get us? Well, within the body of copyError, ArC will attempt to prove that all accesses to msg and dst are in bounds. For example, the expression dst[i] has precondition dst.lwb <= i && i < dst.lim. Also, wherever copyError is called, ArC will attempt to prove that the precondition holds. So anywhere that buffer overflow is possible, there will be a corresponding a failed proof. If all the proofs succeed, and provided that no function with a precondition is ever called by external unproven code, we know that buffer overflow will not occur.
David,
You state that “An array pointer in C/C++ may only address the first element of an array, or one-past-the-last element, or any element in between.” Can you explain why ArC would allow it to point at “one-past-the-last element”?
Thanks
Dave B.
One of the decisions we had to make was whether to support pointer arithmetic in ArC. MISRA C 2004 includes a rule prohibiting pointer arithmetic. However, some of the SIL 2 client code we have looked at does occasionally use it. The typical case is to iterate over an array using code of the form "for (T* p = start; p < start + count; ++p) {…}". I'll say more about pointer arithmetic in another post. Suffice to say that formal verification of pointer arithmetic is not difficult, and we decided to support it. This meant that we also had to support the "one-past the-end" pointer, such as "start + count" in the above example.
Have you looked at Deputy? It does a mixture of run-time checking and static verification to ensure memory-safety properties (such as that array dereferences are in-bounds). Given your interests, it seems worth looking at.
Run-time checks are useful – especially during testing – but proving absence of run-time errors is better. What are you going to do if a run-time check fails in actual use? There are many systems for which shutting down the system is not a realistic option, e.g. in aerospace and automotive markets.
I share your reservations about run-time checks. In your case, what you might find most interesting about Deputy is the syntax, as well as the kinds of annotations, preconditions, and invariants that the system supports and how it supports it. They have some experience with building code using their system, so I think there are some interesting lessons to be learned about what kinds of invariants come up frequently and how to enable them to be specified in a way that is as convenient for programmers as possible. In addition, I’m under the impression that Deputy has been used in an embedded environment (TinyOS). Sorry that I didn’t explain this well in my original comment.