## Expressing the Inexpressible

When writing preconditions, postconditions and other specifications for C programs, sometimes we need to write expressions that can’t be expressed in plain C. That’s why formal verification systems based on annotated programming languages almost always augment the expression sublanguage with **forall **and **exists **expressions. In previous posts, I’ve introduced ArC’s implementations of these. For example, the following expression yields **true **if all elements of the array *arr *are between 0 and 100 inclusive:

forallindinarr.indices :- arr[ind] >= 0 && arr[ind] <= 100

Here, *ind *is declared as a bound variable that ranges over the values in the expression that follows the keyword **in**, which in this case is all the indices into *arr*.

Similarly, this expression:

existsindin0..9 :- arr[ind] >= 0 && arr[ind] <= 100

yields **true **if at least one of the first ten elements is in the range 0 to 100. I’ve used one of ArC’s special operators here: the range operator “..”, which yields a sequence of values from its first operand up to its second operand. In fact, in the first example,* arr.indices* is defined as *arr.lwb .. arr.upb*, so I was implicitly using the range operator there too.

Providing **forall **and **exists **to quantify over finite sets and sequences lets us express many sorts of specification, but isn’t always enough. Very occasionally, we need to use **forall **or **exists **to quantify over a potentially infinite type, which ArC also allows. For example, the following expression yields **true **if *p(x)* is true for all values of *x* belonging to type *T*:

forallT x :- p(x)

However, there are many cases in which a specification still cannot be expressed. For example, consider the following function for averaging a set of readings stored in an array:

int16_t average(constint16_t *arrayreadings, size_t numReadings)pre(readings.lwb == 0; readings.lim == numReadings)pre(numReadings != 0)post(result== ?/* sum of elements of readings */`/numReadings`

`) {`

intsum = 0; size_t i;for(i = 0; i < numReadings; ++i)keep(i <= numReadings)keep(sum == ?/* sum of first i elements of readings */)decrease(numReadings - i) { sum += readings[i]; }return(int16_t)(sum/numReadings); }

I’ve included some ArC annotations (highlighted in green text) in this example, to specify that the valid indices of *readings *are* 0..(numReadings – 1)*, and that *numReadings *isn’t zero so that the final division operation will be valid. I’ve also provided a loop invariant and loop variant (see my earlier posts on verifying loops if you aren’t familiar with these). However, in the postcondition I’ve put a question mark where I need to express “sum of all elements of *readings*“, and in the loop invariant I’ve put another question-mark where I want “sum of the first *i* elements of *readings*“. How can we express these quantities?

In this case, there is an easy way, and another way that is less easy but more general. Let’s start with the easy way. ArC is derived from Perfect Developer, and when writing ArC specifications you can use most of the library types and expression types provided by PD. In particular, type *seq of T* from PD is treated as equivalent to *T[]* in C. So we can use *Perfect *sequence operations on C arrays. A list of member functions of *seq of T* can be found in the Perfect Developer Library Reference. In PD, most of these functions are available for use anywhere, since code can be generated for them; but when used in ArC, they are of course all “ghost” functions – that is, functions that can be used in specifications only.

The particular *Perfect *expression type we need here is the **over **expression, which expresses repeated application of an operator over the elements of a collection. Those with a background in functional programming may recognise it as *left-fold*. Our postcondition can be written:

**post**(**result **== (+ **over **readings.all)/numReadings)

We’ve had to use *readings.all* rather than just *readings *as the operand of +**over**, because *readings *is an array pointer, and we need to provide a genuine array to +**over**. Hence ArC provides the array pointer type with ghost member *all*. You can think of* readings.all* as yielding the sequence *readings[readings.lwb]*, *readings[readings.lwb + 1]* and so on up to *readings[readings.upb]*. We’d have liked to use **readings *to mean the array that *readings *points to, but of course in C that just yields the value of the first element.

For the loop invariant expression, we can use +**over **again, but we need to apply it to the first *i* elements of *readings *rather than all elements. The *seq of T* class provides member *take *for this purpose, allowing us to use:

**keep**(sum == + **over **readings.all.take(i))

That’s enough to verify our function, apart from dealing with a potential integer overflow when summing the elements, which I’ll return to in a later post. Next time I’ll demonstrate how we can use a *ghost function* to define the notion of summation without recourse to **over**-expressions.

Hi David,

I have a question on “over”. You do not specify an initial value for the fold operation, does that mean you always use a default zero value for operations on integers? So that (* over arr) always yields value zero?

I notice with great interest how you use ideas from Ada to verify C, and I think you could be interested by the project Hi-Lite we have just launched, where we aim at verification of mixed C and Ada code.

Minor remark: you seem to use “ghost” in an unusual way to designate logical expressions. I’ve seen it rather used to designate parts of the code that are not executed but follow the same syntax as C code, like “ghost” assignments to “ghost” variables.

Hi Yannick,

We imported ‘over’ and its semantics directly from Perfect Developer, which defines the initial value of the fold operation as the left-identity value in the declaration of the operator that is being used with ‘over’. If no left identity value was declared, then the ‘over’ expression has a precondition that the operand collection is not empty. The built-in operators all have appropriate left-identity declarations, so (* over arr) yields the expected value, for example 1 if arr is empty.

I’m not sure who first used the term “ghost” in the context of specifications – it may have been us, since we’ve been using the term “ghost function” in our documentation to mean a function used for specification purposes only since 1999.

Right, the use of “ghost” I mention stems mostly from JML, and they probably introduced it after you did. Yu should have trademarked it. 🙂

And thanks for the details over “over”. Over.