Expressing the Inexpressible

May 24, 2010

When writing preconditions, postconditions and other specifications for C programs, sometimes we need to write expressions that can’t be expressed in plain C. That’s why formal verification systems based on annotated programming languages almost always augment the expression sublanguage with forall and exists expressions. In previous posts, I’ve introduced ArC’s implementations of these. For example, the following expression yields true if all elements of the array arr are between 0 and 100 inclusive:

forall ind in arr.indices :- arr[ind] >= 0 && arr[ind] <= 100

Here, ind is declared as a bound variable that ranges over the values in the expression that follows the keyword in, which in this case is all the indices into arr.

Similarly, this expression:

exists ind in 0..9 :- arr[ind] >= 0 && arr[ind] <= 100

yields true if at least one of the first ten elements is in the range 0 to 100. I’ve used one of ArC’s special operators here: the range operator “..”, which yields a sequence of values from its first operand up to its second operand. In fact, in the first example, arr.indices is defined as arr.lwb .. arr.upb, so I was implicitly using the range operator there too.

Providing forall and exists to quantify over finite sets and sequences lets us express many sorts of specification, but isn’t always enough. Very occasionally, we need to use forall or exists to quantify over a potentially infinite type, which ArC also allows. For example, the following expression yields true if p(x) is true for all values of x belonging to type T:

forall T x :- p(x)

However, there are many cases in which a specification still cannot be expressed. For example, consider the following function for averaging a set of readings stored in an array:

int16_t average(const int16_t * array readings, size_t numReadings)
pre(readings.lwb == 0; readings.lim == numReadings)
pre(numReadings != 0)
post(result == ? /* sum of elements of readings */ /numReadings) 
int sum = 0;
size_t i;
for (i = 0; i < numReadings; ++i)
keep(i <= numReadings)
keep(sum == ? /* sum of first i elements of readings */ )
decrease(numReadings - i)
sum += readings[i];
return (int16_t)(sum/numReadings);

I’ve included some ArC annotations (highlighted in green text) in this example, to specify that the valid indices of readings are 0..(numReadings – 1), and that numReadings isn’t zero so that the final division operation will be valid. I’ve also provided a loop invariant and loop variant (see my earlier posts on verifying loops if you aren’t familiar with these). However, in the postcondition I’ve put a question mark where I need to express “sum of all elements of readings“, and in the loop invariant I’ve put another question-mark where I want “sum of the first i elements of readings“. How can we express these quantities?

In this case, there is an easy way, and another way that is less easy but more general. Let’s start with the easy way. ArC is derived from Perfect Developer, and when writing ArC specifications you can use most of the library types and expression types provided by PD. In particular, type seq of T from PD is treated as equivalent to T[] in C. So we can use Perfect sequence operations on C arrays. A list of member functions of seq of T can be found in the Perfect Developer Library Reference. In PD, most of these functions are available for use anywhere, since code can be generated for them; but when used in ArC, they are of course all “ghost” functions – that is, functions that can be used in specifications only.

The particular Perfect expression type we need here is the over expression, which expresses repeated application of an operator over the elements of a collection. Those with a background in functional programming may recognise it as left-fold. Our postcondition can be written:

post(result == (+ over readings.all)/numReadings)

We’ve had to use readings.all rather than just readings as the operand of +over, because readings is an array pointer, and we need to provide a genuine array to +over. Hence ArC provides the array pointer type with ghost member all. You can think of readings.all as yielding the sequence readings[readings.lwb], readings[readings.lwb + 1] and so on up to readings[readings.upb]. We’d have liked to use *readings to mean the array that readings points to, but of course in C that just yields the value of the first element.

For the loop invariant expression, we can use +over again, but we need to apply it to the first i elements of readings rather than all elements. The seq of T class provides member take for this purpose, allowing us to use:

keep(sum == + over readings.all.take(i))

That’s enough to verify our function, apart from dealing with a potential integer overflow when summing the elements, which I’ll return to in a later post. Next time I’ll demonstrate how we can use a ghost function to define the notion of summation without recourse to  over-expressions.

  1. Yannick Moy
    May 29, 2010 at 09:28

    Hi David,

    I have a question on “over”. You do not specify an initial value for the fold operation, does that mean you always use a default zero value for operations on integers? So that (* over arr) always yields value zero?

    I notice with great interest how you use ideas from Ada to verify C, and I think you could be interested by the project Hi-Lite we have just launched, where we aim at verification of mixed C and Ada code.

    Minor remark: you seem to use “ghost” in an unusual way to designate logical expressions. I’ve seen it rather used to designate parts of the code that are not executed but follow the same syntax as C code, like “ghost” assignments to “ghost” variables.

    • May 29, 2010 at 16:44

      Hi Yannick,

      We imported ‘over’ and its semantics directly from Perfect Developer, which defines the initial value of the fold operation as the left-identity value in the declaration of the operator that is being used with ‘over’. If no left identity value was declared, then the ‘over’ expression has a precondition that the operand collection is not empty. The built-in operators all have appropriate left-identity declarations, so (* over arr) yields the expected value, for example 1 if arr is empty.

      I’m not sure who first used the term “ghost” in the context of specifications – it may have been us, since we’ve been using the term “ghost function” in our documentation to mean a function used for specification purposes only since 1999.

  2. Yannick Moy
    May 31, 2010 at 05:05

    Right, the use of “ghost” I mention stems mostly from JML, and they probably introduced it after you did. Yu should have trademarked it. 🙂

    And thanks for the details over “over”. Over.

  1. No trackbacks yet.
Comments are closed.
%d bloggers like this: