Aliasing and how to control it

June 22, 2010

Today I’ll start by writing a simple function that determines the maximum and minimum of two integers. We want to return two values, and C doesn’t make that easy unless we declare a struct to hold them. So I’ll pass two pointers to where I want the results stored instead. Here goes:

#include "arc.h"
 void minMax(int a, int b, out int *min, out int *max)
post(*min <= a; *min <= b; *min == a || *min == b) post(*max >= a; *max >= b; *max == a || *max == b)
{ *min = a < b ? a : b;
  *max = a > b ? a : b;
}

I’ve highlighted the ArC annotations in green. Notice how, in the postcondition, I’ve defined the minimum as a value that is less than or equal to both inputs and equal to one of them. Similarly for the maximum. Also notice that the two parameters used to pass the results back are flagged with the out keyword. This tells ArC (and users of this function) that the initial values of *min and *max are not used by the function, and the obligation to initialize them rests with the function, instead of with the caller of the function.

Does this function work? Always? Let’s see what ArC makes of it:

c:\escher\arctest\arctest20.c (9,25): Warning! Unable to prove: Postcondition satisfied when function returns (defined at c:\escher\arctest\arctest20.c (5,27)) (see C:\Escher\ArcTest\arctest20_unproven.htm#2), did not prove: *min <= b.
c:\escher\arctest\arctest20.c (9,25): Warning! Unable to prove: Postcondition satisfied when function returns (defined at c:\escher\arctest\arctest20.c (5,16)) (see C:\Escher\ArcTest\arctest20_unproven.htm#1), did not prove: *min <= a.

c:\escher\arctest\arctest20.c (9,25): Information! Confirmed: Postcondition satisfied when function returns (defined at c:\escher\arctest\arctest20.c (5,43)) (see C:\Escher\ArcTest\arctest20_proof.htm#3), proved: (a == *min) || (b == *min).
c:\escher\arctest\arctest20.c (9,25): Information! Confirmed: Postcondition satisfied when function returns (defined at c:\escher\arctest\arctest20.c (6,16)) (see C:\Escher\ArcTest\arctest20_proof.htm#4), proved: a <= *max.
c:\escher\arctest\arctest20.c (9,25): Information! Confirmed: Postcondition satisfied when function returns (defined at c:\escher\arctest\arctest20.c (6,27)) (see C:\Escher\ArcTest\arctest20_proof.htm#5), proved: b <= *max.
c:\escher\arctest\arctest20.c (9,25): Information! Confirmed: Postcondition satisfied when function returns (defined at c:\escher\arctest\arctest20.c (6,43)) (see C:\Escher\ArcTest\arctest20_proof.htm#6), proved: (a == *max) || (b == *max).

So ArC proves that the maximum is returned correctly, but not the minimum. What’s going on here? If we follow the link to the “unproven” file, we find a clue:

Could not prove: !(min == max)

Sure enough, it’s clear that if we make the following call:

int temp;
minMax(42, 24, &temp, &temp);

then minMax fails to meet its specification, because after storing the minimum in temp, it overwrites that value with the maximum.

The fix is to add a precondition to make that sort of call illegal:

void minMax(int a, int b, out int *min, out int *max)
pre(min != max) post(*min <= a; *min <= b; *min == a || *min == b) post(*max >= a; *max >= b; *max == a || *max == b)
{ *min = a < b ? a : b;
  *max = a > b ? a : b;
}

ArC is then able to verify the function.

This is a simple example of classic pointer aliasing – the situation in which two pointers refer to the same memory location. Any assignments we make through one of the pointers affects the value we read through the other. Another form of aliasing is pointer-variable aliasing. In this variant, a function is manipulating the value referred to by a pointer, and also a variable – typically a static variable declared outside the function. If the pointer points to the variable, then assignments through the pointer affect the value read from the variable, and vice versa.

ArC adopts a strict aliasing model that disallows casts between different pointer types. This means that in an ArC-compatible C program, pointer-pointer aliasing can only occur if the pointers have the same type, or if one of the types pointed to contains a field or element of the other. Similarly, aliasing between a pointer and a variable can only happen if the type pointed to is either the same as the type of the variable, or the same as the type a field or element contained in the variable. This serves to reduce the potential for aliasing.

Nevertheless, there are still many situations in which aliasing is possible and will break the program if it occurs. For example, consider the following function for copying elements from one array to another:

#include "arc.h" void arrayCopy(const int* array src, int* array dst, size_t num)
pre(src.lim >= num; dst.lim >= num) post(forall i in 0..(num - 1) :- dst[i] == src[i])
{ size_t j;
  int i;
  for (j = 0; j < num; ++j)
  keep(j <= num) keep(forall i in 0..(j - 1) :- dst[i] == src[i]) decrease(size - i)
  { dst[j] = src[j];
  }
}

The precondition ensure that we don’t get out-of-bounds array accesses, and the postcondition describes what we want to achieve. For an explanation of the loop invariants (keep clauses) and loop variants (decrease clauses), see Verifying loops in C and C++, Verifying loops – part 2 and Verifying loops: proving termination.

Does this function meet its specification? Not if the elements of dst we are writing overlap with the elements of src that we are reading! So we need an anti-aliasing precondition again. Here’s one possibility:

void arrayCopy(const int* array src, int* array dst, size_t num)
pre(src.lim >= num; dst.lim >= num) pre(forall i in src.indices; j in dst.indices; &src[i] != &dst[j]) post(forall i in 0..(num - 1) :- dst[i] == src[i])
{ ...
}

Stating that arrays don’t overlap in this way is rather cumbersome. But it can get a lot worse! Suppose we have, for example, an int* and a pointer to an array of structs, where each struct has several int fields, some int[] fields, and some struct fields that may themselves contains ints. If we want to write a precondition stating that the int* doesn’t alias any of the ints contained in the array of structs, we would need one precondition term for each possible way in which they could be aliased. If we add another int* parameter that is also forbidden to alias the other pointers, we would need to do much the same all over again.

To make it easier to express absence of aliasing, ArC provides the disjoint-expression. Here’s our example modified to use it:

void arrayCopy(const int* array src, int* array dst, size_t num)
pre(src.lim >= num; dst.lim >= num) pre(disjoint(src.all, dst.all)) post(forall i in 0..(num - 1) :- dst[i] == src[i])
{ ...
}

The disjoint-expression takes two or more operands and yields true if and only if no pair of operands refer to overlapping storage. Internally, ArC determines all the ways in which pairs of operands could be aliased, and expands the expression into a conjunction of equivalent predicates for the prover to use.

Although ArC can reason about C programs even in the presence of aliasing, it’s best to write programs that avoid the possibility of aliasing as far as possible. To help you, ArC extends the strict aliasing model when you declare a new type using a constrained typedef. ArC doesn’t allow you to cast a pointer to a constrained type to or from a pointer to any other type. For example, suppose I declare the following:

typedef int invariant(value >= 0) count_t;
typedef int invariant(value >= -273) temperature_t;

void foo(count_t * a, temperature_t *b) { ... }

Within foo, ArC knows that *a and *b cannot be aliases for the same variable, because the pointer types are not compatible. If you should want to declare a new sort of integral type without actually constraining it, you can use invariant(true).

  1. Jimmy
    October 30, 2010 at 08:12

    Great post!

    I’m just wondering if ArC can infer automatically separation information. I see it allows users to use “disjoint” but can ArC add “disjoint’s” by itself?

    Of course, an easy case is where two arrays are completely separated from each other and can be inferred by an alias analysis. Harder cases are when a loop touches one part of the array and then another part which turn out that they are separated. Here, we would need more sophisticated methods …

    What does ArC do?
    Moreover, what do you think it’s reasonable to infer automatically and what not?

    Thanks!

    • October 30, 2010 at 15:35

      Hi Jimmy, ArC (now eCv) allows for all possible aliasing that may be present, subject to a couple of assumptions. So the case you mention is handled automatically by eCv. If a loop touches a[i] and a[j], then the prover will assume that i might equal j until it proves otherwise. However, eCv verifies each function individually, so it makes no assumptions about the environment in which a function is called other than the precondition. That is why you need to use disjoint-expressions in preconditions. You can use them in other specification contexts like loop invariants and assertions if you want, but you shouldn’t often need to.

  2. Artyom Shalkhakov
    March 1, 2011 at 04:54

    Hello,

    An interesting post, thanks.

    I wonder if eCv can handle some intricate cases of pointer aliasing.

    For example, suppose I have the following (an implementation of the half-edge data structure):

    typedef struct hedge_s hedge_t;
    struct hedge_s {
    hedge_t *next;
    hedge_t *opposite;
    int vertex;
    int left_face;
    };
    typedef struct mesh_s {
    int num_hedges;
    hedge_t *hedges;
    } mesh_t;

    and the following invariants should hold for a “good” mesh:

    for any half-edge he, he->opposite != he
    for any half-edge he, he->opposite->opposite == he
    for any half-edge he, he->next != he
    for any half-edge he, if he->left_face != 0 then he->next->next->next == he

    (that is, “next” pointers form loops, and each half-edge is paired to some other half-edge that is not itself)

    For starters, is it possible to describe the preceding constraints with eCv? (Sorry for a complicated example, it comes from polyhedra representations used in computer graphics; if you need further clarification, feel free to ask)

    • March 1, 2011 at 15:43

      Hi, and thanks for your comment. One of the problems with invariants in C is knowing when they are supposed to hold, because they generally get broken while structures are being updated. Where an invariant involves 2 or more structs, what we generally do in eCv is declare a “valid” function instead of an invariant. For your example we would do something like the following:

      ghost(bool valid2(hedge_t *he)
          returns(he->opposite != he);)
      ghost(bool valid2(hedge_t *he1, hedge_t *he2)
          returns((he1->opposite == he2) =>
                             (he2->opposite == he1));)
      

      and so on. Next we would declare a ghost function that returns true if all elements and pairs of elements in a collection of half edges obey these validity constraints. Finally we would refer to that ghost function in preconditions and postconditions of functions that manipulate half edges belonging to that collection.

      There are other approaches, for example Microsoft Research’s Vcc tool requires you to wrap and unwrap objects explicitly to indicate when their invariants are expected to hold. We found that the overhead of this extra annotation wasn’t necessary or justified when verifying typical safety-critical software (which typically doesn’t use complex data structures); but it may be more suited to your example.

  1. No trackbacks yet.
Comments are closed.
%d bloggers like this: