## Verifying a binary search

In the last post, I covered some different levels of formal verification that you may be interested in, and showed how to add minimum annotation to the linearization example to allow ArC to prove predictable execution. The example provided a prototype for the binary search function it called, to which we added a minimal postcondition, so that it looked like this:

size_t bSearch(constLinEntry*arraytable, size_t nElems, uint16_t key)post(result<= nElems);

Let’s develop a verified implementation of this binary search function. We could write the code and then try to get ArC to verify it – either for predictable execution only, or for full functional correctness. However, it’s quite hard to get a binary search function right. So for this example, it’s more productive to use formal specification to develop an implementation that is correct by construction.

First, we need to specify exactly what *bSearch *is supposed to return, and what its preconditions are. Let’s start with the preconditions. We’re passing the number of elements of *table *in *nElems*, so we need to specify this:

size_t bSearch(constLinEntry*arraytable, size_t nElems, uint16_t key)pre(table.lwb == 0; table.lim == nElems)post(result<= nElems);

The precondition says that the lower bound (the lowest legal index) into *table *is zero – in other words, *table *is a pointer to the first element of an array – and that the limit of *table *(one plus the highest legal index) is *nElems*. However, this isn’t enough: the raw values need to be in increasing order too. A standard way of specifying “in increasing order” is to say that for any two indices *a* and *b*, if *b* > *a* then the element at *b* is greater than the element at *a*. Here’s that expressed as a precondition:

size_t bSearch(constLinEntry*arraytable, size_t nElems, uint16_t key)pre(table.lwb == 0; table.lim == nElems)pre(foralla in table.indices; b in table.indices :- b > a => table[b].raw > table[a].raw)post(result <= nElems);

The ghost member *indices *of an array pointer is defined by ArC as the sequence of all valid indices into the array. So* table.indices* means the same as* table.lwb .. table.upb*. The => symbol means “implies”, so* b > a => table[b].raw > table[a].raw* means the same as* !(b > a) || table[b].raw > table[a].raw* but is clearer.

Now we need to specify what the function returns. In developing *linearize *in the previous post, I said I was assuming a return value of *0* means the raw value is off the bottom of the table, *nElems *means it is off the top, otherwise the table entry indexed by the return value and the previous table entry bracket the raw value. Actually, we can simplify this. If we say that any elements with indices below the returned value have raw values less than or equal to the key, and any elements with indices at or above the returned value have raw values greater than or equal to the key, then this covers the cases of returning *0* or *nElems *as well. So let’s express this single constraint in a postcondition:

size_t bSearch(constLinEntry*arraytable, size_t nElems, uint16_t key)pre(table.lwb == 0; table.lim == nElems)pre(forallaintable.indices; bintable.indices :- b > a => table[b].raw > table[a].raw)post(result<= nElems)post(foralliin0..(result- 1) :- key >= table[i])post(foralliin result..(nElems - 1) :- key <= table[i]);

This specification isn’t precise about the value we return when there is a table entry whose raw value exactly matches the key – it allows us to return the index of either that entry or the next entry. My original informal specification wasn’t precise either – it said that the two entries “bracket” the key. Given that our precondition forbids duplicate entries, we could be more precise if we want, e.g. by changing <= in the final postcondition to <.

Now we can work on the implementation. Note that when you have a function prototype declaration separate from the implementation (as here), ArC expects you to put the function specification in the prototype only, so we don’t need to repeat the preconditions and postconditions in the implementation. Here’s a rough sketch of what we want to do:

size_t bSearch(constLinEntry*arraytable, size_t nElems, uint16_t key) { size_t low = 0, high = nElems;while(high != low) {/* increase low or decrease high, such that high remains >= low, all elements below low have raw values <= the key, and all elements at or above high have raw values >= the key */}returnlow; }

Let’s express the text in the comment as a loop variant and a loop invariant (see my earlier post on verifying loops if you’re not familiar with these):

size_t bSearch(constLinEntry*arraytable, size_t nElems, uint16_t key) { size_t low = 0, high = nElems;while(high != low)keep(high >= low)keep(foralliin0..(low - 1) :- table[i].rawValue <= key)keep(foralliinhigh..(nElems - 1) :- table[i].rawValue >= key)decrease(high - low) {/* do something */}returnlow; }

Before we go any further, let’s check that this makes sense. ArC will need to prove that the loop invariants (the **keep **clauses) are true at the start of the loop. That’s easy – the initialization of low and high ensures that *high >= low* (which satisfies the first invariant), and that there are no table indices below *low *and no table indices at or above *high* (which satisfies the second and third invariants, because **forall **over an empty range is always true). Also, because we are returning *low *when the loop terminates, ArC will need to prove that the value of *low *at the end of the loop meets the postcondition on the return value. We know that when the loop terminates, the loop invariants will be true and the while-clause will be false. Therefore, we can substitute *result *for *low *(because we are returning *low*) and *result *for *high *(because *high *== *low *is the inverse of the while-clause) in the keep-clauses, to find out what is known about *result*. When we do this, the last two **keep **clauses magically turn into the last two postconditions – which is exactly what we want. However, this doesn’t help with the first postcondition, which requires *result <= nElems*. Adding an extra loop invariant* low <= nElems* or *high <= nElems* will do the trick, since substituting *result *for both *low *and *high *will then give us the required term. I’ll choose *high <= nElems*, because it is stronger and I don’t intend that *high *should ever exceed *nElems*.

We can use ArC to check that the design is OK so far, even before we code the loop body. As well as adding the new loop invariant, we’ll need to tell ArC that we intend to assign to* low *and *high **low *within the loop body, which we can do like this:

size_t bSearch(constLinEntry*arraytable, size_t nElems, uint16_t key) { size_t low = 0, high = nElems;while(high != low)keep(high >= low)keep(high <= nElems)keep(foralliin0..(low - 1) :- table[i].rawValue <= key)keep(foralliinhigh..(nElems - 1) :- table[i].rawValue >= key)decrease(high - low) { low = ?; high = ?; }returnlow; }

ArC allows you to use ? in place of an expression to mean you haven’t decided what goes there yet. Naturally, ArC won’t be able to prove that the loop preserves its invariants or decreases its variant. Sure enough, if we run ArC on this example, we get 6 unproven verification conditions that refer to the loop body: one for each of the loop invariants and two for the variant. However, ArC reports that everything else is OK, including that the loop invariants are satisfied by the initialization and that the return value meets all three postconditions.

So all we need to do now is to write loop body code that assigns *low *and/or *high *such that the four loop invariants are preserved and the loop variant *high – low* decreases. But this post is already quite long, so I’ll do that next time.