What are you trying to prove?

April 27, 2010

If you’re thinking of using formal verification to increase the quality and reliability of your software, one of the first decisions you need to make is what you want to prove. Roughly speaking, you have three levels to choose from:

1. Predictable execution – that is, freedom from undefined and implementation-defined behaviour. The proofs cover such things as: all array indices are in bounds, variables are initialized before use, conversions of values to narrower types do not result in overflow, arithmetic does not overflow or underflow, null pointers are not dereferenced, and division by zero does not occur (a small fragment illustrating some of these failure modes follows this list). It includes all the usual causes of what the MISRA standards call “run-time errors”, so it covers MISRA C rule 21.1.

2. Partial correctness – all the above, and the program meets its specification, if it terminates. The specification you provide may be a full functional specification, or a partial specification covering the most important parts, such as safety properties.

3. Full correctness – all the above, and the program terminates.
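
To make the first level concrete, here is a small hypothetical fragment (not from the original example); a proof of predictable execution has to show that none of the failures marked in the comments can happen for any call that actually occurs:

#include <stddef.h>

int scaled(const int* data, size_t nElems, size_t i, int divisor)
{
    int x = data[i];        /* in bounds only if i < nElems */
    int y = x * 100;        /* may overflow the range of int */
    return y / divisor;     /* undefined if divisor is zero */
}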

Most formal program verification systems support one or more of these levels. For example, SPARK Ada supports levels 1 and 2. In SPARK-speak, level 1 is called “exception freedom”, because Ada programs normally perform run-time checking, so run-time errors manifest themselves as exceptions (which are predictable). On the other hand, Perfect Developer is focussed on full correctness, because it is a top-down tool that starts from specifications – leaving you no choice but to specify what the program is intended to achieve. ArC gives you the choice of all three levels.

You don’t have to choose a single verification level for the entire system: you might, for example, choose full correctness for the most critical parts, but be content with predictable execution elsewhere.

What difference does the choice of verification level make to you as a software developer? The main effect is that if you wish to prove partial or full correctness rather than just predictable execution, you will need to write function postconditions that state what the functions are intended to achieve. If you choose only to prove predictable execution, you will sometimes still need to write function postconditions, but they can be more vague – stating something about what the function achieves rather than everything it is supposed to achieve.
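
For instance (a hypothetical sketch, using the pre/post/forall annotation syntax that appears later in this post, and assuming that post() clauses may be repeated in the same way as pre() clauses), consider an indexOfMin function that returns the index of the smallest element of a non-empty array. A vague postcondition is enough for callers to prove that their own array indexing is in bounds; a partial-correctness proof needs the full specification:

/* Vague postcondition - sufficient if we only want to prove predictable execution: */
size_t indexOfMin(const uint16_t* array data, size_t nElems)
pre(data.lwb == 0; data.lim == nElems)
pre(nElems >= 1)
post(result < nElems);

/* Fuller postcondition - what a partial-correctness proof would need: */
size_t indexOfMin(const uint16_t* array data, size_t nElems)
pre(data.lwb == 0; data.lim == nElems)
pre(nElems >= 1)
post(result < nElems)
post(forall i in 0..(nElems - 1) :- data[result] <= data[i]);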

Here’s an example. Suppose we have read a value from a sensor that has a non-linear response. We wish to linearize the value. To this end, we have a constant table of pairs of (raw value, linearized value), ordered such that both values are monotonically increasing. We will use a binary search to find the two adjacent entries whose raw values bracket the value read from the sensor, then interpolate between the corresponding linearized values. So the function looks something like this:

#include "arc.h"
#include "stddef.h"
typedef unsigned short uint16_t;

typedef struct { uint16_t raw; uint16_t corrected; } LinEntry;

size_t bSearch(const LinEntry* array table, size_t nElems, uint16_t key);

uint16_t linearize(const LinEntry* array table, size_t nElems, uint16_t rawValue) {
    size_t index = bSearch(table, nElems, rawValue);
    if (index == 0) return table[0].corrected;
    else if (index == nElems) return table[nElems - 1].corrected;
    else {
        LinEntry low = table[index - 1], high = table[index];
        return (((high.corrected - low.corrected)/(high.raw - low.raw)) * (rawValue - low.raw)) + low.corrected;
    }
}

static const LinEntry linTable[] = { {0, 0}, {10, 15}, {20, 29}, ... };

... linearize(linTable, sizeof(linTable)/sizeof(linTable[0]), someData) ...

The bSearch function takes a pointer to an array of LinEntry records (see my earlier post on taming arrays if you haven’t seen the array keyword before), the number of elements in the array, and the raw data value. I’ve assumed it returns a value in the range 0..nElems, such that 0 means the raw value is off the bottom of the table, nElems means it is off the top, and otherwise the table entry it indexes and the previous table entry bracket the raw value. If we want to prove partial or full correctness, we need to specify all of this. But we can be less precise if we’re just proving predictable execution. Let’s see what ArC makes of it with no annotation other than the array keywords:

c:\arc\ex1.c (11,33): Warning! Unable to prove: Precondition of ‘[]’ satisfied
(see C:\Arc\ex1_unproven.htm#9), cannot prove: 0 < table.lim.
c:\arc\ex1.c (12,43): Warning! Exceeded boredom threshold proving: Precondition of ‘[]’ satisfied
(see C:\Arc\ex1_unproven.htm#7), cannot prove: nElems < (1 + table.lim).
c:\arc\ex1.c (14,29): Warning! Exceeded boredom threshold proving: Precondition of ‘[]’ satisfied
(see C:\Arc\ex1_unproven.htm#2), cannot prove: index < (1 + table.lim).
c:\arc\ex1.c (14,54): Warning! Exceeded boredom threshold proving: Precondition of ‘[]’ satisfied
(see C:\Arc\ex1_unproven.htm#4), cannot prove: index < table.lim.
c:\arc\ex1.c (15,17): Warning! Unable to prove: Precondition of ‘/’ satisfied
(see C:\Arc\ex1_unproven.htm#5), cannot prove: low.raw < high.raw.

There are a few problems here, all in function linearize:

  • The first warning says that table[0] might be out-of-bounds, because table might be an empty array.
  • The second says that table[nElems – 1] might be out-of-bounds, because we haven’t told ArC that nElems is the number of elements in table.
  • The third and fourth say that table[index] and table[index – 1] might be out-of-bounds, because ArC doesn’t know that bSearch returns a value between 0 and nElems, and that table points to the start of an array with nElems elements.
  • The final warning says that in the interpolation operation, the divisor (table[index].raw – table[index – 1].raw) might not be positive. Division by zero would typically cause an exception, while division by a negative number yields an implementation-defined result in C’90 and is therefore not allowed by ArC. The division will be safe if the raw values in table are monotonically increasing.

So let’s give ArC the information it needs, by adding some partial specifications:

size_t bSearch(const LinEntry* array table, size_t nElems, uint16_t key)
post(result <= nElems);

uint16_t linearize(const LinEntry* array table, size_t nElems, uint16_t rawValue)
pre(table.lwb == 0; table.lim == nElems)
pre(nElems >= 1)
pre(forall i in 1..(nElems - 1) :- table[i].raw > table[i - 1].raw)
{
    size_t index = bSearch(table, nElems, rawValue);
    if (index == 0) return table[0].corrected;
    else if (index == nElems) return table[nElems - 1].corrected;
    else {
        LinEntry low = table[index - 1], high = table[index];
        return (((high.corrected - low.corrected)/(high.raw - low.raw)) * (rawValue - low.raw)) + low.corrected;
    }
}

In the postcondition of bSearch, we’ve specified that the return value is no greater than nElems (since it is of type size_t, it can’t be less than zero anyway). In the precondition of linearize we’ve said that table is a regular array with lower bound zero and limit nElems. These are enough to ensure that the array accesses table[index – 1] and table[index] in the else-part of the if-statement are in bounds, because the previous parts of the if-statement handle the cases index == 0 and index == nElems. We’ve also specified that nElems is at least one to take care of the access to table[0].

The final precondition we’ve added to linearize says the raw values in table are monotonically increasing, by stating that for all values i from 1 up to nElems – 1 (i.e. the highest index of table), table[i].raw is greater than table[i – 1].raw.
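
With the linTable shown earlier, for example, this precondition expands (for the three entries listed) to table[1].raw > table[0].raw, i.e. 10 > 0, and table[2].raw > table[1].raw, i.e. 20 > 10, both of which hold.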

These partial specifications are sufficient for ArC to prove that function linearize executes predictably, assuming that it is called in a state satisfying its preconditions, and that bSearch executes predictably, has no side-effects, and returns a value satisfying its postcondition. The preconditions of linearize will be verified at every place where it is called. We’ll look at verifying bSearch next time.
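
For reference, here is one way a bSearch meeting this contract might be written – a plain binary search that returns the smallest index whose raw value exceeds the key, or nElems if there is none. This is one consistent reading of the contract described above, and only an unannotated, illustrative sketch, not the verified version:

size_t bSearch(const LinEntry* array table, size_t nElems, uint16_t key)
{
    size_t low = 0, high = nElems;
    while (low < high)
    {
        size_t mid = low + (high - low) / 2;   /* written this way to avoid any overflow in low + high */
        if (table[mid].raw > key)
            high = mid;        /* table[mid] is above key, so the answer is at or below mid */
        else
            low = mid + 1;     /* table[mid] is not above key, so the answer is above mid */
    }
    return low;                /* smallest index with table[low].raw > key, or nElems */
}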

  1. Yannick Moy
    May 7, 2010 at 22:45

    Hi David,

    It seems to me that the 3 levels of formal verification that you describe are those that have been described in theoretical computer science, but that in practice many other properties are the target of formal verification: absence of dead code, bound on the stack size, WCET, etc.

    Not to say that these properties should be proved using the same deductive verification techniques as the 3 levels you describe, but they are certainly as important.

    • May 8, 2010 at 08:21

      Hi Yannick,

      Yes, there are many other properties that it is desirable to establish. We’ll probably add proving absence of dead code to ArC at some time. Determining an upper bound on the stack size or deducing WCET requires detailed knowledge of compiler and processor behaviour, so I think those are best done by the compiler suppliers.

  2. Yannick Moy
    May 8, 2010 at 10:31

    Hi David,

    You might have seen the new Dead Path Analyzer that ships with SPARK Pro 9.0 (codename Zombiescope :)). It outputs for each path whether the prover could prove it dead. It is then up to the user to decide whether these dead paths correspond to the desired behavior or not.

    Not really the same as proving the absence of dead code, which I think is best left to dynamic coverage analysis. At least currently, but maybe you have good ideas in this area?

    • May 8, 2010 at 11:31

      ArC is derived from Perfect Developer, which often (but not always) flags dead code. This happens when the theorem prover notices that it proved a verification condition by refuting the givens without reference to the goal. The situation is complicated in PD because it supports inheritance and dynamic binding, and code in a common abstract base class may be live in one descendant class but dead in another. Even in C code, it isn’t always clear whether code is dead or just inactive. For example, I might use a library sorting function that does an insertion sort if there are fewer than ten elements to be sorted, and a heap sort otherwise. Suppose I build a large, complex system that only ever calls it with fewer than ten elements. Is the heap sort branch dead code or not? At the system level it is; but if the function is taken from a well-used, tested and trusted library, should I really be required to modify it by removing the heap sort branch?

  3. Yannick Moy
    May 8, 2010 at 13:29

    The problem of libraries is a bit different. Ideally, you should be able to add annotations to “accept” (in SPARK parlance) that some code is dead. Of course if you cannot modify the library code, this is yet more cumbersome.

    But say you only care about dead code in your core application, you could still make a distinction between “live” code because it can be exercised in a unit test, and “live” code because it is exercised when running the full application. I don’t know if some (safety or security) standards make such a difference?

    No matter how you define dead code, proving the absence of dead code amounts to showing a witness execution. This is not easy to do statically, while it is trivial dynamically. But there are still benefits to statically generating candidate inputs for a witness, as is done in concolic testing: run the program on input A, collect constraints on the execution, flip one constraint to force the execution onto another path, feed a prover with the modified constraints, get back a “model” for the set of constraints, extract from this model an input B, and run the program on input B to check whether it follows the expected path.

    • May 8, 2010 at 14:15

      Although finding a witness execution dynamically may often be trivial, it isn’t always so. If normal testing fails to cover some path, it may require some serious investigation to find inputs that lead to such a path. However, if the program isn’t too large, then I would have thought that whole program analysis with the aid of a constraint solver might be a good approach.

  4. Yannick Moy
    May 8, 2010 at 14:45

    I don’t know if this kind of approach has been used to prove coverage properties. I would be interested in knowing if this is the case.

  5. Dave Banham
    May 21, 2010 at 20:33

    Why assume all dead-code is bad? Dead-code can exist for both bad and good reasons. Some bad reasons are gold-plated designs as well as defective designs. Some good reasons are exception handlers/traps for situations that cannot be reached through normal software execution. Reuse of software components can also lead to dead code. Interestingly, the use of templates can often eliminate the introduction of dead code because the compiler can deploy the template for each specific case.

    • May 24, 2010 at 07:19

      Many standards for critical software require that dead code be eliminated. The problem is that it is difficult to test that the dead code is “correct”, whatever that means. If the code cannot be reached through normal software execution for any inputs, how are you going to test it? If you are doing formal analysis, what assumptions is the verifier supposed to make when the dead code is entered? Your point that the use of templates can avoid the introduction of dead code is a good one, and another argument for using a small subset of C++ in preference to C.
