David Crocker's Verification Blog

Taming Pointers in C/C++

February 17, 2010 davidcrocker Comments off

When doing verification or deep static analysis of C and C++, pointers are troublesome in several ways:

Zero (i.e. NULL) is an allowed value of every pointer type in the language. Occasionally we want to allow null pointers, for example in the link field of the last element of a linked list. More usually, we don’t want to allow null. Verification requires that anywhere we use the * or [ ] operator on a pointer, we can be sure that it is not null.
C and C++ do not distinguish between pointers to single variables and pointers to arrays. So, where we have a parameter or variable of type T*, we can’t tell whether it is supposed to point to a variable or an array. If it points to a single variable, then we mustn’t do pointer arithmetic or indexing on it. The verifier must be able to check this.
Array parameters in C/C++ are passed as pointers. Aside from the problem that we can’t distinguish array pointers from pointers to single variables, we also have the problem that there is no size information contained in an array pointer.
Anywhere we use pointers to mutable data, there is the possibility of aliasing. In other words, there may be more than one pointer to the same data. The verifier needs to take account of the fact that changes to data made through one pointer may affect the values subsequently read through another pointer.

Although pointers are less troublesome in Ada, the aliasing problem still exists. The SPARK verifiable subset of Ada handles this by banning pointers altogether. Unfortunately, this isn’t an option in a C/C++ subset for critical systems, because pointers are the only mechanism for passing parameters by reference.

I’ll deal with the issue of unwanted nullability first. One solution is to add invariants stating that particular variables of pointer type cannot be NULL. Similarly, where a function takes parameters of pointer type, we can write preconditions that these parameters cannot be NULL. Here are a couple of examples:

struct Status1 { const char* message; ... invariant(message != NULL) }

void sum(int *arr, int size) pre(arr != NULL) { ... }

The problem with this approach is that you need a lot of these invariants and preconditions because, more often than not, it makes no sense to allow NULL. So in ArC we take the opposite approach. We assume that pointers are not allowed to be NULL except where you say otherwise. In the above examples, you can leave out the precondition and invariant if you don’t want to allow either pointer to be NULL.

To tell ArC that a pointer is allowed to be NULL, you flag it with the null attribute, like this:

struct Status2 { const char* null message; ... }

void setMessage(const char * null msg) { ... }

This greatly reduces the amount of annotation needed, because the null annotation is more concise than a precondition or invariant, and it is needed less often. As you might expect, null is another macro defined in arc.h that expands to emptiness when you compile the code. Syntactically, it behaves like const or volatile.

That’s all for today – I’ll discuss how we handle the other problems with pointers later.

Categories: C and C++ in critical systems Tags: formal verification, pointers, strong typing

More reasons why C++ can be safer than C

February 12, 2010 davidcrocker Comments off

In previous postings I’ve explained how you can use C along with a few selected C++ features to write better and less error-prone code than you can in plain C. However, even if you don’t use any C++ constructs at all, you can still get a few benefits by compiling your C code using a C++ compiler. Here’s why:

Type of string literals: in C, a string literal has type char[]. There is nothing to prevent you from passing a string literal as a parameter to a function that writes to it. The C standard specifies that the effect of writing to a string literal is undefined, so you had better avoid it. Things are better in C++, where the type of a string literal is const char[]. If you want to use a C++ string literal in some context that eventually modifies it, you will have to cast away const-ness – which is a much easier thing for a static checker to detect.

Type of enumeration constants: in C you can declare enumerations, but the enumeration constants you declare have type int. So an enumeration declaration is just a shorthand for declaring a pile of integer constants, and provides no additional type safety. In C++, each enumeration is a distinct type. So you can’t assign a value of type int to a variable of an enumeration type, unless you use an explicit conversion. Unfortunately, you can still assign a value of an enumeration type to a variable of type int, because enumeration types can be implicitly converted by integral promotion. However, a static checker can easily detect this implicit conversion and warn about it.

Type of character literals: In C a character literal has type int. Therefore, if you want to pass a character literal as a parameter to a function expecting a char, or you want to assign a character literal to a variable of type char, then you are performing a narrowing type conversion. Good practice states that such a conversion must be explicit, and a static checker should report any implicit int to char conversion. This is of course absurd. In C++ a character literal has type char as you would expect.

If you don’t need to keep the ability to compile your program using a C compiler, you can take advantage of a few other improvements of C++ over C. For example, you can use the // form of comment – safer than the /*..*/ form because you can see immediately where it ends (i.e. the end of the line). You can also stop writing typedef struct _Foo { … } Foo and just write struct Foo { .. } instead, because in C++ any class, struct or enum declaration defines a type name. Finally, you can stop writing those dangerous function-like macros, because the inline keyword of C++ lets you declare a proper function instead, without (for any reasonable compiler) worrying about efficiency.

Is there any risk in compiling existing C code using a C++ compiler? Well, there are a few ways in which you can write programs that have different meanings in C and C++. One is to use a character literal as the operand of sizeof ( ). Some static checkers will warn about this, and it’s banned in my safety-critical C++ subset. Another way is to declare a struct or an enum with the same name as a variable in an outer scope, and then take sizeof() that name – possible, but unlikely because types are typically declared only at file scope. See this Wikipedia article for an summary of the ways in which C++ isn’t backwards compatible with C.

Switching compilers does of course introduce risks. Fortunately, many vendors use a single compiler for both C and C++. If your C compiler also supports C++, then you can just tell it to compile in C++ mode instead of C mode. The chances are that you will be using very little functionality of that compiler that you weren’t using before – until you start using major new language features of C++. However, if for you moving to C++ compilation requires using a compiler from a different vendor, then you’ll probably want to wait for a new project before you do it.

I’ve digressed a little from the main theme of this blog, which is verification of safety-critical code written in C/C++. I’ll get back to that in my next post.

Categories: C and C++ in critical systems

Safer explicit type conversion

February 11, 2010 davidcrocker 4 comments

In C there is just one syntax for explicit type conversion: the cast expression, with syntax (T) e where T is a type and e is an expression. This notation covers a multitude of sins. It can be used to do any of the following, or more than one at the same time:

Convert one numeric type to another
Reinterpret a pointer to one type as a pointer to another type, an integral type as a pointer type, or vice versa
Remove the const or volatile attribute of the expression’s type

Although C++ still supports the C-style cast syntax, it also provides better type-conversion operators with more specialised meanings:

static_cast<T>(e) is provided to convert between numeric types
reintrepret_cast<T>(e) is provided to convert between pointer types, or between integral and pointer types
const_cast<T>(e) is provided to remove const and/or volatile from the type

There is also dynamic_cast<T>(e) but that is not relevant here because we’re not allowing classes to inherit from each other.

These C++ type conversion operators have significant advantages over the C cast notation. First, they make it clear what category of type conversion we intend to do. This prevents us from intending to do a type conversion in one category but accidentally doing a conversion in a more dangerous category; or inadvertantly discarding const-ness or volatility along with the conversion we intended. Second, it is easy to pick out the more dangerous categories when reviewing source code. For example, the conversions that reinterpret_cast does are generally more dangerous than the ones performed by static_cast. Third, the operand of the type conversion appears in brackets, removing the possibility of precedence errors. Fourth, you can easily search source text for instances of these type conversions.

Therefore, my “C with little bits of C++” language for developing critical systems includes the following rules:

You are allowed to use the static_cast, reinterpret_cast and const_cast features of C++.
You are not allowed to use the C cast notation.

For critical system development, reinterpret_cast and const_cast should generally be avoided. If you have been writing in C and using a MISRA compliance checker, then MISRA C 2004 ruless 11.1 through 11.5 restrict what you can do with C casts having the semantics of reinterpret_cast and const_cast. To get equivalent checks wheh you use the C++ constructs, you’ll need to use a MISRA C++ compliance checker, or (preferably) ban the use of reinterpret_cast and const_cast completely.

Categories: C and C++ in critical systems

Invariants for C/C++ Classes and Structs

February 5, 2010 davidcrocker 3 comments

In yesterday’s post, I proposed the use of simple C++ classes in critical software. I pointed out that classes are better than C structs, because they offer encapsulation and make it easier to avoid using objects that are not completely initialized. Now I’m going to point out another advantage of classes over structs, which is that they make it easier to enforce invariants.

Consider the following C code:

typedef struct _Limits { int minValue; int maxValue; // must always be >= minValue } Limits;

The comment is an example of an invariant, i.e. a condition on the values of the members that we always expect to be true. During testing, we might want to do runtime checks to report any violation of the invariant. We would also like to do static analysis to make sure it always holds.

The problem with enforcing this invariant is that minValue and maxValue are public. This means that any piece if code that uses a variable of type Limits can break the invariant by assigning a new valie to minValue or maxValue. If we want to check the invariant at runtime, we must add a runtime check everywhere that the code assigns a value to either of these fields. Likewise, a static analyser must consider whether the invariant is broken at every place where one of these fields is assigned.

Let’s look at how we would define the Limits type using a C++ class instead:

class Limits { int _minValue; int _maxValue; // must always be >= minValue public: int minValue() const { return _minValue; } int maxValue() const { return _maxValue; } Limits(int n, int x) : _minValue(n), _maxValue(x) {} }

I’ve made the data private, and I’ve added a couple of functions to allow the min and max values to be read, but not written (don’t worry about whether this is efficient – any reasonable C++ compiler will inline calls to these functions). I’ve also added a constructor so that we can create values of type Limits. Using this new declaration of Limits, the only way that anyone can break the invariant is by calling the constructor with n > x. So there is just one place where we need to insert a runtime check to catch every instance where this invariant might be broken.

Finally, let’s look at what you need to do to get ArC to verify statically that the invariant always holds:

#include "arc.h" class Limits { int _minValue; int _maxValue; invariant(_maxValue >= _minValue) public: int minValue() const { return _minValue; } int maxValue() const { return _maxValue; } Limits(int n, int x) : _minValue(n), _maxValue(x) pre(x >= n) {} }

Instead of expressing the invariant as a comment, we have expressed it using the invariant keyword. We #include “arc.h” at the start so that when you are compiling the file using a normal C++ compiler, invariant(…) is defined as a macro that expands to nothing. This makes the invariant invisible to the compiler. But when ArC sees the invariant, it know that it needs to prove that the invariant holds anywhere that we create or modify a value of type Limits.

Since the invariant only depends on private data, ArC only has to worry about breaking the invariant within the class’s own constructors and members. In order to prove that the Limits constructor satisfies the invariant, we need to ensure x >= n whenever it is called. That’s why I added the pre(x >= n) clause in the constructor. This clause tells ArC to assume x >= n when it verifies the constructor, and to verify x >= n whenever we call the constructor. pre is another ArC keyword – it stands for precondition.

Incidentally, although Microsoft’s Vcc doesn’t support any C++ (unlike ArC), it does allow you to declare invariants on structures. But when you want to initialize or modify such a structure, you’ll generally need to add some more annotations to “unwrap” and “wrap” it. That’s the price of not having encapsulation.

Categories: C and C++ in critical systems, Formal verification of C programs Tags: formal specification, formal verification, struct invariant

Using C++ classes in critical software

February 4, 2010 davidcrocker 3 comments

The first C++ feature I’m going to suggest for use in safety-critical software is classes in place of structs. Here are the rules:

You may use class instead of struct
Within class declarations, as well as declaring data members, you may declare function members and constructors (but not operators)
You may use the private: and public: modifiers
All data members in a class should be declared private
Every class must have at least one constructor
Each constructor of a class must initialise all the data members of the class
If you declare a single-argument constructor, it must be declared explicit
If you declare a copy constructor, it must be declared private.

That’s all I want you to use, for now at least. Don’t use inheritance or the virtual keyword. Don’t use default parameters, or function overloading. Try to avoid overloading constructors too. Why no overloading or default parameters? Well, one of the few really big mistakes that the designers of C++ made was to introduce these features without at the same time massively restricting automatic type conversion. These features interact very badly. Is there any one out there who really understands the C++ rules for resolving ambiguity when a function call matches more than one declaration?

Some years ago when I was reviewing a large body of C++ code, the most common error I found was that someone had left out a parameter in a function call. The compiler was silently type-converting the other parameters to match a different overload of the same function, and the resulting behaviour was not what the programmer had intended.

Why do I insist that single-argument constructors must be explicit? Because if you don’t, then the compiler is free to use the constructor to perform automatic type conversions. Repeat after me: automatic type conversions are evil.

Why the restriction on user-defined copy constructors? Well, if you don’t declare a copy constructor, then the compiler will generate one, which will just copy the object field-by-field. This is the obvious semantics that we expect when copying an object. The usual purpose of user-defined copy constructors is to perform side-effects related to memory management. For example, you might write a copy constructor that, instead of copying a pointer field, creates a fresh copy of the object pointed to and points to that copy instead. That’s fine in an application that is free to use dynamically-allocated memory. But we don’t use dynamic memory allocation in critical software, except possibly during the initialisation phase. So there is no place in critical code for copy constructors of this form. On the other hand, it is occasionally useful to make it impossible to compile code that tries to copy an object of some class, which you can achieve by declaring a private copy constructor.

What do we gain by using C++ classes instead of structs? The major gains are encapsulation and more predictable execution. If you follow the rules above, then you have precise control over how the data members of your objects can be modified (because the data members are declared private). You also prevent the declaration or creation of uninitialized and partly-initialized objects, which are a major cause of unpredictable execution. The rule that requires every class to have at least one constructor prevents you from declaring a variable of the class without initializing it. The rule that constructors must initialize all data members prevents you from creating partly-initialized objects. Of course, you still have to make sure that your constructor really does initialize all members – but you only have to do this in one place, not everywhere you declare a variable of that class.

Categories: C and C++ in critical systems

Which is better for critical software: C or C++ ?

February 2, 2010 davidcrocker Comments off

I’m well aware that there are many in the critical software business who will answer that question with “Neither; you should write in Ada, which is a much safer language than C or C++”. I’m not going to argue with that. What I am going to argue is that if you can’t or don’t want to use Ada, then rather than write in plain C, you should consider using a very limited subset of C++ .

At this point, another group of people will be telling me that C++ isn’t suitable for critical software. The arguments they put forward include the following:

C++ is a complicated language with difficult semantics, not sufficiently understood by developers, compiler writers, or even the language designers.
C++ compilers are much less mature than C compilers and more difficult to write, so there is a greater chance of code generation errors.
Whereas the C language standard includes a list of constructs for which behaviour is undefined or implementation-defined, there is no such list for C++; therefore it is impossible to develop a C++ subset that avoids all such constructs.

Let’s address these concerns. C++ is indeed a complicated language; but I’m not proposing that developers of safety-critical systems embrace the whole of C++, or even a large part of it. I’m suggesting instead that we allow a few selected C++ constructs to be used in what would otherwise be a C program. This is not the same as what MISRA C++ or JSF C++ do. Those standards list C++ constructs that they consider are not safe to use. I’m going to take the opposite approach and list a few C++ constructs that I believe are safe to use and offer some benefit compared to writing in plain C.

Ten years ago, concern about C++ compiler maturity was entirely justified. When we were writing the C++ code generation for Perfect Developer back in 1999, we had to work around bugs in both gcc and Microsoft C++ compilers. However, these and many other C++ compilers are now very mature. Also, if you are just using the simple C++ features that I propose, you are unlikely to hit the difficult areas where compiler bugs may still lurk – no more likely than if you use plain C.

The lack of a complete list of C++ constructs with undefined and implementation-defined behaviour is a real concern. However, in practice C++ developers avoid the dark corners of the language, and they rarely to hit problems with undefined behaviour – apart from all the usual ones that are also present in C (e.g. accessing arrays out-of-bounds). There are a few areas of concern – such as the order of initialisation of static variables – but these are well-known. And I’m proposing that we use only a small fraction of the C++ language – we won’t go anywhere near those dark corners.

I hope I’ve persuaded you that if you are currently writing critical software in C, then writing in a very limited subset of C++ need be no less safe. Of course, you’ll want to make sure that you have a good-quality C++ compiler for your target hardware, and some means of enforcing the subset (we can help you with that). But what are the benefits? They are, in no particular order:

To let you write code that is better structured and easier to maintain;
To avoid some “bad” C constructs, using safer C++ constructs instead;
To make the code easier to verify.

I’ll go into more detail in my next post.

Categories: C and C++ in critical systems

Hello from David Crocker!

January 31, 2010 davidcrocker Comments off

Hi, I’m David Crocker, CTO of Escher Technologies. We make software development tools to help you write safety-critical and other high-integrity software. Our emphasis is on using automated mathematic proof to make sure that the software meets its requirements. Of course, that’s only possible if you know what the requirements are in the first place! Requirements often change through the lifetime of the software – or even while the initial version is being developed – so of course our tools also cope with changing requirements.

Up till now, we’ve mostly concentrated on tools that let you express specifications, refine them if necessary and then generate code (see http://www.eschertech.com if you want to know more). However, we’ve found that in much of the embedded world, many developers prefer to write the code by hand – and most of them are writing in C. So we’re also looking at automatic proofs of correctness for embedded software written in C and C++. For example, do you use assertions in your C/C++ code? Wouldn’t it be great if a tool could automatically prove that those assertions are true, or else tell you why it “ain’t necessarily so” ? Tools to do that sort of analysis is what our business is about.

We’re currently using two different tools to verify C programs. One is Vcc from Microsoft Research. The other is our own, which we call ArC. That’s short for Automated Reasoning for C. It differs from Vcc in that it doesn’t handle concurrency, does handle a few features that Vcc doesn’t (e.g. floats and doubles), and is designed for a subset of C suitable for critical systems, with a few C++ features thrown in. Which one is best for you depends on what you are trying to do.

In future posts I’ll talk about how we subset C/C++ to make it both safer to write in and easier for tools to verify, and some of the problems in doing automatic verification of C/C++.

Categories: C and C++ in critical systems

Newer Entries

David Crocker's Verification Blog

Taming Pointers in C/C++

More reasons why C++ can be safer than C

Safer explicit type conversion

Invariants for C/C++ Classes and Structs

Using C++ classes in critical software

Which is better for critical software: C or C++ ?

Hello from David Crocker!

Email Subscription

Latest posts

Categories

Pages

Archives