Archive for February, 2010

Invariants for C/C++ Classes and Structs

February 5, 2010

In yesterday’s post, I proposed the use of simple C++ classes in critical software. I pointed out that classes are better than C structs, because they offer encapsulation and make it easier to avoid using objects that are not completely initialized. Now I’m going to point out another advantage of classes over structs, which is that they make it easier to enforce invariants.

Consider the following C code:

typedef struct _Limits {
    int minValue;
    int maxValue;  // must always be >= minValue
} Limits;

The comment is an example of an invariant, i.e. a condition on the values of the members that we always expect to be true. During testing, we might want to do runtime checks to report any violation of the invariant. We would also like to do static analysis to make sure it always holds.

The problem with enforcing this invariant is that minValue and maxValue are public. This means that any piece of code that uses a variable of type Limits can break the invariant by assigning a new value to minValue or maxValue. If we want to check the invariant at runtime, we must add a runtime check everywhere that the code assigns a value to either of these fields. Likewise, a static analyser must consider whether the invariant is broken at every place where one of these fields is assigned.
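To make the cost concrete, here is a minimal sketch of what those scattered checks might look like. The helper limitsInvariantHolds and the two update functions are hypothetical names invented for this illustration; they are not part of the original code.

```cpp
#include <cassert>

// Plain struct, as in the C example: both fields are public.
struct Limits {
    int minValue;
    int maxValue;  // must always be >= minValue
};

// Hypothetical helper: reports whether the invariant holds for one value.
inline bool limitsInvariantHolds(const Limits& lim) {
    return lim.maxValue >= lim.minValue;
}

// Every function that writes a field has to repeat the check.
void widenUpper(Limits& lim, int newMax) {
    lim.maxValue = newMax;
    assert(limitsInvariantHolds(lim));  // check at write site 1
}

void raiseLower(Limits& lim, int newMin) {
    lim.minValue = newMin;
    assert(limitsInvariantHolds(lim));  // check at write site 2
}
```

Any new function that assigns to either field needs its own copy of the check, and nothing stops a caller from writing to the fields directly and skipping it.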

Let’s look at how we would define the Limits type using a C++ class instead:

class Limits {
    int _minValue;
    int _maxValue; // must always be >= _minValue
public:
    int minValue() const { return _minValue; }
    int maxValue() const { return _maxValue; }
    Limits(int n, int x)
        : _minValue(n), _maxValue(x) {}
};

I’ve made the data private, and I’ve added a couple of functions to allow the min and max values to be read, but not written (don’t worry about whether this is efficient – any reasonable C++ compiler will inline calls to these functions). I’ve also added a constructor so that we can create values of type Limits. Using this new declaration of Limits, the only way that anyone can break the invariant is by calling the constructor with n > x. So there is just one place where we need to insert a runtime check to catch every instance where this invariant might be broken.
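As a sketch of that single check, the constructor body could carry an assert. Using the standard assert macro is an assumption made for this illustration; a project might prefer its own error-reporting mechanism.

```cpp
#include <cassert>

class Limits {
    int _minValue;
    int _maxValue;  // must always be >= _minValue
public:
    int minValue() const { return _minValue; }
    int maxValue() const { return _maxValue; }
    Limits(int n, int x)
        : _minValue(n), _maxValue(x)
    {
        // The constructor is the only code that writes the members,
        // so this is the only runtime check the invariant needs.
        assert(_maxValue >= _minValue);
    }
};
```

Because there are no setters, a Limits value that has been constructed successfully keeps satisfying the invariant for its whole lifetime.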

Finally, let’s look at what you need to do to get ArC to verify statically that the invariant always holds:

#include "arc.h"
class Limits {
    int _minValue;
    int _maxValue;
    invariant(_maxValue >= _minValue)
public:
    int minValue() const { return _minValue; }
    int maxValue() const { return _maxValue; }
    Limits(int n, int x)
        : _minValue(n), _maxValue(x) pre(x >= n) {}
};

Instead of expressing the invariant as a comment, we have expressed it using the invariant keyword. We #include "arc.h" at the start so that when we compile the file using a normal C++ compiler, invariant(…) is defined as a macro that expands to nothing. This makes the invariant invisible to the compiler. But when ArC sees the invariant, it knows that it needs to prove that the invariant holds anywhere that we create or modify a value of type Limits.

Since the invariant only depends on private data, ArC only has to worry about breaking the invariant within the class’s own constructors and members. In order to prove that the Limits constructor satisfies the invariant, we need to ensure x >= n whenever it is called. That’s why I added the pre(x >= n) clause in the constructor. This clause tells ArC to assume x >= n when it verifies the constructor, and to verify x >= n whenever we call the constructor. pre is another ArC keyword – it stands for precondition.
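To see why an ordinary compiler accepts the annotated class, here is a sketch with stand-in macro definitions. These two #define lines are an assumption about what such a header could contain; the actual contents of arc.h are not shown in this post and may differ.

```cpp
#include <cassert>

// ASSUMPTION: stand-ins for what "arc.h" might define when the file is
// compiled by an ordinary C++ compiler. Not the real header.
#define invariant(expr)  /* expands to nothing */
#define pre(expr)        /* expands to nothing */

class Limits {
    int _minValue;
    int _maxValue;
    invariant(_maxValue >= _minValue)
public:
    int minValue() const { return _minValue; }
    int maxValue() const { return _maxValue; }
    Limits(int n, int x)
        : _minValue(n), _maxValue(x) pre(x >= n) {}
};
```

After preprocessing, the class is identical to the unannotated version, so the annotations add no runtime cost and no runtime checking in the compiled program.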

Incidentally, although Microsoft’s Vcc doesn’t support any C++ (unlike ArC), it does allow you to declare invariants on structures. But when you want to initialize or modify such a structure, you’ll generally need to add some more annotations to “unwrap” and “wrap” it. That’s the price of not having encapsulation.

Using C++ classes in critical software

February 4, 2010

The first C++ feature I’m going to suggest for use in safety-critical software is classes in place of structs. Here are the rules:

  1. You may use class instead of struct
  2. Within class declarations, as well as declaring data members, you may declare function members and constructors (but not operators)
  3. You may use the private: and public: modifiers
  4. All data members in a class should be declared private
  5. Every class must have at least one constructor
  6. Each constructor of a class must initialise all the data members of the class
  7. If you declare a single-argument constructor, it must be declared explicit
  8. If you declare a copy constructor, it must be declared private.

That’s all I want you to use, for now at least. Don’t use inheritance or the virtual keyword. Don’t use default parameters, or function overloading. Try to avoid overloading constructors too. Why no overloading or default parameters? Well, one of the few really big mistakes that the designers of C++ made was to introduce these features without at the same time massively restricting automatic type conversion. These features interact very badly. Is there anyone out there who really understands the C++ rules for resolving ambiguity when a function call matches more than one declaration?

Some years ago when I was reviewing a large body of C++ code, the most common error I found was that someone had left out a parameter in a function call. The compiler was silently type-converting the other parameters to match a different overload of the same function, and the resulting behaviour was not what the programmer had intended.
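The kind of mistake described above can be reproduced in a few lines. The function setTimeout and its two overloads are hypothetical, invented for this sketch; the return values exist only so that the chosen overload is observable.

```cpp
#include <cassert>

// Hypothetical API with two overloads. Each returns a tag identifying
// itself so that the caller can tell which one actually ran.
int setTimeout(int /*channel*/, int /*milliseconds*/) { return 2; }
int setTimeout(double /*seconds*/) { return 1; }

int configure() {
    // Intended call: setTimeout(3, 500), but the channel was left out.
    // This still compiles: 500 is converted silently from int to double
    // and the one-argument overload is selected.
    return setTimeout(500);
}
```

Without the second overload the missing argument would be a compile-time error, which is the argument for avoiding overloading in the first place.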

Why do I insist that single-argument constructors must be explicit? Because if you don’t, then the compiler is free to use the constructor to perform automatic type conversions. Repeat after me: automatic type conversions are evil.
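A short sketch of the difference explicit makes. The class Millivolts and the function readBack are hypothetical names chosen for this illustration.

```cpp
#include <cassert>

class Millivolts {
    int _value;
public:
    // explicit: the compiler may not use this constructor to convert
    // an int to a Millivolts behind the programmer's back.
    explicit Millivolts(int v) : _value(v) {}
    int value() const { return _value; }
};

int readBack(Millivolts mv) { return mv.value(); }

// readBack(42);              // rejected: no implicit int -> Millivolts conversion
// readBack(Millivolts(42));  // accepted: the conversion is spelled out
```

If the explicit keyword were removed, the first commented-out call would compile and quietly construct a temporary Millivolts from the bare integer.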

Why the restriction on user-defined copy constructors? Well, if you don’t declare a copy constructor, then the compiler will generate one, which will just copy the object field-by-field. This is the obvious semantics that we expect when copying an object. The usual purpose of user-defined copy constructors is to perform side-effects related to memory management. For example, you might write a copy constructor that, instead of copying a pointer field, creates a fresh copy of the object pointed to and points to that copy instead. That’s fine in an application that is free to use dynamically-allocated memory. But we don’t use dynamic memory allocation in critical software, except possibly during the initialisation phase. So there is no place in critical code for copy constructors of this form. On the other hand, it is occasionally useful to make it impossible to compile code that tries to copy an object of some class, which you can achieve by declaring a private copy constructor.
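A sketch of a class that forbids copying in this way. DeviceHandle and idOf are hypothetical names for this illustration; the copy constructor is declared private and deliberately never defined.

```cpp
#include <cassert>

class DeviceHandle {
    int _id;
    // Private and never defined: any attempt to copy fails to compile
    // (or, from inside the class, fails to link).
    DeviceHandle(const DeviceHandle&);
public:
    explicit DeviceHandle(int id) : _id(id) {}
    int id() const { return _id; }
};

// Passing by reference does not copy, so this is still allowed.
int idOf(const DeviceHandle& h) { return h.id(); }

// DeviceHandle a(7);
// DeviceHandle b(a);   // rejected: the copy constructor is private
// DeviceHandle c = a;  // rejected for the same reason
```

In C++11 and later the same intent can be written as a deleted copy constructor, but the private declaration works with older compilers too.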

What do we gain by using C++ classes instead of structs? The major gains are encapsulation and more predictable execution. If you follow the rules above, then you have precise control over how the data members of your objects can be modified (because the data members are declared private). You also prevent the declaration or creation of uninitialized and partly-initialized objects, which are a major cause of unpredictable execution. The rule that requires every class to have at least one constructor prevents you from declaring a variable of the class without initializing it. The rule that constructors must initialize all data members prevents you from creating partly-initialized objects. Of course, you still have to make sure that your constructor really does initialize all members – but you only have to do this in one place, not everywhere you declare a variable of that class.
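As a small illustration of the initialization rules, consider the following sketch. The class Reading is a hypothetical example, not taken from any real code base.

```cpp
#include <cassert>

class Reading {
    int _raw;
    int _scale;
public:
    // The only constructor, and it initialises every data member.
    Reading(int raw, int scale) : _raw(raw), _scale(scale) {}
    int scaled() const { return _raw * _scale; }
};

// Reading r;         // rejected: there is no default constructor,
//                    // so an uninitialised Reading cannot be declared
// Reading r(12, 3);  // accepted: both members are set before first use
```

With the equivalent struct, the first declaration would compile and leave both fields holding indeterminate values.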

Which is better for critical software: C or C++?

February 2, 2010

I’m well aware that there are many in the critical software business who will answer that question with “Neither; you should write in Ada, which is a much safer language than C or C++”. I’m not going to argue with that. What I am going to argue is that if you can’t or don’t want to use Ada, then rather than write in plain C, you should consider using a very limited subset of C++.

At this point, another group of people will be telling me that C++ isn’t suitable for critical software. The arguments they put forward include the following:

  1. C++ is a complicated language with difficult semantics, not sufficiently understood by developers, compiler writers, or even the language designers.
  2. C++ compilers are much less mature than C compilers and more difficult to write, so there is a greater chance of code generation errors.
  3. Whereas the C language standard includes a list of constructs for which behaviour is undefined or implementation-defined, there is no such list for C++; therefore it is impossible to develop a C++ subset that avoids all such constructs.

Let’s address these concerns. C++ is indeed a complicated language; but I’m not proposing that developers of safety-critical systems embrace the whole of C++, or even a large part of it. I’m suggesting instead that we allow a few selected C++ constructs to be used in what would otherwise be a C program. This is not the same as what MISRA C++ or JSF C++ do. Those standards list C++ constructs that they consider are not safe to use. I’m going to take the opposite approach and list a few C++ constructs that I believe are safe to use and offer some benefit compared to writing in plain C.

Ten years ago, concern about C++ compiler maturity was entirely justified. When we were writing the C++ code generation for Perfect Developer back in 1999, we had to work around bugs in both gcc and Microsoft C++ compilers. However, these and many other C++ compilers are now very mature. Also, if you are just using the simple C++ features that I propose, you are unlikely to hit the difficult areas where compiler bugs may still lurk – no more likely than if you use plain C.

The lack of a complete list of C++ constructs with undefined and implementation-defined behaviour is a real concern. However, in practice C++ developers avoid the dark corners of the language, and they rarely hit problems with undefined behaviour – apart from all the usual ones that are also present in C (e.g. accessing arrays out-of-bounds). There are a few areas of concern – such as the order of initialisation of static variables – but these are well-known. And I’m proposing that we use only a small fraction of the C++ language – we won’t go anywhere near those dark corners.

I hope I’ve persuaded you that if you are currently writing critical software in C, then writing in a very limited subset of C++ need be no less safe. Of course, you’ll want to make sure that you have a good-quality C++ compiler for your target hardware, and some means of enforcing the subset (we can help you with that). But what are the benefits? They are, in no particular order:

  • To let you write code that is better structured and easier to maintain;
  • To avoid some “bad” C constructs, using safer C++ constructs instead;
  • To make the code easier to verify.

I’ll go into more detail in my next post.