Archive

Archive for February, 2010

Using constrained types in C

February 26, 2010 4 comments

When writing critical software, one of the advantages cited for using Ada rather than C is that Ada lets you define constrained types, like this:

type Percentage is Integer range 0 .. 100;

Read more…

Reasoning about null-terminated strings in C/C++

February 23, 2010 Comments off

In my last post I described how ArC supports reasoning about array access, by allowing you to refer to the bounds of an array in a specification. If the code itself needs to know the size of an array, then the size is provided by other means, for example by passing it as an extra parameter. However, when using arrays of characters, standard practice in C is not to pass the number of elements, but to use a null character to indicate the end. Read more…

The Taming of the Pointer – Part 2

February 22, 2010 6 comments

In last Wednesday’s post I mentioned three ways in which pointers are troublesome in C/C++, and I introduced the null ArC keyword to mitigate one of them. Now I’ll turn to the second issue: the fact that given (say) a variable or parameter of type int*, the type does not allow us to determine whether it refers to a single int, or to an array of ints – nor, if it refers to an array, can we find how many elements the array contains. Read more…

Using strongly-typed Booleans in C and C++

February 19, 2010 10 comments

One of the glaring omissions from the original C language was provision of a Boolean type. Booleans in C are represented by integers, with false being represented as zero and true being represented as 1. When an integer value is used as a condition, all nonzero values are intrepreted as true.

Strong typing is a valuable asset when writing code – whether critical or not – because type checks can and do uncover errors. So how can we use strongly-typed Booleans in C and C++? Read more…

Taming Pointers in C/C++

February 17, 2010 Comments off

When doing verification or deep static analysis of C and C++, pointers are troublesome in several ways:

  • Zero (i.e. NULL) is an allowed value of every pointer type in the language. Occasionally we want to allow null pointers, for example in the link field of the last element of a linked list. More usually, we don’t want to allow null. Verification requires that anywhere we use the * or [ ] operator on a pointer, we can be sure that it is not null.
  • C and C++ do not distinguish between pointers to single variables and pointers to arrays. So, where we have a parameter or variable of type T*, we can’t tell whether it is supposed to point to a variable or an array. If it points to a single variable, then we mustn’t do pointer arithmetic or indexing on it. The verifier must be able to check this.
  • Array parameters in C/C++ are passed as pointers. Aside from the problem that we can’t distinguish array pointers from pointers to single variables, we also have the problem that there is no size information contained in an array pointer.
  • Anywhere we use pointers to mutable data, there is the possibility of aliasing. In other words, there may be more than one pointer to the same data. The verifier needs to take account of the fact that changes to data made through one pointer may affect the values subsequently read through another pointer.

Although pointers are less troublesome in Ada, the aliasing problem still exists. The SPARK verifiable subset of Ada handles this by banning pointers altogether. Unfortunately, this isn’t an option in a C/C++ subset for critical systems, because pointers are the only mechanism for passing parameters by reference.

I’ll deal with the issue of unwanted nullability first. One solution is to add invariants stating that particular variables of pointer type cannot be NULL. Similarly, where a function takes parameters of pointer type, we can write preconditions that these parameters cannot be NULL. Here are a couple of examples:

struct Status1 {
const char* message;
...
invariant(message != NULL)
}

void sum(int *arr, int size)
pre(arr != NULL)
{ ... }

The problem with this approach is that you need a lot of these invariants and preconditions because, more often than not, it makes no sense to allow NULL. So in ArC we take the opposite approach. We assume that pointers are not allowed to be NULL except where you say otherwise. In the above examples, you can leave out the precondition and invariant if you don’t want to allow either pointer to be NULL.

To tell ArC that a pointer is allowed to be NULL, you flag it with the null attribute, like this:

struct Status2 {
const char* null message;
...
}

void setMessage(const char * null msg) { ... }

This greatly reduces the amount of annotation needed, because the null annotation is more concise than a precondition or invariant, and it is needed less often. As you might expect, null is another macro defined in arc.h that expands to emptiness when you compile the code. Syntactically, it behaves like const or volatile.

That’s all for today – I’ll discuss how we handle the other problems with pointers later.

More reasons why C++ can be safer than C

February 12, 2010 Comments off

In previous postings I’ve explained how you can use C along with a few selected C++ features to write better and less error-prone code than you can in plain C. However, even if you don’t use any C++ constructs at all, you can still get a few benefits by compiling your C code using a C++ compiler. Here’s why:

Type of string literals: in C, a string literal has type char[]. There is nothing to prevent you from passing a string literal as a parameter to a function that writes to it. The C standard specifies that the effect of writing to a string literal is undefined, so you had better avoid it. Things are better in C++, where the type of a string literal is const char[]. If you want to use a C++ string literal in some context that eventually modifies it, you will have to cast away const-ness – which is a much easier thing for a static checker to detect.

Type of enumeration constants: in C you can declare enumerations, but the enumeration constants you declare have type int. So an enumeration declaration is just a shorthand for declaring a pile of  integer constants, and provides no additional type safety. In C++, each enumeration is a distinct type. So you can’t assign a value of type int to a variable of an enumeration type, unless you use an explicit conversion. Unfortunately, you can still assign a value of an enumeration type to a variable of type int, because enumeration types can be implicitly converted by integral promotion. However, a static checker can easily detect this implicit conversion and warn about it.

Type of character literals: In C a character literal has type int. Therefore, if you want to pass a character literal as a parameter to a function expecting a char, or you want to assign a character literal to a variable of type char, then you are performing a narrowing type conversion. Good practice states that such a conversion must be explicit, and a static checker should report any implicit int to char conversion. This is of course absurd. In C++ a character literal has type char as you would expect.

If you don’t need to keep the ability to compile your program using a C compiler, you can take advantage of a few other improvements of C++ over C. For example, you can use the // form of comment – safer than the /*..*/ form because you can see immediately where it ends (i.e. the end of the line). You can also stop writing typedef struct _Foo { … } Foo and just write struct Foo { .. } instead, because in C++ any class, struct or enum declaration defines a type name. Finally, you can stop writing those dangerous function-like macros, because the inline keyword of C++ lets you declare a proper function instead, without (for any reasonable compiler) worrying about efficiency.

Is there any risk in compiling existing C code using a C++ compiler? Well, there are a few ways in which you can write programs that have different meanings in C and C++. One is to use a character literal as the operand of sizeof ( ). Some static checkers will warn about this, and it’s banned in my safety-critical C++ subset. Another way is to declare a struct or an enum with the same name as a variable in an outer scope, and then take sizeof() that name – possible, but unlikely because types are typically declared only at file scope. See this Wikipedia article for an summary of the ways in which C++ isn’t backwards compatible with C.

Switching compilers does of course introduce risks. Fortunately, many vendors use a single compiler for both C and C++. If your C compiler also supports C++, then you can just tell it to compile in C++ mode instead of C mode. The chances are that you will be using very little functionality of that compiler that you weren’t using before – until you start using major new language features of C++. However, if for you moving to C++ compilation requires using a compiler from a different vendor, then you’ll probably want to wait for a new project before you do it.

I’ve digressed a little from the main theme of this blog, which is verification of safety-critical code written in C/C++. I’ll get back to that in my next post.

Safer explicit type conversion

February 11, 2010 4 comments

In C there is just one syntax for explicit type conversion: the cast expression, with syntax (T) e where T is a type and e is an expression. This notation covers a multitude of sins. It can be used to do any of the following, or more than one at the same time:

  • Convert one numeric type to another
  • Reinterpret a pointer to one type as a pointer to another type, an integral type as a pointer type, or vice versa
  • Remove the const or volatile attribute of the expression’s type

Although C++ still supports the C-style cast syntax, it also provides better type-conversion operators with more specialised meanings:

  • static_cast<T>(e) is provided to convert between numeric types
  • reintrepret_cast<T>(e) is provided to convert between pointer types, or between integral and pointer types
  • const_cast<T>(e) is provided to remove const and/or volatile from the type

There is also dynamic_cast<T>(e) but that is not relevant here because we’re not allowing classes to inherit from each other.

These C++ type conversion operators have significant advantages over the C cast notation. First, they make it clear what category of type conversion we intend to do. This prevents us from intending to do a type conversion in one category but accidentally doing a conversion in a more dangerous category; or inadvertantly discarding const-ness or volatility along with the conversion we intended. Second, it is easy to pick out the more dangerous categories when reviewing source code. For example, the conversions that reinterpret_cast does are generally more dangerous than the ones performed by static_cast. Third, the operand of the type conversion appears in brackets, removing the possibility of precedence errors. Fourth, you can easily search source text for instances of these type conversions.

Therefore, my “C with little bits of C++” language for developing critical systems includes the following rules:

  1. You are allowed to use the  static_cast, reinterpret_cast and const_cast features of C++.
  2. You are not allowed to use the C cast notation.

For critical system development, reinterpret_cast and const_cast should generally be avoided. If you have been writing in C and using a MISRA compliance checker, then MISRA C 2004 ruless 11.1 through 11.5 restrict what you can do with C casts having the semantics of reinterpret_cast and const_cast. To get equivalent checks wheh you use the C++ constructs, you’ll need to use a MISRA C++ compliance checker, or (preferably) ban the use of reinterpret_cast and const_cast completely.