
Using strongly-typed Booleans in C and C++

February 19, 2010

One of the glaring omissions from the original C language was provision of a Boolean type. Booleans in C are represented by integers, with false represented as zero and true as one. When an integer value is used as a condition, all nonzero values are interpreted as true.

Strong typing is a valuable asset when writing code – whether critical or not – because type checks can and do uncover errors. So how can we use strongly-typed Booleans in C and C++?

If you choose to write in a limited subset of C++ rather than C (as I have advocated in previous posts) then you are halfway there already. C++ provides a built-in bool type with values false and true. Unfortunately, there are implicit type conversions between integral types and bool in both directions, and also from pointer and enumeration types to bool. But these conversions are easily uncovered by static checking. So, in the ArC processor, we insist that all such conversions are done explicitly with static_cast. This is a stronger requirement than the MISRA C++ rules.

What if you are writing in C and can’t easily make the transition to C++? One possibility is to use C’99 rather than C’90, because C’99 does provide a Boolean type (_Bool, with bool, true and false available via stdbool.h). But C’99 compilers are not widely available.

Fortunately, it’s possible to use strongly-typed Booleans in C’90 too. We’ve written the ArC processor so that when it is run on C’90 source, it applies exactly the same rules as it does on C++ source. The ArC keyword bool is assumed to be a primitive type, with values false and true. Relational operators are assumed to yield bool, as are && and ||. Operands of && and || are required to have type bool, and so are the conditions in if statements, loop statements and conditional expressions.

When the source is not being processed by ArC, the following declaration is made visible in arc.h:

typedef enum _bool { false = 0, true = 1 } bool;

so the code compiles as normal. In practice, we usually find that clients already have their own macro or type definitions to express Booleans; so the actual mechanism we provide is a little more complicated in order to accommodate those definitions.

  1. Chris Hobbs
    February 19, 2010 at 18:07

    You say, “Strong typing is a valuable asset when writing code” and I would agree. But many people confuse “strong” typing with “static” typing. I write extensively in both C and Python (more Python than C at the moment).

    As your post points out, C is poor on the strength of its typing but rigid on typing being static. Python typing is much stronger (although even it allows you to add 3 to the value True (giving 4!) which I find a fault in the language) but, of course, encourages dynamic typing.

    What do you feel about dynamic typing? It seems to me that it makes program structure so much simpler and therefore less error-prone. But it does move some problems from compile-time to runtime.

  2. February 20, 2010 at 12:39

    The major benefit of static typing is that it makes a large class of errors detectable at compile time. Languages with dynamic typing let you write shorter code, but at the expense of not detecting these errors until run-time. In the context of safety-critical software (which is usually embedded software), we can’t afford to have run-time errors, because they may compromise the safety of the system, or require immediate system shutdown. Neither can we necessarily guarantee to find all possible run-time errors by testing. So it is better to eliminate these errors at compile time by using static typing.

    If we are doing formal verification, then in principle we could safely use dynamically-typed languages, if we prove that no run-time errors will occur. However, languages with dynamic typing are typically less efficient at run-time than languages with static typing, and normally require allocation of dynamic memory, because the amount of storage associated with a variable is not fixed at compile time. These are further reasons why they are unsuitable for use in critical embedded systems.

    Another interesting class of languages is functional languages with strong static typing and type inference, such as Haskell and F#. The use of type inference means that you do not have to declare the type of a variable, parameter or function return in most situations – the type is inferred from the way you use it, and made generic if appropriate.

  3. Chris Hobbs
    February 21, 2010 at 15:07

    I don’t, of course, want to flog any dead horses but would like to think a little more about runtime rather than compile-time faults. As I said in the last line of my previous comment, dynamic typing does tend to move problems from compile-time to runtime. The received wisdom of the ages says that this is a BAD thing but I think that that’s a view worth challenging.

    A couple of assumptions I make:

    1. Static code checking cannot find most coding errors. Most errors in code are irrelevant to the safe operation of a system and undetectable by syntactic analysis. I have no statistical evidence for saying “most”, but as I read code, most of the errors that I or other programmers seem to have made are innocuous. malloc(100) when you meant malloc(10) is clearly an error but, unless you’re very short of memory, it doesn’t lead to a fault, and it is almost impossible to find with syntactic checking.

    2. Static code checking can never avoid false positives. The lines

    unsigned p[4];
    p[4] = 1;

    (from that really depressing article on static checking in this month’s Comms of the ACM) are picked up by a lot of static code checkers as representing a problem. However, if the code is designed to test the system’s reaction to array overflow, it’s perfectly correct code.

    I believe (and your comment on my comment supports that belief) that actual and potential runtime errors arising from dynamic typing *can* be detected by static analysis.

    Runtime errors are a pain in C because of what I think is C’s most unsafe characteristic: it does not support exceptions and therefore relies on a motley collection of disgusting error codes (-1 means an error for function A, whereas it’s 0 for function B, ….) rather than a clean exception mechanism. A program can easily ignore the return code from malloc(). It could not ignore an exception being thrown by malloc().

    If C had a coherent exception handling mechanism then I firmly believe that I could accept the switch from compile-time error detection to runtime error management for the gain in simplicity brought about by dynamic typing. If the probability of failure were reduced by good static checking then any lingering doubts would be removed.

    How often have you seen a program checking the return code of printf()? Compare:

    >>> print "%d" % "this is a string"
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: %d format: a number is required, not str

    from a dynamically-typed language with

    #include <stdio.h>

    int main()
    {
        char string[24] = "hello";
        printf("%d\n", string);
        return 0;
    }

    which prints -1078867820.

    The first program, by throwing an exception, makes itself impossible to ignore. The second, by just giving a weak “warning” during compilation (which, like most programmers, I ignored) is easy to miss.

    Give me the runtime failure rather than the compile-time warning any time.

    Sorry to rant on so long. I think that there’s a lot of received wisdom we need to challenge to get to the goal of safe software.

  4. February 22, 2010 at 06:19

    Chris, while discussing languages and type systems is interesting, the theme of this blog is using C/C++ safely in critical software, so I’ll be brief in my reply. I cannot agree with your assertion that most errors in code are irrelevant to the safe operation of the system – where is your evidence for that? One common coding error in C/C++ is failing to ensure that arrays cannot be indexed out-of-bounds. This type of coding error has allowed viruses to take over millions of PCs, which now send you and me spam emails every day (or worse).

    False positives from static checkers are of course a nuisance, and need to be minimised. However, your example is not a good one, because the effect of indexing an array out of bounds in C/C++ is UNDEFINED according to the language standards, so I have no right to expect any reasonable behaviour if I do.

    Regarding exceptions vs. error codes, you ask how often I see a program checking the result of printf. If it’s one of my programs, the answer is: always. How often have you seen a program that catches all exceptions that may be thrown, each at an appropriate point? I do prefer exceptions to error codes; but leaving an exception uncaught so that the runtime reports it and stops the program is not an option in critical embedded software.

    As for your printf example, the compile-time warning is MUCH more valuable than the run-time error. We won’t ship the program until we correct the code to eliminate the warning. Suppose we didn’t have the warning, and during testing it just happened that we didn’t try any inputs that cause the run-time error. Then we ship the software with the bug still present… and end up having to recall 100,000 cars when users start experiencing the bug. There may be contexts in which dynamic typing and run-time error reporting are appropriate, but critical embedded software isn’t one of them.

  5. AnonCSProf
    March 8, 2010 at 03:58

    Forcing programmers to use static_cast to cast from integer (or pointer) to boolean is unfortunate. It is not the C way. It is idiomatic C to write something like:

    rv = call_something();
    if (rv)
        return rv;

    or

    char *p = malloc(…);
    if (!p)
        die("out of memory");

    Forbidding these comes at a significant cost to readability and familiarity. Why is this necessary?

    Second complaint: I’ve seen lots of bad code that uses an enum that sets true=1. I can’t tell you how much code I’ve seen that does something like:

    if (x == true)
        do_something();
    if (x != true)
        do_something_else();

    Do you see the bug? If x==2, then badness ensues. Perhaps your static verifier takes care of this, but I view any code that #defines true to 1 (or uses an enum to set true to 1) as giving off bad smells. It seems a shame if your system requires programmers to follow this bad-smelling idiom.

  6. March 8, 2010 at 09:01

    I agree, requiring explicit casts from integers and pointers to Boolean is not the traditional C way. The question is, which is the right way when developing critical software? The designers of Java and C# (both based on C syntax) evidently decided that implicit type conversion to bool wasn’t a good idea at all. The authors of MISRA C also decided to ban implicit conversion to bool in most contexts – see rules 12.6 and 13.2. I concur with these views. In my opinion, K&R got it wrong (along with several other things, such as implicit narrowing type conversions). I don’t agree that forbidding “if (!p)” comes at a significant cost to readability. As for familiarity, C programmers who move to the world of critical software development need to drop lots of bad habits anyway.

    Regarding your second complaint, I have two answers: (1) comparisons with ‘true’ and ‘false’ are redundant and should generate a warning; the two conditions should be written as (x) and (!x); (2) if x has type bool, then it can’t have the value 2. Enforcing either of these requires that support for strongly-typed Booleans is provided either by the language or by a separate program like ArC.

  7. Chris Hobbs
    March 9, 2010 at 21:44

    I would hesitate over the strength of the rule about not comparing with ‘true’ and ‘false’. It seems to work OK with a variable called x, but then, for readability, we have to require that a Boolean variable’s name express a predicate:

    isComplete
    isNotChecked

    Then the if statements without an explicit comparison with ‘true’ or ‘false’ are OK as long as they’re not inverted:

    if (isComplete) // OK
    if (isNotChecked) // OK
    if (!isNotChecked) // NOK: too easy to misread
    if (isNotChecked == false) // Really nasty but better than the above

    • March 10, 2010 at 10:08

      I agree, the double negative makes if(!isNotChecked) difficult to read. My preference is to avoid Boolean variables with “Not” in the name, i.e. I would use a Boolean called isChecked instead of one called isNotChecked.

  8. Jim Ronback
    February 1, 2013 at 07:06

    Suppose you wanted safety-critical Boolean variables to be less susceptible to single-bit hardware errors caused by cosmic rays, e.g. heavy energetic particles that can flip a bit. What would be the best way to encode a safety-critical Boolean variable so that logic operations on it remain easy to do correctly, while an invalid representation with one or more flipped bits can still be detected before being used?

    • February 1, 2013 at 13:14

      I think I’d use an enumeration type instead of a Boolean, with enumeration constants True and False whose values differ by several bits. Then I’d use a switch statement everywhere that I wanted to test the variable, with a default-case to handle the error condition. Of course, this is likely to lead eCv (and some static analysis programs) to report that the default-case is dead code.
