Review

Created at: 2024-07-20

I've been meaning to read other books from Kernighan for a while after reading K&R's C programming book and "Unix a Memoir". The rational is simple: I like his direct and simple writing and he has worked with some very important historical figures in computer science. Also Rob Pike, who co-authored this book, is known for his work on the Go language, and before that, for working at Bell Labs.

This book is relatively old (1999), but I had a sense that much of the advice would still be relevant. That was correct. I would put this book in the "Classics" category of Computer Science for sure.

Summary Page

This is the last page of the book, which summarises the advice:

The following is a list of highlights from the book that I'll likely reference again:

Naming things

Much information comes from context and scope; the broader the scope of a variable, the more information should be conveyed by its name. Use descriptive names for globals, short names for locals. Global variables, by definition, can crop up anywhere in a program, so they need names long enough shorter names suffice for local variables; within a function, n may be sufficient, npoints is fine, and numberOfPoints is overkill.

Use active names for functions. Function names should be based on active verbs, perhaps followed by nouns: Functions that return a boolean (true or false) value should be named so that the return value is unambiguous. Conditional expressions that include negations are always hard to understand Use the natural form for expressions. Write expressions as you might speak them aloud. Conditional expressions that include negations are always hard to understand

Macros

Even parenthesizing the macro properly does not address the multiple evaluation problem. If an operation is expensive or common enough to be wrapped up, use a function.

Define numbers as constants, not macros. C programmers have traditionally used #define to manage magic number values.

And macros are a dangerous way to program because they change the lexical structure of the program underfoot. Let the language proper do the work. In C and C++, integer constants can be defined with an enum statement, C also has const values but they cannot be used as array bounds, so the enum statement remains the method of choice in C.

Sizeof

For similar reasons, sizeof(array[0]) may be better than sizeof(int) because it’s one less thing to change if the type of the array changes.

Sizes of data types. The sizes of basic data types in C and C++ are not defined; other than the basic rules that sizeof(char) ≤ sizeof(short) ≤ sizeof(int) ≤ sizeof(long)

Arrays

In C and C++, a parameter that is an array of strings can be declared as char *array[] or char **array. Although these forms are equivalent, the first makes it clearer how the parameter will be used.

Casting

The return value of realloc does not need to be cast to its final type because C promotes the void* automatically. But C++ does not; there the cast is required. One can argue about whether it is safer to cast (cleanliness, honesty) or not to cast (the cast can hide genuine errors). We chose to cast because it makes the program legal in both C and C++; the price is less error-checking from the C compiler, but that is offset by the extra checking available from using two compilers.

Interfaces and Libraries

This is a pervasive and growing concern in software: as libraries, interfaces, and tools become more complicated, they become less understood and less controllable. When everything works, rich programming environments can be very productive, but when they fail, there is little recourse. Indeed, we may not even realize that something is wrong if the problems involve performance or subtle logic errors.

Without these principles, the result is often the sort of haphazard interfaces that frustrate and impede programmers every day.

Global Variables

Avoid global variables; wherever possible it is better to pass references to all data through function arguments.

Memory and Allocation

Free a resource in the same layer that allocated it. One way to control resource allocation and reclamation is to have the same library, package, or interface that allocates a resource be responsible for freeing it. Another way of saying this is that the allocation state of a resource should not change across the interface. Our CSV libraries read data from files that have already been opened, so they leave them open when they are done. The caller of the library needs to close the files.

A few machines allow ints to be stored on odd boundaries, but most demand that an n-byte primitive data type be stored at an n-byte boundary, for example that doubles, which are usually 8 bytes long, are stored at addresses that are multiples of 8. On top of this, the compiler writer may make further adjustments, such as forcing alignment for performance reasons.

Errors

Detect errors at a low level, handle them at a high level. As a general principle, errors should be detected at as low a level as possible, but handled at a high level. In most cases, the caller should determine how to handle an error, not the callee.

Exceptions should not be used for handling expected return values. Reading from a file will eventually produce an end of file; this should be handled with a return value, not by an exception.

Exceptions are often overused. Because they distort the flow of control, they can lead to convoluted constructions that are prone to bugs. It is hardly exceptional to fail to open a file; generating an exception in this case strikes us as over-engineering. Exceptions are best reserved for truly unexpected events, such as file systems filling up or floating-point errors.

Follow up Recommendations

One practical book based on hard-won experience is Large-Scale C++ Software Design by John Lakos (Addison-Wesley, 1996), which discusses how to build and manage truly large C++ programs. David Hanson’s C Interfaces and Implementations (Addison-Wesley, 1997) is a good treatment for C programs.

Practices

Read before typing. One effective but under-appreciated debugging technique is to read the code very carefully and think about it for a while without making changes. There’s a powerful urge to get to the keyboard and start modifying the program to see if the bug goes away.

Portability

The char type in C and C++ may be signed or unsigned, and need not even have exactly 8 bits. Leaving such issues up to the compiler writer may allow more efficient implementations and avoid restricting the hardware the language will run on, at the risk of making life harder for programmers.

Brand new features such as // comments and complex in C, or features specific to one architecture such as the keywords near and far, are guaranteed to cause trouble. If a feature is so unusual or unclear that to understand it you need to consult a “language lawyer”—an expert in reading language definitions—don’t use it.

Sign

Signedness of char. In C and C++, it is not specified whether the char data type is signed or unsigned. This can lead to trouble when combining chars and ints, such as in code that calls the int-valued routine getchar().

Even if char is signed, however, the code isn’t correct. The comparison will succeed at EOF, but a valid input byte of 0xFF will look just like EOF and terminate the loop prematurely. So regardless of the sign of char, you must always store the return value of getchar in an int for comparison with EOF.

Side Effects

Don’t use side effects except for a very few idiomatic constructions like a[i++] = 0;

Boundaries

Don’t compare a char to EOF. Always use sizeof to compute the size of types and objects. Never right shift a signed value. Make sure the data type is big enough for the range of values you are storing in it.

Preprocessor

regular if statement with a constant condition may work just as well (as ifdef): enum { DEBUG = 0 };

if (DEBUG) { printf(...); } If DEBUG is zero, most compilers won’t generate any code for this, but they will check the syntax of the excluded code. An #ifdef, by contrast, can conceal syntax errors that will prevent compilation if the #ifdef is later enabled.

\r and \n

There is one continuing irritation with exchanging text: PC systems use a carriage return '\r' and a newline or line-feed '\n' to terminate each line, while Unix systems use only newline. The carriage return is an artifact of an ancient device called a Teletype that had a carriage-return (CR) operation to return the typing mechanism to the beginning of a line, and a separate line-feed operation (LF) to advance it to the next line.

Endianess

It is not safe to write an int (or short or long) from one computer and read it as an int on another computer. For example, if the source computer writes with fread(&x, sizeof(x), 1, stdin); the value of x will not be preserved if the machines have different byte orders. If x starts as 0x1000 it may arrive as 0x0010.