The code documentation fallacy

Say you start work on a new code base. Would you, as a user, rather have 90% or 10% of its API functions commented with Doxygen or something similar?

Using my psychic powers I suspect that you chose 90%.

It seems like the obvious thing. Lots of companies even have a mandate that all API functions (or >90% of them) must be documented. Not having comments is just bad. This seems like a perfectly obvious no-brainer issue.

But is it really?

Unfortunately there are some problems with this assumption. The main one being that the comments will be written by human beings. What they probably end up being is something like this.

/*
 * Takes a foobar and frobnicates it.
 *
 * @param f the foobar to be frobnicated.
 * @param strength how strongly to frobnicate.
 * @return the frobnicated result.
 */
int frobnicate_foobar(Foobar f, int strength);

This something I like to call documentation by word order shuffle. Now we can ask the truly relevant question: what additional information does this kind of a comment provide?

The answer is, of course, absolutely nothing. It is only noise. No, actually it is even worse: it is noise that has a very large probability of being wrong. When some coder changes the function, it is very easy to forget to update the comments.

On the other hand, if only 10% of the functions are documented, most functions don’t have any comments, but the ones that do probably have something like this:

/** 
 * The Foobar argument must not have been initialized in a different
 * thread because that can lead to race conditions.
 */ 
int frobnicate_foobar(Foobar f, int strength)

This is the kind of comment that is actually useful. Naturally it would be better to check for the specified condition inside the function but sometimes you can’t. Having it in a comment is the right thing to do in these cases. Not having tons of junk documentation makes these kinds of remarks stand out. This means, paradoxically, that having less comments leads to better documentation and user experience.

As a rough estimate, 95% of functions in any code base should be so simple and specific that their signature is all you need to use them. If they are not, API design has failed: back to the drawing board.

Dealing with bleeding edge software with btrfs snapshots

Developing with the newest of the new packages is always a bit tricky. Every now and then they break in interesting ways. Sometimes they corrupt the system so much that downgrading becomes impossible. Extreme circumstances may corrupt the system’s package database and so on. Traditionally fixing this has meant reinstalling the entire system, which is unpleasant and time consuming. Fortunately there is now a better way: snapshotting the system with btrfs.

The following guide assumes that you are running btrfs as your root file system. Newest quantal can boot off of btrfs root, but there may be issues, so please read the documentation in the wiki.

The basic concept in snapshotting your system is called a subvolume. It is kind of like a subpartition inside the main btrfs partition. By default Ubuntu’s installer creates a btrfs root partition with two subvolumes called @ and @home. The first one of these is mounted as root and the latter as the home directory.

Suppose you are going to do something really risky, and want to preserve your system. First you mount the raw btrfs partition somewhere:

sudo mkdir /mnt/root
sudo mount /dev/sda1 /mnt/root
cd /mnt/root

Here /dev/sda1 is your root partition. You can mount it like this even though the subvolumes are already mounted. If you do an ls, you see two subdirectories, @ and @home. Snapshotting is simple:

sudo btrfs subvolume snapshot @ @snapshot-XXXX

This takes maybe on second and when the command returns the system is secured. You are now free to trash your system in whatever way you want, though you might want to unmount /mnt/root so you don’t accidentally destroy your snapshots.

Restoring the snapshot is just as simple. Mount /mnt/root again and do:

sudo mv @ @broken
sudo subvolume snapshot @snapshot-XXXX @

If you are sure you don’t need @snapshot-XXX any more, you can just rename it @. You can do this even if you are booted in the system, i.e. are using @ as your current system root fs.

Reboot your machine and your system has been restored to the state it was when running the snapshot command. As an added bonus your home directory does not rollback, but retains all changes made during the trashing, which is what you want most of the time. If you want to rollback home as well, just snapshot it at the same time as the root directory.

You can get rid of useless and broken snapshots with this command:

sudo btrfs subvolume delete @useless-snapshot

You can’t remove subvolumes with rm -r, even if run with sudo.

Relative popularity of build systems

The conventional wisdom in build systems is that GNU Autotools is the one true established standard and other ones are used only rarely.

But is this really true?

I created a script that downloads all original source packages from Ubuntu’s main pool. If there were multple versions of the same project, only the newest was chosen. Then I created a second script that goes through those packages and checks what build system they actually use. Here’s the breakdown:

CMake:           348     9%
Autofoo:        1618    45%
SCons:            10     0%
Ant:             149     4%
Maven:            41     1%
Distutil:        313     8%
Waf:               8     0%
Perl:            341     9%
Make(ish):       351     9%
Customconf:       45     1%
Unknown:         361    10%

Here Make(ish) means packages that don’t have any other build system, but do have a makefile. This usually indicates building via custom makefiles. Correspondingly customconf is for projects that don’t have any other build system, but have a configure file. This is usually a handwritten shell or Python file.

This data is skewed by the fact that the pool is just a jumble of packages. It would be interesting to run this analysis separately for precise, oneiric etc to see the progression over time. For truly interesting results you would run it against the whole of Debian.

The relative popularity of CMake and Autotools is roughly 20/80. This shows that Autotools is not the sole dominant player for C/C++ it once was. It’s still far and away the most popular, though.

The unknown set contains stuff such as Rake builds. I simply did not have time to add them all. It also has a lot of fonts, which makes sense, since you don’t really build those in the traditional sense.

The scripts can be downloaded here. A word of warning: to run the analysis you need to download 13 GB of source. Don’t do it just for the heck of it. The parser script does not download anything, it just produces a list of urls. Download the packages with wget -i.

Some orig packages are compressed with xz, which Python’s tarfile module can’t handle. You have to repack them yourself prior to running the analysis script.

API design fail school is in session

Today’s API design fail case study is Iconv. It is a library designed to convert text from one encoding to another. The basic API is very simple as it has only three function calls. Unfortunately two of them are wrong.

Let’s start with the initialisation. It looks like this:

iconv_t iconv_open(const char *tocode, const char *fromcode);

Having the tocode argument before the fromcode argument is wrong. It goes against the natural ordering that people have of the world. You convert from something to something and not to something from something. If you don’t believe me, go back and read the second sentence of this post. Notice how it was completely natural to you and should you try to change the word order in your mind, it will seem contrived and weird.

But let’s give this the benefit of the doubt. Maybe there is a good reason for having the order like this. Suppose the library was meant to be used only by people writing RPN calculators in Intel syntax assembly using the Curses graphics library. With that in mind, let’s move on to the second function as it is described in the documentation.

size_t iconv (iconv_t cd, const char* * inbuf, size_t * inbytesleft,
 char* * outbuf, size_t * outbytesleft); 

In this function the order is the opposite: source comes before target. Having the order backwards is bad, but having an inconsistent API such as this is inexcusable.

But wait, there is more!

If you look at the actual installed header file, this is not the API it actually provides. The second argument is not const in the implementation. So either you strdup your input string to keep it safe or cast away your const and hope/pray that the implementation does not fiddle around with it.

The API function is also needlessly complex, taking pointers to pointers and so on. This makes the common case of I have this string here and I want to convert it to this other string here terribly convoluted. It causes totally reasonable code like this to break.

char *str = read_from_file_or_somewhere();
iconv(i, &str, size_str, &outbuf, size_outbuf);

Iconv will change where str points to and if it was your only pointer to the data (which is very common) you have just lost access to it. To get around this you have to instantiate a new dummy pointer variable and pass that to iconv. If you don’t and try to use the mutilated pointers to, say, deallocate a temporary buffer you get interesting and magical crashes.

Passing the conversion types to iconv_open as strings is also tedious. You can never tell if your converter will work or not. If it fails, Iconv will not tell you why. Maybe you have a typo. Maybe this encoding has magically disappeared in this version. For this reason the encoding types should be declared in an enum. If there are very rare encodings that don’t get built on all platforms, there should be a function to query their existence.

A better API for iconv would take the current conversion function and rename it to iconv_advanced or something. The basic iconv function (the one 95% of people use 95% of the time) should look something like this:

int iconv(encoding fromEncoding, encoding toEncoding,
  errorBehaviour eb,
  const char *source, size_t sourceSize,
  char *target, size_t targetSize);

ErrorBehaviour tells what to do when encountering errors (ignore, stop, etc). The return value could be total number of characters converted or some kind of an error code. Alternatively it could allocate the target buffer by itself, possibly with a user defined allocator function.

The downside of this function is that it takes 7 arguments, which is a bit too much. The first three could be stored in an iconv_t type for clarity.

Scream if you want to go faster (with C++)!

We all know that compiling C++ is slow.

Fewer people know why, or how to make it faster. Other people do, for example the developers at Remedy made the engine of Alan Wake compile from scratch in five minutes. The payoff for this is increased productivity, because the edit-compile-run cycle gets dramatically faster.

There are several ways to speed up your compiles. This post looks at reworking your #includes.

Quite a bit of C++ compilation time is spent parsing headers for STL, Qt and whatever else you may be using. But how long does it actually take?

To find out, I wrote a script to generate C++ source. You can download it here. What it does is generate source files that have some includes and one dummy function. The point is to simulate two different use cases. In the first each source file includes a random subset of the includes. One file might use std::map and QtCore, another one might use Boost’s strings and so on. In the second case all possible includes are put in a common header which all source files include. This simulates “maximum developer convenience” where all functions are available in all files without any extra effort.

To generate the test data, we run the following commands:

mkdir good bad
./generate_code.py --with-boost --with-qt4 good
./generate_code.py --with-boost --with-qt4 --all-common bad

Compilation is straightforward:

cd good; cmake .; time make; cd ..
cd bad; cmake .; time make; cd ..

By default the script produces 100 source files. When the includes are kept in individual files, compiling takes roughly a minute. When they are in a common header, it takes three minutes.

Remember: the included STL/Boost/Qt4 functionality is not used in the code. This is just the time spent including and parsing their headers. What this example shows is that you can remove 2 minutes of your build time, just by including C++ headers smartly.

The delay scales linearly. For 300 files the build times are 2 minutes 40 seconds and 7 minutes 58 seconds. That’s over five minutes lost on, effectively, no-ops. The good news is that getting rid of this bloat is relatively easy, though it might take some sweat.

  1. Never include any (internal) header in another header if you can use a forward declaration. Include the header in the implementation file.
  2. Never include system headers (STL, etc) in your headers unless absolutely necessary, such as due to inheritance. If your class uses e.g. std::map internally, hide it with pImpl. If your class API requires these headers, change it so that it doesn’t or use something more lightweight (e.g. std::iterator instead of std::vector).
  3. Never, never, ever include system stuff in your public headers. That slows down not just your own compilation time, but also every single user of your library. The only exception is when your library is a plugin or extension to an existing library and even then your includes need to be minimal.

Speed bumps hide in places where you least expect them

The most common step in creating software is building it. Usually this means running make or equivalent and waiting. This step is so universal that most people don’t even think about it actively. If one were to see what the computer is doing during build, one would see compiler processes taking 100% of the machine’s CPUs. Thus the system is working as fast as it possibly can.

Right?

Some people working on Chromium doubted this and built their own replacement of Make called Ninja. It is basically the same as Make: you specify a list of dependencies and then tell it to build something. Since Make is one of the most used applications in the world and has been under development since the 70s, surely it is as fast as it can possibly be done.

Right?

Well, let’s find out. Chromium uses a build system called Gyp that generates makefiles. Chromium devs have created a Ninja backend for Gyp. This makes comparing the two extremely easy.

Compiling Chromium from scratch on a dual core desktop machine with makefiles takes around 90 minutes. Ninja builds it in less than an hour. A quad core machine builds Chromium in ~70 minutes. Ninja takes ~40 minutes. Running make on a tree with no changes at all takes 3 minutes. Ninja takes 3 seconds.

So not only is Ninja faster than Make, it is faster by a huge margin and especially on the use case that matters for the average developer: small incremental changes.

What can we learn from this?

There is an old (and very wise) saying that you should never optimize before you measure. In this case the measurement seemed to indicate that nothing was to be done: CPU load was already maximized by the compiler processes. But sometimes your tools give you misleading data. Sometimes they lie to you. Sometimes the “common knowledge” of the entire development community is wrong. Sometimes you just have to do the stupid, irrational, waste-of-time -thingie.

This is called progress.

PS I made quick-n-dirty packages of a Ninja git checkout from a few days ago and put them in my PPA. Feel free to try them out. There is also an experimental CMake backend for Ninja so anyone with a CMake project can easily try what kind of a speedup they would get.

Solution to all API and ABI mismatch issues

One of the most annoying things about creating shared libraries for other people to use is API and ABI stability. You start going somewhere, make a release and then realize that you have to totally change the internals of the library. But you can’t remove functions, because that would break existing apps. Nor can you change structs, the meanings of fields or any other maintenance task to make your job easier. The only bright spot in the horizont is that eventually you can do a major release and break compatibility.

We’ve all been there and it sucks. If you choose to ignore stability because, say, you have only a few users who can just recompile their stuff, you get into endless rebuild cycles and so on. But what if there was a way to eliminate all this in one, swift, elegant stroke?

Well, there is.

Essentially every single library can be reduced to one simple function call that looks kind of like this.

library_result library_do(const char *command, library_object *obj, ...)

The command argument tells the library what to do. The arguments tell it what to do it to and the result tells what happened. Easy as pie!

So, to use a car analogy, here’s an example of how you would start a car.

library_object *car;
library_result result = library_do("initialize car", NULL);
car = RESULT_TO_POINTER(result);
library_do("start engine", car);
library_do("push accelerometer", car);

Now you have a moving car and you have also completely isolated the app from the library using an API that will never need to be changed. It is perfectly forwards, backwards and sideways compatible.

And it gets better. You can query capabilities on the fly and act accordingly.

if(RESULT_TO_BOOLEAN(library_do("has automatic transmission", car))
  do_something();

Dynamic detection of features and changing behavior based on them makes apps work with every version of the library ever. The car could even be changed into a moped, tractor, or a space shuttle and it would still work.

For added convenience the basic commands could be given as constant strings in the library’s header file.

Deeper analysis

If you, dear reader, after reading the above text thought, even for one microsecond, that the described system sounds like a good idea, then you need to stop programming immediately.

Seriously!

Take your hands away from the keyboard and just walk away. As an alternative I suggest taking up sheep farming in New Zealand. There’s lots of fresh air and a sense of accomplishment.

The API discussed above is among the worst design abominations imaginable. It is the epitome of Making My Problem Your Problem. Yet variants of it keep appearing all the time.

The antipatterns and problems in this one single function call would be enough to fill a book. Here are just some of them.

Loss of type safety

This is the big one. The arguments in the function call can be anything and the result can be anything. So which one of the following should you use:

library_do("set x", o, int_variable);
library_do("set x", o, &int_variable);
library_do("set x", o, double_variable);
library_do("set x", o, &double_variable);
library_do("set x", o, value_as_string)

You can’t really know without reading the documentation. Which you have to do every single time you use any function. If you are lucky, the calling convention is the same on every function. It probably is not. Since the compiler does not and can not verify correctness, what you essentially have is code that works either by luck or faith.

The only way to know for sure what to do is to read the source code of the implementation.

Loss of tools

There are a lot of nice tools to help you. Things such as IDE code autocompletion, API inspectors, Doxygen, even the compiler itself as discussed above.

If you go the generic route you throw away all of these tools. They account for dozens upon dozens of man-years just to make your job easier. All of that is gone. Poof!

Loss of debuggability

One symptom of this disease is putting data in dictionaries and other high level containers rather than variables to “allow easy expansion in the future”. This is workable in languages such as Java or Python, but not in C/C++. Here is screengrab from a gdb session demonstrating why this is a terrible idea:

(gdb) print map
$1 = {_M_t = {
    _M_impl = {<std::allocator<std::_Rb_tree_node<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > >> = {<__gnu_cxx::new_allocator<std::_Rb_tree_node<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > >> = {<No data fields>}, <No data fields>},
      _M_key_compare = {<std::binary_function<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool>> = {<No data fields>}, <No data fields>}, _M_header = {_M_color = std::_S_red, _M_parent = 0x607040,
        _M_left = 0x607040, _M_right = 0x607040}, _M_node_count = 1}}}

Your objects have now become undebuggable. Or at the very least extremely cumbersome, because you have to dig out the information you need one tedious step at a time. If the error is non-obvious, it’s source code diving time again.

Loss of performance

Functions are nice. They are type-safe, easy to understand and fast. The compiler might even inline them for you. Generic action operators are not.

Every single call to the library needs to first go through a long if/else tree to inspect which command was given or do a hash table lookup or something similar. This means that every single function call turns into a a massive blob of code that destroys branch prediction and pipelining and all those other wonderful things HW engineers have spent decades optimizing for you.

Loss of error-freeness

The code examples above have been too clean. They have ignored the error cases. Here’s two lines of code to illustrate the difference.

x = get_x(obj); // Can not possibly fail
status = library_do("get x", obj); // Anything can happen

Since the generic function can not provide any guarantees the way a function can, you have to always inspect the result it provides. Maybe you misspelled the command. Maybe this particular object does not have an x value. Maybe it used to but the library internals have changed (which was the point of all this, remember?). So the user has to inspect every single call even for operations that can not possibly fail. Because they can, they will, and if you don’t check, it is your fault!

Loss of consistency

When people are confronted with APIs such as these, the first thing they do is to write wrapper functions to hide the ugliness. Instead of a direct function call you end up with a massive generic invocation blob thingie that gets wrapped in a function call that is indistinguishable from the direct function call.

The end result is an abstraction layer covered by an anti-abstraction layer; a concretisation layer, if you will.

Several layers, actually, since every user will code their own wrapper with their own idiosyncrasies and bugs.

Loss of language features

Let’s say you want the x and y coordinates from an object. Usually you would use a struct. With a generic getter you can not, because a struct implies memory layout and thus is a part of API and ABI. Since we can’t have that, all arguments must be elementary data types, such as integers or strings. What you end up with are constructs such as this abomination here (error checking and the like omitted for sanity):

obj = RESULT_TO_POINTER(library_do("create FooObj", NULL);
library_do("set constructor argument a", obj, 0);
library_do("set constructor argument b", obj, "hello");
library_do("set constructor argument c", obj, 5L);
library_do("run constructor", obj)

Which is so much nicer than

object *obj = new_object(0, "hello", 5); // No need to cast to Long, the compiler does that automatically.

Bonus question: how many different potentially failing code paths can you find in the first code snippet and how much protective code do you need to write to handle all of them?

Where does it come from?

These sorts of APIs usually stem from their designers’ desire to “not limit choices needlessly”, or “make it flexible enough for any change in the future”. There are several different symptoms of this tendency, such as the inner platform effect, the second system effect and soft coding. The end result is usually a framework framework framework.

How can one avoid this trap? There is really no definitive answer, but there is a simple guideline to help you get there. Simply ask yourself: “Is this code solving the problem at hand in the most direct and obvious way possible?” If the answer is no, you probably need to change it. Sooner rather than later.

Things just working

I have a Macbook with a bcm4331 wireless chip that has not been supported in Linux. The driver was added to kernel 3.2. I was anxious to test this when I upgraded to precise.

After the update there was no net connection. The network indicator said “Missing firmware”. So I scoured the net and found the steps necessary to extract the firmware file to the correct directory.

I typed the command and pressed enter. That exact second my network indicator started blinking and a few seconds later it had connected.

Without any configuration, kernel module unloading/loading or “refresh state” button prodding.

It just worked. Automatically. As it should. And even before it worked it gave a sensible and correct error message.

To whoever coded this functionality: I salute you.

More uses for btrfs snapshots

I played around with btrfs snapshots and discovered two new interesting uses for them. The first one deals with unreliable operations. Suppose you want to update a largish SVN checkout but your net connection is slightly flaky. The reason can be anything, bad wires, overloaded server, electrical outages, and so on.

If SVN is interrupted mid-transfer, it will most likely leave your checkout in a non-consistent state that can’t be fixed even with ‘svn cleanup’. The common wisdom on the Internet is that the way to fix this is to delete or rename the erroneous directory and do a ‘svn update’, which will either work or not. With btrfs snapshots you can just do a snapshot of your source tree before the update. If it fails, just nuke the broken directory and restore your snapshot. Then try again. If it works, just get rid of the snapshot dir.

What you essentially gain are atomic operations on non-atomic tasks (such as svn update). This has been possible before with ‘cp -r’ or similar hacks, but they are slow. Btrfs snapshots can be done in the blink of an eye and they don’t take extra disk space.

The other use case is erroneous state preservation. Suppose you hack on your stuff and encounter a crashing bug in your tools (such as bzr or git). You file a bug on it and then get back to doing your own thing. A day or two later you get a reply on your bug report saying “what is the output of command X”. Since you don’t have the given directory tree state around any more, you can’t run the command.

But if you snapshot your broken tree and store it somewhere safe, you can run any analysis scripts on it any time in the future. Even possibly destructive ones, because you can always run the analysis scripts in a fresh snapshot. Earlier these things were not feasible because making copies took time and possibly lots of space. With snapshots they don’t.

Fun stuff with btrfs

I work on, among other things, Chromium. It uses SVN as its revision control system. There are several drawbacks to this, which are well known (no offline commits etc). They are made worse by Chromium’s enormous size. An ‘svn update’ can easily take over an hour.

Recently I looked into using btrfs’s features to make things easier. I found that with very little effort you can make things much more workable.

First you create a btrfs subvolume.

btrfs subvolume create chromium_upstream

Then you check out Chromium to this directory using the guidelines given in their wiki. Now you have a pristine upstream SVN checkout. Then build it once. No development is done in this directory. Instead we create a new directory for our work.

btrfs subvolume snapshot chromium_upstream chromium_feature_x

And roughly three seconds later you have a fresh copy of the entire source tree and the corresponding build tree. Any changes you make to individual files in the new directory won’t cause a total rebuild (which also takes hours). You can hack with complete peace of mind knowing that in the event of failure you can start over with two simple commands.

sudo btrfs subvolume delete chromium_feature_x
btrfs subvolume snapshot chromium_upstream chromium_feature_x

Chromium upstream changes quite rapidly, so keeping up with it with SVN can be tricky. But btrfs makes it easier.

cd chromium_upstream
gclient sync # Roughly analogous to svn update.
cd ..
btrfs subvolume snapshot chromium_upstream chromium_feature_x_v2
cd chromium_feature_x/src && svn diff > ../../thingy.patch && cd ../..
cd chromium_feature_x_v2/src && patch -p0 < ../../thingy.patch && cd ../..
sudo btrfs subvolume delete chromium_feature_

This approach can be taken with any tree of files: images, even multi-gigabyte video files. Thanks to btrfs’s design, multiple copies of these files take roughly the same amount of disk space as only one copy. It’s kind of like having backup/restore and revision control built into your file system.