Canonical Voices

Jussi Pakkanen

Every day computer users do a variation of a simple task: selecting an element from a list of choices. Examples include installing packages with ‘apt-get install packagename’, launching an application from the Dash with its name, selecting your country from a list on web pages and so on.

The common element in all these use cases is intolerance for errors. If you have even one typo in your text, the correct choice will not be found. The only way around this is to erase the query and type it again from scratch. This is something people have learned to do without thinking.

It’s not very good usability, though. If the user searches for, say, Firefox by typing “friefox” by accident, surely the computer should be able to detect what the user meant and offer that as an alternative.

The first user-facing program in Ubuntu to offer this kind of error tolerance was the HUD. It used the Levenshtein distance as a way of determining user intent. In computer science terminology this is called approximate string matching or, informally, fuzzy matching.

Once the HUD was deployed, the need to have this kind of error correction everywhere became apparent. Thus we set out to create a library that makes error tolerant matching easy to embed. This library is called libcolumbus.

Technical info

Libcolumbus has been designed with the following goals in mind:

  • it must be small
  • it must be fast
  • it must be easy to embed
  • it is optimized for online typing

The last of these means that you can do queries at any time, even if the user is still typing.

At the core of libcolumbus is the Levenshtein distance algorithm. It is a well known and established way of doing fuzzy matching. Implementations are used in lots of different places, ranging from heavy duty document retrieval engines such as Lucene and
Xapian all the way down to Bash command completion. There is even a library that does fuzzy regexp matching.
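The core dynamic program is compact. As an illustrative sketch (not libcolumbus's actual implementation), a plain C++ version looks like this:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Classic dynamic-programming Levenshtein distance: the minimum
// number of single-character inserts, deletes and substitutions
// needed to turn string a into string b. Uses two rolling rows
// instead of the full (m+1)x(n+1) table.
int levenshtein(const std::string &a, const std::string &b) {
    std::vector<int> prev(b.size() + 1), cur(b.size() + 1);
    for (size_t j = 0; j <= b.size(); ++j) prev[j] = j;  // b from empty a
    for (size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;  // delete all of a's first i characters
        for (size_t j = 1; j <= b.size(); ++j) {
            int sub = prev[j - 1] + (a[i - 1] != b[j - 1]);
            cur[j] = std::min({prev[j] + 1,      // delete from a
                               cur[j - 1] + 1,   // insert into a
                               sub});            // substitute (or match)
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}
```

With this, the distance between “friefox” and “firefox” is 2 (two substitutions), which is why a fuzzy matcher can still rank Firefox as the closest hit.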

What sets libcolumbus apart are two things, both well known and documented but less often used: a fast search implementation and custom errors.

The first feature is about performance. The fast Levenshtein implementation in libcolumbus is taken almost verbatim from this public domain implementation. The main speedup comes from using a trie to store the words instead of iterating over all items on every query. As a rough estimate, a brute force implementation can do 50-100 queries a second with a data set of 3000 words. The trie version can do 600 queries/second on a data set of 50000 words.

The second feature is about quality of results. It is best illustrated with an example. Suppose there are two items to choose from, “abc” and “abp”. If the user types “abo”, which one of these should be chosen? In the classical Levenshtein sense both of the choices are identical: they are one replace operation away from the query string.

However from a usability point of view “abp” is the correct answer, because the letter p is right next to the letter o and very far from the letter c. The user probably meant to hit the o key but just missed it slightly. Libcolumbus allows you to set custom errors for these kinds of substitutions. If the standard substitution error is 100, one could set the substitution error for adjacent keys to a smaller value, say 20. This causes words with “simple typos” to be ranked higher automatically.
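As a sketch of the idea (not libcolumbus's actual API; the adjacency table and the cost values 100 and 20 below are assumptions mirroring the numbers above), a weighted edit distance looks like this:

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Assumed costs mirroring the example in the text.
const int STD_ERROR = 100;      // standard insert/delete/substitute error
const int ADJACENT_ERROR = 20;  // substitution between adjacent keys

// A tiny illustrative adjacency table; a real one would cover the
// whole keyboard layout.
bool keys_adjacent(char a, char b) {
    static const std::set<std::pair<char, char>> adj = {
        {'i', 'o'}, {'o', 'p'}, {'e', 'r'}, {'d', 'f'}};
    char lo = std::min(a, b), hi = std::max(a, b);
    return adj.count({lo, hi}) > 0;
}

int substitution_cost(char a, char b) {
    if (a == b) return 0;
    return keys_adjacent(a, b) ? ADJACENT_ERROR : STD_ERROR;
}

// Standard Levenshtein dynamic program, but with a per-pair
// substitution cost instead of a flat cost of one.
int weighted_distance(const std::string &query, const std::string &word) {
    const size_t m = query.size(), n = word.size();
    std::vector<std::vector<int>> d(m + 1, std::vector<int>(n + 1));
    for (size_t i = 0; i <= m; ++i) d[i][0] = int(i) * STD_ERROR;
    for (size_t j = 0; j <= n; ++j) d[0][j] = int(j) * STD_ERROR;
    for (size_t i = 1; i <= m; ++i)
        for (size_t j = 1; j <= n; ++j)
            d[i][j] = std::min(
                {d[i - 1][j] + STD_ERROR,  // delete
                 d[i][j - 1] + STD_ERROR,  // insert
                 d[i - 1][j - 1] + substitution_cost(query[i - 1], word[j - 1])});
    return d[m][n];
}
```

With these weights, the query “abo” is distance 20 from “abp” (adjacent-key substitution) but distance 100 from “abc”, so the likelier intended word wins.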

There are several other uses for custom errors:

  • diacritical characters such as ê, é and è can be mapped to have very small errors to each other
  • fuzzy number pad typing can be enabled by assigning mapping errors from each number to its corresponding letters (e.g. ‘3’ to ‘d’, ‘e’ and ‘f’) as well as adjacent letters (i.e. those on number keys ‘2’ and ‘6’)
  • spam can be detected by assigning low errors for letters and numbers that look similar, such as ‘1’ -> ‘i’ and ‘4’ -> ‘a’, to match ‘v14gr4’ to ‘viagra’

Libcolumbus contains sample implementations for all these except for the last one. It also allows setting insert and delete errors at the beginning and end of the match. When set to low values this makes the algorithm do a fuzzy substring search. The online matching discussed above is implemented with this. It allows the library to match the query term “fier” to “firefox” very fast.
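The substring behaviour can be sketched as a simplified Levenshtein variant where the edge insert/delete errors are set all the way to zero (libcolumbus itself uses small nonzero values): row zero of the table is all zeros, so the match may start anywhere in the word, and the answer is the minimum of the last row, so it may end anywhere.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Fuzzy substring search: edit distance between the query and the
// best-matching substring of the text. Skipping text characters
// before and after the match is free in this simplified model.
int substring_distance(const std::string &query, const std::string &text) {
    // Row 0 is all zeros: a match may start at any text position.
    std::vector<int> prev(text.size() + 1, 0), cur(text.size() + 1, 0);
    for (size_t i = 1; i <= query.size(); ++i) {
        cur[0] = i;  // matching i query chars against nothing
        for (size_t j = 1; j <= text.size(); ++j) {
            int sub = prev[j - 1] + (query[i - 1] != text[j - 1]);
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, sub});
        }
        std::swap(prev, cur);
    }
    // Minimum over the last row: the match may end anywhere.
    return *std::min_element(prev.begin(), prev.end());
}
```

This matches “fier” against “firefox” with a distance of 1 (one deleted letter against the substring “fir”), while a plain full-string comparison would charge for all the unmatched trailing letters of “firefox”.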

Get the code

Our goal for the coming cycle is to enable error tolerant matching in as many locations as possible. Those developers who wish to try it on their application can get the source code here.

The library is implemented in C++0x. The recommended API is the C++ one. However, since many applications cannot link in C++ libraries, we also provide a plain C API. It is not yet as extensive as the C++ one, but we hope to provide full coverage there too.

The main thing to understand is the data model. Libcolumbus deals in terms of documents. A document consists of a (user provided) document ID and a named collection of texts. The ID field is guaranteed to be large enough to hold a pointer. Here’s an example of what a document could look like:

id: 42
  name: packagename
  description: This package does something.

Each line is a single-word field name followed by the text it contains. A document can contain an arbitrary number of fields. This is roughly analogous to what MongoDB uses. It should be noted that libcolumbus does not read any data files. The user needs to create document objects programmatically. The example above is just a visualisation.
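To make the model concrete, here is a hypothetical C++ sketch. The Document type and the make_example_document helper are illustrative only, not the real libcolumbus types:

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical sketch of the data model, NOT the real libcolumbus API:
// a document is a user-provided ID plus a named collection of texts.
struct Document {
    std::uintptr_t id;  // the ID field is large enough to hold a pointer
    std::map<std::string, std::string> fields;
};

// Building the example document from the text programmatically,
// since the library reads no data files:
Document make_example_document() {
    Document d;
    d.id = 42;
    d.fields["name"] = "packagename";
    d.fields["description"] = "This package does something.";
    return d;
}
```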

When the documents are created and passed to the main matcher object for processing, the system is ready to be queried. The result of queries is a list of document IDs and corresponding relevancies. Relevancy is just a number whose meaning is roughly “bigger relevancy means better”. The exact values are arbitrary and may change even between queries. End-user applications usually don’t need to bother with them.

There is one thing to be mindful of, though. The current implementation has a memory backend only. Its memory usage is moderate but it has not yet been thoroughly optimized. If your data set is a few hundred unique words, you probably don't have to care. A few thousand takes around 5 MB, which may be a problem on low memory devices. Tens of thousands of words take tens of megabytes, which may be too much for many use cases. Both memory optimizations and a disk backend are planned, but for now you might want to stick to smallish data sets.

Read more
Jussi Pakkanen

There have been several posts in this blog about compile speed. However most have been about theory. This time it’s all about measurements.

I took the source code of Scribus, which is a largeish C++ application and looked at how much faster I could make it compile. There are three different configurations to test. The first one is building with default settings out of the box. The second one is about changes that can be done without changing any source code, meaning building with the Ninja backend instead of Make and using Gold instead of ld. The third configuration adds precompiled headers to the second configuration.

The measurements turned out to have lots of variance, which I could not really nail down. However it seemed to affect all configurations in the same way at the same time, so the results should be comparable. All tests were run on a 4 core laptop with 4 GB of RAM. Make was run with ‘-j 6’ as that is the default value of Ninja.

Default:    11-12 minutes
Ninja+Gold: ~9 minutes
PCH:        7 minutes

We can see that with a bit of work the compile time can be cut almost in half. Enabling PCH does not require changing any existing source files (though you'll get slightly better performance if you do). All in all it takes less than 100 lines of CMake code to enable precompiled headers, and half of that is duplicating functionality that CMake should be exposing already. For further info, see this bug.

Is it upstream? Can I try it? Will it work on my project?

The patch is not upstreamed, because it is not yet clean enough. However you can check out most of it in this merge request to Unity. In Unity’s case the speedup was roughly 40%, though only one library build time was measured. The total build time impact is probably less.

Note that you can't just grab the code and expect magic speedups. You have to select which headers to precompile and so on.

Finally, for a well tuned code base, precompiled headers should only give around 10-20% speed boost. If you get more, it probably means that you have an #include maze in your header files. You should probably get that fixed sooner rather than later.

Read more
Jussi Pakkanen

A relatively large portion of software development time is not spent on writing, running, debugging or even designing code, but waiting for it to finish compiling. This is usually seen as necessary evil and accepted as an unfortunate fact of life. This is a shame, because spending some time optimizing the build system can yield quite dramatic productivity gains.

Suppose a build system takes some thirty seconds to run for even trivial changes. This means that even in theory you can do at most two changes a minute. In practice the rate is a lot lower. If the build step takes only a few seconds, trying out new code becomes a lot faster. It is easier to stay in the zone when you don’t have to pause every so often to wait for your tools to finish doing their thing.

Making fundamental changes in the code often triggers a complete rebuild. If this takes an hour or more (there are code bases that take 10+ hours to build), people try to avoid fundamental changes as much as possible. This causes loss of flexibility. It becomes very tempting to just do a band-aid tweak rather than thoroughly fix the issue at hand. If the entire rebuild could be done in five to ten minutes, this issue would become moot.

In order to make things fast, we first have to understand what is happening when C/C++ software is compiled. The steps are roughly as follows:

  1. Configuration
  2. Build tool startup
  3. Dependency checking
  4. Compilation
  5. Linking

We will now look at each step in more detail focusing on how they can be made faster.


Configuration

This is the first step when starting to build. It usually means running a configure script or CMake, Gyp, SCons or some other such tool. This can take anything from one second to several minutes for very large Autotools-based configure scripts.

This step happens relatively rarely. It only needs to be run when the build configuration changes. Short of switching build systems, there is not much that can be done to make this step faster.

Build tool startup

This is what happens when you run make or click on the build icon on an IDE (which is usually an alias for make). The build tool binary starts and reads its configuration files as well as the build configuration, which are usually the same thing.

Depending on the complexity and size of the build, this can take anywhere from a fraction of a second to several seconds. By itself this would not be so bad. Unfortunately most Make-based build systems cause Make to be invoked tens to hundreds of times for every single build. Usually this is caused by recursive use of Make (which is bad).

It should be noted that the reason Make is so slow is not an implementation bug. The syntax of Makefiles has some quirks that make a really fast implementation all but impossible. This problem is even more noticeable when combined with the next step.

Dependency checking

Once the build tool has read its configuration, it has to determine which files have changed and which ones need to be recompiled. The configuration files contain a directed acyclic graph describing the build dependencies. This graph is usually built during the configure step. Suppose we have a file called SomeClass.cc which contains this line of code:

#include "OtherClass.hh"

This means that whenever OtherClass.hh changes, the build system needs to rebuild SomeClass.cc. Usually this is done by comparing the timestamp of SomeClass.o against OtherClass.hh. If the object file is older than the source file or any header it includes, the source file is rebuilt.
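The timestamp rule can be captured in a few lines. This is a simplified model with the modification times passed in directly; a real build tool reads them with stat() and walks the whole dependency graph:

```cpp
#include <ctime>

// The core of Make-style dependency checking, reduced to its essence:
// a target must be rebuilt if it does not exist yet, or if a
// dependency has a newer modification time than the target.
bool needs_rebuild(bool target_exists, std::time_t target_mtime,
                   std::time_t dep_mtime) {
    if (!target_exists)
        return true;                  // no object file yet: must build
    return target_mtime < dep_mtime;  // object file older than its input?
}
```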

Build tool startup time and the dependency scanner are run on every single build. Their combined runtime determines the lower bound on the edit-compile-debug cycle. For small projects this time is usually a few seconds or so. This is tolerable.

The problem is that Make scales terribly to large projects. As an example, running Make on the codebase of the Clang compiler with no changes takes over half a minute, even if everything is in cache. The sad truth is that in practice large projects can not be built fast with Make. They will be slow and there’s nothing that can be done about it.

There are alternatives to Make. The fastest of them is Ninja, which was built by Google engineers for Chromium. When run on the same Clang code as above, it finishes in one second. The difference is even bigger when building Chromium. This is a massive boost in productivity; it's one of those things that make the difference between tolerable and pleasant.

If you are using CMake or Gyp to build, just switch to their Ninja backends. You don’t have to change anything in the build files themselves, just enjoy the speed boost. Ninja is not packaged on most distributions, though, so you might have to install it yourself.

If you are using Autotools, you are forever married to Make. This is because the syntax of Autotools is defined in terms of Make. There is no way to separate the two without a complete, backwards compatibility breaking rewrite. What this means in practice is that Autotools build systems are slow by design and can never be made fast.


Compilation

At this point we finally invoke the compiler. Cutting some corners, here are the approximate steps taken.

  1. Merging includes
  2. Parsing the code
  3. Code generation/optimization

Let’s look at these one at a time. The explanations given below are not 100% accurate descriptions of what happens inside the compiler. They have been simplified to emphasize the facets important to this discussion. For a more thorough description, have a look at any compiler textbook.

The first step joins all source code in use into one clump. What happens is that whenever the compiler finds an include statement like #include “somefile.h”, it finds that particular source file and replaces the #include with the full contents of that file. If that file contained other #includes, they are inserted recursively. The end result is one big self-contained source file.

The next step is parsing. This means analyzing the source file, splitting it into tokens and building an abstract syntax tree. This step translates the human understandable source code into a computer understandable unambiguous format. It is what allows the compiler to understand what the user wants the code to do.

Code generation takes the syntax tree and transforms it into machine code sequences called object code. This code is almost ready to run on a CPU.

Each one of these steps can be slow. Let’s look at ways to make them faster.

Faster #includes

Including a file by itself is not slow; the slowness comes from the cascade effect. Including even one other file causes everything included in it to be included as well. In the worst case every single source file depends on every header file. This means that touching any header file causes the recompilation of every source file, whether it uses that particular header's contents or not.

Cutting down on interdependencies is straightforward. Only #include those headers that you actually use. In addition, header files must not include any other header files if at all possible. The main tool for this is called forward declaration. Basically what it means is that instead of having a header file that looks like this:

#include "SomeClass.hh"

class MyClass {
  SomeClass s;
};

You have this:

class SomeClass;

class MyClass {
  SomeClass *s;
};

Because the definition of SomeClass is not known, you have to use pointers or references to it in the header.

Remember that #including MyClass.hh would have caused SomeClass.hh and all its #includes to be added to the original source file. Now they aren’t, so the compiler’s work has been reduced. We also don’t have to recompile the users of MyClass if SomeClass changes. Cutting the dependency chain like this everywhere in the code base can have a major effect in build time, especially when combined with the next step. For a more detailed analysis including measurements and code, see here.

Faster parsing

The most popular C++ libraries, STL and Boost, are implemented as header only libraries. That is, they don’t have a dynamically linkable library but rather the code is generated anew into every binary file that uses them. Compared to most C++ code, STL and Boost are complex. Really, really complex. In fact they are most likely the hardest pieces of code a C++ compiler has to compile. Boost is often used as a stress test on C++ compilers, because it is so difficult to compile.

It is not an exaggeration to say that for most C++ code using STL, parsing the STL headers is up to 10 times slower than parsing all the rest. This leads to massively slow build times because of class headers like this:

#include <vector>

class SomeClass {
  std::vector<int> numbers;
};


As we learned in the previous chapter, this means that every single file that includes this header must parse STL's vector definition, which is an internal implementation detail of SomeClass, even if it never uses vector itself. Add another class include that uses a map, one that uses unordered_map, a few Boost includes, and what do you end up with? A code base where compiling any file requires parsing all of STL and possibly Boost. This is a factor of 3-10 slowdown on compile times.

Getting around this is relatively simple, though it takes a bit of work. The technique is known as the pImpl idiom. One way of achieving it is this:


---- header ----
struct someClassPrivate;

class SomeClass {
public:
  SomeClass();
  ~SomeClass();

private:
  someClassPrivate *p;
};

---- implementation ----
#include <vector>

struct someClassPrivate {
  std::vector<int> numbers;
};

SomeClass::SomeClass() {
  p = new someClassPrivate;
}

SomeClass::~SomeClass() {
  delete p;
}
Now the dependency chain is cut and users of SomeClass don’t have to parse vector. As an added bonus the vector can be changed to a map or anything else without needing to recompile files that use SomeClass.

Faster code generation

Code generation is mostly an implementation detail of the compiler, and there’s not much that can be done about it. There are a few ways to make it faster, though.

Optimizing code is slow. In everyday development all optimizations should be disabled. Most build systems do this by default, but Autotools builds optimized binaries by default. In addition to being slow, this makes debugging a massive pain, because most of the time trying to print the value of some variable just prints out “value optimised out”.

Making Autotools build non-optimised binaries is relatively straightforward. You just have to run configure like this: ./configure CFLAGS=’-O0 -g’ CXXFLAGS=’-O0 -g’. Unfortunately many people mangle their Autotools cflags in config files, so the above command might not work. In that case the only fix is to inspect all the Autotools config files and fix them yourself.

The other trick is about reducing the amount of generated code. If two different source files use vector<int>, the compiler has to generate the complete vector code in both of them. During linking (discussed in the next chapter) one of them is just discarded. There is a way to tell the compiler not to generate the code in the second file, using a technique introduced in C++0x called extern templates. It is used like this:

file A:

#include <vector>
template class std::vector<int>;

void func() {
  std::vector<int> numbers;
}

file B:

#include <vector>
extern template class std::vector<int>;

void func2() {
  std::vector<int> idList;
}

This instructs the compiler not to generate vector code when compiling file B. The linker makes it use the code generated in file A.

Build speedup tools

CCache is an application that stores compiled object code into a global cache. If the same code is compiled again with the same compiler flags, it grabs the object file from the cache rather than running the compiler. If you have to recompile the same code multiple times, CCache may offer noticeable speedups.

A tool often mentioned alongside CCache is DistCC, which increases parallelism by spreading the build to many different machines. If you have a monster machine it may be worth it. On regular laptop/desktop machines the speed gains are minor (it might even be slower).

Precompiled headers

Precompiled headers is a feature of some C++ compilers that basically serializes the in-memory representation of parsed code into a binary file. This can then be read back directly to memory instead of reparsing the header file when used again. This is a feature that can provide massive speedups.

Out of all the speedup tricks listed in this post, this has by far the biggest payoff. It turns the massively slow STL includes into, effectively, no-ops.

So why is it not used anywhere?

Mostly it comes down to poor toolchain support. Precompiled headers are fickle beasts. For example with GCC they only work between two different compilation units if the compiler switches are exactly the same. Most people don’t know that precompiled headers exist, and those that do don’t want to deal with getting all the details right.

CMake does not have direct support for them. There are a few modules floating around the Internet, but I have not tested them myself. Autotools is extremely precompiled header hostile, because its syntax allows for wacky and dynamic alterations of compiler flags.

Faster linking

When the compiler compiles a file and comes to a function call that is somewhere outside the current file, such as in the standard library or some other source file, it effectively writes a placeholder saying “at this point jump to function X”. The linker takes all these different compiled files and connects the jump points to their actual locations. When linking is done, the binary is ready to use.

Linking is surprisingly slow. It can easily take minutes on relatively large applications. As an extreme case, linking the Chromium browser on ARM takes 3 gigabytes of RAM and 18 hours.

Yes, hours.

The main reason for this is that the standard GNU linker is quite slow. Fortunately there is a new, faster linker called Gold. It is not the default linker yet, but hopefully it will be soon. In the mean time you can install and use it manually.

A different way of making linking faster is to simply cut down on these symbols using a technique called symbol visibility. The gist of it is that you hide all non-public symbols from the list of exported symbols. This means less work and memory use for the linker, which makes it faster.
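With GCC or Clang this is done with the visibility attribute together with the -fvisibility=hidden compiler flag. A minimal sketch (the attribute syntax is the real GCC/Clang one; the function names are made up):

```cpp
// Marks a symbol as part of the public API: it stays in the shared
// library's dynamic symbol table even when the library is compiled
// with -fvisibility=hidden.
#define EXPORT __attribute__((visibility("default")))

// Explicitly hidden: never exported, so the linker has fewer symbols
// to process and the dynamic loader fewer to resolve.
__attribute__((visibility("hidden")))
int internal_helper(int x) {
    return x + 1;
}

// Exported: the one function callers outside the library may use.
EXPORT int public_api_function(int x) {
    return internal_helper(x) * 2;
}
```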


Contrary to popular belief, compiling C++ is not actually all that slow. The STL is slow and most build tools used to compile C++ are slow. However there are faster tools and ways to mitigate the slow parts of the language.

Using them takes a bit of elbow grease, but the benefits are undeniable. Faster build times lead to happier developers, more agility and, eventually, better code.

Read more
Jussi Pakkanen

Say you start work on a new code base. Would you, as a user, rather have 90% or 10% of its API functions commented with Doxygen or something similar?

Using my psychic powers I suspect that you chose 90%.

It seems like the obvious thing. Lots of companies even have a mandate that all API functions (or >90% of them) must be documented. Not having comments is just bad. This seems like a perfectly obvious no-brainer issue.

But is it really?

Unfortunately there are some problems with this assumption. The main one being that the comments will be written by human beings. What they probably end up being is something like this.

/**
 * Takes a foobar and frobnicates it.
 * @param f the foobar to be frobnicated.
 * @param strength how strongly to frobnicate.
 * @return the frobnicated result.
 */
int frobnicate_foobar(Foobar f, int strength);

This is something I like to call documentation by word order shuffle. Now we can ask the truly relevant question: what additional information does this kind of comment provide?

The answer is, of course, absolutely nothing. It is only noise. No, actually it is even worse: it is noise that has a very large probability of being wrong. When some coder changes the function, it is very easy to forget to update the comments.

On the other hand, if only 10% of the functions are documented, most functions don’t have any comments, but the ones that do probably have something like this:

/**
 * The Foobar argument must not have been initialized in a different
 * thread, because that can lead to race conditions.
 */
int frobnicate_foobar(Foobar f, int strength);

This is the kind of comment that is actually useful. Naturally it would be better to check for the specified condition inside the function, but sometimes you can't. Having it in a comment is the right thing to do in these cases. Not having tons of junk documentation makes these kinds of remarks stand out. This means, paradoxically, that having fewer comments leads to better documentation and a better user experience.

As a rough estimate, 95% of functions in any code base should be so simple and specific that their signature is all you need to use them. If they are not, API design has failed: back to the drawing board.

Read more
Jussi Pakkanen

Developing with the newest of the new packages is always a bit tricky. Every now and then they break in interesting ways. Sometimes they corrupt the system so much that downgrading becomes impossible. Extreme circumstances may corrupt the system’s package database and so on. Traditionally fixing this has meant reinstalling the entire system, which is unpleasant and time consuming. Fortunately there is now a better way: snapshotting the system with btrfs.

The following guide assumes that you are running btrfs as your root file system. The newest Quantal can boot off of a btrfs root, but there may be issues, so please read the documentation in the wiki.

The basic concept in snapshotting your system is called a subvolume. It is kind of like a subpartition inside the main btrfs partition. By default Ubuntu’s installer creates a btrfs root partition with two subvolumes called @ and @home. The first one of these is mounted as root and the latter as the home directory.

Suppose you are going to do something really risky, and want to preserve your system. First you mount the raw btrfs partition somewhere:

sudo mkdir /mnt/root
sudo mount /dev/sda1 /mnt/root
cd /mnt/root

Here /dev/sda1 is your root partition. You can mount it like this even though the subvolumes are already mounted. If you do an ls, you see two subdirectories, @ and @home. Snapshotting is simple:

sudo btrfs subvolume snapshot @ @snapshot-XXXX

This takes maybe one second, and when the command returns the system is secured. You are now free to trash your system in whatever way you want, though you might want to unmount /mnt/root so you don't accidentally destroy your snapshots.

Restoring the snapshot is just as simple. Mount /mnt/root again and do:

sudo mv @ @broken
sudo btrfs subvolume snapshot @snapshot-XXXX @

If you are sure you don't need @snapshot-XXXX any more, you can simply rename it to @ instead of snapshotting it. You can do this even when booted into the system, i.e. while using @ as your current root file system.

Reboot your machine and your system has been restored to the state it was when running the snapshot command. As an added bonus your home directory does not rollback, but retains all changes made during the trashing, which is what you want most of the time. If you want to rollback home as well, just snapshot it at the same time as the root directory.

You can get rid of useless and broken snapshots with this command:

sudo btrfs subvolume delete @useless-snapshot

You can’t remove subvolumes with rm -r, even if run with sudo.

Read more
Jussi Pakkanen

The conventional wisdom in build systems is that GNU Autotools is the one true established standard and other ones are used only rarely.

But is this really true?

I created a script that downloads all original source packages from Ubuntu's main pool. If there were multiple versions of the same project, only the newest was chosen. Then I created a second script that goes through those packages and checks which build system each one actually uses. Here's the breakdown:

CMake:           348     9%
Autofoo:        1618    45%
SCons:            10     0%
Ant:             149     4%
Maven:            41     1%
Distutil:        313     8%
Waf:               8     0%
Perl:            341     9%
Make(ish):       351     9%
Customconf:       45     1%
Unknown:         361    10%

Here Make(ish) means packages that don’t have any other build system, but do have a makefile. This usually indicates building via custom makefiles. Correspondingly customconf is for projects that don’t have any other build system, but have a configure file. This is usually a handwritten shell or Python file.

This data is skewed by the fact that the pool is just a jumble of packages. It would be interesting to run this analysis separately for precise, oneiric etc to see the progression over time. For truly interesting results you would run it against the whole of Debian.

The relative popularity of CMake and Autotools is roughly 20/80. This shows that Autotools is not the sole dominant player for C/C++ it once was. It’s still far and away the most popular, though.

The unknown set contains stuff such as Rake builds. I simply did not have time to add them all. It also has a lot of fonts, which makes sense, since you don’t really build those in the traditional sense.

The scripts can be downloaded here. A word of warning: to run the analysis you need to download 13 GB of source. Don’t do it just for the heck of it. The parser script does not download anything, it just produces a list of urls. Download the packages with wget -i.

Some orig packages are compressed with xz, which Python’s tarfile module can’t handle. You have to repack them yourself prior to running the analysis script.

Read more
Jussi Pakkanen

Steve Denning has held a wonderful presentation on how management should be done in the 21st century. His main point is that instead of making money, the main goal of a company should be to delight its customers. He reasons that if the main goal is making money, it leads to a corporate structure that abhors innovation, which makes the corporation vulnerable to agile startups.

The video is highly recommended for everyone who deals with management in any way (including those who are being managed). The material in the presentation may seem very familiar to you, either because it mirrors your own working experience or because you recognize the patterns it presents in other areas as well.

One well known piece of popular culture resonates very strongly with the presentation: the original Star Wars trilogy. The Galactic Empire is a good analogy of a large, established corporation that is being challenged by a small but nimble Rebel Alliance.

No need to plan, just do what you are told

Let us start our analysis by contrasting the meeting and decision making processes of the Empire and the Alliance. Probably the most famous meeting scene in the entire trilogy happens in Star Wars when Empire officials are discussing the rebellion and the lost Death Star plans. A simple overview of the meeting tells us that it will not be a successful one.

The meeting happens around one huge table. It is very probable that people can’t hear what participants on the other side of the table are saying. This is even more probable when you notice that most people are old generals who probably have poor hearing. They also look like they would rather be anywhere else than at the meeting. However the issues they are discussing are vital and the outcomes will shape the entire future of the Empire. More specifically the end of it.

As far as we can tell, the point of the meeting is to determine what to do should the Alliance really have a copy of the Death Star plans. Yet all anyone remembers of the meeting is the Force choking. This is quite sad, because the issues raised are important ones. Lord Vader has not been able to find the lost tapes. Neither has he been able to find the rebel base. We also know that he has no real, workable plan to achieve these goals. Rather than work on the problem with a group of military experts, Vader chooses to save face by shooting the messenger.

The end result is that no actual work gets done. There are no contingency plans, no alternative approaches, nothing. The entire strategy of the Empire, the largest, most powerful organization in the galaxy, is It Will Work Because I Say It Will Work, now STFU and GBTW. It is also made painfully clear that questioning the choices of the Dear Leader is hazardous.

Compare this with the Rebel Alliance. All pieces of information and opinions are dealt with in a positive manner. When someone suspects that the targeting computer would not be able to hit the Death Star’s exhaust port, Luke does not attack him but rather gives a personal example showing how it is possible. When Han reports that a probable imperial probe droid has found them, the people in charge trust him and start working on evaluation. He could have gotten a response along the lines of do you have any idea how much resources we have spent to build this base, do you expect us to abandon all that based on a hunch. Even C-3PO, the lowest rung of the rebel ladder, can give his signal analysis results directly to the main command and his expertise is valued.

The Rebel Alliance is about solving problems, trust and agility. The Empire is about top-down leadership and management by mandates and shouting. We all know which one of them won in the end.

Failing with people: Management by Vader

If we view the Empire as a corporation then we can say that the Emperor is roughly the chairman of the board whereas Darth Vader is the CEO. He is in charge of the operational branches of the Empire. His decisions define the corporate culture. Let us examine his leadership using the battle of Hoth as a case study.

From a management point of view the largest conflict is between Darth Vader and Admiral Ozzel. Details of Ozzel’s military career are spotty, but we can assume that he has gone through the Imperial Military/Space Navy/whatever Academy, gotten good grades, worked hard on his career and eventually reached the rank of Admiral. Darth Vader, by comparison, is a whiny kid from a backwards desert planet with no formal military training, who has gotten his current rank through nepotism [1].

The conflict between these two comes to its peak when Vader feels that Ozzel has flown the fleet too close to the suspected rebel base. Let’s think about that for a second. The mission they are engaged in is basically a surprise attack. The goal is to catch the enemy unaware, crush them fast and prevent any escape attempts. This being the case flying in hyperspace as close as possible to the target planet and attacking immediately is the right thing to do. The alternative approach, and the one apparently preferred by Vader, would be to come out of hyperspace far from the planet and then fly slowly closer and attack. This strategy would have given the rebels ample time to jump every ship to hyperspace long before the Star Destroyers could have fired a single shot.

From a management point of view this single episode has many failures. First of all, Vader did not trust his employees but rather started telling them what to do through micromanagement. Secondly, even though he had a very clear vision of how the assault should be handled, he did not explain it to his underlings beforehand. He just magically assumed that they would do the right thing. Maybe he had forgotten that only Sith Lords have the ability to read people’s minds. Thirdly, once the fleet had left lightspeed, punishing Ozzel was the stupidest thing he could possibly do. The attack plan was in motion and nothing could change that. Any punishment should have happened only after the campaign. Summary executions in the middle of troop deployment only serve to weaken morale.

Kinda makes you wonder if the only reason Vader choked Ozzel was that he could be used as a scapegoat in case of failure.

This kind of behavior happens again and again throughout the trilogy. During the assault on the Death Star, Vader explicitly tells professional TIE fighter pilots not to shoot at their targets, either because he wants all the glory for himself or because he thinks they are too stupid to hit anything. He orders the entire fleet inside an asteroid field, causing billions of credits worth of damage and several thousand deaths. They could just have waited outside the asteroid field, because it is well established that one can’t jump to hyperspace from inside it. He also micromanages the search by demanding constant progress updates.

Come to think of it, almost every single management and executive decision Vader makes is wrong. He would have run the entire Empire down to the ground even if he hadn’t killed the Emperor. In corporations this kind of manager is unfortunately all too common. The higher up the chain he is, the more damage he can do. If he holds major amounts of stock, things are even worse because then he becomes really hard to get rid of.

Motivating the masses: the case of Stormtrooper apathy

One of the most ridiculed aspects of the Star Wars trilogy is the stormtroopers, and especially their shooting accuracy. In corporations stormtroopers correspond to regular low level workers, the ones who actually get all the grunt work done. If you compare stormtroopers to rebel fighters, you find that the rebel forces are consistently better. They shoot more accurately, have more imagination and just generally get things done better.

One might speculate that this is because all the top talent goes to the Rebel Alliance, it being the hot new cool stuff. In reality they are both recruiting from the same talent pool. Moreover it can be speculated that the best of the best of the best would go to the most glamorous and prestigious schools, i.e. the Imperial Academy. Why would they instead join, effectively, a terrorist cell with a very low life expectancy, unless they had a personal bone to pick with the Empire? [2]

The basic skill level of a stormtrooper is pretty much the same as that of the average Joe in the Rebel Alliance. And yet they perform terribly. As an example, let’s examine the scene in Star Wars just after our heroes have escaped from the trash compactor. They run into a group of seven stormtroopers. A few shots are fired and the entire group starts running away. If one examines the footage closely, at the time they start their retreat they can only see Han and bits of Princess Leia. Luke and Chewie are behind a wall.

Think about that for a while. These are professional soldiers who come across some hippie and a girl. They are armed with deadly force, are specifically trained and have massive strength in numbers. Yet their instinctive decision is to run away. It’s kind of like having police officers who hide in their parents’ basements whenever they hear that a crime has been committed.

What could be the reason for this? The answer is simple: motivation. The common man inside the stormtrooper uniform probably does not care about the goals of the Empire. He just wants to get his paycheck and go home. What he really does not want is to get killed in any way. If you look at the behavior and motivation of stormtroopers throughout the series, doing everything possible not to get killed is pretty high on the list. Underachieving in their every day tasks is part of this because success means promotion, which means bigger probability of dealing with Vader, which in turn means higher probability of death by random Force choking than death in battle.

If this is the structure of your organization, the question is not why the workers aren’t performing well. The question is why any sane person would want to perform well.

There is one reason. We can deduce it by examining the cases where stormtroopers behave like an actual, efficient, deadly fighting force. There aren’t many of these, but let’s start at the beginning of Star Wars: the assault on Princess Leia’s Corellian cruiser. The assault force knows what it is doing, shoots accurately and takes over the ship very efficiently.

There are a few other cases where this happens as well, but they are quite rare. There is one thing they have in common, though. The troops perform well only when Vader is personally overseeing them. This is classic management by fear. Every single troop knows that if they fail, they will get force choked to death. They might get choked even if they do just ok, just to set an example. So they really do their best.

The biggest problem with this approach is that it does not scale. Vader can’t be everywhere. Things work fine when he’s there. When he leaves, an entire legion of his best soldiers gets defeated by a dozen teddy bears with stone age technology.

Actually, let me take that back. The biggest problem is not the lack of scalability. The biggest problem is that Vader probably thinks that his troops are truly the best of the best. Why wouldn’t he? Whenever he is around, things work smoothly and efficiently. Who’s going to tell him that his so called elite troops are in fact complete garbage? Captain Needa?

This is the reason companies like Toyota and Google thrive. They care about their employees. They want them to participate in the decision making process. They want them to be part of the family, so to say, rather than resources to be shifted around, shouted at and summarily executed (though I don’t think any Fortune 500 company does executions at the present time).

The meaning of (a company’s) life

The main thesis of Steve Denning’s presentation is that the common view that a company’s purpose is to make money is flawed. Instead they should be delighting their customers. Making money is a result, not the goal. With that in mind, let’s ask a simple question.

What is the ultimate purpose of the Empire?

We hear very little about their goals on health care or education. As far as we can tell, the Empire is only the manifestation of the Emperor’s lust for power. He doesn’t care about the people. His only interest is in the power trip he gets from bossing them around. Just like certain corporations see their customers only as sponges to squeeze as much money out of as possible.

There are consequences.

When Luke talks to Obi-Wan for the first time he says “It’s not that I like the Empire, I hate it. But there’s nothing I can do about it now.” His delivery seems to indicate that this is a common attitude towards the Empire.

Remind you of any corporations you know?

If we accept the special editions as canon, once the Emperor died, people started spontaneously partying in the streets, knocking down statues and shooting fireworks. After the tipping point everyone dropped the Empire like it was going out of fashion. One imagines that even people high up on the Empire’s chain of command would go around stating how they have always secretly supported the goals of the Rebellion.

The world is full of companies that have used their dominant position to extract money with inferior products. They have focused on cost cutting and profit maximisation rather than improving their customers’ lives. And they have been successful for a while. Once a competitor has appeared that does care about these things, the dominant player has usually collapsed. For an example, see what happened to Nokia after the release of the iPhone.

The only protection against collapse is to make your customers consistently happy. Should someone come out tomorrow with a magical new superphone that is up to 90% better than the iPhone, would current iPhone users switch in masses? No, they would not, because they bought their current phone because it was the best for them, the one they really wanted. Not because it was the “crappy-but-only-possible” choice.

If your company is producing products of the latter type, your days are already numbered, but you just don’t know it yet. Just when you think you are at the height of your power, someone will grab you without warning and throw you over a railing. Most likely you will blame your failure on them. But you are wrong. You have brought your downfall on yourself.

Also, you are dead.


[1] Assuming that the Force Darth Sidious uses to inseminate Shmi Skywalker comes from himself. Somehow. In a way I don’t really want to know.

[2] The prequels seem to indicate that stormtroopers are clones. However that is probably not the case anymore in the time frame of the original trilogy. The original clones all spoke with the same voice. Stormtroopers speak with different voices. There are also variations in size and behavior. If they were clones, i.e. dispensable cannon fodder, it would make even less sense for them to be concerned about self-preservation.

Read more
Jussi Pakkanen

Today’s API design fail case study is Iconv. It is a library designed to convert text from one encoding to another. The basic API is very simple as it has only three function calls. Unfortunately two of them are wrong.

Let’s start with the initialisation. It looks like this:

iconv_t iconv_open(const char *tocode, const char *fromcode);

Having the tocode argument before the fromcode argument is wrong. It goes against the natural ordering that people have of the world: you convert from something to something, not to something from something. If you don’t believe me, go back and read the second sentence of this post. Notice how completely natural it was to you; should you try to change the word order in your mind, it will seem contrived and weird.

But let’s give this the benefit of the doubt. Maybe there is a good reason for having the order like this. Suppose the library was meant to be used only by people writing RPN calculators in Intel syntax assembly using the Curses graphics library. With that in mind, let’s move on to the second function as it is described in the documentation.

size_t iconv(iconv_t cd, const char **inbuf, size_t *inbytesleft,
             char **outbuf, size_t *outbytesleft);

In this function the order is the opposite: source comes before target. Having the order backwards is bad, but having an inconsistent API such as this is inexcusable.

But wait, there is more!

If you look at the actual installed header file, this is not the API it actually provides. The second argument is not const in the implementation. So either you strdup your input string to keep it safe or cast away your const and hope/pray that the implementation does not fiddle around with it.

The API function is also needlessly complex, taking pointers to pointers and so on. This makes the common case of I have this string here and I want to convert it to this other string here terribly convoluted. It causes totally reasonable code like this to break.

char *str = read_from_file_or_somewhere();
iconv(i, &str, &size_str, &outbuf, &size_outbuf);

Iconv will change where str points to and if it was your only pointer to the data (which is very common) you have just lost access to it. To get around this you have to instantiate a new dummy pointer variable and pass that to iconv. If you don’t and try to use the mutilated pointers to, say, deallocate a temporary buffer you get interesting and magical crashes.

Passing the conversion types to iconv_open as strings is also tedious. You can never tell if your converter will work or not. If it fails, Iconv will not tell you why. Maybe you have a typo. Maybe this encoding has magically disappeared in this version. For this reason the encoding types should be declared in an enum. If there are very rare encodings that don’t get built on all platforms, there should be a function to query their existence.

A better API for iconv would take the current conversion function and rename it to iconv_advanced or something. The basic iconv function (the one 95% of people use 95% of the time) should look something like this:

int iconv(encoding fromEncoding, encoding toEncoding,
  errorBehaviour eb,
  const char *source, size_t sourceSize,
  char *target, size_t targetSize);

ErrorBehaviour tells what to do when encountering errors (ignore, stop, etc). The return value could be the total number of characters converted or some kind of an error code. Alternatively the function could allocate the target buffer itself, possibly with a user defined allocator function.

The downside of this function is that it takes 7 arguments, which is a bit too much. The first three could be stored in an iconv_t type for clarity.

Read more
Jussi Pakkanen

We all know that compiling C++ is slow.

Fewer people know why, or how to make it faster. Some do, though: the developers at Remedy, for example, made the engine of Alan Wake compile from scratch in five minutes. The payoff is increased productivity, because the edit-compile-run cycle gets dramatically faster.

There are several ways to speed up your compiles. This post looks at reworking your #includes.

Quite a bit of C++ compilation time is spent parsing headers for STL, Qt and whatever else you may be using. But how long does it actually take?

To find out, I wrote a script to generate C++ source. You can download it here. What it does is generate source files that have some includes and one dummy function. The point is to simulate two different use cases. In the first each source file includes a random subset of the includes. One file might use std::map and QtCore, another one might use Boost’s strings and so on. In the second case all possible includes are put in a common header which all source files include. This simulates “maximum developer convenience” where all functions are available in all files without any extra effort.

To generate the test data, we run the following commands:

mkdir good bad
./ --with-boost --with-qt4 good
./ --with-boost --with-qt4 --all-common bad

Compilation is straightforward:

cd good; cmake .; time make; cd ..
cd bad; cmake .; time make; cd ..

By default the script produces 100 source files. When the includes are kept in individual files, compiling takes roughly a minute. When they are in a common header, it takes three minutes.

Remember: the included STL/Boost/Qt4 functionality is not used in the code. This is just the time spent including and parsing their headers. What this example shows is that you can remove 2 minutes of your build time, just by including C++ headers smartly.

The delay scales linearly. For 300 files the build times are 2 minutes 40 seconds and 7 minutes 58 seconds. That’s over five minutes lost on, effectively, no-ops. The good news is that getting rid of this bloat is relatively easy, though it might take some sweat.

  1. Never include any (internal) header in another header if you can use a forward declaration. Include the header in the implementation file.
  2. Never include system headers (STL, etc) in your headers unless absolutely necessary, such as due to inheritance. If your class uses e.g. std::map internally, hide it with pImpl. If your class API requires these headers, change it so that it doesn’t or use something more lightweight (e.g. std::iterator instead of std::vector).
  3. Never, never, ever include system stuff in your public headers. That slows down not just your own compilation time, but also every single user of your library. The only exception is when your library is a plugin or extension to an existing library and even then your includes need to be minimal.

Read more
Jussi Pakkanen

The main point of open source is that anyone can send patches to improve projects. This, of course, is very damaging to the Super Ego of the head Cowboy Coder in charge. Usually it means that he has to read the patch, analyze it, understand it, and then write a meaningful rejection email.

Or you could just use one of the strategies below. They give you tools to reject any patch with ease.

The Critical Resource

Find any increase in resource usage (no matter how tiny or contrived) and claim that resource to be the scarcest thing in the universe. Then reject due to increased usage.

A sample discussion might go something like this:

- Here’s a patch that adds a cache for recent results making the expensive operation 300% faster.

- This causes an increase in memory usage which is unacceptable.

- The current footprint is 50 MB, this cache adds less than 10k, and the common target machine running this app has 2 GB of memory.

- You are too stupid to understand memory optimisation. Go away.

The Suffering Minority

When faced with a patch that makes things better for 99.9% of the cases and slightly worse for the rest, focus only on the remaining 0.1%. Never comment on the majority. Your replies must only ever discuss the one group you (pretend to) care about.

- I have invented this thing called the auto-mobile. This makes it easier for factory workers to come to work every morning.

- But what about those that live right next to the factory? Requiring them to purchase and maintain auto-mobiles is a totally unacceptable burden.

- No-one is forcing anyone. Every employer is free to obtain their own auto-mobiles if they so choose.

- SILENCE! I will not have you repress my workers!

The Not Good Enough

Think up a performance requirement that the new code does not fulfill. Reject. If the submitter makes a new patch which does meet the requirement, just make it stricter until they give up.

- This patch drops the average time from 100 ms to 30 ms.

- We have a hard requirement that the operation must take only 10 ms. This patch is too slow, so rejecting.

- But the current code does not reach that either, and this patch gets us closer to the requirement.

- No! Not fast enough! Not going in.

The Prevents Portability

Find any advanced feature. Reject based on this feature not being widely available and thus increasing the maintenance burden.

- Here is a patch to fix issue foo.

- This patch uses compiler feature bar, which is not always available.

- It has been available in every single compiler in the world since 1987.

- And what if we need to compile with a compiler from 1986? What then, mr smartypants? Hmmm?

The Does Not Cure World Hunger

This approach judges the patch not on what it actually is, but rather on what it is not. Think up a requirement, no matter how crazy or irrelevant, and reject.

- This patch will speed up email processing by 4%.

- Does it prevent every spammer in the world from sending spam, even from machines not running our software?

- No.

- How dare you waste my time with this kind of useless-in-the-grand-scheme-of-things patch!

The Absolute Silence

This is arguably the easiest. Never, ever reply to any patches you don’t care about. Eventually the submitter gives up and goes away all by himself.

Read more
Jussi Pakkanen

What currently happens when you drag two fingers on a touchpad is that the X server intercepts those touches and sends mouse wheel events to applications. The semantics of a mouse wheel event are roughly “move down/up three lines”. This is jerky and not very pleasant. There has been no way of doing pixel perfect scrolling.

With the recent work on X multitouch and the uTouch gesture stack, smoothness has now become possible. Witness pixel accurate scrolling in Chromium in this Youtube video.

The remaining jerkiness in the video is mainly caused by Chromium redrawing its window contents from scratch whenever the viewport is moved.

The code is available in Chromium’s code review site.

Read more
Jussi Pakkanen

The most common step in creating software is building it. Usually this means running make or equivalent and waiting. This step is so universal that most people don’t even think about it actively. If one were to see what the computer is doing during build, one would see compiler processes taking 100% of the machine’s CPUs. Thus the system is working as fast as it possibly can.


Some people working on Chromium doubted this and built their own replacement for Make, called Ninja. It is basically the same as Make: you specify a list of dependencies and then tell it to build something. Since Make is one of the most used applications in the world and has been under development since the 70s, surely it is already as fast as it can possibly be.


Well, let’s find out. Chromium uses a build system called Gyp that generates makefiles. Chromium devs have created a Ninja backend for Gyp. This makes comparing the two extremely easy.

Compiling Chromium from scratch on a dual core desktop machine with makefiles takes around 90 minutes. Ninja builds it in less than an hour. A quad core machine builds Chromium in ~70 minutes. Ninja takes ~40 minutes. Running make on a tree with no changes at all takes 3 minutes. Ninja takes 3 seconds.

So not only is Ninja faster than Make, it is faster by a huge margin and especially on the use case that matters for the average developer: small incremental changes.

What can we learn from this?

There is an old (and very wise) saying that you should never optimize before you measure. In this case the measurement seemed to indicate that nothing was to be done: CPU load was already maximized by the compiler processes. But sometimes your tools give you misleading data. Sometimes they lie to you. Sometimes the “common knowledge” of the entire development community is wrong. Sometimes you just have to do the stupid, irrational, waste-of-time thingie.

This is called progress.

PS I made quick-n-dirty packages of a Ninja git checkout from a few days ago and put them in my PPA. Feel free to try them out. There is also an experimental CMake backend for Ninja so anyone with a CMake project can easily try what kind of a speedup they would get.

Read more
Jussi Pakkanen

One of the most annoying things about creating shared libraries for other people to use is API and ABI stability. You start going somewhere, make a release and then realize that you have to totally change the internals of the library. But you can’t remove functions, because that would break existing apps. Nor can you change structs, the meanings of fields or do any other maintenance task that would make your job easier. The only bright spot on the horizon is that eventually you can do a major release and break compatibility.

We’ve all been there and it sucks. If you choose to ignore stability because, say, you have only a few users who can just recompile their stuff, you get into endless rebuild cycles and so on. But what if there was a way to eliminate all this in one, swift, elegant stroke?

Well, there is.

Essentially every single library can be reduced to one simple function call that looks kind of like this.

library_result library_do(const char *command, library_object *obj, ...);

The command argument tells the library what to do. The arguments tell it what to do it to and the result tells what happened. Easy as pie!

So, to use a car analogy, here’s an example of how you would start a car.

library_object *car;
library_result result = library_do("initialize car", NULL);
car = RESULT_TO_POINTER(result);
library_do("start engine", car);
library_do("push accelerator", car);

Now you have a moving car and you have also completely isolated the app from the library using an API that will never need to be changed. It is perfectly forwards, backwards and sideways compatible.

And it gets better. You can query capabilities on the fly and act accordingly.

if(RESULT_TO_BOOLEAN(library_do("has automatic transmission", car)))

Dynamic detection of features and changing behavior based on them makes apps work with every version of the library ever. The car could even be changed into a moped, tractor, or a space shuttle and it would still work.

For added convenience the basic commands could be given as constant strings in the library’s header file.

Deeper analysis

If you, dear reader, after reading the above text thought, even for one microsecond, that the described system sounds like a good idea, then you need to stop programming immediately.


Take your hands away from the keyboard and just walk away. As an alternative I suggest taking up sheep farming in New Zealand. There’s lots of fresh air and a sense of accomplishment.

The API discussed above is among the worst design abominations imaginable. It is the epitome of Making My Problem Your Problem. Yet variants of it keep appearing all the time.

The antipatterns and problems in this one single function call would be enough to fill a book. Here are just some of them.

Loss of type safety

This is the big one. The arguments in the function call can be anything and the result can be anything. So which one of the following should you use:

library_do("set x", o, int_variable);
library_do("set x", o, &int_variable);
library_do("set x", o, double_variable);
library_do("set x", o, &double_variable);
library_do("set x", o, value_as_string);

You can’t really know without reading the documentation. Which you have to do every single time you use any function. If you are lucky, the calling convention is the same on every function. It probably is not. Since the compiler does not and can not verify correctness, what you essentially have is code that works either by luck or faith.

The only way to know for sure what to do is to read the source code of the implementation.

Loss of tools

There are a lot of nice tools to help you. Things such as IDE code autocompletion, API inspectors, Doxygen, even the compiler itself as discussed above.

If you go the generic route you throw away all of these tools. They account for dozens upon dozens of man-years just to make your job easier. All of that is gone. Poof!

Loss of debuggability

One symptom of this disease is putting data in dictionaries and other high level containers rather than in variables, to “allow easy expansion in the future”. This is workable in languages such as Java or Python, but not in C/C++. Here is a screengrab from a gdb session demonstrating why this is a terrible idea:

(gdb) print map
$1 = {_M_t = {
    _M_impl = {<std::allocator<std::_Rb_tree_node<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > >> = {<__gnu_cxx::new_allocator<std::_Rb_tree_node<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > >> = {<No data fields>}, <No data fields>},
      _M_key_compare = {<std::binary_function<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool>> = {<No data fields>}, <No data fields>}, _M_header = {_M_color = std::_S_red, _M_parent = 0x607040,
        _M_left = 0x607040, _M_right = 0x607040}, _M_node_count = 1}}}

Your objects have now become undebuggable. Or at the very least extremely cumbersome, because you have to dig out the information you need one tedious step at a time. If the error is non-obvious, it’s source code diving time again.

Loss of performance

Functions are nice. They are type-safe, easy to understand and fast. The compiler might even inline them for you. Generic action operators are not.

Every single call to the library needs to first go through a long if/else tree to inspect which command was given, or do a hash table lookup or something similar. This means that every single function call turns into a massive blob of code that destroys branch prediction and pipelining and all those other wonderful things HW engineers have spent decades optimizing for you.

Loss of error-freeness

The code examples above have been too clean. They have ignored the error cases. Here are two lines of code to illustrate the difference.

x = get_x(obj); // Can not possibly fail
status = library_do("get x", obj); // Anything can happen

Since the generic function can not provide any guarantees the way a function can, you have to always inspect the result it provides. Maybe you misspelled the command. Maybe this particular object does not have an x value. Maybe it used to but the library internals have changed (which was the point of all this, remember?). So the user has to inspect every single call even for operations that can not possibly fail. Because they can, they will, and if you don’t check, it is your fault!

Loss of consistency

When people are confronted with APIs such as these, the first thing they do is to write wrapper functions to hide the ugliness. Instead of a direct function call you end up with a massive generic invocation blob thingie that gets wrapped in a function call that is indistinguishable from the direct function call.

The end result is an abstraction layer covered by an anti-abstraction layer; a concretisation layer, if you will.

Several layers, actually, since every user will code their own wrapper with their own idiosyncrasies and bugs.

Loss of language features

Let’s say you want the x and y coordinates from an object. Usually you would use a struct. With a generic getter you can not, because a struct implies memory layout and thus is a part of API and ABI. Since we can’t have that, all arguments must be elementary data types, such as integers or strings. What you end up with are constructs such as this abomination here (error checking and the like omitted for sanity):

obj = RESULT_TO_POINTER(library_do("create FooObj", NULL));
library_do("set constructor argument a", obj, 0);
library_do("set constructor argument b", obj, "hello");
library_do("set constructor argument c", obj, 5L);
library_do("run constructor", obj);

Which is so much nicer than

object *obj = new_object(0, "hello", 5); // No need to cast to Long, the compiler does that automatically.

Bonus question: how many different potentially failing code paths can you find in the first code snippet and how much protective code do you need to write to handle all of them?

Where does it come from?

These sorts of APIs usually stem from their designers’ desire to “not limit choices needlessly”, or “make it flexible enough for any change in the future”. There are several different symptoms of this tendency, such as the inner platform effect, the second system effect and soft coding. The end result is usually a framework framework framework.

How can one avoid this trap? There is really no definitive answer, but there is a simple guideline to help you get there. Simply ask yourself: “Is this code solving the problem at hand in the most direct and obvious way possible?” If the answer is no, you probably need to change it. Sooner rather than later.

Read more
Jussi Pakkanen

Things just working

I have a Macbook with a bcm4331 wireless chip that had not been supported in Linux. The driver was added in kernel 3.2, so I was anxious to test it when I upgraded to Precise.

After the update there was no net connection. The network indicator said “Missing firmware”. So I scoured the net and found the steps necessary to extract the firmware file to the correct directory.

I typed the command and pressed enter. That exact second my network indicator started blinking and a few seconds later it had connected.

Without any configuration, kernel module unloading/loading or “refresh state” button prodding.

It just worked. Automatically. As it should. And even before it worked it gave a sensible and correct error message.

To whoever coded this functionality: I salute you.

Read more
Jussi Pakkanen

I played around with btrfs snapshots and discovered two new interesting uses for them. The first one deals with unreliable operations. Suppose you want to update a largish SVN checkout but your net connection is slightly flaky. The reason can be anything, bad wires, overloaded server, electrical outages, and so on.

If SVN is interrupted mid-transfer, it will most likely leave your checkout in a non-consistent state that can’t be fixed even with ‘svn cleanup’. The common wisdom on the Internet is that the way to fix this is to delete or rename the erroneous directory and do a ‘svn update’, which will either work or not. With btrfs snapshots you can just do a snapshot of your source tree before the update. If it fails, just nuke the broken directory and restore your snapshot. Then try again. If it works, just get rid of the snapshot dir.

What you essentially gain are atomic operations on non-atomic tasks (such as svn update). This has been possible before with ‘cp -r’ or similar hacks, but they are slow. Btrfs snapshots can be done in the blink of an eye and they don’t take extra disk space.
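Sketched as commands, the pattern looks roughly like this (the directory names are invented, and this assumes the checkout lives in its own btrfs subvolume so it can be snapshotted and deleted as a unit):

```shell
# Take a safety snapshot before the risky operation: instant, no extra space.
btrfs subvolume snapshot my_checkout my_checkout_pre_update

if (cd my_checkout && svn update); then
    # Update succeeded: drop the safety net.
    sudo btrfs subvolume delete my_checkout_pre_update
else
    # Update failed: nuke the broken tree, restore the snapshot, try again.
    sudo btrfs subvolume delete my_checkout
    btrfs subvolume snapshot my_checkout_pre_update my_checkout
fi
```

(The commands above require a btrfs filesystem and an existing SVN checkout, so treat them as a pattern rather than a paste-ready script.)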

The other use case is erroneous state preservation. Suppose you hack on your stuff and encounter a crashing bug in your tools (such as bzr or git). You file a bug on it and then get back to doing your own thing. A day or two later you get a reply on your bug report saying “what is the output of command X”. Since you don’t have the given directory tree state around any more, you can’t run the command.

But if you snapshot your broken tree and store it somewhere safe, you can run any analysis scripts on it any time in the future. Even possibly destructive ones, because you can always run the analysis scripts in a fresh snapshot. Earlier these things were not feasible because making copies took time and possibly lots of space. With snapshots they don’t.

Read more
Jussi Pakkanen

I work on, among other things, Chromium. It uses SVN as its revision control system. There are several drawbacks to this, which are well known (no offline commits etc). They are made worse by Chromium’s enormous size. An ‘svn update’ can easily take over an hour.

Recently I looked into using btrfs’s features to make things easier. I found that with very little effort you can make things much more workable.

First you create a btrfs subvolume.

btrfs subvolume create chromium_upstream

Then you check out Chromium to this directory using the guidelines given in their wiki. Now you have a pristine upstream SVN checkout. Then build it once. No development is done in this directory. Instead we create a new directory for our work.

btrfs subvolume snapshot chromium_upstream chromium_feature_x

And roughly three seconds later you have a fresh copy of the entire source tree and the corresponding build tree. Any changes you make to individual files in the new directory won’t cause a total rebuild (which also takes hours). You can hack with complete peace of mind knowing that in the event of failure you can start over with two simple commands.

sudo btrfs subvolume delete chromium_feature_x
btrfs subvolume snapshot chromium_upstream chromium_feature_x

Chromium upstream changes quite rapidly, so keeping up with it with SVN can be tricky. But btrfs makes it easier.

cd chromium_upstream
gclient sync # Roughly analogous to svn update.
cd ..
btrfs subvolume snapshot chromium_upstream chromium_feature_x_v2
cd chromium_feature_x/src && svn diff > ../../thingy.patch && cd ../..
cd chromium_feature_x_v2/src && patch -p0 < ../../thingy.patch && cd ../..
sudo btrfs subvolume delete chromium_feature_x

This approach can be taken with any tree of files: images, even multi-gigabyte video files. Thanks to btrfs’s design, multiple copies of these files take roughly the same amount of disk space as only one copy. It’s kind of like having backup/restore and revision control built into your file system.

Read more
Jussi Pakkanen

The four stages of command entry

Almost immediately after the first computers were invented, people wanted them to do as they were commanded. This process has gone through four distinct phases.

The command line

This was the original way. The user types his command in its entirety and presses enter. The computer then parses it and does what it is told. There is no indication of whether the written command is correct; the only way to test it is to execute it.

Command completion

An improvement to writing the correct command. The user types in a few letters from the start of the desired command or file name and presses tab. If there is only one choice that begins with those letters, the system autofills the rest. Modern autocompletion systems can fill in command line arguments, host names and so on.

Live preview

This is perhaps best known from IDEs. When the user types some letters, the IDE presents all choices that correspond to those letters in a pop up window below the cursor. The user can then select one of them or keep writing. Internet search sites also do this.

Live preview with error correction

One thing in common with all the previous approaches is that the input must be perfect. If you search for Firefox but accidentally type in “ifrefox”, the systems return zero matches. Error correcting systems try to find what the user wants even if the input contains errors. This is a relatively new approach, with examples including Unity’s new HUD and Google’s search (though the live preview does not seem to do error correction).

The future

What is the next phase in command entry? I really have no idea, but I’m looking forward to seeing it.

Read more
Jussi Pakkanen

Complexity kills

The biggest source of developer headache is complexity. Specifically unexpected complexity. The kind that pops out of nowhere from the simplest of settings and makes you rip your hair out.

As an example, here is a partial and simplified state machine for what should happen when using a laptop’s trackpad.

If you have an idea of what should happen in the states marked “WTF?”, do send me email.

Read more
Jussi Pakkanen

What is worse than having a problem?

The only thing worse than having a problem is having a poor solution to it.

Because that prevents a good solution from being worked out. The usual symptom is a complicated and brittle Rube Goldberg machine doing something that really should be much simpler. It’s just that nobody bothers to do the Right Thing, because the solution we have almost kinda, sorta works most of the time, so there’s nothing to worry about, really.

Some examples include the following:

  • X used to come with a configurator application that would examine your hardware and print a conf file, which you could then copy over (or merge with) the existing conf file. Nowadays X does the probing automatically.
  • X clipboard was a complete clusterf*ck, but since middle button paste mostly worked it was not seen as an issue.
  • The world is filled with shell script fragments with the description “I needed this for something long ago, but I don’t remember the reason any more and am afraid to remove it”.
  • Floppies (remember those?) could be ejected without unmounting them causing corruption and other fun.

How can you tell when you have hit one of these issues? One sign is that you get one of the following responses:

  • “Oh, that’s a bit unfortunate. But if you do [complicated series of steps] it should work.”
  • “You have to do X before you do Y. Otherwise it just gets confused.”
  • “It does not do X, but you can do almost the same with [complicated series of steps] though watch out for [long list of exceptions].”
  • “Of course it will fail [silently] if you don’t have X. What else could it do?”
  • “You ran it with incorrect parameters. Just delete all your configuration files [even the hidden ones] and start over.”

If you ever find yourself in the situation of getting this kind of advice, or, even worse, giving it out to other people, please consider spending some effort to fix the issue properly. You will be loved and adored if you do.

Read more
Jussi Pakkanen

You know how we laugh at users of some other OSes for running random binary files they get from the Internet?

Well, we do it too. Except that instead of doing it on our personal machines, we do it on the servers that run our most critical infrastructure.

Here is a simple step by step plan that you can use to take over all Linux distributions’ master servers.

  1. Create a free software project. It can be anything at all.
  2. Have it included in the distros you care about.
  3. Create/buy a local exploit trojan.
  4. Create a new minor release of your project.
  5. Put your trojan inside the generated configure script.
  6. Boom! You have now rooted the build machines (with signing keys etc) of every single distro.

Why does this exploit work? Because configure is essentially an uninspectable blob of binary code. No-one is going to audit that code and the default packaging scripts use configure scripts blindly if they exist.

Trojans in configure scripts have been found in the wild.

So not only are the Autotools a horrible build system, they are also a massive security hole. By design.

Post scriptum: A simple fix to this is to always generate the configure script yourself rather than using the one that comes with the tarball. But then you lose the main advantage of Autotools: that you don’t need special software installed on the build machine.

Read more