Build speed again: intuition lies to you

The problem

Suppose you have a machine with 8 cores. Also suppose you have the following source packages that you want to compile from scratch.

eog_3.8.2.orig.tar.xz
grilo_0.2.6.orig.tar.xz
libxml++2.6_2.36.0.orig.tar.xz
Python-3.3.2.tar.bz2
glib-2.36.4.tar.xz
libjpeg-turbo_1.3.0.orig.tar.gz
wget_1.14.orig.tar.gz
grail-3.1.0.tar.bz2
libsoup2.4_2.42.2.orig.tar.xz

You want to achieve this as fast as possible. How would you do it?

Think carefully before proceeding.

The solution

Most of you probably came up with the basic idea of compiling the packages one after the other with ‘make -j 8’ or equivalent. There are several reasons to do this, the main one being that it saturates the CPU.

The other choice would be to start the compilation on all subdirs at the same time but with ‘make -j 1’. You could also run two parallel build jobs with ‘-j 4’ or four with ‘-j 2’.
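As a rough illustration, the alternatives above could be driven by a small script like the one below. This is only a sketch under assumptions: the function names (`build_commands`, `build_tree`, `build_all`) are made up for this example, every tree is assumed to be already unpacked and to build with a plain `./configure` followed by `make`, and the `run` parameter exists only so the command runner can be swapped out for a dry run.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_commands(jobs):
    """Steps for one tree: configure, then make with the given -j value."""
    return [["./configure"], ["make", "-j", str(jobs)]]


def build_tree(tree, jobs, run=subprocess.run):
    """Run the build steps inside one source tree, stopping on failure."""
    for cmd in build_commands(jobs):
        run(cmd, cwd=tree, check=True)
    return tree


def build_all(trees, parallel_builds, jobs, run=subprocess.run):
    """Build every tree, keeping at most `parallel_builds` trees in flight."""
    with ThreadPoolExecutor(max_workers=parallel_builds) as pool:
        return list(pool.map(lambda tree: build_tree(tree, jobs, run), trees))
```

With this, `build_all(trees, 1, 8)` is the one-after-the-other approach, `build_all(trees, len(trees), 1)` starts everything at once, and `build_all(trees, 2, 4)` or `build_all(trees, 4, 2)` sit in between.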

But surely that would be pointless. Doing one thing at a time maximises data locality so the different build trees don’t have to compete with each other for cache.

Right?

Well, let’s measure what actually happens.

[Figure “timez”: bar chart of the total build time for each combination]

The first bar shows the time when running with ‘-j 8’. It is slower than every other combination. In fact it is over 40% (one minute) slower than the fastest one, while the alternatives are all roughly as fast as each other.

Why is this?

In addition to compilation and linking processes, there are parts of the build that cannot be parallelised. There are two main things in this case. Can you guess what they are?

What all of these projects have in common is that they are built with Autotools. The configure step takes a very long time and can’t be parallelised with -j. When building consecutively, even with perfect parallelisation, the build time can never drop below the sum of the configure script run times. Each of those is easily half a minute on any non-trivial project, even on the fastest i7 machine that money can buy.
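To put rough numbers on that floor, here is a small back-of-the-envelope model. It assumes nine packages with a uniform 30 seconds of configure each (the half-minute figure above, not a measurement), treats each configure run as strictly serial, and ignores any slowdown from scripts competing with each other. The function name `configure_floor` is made up for this sketch.

```python
import heapq


def configure_floor(configure_seconds, parallel_builds):
    """Estimate the wall time spent in configure scripts alone.

    Each build slot runs its configure scripts back to back, and the next
    script goes to whichever slot frees up first (greedy scheduling).
    """
    slots = [0.0] * parallel_builds
    heapq.heapify(slots)
    for seconds in configure_seconds:
        earliest = heapq.heappop(slots)
        heapq.heappush(slots, earliest + seconds)
    return max(slots)


# Assumption: nine packages at 30 s of configure each.
times = [30] * 9
for builds in (1, 2, 4, 9):
    print(builds, "parallel builds:", configure_floor(times, builds), "s")
```

Under these assumptions the one-at-a-time build spends at least 270 seconds in configure no matter how large -j is, while starting all nine at once brings that part down to about 30 seconds.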

The second thing is time that is lost inside Make. Its data model makes it very hard to optimise. See all the gory details here.

The end result of all this is a hidden productivity sink, a minute lost here, one there and a third one over there. Sneakily. In secret. In a way people have come to expect.

These are the worst kinds of productivity losses because people honestly believe that this is just the way things are, have always been and shall be evermore. That is what their intuition and experience tell them.

The funny thing about intuition is that it lies to you. Big time. Again and again.

The only way out is to measure.
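Measuring does not need much machinery. The sketch below times any callable with a wall clock and returns every sample rather than a single number. The name `measure` and the default of three repeats are choices made for this example, and for a build comparison the callable would have to restore a clean tree before each run so that every sample really is a build from scratch.

```python
import time


def measure(task, repeats=3):
    """Run `task` several times and return the wall-clock seconds of each run."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        task()
        samples.append(time.perf_counter() - start)
    return samples
```

Comparing the samples for two strategies side by side is usually enough to see whether intuition got it right.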