Canonical Voices

Posts tagged with 'development'

Jussi Pakkanen

Bug finding tools

In Canonical’s recent devices sprint I gave a presentation on automatic bug detection tools. The slides are now available here and contain info on several such tools.

Enjoy.

 

Jussi Pakkanen

A use case that pops up every now and then is a self-contained object that needs to be accessed from multiple threads. The problem appears when the object, as part of its normal operation, calls its own methods. This leads to tricky locking operations, a need for a recursive mutex or something else that is suboptimal.

Another common approach is to use the pimpl idiom, which hides the contents of an object inside a hidden private object. There are ample details on the internet, but the basic setup of a pimpl’d class is the following. First of all we have the class header:

#include <memory>

class Foo {
public:
    Foo();
    void func1();
    void func2();

private:
    class Private;
    std::unique_ptr<Private> p;
};

Then in the implementation file you have first the definition of the private class.

class Foo::Private {
public:
    Private();
    void func1() { ... };
    void func2() { ... };

private:
    void privateFunc() { ... };
    int x;
};

Followed by the definition of the main class.

Foo::Foo() : p(new Private) {
}

void Foo::func1() {
    p->func1();
}

void Foo::func2() {
    p->func2();
}

That is, Foo only calls the implementation bits in Foo::Private.

The main idea to realize is that Foo::Private can never call functions of Foo. Thus if we can isolate the locking bits inside Foo, the functionality inside Foo::Private becomes automatically thread safe. The way to accomplish this is simple. First you add a (public) std::mutex m to Foo::Private. Then you just change the functions of Foo to look like this:

void Foo::func1() {
    std::lock_guard<std::mutex> guard(p->m);
    p->func1();
}

void Foo::func2() {
    std::lock_guard<std::mutex> guard(p->m);
    p->func2();
}
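
For completeness, here is a sketch of what the change to Foo::Private itself might look like; only the new public mutex member is shown, the rest of the class stays exactly as before:

#include <mutex>

class Foo::Private {
public:
    std::mutex m; // locked by Foo before every call into Private

    // ... the rest of Foo::Private is unchanged ...
};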

This accomplishes many things nicely:

  • Lock guards make locks impossible to leak, no matter what happens
  • Foo::Private can pretend that it is single-threaded which usually makes implementation a lot easier

The main drawback of this approach is that the locking is coarse, which may be a problem when squeezing out ultimate performance. But usually you don’t need that.

Jussi Pakkanen

There are usually two different ways of doing something. The first is the correct way. The second is the easy way.

As an example of this, let’s look at using the functionality of the C++ standard library. The correct way is to use the fully qualified name, such as std::vector or std::chrono::milliseconds. The easy way is to have using namespace std; and then just use the class names directly.

The first way is the “correct” one as it prevents symbol clashes, among a bunch of other good reasons. The latter leads to all sorts of problems, which is why many style guides prohibit its use.
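
As a quick, made-up illustration of the kind of clash full qualification avoids, consider what happens when a project-level name collides with one pulled in from std:

#include <algorithm>

using namespace std;

int count = 0; // our own global, now colliding with std::count

int main() {
    count++;   // error: reference to 'count' is ambiguous
    return 0;
}

Without the using directive, count unambiguously refers to our own variable and the algorithm is only reachable as std::count.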

But there is a catch. Software is written by humans and humans have a peculiar tendency.

They will always do the easy thing.

There is no possible way for you to prevent them from doing that, apart from standing behind their back and watching every letter they type.

Systems that rely, in any way, on people doing the right thing rather than the easy thing are doomed to fail from the start. They. Will. Not. Work. And they can’t be made to work. Trying to force them to work leads only to massive shouting and bad blood.

What does this mean to you, the software developer?

It means that the only way your application/library/tool/whatever is going to succeed is if the correct thing to do is also the easiest thing to do. That is the only way to make people do the right thing consistently.

Jussi Pakkanen

Code review is generally acknowledged to be one of the major tools in modern software development. The reasons for this are simple: it spreads knowledge of the code around the team, obvious bugs and design flaws are spotted early, which makes everyone happy, and so on.

But our code bases are still full of horrible flaws, glaring security holes, unoptimizable algorithms and everything else that drives a grown man to sob uncontrollably. These are the sorts of things code review was designed to stop and prevent, so why are they there? Let’s examine this with a thought experiment.

Suppose you are working on a team. Your team member Joe has been given the task of implementing new functionality. For simplicity’s sake let us assume that the functionality is adding together two integers. Then off Joe goes and returns after a few days (possibly weeks) with something.

And I really mean Something.

Instead of coding the addition function, he has created an entire new framework for arbitrary arithmetic operations. The reasoning for this is that it is “more general” because it can represent any mathematical operation (only addition is implemented, though, and trying to use any other operation fails silently with corrupt data). The core is implemented in a multithreaded async callback spaghetti hell that has a data race 93% of the time (the remaining 7% covers the one existing test case).

There is only one possible code review for this kind of an achievement.

Review result: Rejected
Comments: If there was a programmer's equivalent to chemical
castration, it would already have been administered to you.

In the ideal world that would be it. The real world has certain impurities as far as this thing goes. The first thing to note is that you have to keep working with whoever wrote the code for an indeterminate amount of time. Aggravating your coworkers consistently is not a very nice thing to do and earns you the unfortunate label of “assh*le”. Things get even worse if Joe is in any way your boss, because criticising his code may get you on the fast track to the basement or possibly the unemployment line. The plain fact, however, is that this piece of code must never be merged. It can’t be fixed by review comments. All of it must be thrown away and replaced with something sane.

At this point office politics enter the fray. Most corporations have deadlines to meet and products to ship. Trying to block the entry of this code (which implements a Feature, no less, or at least a fraction of one) makes you the bad guy. The code that Joe has written is a sunk cost, and if there is one thing organisations do not want to hear it is that they have just wasted a ton of effort. The Code is There and it Must Be Used this Instant! Expect to hear comments of the following kind:

  • Why are you being so negative?
  • The Product Vision requires addition of two numbers. Why are you working against the Vision?
  • Do you want to be the guy that single-handedly destroyed the entire product?
  • This piece of code adds a functionality we did not have before. It is imperative that we get it in now (the product is expected to ship in one year from now)!
  • There is no time to rewrite Joe’s work so we must merge this (even though reimplementing just the functionality would take less effort than even just fixing the obvious bugs)

This onslaught continues until eventually you give in, accept the “team decision”, approve the merge and drink yourself unconscious, fully aware that you will have to fix all these bugs once someone starts using the code (you are not allowed to rewrite it, you must fix the existing code, for that is Law). For some reason or another Joe seems to have magically been transferred somewhere else to work his magic.

For this simple reason code review does not work very well in most offices. If you only ever get comments about how to format your braces, this may be affecting you. In contrast code reviews work quite well in other circumstances. The first one of them is the Linux kernel.

The code that gets into the kernel is being watched over by lots of people. More importantly, it is being watched over by people who don’t work for you, your company or its subsidiaries. Linus Torvalds does not care one iota about your company’s quarterly sales targets, launch dates or corporate goals. The only thing he cares about is whether your code is any good. If it is not, it won’t get merged and there is nothing you can do about it. There is no middle manager you can appeal to or HR department you can sic on someone. Unless you have proven yourself, the code reviewers will treat you like an enemy. Anything you do will be scrutinised, analysed, dissected and even rejected outright. This inter-corporate firewall is good because it ensures that terrible code is not merged (sometimes poor code falls through the cracks, but such is life). On the other hand this sort of thing causes massive flame wars every now and then.

This does not work in corporate environments, though, for the reasons listed above. One way to make it work is to have a master code reviewer who does not care what other people think. Someone who can summarily reject awful code without a lengthy “let’s see if we can make it better” discussion. Someone who, when the sales people come to him demanding that something be done half-assed, can tell them to buzz off. Someone who does not care about hurting people’s feelings.

In other words, a psychopath.

Like most things in life, having a psychopath in charge of your code has some downsides. Most of them flow from the fact that psychopaths are usually not very nice to work with. Also, one of the few things worse than not having code review is having a psychopath master code reviewer who is incompetent or otherwise deluded. Unfortunately most psychopaths are of the latter kind.

So there you have it: the path to high quality code is paved with psychopaths and sworn enemies.

Jussi Pakkanen

Threads are a bit like fetishes: some people can’t get enough of them and other people just can’t see what the point is. This leads to eternal battles between “we need the power” and “this is too complex”. These have a tendency to never end well.

One inescapable fact about multithreaded and asynchronous programming is that it is hard. A rough estimate says that a multithreaded solution is between 10 and 1000 times harder to design, write, debug and maintain than a single-threaded one. Clearly, this should not be done without heavy-duty performance needs. But how much is that?

Let’s do an experiment to find out. Let’s create a simple C++ network echo server, the source code of which can be downloaded here. It can serve an arbitrary number of clients but uses only one thread to do so. The implementation is a simple epoll loop over the open connections.
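
The linked source is not reproduced here, but a minimal sketch of what such a single-threaded epoll echo loop might look like is the following (error handling and the listening port are illustrative only):

#include <sys/epoll.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

int main() {
    // Listening socket (error checking omitted for brevity).
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = INADDR_ANY;
    addr.sin_port = htons(1234); // arbitrary example port
    bind(listener, (sockaddr*)&addr, sizeof(addr));
    listen(listener, SOMAXCONN);

    int efd = epoll_create1(0);
    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.fd = listener;
    epoll_ctl(efd, EPOLL_CTL_ADD, listener, &ev);

    epoll_event events[64];
    char buf[4096];
    while(true) {
        int n = epoll_wait(efd, events, 64, -1);
        for(int i = 0; i < n; i++) {
            int fd = events[i].data.fd;
            if(fd == listener) {
                // New client: add it to the same epoll set.
                int client = accept(listener, nullptr, nullptr);
                epoll_event cev{};
                cev.events = EPOLLIN;
                cev.data.fd = client;
                epoll_ctl(efd, EPOLL_CTL_ADD, client, &cev);
            } else {
                // Existing client: echo back whatever was read.
                ssize_t count = read(fd, buf, sizeof(buf));
                if(count <= 0) {
                    close(fd); // closing removes it from the epoll set
                } else {
                    write(fd, buf, count);
                }
            }
        }
    }
}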

For our test we use 10 clients that do 10 000 queries each. To reduce the effects of network latency, the clients run on the same machine. The test hardware is a Nexus 4 running the latest Ubuntu phone.

The test finishes in 11 seconds, which means that a single-threaded server can serve roughly 10 000 requests a second on basic ARM hardware. It should be noted that because the clients run on the same machine, they are stealing CPU time from the server. The service rate would be higher if the server process had a processor to itself. It would also be higher if compiler optimizations had been enabled, but who needs those, anyway.

The end result of all this is that unless you need massive amounts of queries per second or your backend is incredibly slow, multithreading probably won’t do you much good and you’ll be much better off doing everything single-threaded. You’ll spend a lot less time in a debugger and will be generally happier as well.

Even if you do need these, multithreading might still not be the way to go. There are other ways of parallelization, such as using multiple processes, which provides additional memory safety and fault tolerance as well. This is not to say threads are bad. They are a wonderful tool for many different use cases. You should just be aware that sometimes the best way to use threads is not to use them at all.

Actually, make that “most times”.

Jussi Pakkanen

People often wonder why even the simplest of things seem to take a long time to implement. Often this is accompanied by the phrase made famous by Jeremy Clarkson: how hard can it be?

Well, let’s find out. As an example, let’s look into a very simple case of creating a shared library that grabs a screenshot from a video file. The problem description is simplicity itself: open the file with GStreamer, seek to a random location and grab the pixels from the buffer. All in all some ten lines of code that should take a few hours to implement, unit tests included.

Right?

Well, no. The very first problem is selecting a proper screenshot location. It can’t be in the latter half of the video, for instance. The simple reason for this is that it may contain spoilers and the mere act of displaying the image might ruin the video for viewers. So let’s instead select some suitable point, like 2/7ths of the way into the video clip.

But in order to do that you need to first determine the length of the clip. Fortunately GStreamer provides functionality for this. Less fortunately, some codec/muxer/platform/whatever combinations do not implement it. So now we have the problem of trying to determine a proper location in a file whose duration we don’t know. In order to save time and effort, let’s just grab the screenshot at ten seconds in these cases.
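
To make the heuristic concrete, here is a small sketch of the location-picking logic described so far (the function name and the nanosecond unit are just illustrative assumptions, not the real library’s API):

#include <cstdint>

// duration_ns < 0 means the duration could not be determined.
int64_t pick_screenshot_location(int64_t duration_ns) {
    const int64_t ten_seconds = 10LL * 1000 * 1000 * 1000;
    if(duration_ns < 0) {
        return ten_seconds;       // fall back to ten seconds in
    }
    return duration_ns * 2 / 7;   // 2/7ths of the way into the clip
}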

The question now becomes: what happens if the clip is less than ten seconds long? Then GStreamer would (probably) seek to the end of the file and grab a screenshot there. Videos often end in black, so this might lead to black thumbnails every now and then. Come to think of it, that 2/7ths location might accidentally land on a fade, so it might be all black, too. What we need is an image analyzer that detects whether the chosen frame is “interesting” or not.

This rabbit hole goes down quite deep so let’s not go there and instead focus on the other part of the problem.

There are two mutually incompatible versions of GStreamer currently in use: 0.10 and 1.0. The two cannot be in the same process at the same time due to interesting technical issues. No matter which one we pick, some client application might be using the other one. So we can’t actually link against GStreamer; instead we need to factor this functionality out into a separate executable. We also need to change the system’s global security profile so that every app is allowed to execute this binary.

Having all this functionality we can just fork/exec the binary and wait for it to finish, right?

In theory yes, but multimedia codecs are tricky beasts, especially hardware accelerated ones on mobile platforms. They have a tendency to freeze at any time. So we need to write functionality that spawns the process, monitors its progress and then kills it if it is not making progress.

A question we have not asked is how does the helper process provide its output to the library? The simple solution is to write the image to a file in the file system. But the question then becomes where should it go? Different applications have different security policies and can access different parts of the file system, so we need a system state parser for that. Or we can do something fancier such as creating a socket pair connection between the library and the client executable and have the client push the results through that. Which means that process spawning just got more complicated and you need to define the serialization protocol for this ad-hoc network transfer.

I could go on but I think the point has been made abundantly clear.

Jussi Pakkanen

A common step in a software developer’s life is building packages. This happens both directly on your own machine and remotely when waiting for the CI server to test your merge requests.

As an example, let’s look at the libcolumbus package. It is a common small-to-medium sized C++ project with a couple of dependencies. Compiling the source takes around 10 seconds, whereas building the corresponding package takes around three minutes. All things considered this seems like a tolerable delay.

But can we make it faster?

The first step in any optimization task is measurement. To do this we simulated a package builder by building the source code in a chroot. It turns out that configuring the source takes one second, compiling it takes around 12 seconds and installing build dependencies takes 2m 29s. These tests were run on an Intel i7 with 16GB of RAM and an SSD disk. We used CMake’s Make backend with 4 parallel processes.

Clearly, reducing the last part brings the biggest benefits. One simple approach is to store a copy of the chroot after dependencies are installed but before package building has started. This is a one-liner:

sudo btrfs subvolume snapshot -r chroot depped-chroot

Now we can do anything with the chroot and we can always return back by deleting it and restoring the snapshot. Here we use -r so the backed up snapshot is read-only. This way we don’t accidentally change it.

With this setup, prepping the chroot is, effectively, a zero time operation. Thus we have cut down total build time from 162 seconds to 13, which is a 12-fold performance improvement.

But can we make it faster?

After this fix the longest single step is the compilation. One of the most efficient ways of cutting down compile times is CCache, so let’s use that. For greater separation of concerns, let’s put the CCache repository on its own subvolume.

sudo btrfs subvolume create chroot/root/.ccache

We build the package once and then make a snapshot of the cache.

sudo btrfs subvolume snapshot -r chroot/root/.ccache ccache

Now we can delete the whole chroot. Reassembling it is simple:

sudo btrfs subvolume snapshot depped-chroot chroot
sudo btrfs subvolume snapshot ccache chroot/root/.ccache

The latter command gave an error about incorrect ioctls. The same effect can be achieved with bind mounts, though.

When doing this the compile time drops to 0.6 seconds. This means that we can compile projects over 100 times faster.

But can we make it faster?

At this point all individual steps take a second or so. Optimizing them further would yield negligible performance improvements. In actual package builds there are other steps that can’t be easily optimized, such as running the unit test suite, running Lintian, gathering and verifying the package and so on.

If we look a bit deeper we find that these are all, effectively, single process operations. (Some build systems, such as Meson, will run unit tests in parallel. They are in the minority, though.) This means that package builders are running processes which consume only one CPU most of the time. According to usually reliable sources package builders are almost always configured to work on only one package at a time.

Having a 24 core monster builder run single threaded executables consecutively does not make much sense. Fortunately this task parallelizes trivially: just build several packages at the same time. Since we could achieve 100 times better performance for a single build and we can run 24 of them at the same time, we find that with a bit of effort we can achieve the same results 2400 times faster. This is roughly equivalent to doing the job of an entire data center on one desktop machine.

The small print

The numbers on this page are slightly optimistic. However, the main reduction in build time, which comes from snapshotting the chroot, still stands.

In reality this approach would require some tuning; as an example, you would not want to build LibreOffice with -j 1. Keeping the snapshotted chroots up to date requires some smartness, but these are all solvable engineering problems.

Jussi Pakkanen

One of the main ways of reducing code complexity (and thus compile times) in C/C++ is forward declaration. The most basic form of it is this:

class Foo;

This tells the compiler that there will be a class called Foo but it does not specify it in more detail. With this declaration you can’t deal with Foo objects themselves but you can form pointers and references to them.

Typically you would use forward declarations in this manner.

class Bar;

class Foo {
  void something();
  void method1(Bar *b);
  void method2(Bar &b);
};

Correspondingly if you want to pass the objects themselves, you would typically do something like this.

#include"Bar.h"

class Foo {
  void something();
  void method1(Bar b);
  Bar method2();
};

This makes sense because you need to know the binary layout of Bar in order to pass it properly to and from a method. Thus a forward declaration is not enough; you must include the full header, otherwise you can’t use the methods of Foo.

But what if some user of Foo does not use either of the methods that deal with Bars? What if it only calls the method something? It would still need to parse all of Bar (and everything it #includes) even though it never uses Bar objects. This seems inefficient.

It turns out that including Bar.h is not necessary, and you can instead do this:

class Bar;

class Foo {
  void something();
  void method1(Bar b);
  Bar method2();
};

You can declare functions taking or returning full objects with only a forward declaration just fine. The catch is that those users of Foo that call the Bar methods need to include Bar.h themselves. Correspondingly, those that do not deal with Bar objects never need to include Bar.h, not even indirectly. If you find out that they do, it is proof that your #includes are not minimal. Fixing these include chains will make your source files more isolated and decrease compile times, sometimes dramatically.
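
To make the rule concrete, here is a sketch of two hypothetical users of Foo (the file names are made up and Foo’s methods are assumed to be public): the first compiles with nothing but the forward declaration inside Foo’s header, the second must pull in Bar.h itself because it creates a real Bar.

// caller1.cpp: only calls something(), never touches Bar objects,
// so the forward declaration of Bar is all it needs.
#include "Foo.h"

void use_simple(Foo &f) {
    f.something();
}

// caller2.cpp: passes a Bar by value, so it has to include Bar.h itself.
#include "Foo.h"
#include "Bar.h"

void use_full(Foo &f) {
    Bar b;
    f.method1(b);
}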

You only need to #include the full definition of Bar if you need:

  • to use its services (constructors, methods, constants, etc)
  • to know its memory layout

In practice the latter means that you need to either call or implement a function that takes a Bar object rather than a pointer or reference to it.

For other uses a forward declaration is sufficient.

Post scriptum

The discussion above holds even if Foo and Bar are templates, but keeping template classes this clean can be a lot harder and may in some instances be impossible. You should still try to minimize header includes as much as possible.

Jussi Pakkanen

With the release of C++11 something quite extraordinary has happened. Its focus on usable libraries, value types and other niceties has turned C++, conceptually, into a scripting language.

This seems like a weird statement to make, so let’s define exactly what we mean by that. Scripting languages differ from classical compiled languages such as C in the following ways:

  • no need to manually manage memory
  • expressive syntax, complex functionality can be implemented in just a couple of lines of code
  • powerful string manipulation functions
  • large standard library

As of C++11 all these hold true for C++. Let’s examine this with a simple example. Suppose we want to write a program that reads all lines from a file and writes them in a different file in sorted order. This is classical scripting language territory. In C++11 this code would look something like the following (ignoring error cases such as missing input arguments).

#include<string>
#include<vector>
#include<algorithm>
#include<fstream>

using namespace std;

int main(int argc, char **argv) {
  ifstream ifile(argv[1]);
  ofstream ofile(argv[2]);
  string line;
  vector<string> data;
  while(getline(ifile, line)) {
    data.push_back(line);
  }
  sort(data.begin(), data.end());
  for(const auto &i : data) {
    ofile << i << std::endl;
  }
  return 0;
}

That is some tightly packed code. Ignoring the include boilerplate and the like leaves us with roughly ten lines of code. If you were to do this in plain C using only its standard library, merely implementing the getline functionality reliably would take more lines of code than this. Not to mention that it would be tricky to get right.

Other benefits include:

  • every single line of code is clear, understandable and expressive
  • memory leaks can not happen
  • could easily be reworked into a library function
  • smaller memory footprint due to not needing a VM
  • compile time with -O3 is roughly the same as Python VM startup and has to be done only once
  • faster than any non-JITted scripting language

Now, obviously, this won’t mean that scripting languages will disappear any time soon (you can have my Python when you pry it from my cold, dead hands). What it does do is indicate that C++ is quite usable in fields one traditionally has not expected it to be.

Jussi Pakkanen

The problem

Suppose you have a machine with 8 cores. Also suppose you have the following source packages that you want to compile from scratch.

eog_3.8.2.orig.tar.xz
grilo_0.2.6.orig.tar.xz
libxml++2.6_2.36.0.orig.tar.xz
Python-3.3.2.tar.bz2
glib-2.36.4.tar.xz
libjpeg-turbo_1.3.0.orig.tar.gz
wget_1.14.orig.tar.gz
grail-3.1.0.tar.bz2
libsoup2.4_2.42.2.orig.tar.xz

You want to achieve this as fast as possible. How would you do it?

Think carefully before proceeding.

The solution

Most of you probably came up with the basic idea of compiling one package after the other with ‘make -j 8’ or equivalent. There are several reasons to do this, the main one being that it saturates the CPU.

The other choice would be to start the compilation on all subdirs at the same time but with ‘make -j 1’. You could also run two parallel build jobs with ‘-j 4’ or four with ‘-j 2’.

But surely that would be pointless. Doing one thing at a time maximises data locality so the different build trees don’t have to compete with each other for cache.

Right?

Well, let’s measure what actually happens.

(Chart: total build times measured for each of the configurations.)

The first bar shows the time when running with ‘make -j 8’. It is slower than all the other combinations. In fact it is over 40% (one minute) slower than the fastest one, while all the alternatives are roughly as fast as each other.

Why is this?

In addition to compilation and linking processes, there are parts in the build that can not be parallelised. There are two main things in this case. Can you guess what they are?

What all of these projects have in common is that they are built with Autotools. The configure step takes a very long time and can’t be parallelised with -j. When building consecutively, even with perfect parallelisation, the total build time can never drop below the sum of the configure script run times. That is easily half a minute per project for any non-trivial code base, even on the fastest i7 machine that money can buy.

The second thing is time that is lost inside Make. Its data model makes it very hard to optimize. See all the gory details here.

The end result of all this is a hidden productivity sink, a minute lost here, one there and a third one over there. Sneakily. In secret. In a way people have come to expect.

These are the worst kinds of productivity losses because people honestly believe that this is just the way things are, have always been and shall be evermore. That is what their intuition and experience tells them.

The funny thing about intuition is that it lies to you. Big time. Again and again.

The only way out is measurements.

 

Jussi Pakkanen

We all like C++’s container classes such as maps. The main negative thing about them is persistence. Ending your process makes the data structure go away. If you want to store it, you need to write code to serialise it to disk and then deserialise it back to memory when you need it again. This is tedious work that has to be done over and over.

It would be great if you could command STL containers to write their data to disk instead of memory. The reductions in application startup time alone would be welcomed by all. In addition most uses for small embedded databases such as SQLite would go away if you could just read stuff from persistent std::maps.

The standard does not provide for this because serialisation is a hard problem. But it turns out this is, in fact, possible to do today. The only tools you need are the standard library and basic standards conforming C++.

Before we get to the details, please note this warning from the society of responsible coding.


What follows is the single most evil piece of code I have ever written. Do not use it unless you understand the myriad of ways it can fail (and possibly not even then).

The basic problem is that C++ containers work only with memory but serialisation requires writing bytes to disk. The tried and true solution for this problem is memory mapped files. It is a technique where a certain portion of process’ memory is mapped to a backing file. Any changes to the memory layout will be written to the disk by the kernel. This gives us memory serialisation.

This is only half of the problem, though. STL containers and others allocate the memory they need through operator new. The way new works is implementation defined. It may give out addresses that are scattered around the memory space. We can’t mmap the entire address space because it would take too much space and serialise lots of stuff we don’t care about.

Fortunately C++ allows you to specify custom allocators for containers. An allocator is an object that does memory allocations for the object it is tied to. This indirection allows us to write our own allocator that gives out raw memory chunks from the mmapped memory area.

But there is still a problem. Since pointers refer to absolute memory locations we would need to have the mmapped memory area in the same location in every process that wants to use it. It turns out that you can enforce the address at which the memory mapping is to be done. This gives us an outline on how to achieve our goal.

  • create an empty file for backing (10 MB in this example)
  • mmap it in place
  • populate the data structure with objects allocated in the mmapped area
  • close creator program
  • start the reader program, mmap the data and cast the root object into existence

And that’s it. Here’s how it looks in code. First some declarations:

void *mmap_start = (void*)139731133333504;
size_t offset = 1024;

template <typename T>
class MmapAlloc {
  ....
  pointer allocate(size_t num, const void *hint = 0) {
    long returnvalue = (long)mmap_start + offset;
    size_t increment = num * sizeof(T) + 8;
    increment -= increment % 8;
    offset += increment;
    return (pointer)returnvalue;
  }
  ...
};

typedef std::basic_string<char, std::char_traits<char>,
  MmapAlloc<char>> mmapstring;
typedef std::map<mmapstring, mmapstring, std::less<mmapstring>,
  MmapAlloc<mmapstring> > mmapmap;

First we declare the absolute memory address of the mapping (it can be anything as long as it won’t overlap an existing allocation). The allocator itself is extremely simple: it hands out memory starting at offset bytes into the mapping and increments offset by the number of bytes allocated (plus alignment). Deallocated memory is never actually freed, it just remains unused (destructors are called, though). Last we have typedefs for our mmap-backed containers.

Population of the data sets can be done like this.

int main(int argc, char **argv) {
    int fd = open("backingstore.dat", O_RDWR);
    void *mapping;
    if(fd < 0) {
        printf("Open failed.\n");
        return 1;
    }
    mapping = mmap(mmap_start, 10*1024*1024,
      PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, 0);
    if(mapping == MAP_FAILED) {
        printf("MMap failed.\n");
        return 1;
    }
    mmapstring key("key");
    mmapstring value("value");
    auto map = new(mapping)mmapmap();
    (*map)[key] = value;
    printf("Sizeof map: %ld.\n", (long)map->size());
    printf("Value of 'key': %s\n", (*map)[key].c_str());
    return 0;
}

We construct the root object at the beginning of the mmap and then insert one key/value pair. The output of this application is what one would expect.

Sizeof map: 1.
Value of 'key': value

Now we can use the persisted data structure in another application.

int main(int argc, char **argv) {
    int fd = open("backingstore.dat", O_RDONLY);
    void *mapping;
    mapping = mmap(mmap_start, 10*1024*1024, PROT_READ,
     MAP_SHARED | MAP_FIXED, fd, 0);
    if(mapping == MAP_FAILED) {
        printf("MMap failed.\n");
        return 1;
    }
    std::string key("key");
    auto *map = reinterpret_cast<std::map<std::string,
                                 std::string> *>(mapping);
    printf("Sizeof map: %ld.\n", (long)map->size());
    printf("Value of 'key': %s\n", (*map)[key].c_str());
    return 0;
}

Note in particular how we can specify the type as std::map<std::string, std::string> rather than the custom allocator version in the creator application. The output is this.

Sizeof map: 1.
Value of 'key': value

It may seem a bit anticlimactic, but what it does is quite powerful.

Extra evil bonus points

If this is not evil enough for you, just think about what other things can be achieved with this technique. As an example you can have the backing file mapped to multiple processes at the same time, in which case they all see changes live. This allows you to have things such as standard containers that are shared among processes.

Jussi Pakkanen

Some C++ code bases seem to compile much more slowly than others. It is hard to compare them directly because they very often have different sizes. This makes it hard to encourage people to work on compilation speed, because there are no hard numbers to back up your claims.

To get around this I wrote a very simple compile time measurer. The code is available here. The basic idea is quite simple: provide a compiler wrapper that measures the duration of each compiler invocation and the number of lines (including comments, empty lines etc.) in the source file. Usage is quite simple. First you configure your code base.

CC='/path/to/smcc.py gcc' CXX='/path/to/smcc.py g++' configure_command

Then you compile it.

SMCC_FILE=/path/to/somewhere/sm_times.txt compile_command

Finally you run the analyzer script on the result file.

sm-analyze.py /path/to/sm_times.txt

The end result is the average number of lines compiled per second as well as the per-file compile speed sorted from slowest to fastest.

I ran this on a couple of code bases and here are the results (the numbers are source lines per second). The test machine was an i7 with 16GB of RAM using eight parallel compile processes. The unoptimized debug configuration was always chosen.

                   avg   worst     best
Libcolumbus     287.79   48.77  2015.60
Mediascanner     52.93    5.64   325.55
Mir             163.72   10.06 17062.36
Lucene++         65.53    7.57   874.88
Unity            45.76    1.86  1016.51
Clang           238.31    1.51 20177.09
Chromium        244.60    1.28 49037.79

For comparison I also measured a plain C code base.

                   avg   worst     best
GLib           4084.86  101.82 19900.18

We can see that C++ compiles quite a lot slower than plain C. The most interesting thing is that C++ compilation speed can vary by an order of magnitude between projects. The fastest is Libcolumbus, which has been designed from the ground up to be fast to compile.

What we can deduce from this experiment is that C++ compilation speed is a feature of the code base, not so much of the language or the compiler. It also means that if your code base is a slow one, it is possible to make it compile up to 10 times faster without any external help. The tools to do it are simple: minimizing interdependencies and external dependencies. This is one of those things that is easy to do when starting anew but hard to retrofit into code bases that resemble a bowl of ramen. The payoff, however, is undeniable.

Jussi Pakkanen

Pimpl is a common idiom in C++. It means hiding the implementation details of a class with a construct that looks like this:

class pimpl;

class Thing {
private:
  pimpl *p;
public:
 ...
};

This cuts down on compilation time because you don’t have to #include all the headers required for the implementation of this class. The downside is that p needs to be dynamically allocated in the constructor, which means a call to new. For frequently constructed objects this can be slow and lead to memory fragmentation.

Getting rid of the allocation

It turns out that you can get rid of the dynamic allocation with a little trickery. The basic approach is to reserve space in the parent object with, say, a char array. We can then construct the pimpl object there with placement new and destroy it by calling the destructor explicitly.

A header file for this kind of a class looks something like this:

#ifndef PIMPLDEMO_H
#define PIMPLDEMO_H

#define IMPL_SIZE 24

class PimplDemo {
private:
  char data[IMPL_SIZE];

 public:
  PimplDemo();
  ~PimplDemo();

  int getNumber() const;
};

#endif

IMPL_SIZE is the size of the pimpl object. It needs to be determined manually. Note that the size may be different on different platforms, and strictly speaking the char array should also be given the alignment the pimpl object requires (for example with alignas).

The corresponding implementation looks like this.

#include"pimpldemo.h"
#include<vector>

using namespace std;

class priv {
public:
  vector<int> foo;
};

#define P_DEF priv *p = reinterpret_cast<priv*>(data)
#define P_CONST_DEF const priv *p = reinterpret_cast<const priv*>(data)

PimplDemo::PimplDemo() {
  static_assert(sizeof(priv) == sizeof(data), "Pimpl array has wrong size.");
  P_DEF;
  new(p) priv;
  p->foo.push_back(42); // Just for show.
}

PimplDemo::~PimplDemo() {
  P_DEF;
  p->~priv();
}

int PimplDemo::getNumber() const {
  P_CONST_DEF;
  return (int)p->foo.size();
}

Here we define two macros that create a variable for accessing the pimpl. At this point we can use it just as if it were defined in the traditional way. Note the static assert that checks, at compile time, that the space we have reserved for the pimpl is exactly what the pimpl actually requires.

We can test that it works with a sample application.

#include<cstdio>
#include<vector>
#include"pimpldemo.h"

int main(int argc, char **argv) {
  PimplDemo p;
  printf("Should be 1: %d\n", p.getNumber());
  return 0;
}

The output is 1 as we would expect. The program is also Valgrind clean so it works just the way we want it to.

When should I use this technique?

Never!

Well, ok, never is probably a bit too strong. However this technique should be used very sparingly. Most of the time the new call is insignificant. The downside of this approach is that it adds complexity to the code. You also have to keep the backing array size up to date as you change the contents of the pimpl.

You should only use this approach if you have an object in the hot path of your application and you really need to squeeze the last bit of efficiency out of your code. As a rough guide only about 1 of every 100 classes should ever need this. And do remember to measure the difference before and after. If there is no noticeable improvement, don’t do it.

Jussi Pakkanen

The quest for software quality has given us lots of new tools: new compiler warnings, making the compiler treat all warnings as errors, style checkers, static analyzers and the like. These are all good things to have. Sooner or later in a project someone will decide to make them mandatory. This is fine as well, and the reason we have continuous integration servers.

Still, some people might, at times, propose merge requests that cause the CI to emit errors. The issues get fixed, some time is lost in doing so, but on the whole it is no big deal. But then someone comes to the conclusion that these checks should be mandatory on every build so that errors never get to the CI server at all. Having all builds pristine all the time is great, the reasoning goes, because then errors are found as soon as possible. This is as per the agile manifesto and thus universally a good thing to have.

Except that it is not. It is terrible! It is a massive drain on productivity and the kind of thing that makes people hate their job and all things related to it.

This is a strong and somewhat counter-intuitive statement. Let’s explore it with an example. Suppose we have this simple snippet of code.

  x = this_very_long_function_that_does_something(foo, bar, baz10, foofoofoo);

Now let’s suppose we have a bug somewhere. As part of the debugging cycle we would like to check what would happen if x had the value 3 instead of whatever value the function returns. The simple way to check is to change the code like this.

  x = this_very_long_function_that_does_something(foo, bar, baz10, foofoofoo);
  x = 3;

This does not give you the result you want. Instead you get a compile/style/whatever checker error. Why? Because you assign to variable x twice without using the first value for anything. This is called a dead assignment. It may cause latent bugs, so the checker issues an error halting the build.

Fair enough, let’s do this then.

  this_very_long_function_that_does_something(foo, bar, baz10, foofoofoo);
  x = 3;

This won’t work either. The code is ignoring the return value of the function, which is an error (in certain circumstances but not others).

Grumble, grumble. On to iteration 3.

  //x = this_very_long_function_that_does_something(foo, bar, baz10, foofoofoo);
  x = 3;

This will also fail. The line with the comment is over 80 characters wide and this is not tolerated by many style guides, presumably because said code bases are being worked on by people who only have access to text consoles. On to attempt 4.

//x = this_very_long_function_that_does_something(foo, bar, baz10, foofoofoo);
  x = 3;

This won’t work either, for two different reasons. Case 1: the variables used as arguments might no longer be used at all and will therefore trigger unused variable warnings. Case 2: if any argument is passed by reference or pointer, its state will no longer be updated by the call.

The latter case is the worst, because no static checker can detect it. Requiring the code to conform to a cosmetic requirement caused it to grow an actual bug.

Getting this kind of test code through all the checkers is a lot of work. Sometimes more than the actual debugging work. Most importantly, it is completely useless work. Making this kind of exploratory code fully polished is pointless because it will never, ever enter any kind of production. If it does, your process has bigger issues than code style. Any time spent working around a style checker is demotivating, wasted effort.

But wait, it gets worse!

Static checkers are usually slow. They can take over ten times longer to do their checks than plainly compiling the source code. Which means that developer productivity with these tools is, correspondingly, several times lower than it could be. Programmers are expensive; having them sit around watching checker output scroll slowly by in a terminal is not a very good use of their time.

Fortunately there is a correct solution for this. Make the style checks a part of the unit test suite, possibly as part of an optional suite of slow tests. Run said tests on every CI merge. This allows developers to be fast and productive most of the time but precise and polished when required.

Jussi Pakkanen

Boost is great. We all love it. However there is one gotcha that you have to keep in mind about it, which is the following:

You must never, NEVER expose Boost in your public headers!

Even if you think you have a valid case, you don’t! Exposing Boost is 100% absolutely the wrong thing to do always!

Why is that?

Because Boost has no stability guarantees. It can and will change at any time. What this means is that if you compile your library against a certain version of Boost, all people who ever link to it must use the exact same version. If an application links against two libraries that use a different version of Boost, it will not work and furthermore can’t be made to work. Unless you are a masochist and have different parts of your app #include different Boost versions. Also: don’t ever do that!
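
One practical way to comply is to keep all Boost usage in the implementation file, for example behind a pimpl. A minimal sketch, with a made-up class purely for illustration:

// mylib.h -- the installed public header: no Boost anywhere in sight.
#include <memory>

class Downloader {
public:
    Downloader();
    ~Downloader();
    void fetch(const char *url);

private:
    class Impl;                // defined in the .cpp, which is where
    std::unique_ptr<Impl> p;   // the Boost headers get included
};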

The message is clear: your public headers must never include anything from Boost. This is easy to check with grep. Make a test script and run it as part of your test suite. Failure to do so may cause unexpected lead poisoning courtesy of disgruntled downstream developers.

Jussi Pakkanen

The decision by Go not to provide exceptions has given rise to a renaissance of sorts in eliminating exceptions and going back to error codes. Various reasons are given, such as efficiency, simplicity and the fact that exceptions “suck”.

Let’s examine what exceptions really are through a simple example. Say we need to write code to download some XML, parse and validate it and then extract some piece of information. There are several different ways in which this can fail: the network may be down, the server won’t respond, the XML is malformed and so on. Suppose then that we encounter an error. The call stack probably looks something like Func1 → Func2 → Func3 → Func4 → Func5 → Func6 → Func7, with each function calling the next one.

Func1 is the function that drives this functionality and Func7 is where the problem happens. In this particular case we don’t care about partial results: if we can’t do all the steps, we just give up. The error propagation starts with Func7 returning an error code to Func6. Func6 detects this and returns an error to Func5. This keeps happening until Func1 gets the error and reports failure to its caller.

Should Func7 throw an exception instead, functions 6 through 2 would not need to do anything. The compiler takes care of everything: Func1 catches the exception and reports the error.

This very simple example tells us what exceptions really are: a reliable way of moving up the call stack multiple frames at a time.

It also tells us what their main feature is: they provide a way to centralise error handling in one place.

It should be noted that exceptions do not force centralised error handling. Any function between 1 and 7 can catch any exception if that is deemed the best thing to do. The developer only needs to write code in those locations. Error codes, in contrast, require extra code at every single intermediate step. This might not seem like much in this particular case, after all there are only six functions to change. Unfortunately, in reality the call graph is not a single chain but a branching tree of calls.

That is, functions usually call several other functions to get their job done. This means that if the average call stack depth is N, the developer needs to write O(2^N) error handling stubs. They also need to be tested, which means writing tons of mock classes. If any single one of these checks is wrong or missing, the system has a latent bug.

Even worse, most error code handlers look roughly like this:

ec = do_something();
if(ec) {
  do_some_cleanup();
  return ec;
}

What this code actually does is replicate the behaviour of exceptions. The only difference is that the developer needs to write this anew every single time, which opens the door for bugs.
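
For contrast, here is a minimal sketch of the same flow with exceptions (the function names are illustrative): the intermediate functions contain no error handling code at all, yet behave exactly like the hand-written propagation above.

#include <stdexcept>
#include <cstdio>

void func7() { throw std::runtime_error("network is down"); }
void func6() { func7(); } // no error handling code needed here...
void func5() { func6(); }
void func4() { func5(); }
void func3() { func4(); }
void func2() { func3(); } // ...nor in any of these

void func1() {
    try {
        func2();
    } catch(const std::exception &e) {
        // All the error handling is centralised in this one place.
        std::printf("Operation failed: %s\n", e.what());
    }
}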

Design lesson to be learned

Usually when you design an API there are two choices: it can be either very simple or feature rich. The latter usually takes more time for the API developer to get right but saves effort for its users. In the case of exceptions, the feature rich choice requires work in the compiler, linker and runtime. Depending on circumstances, either one may be a valid choice.

When choosing between these two it is often beneficial to step back and look at it from a wider perspective. If the simpler choice was taken, what would happen? If it seems that in most cases (say >80%) people would only use the simple approach to mimic the behaviour of the feature rich one, it is a pretty strong hint that you should provide the feature rich one (or maybe even both).

This problem can go the other way, too: the framework may only provide a very feature rich and complex API, which people then use to simulate the simpler approach. The price of good design is eternal vigilance.

Jussi Pakkanen

If you read discussions on the Internet about memory allocation (and who doesn’t, really), one surprising tidbit that always comes up is that in Linux, malloc never returns null because the kernel does a thing called memory overcommit. This is easy to verify with a simple test application.

#include<stdio.h>
#include<malloc.h>

int main(int argc, char **argv) {
  while(1) {
    char *x = malloc(1);
    if(!x) {
      printf("Malloc returned null.\n");
      return 0;
    }
    *x = 0;
  }
  return 1;
}

This app tries to malloc memory one byte at a time and writes to it. It keeps doing this until either malloc returns null or the process is killed by the OOM killer. When run, the latter happens. Thus we have now proved conclusively that malloc never returns null.

Or have we?

Let’s change the code a bit.

#include<stdio.h>
#include<malloc.h>

int main(int argc, char **argv) {
  long size=1;
  while(1) {
    char *x = malloc(size*1024);
    if(!x) {
      printf("Malloc returned null.\n");
      printf("Tried to alloc: %ldk.\n", size);
      return 0;
    }
    *x = 0;
    free(x);
    size++;
  }
  return 1;
}

In this application we try to allocate a block of ever increasing size. If the allocation is successful, we release the block before trying to allocate a bigger one. This program does receive a null pointer from malloc.

When run on a machine with 16 GB of memory, the program will fail once the allocation grows to roughly 14 GB. I don’t know the exact reason for this, but it may be that the kernel reserves some part of the address space for itself and trying to allocate a chunk bigger than all remaining memory fails.

Summarizing: malloc under Linux can either return null or not and the non-null pointer you get back is either valid or invalid and there is no way to tell which one it is.

Happy coding.

Jussi Pakkanen

Let’s talk about revision control for a while. It’s great. Everyone uses it. People love the power and flexibility it provides.

However, if you read about happenings from ten or so years ago, you find that the situation was quite different. Seasoned developers were against revision control. They would flat out refuse to use it and instead just put everything on a shared network drive, or used something even crazier, such as the revision control shingle.

Thankfully we as a society have moved forward. Not using revision control is a firing offense. Most people would flat out refuse to accept a job that does not use revision control, regardless of anything short of a few million euros in cash up front. Everyone accepts that revision control is a building block of quality. This is good.

It is unfortunate that this view is severely lacking in other aspects of software development. Let’s take tests as an example. There are actually people, in visible positions, who publicly and vocally speak against writing tests. And for some reason we as a whole sort of accept that rather than immediately flagging it as ridiculous nonsense.

The first example was told to me by a friend working on a quite complex piece of mathematical code. When he discovered that there were no tests at all verifying that it worked, the reply he got was this: “If you are smart enough to be hired to work on this code, you are smart enough not to need tests.” I really wish this were an isolated incident, but in my heart I know that is not the case.

The second example is a posting made a while back by a well known open source developer. It had a blanket statement saying that test driven development is bad and harmful. The main point seemed to be a false dichotomy between good software with no tests and poor software with tests.

Even if testing is done, the implementation may be just a massive bucketful of fail. As an example, here you can read how people thought audio codecs should be tested.

As long as this kind of thinking is tolerated, no matter how esteemed the person saying it, we are in the same place medicine was during the age of bloodletting and leeches. This is why software is considered an unreliable, buggy pile of garbage that costs hundreds of millions. The only way out of it is a change in collective attitude. Unfortunately those often take quite a long time to happen, but a man can dream, can he not?

Jussi Pakkanen

One of the grand Unix traditions is that source code is built directly inside the source tree. This is the simple approach, which has been used for decades. In fact, most people do not even consider doing something else, because this is the way things have always been done.

The alternative to an in-source build is, naturally, an out-of-source build. In this build type you create a fresh subdirectory and all files generated during the build (object files, binaries etc) are written in that directory. This very simple change brings about many advantages.

Multiple build directories with different setups

This is the main advantage of separate build directories. When developing you typically want to build and test the software under separate conditions. For most work you want to have a build that has debug symbols on and all optimizations disabled. For performance tests you want to have a build with both debug and optimizations on. You might want to compile the code with both GCC and Clang to test compatibility and get more warnings. You might want to run the code through any one of the many static analyzers available.

If you have an in-source build, then you need to nuke all build artifacts from the source tree, reconfigure the tree and then rebuild. You also need to restore the old settings afterwards, because you probably don’t want to run a static analyzer on your day-to-day development work, mostly because it is up to 10 times slower than a normal non-optimized build.

Separate build directories provide a nice solution to this problem. Since all their state is stored in a separate build directory, you can have as many build directories per one source directory as you want. They will not stomp on each other. You only need to configure your build directories once. When you want to build any specific configuration, you just run Make/Ninja/whatever in that subdirectory. Assuming your build system is good (i.e. not Autotools with AM_MAINTAINER_MODE hacks) this will always work.

No need to babysit generated files

If you look at the .bzrignore file of a typical Autotools project, it usually has on the order of a dozen or so rules for files such as Makefiles, Makefile.ins, libtool files and all that stuff. If your build system generates .c source files which it then compiles, all those files need to be in the ignore file as well. You could also have a blanket rule of ‘*.c’ but that is dangerous if your source tree consists of handwritten C source. As files come and go, the ignore file needs to be updated constantly.

With build directories all this drudgery goes away. You only need to add build directory names to the ignore file and then you are set. All new source files will show up immediately as will stray files. There is no possibility of accidentally masking a file that should be checked in revision control. Things just work.

Easy clean

Want to get rid of a certain build configuration? Just delete the subdirectory it resides in. Done! There is no chance whatsoever that any state from said build setup remains in the source tree.

Separate partitions for source and build

This gets into very specific territory but may be useful sometimes. The build directory can be anywhere in the filesystem tree. It can even be on a different partition. This allows you to put the build directory on a faster drive or possibly even on ramdisk. Security conscious people might want to put the source tree on a read-only (optionally a non-execute) file system.

If the build tree is huge, deleting it can take a lot of time. If the build tree is in a BTRFS subvolume, deleting all of it becomes a constant time operation. This may be useful in continuous integration servers and the like.

Conclusion

Building in separate build directories brings many advantages over building in-source. It might require some adjusting, though. One specific thing that you can’t do any more is cd into a random directory in your source tree and type make to build only that subdirectory. This is mostly an issue with certain tools with poor build system integration that insist on running Make in-source. They should be fixed to work properly with out-of-source builds.

If you decide to give out-of-tree builds a try, there is one thing to note. You can’t have in-source and out-of-source builds in the same source tree at the same time (though you can have either of the two). They will interact with each other in counter-intuitive ways. The end result will be heisenbugs and tears, so just don’t do it. Most build systems will warn you if you try to have both at the same time. Building out-of-source may also break some applications, typically tests, that assume they are being run from within the source directory.

Jussi Pakkanen

There has been gradual movement towards CMake in Canonical projects. I have inspected quite a lot of build setups written by many different people and certain antipatterns and inefficiencies seem to pop up again and again. Here is a list of the most common ones.

Clobbering CMAKE_CXX_FLAGS

Very often you see constructs such as these:

set(CMAKE_CXX_FLAGS "-Wall -pedantic -Wextra")

This seems to be correct, since this command is usually at the top of the top level CMakeLists.txt. The problem is that CMAKE_CXX_FLAGS may have content that comes from outside CMakeLists. As an example, the user might set values with the ccmake configuration tool. The construct above destroys those settings silently. The correct form of the command is this:

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -pedantic -Wextra")

This preserves the old value.

Adding -g manually

People want to ensure that debugging information is on so they set this compiler flag manually. Often several times in different places around the source tree.

It should not be necessary to do this ever.

CMake has a concept of build identity. There are debug builds, which have debug info and no optimization. There are release builds which don’t have debug info but have optimization. There are relwithdebinfo builds which have both. There is also the default plain type which does not add any optimization or debug flags at all.

The correct way to enable debug is to specify that your build type is either debug or relwithdebinfo. CMake will then take care of adding the proper compiler flags for you. Specifying the build type is simple, just pass -DCMAKE_BUILD_TYPE=debug to CMake when you first invoke it. In day to day development work, that is what you want over 95% of the time so it might be worth it to create a shell alias just for that.

Using libraries without checking

This antipattern shows itself in constructs such as this:

target_link_libraries(myexe -lsomesystemlib)

CMake will pass the -lsomesystemlib part to the compiler command line as is, so it will link against the library. Most of the time. If the library does not exist, the end result is a cryptic linker error. The problem is worse still if the compiler in question does not understand the -l syntax for libraries (unfortunately such compilers exist).

The solution is to use find_library and pass the result from that to target_link_libraries. This is a bit more work up front but will make the system more pleasant to use.

Adding header files to target lists

Suppose you have a declaration like this:

add_executable(myexe myexe.c myexe.h)

In this case myexe.h is entirely superfluous. It can just be dropped. The reason people put it in is probably that they think it is required to make CMake rebuild the target when the header changes. That is not necessary. CMake will use the dependency information from GCC and add this dependency automatically.

The only exception to this rule is when you generate header files as part of your build. Then you should put them in the target file list so CMake knows to generate them before compiling the target.

Using add_dependencies

This one is simple. Say you have code such as this:

target_link_libraries(myexe mylibrary)
add_dependencies(myexe mylibrary)

The second line is unnecessary. CMake knows there is a dependency between the two just based on the first line. There is no need to say it again, so the second line can be deleted.

Add_dependencies is only required in certain rare and exceptional circumstances.

Invoking make

Sometimes people use custom build steps and as part of those invoke “make sometarget”. This is not very clean on many different levels. First of all, CMake has several different backends, such as Ninja, Eclipse, XCode and others which do not use Make to build. Thus the invocation will fail on those systems. Hardcoding make invocations in your build system prevents other people from using their preferred backends. This is unfortunate as multiple backends are one of the main strengths of CMake.

Second of all, you can invoke targets directly in CMake. Most custom commands have a DEPENDS option that can be used to invoke other targets. That is the preferred way of doing this as it works with all backends.

Assuming in-source builds

Unix developers have decades’ worth of muscle memory telling them to build their code in-source. This leaks into various places. As an example, test data files may be accessed assuming that the source is built in-tree and that the program is executed in the directory it resides in.

Out-of-source builds provide many benefits (which I’m not going into right now, it could fill its own article). Even if you personally don’t want to use them, many other people will. Making it possible is the polite thing to do.

Inline sed and shell pipelines

Some builds require file manipulation with sed or other such shell tricks. There’s nothing wrong with them as such. The problem comes from embedding them inside CMakeLists command invocations. They should instead be put into their own script files which are then called from CMake. This makes them more easily documentable and testable.

Invoking CMake multiple times

This last one is not a coding antipattern but a usage antipattern. People seem to run the cmake binary again after every change to their build definition. This is not necessary. For any given build directory, you only ever need to run the cmake binary once: when you first configure your project.

After the first configuration the only command you ever need to run is your build command (be it make or ninja). It will detect changes in the build system, automatically regenerate all necessary files and compile the end result. The user does not need to care. This behaviour is probably residue from being burned several times by Autotools’ maintainer mode. This is understandable, but in CMake this feature will just work.
