Everything I needed to know about business management I learned from Star Wars

Steve Denning has given a wonderful presentation on how management should be done in the 21st century. His main point is that instead of making money, the main goal of a company should be to delight its customers. He reasons that if the main goal is making money, it leads to a corporate structure that abhors innovation, which makes the corporation vulnerable to agile startups.

The video is highly recommended for anyone who deals with management in any way (including those who are being managed). The material in the presentation may seem very familiar to you, either because it mirrors your own working experience or because you recognize the patterns it presents in other areas as well.

One well-known piece of popular culture resonates very strongly with the presentation: the original Star Wars trilogy. The Galactic Empire is a good analogy for a large, established corporation that is being challenged by a small but nimble Rebel Alliance.

No need to plan, just do what you are told

Let us start our analysis by contrasting the meeting and decision making processes of the Empire and the Alliance. Probably the most famous meeting scene in the entire trilogy happens in Star Wars when Empire officials are discussing the rebellion and the lost Death Star plans. A simple overview of the meeting tells us that it will not be a successful one.

The meeting happens around one huge table. It is very probable that people can’t hear what participants on the other side of the table are saying. This is even more probable when you notice that most people are old generals who probably have poor hearing. They also look like they would rather be anywhere else than at the meeting. However, the issues they are discussing are vital and the outcomes will shape the entire future of the Empire. More specifically, the end of it.

As far as we can tell, the point of the meeting is to determine what to do should the Alliance really have a copy of the Death Star plans. All that everyone remembers of the meeting is Force choking. This is quite sad, because the issues raised are important ones. Lord Vader has not been able to find the lost tapes. Neither has he been able to find the rebel base. We also know that he has no real, workable plan to achieve these goals. Rather than work on the problem with a group of military experts, Vader instead chooses to save face by shooting the messenger.

The end result is that no actual work gets done. There are no contingency plans, no alternative approaches, nothing. The entire strategy of the Empire, the largest, most powerful organization in the galaxy, is It Will Work Because I Say It Will Work, now STFU and GBTW. It is also made painfully clear that questioning the choices of the Dear Leader is hazardous.

Compare this with the Rebel Alliance. All pieces of information and opinions are dealt with in a positive manner. When someone suspects that the targeting computer would not be able to hit the Death Star’s exhaust port, Luke does not attack him but rather gives a personal example showing how it is possible. When Han reports that a probable Imperial probe droid has found them, the people in charge trust him and start evaluating the situation. He could have gotten a response along the lines of “do you have any idea how many resources we have spent to build this base, do you expect us to abandon all that based on a hunch?” Even C-3PO, the lowest rung of the rebel ladder, can give his signal analysis results directly to the main command, and his expertise is valued.

The Rebel Alliance is about solving problems, trust and agility. The Empire is about top-down leadership and management by mandates and shouting. We all know which one of them won in the end.

Failing with people: Management by Vader

If we view the Empire as a corporation then we can say that the Emperor is roughly the chairman of the board whereas Darth Vader is the CEO. He is in charge of the operational branches of the Empire. His decisions define the corporate culture. Let us examine his leadership using the battle of Hoth as a case study.

From a management point of view the largest conflict is between Darth Vader and Admiral Ozzel. Details on Ozzel’s military career are spotty but we can assume that he has gone through the Imperial Military/Space Navy/whatever Academy, gotten good grades, worked hard on his career and eventually has reached the rank of Admiral. Darth Vader, by comparison, is a whiny kid from a backwards desert planet with no formal military training and who has gotten his current rank through nepotism [1].

The conflict between these two comes to its peak when Vader feels that Ozzel has flown the fleet too close to the suspected rebel base. Let’s think about that for a second. The mission they are engaged in is basically a surprise attack. The goal is to catch the enemy unaware, crush them fast and prevent any escape attempts. This being the case flying in hyperspace as close as possible to the target planet and attacking immediately is the right thing to do. The alternative approach, and the one apparently preferred by Vader, would be to come out of hyperspace far from the planet and then fly slowly closer and attack. This strategy would have given the rebels ample time to jump every ship to hyperspace long before the Star Destroyers could have fired a single shot.

From a management point of view this single episode has many failures. First of all, Vader did not trust his employees but instead resorted to micromanaging them. Secondly, even though he had a very clear vision of how the assault should be handled, he did not explain it to his underlings beforehand. He just magically assumed that they would do the right thing. Maybe he had forgotten that only Sith Lords have the ability to read people’s minds. Thirdly, once the fleet had dropped out of lightspeed, punishing Ozzel was the stupidest thing he could possibly do. The attack plan was in motion; nothing could change that. Any punishment should have happened only after the campaign. Summary executions in the middle of troop deployment only serve to weaken morale.

Kinda makes you wonder if the only reason Vader choked Ozzel was that he could be used as a scapegoat in case of failure.

This kind of behavior happens again and again throughout the trilogy. During the assault on the Death Star, Vader explicitly tells professional TIE fighter pilots not to shoot at their targets, either because he wants all the glory to himself or because he thinks they are too stupid to hit anything. He orders the entire fleet into an asteroid field, causing billions of credits’ worth of damage and several thousand deaths. They could just have waited outside the asteroid field, because it is well established that one can’t jump to hyperspace from inside it. He also micromanages the search by demanding constant progress updates.

Come to think of it, almost every single management and executive decision Vader makes is wrong. He would have run the entire Empire into the ground even if he hadn’t killed the Emperor. In corporations this kind of manager is unfortunately all too common. The higher up the chain he is, the more damage he can do. If he holds a major amount of stock, things are even worse, because then he becomes really hard to get rid of.

Motivating the masses: the case of Stormtrooper apathy

One of the most ridiculed aspects of the Star Wars trilogy is the stormtroopers, and especially their shooting accuracy. In corporations stormtroopers correspond to regular low-level workers, the ones that actually get all the grunt work done. If you compare stormtroopers to rebel fighters, you find that rebel forces are consistently better. They shoot more accurately, have more imagination and just generally get things done better.

One might speculate that this is because all the top talent goes to the Rebel Alliance, it being the hot new thing. In reality they are both recruiting from the same talent pool. Moreover, it stands to reason that the best of the best of the best would go to the most glamorous and prestigious schools, i.e. the Imperial Academy. Why would they instead join what is, effectively, a terrorist cell with a very low life expectancy, unless they had a personal bone to pick with the Empire? [2]

The basic skill level of a stormtrooper is pretty much the same as that of the average Joe in the Rebel Alliance. And yet they perform terribly. As an example, let’s examine the scene in Star Wars just after our heroes have escaped from the trash compactor. They run into a group of seven stormtroopers. A few shots are fired and the entire group starts running away. If one examines the footage closely, at the time they start their retreat they can only see Han and bits of Princess Leia. Luke and Chewie are behind a wall.

Think about that for a while. These are professional soldiers that come across some hippie and a girl. They are armed with deadly force, are specifically trained and have massive strength in numbers. Yet their instinctive decision is to run away. It’s kind of like having police officers who hide in their parents’ basements whenever they hear that a crime has been committed.

What could be the reason for this? The answer is simple: motivation. The common man inside the stormtrooper uniform probably does not care about the goals of the Empire. He just wants to get his paycheck and go home. What he really does not want is to get killed in any way. If you look at the behavior and motivation of stormtroopers throughout the series, doing everything possible not to get killed is pretty high on the list. Underachieving in their everyday tasks is part of this, because success means promotion, which means a higher probability of dealing with Vader, which in turn means a higher probability of death by random Force choking rather than death in battle.

If this is the structure of your organization, the question is not why the workers aren’t performing well. The question is why any sane person would want to perform well.

There is one reason. We can deduce it by examining the cases where stormtroopers behave like an actual, efficient, deadly fighting force. There aren’t many of these, but let’s start with the beginning of Star Wars, the assault on Princess Leia’s Corellian cruiser. The assault force knows what it is doing, shoots accurately, and takes over the ship very efficiently.

There are a few other cases where this happens as well, but they are quite rare. There is one thing they have in common, though. The troops perform well only when Vader is personally overseeing them. This is classic management by fear. Every single trooper knows that if they fail, they will get Force choked to death. They might get choked even if they do just OK, simply to set an example. So they really do their best.

The biggest problem with this approach is that it does not scale. Vader can’t be everywhere. Things work fine when he’s there. When he leaves, an entire legion of his best soldiers gets defeated by a dozen teddy bears with stone age technology.

Actually, let me take that back. The biggest problem is not the lack of scalability. The biggest problem is that Vader probably thinks that his troops are truly the best of the best. Why wouldn’t he? Whenever he is around, things work smoothly and efficiently. Who’s going to tell him that his so-called elite troops are in fact complete garbage? Captain Needa?

This is the reason companies like Toyota and Google thrive. They care about their employees. They want them to participate in the decision-making process. They want them to be part of the family, so to speak, rather than being resources to be shifted around, shouted at and summarily executed (though I don’t think any Fortune 500 company does executions at the present time).

The meaning of (a company’s) life

The main thesis of Steve Denning’s presentation is that the common view that a company’s purpose is to make money is flawed. Instead they should be delighting their customers. Making money is a result, not the goal. With that in mind, let’s ask a simple question.

What is the ultimate purpose of the Empire?

We hear very little about their goals on health care or education. As far as we can tell, the Empire is only the manifestation of the Emperor’s lust for power. He doesn’t care about the people. His only interest is in the power trip he gets from bossing them around. Just like certain corporations see their customers only as sponges to squeeze as much money out of as possible.

There are consequences.

When Luke talks to Obi-Wan for the first time he says “It’s not that I like the Empire, I hate it. But there’s nothing I can do about it now.” His delivery seems to indicate that this is a common attitude towards the Empire.

Remind you of any corporations you know?

If we accept the special editions as canon, once the Emperor died, people started spontaneously partying in the streets, knocking down statues and shooting fireworks. After the tipping point everyone dropped the Empire like it was going out of fashion. One imagines that even people high up on the Empire’s chain of command would go around stating how they have always secretly supported the goals of the Rebellion.

The world is full of companies that have used their dominant position to extract money with inferior products. They have focused on cost cutting and profit maximisation rather than improving their customers’ lives. And they have been successful, for a while. But once a competitor that does care about these things has appeared, the dominant player has usually collapsed. For an example, see what happened to Nokia after the release of the iPhone.

The only protection against collapse is to make your customers consistently happy. Suppose someone came out tomorrow with a new magical superphone that is up to 90% better than the iPhone. Would current iPhone users switch en masse? No, they would not, because they bought their current phone because it was the best one for them, the one they really wanted. Not because it was the “crappy-but-only-possible” choice.

If your company is producing products of the latter type, your days are already numbered, but you just don’t know it yet. Just when you think you are at the height of your power, someone will grab you without warning and throw you over a railing. Most likely you will blame your failure on them. But you are wrong. You have brought your downfall on yourself.

Also, you are dead.

Footnotes

[1] Assuming that the Force Darth Sidious uses to inseminate Shmi Skywalker comes from himself. Somehow. In a way I don’t really want to know.

[2] The prequels seem to indicate that stormtroopers are clones. However that is probably not the case anymore in the time frame of the original trilogy. The original clones all spoke with the same voice. Stormtroopers speak with different voices. There are also variations in size and behavior. If they were clones, i.e. dispensable cannon fodder, it would make even less sense for them to be concerned about self-preservation.

API design fail school is in session

Today’s API design fail case study is Iconv. It is a library designed to convert text from one encoding to another. The basic API is very simple as it has only three function calls. Unfortunately two of them are wrong.

Let’s start with the initialisation. It looks like this:

iconv_t iconv_open(const char *tocode, const char *fromcode);

Having the tocode argument before the fromcode argument is wrong. It goes against the natural ordering that people have of the world. You convert from something to something and not to something from something. If you don’t believe me, go back and read the second sentence of this post. Notice how it was completely natural to you and should you try to change the word order in your mind, it will seem contrived and weird.

But let’s give this the benefit of the doubt. Maybe there is a good reason for having the order like this. Suppose the library was meant to be used only by people writing RPN calculators in Intel syntax assembly using the Curses graphics library. With that in mind, let’s move on to the second function as it is described in the documentation.

size_t iconv(iconv_t cd, const char **inbuf, size_t *inbytesleft,
             char **outbuf, size_t *outbytesleft);

In this function the order is the opposite: source comes before target. Having the order backwards is bad, but having an inconsistent API such as this is inexcusable.

But wait, there is more!

If you look at the actual installed header file, this is not the API it actually provides. The second argument is not const in the implementation. So either you strdup your input string to keep it safe or cast away your const and hope/pray that the implementation does not fiddle around with it.

The API function is also needlessly complex, taking pointers to pointers and so on. This makes the common case of “I have this string here and I want to convert it into this other string here” terribly convoluted. It causes totally reasonable code like this to break.

char *str = read_from_file_or_somewhere();
iconv(i, &str, &size_str, &outbuf, &size_outbuf);

Iconv will change where str points to, and if it was your only pointer to the data (which is very common), you have just lost access to it. To get around this you have to instantiate a new dummy pointer variable and pass that to iconv instead. If you don’t, and you later try to use the mutilated pointers to, say, deallocate a temporary buffer, you get interesting and magical crashes.
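To make the workaround concrete, here is a minimal sketch. The input function, the chosen encodings and the assumption that the string was heap-allocated are all made up for illustration; the point is that the dummy pointers absorb Iconv’s fiddling so your own pointers stay valid for cleanup.

#include <iconv.h>
#include <stdlib.h>
#include <string.h>

void convert_example(char *str) /* str from e.g. a hypothetical read_from_file_or_somewhere() */
{
    char outbuf[4096];
    char *inptr = str;            /* dummy pointers that iconv is free to mangle */
    char *outptr = outbuf;
    size_t inleft = strlen(str);
    size_t outleft = sizeof(outbuf);

    iconv_t cd = iconv_open("UTF-8", "ISO-8859-1"); /* should really be checked against (iconv_t)-1 */
    iconv(cd, &inptr, &inleft, &outptr, &outleft);
    iconv_close(cd);

    free(str); /* still safe: only the dummy inptr was moved, str itself was never touched */
}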

Passing the conversion types to iconv_open as strings is also tedious. You can never tell if your converter will work or not. If it fails, Iconv will not tell you why. Maybe you have a typo. Maybe this encoding has magically disappeared in this version. For this reason the encoding types should be declared in an enum. If there are very rare encodings that don’t get built on all platforms, there should be a function to query their existence.

A better API for iconv would take the current conversion function and rename it to iconv_advanced or something. The basic iconv function (the one 95% of people use 95% of the time) should look something like this:

int iconv(encoding fromEncoding, encoding toEncoding,
  errorBehaviour eb,
  const char *source, size_t sourceSize,
  char *target, size_t targetSize);

ErrorBehaviour tells the function what to do when it encounters errors (ignore, stop, etc). The return value could be the total number of characters converted or some kind of error code. Alternatively, the function could allocate the target buffer by itself, possibly with a user-defined allocator function.

The downside of this function is that it takes 7 arguments, which is a bit too much. The first three could be stored in an iconv_t type for clarity.
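For comparison, here is what the common case might look like with such a function. ENCODING_LATIN1, ENCODING_UTF8 and ERROR_STOP are hypothetical enum values standing in for the types sketched above, and negative return values are assumed to be error codes:

const char *source = "hello, world";
char target[4096];
int converted = iconv(ENCODING_LATIN1, ENCODING_UTF8, ERROR_STOP,
                      source, strlen(source) + 1,
                      target, sizeof(target));
if (converted < 0)
    printf("conversion failed with code %d\n", converted);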

Scream if you want to go faster (with C++)!

We all know that compiling C++ is slow.

Fewer people know why, or how to make it faster. Some do, though: the developers at Remedy, for example, made the engine of Alan Wake compile from scratch in five minutes. The payoff is increased productivity, because the edit-compile-run cycle gets dramatically faster.

There are several ways to speed up your compiles. This post looks at reworking your #includes.

Quite a bit of C++ compilation time is spent parsing headers for STL, Qt and whatever else you may be using. But how long does it actually take?

To find out, I wrote a script to generate C++ source. You can download it here. What it does is generate source files that have some includes and one dummy function. The point is to simulate two different use cases. In the first, each source file includes a random subset of the includes. One file might use std::map and QtCore, another one might use Boost’s strings, and so on. In the second, all possible includes are put in a common header which all source files include. This simulates “maximum developer convenience”, where all functions are available in all files without any extra effort.
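For reference, a single file in the first set might look roughly like this (the real script output will differ in its details; this is just to show the shape of the test): a random pile of heavy includes followed by one trivial function that uses none of them.

#include <map>
#include <string>
#include <boost/algorithm/string.hpp>
#include <QtCore/QString>

int dummy_function_17() {
    return 17;
}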

To generate the test data, we run the following commands:

mkdir good bad
./generate_code.py --with-boost --with-qt4 good
./generate_code.py --with-boost --with-qt4 --all-common bad

Compilation is straightforward:

cd good; cmake .; time make; cd ..
cd bad; cmake .; time make; cd ..

By default the script produces 100 source files. When the includes are kept in individual files, compiling takes roughly a minute. When they are in a common header, it takes three minutes.

Remember: the included STL/Boost/Qt4 functionality is not used in the code. This is just the time spent including and parsing their headers. What this example shows is that you can remove 2 minutes of your build time, just by including C++ headers smartly.

The delay scales linearly. For 300 files the build times are 2 minutes 40 seconds and 7 minutes 58 seconds. That’s over five minutes lost on, effectively, no-ops. The good news is that getting rid of this bloat is relatively easy, though it might take some sweat.

  1. Never include any (internal) header in another header if you can use a forward declaration. Include the header in the implementation file.
  2. Never include system headers (STL, etc) in your headers unless absolutely necessary, such as due to inheritance. If your class uses e.g. std::map internally, hide it with pImpl, as sketched below. If your class API requires these headers, change it so that it doesn’t, or use something more lightweight (e.g. std::iterator instead of std::vector).
  3. Never, never, ever include system stuff in your public headers. That slows down not just your own compilation time, but also every single user of your library. The only exception is when your library is a plugin or extension to an existing library and even then your includes need to be minimal.
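As a concrete illustration of points 1 and 2, here is a minimal sketch using a made-up class: the public header contains only a forward declaration, and the std::map lives behind a pImpl so only the implementation file pays for parsing the system headers.

// widget.h -- users of this header include no system headers at all
class WidgetPrivate;                  // forward declaration instead of #include

class Widget {
public:
    Widget();
    ~Widget();
    void setValue(const char *key, int value);
private:
    WidgetPrivate *d;                 // pImpl: the heavy container is hidden here
};

// widget.cpp -- the only file that pays for <map> and <string>
#include "widget.h"
#include <map>
#include <string>

class WidgetPrivate {
public:
    std::map<std::string, int> values;
};

Widget::Widget() : d(new WidgetPrivate) {}
Widget::~Widget() { delete d; }
void Widget::setValue(const char *key, int value) { d->values[key] = value; }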

Rejecting patches the easy way

The main point of open source is that anyone can send patches to improve projects. This, of course, is very damaging to the Super Ego of the head Cowboy Coder in charge. Usually this means that he has to read the patch, analyze it, understand it, and then write a meaningful rejection email.

Or you could just use one of the strategies below. They give you tools to reject any patch with ease.

The Critical Resource

Find any increase in resource usage (no matter how tiny or contrived) and claim that resource to be the most scarce thing in the universe. Then reject due to increased usage.

A sample discussion might go something like this:

- Here’s a patch that adds a cache for recent results making the expensive operation 300% faster.

- This causes an increase in memory usage which is unacceptable.

- The current footprint is 50 MB, this cache only adds less than 10k and the common target machine running this app has 2GB of memory.

- You are too stupid to understand memory optimisation. Go away.

The Suffering Minority

When faced with a patch that makes things better in 99.9% of cases and slightly worse in the rest, focus only on the remaining 0.1%. Never comment on the majority. Your replies must only ever discuss the one group you (pretend to) care about.

- I have invented this thing called the auto-mobile. This makes it easier for factory workers to come to work every morning.

- But what about those that live right next to the factory? Requiring them to purchase and maintain auto-mobiles is a totally unacceptable burden.

- No-one is forcing anyone. Every employer is free to obtain their own auto-mobiles if they so choose.

- SILENCE! I will not have you repress my workers!

The Not Good Enough

Think up a performance requirement that the new code does not fulfill. Reject. If the submitter makes a new patch which does meet the requirement, just make it stricter until they give up.

- This patch drops the average time from 100 ms to 30 ms.

- We have a hard requirement that the operation must take only 10 ms. This patch is too slow, so rejecting.

- But the current code does not reach that either, and this patch gets us closer to the requirement.

- No! Not fast enough! Not going in.

The Prevents Portability

Find any advanced feature. Reject on the grounds that this feature is not widely available and thus increases the maintenance burden.

- Here is a patch to fix issue foo.

- This patch uses compiler feature bar, which is not always available.

- It has been available in every single compiler in the world since 1987.

- And what if we need to compile with a compiler from 1986? What then, mr smartypants? Hmmm?

The Does Not Cure World Hunger

This approach judges the patch not on what it actually is, but rather on what it is not. Think up a requirement, no matter how crazy or irrelevant, and reject.

- This patch will speed up email processing by 4%.

- Does it prevent every spammer in the world from sending spam, even from machines not running our software?

- No.

- How dare you waste my time with this kind of useless-in-the-grand-scheme-of-things patch!

The Absolute Silence

This is arguably the easiest. Never, ever reply to any patches you don’t care about. Eventually the submitter gives up and goes away all by himself.

Smooth scrolling available in Chromium

What currently happens when you drag two fingers on a touchpad is that the X server intercepts those touches and sends mouse wheel events to applications. The semantics of a mouse wheel event are roughly “move down/up three lines”. This is jerky and not very pleasant. There has been no way of doing pixel perfect scrolling.

With the recent work on X multitouch and the uTouch gesture stack, smoothness has now become possible. Witness pixel-accurate scrolling in Chromium in this YouTube video.

The remaining jerkiness in the video is mainly caused by Chromium redrawing its window contents from scratch whenever the viewport is moved.

The code is available in Chromium’s code review site.

Speed bumps hide in places where you least expect them

The most common step in creating software is building it. Usually this means running make or equivalent and waiting. This step is so universal that most people don’t even think about it actively. If one were to see what the computer is doing during build, one would see compiler processes taking 100% of the machine’s CPUs. Thus the system is working as fast as it possibly can.

Right?

Some people working on Chromium doubted this and built their own replacement for Make, called Ninja. It is basically the same as Make: you specify a list of dependencies and then tell it to build something. Since Make is one of the most used applications in the world and has been under development since the 70s, surely it is already as fast as it can possibly be.

Right?

Well, let’s find out. Chromium uses a build system called Gyp that generates makefiles. Chromium devs have created a Ninja backend for Gyp. This makes comparing the two extremely easy.

Compiling Chromium from scratch on a dual core desktop machine with makefiles takes around 90 minutes. Ninja builds it in less than an hour. A quad core machine builds Chromium in ~70 minutes. Ninja takes ~40 minutes. Running make on a tree with no changes at all takes 3 minutes. Ninja takes 3 seconds.

So not only is Ninja faster than Make, it is faster by a huge margin and especially on the use case that matters for the average developer: small incremental changes.

What can we learn from this?

There is an old (and very wise) saying that you should never optimize before you measure. In this case the measurement seemed to indicate that nothing was to be done: CPU load was already maximized by the compiler processes. But sometimes your tools give you misleading data. Sometimes they lie to you. Sometimes the “common knowledge” of the entire development community is wrong. Sometimes you just have to do the stupid, irrational, waste-of-time thingie.

This is called progress.

PS I made quick-n-dirty packages of a Ninja git checkout from a few days ago and put them in my PPA. Feel free to try them out. There is also an experimental CMake backend for Ninja so anyone with a CMake project can easily try what kind of a speedup they would get.

Solution to all API and ABI mismatch issues

One of the most annoying things about creating shared libraries for other people to use is API and ABI stability. You start going somewhere, make a release and then realize that you have to totally change the internals of the library. But you can’t remove functions, because that would break existing apps. Nor can you change structs or the meanings of fields, or do any other maintenance task that would make your job easier. The only bright spot on the horizon is that eventually you can do a major release and break compatibility.

We’ve all been there and it sucks. If you choose to ignore stability because, say, you have only a few users who can just recompile their stuff, you get into endless rebuild cycles and so on. But what if there was a way to eliminate all this in one, swift, elegant stroke?

Well, there is.

Essentially every single library can be reduced to one simple function call that looks kind of like this.

library_result library_do(const char *command, library_object *obj, ...)

The command argument tells the library what to do. The arguments tell it what to do it to and the result tells what happened. Easy as pie!

So, to use a car analogy, here’s an example of how you would start a car.

library_object *car;
library_result result = library_do("initialize car", NULL);
car = RESULT_TO_POINTER(result);
library_do("start engine", car);
library_do("push accelerometer", car);

Now you have a moving car and you have also completely isolated the app from the library using an API that will never need to be changed. It is perfectly forwards, backwards and sideways compatible.

And it gets better. You can query capabilities on the fly and act accordingly.

if(RESULT_TO_BOOLEAN(library_do("has automatic transmission", car)))
  do_something();

Dynamic detection of features and changing behavior based on them makes apps work with every version of the library ever. The car could even be changed into a moped, tractor, or a space shuttle and it would still work.

For added convenience the basic commands could be given as constant strings in the library’s header file.
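In other words, the header would end up looking something like this (a sketch continuing the car example above; the macro names are invented):

/* car_library.h */
#define LIB_CMD_INIT_CAR      "initialize car"
#define LIB_CMD_START_ENGINE  "start engine"
#define LIB_CMD_PUSH_ACCEL    "push accelerometer"
#define LIB_CMD_HAS_AUTOMATIC "has automatic transmission"

library_result library_do(const char *command, library_object *obj, ...);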

Deeper analysis

If you, dear reader, after reading the above text thought, even for one microsecond, that the described system sounds like a good idea, then you need to stop programming immediately.

Seriously!

Take your hands away from the keyboard and just walk away. As an alternative I suggest taking up sheep farming in New Zealand. There’s lots of fresh air and a sense of accomplishment.

The API discussed above is among the worst design abominations imaginable. It is the epitome of Making My Problem Your Problem. Yet variants of it keep appearing all the time.

The antipatterns and problems in this one single function call would be enough to fill a book. Here are just some of them.

Loss of type safety

This is the big one. The arguments in the function call can be anything and the result can be anything. So which one of the following should you use:

library_do("set x", o, int_variable);
library_do("set x", o, &int_variable);
library_do("set x", o, double_variable);
library_do("set x", o, &double_variable);
library_do("set x", o, value_as_string)

You can’t really know without reading the documentation. Which you have to do every single time you use any function. If you are lucky, the calling convention is the same on every function. It probably is not. Since the compiler does not and can not verify correctness, what you essentially have is code that works either by luck or faith.

The only way to know for sure what to do is to read the source code of the implementation.

Loss of tools

There are a lot of nice tools to help you. Things such as IDE code autocompletion, API inspectors, Doxygen, even the compiler itself as discussed above.

If you go the generic route you throw away all of these tools. They account for dozens upon dozens of man-years just to make your job easier. All of that is gone. Poof!

Loss of debuggability

One symptom of this disease is putting data in dictionaries and other high-level containers rather than in plain variables, to “allow easy expansion in the future”. This is workable in languages such as Java or Python, but not in C/C++. Here is a screengrab from a gdb session demonstrating why this is a terrible idea:

(gdb) print map
$1 = {_M_t = {
    _M_impl = {<std::allocator<std::_Rb_tree_node<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > >> = {<__gnu_cxx::new_allocator<std::_Rb_tree_node<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > >> = {<No data fields>}, <No data fields>},
      _M_key_compare = {<std::binary_function<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool>> = {<No data fields>}, <No data fields>}, _M_header = {_M_color = std::_S_red, _M_parent = 0x607040,
        _M_left = 0x607040, _M_right = 0x607040}, _M_node_count = 1}}}

Your objects have now become undebuggable. Or at the very least extremely cumbersome, because you have to dig out the information you need one tedious step at a time. If the error is non-obvious, it’s source code diving time again.

Loss of performance

Functions are nice. They are type-safe, easy to understand and fast. The compiler might even inline them for you. Generic action operators are not.

Every single call to the library needs to first go through a long if/else tree to inspect which command was given, or do a hash table lookup or something similar. This means that every single function call turns into a massive blob of code that destroys branch prediction and pipelining and all those other wonderful things HW engineers have spent decades optimizing for you.
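To see why, consider what library_do has to do internally. A sketch, reusing the command strings from the car example (the error constant and the empty branches are placeholders):

#include <string.h>

library_result library_do(const char *command, library_object *obj, ...)
{
    /* Every call pays for this string comparison chain before any real work starts. */
    if (strcmp(command, "initialize car") == 0) {
        /* ... allocate and return a new car object ... */
    } else if (strcmp(command, "start engine") == 0) {
        /* ... */
    } else if (strcmp(command, "has automatic transmission") == 0) {
        /* ... */
    }
    /* A typo in the command string silently ends up here. */
    return MAKE_ERROR_RESULT(LIBRARY_UNKNOWN_COMMAND); /* hypothetical error constant */
}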

Loss of error-freeness

The code examples above have been too clean. They have ignored the error cases. Here are two lines of code to illustrate the difference.

x = get_x(obj); // Can not possibly fail
status = library_do("get x", obj); // Anything can happen

Since the generic function can not provide any guarantees the way a function can, you have to always inspect the result it provides. Maybe you misspelled the command. Maybe this particular object does not have an x value. Maybe it used to but the library internals have changed (which was the point of all this, remember?). So the user has to inspect every single call even for operations that can not possibly fail. Because they can, they will, and if you don’t check, it is your fault!

Loss of consistency

When people are confronted with APIs such as these, the first thing they do is to write wrapper functions to hide the ugliness. Instead of a direct function call you end up with a massive generic invocation blob thingie that gets wrapped in a function call that is indistinguishable from the direct function call.
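A typical wrapper might look like this (get_x and RESULT_TO_INT are hypothetical, in the spirit of the conversion macros used earlier):

/* The wrapper everyone ends up writing to hide the generic call... */
int get_x(library_object *obj)
{
    library_result r = library_do("get x", obj);
    /* ...which still silently depends on the command string being spelled correctly. */
    return RESULT_TO_INT(r);
}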

The end result is an abstraction layer covered by an anti-abstraction layer; a concretisation layer, if you will.

Several layers, actually, since every user will code their own wrapper with their own idiosyncrasies and bugs.

Loss of language features

Let’s say you want the x and y coordinates from an object. Usually you would use a struct. With a generic getter you can not, because a struct implies memory layout and thus is a part of API and ABI. Since we can’t have that, all arguments must be elementary data types, such as integers or strings. What you end up with are constructs such as this abomination here (error checking and the like omitted for sanity):

obj = RESULT_TO_POINTER(library_do("create FooObj", NULL));
library_do("set constructor argument a", obj, 0);
library_do("set constructor argument b", obj, "hello");
library_do("set constructor argument c", obj, 5L);
library_do("run constructor", obj)

Which is so much nicer than

object *obj = new_object(0, "hello", 5); // No need to cast to Long, the compiler does that automatically.

Bonus question: how many different potentially failing code paths can you find in the first code snippet and how much protective code do you need to write to handle all of them?

Where does it come from?

These sorts of APIs usually stem from their designers’ desire to “not limit choices needlessly”, or “make it flexible enough for any change in the future”. There are several different symptoms of this tendency, such as the inner platform effect, the second system effect and soft coding. The end result is usually a framework framework framework.

How can one avoid this trap? There is really no definitive answer, but there is a simple guideline to help you get there. Simply ask yourself: “Is this code solving the problem at hand in the most direct and obvious way possible?” If the answer is no, you probably need to change it. Sooner rather than later.

Things just working

I have a MacBook with a bcm4331 wireless chip that had not been supported in Linux. The driver was added in kernel 3.2, so I was eager to test it when I upgraded to precise.

After the update there was no net connection. The network indicator said “Missing firmware”. So I scoured the net and found the steps necessary to extract the firmware file to the correct directory.

I typed the command and pressed enter. That exact second my network indicator started blinking and a few seconds later it had connected.

Without any configuration, kernel module unloading/loading or “refresh state” button prodding.

It just worked. Automatically. As it should. And even before it worked it gave a sensible and correct error message.

To whoever coded this functionality: I salute you.

More uses for btrfs snapshots

I played around with btrfs snapshots and discovered two new interesting uses for them. The first one deals with unreliable operations. Suppose you want to update a largish SVN checkout but your net connection is slightly flaky. The reason can be anything: bad wires, an overloaded server, electrical outages, and so on.

If SVN is interrupted mid-transfer, it will most likely leave your checkout in an inconsistent state that can’t be fixed even with ‘svn cleanup’. The common wisdom on the Internet is that the way to fix this is to delete or rename the erroneous directory and do an ‘svn update’, which will either work or not. With btrfs snapshots you can just take a snapshot of your source tree before the update. If the update fails, nuke the broken directory, restore your snapshot and try again. If it works, just get rid of the snapshot dir.
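As a sketch, assuming the checkout lives in its own btrfs subvolume (the directory names here are made up):

btrfs subvolume snapshot checkout checkout_backup   # instant safety copy
svn update checkout
# If the update was interrupted and the tree is broken:
sudo btrfs subvolume delete checkout
btrfs subvolume snapshot checkout_backup checkout   # back to the pre-update state
# If everything went fine:
sudo btrfs subvolume delete checkout_backup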

What you essentially gain are atomic operations on non-atomic tasks (such as svn update). This has been possible before with ‘cp -r’ or similar hacks, but they are slow. Btrfs snapshots can be done in the blink of an eye and they don’t take extra disk space.

The other use case is erroneous state preservation. Suppose you hack on your stuff and encounter a crashing bug in your tools (such as bzr or git). You file a bug on it and then get back to doing your own thing. A day or two later you get a reply on your bug report saying “what is the output of command X”. Since you don’t have the given directory tree state around any more, you can’t run the command.

But if you snapshot your broken tree and store it somewhere safe, you can run any analysis scripts on it any time in the future. Even possibly destructive ones, because you can always run the analysis scripts in a fresh snapshot. Earlier these things were not feasible because making copies took time and possibly lots of space. With snapshots they don’t.

Fun stuff with btrfs

I work on, among other things, Chromium. It uses SVN as its revision control system. There are several drawbacks to this, which are well known (no offline commits etc). They are made worse by Chromium’s enormous size. An ‘svn update’ can easily take over an hour.

Recently I looked into using btrfs’s features to make things easier. I found that with very little effort you can make things much more workable.

First you create a btrfs subvolume.

btrfs subvolume create chromium_upstream

Then you check out Chromium to this directory using the guidelines given in their wiki. Now you have a pristine upstream SVN checkout. Then build it once. No development is done in this directory. Instead we create a new directory for our work.

btrfs subvolume snapshot chromium_upstream chromium_feature_x

And roughly three seconds later you have a fresh copy of the entire source tree and the corresponding build tree. Any changes you make to individual files in the new directory won’t cause a total rebuild (which also takes hours). You can hack with complete peace of mind knowing that in the event of failure you can start over with two simple commands.

sudo btrfs subvolume delete chromium_feature_x
btrfs subvolume snapshot chromium_upstream chromium_feature_x

Chromium upstream changes quite rapidly, so keeping up with it with SVN can be tricky. But btrfs makes it easier.

cd chromium_upstream
gclient sync # Roughly analogous to svn update.
cd ..
btrfs subvolume snapshot chromium_upstream chromium_feature_x_v2
cd chromium_feature_x/src && svn diff > ../../thingy.patch && cd ../..
cd chromium_feature_x_v2/src && patch -p0 < ../../thingy.patch && cd ../..
sudo btrfs subvolume delete chromium_feature_x

This approach can be taken with any tree of files: images, even multi-gigabyte video files. Thanks to btrfs’s design, multiple copies of these files take roughly the same amount of disk space as only one copy. It’s kind of like having backup/restore and revision control built into your file system.