Canonical Voices

Posts tagged with 'development'

Jussi Pakkanen

Bug finding tools

In Canonical’s recent devices sprint I gave a presentation on automatic bug detection tools. The slides are now available here and contain information on a range of such tools.



Robin Winslow

On release day we can get up to 8,000 requests a second from people trying to download the new release. In fact, last October (13.10) was the first release day in a long time that the site didn’t crash under the load at some point during the day (huge credit to the infrastructure team). The site has been running on Drupal, but we’ve been gradually migrating it to a more bespoke Django-based system. In March we started work on migrating the download section in time for the release of Trusty Tahr. This was a prime opportunity to look for ways to reduce some of the load on the servers.

Choosing geolocated download mirrors is hard work for an application

When someone downloads Ubuntu from the site (on a thank-you page), they are actually sent to a nearby one of the 300 or so mirror sites.

To pick a mirror for the user, the application has to:

  1. Decide from the client’s IP address what country they’re in
  2. Get the list of mirrors and find the ones that are in their country
  3. Randomly pick them a mirror, while sending more people to mirrors with higher bandwidth

This process is by far the most intensive operation on the whole site. The tasks are not particularly complicated in themselves, but they have to be done for each and every user, potentially 8,000 times a second, while every other page on the site can be aggressively cached to prevent most requests from hitting the application itself.

For the site to be able to handle this load, we’d need to load-balance requests across perhaps 40 VMs.

Can everything be done client-side?

Our first thought was to embed the entire mirror list in the thank-you page and use JavaScript in the users’ browsers to select an appropriate mirror. This would drastically reduce the load on the application, because the download page would then be effectively static and cache-able like every other page.

The only way to reliably get the user’s location client-side is with the geolocation API, which is only supported by 85% of users’ browsers. Another slight issue is that the user has to give permission before they can be assigned a mirror, which would slightly hinder their experience.

This solution would inconvenience users just a bit too much. So we found a trade-off:

A mixed solution – Apache geolocation

mod_geoip2 for Apache can apply server rules based on a user’s location and is much faster than doing geolocation at the application level. This means that we can use Apache to send users to a country-specific version of the download page (e.g. the German desktop thank-you page) by adding a country code to the end of the URL (&country=DE for Germany, &country=GB for the United Kingdom).

These country-specific pages contain the list of mirrors for that country, and each one can now be cached, vastly reducing the load on the server. Client-side JavaScript randomly selects a mirror for the user, weighted by the bandwidth of each mirror, and kicks off their download, without the need for client-side geolocation support.
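The site does this selection in client-side JavaScript, which is not shown here. Purely to illustrate the weighting idea, here is a minimal C++ sketch. The Mirror struct, its field names and the pickMirror function are assumptions made for this example, not the site's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical record; the field names are assumptions for illustration.
struct Mirror {
    std::string url;
    double bandwidth;  // relative capacity, in any consistent unit
};

// Returns the index of a randomly chosen mirror. The chance of picking
// a mirror is proportional to its bandwidth.
template <typename Rng>
std::size_t pickMirror(const std::vector<Mirror>& mirrors, Rng& rng) {
    assert(!mirrors.empty());
    std::vector<double> weights;
    weights.reserve(mirrors.size());
    for (const Mirror& m : mirrors) {
        weights.push_back(m.bandwidth);
    }
    // discrete_distribution normalises the weights into probabilities.
    std::discrete_distribution<std::size_t> dist(weights.begin(), weights.end());
    return dist(rng);
}
```

With two mirrors weighted 1 and 3, the second one should receive roughly three quarters of the downloads over time.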

This solution was successfully implemented shortly before the release of Trusty Tahr.

(This article was also posted on

Jussi Pakkanen

A use case that pops up every now and then is to have a self-contained object that needs to be accessed from multiple threads. The problem appears when the object, as part of its normal operation, calls its own methods. This leads to tricky locking operations, a need to use a recursive mutex or something else that is suboptimal.

Another common approach is to use the pimpl idiom, which hides the contents of an object inside a hidden private object. There are ample details on the internet, but the basic setup of a pimpl’d class is the following. First of all we have the class header:

#include <memory>

class Foo {
public:
    Foo();
    ~Foo();  // defined in the implementation file, where Private is complete
    void func1();
    void func2();

private:
    class Private;
    std::unique_ptr<Private> p;
};

Then in the implementation file you first have the definition of the private class.

class Foo::Private {
public:
    void func1() { ... }
    void func2() { ... }

    void privateFunc() { ... }
    int x;
};

Followed by the definition of the main class.

Foo::Foo() : p(new Private) {
}

Foo::~Foo() = default;

void Foo::func1() {
    p->func1();
}

void Foo::func2() {
    p->func2();
}
That is, Foo only calls the implementation bits in Foo::Private.

The main idea to realize is that Foo::Private can never call functions of Foo. Thus if we can isolate the locking bits inside Foo, the functionality inside Foo::Private becomes automatically thread safe. The way to accomplish this is simple. First you add a (public) std::mutex m to Foo::Private. Then you just change the functions of Foo to look like this:

void Foo::func1() {
    std::lock_guard<std::mutex> guard(p->m);
    p->func1();
}

void Foo::func2() {
    std::lock_guard<std::mutex> guard(p->m);
    p->func2();
}

This accomplishes many things nicely:

  • Lock guards make locks impossible to leak, no matter what happens
  • Foo::Private can pretend that it is single-threaded which usually makes implementation a lot easier

The main drawback of this approach is that the locking is coarse, which may be a problem when squeezing out ultimate performance. But usually you don’t need that.
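To see the pieces together, here is a small self-contained sketch of the same pattern. The Counter class and its methods are made up for this example. Note how addTwice in the private class calls add on the same object without touching the mutex, so no recursive mutex is needed.

```cpp
#include <memory>
#include <mutex>

// Hypothetical example class; the names are not from any real code base.
class Counter {
public:
    Counter();
    ~Counter();
    void add(int amount);
    void addTwice(int amount);
    int value() const;

private:
    class Private;
    std::unique_ptr<Private> p;
};

// Normally everything below lives in the implementation file.
class Counter::Private {
public:
    std::mutex m;  // locked only by Counter, never by Private itself
    int total = 0;

    void add(int amount) { total += amount; }
    // Calls another method of the same object. This is safe because the
    // caller in Counter already holds the lock.
    void addTwice(int amount) {
        add(amount);
        add(amount);
    }
};

Counter::Counter() : p(new Private) {}
Counter::~Counter() = default;

void Counter::add(int amount) {
    std::lock_guard<std::mutex> guard(p->m);
    p->add(amount);
}

void Counter::addTwice(int amount) {
    std::lock_guard<std::mutex> guard(p->m);
    p->addTwice(amount);
}

int Counter::value() const {
    std::lock_guard<std::mutex> guard(p->m);
    return p->total;
}
```

Every public method takes the lock exactly once on entry, so any number of threads can share one Counter.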

Jussi Pakkanen

There are usually two different ways of doing something. The first is the correct way. The second is the easy way.

As an example of this, let’s look at using the functionality of the C++ standard library. The correct way is to use the fully qualified name, such as std::vector or std::chrono::milliseconds. The easy way is to have using namespace std; and then just use the class names directly.

The first way is the “correct” one as it prevents symbol clashes and for a bunch of other good reasons. The latter leads to all sorts of problems and for this reason many style guides etc prohibit its use.

But there is a catch. Software is written by humans and humans have a peculiar tendency.

They will always do the easy thing.

There is no possible way for you to prevent them from doing that, apart from standing behind their back and watching every letter they type.

Any sort of system that relies, in any way, on the fact that people will do the right thing rather than the easy thing is doomed to fail from the start. They. Will. Not. Work. And they can’t be made to work. Trying to force it to work leads only to massive shouting and bad blood.

What does this mean to you, the software developer?

It means that the only way your application/library/tool/whatever is going to succeed is if the correct thing to do is also the simplest thing to do. That is the only way to make people do the right thing consistently.

Anthony Dillon


This post is part of the series ‘Making responsive‘.

The JavaScript used on the site is very light. We limit its use to small functional elements of the web style guide, which act to enhance the user experience but are never required to deliver the content the user is there to consume.

At Canonical we use YUI as our JavaScript framework of choice. We have many years of experience using it for our websites and web apps, and therefore have a large knowledge base to fall back on. We have a single core.js, which contains a number of functions that are called when required.

Below I will discuss some of the functions and workarounds we have provided in the web style guide.

Providing fallbacks

When considering our transition from PNGs to SVGs across the site, we provided a fallback for background images with Modernizr and reset the background image with the .no-svg class on the body. Our approach to a fallback for images in the markup was a JavaScript snippet from CSS Tricks – SVG Fallbacks, which I converted to YUI:

The snippet above checks if Modernizr exists in the namespace. It then interrogates the Modernizr object for SVG support. If the browser does not support SVGs we loop through each image with .svg contained in the src and replace the src with the same path and filename but a .png version. This means all SVGs need to have a PNG version at the same location.

Navigation and fallback

The mobile navigation on the site uses JavaScript to toggle the menu open and closed. We decided to use JavaScript because it’s well supported. We explored using :target as a pure CSS solution, but this selector isn’t supported in Internet Explorer 7, which represented a fair chunk of our visitors.

The navigation on small screens.

For browsers that don’t support JavaScript we resort to displaying the “burger” icon, which acts as an in-page anchor to the footer which contains the site navigation.

Equal height

As part of the guidelines project we needed a way of setting a number of elements to the same height. We would love to use flexbox to do this but the browser support is not there yet. Therefore we developed a small JavaScript solution:

This function finds all elements with an .equal-height class. We then look for child divs or lis, measure the tallest one and set all these children to that height.

Using combined YUI

One of the obstacles discovered when working on this project was that YUI will load modules from an http (non-secure) domain as the library requires them. This of course causes issues on any site that is hosted on a secure domain. We definitely didn’t want to restrict the use of the web style guide to non-secure sites, so we needed to combine all required modules into a single YUI file.

To build your own combined YUI file, visit the YUI configurator. Add the modules you require and copy the code from the Output Console panel into your own hosted file.

Final thoughts

Obviously we had a fairly easy time of making our JavaScript responsive, as we only use the minimum required as a general principle on our site. But by integrating tools like Modernizr into our workflow and keeping on top of CSS browser support, we can keep what we do lean and current.

Read the next post in this series: “Making responsive: testing on multiple devices”

Anthony Dillon

This post is part of the series ‘Making responsive‘.

Performance has always been one of the top priorities when it came to building the responsive site. We started with a list of performance snags and worked to improve each one as much as possible in the time we had. Here is a quick run through of the points we collected and the way we managed to improve them.

Asset caching

We now have a number of websites using our web style guide. Because of this, we needed to deliver assets on both http and secure https domains. We decided to build an asset server to support the guidelines and other sites that require asset hosting.

This gave us the ability to increase the far future expires (FFE) of each file. By doing so the file is cached by the browser and not requested again, which gives us a much faster round trip. But as we are still able to update a single file, we cannot set the FFE too far in the future. We plan to resolve this with a new and improved assets system, which is currently under development.

The new asset system will have an internal frontend to upload a binary file. This will provide a link to the asset with a six-character hexadecimal string attached to the file name.


The new system does not allow a file to be edited or updated. You can only upload a new one and change the link in the markup. This guarantees that an asset stays the same forever.

Minification and concatenation

We introduced a minification and concatenation step to the build of the web style guide. This saves precious bytes and reduces the number of requests performed by each page.

We use the sass ruby gem to generate minified and concatenated CSS in production. We also run the small amount of JavaScript we have through UglifyJS before delivering to production.

Compressed images

Images were the main issue when it came to performance.

We had a look at the file sizes of some of our key images (like the ones in the tablet section of the site) and were shocked to discover we hadn’t been treating our visitors’ bandwidth kindly.

After analysing a handful of images, we decided to have a look into our assets folder and flag the images that were over 100 KB as a first go.

One of the most time-consuming jobs in this project was converting all the images we could to SVGs. This meant creating pictograms and illustrations as vectors from earlier PNGs. Any images that could not be recreated as a vector graphic were heavily compressed. This squeezed an alarming amount out of the original file.

We continued this for every image on the site. By doing so the total reduction across the site was 7.712MB.

Reduce required fonts

We currently load a large selection of the Ubuntu font.

<link href='//,300,300italic,400italic,700,700italic%7CUbuntu+Mono' rel='stylesheet' type='text/css' />

The designers are exploring the patterns of the present and ideal future to discover unneeded types. Since the move from normal font weight to light a few months ago as our base font style, we rarely use the bold weight (700) anymore, resorting to normal (400) for highlighting text.

Once we determine which weights we can drop, we will be able to make significant savings, as seen below:

Reducing loaded fonts: before and after

Using SVG

Taking the leap to SVGs over PNGs caused a number of issues. We decided to load SVGs as image files rather than inline them, to keep our markup clean and easy to read and update. This meant we needed to provide four different coloured images for each pictogram.


We introduced Modernizr to give us an easy way to detect browsers that do not support SVGs and replace the image with PNGs of the same path and name.

Remove unnecessary enhancements

We explored a parallaxing effect for our site’s background with JavaScript. It worked well on normal resolution screens but lagged on retina displays, so we decided not to do it and set the background position to static instead. User experience is always paramount and trumps visual enhancements.

Future improvements

One of the things on our roadmap is to remove unused styles remaining in the stylesheets. There are a number of solutions for this, such as grunt-uncss.


There is still a lot to do but we have definitely broken the back of the work to steer in the right direction. The aim is to push the site up to 90+ in the page speed tests in the next wave of updates.

Read the next post in this series: “Making responsive: JavaScript considerations”

Anthony Dillon

This post is part of the series ‘Making responsive‘.

When working to make the current web style guide responsive, we made some large updates to the core Sass. We decided to update the file and folder structure of our styles. I love reading about other people’s or organisations’ Sass architectures, so I thought it would be only right to share the structure that has evolved over time here at Canonical.

Let’s get right to it.

  • core.scss
  • core-constants.scss
  • core-grid.scss
  • core-mixins.scss
  • core-print.scss
  • core-templates.scss
  • patterns
    • patterns.scss
    • _arrows.scss
    • _blockquotes.scss
    • _boxes.scss
    • _buttons.scss
    • _contextual-footer.scss
    • _footer.scss
    • _forms.scss
    • _header.scss
    • _helpers.scss
    • _image-centered.scss
    • _inline-logos.scss
    • _lists.scss
    • _notifications.scss
    • _resource.scss
    • _rows.scss
    • _slider.scss
    • _structure.scss
    • _tabbed-content.scss
    • _tooltips.scss
    • _typography.scss
    • _vertical-divider.scss

I won’t describe each file as some are self-explanatory but let’s just go through the core files to understand the structure.

core.scss contains the core HTML element styling, such as img, p, ul, etc. You could say this acts as a reset file customised to match our style.

core-constants.scss is home to all variables used throughout. This file contains all the set colours used on the site, the base font size and some extra grid variables used to extend the layout.

core-grid.scss holds all of the responsive grid styles. This file mainly consists of generated code from Gridinator, which we extended with breakpoints to modify the layout as the viewport gets smaller. You can read more about how we did this in “Making responsive: making our grid responsive”.

core-mixins.scss holds all the mixins used in our Sass.

core-templates.scss is used to hold full-page styling classes. Without a template class on the <body> of a page you get a standard page style; if you add a template class, you get the styles that are appropriate for that template.

Web team front end working on the web style guide.

Divide and conquer

Patterns were originally all in one huge scss file, which became difficult to maintain. So we decided to split the patterns file apart into a patterns folder. This allows us to find things and work in a much more modular way. The split involved manually working through the file, moving each component’s styles into a new file and importing that file back at the same position.

Naming conventions

Our mission when setting up the naming convention for our CSS was to make the markup as human readable as possible.

We decided early on to use an almost object-oriented inheritance system for large structural elements. For example, the class .row can be extended by adding the .row-enterprise class, which applies a dark aubergine background and modifies the elements inside to display correctly on a dark background.

We switch to a single class approach for small modular components, such as lists. If you apply the class .list the list items are styled with our simple Ubuntu list style. This can be modified by changing the class to .list-ubuntu or .list-canonical, which apply their corresponding branding themed bullets to the items.

List styles.

The decision to use different systems arose from the desire to keep the markup clean and easy to skim read by limiting the classes applied to each element. We could have continued with the inheritance system for smaller elements but that would have led to two or more classes (.list and .list-canonical) for each element. We felt this was overkill for every small component. For large structural elements such as rows it’s easier to start with a .row class and add functionality and styling by adding classes.


We mainly use mixins to handle browser prefixes as we haven’t yet added a “prefixer” step to our build system.

A lot of our styles are quite specific and therefore would not benefit from being included as a mixin.

A note on Block, Element, Modifier syntax

We would like to have used the Block, Element, Modifier (BEM) syntax as we think it is a good convention and easy for people external to the project to understand and use. Since we started this project back in 2013 with the above syntax, which is now used on a number of sites across the Canonical/Ubuntu web real estate, the effort to convert every class name to follow the BEM naming convention would far outweigh the benefits it would return.


By splitting our bloated patterns file into multiple small modular files we have made it much easier to maintain and diagnose bugs within components. I would recommend anyone in a similar situation to find the time to split the components into separate files sooner rather than later. The effort grows exponentially the longer it’s left.

Introducing linting to the production of the guidelines will keep our coding style the same throughout the team and improve readability for new members of the team.

Read the next post in this series: “Making responsive: ensuring performance”

Jussi Pakkanen

Code review is generally acknowledged to be one of the major tools in modern software development. The reasons for this are simple: it spreads knowledge of the code around the team, obvious bugs and design flaws are spotted early, which makes everyone happy, and so on.

But yet our code bases are full of horrible flaws, glaring security holes, unoptimizable algorithms and everything that drives a grown man to sob uncontrollably. These are the sorts of things code review was designed to stop and prevent, so why are they there? Let’s examine this with a thought experiment.

Suppose you are working on a team. Your team member Joe has been given the task of implementing new functionality. For simplicity’s sake let us assume that the functionality is adding together two integers. Then off Joe goes and returns after a few days (possibly weeks) with something.

And I really mean Something.

Instead of coding the addition function, he has created an entire new framework for arbitrary arithmetic operations. The reasoning for this is that it is “more general” because it can represent any mathematical operation (only addition is implemented, though, and trying to use any other operation fails silently with corrupt data). The core is implemented in a multithreaded async callback spaghetti hell that has a data race only 93% of the time (the remaining 7% covers the one existing test case).

There is only one possible code review for this kind of an achievement.

Review result: Rejected
Comments: If there was a programmer's equivalent to chemical
castration, it would already have been administered to you.

In the ideal world that would be it. The real world has certain impurities as far as this thing goes. The first thing to note is that you have to keep working with whoever wrote the code for an unforeseeable amount of time. Aggravating your coworkers consistently is not a very nice thing to do and gives you the unfortunate label of “assh*le”. Things get even worse if Joe is in any way your boss, because criticising his code may get you on the fast track to the basement or possibly the unemployment line. The plain fact, however, is that this piece of code must never be merged. It can’t be fixed by review comments. All of it must be thrown away and replaced with something sane.

At this point office politics enter the fray. Most corporations have deadlines to meet and products to ship. Trying to block the entry of this code (which implements a Feature, no less, or at least a fraction of one) makes you the bad guy. The code that Joe has written is an expense and if there is one thing organisations do not want to hear it is the fact that they have just wasted a ton of effort. The Code is There and it Must Be Used this Instant! Expect to hear comments of the following kind:

  • Why are you being so negative?
  • The Product Vision requires addition of two numbers. Why are you working against the Vision?
  • Do you want to be the guy that single-handedly destroyed the entire product?
  • This piece of code adds a functionality we did not have before. It is imperative that we get it in now (the product is expected to ship in one year from now)!
  • There is no time to rewrite Joe’s work so we must merge this (even though reimplementing just the functionality would take less effort than even just fixing the obvious bugs)

This onslaught continues until eventually you give in, do the “team decision”, accept the merge and drink yourself unconscious, fully aware of the fact that you have to fix all these bugs once someone starts using the code (you are not allowed to rewrite it, you must fix the existing code, for that is Law). For some reason or another Joe seems to have magically been transferred somewhere else to work his magic.

For this simple reason code review does not work very well in most offices. If you only ever get comments about how to format your braces, this may be affecting you. In contrast code reviews work quite well in other circumstances. The first one of them is the Linux kernel.

The code that gets into the kernel is being watched over by lots of people. More importantly, it is being watched over by people who don’t work for you, your company or its subsidiaries. Linus Torvalds does not care one iota about your company’s quarterly sales goals, launch dates or corporate goals. The only thing he cares about is whether your code is any good. If it is not, it won’t get merged and there is nothing you can do about it. There is no middle manager you can appeal to and no HR department you can set on someone. Unless you have proven yourself, the code reviewers will treat you like an enemy. Anything you do will be scrutinised, analysed, dissected and even outright rejected. This intercorporate firewall is good because it ensures that terrible code is not merged (poor code sometimes falls through the cracks, but such is life). On the other hand this sort of thing causes massive flame wars every now and then.

This does not work in corporate environments, though, for the reasons listed. One way to make it work is to have a master code reviewer who does not care about what other people might think. Someone who can summarily reject awful code without a lengthy “let’s see if we can make it better” discussion. Someone who, when the sales people come to him demanding something to be done half-assed, can tell them to buzz off. Someone who does not care about hurting people’s feelings.

In other words, a psychopath.

Like most things in life, having a psychopath in charge of your code has some downsides. Most of them flow from the fact that psychopaths are usually not very nice to work with. Also, one of the things that is worse than not having code review is having a psychopath master code reviewer who is incompetent or otherwise deluded. Unfortunately most psychopaths are of the latter kind.

So there you have it: the path to high quality code is paved with psychopaths and sworn enemies.

Jussi Pakkanen

Threads are a bit like fetishes: some people can’t get enough of them and other people just can’t see what the point is. This leads to eternal battles between “we need the power” and “this is too complex”. These have a tendency to never end well.

One inescapable fact about multithreaded and asynchronous programming is that it is hard. A rough estimate says that a multithreaded solution is between ten and 1000 times harder to design, write, debug and maintain than a single threaded one. Clearly, this should not be done without heavy duty performance needs. But how much is that?

Let’s do an experiment to find out. Let’s create a simple C++ network echo server, the source code of which can be downloaded here. It can serve an arbitrary number of clients but it uses only one thread to do so. The implementation uses a simple epoll loop over the open connections.

For our test we use 10 clients that do 10 000 queries each. To reduce the effects of network latency, the clients run on the same machine. The test hardware is a Nexus 4 running the latest Ubuntu phone.

The test finishes in 11 seconds, which means that a single-threaded server can serve roughly 9 000 requests a second (100 000 requests in 11 seconds) using basic ARM hardware. It should be noted that because the clients run on the same machine, they are stealing CPU time from the server. The service rate would be higher if the server process got its own processor. It would also be higher if compiler optimizations had been enabled but who needs those, anyway.

The end result of all this is that unless you need massive amounts of queries per second or your backend is incredibly slow, multithreading probably won’t do you much good and you’ll be much better off doing everything single-threaded. You’ll spend a lot less time in a debugger and will be generally happier as well.

Even if you need these, multithreading might still not be the way to go. There are other ways of parallelization, such as using multiple processes, which provides additional memory safety and error tolerance as well. This is not to say threads are bad. They are a wonderful tool for many different use cases. You should just be aware that sometimes the best way to use threads is not to use them at all.

Actually, make that “most times”.

David Murphy (schwuk)

Today I was adding tox and Travis-CI support to a Django project, and I ran into a problem: our project doesn’t have a setup.py. Of course I could have added one, but since by convention we don’t package our Django projects (Django applications are a different story) and instead use virtualenv and pip requirements files, I wanted to see if I could make tox work without changing our project.

Turns out it is quite easy: just add the following three directives to your tox.ini.

In your [tox] section tell tox not to run setup.py:

skipsdist = True

In your [testenv] section make tox install your requirements (see here for more details):

deps = -r{toxinidir}/dev-requirements.txt

Finally, also in your [testenv] section, tell tox how to run your tests:

commands = python manage.py test

Now you can run tox, and your tests should run!

For reference, here is the complete (albeit minimal) tox.ini file I used:

[tox]
envlist = py27
skipsdist = True

[testenv]
deps = -r{toxinidir}/dev-requirements.txt
setenv =
    PYTHONPATH = {toxinidir}:{toxinidir}
commands = python manage.py test

Jussi Pakkanen

People often wonder why even the simplest of things seem to take long to implement. Often this is accompanied by uttering the phrase made famous by Jeremy Clarkson: how hard can it be?

Well let’s find out. As an example let’s look into a very simple case of creating a shared library that grabs a screen shot from a video file. The problem description is simplicity itself: open the file with GStreamer, seek to a random location and grab the pixels from the buffer. All in all, ten lines of code, should take a few hours to implement including unit tests.


Well, no. The very first problem is selecting a proper screenshot location. It can’t be in the latter half of the video, for instance. The simple reason for this is that it may then contain spoilers and the mere task of displaying the image might ruin the video for viewers. So let’s instead select some suitable point, like 2/7ths of the way into the video clip.

But in order to do that you need to first determine the length of the clip. Fortunately GStreamer provides functionality for this. Less fortunately some codec/muxer/platform/whatever combinations do not implement it. So now we have the problem of trying to determine a proper clip location for a file whose duration we don’t know. In order to save time and effort let’s just grab the screen shot at ten seconds in these cases.

The question now becomes what happens if the clip is less than ten seconds long? Then GStreamer would (probably) seek to the end of the file and grab a screenshot there. Videos often end in black so this might lead to black thumbnails every now and then. Come to think of it, that 2/7ths location might accidentally land on a fade so it might be all black, too. What we need is an image analyzer that detects whether the chosen frame is “interesting” or not.
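Just the position rule described so far can be sketched in a few lines. The function name thumbnailPosition and the use of a negative duration to mean "unknown" are assumptions made for this example, and nothing here talks to GStreamer.

```cpp
#include <chrono>

using std::chrono::milliseconds;

// Picks the position to grab a thumbnail from.
// A negative duration is this sketch's way of saying "duration unknown".
milliseconds thumbnailPosition(milliseconds duration) {
    const milliseconds fallback(10000);  // ten seconds, as in the text
    if (duration < milliseconds(0)) {
        // May still overshoot a clip shorter than ten seconds.
        return fallback;
    }
    return duration * 2 / 7;  // 2/7ths of the way in
}
```

For a 70 second clip this gives the 20 second mark. It does nothing about the short clip or black frame problems, which is rather the point.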

This rabbit hole goes down quite deep so let’s not go there and instead focus on the other part of the problem.

There are mutually incompatible versions of GStreamer currently in use: 0.10 and 1.0. These two cannot be in the same process at the same time due to interesting technical issues. No matter which we pick, some client application might be using the other one. So we can’t actually link against GStreamer but instead we need to factor this functionality out to a separate executable. We also need to change the system’s global security profile so that every app is allowed to execute this binary.

Having all this functionality we can just fork/exec the binary and wait for it to finish, right?

In theory yes, but multimedia codecs are tricky beasts, especially hardware accelerated ones on mobile platforms. They have a tendency to freeze at any time. So we need to write functionality that spawns the process, monitors its progress and then kills it if it is not making progress.
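As a taste of what that involves, here is a minimal POSIX sketch. It uses a plain deadline rather than real progress monitoring, which is a simplification, and the function name runWithTimeout and its return convention are made up for this example.

```cpp
#include <chrono>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <thread>
#include <unistd.h>

// Runs argv[0] with the given arguments and waits at most `timeout`.
// Returns the child's exit code, or -1 if it could not be started,
// was terminated by a signal, or had to be killed after the deadline.
int runWithTimeout(char* const argv[], std::chrono::milliseconds timeout) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);  // only reached if exec failed
    }
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    int status = 0;
    while (true) {
        pid_t done = waitpid(pid, &status, WNOHANG);
        if (done == pid) {
            break;  // the child finished on its own
        }
        if (done < 0) {
            return -1;
        }
        if (std::chrono::steady_clock::now() >= deadline) {
            kill(pid, SIGKILL);        // the child is stuck, get rid of it
            waitpid(pid, &status, 0);  // reap it so no zombie is left
            return -1;
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Even this toy version has to deal with failed forks, failed execs, reaping the child and polling intervals, and it does not yet pass any data back to the caller.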

A question we have not asked is how does the helper process provide its output to the library? The simple solution is to write the image to a file in the file system. But the question then becomes where should it go? Different applications have different security policies and can access different parts of the file system, so we need a system state parser for that. Or we can do something fancier such as creating a socket pair connection between the library and the client executable and have the client push the results through that. Which means that process spawning just got more complicated and you need to define the serialization protocol for this ad-hoc network transfer.
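As a rough illustration of the socket pair variant, the sketch below forks a child that plays the helper and sends its result back as a 4-byte length followed by the raw bytes. Everything here is an assumption made for the example: the function names, the framing, and the use of fork without exec (a real helper would be exec'd and would inherit its end of the socket pair).

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Writes exactly len bytes, retrying on short writes.
static bool write_all(int fd, const void *buf, size_t len) {
    const char *p = static_cast<const char *>(buf);
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0) return false;
        p += n;
        len -= static_cast<size_t>(n);
    }
    return true;
}

// Reads exactly len bytes, retrying on short reads.
static bool read_all(int fd, void *buf, size_t len) {
    char *p = static_cast<char *>(buf);
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n <= 0) return false;
        p += n;
        len -= static_cast<size_t>(n);
    }
    return true;
}

// Forks a stand-in helper that pushes `payload` through a socket pair.
// Returns what the parent received, or an empty string on failure.
std::string receive_from_helper(const std::string &payload) {
    assert(payload.size() <= UINT32_MAX);
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return "";
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return "";
    }
    if (pid == 0) {
        // Helper side: length prefix in native byte order, then the data.
        close(fds[0]);
        uint32_t len = static_cast<uint32_t>(payload.size());
        bool ok = write_all(fds[1], &len, sizeof(len)) &&
                  write_all(fds[1], payload.data(), len);
        close(fds[1]);
        _exit(ok ? 0 : 1);
    }
    // Library side: read the length, then exactly that many bytes.
    close(fds[1]);
    std::string result;
    uint32_t len = 0;
    if (read_all(fds[0], &len, sizeof(len))) {
        result.resize(len);
        if (len > 0 && !read_all(fds[0], &result[0], len)) result.clear();
    }
    close(fds[0]);
    int status = 0;
    waitpid(pid, &status, 0);
    return result;
}
```

Even this toy protocol already has to deal with short reads, short writes and cleanup on every error path, which is the kind of complexity the paragraph above is pointing at.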

I could go on but I think the point has been made abundantly clear.

Read more
David Murphy (schwuk)

Although I still use my desktop replacement (i.e., little-to-no battery life) for a good chunk of my work, recent additions to my setup have resulted in some improvements that I thought others might be interested in.

For Christmas just gone my wonderful wife Suzanne – and my equally wonderful children, but let’s face it, it was her money not theirs! – bought me a HP Chromebook 14. Since the Chromebooks were first announced, I was dismissive of them, thinking that at best they would be a cheap laptop to install Ubuntu on. However over the last year my attitudes had changed, and I came to realise that at least 70% of my time is spent in some browser or other, and of the other 30% most is spent in a terminal or Sublime Text. This realisation, combined with the improvements Intel Haswell brought to battery life, made me reconsider my position and start seriously looking at a Chromebook as a second machine for the couch/coffee shop/travel.

I initially focussed on the HP Chromebook 11 and while the ARM architecture didn’t put me off, the 2GB RAM did. When I found the Chromebook 14 with a larger screen, 4GB RAM and Haswell chipset, I dropped enough subtle hints and Suzanne got the message. :-)

So Christmas Day came and I finally got my hands on it! First impressions were very favourable: this neither looks nor feels like a £249 device. ChromeOS was exactly what I was expecting, and generally gets out of my way. The keyboard is superb, and I would compare it in quality to that of my late MacBook Pro. Battery life is equally superb, and I’m easily getting 8+ hours at a time.

Chrome – and ChromeOS – is not without limitations though, and although a new breed of in-browser environments such as Codebox, Koding, and Cloud9 are giving more options for developers, what I really want is a terminal. Enter Secure Shell from Google – SSH in your browser (with public key authentication). This lets me connect to any box of my choosing, and although I could have just connected back to my desk-bound laptop, I would still be limited to my barely-deserves-the-name-broadband ADSL connection.

So, with my Chromebook and SSH client in place, DigitalOcean was my next port of call, using their painless web interface to create an Ubuntu-based droplet. Command Line Interfaces are incredibly powerful, and despite claims to the contrary most developers spend most of their time with them [1]. There are a plethora of tools to improve your productivity, and my three must-haves are:

With this droplet I can do pretty much anything I need that ChromeOS doesn’t provide, and connect through to the many other droplets, linodes, EC2 nodes, OpenStack nodes and other servers I use personally and professionally.

In some other posts I’ll expand on how I use (and – equally importantly – how I secure) my DigitalOcean droplets, and which “apps” I use with Chrome.

  1. The fact that I now spend most of my time in the browser and not on the command-line shows you that I’ve settled into my role as an engineering manager! :-) 

Read more
Jussi Pakkanen

A common step in a software developer’s life is building packages. This happens both directly on your own machine and remotely when waiting for the CI server to test your merge requests.

As an example, let’s look at the libcolumbus package. It is a common small-to-medium sized C++ project with a couple of dependencies. Compiling the source takes around 10 seconds, whereas building the corresponding package takes around three minutes. All things considered this seems like a tolerable delay.

But can we make it faster?

The first step in any optimization task is measurement. To do this we simulated a package builder by building the source code in a chroot. It turns out that configuring the source takes one second, compiling it takes around 12 seconds and installing build dependencies takes 2m 29s. These tests were run on an Intel i7 with 16GB of RAM and an SSD disk. We used CMake’s Make backend with 4 parallel processes.

Clearly, reducing the last part brings the biggest benefits. One simple approach is to store a copy of the chroot after dependencies are installed but before package building has started. This is a one-liner:

sudo btrfs subvolume snapshot -r chroot depped-chroot

Now we can do anything with the chroot and always get back to this state by deleting it and restoring the snapshot. Here we use -r so the backed-up snapshot is read-only. This way we don’t accidentally change it.

With this setup, prepping the chroot is, effectively, a zero time operation. Thus we have cut down total build time from 162 seconds to 13, which is a 12-fold performance improvement.

But can we make it faster?

After this fix the longest single step is the compilation. One of the most efficient ways of cutting down compile times is CCache, so let’s use that. For greater separation of concerns, let’s put the CCache repository on its own subvolume.

sudo btrfs subvolume create chroot/root/.ccache

We build the package once and then make a snapshot of the cache.

sudo btrfs subvolume snapshot -r chroot/root/.ccache ccache

Now we can delete the whole chroot. Reassembling it is simple:

sudo btrfs subvolume snapshot depped-chroot chroot
sudo btrfs subvolume snapshot ccache chroot/root/.ccache

The latter command gave an error about incorrect ioctls. The same effect can be achieved with bind mounts, though.

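When doing this the compile time drops to 0.6 seconds. Together with the one-second configure step, the whole build now takes about 1.6 seconds instead of 162, which means that we can build the project roughly 100 times faster.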

But can we make it faster?

At this point all individual steps take a second or so. Optimizing them further would yield negligible performance improvements. In actual package builds there are other steps that can’t be easily optimized, such as running the unit test suite, running Lintian, gathering and verifying the package and so on.

If we look a bit deeper we find that these are all, effectively, single process operations. (Some build systems, such as Meson, will run unit tests in parallel. They are in the minority, though.) This means that package builders are running processes which consume only one CPU most of the time. According to usually reliable sources, package builders are almost always configured to work on only one package at a time.

Having a 24 core monster builder run single threaded executables consecutively does not make much sense. Fortunately this task parallelizes trivially: just build several packages at the same time. Since we could achieve 100 times better performance for a single build and we can run 24 of them at the same time, we find that with a bit of effort we can achieve the same results 2400 times faster. This is roughly equivalent to doing the job of an entire data center on one desktop machine.

The small print

The numbers on this page are slightly optimistic. However the main reduction in build time achieved with chroot snapshotting still stands.

In reality this approach would require some tuning. As an example, you would not want to build LibreOffice with -j 1. Keeping the snapshotted chroots up to date also requires some smartness, but these are all solvable engineering problems.

Read more
Jussi Pakkanen

One of the main ways of reducing code complexity (and thus compile times) in C/C++ is forward declaration. The most basic form of it is this:

class Foo;

This tells the compiler that there will be a class called Foo but it does not specify it in more detail. With this declaration you can’t deal with Foo objects themselves but you can form pointers and references to them.

Typically you would use forward declarations in this manner.

class Bar;

class Foo {
public:
  void something();
  void method1(Bar *b);
  void method2(Bar &b);
};

Correspondingly if you want to pass the objects themselves, you would typically do something like this.


#include "Bar.h"

class Foo {
public:
  void something();
  void method1(Bar b);
  Bar method2();
};

This makes sense because you need to know the binary layout of Bar in order to pass it properly to and from a method. Thus a forward declaration is not enough: you must include the full header, otherwise you can’t use the methods of Foo.

But what if some class does not use either of the methods that deal with Bars? What if it only calls method something? It would still need to parse all of Bar (and everything it #includes) even though it never uses Bar objects. This seems inefficient.

It turns out that including Bar.h is not necessary, and you can instead do this:

class Bar;

class Foo {
public:
  void something();
  void method1(Bar b);
  Bar method2();
};

You can declare functions taking or returning full objects with forward declarations just fine. The catch is that those users of Foo that use the Bar methods need to include Bar.h themselves. Correspondingly those that do not deal with Bar objects themselves do not need to include Bar.h ever, even indirectly. If you ever find out that they do, it is proof that your #includes are not minimal. Fixing these include chains will make your source files more isolated and decrease compile times, sometimes dramatically.

You only need to #include the full definition of Bar if you need:

  • to use its services (constructors, methods, constants, etc)
  • to know its memory layout

In practice the latter means that you need to either call or implement a function that takes a Bar object rather than a pointer or reference to it.

For other uses a forward declaration is sufficient.
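The whole idea can be checked in a single file. The sketch below squeezes what would normally be Foo.h, a user of Foo, and Bar.h into one translation unit. The member bodies, the value field and the use_without_bar function are made up for the example.

```cpp
#include <cassert>  // for the checks mentioned in the note below

class Bar;  // Forward declaration only; no definition is visible yet.

class Foo {
public:
    int something() const { return 42; }
    // Declaring by-value parameters and return values of an
    // incomplete type is legal.
    int method1(Bar b) const;
    Bar method2() const;
};

// Code that sticks to something() compiles here, before Bar is defined.
int use_without_bar(const Foo &f) { return f.something(); }

// From this point on we pretend that Bar.h has been included.
class Bar {
public:
    explicit Bar(int v) : value(v) {}
    int value;
};

// Defining (or calling) the Bar methods needs the complete type.
int Foo::method1(Bar b) const { return b.value; }
Bar Foo::method2() const { return Bar(7); }
```

With these definitions, use_without_bar(Foo()) returns 42 without ever having seen Bar, while checks such as assert(Foo().method1(Bar(3)) == 3) only compile after the full definition of Bar.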

Post scriptum

The discussion above holds even if Foo and Bar are templates, but making template classes as clean can be a lot harder and may in some instances be impossible. You should still try to minimize header includes as much as possible.

Read more
Anthony Dillon

I was recently asked to attend a cloud sprint in San Francisco as a front-end developer for the new Juju GUI product. I had the pleasure of finally meeting the guys that I have collaboratively worked with and ultimately been helped by on the project.

Here is a collection of things I learnt during my week overseas.

Mocha testing

Mocha is a JavaScript test framework that tests asynchronously in a browser. Previously I found it difficult to imagine a use case when developing a site, but I now know that any interactive element of a site could benefit from Mocha testing.

This is by no means a full tutorial or feature set of Mocha but my findings from a week with the UI engineering team.

Break down your app or website into small elements and test their logic

If you take a system like a user’s login and register, it is much easier to test each function of the system. For example, if the user hits the signup button you should test the registration form is then visible to the user. Then work methodically through each step of the process, testing as many different inputs as you can think of.

Saving your bacon

Testing undoubtedly slows down initial development but catches a lot of mistakes and flaws in the system before anything lands in the main code base. It also means if a test fails you don’t have to manually check each test again by hand — you simply run the test suite and see the ticks roll in.

Speeds up bug squashing

Bug fixing becomes easier for the reporter and the developer. If the reporter submits a test that fails due to a bug, the developer will get the full scope of the issue and once the test passes the developer and reporter can be confident the problem no longer exists.


I have read a lot about linting in the past but have not needed to use it on any projects I have worked on to date, so I was very happy to use, and be taught, the linting performed by the UI engineering team.

Enforces a standard coding syntax

I was very impressed with the level of code standards it enforces. It requires all code to be written in a certain way, from indenting and commenting to unused variables. As a result, anyone using the code can pick it up and read it as if it were created by one person when in fact it may have been contributed by many.

Code reviews

In my opinion code reviews should be performed on all front-end work to discourage sloppy code and encourage shared knowledge.

Mark up

Mark up should be very semantic. This can be a matter of opinion, but shared discussion will get the team to an agreed solution, which will then be reused by others in similar situations.


CSS can be difficult as there are different ways to achieve a similar result, but with a code review the style used will be common practice within the team.


A perfect candidate as different people have different methods of coding. A review will catch any sloppiness or shortcuts in the code, and makes sure your code is refactored to best practice the first time.


Test driven development (TDD) does slow the development process down but enforces better output from the time you spend on the code and fewer bugs in the future.

If someone writes a failing test for your code which is expected to pass, working on the code to produce a passing test is a much easier way to demonstrate the code now works, along with all the other tests for that function.

I truly believe in code reviews now. Previously I was sceptical about them. I used to think that “because my code is working” I didn’t need reviews and it would slow me down. But a good reviewer will catch things like “it works but didn’t you take a shortcut two classes ago which you meant to go back and refactor”. We all want our code to be perfect and to learn from others on a daily basis. That is what code reviews give us.

Read more
Inayaili de León Persson

Release month is always a busy one for the web team, and this time was no exception with the Ubuntu 13.10 release last week.

In the last few weeks we’ve worked on:

  • Ubuntu 13.10 release: we’ve updated for the latest Ubuntu release
  • Updates to the new Ubuntu OpenStack cloud section: based on some really interesting feedback we got from Tingting’s research, we’ve updated the new pages to make them easier to understand
  • Canonical website: Carla has conducted several workshops and interviews with stakeholders and has defined key audiences and user journeys
  • Juju GUI: on-boarding is now ready to land in Juju soon
  • Fenchurch (our CMS): the demo services are fixed and our publishing speed has seen a 90% improvement!

And we’re currently working on:

  • Responsive mobile pilot: we’ve been squashing the most annoying bugs and it’s now almost ready for the public alpha release!
  • with some of the research for the project already completed, Carla will now be working on creating the site’s information architecture and wireframing its key sections
  • Juju GUI: Alejandra, Luca, Spencer, Peter and Anthony are in a week-long sprint in San Francisco for some intense Juju-related work (lucky them!)
  • we have been working with the Community team to update the site’s design to be more in line with and the first iteration will be going live soon
  • Fenchurch: we are now working on a new download service

Release day at the Canonical office in London

Have you got any questions or suggestions for us? Would you like to hear about any of these projects and tasks in more detail? Add your thoughts in the comments.

Read more
Jussi Pakkanen

With the release of C++11 something quite extraordinary has happened. Its focus on usable libraries, value types and other niceties has turned C++, conceptually, into a scripting language.

This seems like a weird statement to make, so let’s define exactly what we mean by that. Scripting languages differ from classical compiled languages such as C in the following ways:

  • no need to manually manage memory
  • expressive syntax, complex functionality can be implemented in just a couple of lines of code
  • powerful string manipulation functions
  • large standard library

As of C++11 all these hold true for C++. Let’s examine this with a simple example. Suppose we want to write a program that reads all lines from a file and writes them in a different file in sorted order. This is classical scripting language territory. In C++11 this code would look something like the following (ignoring error cases such as missing input arguments).


#include <algorithm>
#include <fstream>
#include <string>
#include <vector>

using namespace std;

int main(int argc, char **argv) {
  ifstream ifile(argv[1]);
  ofstream ofile(argv[2]);
  string line;
  vector<string> data;
  while(getline(ifile, line)) {
    data.push_back(line);
  }
  sort(data.begin(), data.end());
  for(const auto &i : data) {
    ofile << i << std::endl;
  }
  return 0;
}

That is some tightly packed code. Ignoring include boilerplate and the like leaves us with roughly ten lines of code. If you were to do this with plain C using only its standard library merely implementing getline functionality reliably would take more lines of code. Not to mention it would be tricky to get right.

Other benefits include:

  • every single line of code is clear, understandable and expressive
  • memory leaks can not happen
  • could be reworked into a library function easily
  • smaller memory footprint due to not needing a VM
  • compile time with -O3 is roughly the same as Python VM startup and has to be done only once
  • faster than any non-JITted scripting language

Now, obviously, this won’t mean that scripting languages will disappear any time soon (you can have my Python when you pry it from my cold, dead hands). What it does do is indicate that C++ is quite usable in fields one traditionally has not expected it to be.

Read more
Inayaili de León Persson

We might have been quiet, but we have been busy! Here’s a quick overview of what the web team has been up to recently.

In the past month we’ve worked on:

  • New website: we’ve revamped the information architecture, revisited the key journeys and updated the look to be more in line with
  • Fenchurch (our CMS): we’ve worked on speeding up deployment and continuous testing
  • New Ubuntu OpenStack cloud section on we’ve launched a restructured cloud section, with links to more resources, clearer journeys and updated design
  • Juju GUI: we’ve launched the brand new service inspector

And we’re currently working on:

  • 13.10 release updates: the new Ubuntu release is upon us, and we’re getting the website ready to show it off
  • A completely new project that will be our mobile/responsive pilot: we’re updating our web patterns to a more future-friendly shape, investigating solutions to handle responsive images, and we’ve set up a (growing) mobile device testing suite — watch this space for more on this project
  • Fenchurch: we’re improving our internal demo servers and enhancing performance on the downloads page to help deal with release days!
  • Usability testing of the new cloud section: following the aforementioned launch, Tingting is helping us test these pages with their target audience — and we’ve already found loads of things we can improve!
  • A new we haven’t worked on Canonical’s main website in a while, so we’re looking into making it leaner and meaner. As a first stage, Carla has been conducting internal interviews and analysing the existing content
  • Juju GUI: we’re designing on-boarding and a new notification system, and we’re finalising designs for the masthead, service block and relationship lines

We’ve also learnt that Spencer’s favourite author is Paul Auster. And Tristram wrote a post on his blog about his first experience with Juju.

Spencer giving his 5×5 presentation at the web team weekly meeting on 19 September 2013

Have you got any questions or suggestions for us? Would you like to hear about any of these projects and tasks in more detail? Please let us know your thoughts in the comments.

Read more
Jussi Pakkanen

The problem

Suppose you have a machine with 8 cores. Also suppose you have the following source packages that you want to compile from scratch.


You want to achieve this as fast as possible. How would you do it?

Think carefully before proceeding.

The solution

Most of you probably came up with the basic idea of compiling one after the other with ‘make -j 8’ or equivalent. There are several reasons to do this, the main one being that this saturates the CPU.

The other choice would be to start the compilation on all subdirs at the same time but with ‘make -j 1’. You could also run two parallel build jobs with ‘-j 4’ or four with ‘-j 2’.

But surely that would be pointless. Doing one thing at a time maximises data locality so the different build trees don’t have to compete with each other for cache.


Well, let’s measure what actually happens.


The first bar shows the time when running with ‘-j 8’. It is slower than all other combinations. In fact it is over 40% (one minute) slower than the fastest one, although all the other alternatives are roughly equally fast.

Why is this?

In addition to compilation and linking processes, there are parts in the build that can not be parallelised. There are two main things in this case. Can you guess what they are?

What all of these projects had in common is that they are built with Autotools. The configure step takes a very long time and can’t be parallelised with -j. When building consecutively, even with perfect parallelisation, the build time can never drop below the sum of configure script run times. This is easily half a minute each on any non-trivial project even on the fastest i7 machine that money can buy.

The second thing is time that is lost inside Make. Its data model makes it very hard to optimize. See all the gory details here.

The end result of all this is a hidden productivity sink, a minute lost here, one there and a third one over there. Sneakily. In secret. In a way people have come to expect.

These are the worst kinds of productivity losses because people honestly believe that this is just the way things are, have always been and shall be evermore. That is what their intuition and experience tells them.

The funny thing about intuition is that it lies to you. Big time. Again and again.

The only way out is measurements.


Read more
Jussi Pakkanen

We all like C++’s container classes such as maps. The main negative thing about them is persistence. Ending your process makes the data structure go away. If you want to store it, you need to write code to serialise it to disk and then deserialise it back to memory again when you need it. This is tedious work that has to be done over and over again.

It would be great if you could command STL containers to write their data to disk instead of memory. The reductions in application startup time alone would be welcomed by all. In addition most uses for small embedded databases such as SQLite would go away if you could just read stuff from persistent std::maps.

The standard does not provide for this because serialisation is a hard problem. But it turns out this is, in fact, possible to do today. The only tools you need are the standard library and basic standards conforming C++.

Before we get to the details, please note this warning from the society of responsible coding.


What follows is the single most evil piece of code I have ever written. Do not use it unless you understand the myriad of ways it can fail (and possibly not even then).

The basic problem is that C++ containers work only with memory but serialisation requires writing bytes to disk. The tried and true solution for this problem is memory mapped files. It is a technique where a certain portion of a process’s memory is mapped to a backing file. Any changes to the mapped memory will be written to the disk by the kernel. This gives us memory serialisation.
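As a small aside, here is roughly what that looks like with plain POSIX calls, before any containers enter the picture. The two function names, the 4096-byte size and the file handling are choices made for this sketch only.

```cpp
#include <cassert>
#include <cstring>
#include <fcntl.h>
#include <string>
#include <sys/mman.h>
#include <unistd.h>

// Stores text (plus its terminating NUL) at the start of a file by
// writing to a shared memory mapping instead of calling write().
bool store_through_mapping(const char *path, const std::string &text) {
    assert(path != nullptr);
    const size_t size = 4096;
    if (text.size() >= size) return false;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return false;
    // The file must be at least as large as the mapping.
    if (ftruncate(fd, size) != 0) {
        close(fd);
        return false;
    }
    void *mem = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  // The mapping stays valid after the descriptor is closed.
    if (mem == MAP_FAILED) return false;
    std::memcpy(mem, text.c_str(), text.size() + 1);
    // msync forces the dirty pages out now; otherwise the kernel
    // writes them back at some later point of its choosing.
    bool ok = msync(mem, size, MS_SYNC) == 0;
    munmap(mem, size);
    return ok;
}

// Reads the text back with ordinary file I/O, which shows that the
// bytes really ended up in the file.
std::string load_with_read(const char *path) {
    assert(path != nullptr);
    int fd = open(path, O_RDONLY);
    if (fd < 0) return "";
    char buf[4096] = {0};
    ssize_t n = read(fd, buf, sizeof(buf) - 1);
    close(fd);
    if (n <= 0) return "";
    return std::string(buf);
}
```

Storing a string with the first function and loading it with the second returns the same text, even though no write() call was ever made on the file.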

This is only half of the problem, though. STL containers and others allocate the memory they need through operator new. The way new works is implementation defined. It may give out addresses that are scattered around the memory space. We can’t mmap the entire address space because it would take too much space and serialise lots of stuff we don’t care about.

Fortunately C++ allows you to specify custom allocators for containers. An allocator is an object that does memory allocations for the object it is tied to. This indirection allows us to write our own allocator that gives out raw memory chunks from the mmapped memory area.
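To show the mechanism in isolation, here is a minimal C++11 allocator that hands out memory from a fixed static buffer. The buffer is only a stand-in for the mmapped area, and the names ArenaAlloc, arena and in_arena are invented for this sketch.

```cpp
#include <cassert>  // for the checks mentioned in the note below
#include <cstddef>
#include <functional>
#include <new>
#include <vector>

namespace {
// Fixed 64 KiB buffer playing the role of the mmapped region.
alignas(std::max_align_t) unsigned char arena[64 * 1024];
std::size_t arena_used = 0;
}

template <typename T>
struct ArenaAlloc {
    typedef T value_type;

    ArenaAlloc() {}
    template <typename U> ArenaAlloc(const ArenaAlloc<U> &) {}

    T *allocate(std::size_t n) {
        // Round each request up so the next one stays aligned.
        const std::size_t align = alignof(std::max_align_t);
        const std::size_t bytes = (n * sizeof(T) + align - 1) / align * align;
        if (arena_used + bytes > sizeof(arena)) throw std::bad_alloc();
        T *p = reinterpret_cast<T *>(arena + arena_used);
        arena_used += bytes;
        return p;
    }

    // Freed memory is simply abandoned, never handed out again.
    void deallocate(T *, std::size_t) {}
};

// All instances share the same arena, so they always compare equal.
template <typename T, typename U>
bool operator==(const ArenaAlloc<T> &, const ArenaAlloc<U> &) { return true; }
template <typename T, typename U>
bool operator!=(const ArenaAlloc<T> &, const ArenaAlloc<U> &) { return false; }

// Tells whether a pointer lies inside the arena.
bool in_arena(const void *p) {
    const unsigned char *c = static_cast<const unsigned char *>(p);
    std::less<const unsigned char *> before;
    return !before(c, arena) && before(c, arena + sizeof(arena));
}
```

A std::vector<int, ArenaAlloc<int>> then behaves like any other vector, but in_arena(v.data()) reports that its elements live inside the buffer rather than wherever operator new would have put them.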

But there is still a problem. Since pointers refer to absolute memory locations we would need to have the mmapped memory area in the same location in every process that wants to use it. It turns out that you can enforce the address at which the memory mapping is to be done. This gives us an outline on how to achieve our goal.

  • create an empty file for backing (10 MB in this example)
  • mmap it in place
  • populate the data structure with objects allocated in the mmapped area
  • close creator program
  • start reader program, mmap the data and cast the root object into existence

And that’s it. Here’s how it looks in code. First some declarations:

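#include <functional>
#include <map>
#include <string>

void *mmap_start = (void*)139731133333504;
size_t offset = 1024;

template <typename T>
class MmapAlloc {
public:
  typedef T value_type;
  typedef T *pointer;
  MmapAlloc() {}
  template <typename U> MmapAlloc(const MmapAlloc<U> &) {}
  pointer allocate(size_t num, const void *hint = 0) {
    long returnvalue = (long)mmap_start + offset;
    size_t increment = num * sizeof(T) + 8;
    increment -= increment % 8;
    offset += increment;
    return (pointer)returnvalue;
  }
  void deallocate(pointer, size_t) {}  // memory is never reused
};
template <typename T, typename U> bool operator==(const MmapAlloc<T> &, const MmapAlloc<U> &) { return true; }
template <typename T, typename U> bool operator!=(const MmapAlloc<T> &, const MmapAlloc<U> &) { return false; }

typedef std::basic_string<char, std::char_traits<char>,
  MmapAlloc<char>> mmapstring;
// The allocator's value type is the map's element type (a key/value pair).
typedef std::map<mmapstring, mmapstring, std::less<mmapstring>,
  MmapAlloc<std::pair<const mmapstring, mmapstring> > > mmapmap;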

First we declare the absolute memory address of the mmapping (it can be anything as long as it won’t overlap an existing allocation). The allocator itself is extremely simple: it hands out memory starting offset bytes into the mapping and increments offset by the number of bytes allocated (plus alignment). Deallocated memory is never actually freed, it remains unused (destructors are called, though). Last we have typedefs for our mmap backed containers.

Population of the data sets can be done like this.

#include <cstdio>
#include <fcntl.h>
#include <new>
#include <sys/mman.h>

int main(int argc, char **argv) {
    int fd = open("backingstore.dat", O_RDWR);
    if(fd < 0) {
        printf("Open failed.\n");
        return 1;
    }
    void *mapping;
    mapping = mmap(mmap_start, 10*1024*1024,
      PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, 0);
    if(mapping == MAP_FAILED) {
        printf("MMap failed.\n");
        return 1;
    }
    mmapstring key("key");
    mmapstring value("value");
    // Placement new: construct the root object at the start of the mapping.
    auto map = new(mapping)mmapmap();
    (*map)[key] = value;
    printf("Sizeof map: %ld.\n", (long)map->size());
    printf("Value of 'key': %s\n", (*map)[key].c_str());
    return 0;
}

We construct the root object at the beginning of the mmap and then insert one key/value pair. The output of this application is what one would expect.

Sizeof map: 1.
Value of 'key': value

Now we can use the persisted data structure in another application.

#include <cstdio>
#include <fcntl.h>
#include <map>
#include <string>
#include <sys/mman.h>

int main(int argc, char **argv) {
    int fd = open("backingstore.dat", O_RDONLY);
    void *mapping;
    mapping = mmap(mmap_start, 10*1024*1024, PROT_READ,
     MAP_SHARED | MAP_FIXED, fd, 0);
    if(mapping == MAP_FAILED) {
        printf("MMap failed.\n");
        return 1;
    }
    std::string key("key");
    // No constructor runs here: the bytes in the mapping are the map.
    auto *map = reinterpret_cast<std::map<std::string,
                                 std::string> *>(mapping);
    printf("Sizeof map: %ld.\n", (long)map->size());
    printf("Value of 'key': %s\n", (*map)[key].c_str());
    return 0;
}

Note in particular how we can specify the type as std::map<std::string, std::string> rather than the custom allocator version in the creator application. The output is this.

Sizeof map: 1.
Value of 'key': value

It may seem a bit anticlimactic, but what it does is quite powerful.

Extra evil bonus points

If this is not evil enough for you, just think about what other things can be achieved with this technique. As an example you can have the backing file mapped to multiple processes at the same time, in which case they all see changes live. This allows you to have things such as standard containers that are shared among processes.
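The live sharing part is easy to try out with something much simpler than a container. In the sketch below a single int is mapped from a file with MAP_SHARED, a forked child changes it, and the parent sees the new value. The function name and file handling are made up for the example, and a real shared container would additionally need some form of locking between the processes.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Maps one int from a file as shared memory, lets a child process
// overwrite it and returns the value the parent sees afterwards.
// Returns -1 if any of the system calls fail.
int value_written_by_child(const char *path, int value) {
    assert(path != nullptr);
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return -1;
    if (ftruncate(fd, sizeof(int)) != 0) {
        close(fd);
        return -1;
    }
    void *mem = mmap(nullptr, sizeof(int), PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    close(fd);
    if (mem == MAP_FAILED) return -1;
    int *shared = static_cast<int *>(mem);
    *shared = 0;
    pid_t pid = fork();
    if (pid < 0) {
        munmap(mem, sizeof(int));
        return -1;
    }
    if (pid == 0) {
        // Child: this write lands in the shared pages, not in a
        // private copy as it would with ordinary memory.
        *shared = value;
        _exit(0);
    }
    int status = 0;
    waitpid(pid, &status, 0);
    int seen = *shared;
    munmap(mem, sizeof(int));
    return seen;
}
```

Calling the function with 1234 returns 1234 in the parent, even though only the child ever assigned that value.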

Read more