Benji’s blog post earlier this week gave you all some insight into what the Launchpad Yellow Squad has been doing recently in its attempt to parallelise the Launchpad test suite. One of the side effects of this is that we’ve been making quite a lot of use of Juju, and we thought it’d be nice to actually spell out what we’ve been doing.
We’re working to parallelise Launchpad’s test suite so that it doesn’t take approximately one epoch to get a branch from being approved for merging until it lands. A lofty goal, sure, and one that presents some interesting problems from the perspective of building an environment to test our work in. You see, Launchpad’s build infrastructure is a pretty complicated beast. It’s come a long way since the time when submitting a branch for merging meant sending an email to our PQM bot, which would then run the test suite and kick the branch out if it failed, but now it’s something of a behemoth.
Time for some S&M
We use Buildbot as our continuous integration system. There are two parts to Buildbot: the master and the slave. Broadly put, the slave is the part of Buildbot that is responsible for doing the actual work of compilation and running tests, and the master is responsible for telling the slave when to do things. Each master can be responsible for several slaves. When it became obvious that we were going to need to essentially replicate our existing setup in order to test our parallelisation work, we considered asking Canonical’s system administrators, in our sweetest tones, to give us a box upon which to do our testing work. However, we spotted two reasons that this would be problematic:
- We didn’t actually know at the outset what the best architecture was for our project.
- Asking for a machine without knowing what you actually need is likely to earn you a look so old it could have come from an ammonite, at least if you have sensible sysadmins.
So instead, the obvious solution: use Amazon EC2. After all, that would allow us to play with different architectures without there being any huge cost in terms of physical resources. Moreover, we’d be able to have root access on the instances on which we were testing, which makes debugging such a complicated process so much easier.
There was still a problem: how to actually set up the test instances, given that there are five of us spread between three timezones, that it takes a significant amount of time to set up a machine for Launchpad development, and finally that we don’t really want to leave EC2 instances running overnight if we don’t have to (because it’s expensive).
The sequence of steps we’d have to take to bring up an instance tends to look something like this:
- Launch a new EC2 instance (this happens pretty quickly, thanks, Amazon).
- Make sure that everyone’s public SSH keys are usable on that instance.
- Run our Launchpad setup script(s) (this takes about an hour, usually).
- Install buildbot.
- Configure buildbot correctly as master or slave.
- Run buildbot (or buildslave, if this is a slave) and make sure it’s hooked up correctly to the other type of buildbot.
- Get some code into buildbot and make it run the test suite.
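To make the ordering and the master/slave difference concrete, here is a minimal sketch of that checklist as data. It is only an illustration, not the scripts we actually ran: the function names and every command string except the `buildbot`/`buildslave` program names are hypothetical placeholders.

```python
from typing import List, Tuple

# (description, placeholder command). The command strings are illustrative
# stand-ins, not real tooling, apart from the buildbot/buildslave names.
Step = Tuple[str, str]


def provisioning_plan(role: str, ssh_keys: List[str]) -> List[Step]:
    """Return the ordered manual steps for bringing up one test instance."""
    if role not in ("master", "slave"):
        raise ValueError(f"unknown role: {role!r}")
    # The master runs `buildbot`; a slave runs `buildslave`.
    runner = "buildbot" if role == "master" else "buildslave"
    keys = " ".join(ssh_keys)
    return [
        ("Launch a new EC2 instance", "launch-instance"),
        ("Make everyone's public SSH keys usable", f"authorise-keys {keys}"),
        ("Run the Launchpad setup script(s), about an hour", "setup-launchpad"),
        ("Install buildbot", "install-buildbot"),
        (f"Configure buildbot as {role}", f"configure-buildbot --role={role}"),
        (f"Run {runner} and hook it up to its counterpart", f"{runner} start <basedir>"),
        ("Get some code in and run the test suite", "run-test-suite"),
    ]


def print_plan(role: str, ssh_keys: List[str]) -> None:
    """Print the plan as a numbered dry run; nothing is executed."""
    for number, (description, command) in enumerate(
        provisioning_plan(role, ssh_keys), start=1
    ):
        print(f"{number}. {description}: {command}")
```

Calling `print_plan("slave", ["alice.pub", "bob.pub"])` lists the seven steps in order. The point of writing it down this way is that every instance, master or slave, repeats the same slow sequence, with only the last few steps depending on the role.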