Launchpad has a feature where it periodically checks the status of remote bugs (as in, bugs recorded in another bug tracker, like bug 12720 in Django).
A link between a bug on Launchpad and a remote bug is called a bug watch. All the bug watches for a bug appear on the Launchpad bug page in an area called “Remote bug watches”. Check out bug 513719 to see a bug watch for bug 12720 in Django.
If the remote bug tracker has been set as the bug tracker for a project in Launchpad, bug tasks for that project can be linked to a specific remote bug too. When the status of the remote bug changes, Launchpad changes the status of the bug task to match, and sends out email to subscribers, the same as if the status had been changed in Launchpad. See the Django bug task in bug 513719 for an example.
Going further, comments can be synchronized too, in both directions. Recent versions of Bugzilla have this capability built in, but older versions can be supported with a plugin. There’s also a plugin for Trac.
This is all very nifty stuff, but it suffers because it doesn’t work very well! Yet.
Part of this is down to complexity
We support 7 different remote bug tracker types: Debian Bugs, Trac, Bugzilla, Mantis, RT (Request Tracker), Roundup and SourceForge.
We try to support a range of versions for each of these trackers, and a range of different access methods.
For example, with Bugzilla, we support old (v2) installations, more recent (v3) ones, recent ones with the Launchpad plugin, and very recent ones with the API built-in. And we support Issuezilla, a variant of Bugzilla.
We try to work around many idiosyncrasies and customizations in the remote systems.
With Mantis, for example, sometimes we need to log in anonymously before we can download bug statuses. Some Mantis installations allow us to download status information in CSV form, in a batch, but not all. Where they don’t, we screen-scrape the individual bug pages. Even when we get CSV, it’s often slightly corrupt, so we try to correct for that where we can.
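As a rough illustration of that "batch if possible, scrape if not" approach, here is a minimal sketch. It is not the checkwatches code: the column names `id` and `status`, and the `fetch_csv` and `scrape_bug_page` callables, are hypothetical stand-ins, and the only "repair" it attempts is to skip rows that don't parse cleanly.

```python
import csv
import io


def statuses_from_csv(text, id_column="id", status_column="status"):
    """Parse a batch CSV export into {bug_id: status}, skipping bad rows.

    The column names are assumptions made for this sketch.
    """
    rows = csv.reader(io.StringIO(text))
    header = [name.strip().lower() for name in next(rows, [])]
    if id_column not in header or status_column not in header:
        raise ValueError("CSV export lacks the expected columns")
    id_index = header.index(id_column)
    status_index = header.index(status_column)
    statuses = {}
    for row in rows:
        if len(row) != len(header):
            continue  # truncated or over-split row: ignore rather than guess
        try:
            statuses[int(row[id_index])] = row[status_index].strip()
        except ValueError:
            continue  # the id field is not a number, so the row is unusable
    return statuses


def fetch_statuses(bug_ids, fetch_csv, scrape_bug_page):
    """Prefer the batch CSV; fall back to one page request per missing bug.

    `fetch_csv()` and `scrape_bug_page(bug_id)` are hypothetical callables.
    """
    try:
        statuses = statuses_from_csv(fetch_csv())
    except (IOError, ValueError):
        statuses = {}  # no usable CSV export on this installation
    for bug_id in bug_ids:
        if bug_id not in statuses:
            statuses[bug_id] = scrape_bug_page(bug_id)
    return {bug_id: statuses[bug_id] for bug_id in bug_ids}
```

Skipping a damaged row costs one extra page request for that bug, which seems a safer trade than guessing at what the row meant.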
Then there are simply hundreds of things beyond our control that Launchpad must cope with before moving on, like HTTP errors, errors without correct HTTP codes, slow responses, unrecognized responses, and so forth. We check about 7k remote bugs every day (we aim to do more than 30k a day, but we’re not there yet) so there are a huge number of errors to sift through. It’s a big task to figure out if each problem is transient, a problem in Launchpad, or a remote problem, never mind actually fixing it!
We must also be gentle with the remote systems, and not issue too many requests in a short time, or otherwise make unreasonably large demands on them. We must be nice.
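One simple way to be nice is to enforce a minimum gap between requests to the same host. The sketch below shows the idea only; the `HostThrottle` class and its parameters are made up for illustration and are not what checkwatches does internally.

```python
import time


class HostThrottle:
    """Enforce a minimum interval between requests to each remote host."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests per host
        self._clock = clock
        self._sleep = sleep
        self._last_request = {}  # host -> time of the most recent request

    def wait(self, host):
        """Block until it is polite to contact `host`; return the delay."""
        now = self._clock()
        last = self._last_request.get(host)
        delay = 0.0
        if last is not None:
            delay = max(0.0, last + self.min_interval - now)
        if delay > 0:
            self._sleep(delay)
        # Record when this request is actually sent, after any sleep.
        self._last_request[host] = now + delay
        return delay
```

Calling `throttle.wait(host)` before each request slows down traffic to a busy tracker without delaying requests to any other host.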
Part of it is down to the development of the code base
The bug watch code in Launchpad is a bit creaky. Originally it was designed to run in a single thread in a single process. This means that one badly performing bug tracker can starve the whole system of updates.
In the past we’ve tried to alleviate this by choosing the bug watches to check more carefully, or by batching requests, but this has not been enough for some time.
Also, while we do record a lot of information about errors, it is still a Herculean task to constantly monitor the errors coming out of the system.
We also can’t freely test or do QA against other people’s systems. For testing we must use test doubles to simulate the behaviour of remote systems, but this can get confusing. For QA, we sometimes have no choice but to test against remote systems. This is acceptable for testing the status fetching code, which only reads from them, but not for comment synchronization, which would post comments to them.
However, we have not had the time to give the bug watch system a big overhaul, only to make small improvements and bug fixes. For a long time it has needed some big changes to cope with the volume of work expected of it.
We’re trying to fix these issues now.
We’ve made checkwatches – the program that drives the bug watch machinery – run multi-threaded. This works, but it hammers our database, so we need to figure out how to alleviate that next.
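To show why threads help with the starvation problem, here is a minimal sketch that gives each bug tracker its own worker, so a slow or failing tracker no longer blocks the others. It uses Python's standard `concurrent.futures` module rather than the actual checkwatches code, and `check_all` and `update_tracker` are hypothetical names. It does nothing about the database load mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def check_all(watches_by_tracker, update_tracker, max_workers=4):
    """Update each tracker's watches in its own worker thread.

    `watches_by_tracker` maps a tracker name to a list of remote bug ids.
    `update_tracker(name, bug_ids)` is a hypothetical callable returning
    {bug_id: status}. Returns a (results, failures) pair of dicts.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(update_tracker, name, bug_ids): name
            for name, bug_ids in watches_by_tracker.items()
        }
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as error:
                # One badly behaved tracker must not stop the rest.
                failures[name] = error
    return results, failures
```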
We’ve started the work to move the code base over to using Twisted. This is a better model for managing a lot of concurrent network activity. As more and more bug watches are registered with Launchpad, we’re going to need it.
We’re going to keep more history of success and failure in the database, so that checkwatches can throttle back checks for remote bugs that persistently error. This is really important because it will reduce the work that checkwatches needs to do each day. It will also, we hope, reduce the deluge of errors to a more manageable stream, so that we’re better able to spot genuine bugs that need our attention. Lastly, we hope to display this information on the web pages so that users can help to diagnose problems themselves.
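The throttling could work along these lines: the more consecutive failures a watch has in its recent history, the longer we wait before trying it again. This is a sketch under assumptions, not the planned implementation; the function name, the doubling rule, and the one-day and 32-day intervals are all invented for illustration.

```python
from datetime import datetime, timedelta


def next_check_time(last_checked, recent_outcomes,
                    base_interval=timedelta(days=1),
                    max_interval=timedelta(days=32)):
    """Return when a bug watch should next be checked.

    `recent_outcomes` is a list of booleans, oldest first, True meaning
    the check succeeded. Each consecutive failure at the end of the list
    doubles the interval, up to `max_interval`.
    """
    interval = base_interval
    for succeeded in reversed(recent_outcomes):
        if succeeded or interval >= max_interval:
            break
        interval = min(interval * 2, max_interval)
    return last_checked + interval
```

A watch that starts working again drops straight back to the base interval, because only the unbroken run of failures at the end of its history counts.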
We’re getting some more infrastructure in place, so that we can develop and QA against real installations of Bugzilla, Trac, et al.
We’re also fixing up as many bugs in checkwatches as we can.
Who are we?
Abel Deuring, Graham Binns and Gavin Panella. We’re meeting up next week in Norwich, England for a sprint, a culmination of a couple of months of dedicated effort on the bug watch system.
Deryck Hodge is also coordinating with the Canonical IS team to deliver the QA infrastructure.
For all of us, the aim is to get checkwatches to work reliably, even if it’s not the most efficient or elegant system inside. We also want future development to be more sustainable than it has been in the past.
We want users to be able to rely on this feature.