Canonical Voices

Posts tagged with 'design'

Gustavo Niemeyer

Back at the Ubuntu Platform Rally last week, I’ve pestered some of the Bazaar team with questions about co-location of branches in the same directory with Bazaar. The great news is that this seems to be really coming for the next release, with first-class integration of the feature in the command set. Unfortunately, though, it’s not quite yet ready for prime time, or even for I’m-crazy-and-want-this-feature time.

Some background on why this feature turns out to be quite important right now may be interesting, since life with Bazaar in the past years hasn’t really brought that up as a blocker. The cause for the new interest lies in some recent changes in the toolset of the Go language. The new go tool not only makes building and interacting with Go packages a breeze, but it also solves a class of problems previously existent. For the go tool to work, though, it requires the use of $GOPATH consistently, and this means that the package has to live in a well defined directory. The traditional way that Bazaar manages branches into their own directories becomes a deal breaker then.

So, last week I had the chance to exchange some ideas with Jelmer Vernooij and Vincent Ladeuil (both Bazaar hackers) on these problems, and they introduced me to the approach of using lightweight checkouts to workaround some of the limitations. Lightweight checkouts in Bazaar makes the working tree resemble a little bit the old-style VCS tools, with the working tree being bound to another location that actually has the core content. The idea is great, and given how well lightweight checkouts work with Bazaar, building a full fledged solution shouldn’t be a lot of work really.

After that conversation, I’ve put a trivial hack together that would make bzr look like git from the outside, by wrapping the command line, and did a lightning talk demo. This got a few more people interested on the concept, which was enough motivation for me to move the idea forward onto a working implementation. Now I just needed the time to do it, but it wasn’t too hard to find it either.

I happen to be part of the unlucky group that too often takes more than 24 hours to get back home from these events. This is not entirely bad, though.. I also happen to be part of the lucky group that can code while flying and riding buses as means to relieve the boredom (reading helps too). This time around, cobzr became the implementation of choice, and given ~10 hours of coding, we have a very neat and over-engineered wrapper for the bzr command.

The core of the implementation is the same as the original hack: wrap bzr and call it from outside to restructure the tree. That said, rather than being entirely lazy and hackish line parsing, it actually parses bzr’s –help output for commands to build a base of supported options, and parses the command line exactly like Bazaar itself would, validating options as it goes and distinguishing between flags with arguments from positional parameters. That enables the proxying to do much more interesting work on the intercepted arguments.

Here is a quick session that shows a branch being created with the tool. It should look fairly familiar for someone used to git:


[~]% bzr branch lp:juju
Branched 443 revisions.

[~]% cd juju
[~/juju]% bzr branch
* master

[~/juju]% bzr checkout -b new-feature
Shared repository with trees (format: 2a)
Location:
shared repository: .bzr/cobzr
Branched 443 revisions.
Branched 443 revisions.
Tree is up to date at revision 443.
Switched to branch: /home/niemeyer/juju/.bzr/cobzr/new-feature/

[~/juju]% bzr branch other-feature
Branched 443 revisions.

[~/juju]% bzr branch
  master
* new-feature
  other-feature

Note that cobzr will not reorganize the tree layout before the multiple branch support is required.

Even though the wrapping is taking place and bzr’s –help output is parsed, there’s pretty much no noticeable overhead given the use of Go for the implementation and also that the processed output of –help is cached (I said it was overengineered).

As an example, the first is the real bzr, while the second is a link to cobzr:


[~/juju]% time /usr/bin/bzr status
/usr/bin/bzr status 0.24s user 0.03s system 88% cpu 0.304 total

[~/juju]% time bzr status
bzr status 0.19s user 0.08s system 88% cpu 0.307 total

This should be more than enough for surviving comfortably until bzr itself comes along with first class support for co-located branches in the next release.

In case you’re interested in using it or are just curious about the command set or other details, please check out the web page for the project:

Read more
Gustavo Niemeyer

A long time before I seriously got into using distributed version control systems (DVCS) such as Bazaar and Git for developing software, it was already well known to me how the mechanics of these systems worked, and why people benefited from them. That said, it wasn’t until I indeed started to use DVCS tools that I understood how much my daily workflow around code bases would be changed and improved.

This weekend, while flying home from MongoSV, I could experience that same feeling in relation to first class concurrency support in programming languages. Everybody knows how the feature may be used, but I have the feeling that until one actually experiences it in practice, it’s very hard to really understand how much the relationship with ordering while developing software may be improved.

I was having some fun working on improvements to Goetveld. This package allows Go programs to communicate with Rietveld servers to manipulate code review entries. The Rietveld API is a bit rough in a few places, and as a result some features of the package actually parse an HTML form to extract some data, before sending it back. You may have done something similar before while attempting to script a web site that wasn’t originally intended to be.

The interesting fact here is that this is an intrinsically serial procedure: load a form, change it, and send it back, right? Well, not really. As one might intuitively expect, establishing an SSL session and its underlying TCP connection are not instantaneous operations.

To give an idea, here is part of a dump of an SSL connection being initiated (that is, no HTTP data was sent yet) to codereview.appspot.com, originated from my home location:

# tcpdump -ttttt -i wlan0 'host codereview.appspot.com and port 443'
(...)
00:00:00.000000 IP (...)
00:00:00.000063 IP (...)
00:00:00.000562 IP (...)
00:00:00.341627 IP (...)
00:00:00.357009 IP (...)
00:00:00.357118 IP (...)
00:00:00.360362 IP (...)
00:00:00.360550 IP (...)
00:00:00.366011 IP (...)
00:00:00.689446 IP (...)
00:00:00.727693 IP (...)

That’s more than half a second before the application layer was even touched. So, turns out that to save that roundtrip time, we can start both the form loading and the form sending requests at the same time. By the time the form loading ends, processing the data locally is extremely fast, and we can complete the sending side by just providing the request body.

At this time you may be thinking something like “Ugh, that’s too much trouble.. why bother?”, and that highlights precisely the point I’d like to make: it is too much trouble because most people are used to languages that turn it into too much trouble, but the issue is not inherently complex. In fact, this is the entire implementation of this logic in Go:

func (r *Rietveld) UpdateIssue(issue *Issue) error {
        op := &opInfo{r: r, issue: issue}
        errs := make(chan error)
        ch := make(chan map[string]string, 1)
        go func() {
                errs <- r.do(&editLoadHandler{op: op, form: ch})
                close(ch)
        }()
        go func() {
                errs <- r.do(&editHandler{op: op, form: ch})
        }()
        return firstError(2, errs)
}

I'm not cheating. The procedure was being done serially before, with very similar logic. Previously it had to take the form variable itself from the first request and manually provide it to the next one. Now, instead of providing the form, it's providing a channel that will be used to send the form across. One might even argue that the channel makes the algorithm more natural, curiously.

This is the kind of procedure that becomes fun and natural to write, after having first class concurrency at hand for some time. But, as in the case of DVCS, it takes a while to get used to the idea that concurrency and simplicity are not necessarily at opposing ends.

Read more
niemeyer

Certainly one of the reasons why many people are attracted to the Go language is its first-class concurrency aspects. Features like communication channels, lightweight processes (goroutines), and proper scheduling of these are not only native to the language but are integrated in a tasteful manner.

If you stay around listening to community conversations for a few days there’s a good chance you’ll hear someone proudly mentioning the tenet:

Do not communicate by sharing memory; instead, share memory by communicating.

There is a blog post on the topic, and also a code walk covering it.

That model is very sensible, and being able to approach problems this way makes a significant difference when designing algorithms, but that’s not exactly news. What I address in this post is an open aspect we have today in Go related to this design: the termination of background activity.

As an example, let’s build a purposefully simplistic goroutine that sends lines across a channel:

type LineReader struct {
        Ch chan string
        r  *bufio.Reader
}

func NewLineReader(r io.Reader) *LineReader {
        lr := &LineReader{
                Ch: make(chan string),
                r:  bufio.NewReader(r),
        }
        go lr.loop()
        return lr
}

The type has a channel where the client can consume lines from, and an internal buffer
used to produce the lines efficiently. Then, we have a function that creates an initialized
reader, fires the reading loop, and returns. Nothing surprising there.

Now, let’s look at the loop itself:

func (lr *LineReader) loop() {
        for {
                line, err := lr.r.ReadSlice('n')
                if err != nil {
                        close(lr.Ch)
                        return
                }
                lr.Ch <- string(line)
        }
}

In the loop we'll grab a line from the buffer, close the channel in case of errors and stop, or otherwise send the line to the other side, perhaps blocking while the other side is busy with other activities. Should sound sane and familiar to Go developers.

There are two details related to the termination of this logic, though: first, the error information is being dropped, and then there's no way to interrupt the procedure from outside in a clean way. The error might be easily logged, of course, but what if we wanted to store it in a database, or send it over the wire, or even handle it taking in account its nature? Stopping cleanly is also a valuable feature in many circumstances, like when one is driving the logic from a test runner.

I'm not claiming this is something difficult to do, by any means. What I'm saying is that there isn't today an idiom for handling these aspects in a simple and consistent way. Or maybe there wasn't. The tomb package for Go is an experiment I'm releasing today in an attempt to address this problem.

The model is simple: a Tomb tracks whether the goroutine is alive, dying, or dead, and the death reason.

To understand that model, let's see the concept being applied to the LineReader example. As a first step, creation is tweaked to introduce Tomb support:

type LineReader struct {
        Ch chan string
        r  *bufio.Reader
        t  tomb.Tomb
}

func NewLineReader(r io.Reader) *LineReader {
        lr := &LineReader{
                Ch: make(chan string),
                r:  bufio.NewReader(r),
        }
        go lr.loop()
        return lr
}

Looks very similar. Just a new field in the struct, and the function that creates it hasn't even been touched.

Next, the loop function is modified to support tracking of errors and interruptions:

func (lr *LineReader) loop() {
        defer lr.t.Done()
        for {
                line, err := lr.r.ReadSlice('n')
                if err != nil {
                        close(lr.Ch)
                        lr.t.Kill(err)
                        return
                }
                select {
                case lr.Ch <- string(line):
                case <-lr.t.Dying():
                        close(lr.Ch)
                        return
                }
        }
}

Note a few interesting points here: first, Done is called to track the goroutine termination right before the loop function returns. Then, the previously loose error now goes into the Kill Tomb method, flagging the goroutine as dying. Finally, the channel send was tweaked so that it doesn't block in case the goroutine is dying for whatever reason.

A Tomb has both Dying and Dead channels returned by the respective methods, which are closed when the Tomb state changes accordingly. These channels enable explicit blocking until the state changes, and also to selectively unblock select statements in those cases, as done above.

With the loop modified as above, a Stop method can trivially be introduced to request the clean termination of the goroutine synchronously from outside:

func (lr *LineReader) Stop() error {
        lr.t.Kill(nil)
        return lr.t.Wait()
}

In this case the Kill method will put the tomb in a dying state from outside the running goroutine, and Wait will block until the goroutine terminates itself and notifies via the Done method as seen before. This procedure behaves correctly even if the goroutine was already dead or in a dying state due to internal errors, because only the first call to Kill with an actual error is recorded as the cause for the goroutine death. The nil value provided to t.Kill is used as a reason when terminating cleanly without an actual error, and it causes Wait to return nil once the goroutine terminates, flagging a clean stop per common Go idioms.

This is pretty much all that there is to it. When I started developing in Go I wondered if coming up with a good convention for this sort of problem would require more support from the language, such as some kind of goroutine state tracking in a similar way to what Erlang does with its lightweight processes, but it turns out this is mostly a matter of organizing the workflow with existing building blocks.

The tomb package and its Tomb type are a tangible representation of a good convention for goroutine termination, with familiar method names inspired in existing idioms. If you want to make use of it, go get the package with:

$ go get launchpad.net/tomb

The API documentation with details is available at:

http://gopkgdoc.appspot.com/pkg/launchpad.net/tomb

Have fun!

UPDATE 1: there was a minor simplification in the API since this post was originally written, and the post was changed accordingly.

UPDATE 2: there was a second simplification in the API since this post was originally written, and the post was changed accordingly once again to serve as reference.

Read more
Gustavo Niemeyer

About 1 year after development started in Ensemble, today the stars finally aligned just the right way (review queue mostly empty, no other pressing needs, etc) for me to start writing the specification about the repository system we’ve been jointly planning for a long time. This is the system that the Ensemble client will communicate with for discovering which formulas are available, for publishing new formulas, for obtaining formula files for deployment, and so on.

We of course would have liked for this part of the project to have been specified and written a while ago, but unfortunately that wasn’t possible for several reasons. That said, there are also good sides of having an important piece flying around in minds and conversations for such a long time: sitting down to specify the system and describe the inner-working details has been a breeze. Even details such as the namespacing of formulas, which hasn’t been entirely clear in my mind, was just streamed into the document as the ideas we’ve been evolving finally got together in a written form.

One curious detail: this is the first long term project at Canonical that will be developed in Go, rather than Python or C/C++, which are the most used languages for projects within Canonical. Not only that, but we’ll also be using MongoDB for a change, rather than the traditional PostgreSQL, and will also use (you guessed) the mgo driver which I’ve been pushing entirely as a personal project for about 8 months now.

Naturally, with so many moving parts that are new to the company culture, this is still being seen as a closely watched experiment. Still, this makes me highly excited, because when I started developing mgo, the MongoDB driver for Go, my hopes that the Go, MongoDB, and mgo trio would eventually be used at Canonical were very low, precisely because they were all alien to the culture. We only got here after quite a lot of internal debate, experiments, and trust too.

All of that means these are happy times. Important feature in Ensemble being specified and written, very exciting tools, home grown software being useful..

Awesomeness.

Read more
Gustavo Niemeyer

Back in 2009 I quickly talked about the obvious revolution in computing that was rolling in the form of mobile phone as computer, and mentioned as well the fact that touch-based interfaces were going to dominate the marketplace because of that.

Move forward a couple of years, and last week I got my first tablet, running Android (a Samsung Galaxy Tab 10.1, if you’re curious). I didn’t know exactly why I needed one, but being in the tech industry I always have that nice excuse for myself of buying things early on for learning about the experience of using them. Last night, I could clearly see this can be a real claim in some cases (in others it’s just an excuse for the wife).

After getting the tablet last week, I’ve started by experimenting with the usual stuff any person would (email, browser, etc), and then downloaded a few games to take on board a longish flight. Some of them were pretty good.. a vertical scrolling shooter, a puzzle-solver, and so on. On all of them, though, it took just a few minutes before the novelty of holding the screen in my hands for interacting with the game got old, and the interest went away with it.

This last night, though, I’ve decided to try another game from the top list, named Cut the Rope, and this time I was immediately hooked into it. That was certainly one of the most enjoyable gaming experiences I had in quite a while, and when going to bed I started to ponder about what was different there.

The game is obviously well executed, with cute drawings and sounds, and also smooth, but I think there was something else as well. In retrospect, the other games felt a lot like ports of a desktop/laptop experience. The side scrolling game, for instance, was quite well suited for a joystick, and at least one other game had an actual joystick emulated on the screen, which is an enabler, but far from nice to be honest.

This one game, though, felt very well suited for a hands-based interaction: quickly drawing lines for cutting ropes, tapping on balloons to push air out, moving levers around, etc. In some more advanced levels, it was clear that my dexterity (or lack thereof) was playing a much more important role in accomplishing the tasks than the traditional button/joystick version of it. This felt like an entirely novel gaming experience that just hadn’t happened yet.

It’s funny and ironic that I had this experience within a week from Microsoft reportedly saying (again!) that a tablet is just another PC. It’s not, and if they tried it out with some minimum attention they’d see why it’s so clearly not.

In that experience, the joystick felt familiar but at the same quite awkward to use, but using my hands naturally in an environment where that was suitable felt very pleasing. We can generalize that a bit and note a common way to relate to innovation: we first try to reuse the knowledge we have when facing a new concept, but when we understand the concept better quite often we’re able to come up with more effective and interesting ways to relate to it.

In the tablet vs. laptop/desktop thread, you probably won’t want to be typing long documents in a tablet, but would most likely prefer to shuffle items in an agenda with your fingers. Also, you likely wouldn’t want to do that detailed CAD work with a fat finger in a screen, but would certainly be happy to review code or a document sitting in your backyard with the birds (no whales).

So, let’s please put that hammer away for a second while creating a most enjoyable touch-based experience.

Read more