Canonical Voices

Posts tagged with 'bazaar'

beuno

A few days ago, the project ohloh announced they where starting to support bazaar imports.

It’s a pretty cool website, so I’m very happy they now support projects I actually work on :)

I kicked off an import of Loggerhead’s history, and in less than a day, it was up and working.

Kudos to the ohloh team!

Read more
Tim Penhey (thumper)

Bazaar has the model right

Some people in the GNOME community have suggested that if Bazaar has nice usability, then GNOME can just use Git on the back-end, and Bazaar lovers can just use the Git back-end via Bazaar. It's true that Bazaar could support this — an experimental plug-in exists to do this right now. But this suggestion betrays several wrong assumptions.

People assume Git and Bazaar are the same. They're not. People assume that if Git and Bazaar have technical differences, then Git must have it right.

The problem with these assumptions is that usability begins at the ground level. Bazaar started with a focus on usability. Git began with a focus on speed. The data models of both Bazaar and Git reflect their initial focus. But Bazaar's model can also be fast. In fact, the Bazaar developers are currently optimising a number of key operations for speed.

Data retrieval

Git and Bazaar are both key/value mapping systems. When bytes are needed, they are requested with that key.

The big difference is that Git's keys are also the hashes of the bytes. This is why it's called a content-addressable file system. This allows git to offer a guarantee that if the value hashes to the key, it has not been modified, whether deliberately or by accident. The Bazaar team considered adopting this approach, but decided it was too constricting. Bazaar uses UUIDs instead.

Authenticating revisions

For detecting malicious modification of revisions, Git uses its cryptographic hashes.

Bazaar uses revision-signing. All revisions can be PGP-signed. No signed revision can be forged. And the hashed representation can easily be generated and passed around to ensure that exactly the same content is used.

If SHA-1 is broken, both Bazaar and Git will lose their ability to detect malicious modification. But since Bazaar uses UUIDs to identify revisions, users can re-sign their old revisions with whatever method proves to be secure. Changing the hash used by Git would make it incompatible with all existing repositories.

Data Integrity and Serialization formats

Bazaar stores hashes of every value, so it equally capable of detecting accidental modification. It can be useful to have different representations of a tree in different repositories. For example, when Git lists files, it divides this data by directory. This is a good approach, but not necessarily the best approach. An alternative approach would be to use a radix tree. This would ensure that Git performed quickly even if users put unreasonable numbers of files in a single directory. But Git's keys are hashes, upgrading Git's format to use radix trees would change the keys, which means that people could not use the commit-id from one repository to refer to the same tree in an other form.

Bazaar doesn't assume it has the perfect format. It provides an upgrade path, and does't change the commit-id of a revision if you change your format. What's more, Bazaar can even reference data it has never seen. This allows partial imports from other VCSes to be fully compatible with more complete imports. And if a VCS provides UUIDs (content hashes certainly qualify as UUIDs), Bazaar can refer to those UUIDs directly.

File and directory representation

Git refers to files by path. It makes no attempt to track renames in its data store.

Bazaar has an inode abstraction; files and directories both have ids. When a file is renamed, its id stays the same. Bazaar's core code refers to files by their id, so merging a renamed file requires no special effort.

Git's approach means that users are warned not to rename files while changing their content. But when files are renamed, those files that refer to the renamed files must have their contents changed as well. For example, if you rename foo.h and foo.c to bar.h and bar.c, you should update the contents of bar.c, or else you will break the build. With Bazaar, users can do whatever they want, and the VCS just works. While Git must always use heuristics to deduce renames, Bazaar does not have to. Of course, it can if it wants to. This is an example of why it is important to design a model for usability from the beginning.

Bazaar can import rename data losslessly from foreign VCSes. Some other VCSes support file-ids, and Bazaar can reuse those without change. For VCSes that support renames, but not file-ids, Bazaar's representation is also non-lossy. When data imports are deterministic and non-lossy, it's easy to export them back to their source VCS. Bazaar's Subversion integration is a great example of how this can work.

Choose the back-end with the right model

In any situation it makes sense to use a back-end that stores the richer dataset. It makes more sense to have a front end client that doesn't use all the functionality or data representation of the back-end than it does to have a richer client that isn't able to store the required information as the back-end is not able to represent it.

If a single back-end storage is going to be used, it makes more sense to use a Bazaar back-end as Bazaar is able to represent everything that Git does, but the reverse is not true.

Conclusion

The Bazaar developers focused on usability, which requires having a model that supports usability. Bazaar has improved its model to increase the usability of the system. We believe that Bazaar has the right model.

co-written by Aaron Bentley and Tim Penhey

Read more
Tim Penhey (thumper)

J5 mentioned in his post his interpretation of the number of users for GIT, Bazaar and Hg (Mercurial). He also finishes with "Converse amongst yourselves".

I guess I should first point out that I am a Bazaar user, and that I work for Canonical. I felt somewhat enraged at the post from J5, and have spent some time trying to work out some response.

John Carr mentioned that 83% of statistics are made up on the spot, and that cannot be more true here. I had been waiting for someone else to post the numbers that they saw at the BOF, but so far I have not seen one.

Here is my take on it.

Yes there were more GIT users than Bazaar users at the BOF, but the numbers were more like 50% of the audience were GIT users, and about 40% were Bazaar users. Someone piped up and said "What about Mercurial?" and so the question was asked, and there were about five or six people. There was an overlap of the GIT and Bazaar groups, and there was by far the larger majority of the audience that had not used any DVCS.

What conclusions can we draw from this? Not much. Many people attending the pre-conference work for larger companies, like Red Hat, Novell, and Nokia, and many of those people work on some hard core linux stuff, many of which have chosen GIT. Many have chosen GIT because that is what the linux kernel is using. Is that a good reason to chose a DVCS? I don't feel that we can really answer that question as I am sure there are strong advocates for both sides.

An interesting question is "Which DVCS is easier for the casual contributor to use?" Surely one of the reasons that a project chooses a DVCS is to allow for more community contributions in an easy to merge way that has a clear contribution history. Bazaar just works. It works for the hard-core developers, but is also easy for those soft-core (?).

From the people I talk to, and I've tried to talk to many here, is that of those that use Bazaar it just works. Bazaar doesn't get in your way of developing the software that you are working on. It is just a tool that works.

One final point. The questions were "Who uses <insert DVCS>?", not "Who likes/loves using <insert DVCS>?".

Read more
Tim Penhey (thumper)

bzr alias

I now officially have some of my code in an open source project where the work was actually done entirely on my own time. Despite being involved with professional software development for almost twenty years, I've normally kept my programming to work hours, or private projects.

This time though, it was different. This time it was truly scratching an itch. I've become somewhat of a Bazaar convert. Bazaar has for a long time allowed you to define command aliases in your bazaar.conf configuration file. I have always used bash aliases for commands that I do really often, so it seemed natural to me to define aliases for bazaar for commands that I used often as well.

Next came the internal conflict. I am an inherently lazy person. This is why I like aliases, less typing. One of the things that bugged me was having to actually edit the configuration file any time I wanted to add or modify an alias. This bugged me. It bugged me so much it caused me to actually do something about it. Luckily for me Bazaar is written primarily in python, and this just happens to be my current favourite language.

The code is now committed to trunk, and should be available generally in bzr 1.6.


  • bzr alias — lists your current aliases

  • bzr alias commit="commit --strict" — sets the alias for commit

  • bzr alias commit — print out the current alias for commit

  • bzr alias --remove commit — removes the commit alias


Obviously if you use alaises as much as I do, one of the first things you'll do is set the alias for unalias.

bzr alias unalias="alias --remove"

My current aliases (that I'll tell you about):

tim@slacko:~/src/bzr/alias-command$ ./bzr alias
bzr alias cbranch="cbranch --lightweight --hardlink"
bzr alias col="checkout --lightweight"
bzr alias commit="commit --strict"
bzr alias lastdiff="diff -r-2..-1"
bzr alias lastlog="log -r-2..-1"
bzr alias ll="log --line -r-10..-1"
bzr alias my-missing="missing --mine-only"
bzr alias unalias="alias --remove"

Read more
Tim Penhey (thumper)

Code in Launchpad

Launchpad offers many things to developers, and open source software developers in particular. One of these things is the ability to host Bazaar branches. For those that have looked a little deeper, they will have noticed that there are four types of branches in Launchpad: Hosted; Mirrored; Remote; and Imported. Hmm, this isn't really what I was intending to talk about at all, but I'm going to go with the flow.

Hosted branches are those where Launchpad is the primary public location of the branch. Hosted branches are normally created by pushing a branch directly to Launchpad. Before you do that though, you need to have registered on Launchpad, and supplied an SSH key. This is how Launchpad knows who you are. There are two ways you can push a branch to Launchpad: one is via SFTP; and the other using the Bazaar smart server (bzr+ssh).

As an example I'm going to use my alias-command bzr branch. The complete SFTP location would be sftp://thumper@bazaar.launchpad.net/~thumper/bzr/alias-command, and the smart server one bzr+ssh://thumper@bazaar.launchpad.net/~thumper/bzr/alias-command. These are a bit unwieldy, so we extended the lp type urls for bzr to support writing if the launchpad plug-in knows who you are. In order for you to do this you use the lp-login command. bzr lp-login will tell you the username that is currently set. If you have not done this yet, you'll see a message like "No Launchpad user ID configured." I set mine by saying bzr lp-login thumper. This stores thumper as the launchpad_username in the bazaar.conf file. This also means I can use bzr push lp:~thumper/bzr/alias-command to push to my hosted Launchpad branch.

Mirrored branches allow you to have your branches stored publicly in some location that you control, and you let Launchpad know where this is. Launchpad will then update its copy of your branch every six hours. This is handy if you don't have an SSH key, or you have a slow network connection, or you just like having your branches available on your own server.

Remote branches are a bit different. Remote branches were sort of created out of necessity. Some people were registering mirrored branches with unreachable locations. Some of these were possibly by mistake, but quite a few were obviously inaccessible. But more strange is that those branches were linked to bugs or blueprints. There was obviously a desire to have branch meta-data there, but not actually allow Launchpad to get access to the branches. So we have remote branches. You cannot get a copy of a remote branch from Launchpad as Launchpad does not have a copy of it.

Imported branches are those branches where Launchpad get the code from either CVS or Subversion, and puts it into a Bazaar branch. I was really wanting to talk about this as I saw two projects recently where we are importing code that I didn't know about. One is my favourite music player, Amarok, and the other was MPlayer. Just out of curiosity I looked at both of these branches on Launchpad. The Amarok one has 12195 revisions as I'm writing this, and the last revision was 11 hours old, and MPlayer had even more revisions, at 26761. However that isn't even the cool bit. What is really nifty is you can go bzr branch lp:amarok or bzr branch lp:mplayer to get the code. Just to check I did just that, and got a copy of the amarok source. It was the first bit of C++ I had looked at in a long time (it used to be all I did).

Anyway, that was what I really wanted to say. Oh yeah, and bzr rocks.

Read more
Tim Penhey (thumper)

The Launchpad branch directory service

Recently Bazaar grew a branch directory service. This allows plug-in developers to define custom "protocols" that resolve the branch names into some other branch location.

The Launchpad plug-in defines a protocol "lp". Launchpad uses the other parts of the URL to relate to projects, series or individual's branches. The shortest valid URL for Launchpad is something like lp:do. You can also use an empty authority (or site part of a URL), so lp:///do is exactly the same, just longer. Personally I prefer the one without all the slashes.

The do part of the branch location relates to the GNOME Do project on Launchpad. There is a little magic (a.k.a. configuration) that is needed to make this work. Projects in Launchpad get an initial development focus series created for them. This is intended to relate to the branch of development that is where current or new work goes. In order to have the code available through the Launchpad directory service, the code has to be available through Launchpad as a normal branch.

Once a branch has been either hosted, mirrored or imported for the project, one of the people responsible for the project in Launchpad can relate the branch with the series. Once this is done the branch is easily accessible. People that have permission to make these links will be shown a link on the main code tab for their project (we don't taunt people who can't make the links with an invalid option).

If there is another series, say 1.0 for our project fooix, and we have branches associated with them, then we could get the 1.0 branch using lp:fooix/1.0. Normal Launchpad branches are also accessible using the lp protocol using lp:~username/project/branch-name.

Read more