Canonical Voices

Posts tagged with 'dvcs'

David Murphy (schwuk)

I was browsing Twitter last night when Thoughtbot linked to their post about commit messages.

This was quite timely as my team has been thinking about improving the process of creating our release notes, and it has been proposed that we generate them automatically from our commit messages. This in turn requires that we have commit messages of sufficient quality, which – to be honest – we don’t always have. So the second proposal is to enforce “good” commit messages as part of reviewing and approving merge proposals into our projects. See this post from Kevin on my team for an overview of our branching strategies to get an idea of how our projects are structured.

We still need to define what constitutes a “good” message, but we will certainly use both the article from Thoughtbot and the oft-referenced advice from Tim Pope as our basis. We are also only planning to apply this to commits to trunk because, well, you don’t need a novel – or even a short story – for every commit in your spike branch!
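As a rough illustration of what an automated check during review could look like, here is a minimal sketch in Python. It assumes the conventions usually associated with Tim Pope's advice (a summary line of about 50 characters, a blank second line, and body lines wrapped at 72 characters). The function name and the exact limits are assumptions for illustration only; we have not agreed on any rules yet.

```python
def check_commit_message(message, summary_limit=50, body_width=72):
    """Return a list of problems found in a commit message (empty if none).

    The limits are assumed defaults, not rules our team has adopted.
    """
    problems = []
    lines = message.rstrip("\n").split("\n")
    summary = lines[0]

    # The first line is the summary that a release-notes generator would pick up.
    if not summary.strip():
        problems.append("summary line is empty")
    elif len(summary) > summary_limit:
        problems.append(
            "summary line is longer than %d characters" % summary_limit
        )

    # A blank line separates the summary from the body.
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line should be blank")

    # Body lines are numbered from 3 because they follow the blank line.
    for number, line in enumerate(lines[2:], start=3):
        if len(line) > body_width:
            problems.append(
                "line %d is longer than %d characters" % (number, body_width)
            )
    return problems


if __name__ == "__main__":
    print(check_commit_message("Short summary\nno blank line"))
```

A check like this only catches shape, not substance, so it would complement the human review of a merge proposal rather than replace it.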

Now, back to the Thoughtbot article, and this piece of advice stood out for me:

Never use the -m <msg> / --message=<msg> flag to git commit.

Since I first discovered -m I have used it almost exclusively, thinking I’m being so clever and efficient, but in reality I’ve been restricting what I could say to what felt “right” on an 80 character terminal. If nothing else, I will be trying to avoid the use of -m from now on.

Tim Penhey (thumper)

You're doing it wrong!

Just yesterday I found a missing feature in one of the apps I just started using. My thought processes were something along the lines of “hey, I could add this feature and it would be good”. So I went to the project's website, found their source code repository, and got blown away by the comment that was with it:

Please note that code you get from this repository is not intended for productive use (unless it's tagged as a released version, of course, in which case the usual alpha/beta disclaimers apply ;-)). We like to break our codebase, config files, database schemas and all kinds of stuff. We sometimes commit non-compiling revisions to facilitate collaborative development. Running such an unstable version might trash your settings, your backlog and maybe your computer. You have been warned!


Eh? OK, I get the first sentence. It is even a good disclaimer. Tagged releases are more stable. People regularly commit code that is unpolished. Sometimes even with some known bugs or issues.

The second sentence has me going “NO!?! What are you doing?”

The third sentence just blew my mind. This project is using a DVCS. Not my DVCS of choice, but really that doesn't matter. All DVCSs are made to have good merging and sharing of code between developers. Saying “We sometimes commit non-compiling revisions to facilitate collaborative development” is just a lack of understanding of how to use the tools. You are using a DVCS to facilitate collaborative development! This is centralised version control thinking.

Try this for a code to work by:
Trunk should always at least compile, run, and pass all the tests.


This hasn't stopped me wanting to work on the code, but it has raised my caution levels.

Tim Penhey (thumper)

Bazaar has the model right

Some people in the GNOME community have suggested that if Bazaar has nice usability, then GNOME can just use Git on the back-end, and Bazaar lovers can just use the Git back-end via Bazaar. It's true that Bazaar could support this — an experimental plug-in exists to do this right now. But this suggestion betrays several wrong assumptions.

People assume Git and Bazaar are the same. They're not. People assume that if Git and Bazaar have technical differences, then Git must have it right.

The problem with these assumptions is that usability begins at the ground level. Bazaar started with a focus on usability. Git began with a focus on speed. The data models of both Bazaar and Git reflect their initial focus. But Bazaar's model can also be fast. In fact, the Bazaar developers are currently optimising a number of key operations for speed.

Data retrieval

Git and Bazaar are both key/value mapping systems. Every stored sequence of bytes has a key, and when the bytes are needed, they are requested with that key.

The big difference is that Git's keys are also the hashes of the bytes. This is why it's called a content-addressable file system. This allows git to offer a guarantee that if the value hashes to the key, it has not been modified, whether deliberately or by accident. The Bazaar team considered adopting this approach, but decided it was too constricting. Bazaar uses UUIDs instead.
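The difference between the two keying schemes can be sketched in a few lines of Python. This is a toy illustration, not how either tool is implemented: the class names are hypothetical, and real Git hashes an object header together with the content rather than the raw bytes alone.

```python
import hashlib
import uuid


class ContentAddressedStore:
    """Toy store whose keys are the SHA-1 of the stored bytes."""

    def __init__(self):
        self._data = {}

    def put(self, value):
        key = hashlib.sha1(value).hexdigest()
        self._data[key] = value
        return key

    def get(self, key):
        value = self._data[key]
        # The key doubles as an integrity check on the value.
        if hashlib.sha1(value).hexdigest() != key:
            raise ValueError("stored bytes do not match their key")
        return value


class IdAddressedStore:
    """Toy store whose keys are opaque ids chosen independently of the bytes."""

    def __init__(self):
        self._data = {}

    def put(self, value):
        key = uuid.uuid4().hex
        # The hash is recorded next to the value instead of serving as the key.
        self._data[key] = (value, hashlib.sha1(value).hexdigest())
        return key

    def get(self, key):
        value, digest = self._data[key]
        if hashlib.sha1(value).hexdigest() != digest:
            raise ValueError("stored bytes do not match their recorded hash")
        return value


if __name__ == "__main__":
    cas = ContentAddressedStore()
    ids = IdAddressedStore()
    # Identical bytes always get the same key in the first store,
    # and a fresh key on every put in the second.
    print(cas.put(b"hello") == cas.put(b"hello"))
    print(ids.put(b"hello") == ids.put(b"hello"))
```

In the first store the key is fully determined by the content, so the same bytes can never live under two keys. In the second, the key says nothing about the content, which is the freedom the Bazaar team wanted to keep.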

Authenticating revisions

For detecting malicious modification of revisions, Git uses its cryptographic hashes.

Bazaar uses revision-signing. All revisions can be PGP-signed. No signed revision can be forged. And the hashed representation can easily be generated and passed around to ensure that exactly the same content is used.

If SHA-1 is broken, both Bazaar and Git will lose their ability to detect malicious modification. But since Bazaar uses UUIDs to identify revisions, users can re-sign their old revisions with whatever method proves to be secure. Changing the hash used by Git would make it incompatible with all existing repositories.

Data Integrity and Serialization formats

Bazaar stores hashes of every value, so it is equally capable of detecting accidental modification. It can be useful to have different representations of a tree in different repositories. For example, when Git lists files, it divides this data by directory. This is a good approach, but not necessarily the best approach. An alternative approach would be to use a radix tree. This would ensure that Git performed quickly even if users put unreasonable numbers of files in a single directory. But because Git's keys are hashes, upgrading Git's format to use radix trees would change the keys, which means that people could not use the commit-id from one repository to refer to the same tree stored in another form.

Bazaar doesn't assume it has the perfect format. It provides an upgrade path, and doesn't change the commit-id of a revision if you change your format. What's more, Bazaar can even reference data it has never seen. This allows partial imports from other VCSes to be fully compatible with more complete imports. And if a VCS provides UUIDs (content hashes certainly qualify as UUIDs), Bazaar can refer to those UUIDs directly.
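To make the format argument concrete, the following Python sketch serializes the same tree in two different ways. Both serializations are invented for illustration (neither is a real Git or Bazaar format), and the revision id is a hypothetical placeholder. The point is only that a key derived from the bytes changes with the format, while an independently assigned id does not.

```python
import hashlib
import json

# The same logical tree: path -> blob identifier (made-up values).
tree = {"src/foo.c": "blob-1", "src/foo.h": "blob-2", "README": "blob-3"}


def serialize_flat(tree):
    """Hypothetical format A: one sorted list of (path, blob) pairs."""
    return json.dumps(sorted(tree.items())).encode("utf-8")


def serialize_by_directory(tree):
    """Hypothetical format B: entries grouped by containing directory."""
    grouped = {}
    for path, blob in tree.items():
        directory, _, name = path.rpartition("/")
        grouped.setdefault(directory, {})[name] = blob
    return json.dumps(grouped, sort_keys=True).encode("utf-8")


def content_key(data):
    """Key derived from the serialized bytes, as in a content-addressed store."""
    return hashlib.sha1(data).hexdigest()


# Content-derived keys: one tree, two formats, two different keys.
key_a = content_key(serialize_flat(tree))
key_b = content_key(serialize_by_directory(tree))

# Independently assigned id: both repositories file the tree under the same id.
revision_id = "rev-example-0001"  # hypothetical opaque identifier
repo_a = {revision_id: serialize_flat(tree)}
repo_b = {revision_id: serialize_by_directory(tree)}

if __name__ == "__main__":
    print(key_a == key_b)
    print(list(repo_a) == list(repo_b))
```

Someone holding `revision_id` can ask either repository for the tree, whichever format it uses, whereas `key_a` is meaningless to a repository that stores the tree in format B.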

File and directory representation

Git refers to files by path. It makes no attempt to track renames in its data store.

Bazaar has an inode abstraction; files and directories both have ids. When a file is renamed, its id stays the same. Bazaar's core code refers to files by their id, so merging a renamed file requires no special effort.

Git's approach means that users are warned not to rename files while changing their content. But when files are renamed, those files that refer to the renamed files must have their contents changed as well. For example, if you rename foo.h and foo.c to bar.h and bar.c, you should update the contents of bar.c, or else you will break the build. With Bazaar, users can do whatever they want, and the VCS just works. While Git must always use heuristics to deduce renames, Bazaar does not have to. Of course, it can if it wants to. This is an example of why it is important to design a model for usability from the beginning.
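The foo/bar example can be sketched with a toy inventory keyed by file id. This is a simplified illustration in Python, not Bazaar's actual merge code: the `merge` function and the ids are hypothetical, and the sketch assumes no files are added or deleted and that the two sides never change the same thing.

```python
def merge(base, ours, theirs):
    """Three-way merge of inventories mapping file_id -> (path, content).

    Simplifying assumptions: every file id exists on all three sides, and
    the two sides never change the same path or the same content.
    """
    merged = {}
    for file_id, (base_path, base_content) in base.items():
        our_path, our_content = ours[file_id]
        their_path, their_content = theirs[file_id]
        # Path and content are merged independently, so a rename on one
        # side and an edit on the other combine without any guessing.
        path = our_path if our_path != base_path else their_path
        content = our_content if our_content != base_content else their_content
        merged[file_id] = (path, content)
    return merged


base = {
    "id-1": ("foo.c", '#include "foo.h"\n'),
    "id-2": ("foo.h", "int foo(void);\n"),
}
# Our side renames both files and updates the include to match.
ours = {
    "id-1": ("bar.c", '#include "bar.h"\n'),
    "id-2": ("bar.h", "int foo(void);\n"),
}
# Their side keeps the old names but adds a declaration to the header.
theirs = {
    "id-1": ("foo.c", '#include "foo.h"\n'),
    "id-2": ("foo.h", "int foo(void);\nint baz(void);\n"),
}

merged = merge(base, ours, theirs)

if __name__ == "__main__":
    for file_id, (path, content) in sorted(merged.items()):
        print(file_id, path, repr(content))
```

Because the header keeps its id across the rename, the other side's new declaration lands in bar.h without any similarity heuristic having to work out that bar.h used to be foo.h.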

Bazaar can import rename data losslessly from foreign VCSes. Some other VCSes support file-ids, and Bazaar can reuse those without change. For VCSes that support renames, but not file-ids, Bazaar's representation is also non-lossy. When data imports are deterministic and non-lossy, it's easy to export them back to their source VCS. Bazaar's Subversion integration is a great example of how this can work.

Choose the back-end with the right model

In any situation it makes sense to use the back-end that stores the richer dataset. A front-end client that does not use all of the back-end's functionality or data representation loses nothing, whereas a richer client paired with a back-end that cannot represent the client's information has nowhere to store it.

If a single back-end storage is going to be used, it makes more sense to use a Bazaar back-end as Bazaar is able to represent everything that Git does, but the reverse is not true.

Conclusion

The Bazaar developers focused on usability, which requires having a model that supports usability. Bazaar has improved its model to increase the usability of the system. We believe that Bazaar has the right model.

co-written by Aaron Bentley and Tim Penhey

Tim Penhey (thumper)

J5 mentioned in his post his interpretation of the number of users for GIT, Bazaar and Hg (Mercurial). He also finishes with "Converse amongst yourselves".

I guess I should first point out that I am a Bazaar user, and that I work for Canonical. I felt somewhat enraged at the post from J5, and have spent some time trying to work out some response.

John Carr mentioned that 83% of statistics are made up on the spot, and that cannot be more true here. I had been waiting for someone else to post the numbers that they saw at the BOF, but so far I have not seen one.

Here is my take on it.

Yes, there were more GIT users than Bazaar users at the BOF, but the numbers were more like 50% of the audience being GIT users and about 40% being Bazaar users. Someone piped up and said "What about Mercurial?", so the question was asked, and about five or six people responded. The GIT and Bazaar groups overlapped, and by far the larger majority of the audience had not used any DVCS.

What conclusions can we draw from this? Not much. Many people attending the pre-conference work for larger companies, like Red Hat, Novell, and Nokia, and many of those people work on some hard-core Linux projects, many of which have chosen GIT. Many have chosen GIT because that is what the Linux kernel is using. Is that a good reason to choose a DVCS? I don't feel that we can really answer that question as I am sure there are strong advocates for both sides.

An interesting question is "Which DVCS is easier for the casual contributor to use?" Surely one of the reasons that a project chooses a DVCS is to allow for more community contributions in an easy to merge way that has a clear contribution history. Bazaar just works. It works for the hard-core developers, but is also easy for those soft-core (?).

What I hear from the people I talk to, and I've tried to talk to many here, is that for those who use Bazaar it just works. Bazaar doesn't get in the way of developing the software that you are working on. It is just a tool that works.

One final point. The questions were "Who uses <insert DVCS>?", not "Who likes/loves using <insert DVCS>?".
