Canonical Voices

Posts tagged with 'tech'

Today I would like to talk about the couchapp tool. It is something you can use when working on couchapps, and it provides a way to iterate quickly on your design.

However, rather confusingly, the couchapp tool isn't actually required for couchapps. If you get a design document with HTML attachments into your database then you have a couchapp.

Why would you want to use such a tool then? Firstly, because it will generate a skeleton couchapp for you, so that you don't have to remember how to organise it; if it is your first couchapp it's good to start from something working.

More importantly though, the couchapp tool is useful as you develop because it allows you to edit the parts of your app in their native format. This means that if you are writing an HTML snippet you can just put some HTML in a file, rather than writing it as part of a deeply nested JSON structure and dealing with the errors you would make doing that. It also means that you can use things like syntax highlighting in your preferred text editor without any special magic.

How it works

At its core couchapp is a rather simple tool: it walks a filesystem tree and assembles the things that it finds there into a JSON document.

For instance, if it finds a directory at the root of the tree called _attachments, it puts the content of each file there into a mapping stored under the document's "_attachments" key, which is one of the ways that couchdb accepts document attachments. Therefore if you have an _attachments/index.html file in your tree it will be attached to your design document when the JSON structure is sent to couchdb.

This continues across the tree, so the contents of the "views" directory will become the "views" key of the document, which is how you do map/reduce queries on the database.

couchapp has various conventions for dealing with files. For instance, if it finds a ".js" file it treats it as a javascript snippet which will be encoded into a string in the resulting document. ".html" files outside of the "_attachments" directory will also be encoded as strings. If it finds a ".json" file then it treats it as literal JSON to be embedded.
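
Putting those rules together: a minimal tree like this (the app and view names here are just for illustration)

myapp/
    _attachments/
        index.html
    views/
        recent/
            map.js

would be assembled into roughly the following document, with the contents of map.js encoded into the "map" string and the attachment data abbreviated here:

{
    "_id": "_design/myapp",
    "_attachments": {
        "index.html": {"content_type": "text/html", "data": "..."}
    },
    "views": {
        "recent": {"map": "function(doc) { emit(doc._id, null); }"}
    }
}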

This way it builds up the JSON structure that a couchapp expects, and will send it to the couchdb of your choice when it is done.

In addition to this the tool can generate a skeleton app for you, and add new pieces to your app, such as new views.

Getting It

couchapp is a Python tool, so you can install it using pip or similar. However, Ubuntu users can install it from a PPA (yay for daily builds with recipes!).

Using It

To use it, run

couchapp generate myapp

which will create a new skeleton app for you in myapp.

cd myapp
ls

You will see, for instance, the _attachments and views directories, and an _attachments/index.html file.

To get your app into couchdb you can run

couchapp push http://localhost:5984/mydb

and it will tell you the URL to visit to see your new app.
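
With the names used in this example that URL points at the index.html attachment on the design document, so it should look something like:

http://localhost:5984/mydb/_design/myapp/index.html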

If you want to use desktopcouch you can run

couchapp push desktopcouch://

though I think it has a bug whereby it prints the wrong URLs when pushing to desktopcouch.

Once you have looked at the HTML generated by your app you should look at the design document that couchapp created. Go to

http://localhost:5984/_utils

or

xdg-open ~/.local/share/desktop-couch/couchdb.html

if you are using desktopcouch.

Click through to the mydb database and you will see a document called _design/myapp. Click on this and you will see the content of the design document; you are looking at a couchapp in its raw form.

If you compare what is in that design document with what is in the myapp directory that the tool created, you should start to see how the tool generates the document from the filesystem.

Now try making a change on the filesystem, for instance editing _attachments/index.html to put your name somewhere in the body. Then push again, running

couchapp push http://localhost:5984/mydb

and refresh the page in your browser; you should see the change. (Just click on index.html in the design document to get back to viewing your app from there.)

I will go into more detail about the content of the couchapp that was generated for you in another post.

See the previous installment.

Read more

Couchapps are a particular way of using couchdb that allow you to serve web applications directly from the database. These applications generate HTML and javascript to present data from couchdb to the user, and then update the database and the UI based on their actions.

Of course there are plenty of frameworks out there that do this sort of thing, and more and more of them are adding couchdb support. Two things make couchapps particularly interesting. Firstly, the ease with which they can be developed and deployed. As they are served directly from couchdb they require little infrastructure, and the couchapp tool allows for rapid iteration. In addition, the conveniences that are provided mean that simple things can be done very quickly with little code.

The other thing that makes couchapps attractive is that they live inside the database. This means that the code lives alongside the data, and will travel with it as it is replicated. As a result you can easily have an app that you have fast, local access to on your desktop, while at the same time replicating to a server so that you can access the same data from your phone while you are out. Again, this doesn't require couchapps, and they won't be suitable for all needs, but they are certainly an interesting idea.

You can read more about couchapps at http://couchapp.org.

Intrigued by couchapps, I set out to play with them over a weekend. Unfortunately the documentation is rather lacking at the moment, so I wouldn't recommend experimenting yourself unless you are happy digging around for answers, and sometimes not finding them outside the code. To go a little way towards rectifying this, I intend to write a few posts about the things I wish I had known when I started out. Everything was a little strange at first; it wasn't even clear, for instance, where the entry point of a couchapp was. Hopefully these posts will be found via Google by others who are struggling in a similar way.

Architecture

Firstly, something about the pieces that make up a couchapp (or at least those that the tool and documentation recommend), and the way they all fit together.

At the core is the couchdb database itself. It is a collection of "documents", each of which can have attachments. Some of these documents are known as "design documents", and their IDs start with the prefix "_design/". Design documents can have "view" functions, and various other special fields that can be used to query or manipulate other documents.

A couchapp is a design document with an attachment, usually called index.html. These attachments are served directly by couchdb and can be accessed at a known URL. You can put anything you like in that html file, and you could just have a static page if you wanted. Usually, though, it is an HTML page that uses javascript to display the results of queries on the database. The user will then access the attachment on the design document, and will interact with the resulting page.

In theory you can do anything you like in that page, but it is usual to make use of standard tools in order to query the database and provide information and opportunity for interaction to the user.

The first standard tool is jQuery, with a couple of plugins for working with couchdb and couchapps specifically. These allow for querying views in the database and acting on the results, retrieving and updating documents, and plenty more.
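
Under the hood these plugins are wrappers around couchdb's HTTP API, so it may help to see what a view query looks like in the raw. Here is a minimal sketch in Python (the database, app, and view names are made up for illustration):

import json
import urllib2

# Query the "recent" view on the _design/myapp design document;
# the jQuery plugins issue HTTP requests of exactly this shape.
url = "http://localhost:5984/mydb/_design/myapp/_view/recent?limit=10"
result = json.load(urllib2.urlopen(url))
for row in result["rows"]:
    print row["key"], row["value"]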

In addition, the couchapp tool sets you up with another jQuery plugin called "evently", which is a way to structure interactions with jQuery and change the page based on various events. I will go into more detail about how evently works in a later post.

In addition to all the client-side tools for interacting with the database, it is also possible to make use of couchdb features such as shows, lists, update handlers, and validation functions in order to move some of the processing server-side. This is useful for various reasons, including being more accessible, allowing search engines to index the content, and not having to trust the client not to take malicious actions.

The two approaches can be combined, and you can prototype with the client-side tools, and then move some of the work to the server-side facilities later.

Stay tuned for more on how a simple couchapp generates content based on what is in the db.

Read more

The examples for Django testing point you towards hardcoding a username and password for a user to impersonate in tests, and the API of the test client encourages this too.

However, Django has a nice pluggable authentication system that means you can easily use something such as OpenID instead of passwords.

Putting passwords in your tests ties you to having password support enabled, and while you could enable it for just the tests, it's completely outside the scope of most tests (I'm not talking about tests of the actual login process here).

When I saw this while reviewing code recently I worked with Zygmunt to write a Client subclass that doesn't have this restriction. With this subclass you can just choose a User object and have the client log in as that user, without the user needing a password at all. Doing this decouples your tests from the implementation of the authentication system, and makes them target the code you want to test more precisely.

Here's the code:

from django.conf import settings
from django.contrib.auth import login
from django.http import HttpRequest
from django.test.client import Client
# Needed for import_module below; on newer Django/Python use
# "from importlib import import_module" instead.
from django.utils.importlib import import_module


class TestClient(Client):

    def login_user(self, user):
        """
        Login as specified user, does not depend on auth backend (hopefully)

        This is based on Client.login() with a small hack that does not
        require the call to authenticate()
        """
        if 'django.contrib.sessions' not in settings.INSTALLED_APPS:
            raise AssertionError(
                "Unable to login without django.contrib.sessions "
                "in INSTALLED_APPS")
        user.backend = "%s.%s" % ("django.contrib.auth.backends",
                                  "ModelBackend")
        engine = import_module(settings.SESSION_ENGINE)

        # Create a fake request to store login details.
        request = HttpRequest()
        if self.session:
            request.session = self.session
        else:
            request.session = engine.SessionStore()
        login(request, user)

        # Set the cookie to represent the session.
        session_cookie = settings.SESSION_COOKIE_NAME
        self.cookies[session_cookie] = request.session.session_key
        cookie_data = {
            'max-age': None,
            'path': '/',
            'domain': settings.SESSION_COOKIE_DOMAIN,
            'secure': settings.SESSION_COOKIE_SECURE or None,
            'expires': None,
        }
        self.cookies[session_cookie].update(cookie_data)

        # Save the session values.
        request.session.save()

Then you can use it in your tests like this:

from django.contrib.auth.models import User


client = TestClient()
user = User(username="eve")
user.save()
client.login_user(user)

Any requests you make with that client will then be authenticated as the user that was created.
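
From there it behaves like the normal test client, for example (the URL here is hypothetical):

response = client.get("/dashboard/")
assert response.status_code == 200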

I have submitted a ticket to Django to have this available for everyone in future.

Read more

Normally when you write some code using launchpadlib you end up with Launchpad showing your users something like this:

[Image: lplib-before.png, the authorisation page with the full list of access levels]

This isn't great: how is the user supposed to know which option to click? And what do you do if they don't choose the option you want?

Instead it's possible to limit the choices that the user has to make to only those that your application can use, plus the option to deny all access, by changing the way you create your Launchpad object.

from launchpadlib.launchpad import Launchpad

lp = Launchpad.get_token_and_login("testing", allow_access_levels=["WRITE_PUBLIC"])

This will present your users with something like this:

[Image: lplib-after.png, the authorisation page limited to the requested access level plus the option to deny access]

which is easier to understand. There could be further improvements, but they would happen on the Launchpad side.

This approach works for both Launchpad.get_token_and_login and Launchpad.login_with.
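
For instance, with Launchpad.login_with it would look something like this (a sketch: the other arguments login_with accepts, such as the service root, are left at their defaults here):

lp = Launchpad.login_with("testing", allow_access_levels=["WRITE_PUBLIC"])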

The values that you can pass here aren't documented, and should probably be constants in launchpadlib, rather than hardcoded in every application, but for now you can use:

  • READ_PUBLIC
  • READ_PRIVATE
  • WRITE_PUBLIC
  • WRITE_PRIVATE

Read more

Dear Mr Neary, thanks for your thought-provoking post. I think it highlights a problem we need to be aware of as Free Software matures.

Firstly, though, I would like to say that the apparent ageism in your argument isn't helpful to your point. Your comments appear to diminish the contributions of a whole generation of people. In addition, we shouldn't just be concerned with attracting young people to contribute; the same changes will likely have reduced the chances that people of all ages get involved.

Aside from that, though, there is much to discuss. You talk about the changes in Free Software since you got involved, and your account mirrors my observations. While these changes may have meant fewer people are forced to learn all the details of how the system works, they have certainly allowed more people to use the software, bringing many different skills to the party with them.

I would contend that the experience today for those looking to compile software, which you rate as important, often parallels the experience of just using the software a few years ago. If we can change that experience as much as we have changed the installation and first-use experience then we will empower more people to take part in those activities.

It is instructive then to look at how the changes came about to see if there are any pointers for us. I think there are two causes of the change that are of interest to this discussion.

Firstly, one change has been an increased focus on user experience. Designing and building software that serves the users' needs has made it much more palatable for people, and reduced the investment that they have to make before using it. In the same way I think we should focus on developer experience, making it more pleasant to perform the tasks needed to be a hobbyist. Yes, this means hiding some of the complexity to start with, but that doesn't mean it can't be delved into later. Progressive exposure will help people to learn by not requiring them to master the art before being able to do anything.

Secondly, there has been a push to make informed decisions on behalf of the user when providing them with the initial experience. You no longer get a base system after installation, upon which you are expected to select from the thousands of packages to build your perfect environment. Neither are you led to download multiple CDs that contain the entire contents of a distribution, much of which is installed by default. Instead you are given an environment that is already equipped to do common tasks, where each task is covered by an application that has been selected by experts on your behalf.

We should do something similar with developer tools, making opinionated decisions for the new developer, and allowing them to change things as they learn, similar to the way in which you are still free to choose from the thousands of packages in the distribution repositories. Doing this makes documentation easier to write, allows for knowledge sharing, and reduces the chances of paralysis of choice.

There are obviously difficulties with this, given that the choice of tool one person makes on a project often dictates or heavily influences the choices other people have to make. If you choose autotools for your project then I can't build it with CMake. Our development tools are important to us as they shape the environment in which we work, so there are strong opinions, but perhaps consistency could become more of a priority. There are also things we can do with libraries, format specifications and wrappers to allow choice while still providing a good experience for the fledgling developer.

Obviously as we are talking about free software the code will always be available, but that isn't enough in my mind. It needs to be easier to go from code to something you can install and remove, allowing you to dig deeper once you have achieved that.

I believe that our effort around things like https://dev.launchpad.net/BuildBranchToArchive will go some way to helping with this.

Read more

If you don't want to read this article, then just steer clear of python-multiprocessing, threads and glib in the same application. Let me explain why.

There's a rather famous bug in Gwibber in Ubuntu Lucid, where a gwibber-service process will start taking 100% of the CPU time of one of your cores if it can. While looking into why this bug happened I learnt a lot about how multiprocessing and GLib work, and wanted to record some of it so that others may avoid the bear traps.

Python's multiprocessing module is a nice way to easily run some code in a subprocess, to get around the restriction of the GIL for example. It makes it really easy to run a particular function in a subprocess, which is a step up from what you had to do before it existed. However, when using it you should be aware of how the way it works can interact with the rest of your app, because there are some nasties lurking there.

GLib is a set of building blocks for apps, most notably used by GTK+. It provides an object system, a mainloop and lots more besides. What we are most interested in here are the mainloop, signals, and thread integration that it provides.

Let's start the explanation by looking at how multiprocessing does its thing. When you start a subprocess using multiprocessing.Process, or something that uses it, it causes a fork(2), which starts a new process with a copy of the program's current memory, with some exceptions. This is really nice for multiprocessing, as you can run any code from that program in the subprocess and pass the result back without too much difficulty.

The problems occur because there isn't an exec(3) to accompany the fork(2). This is what makes multiprocessing so easy to use, but doesn't insert a clean process boundary between the processes. Most notably for this example, it means the child inherits the file descriptors of the parent (critically even those marked FD_CLOEXEC).

The other piece of the puzzle is how the GLib mainloop communicates between threads. It requires some mechanism by which one thread can alert another that something of interest happened. When you tell GLib that you will be using threads in your app, by calling g_thread_init (gobject.threads_init() in Python), it creates a pipe that glib uses to alert other threads. It also creates a watcher thread that polls one end of this pipe so that it can act when a thread wishes to pass something on to the mainloop.

The final part of the puzzle is what your app does in a subprocess with multiprocessing. If you purely do something such as number crunching then you won't have any issues. If however you use some glib functions that cause the child to communicate with the mainloop then you will see problems.

As the child inherits the file descriptors of the parent it will use the same pipe for communication. Therefore if a function in the child writes to this pipe it can put the parent into a confused state. What happens in gwibber is that it uses some gnome-keyring functions, and that puts the parent into a state where the watcher thread created by g_thread_init busy-polls on the pipe, taking up as much CPU time as it can get from one core.

In summary, you will see issues if you use python-multiprocessing from a thread and use some glib functions in the children.

There are some ways to fix this, but no silver bullet:

  • Don't use threads, just use multiprocessing. However, you can't communicate with glib signals between subprocesses, and there's no equivalent built into multiprocessing.
  • Don't use glib functions from the children.
  • Don't use multiprocessing to run the children; instead exec(3) a script that does what you want, though this isn't as flexible or as convenient.

It may be possible to use the support for different GMainContexts for different threads to work around this, but:

  • You can't access this from Python, and
  • I'm not sure that every library you use will correctly implement it, and so you may still get issues.

Note that none of the parties here are doing anything particularly wrong; it's a bad interaction caused by decisions that are known to cause issues with concurrency. I also think there are issues when using DBus from multiprocessing children, but I haven't thoroughly investigated that. I'm not entirely sure why the multiprocessing child seems to have to be run from a non-main thread in the parent to trigger this; any insight would be welcome. You can find a small script to reproduce the problem here.
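
The shape of such a script is roughly this; treat it as a sketch of the arrangement described above (pygtk-era gobject bindings assumed) rather than a guaranteed reproducer, as whether it misbehaves will depend on your library versions:

import multiprocessing
import threading

import gobject

# Tell GLib we will be using threads: this creates the wakeup pipe
# and the watcher thread discussed above.
gobject.threads_init()


def child():
    # Any glib call that pokes the main context from the child
    # writes to the pipe it shares with the parent.
    gobject.idle_add(lambda: None)


def spawn_child():
    # Fork the child from a non-main thread, as happens in gwibber.
    process = multiprocessing.Process(target=child)
    process.start()
    process.join()


threading.Thread(target=spawn_child).start()

# Run the mainloop and watch the CPU usage; interrupt with Ctrl-C.
gobject.MainLoop().run()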

Or, to put it another way: global state is bad for concurrency.

Read more

As my contribution to Ada Lovelace Day 2010 I would like to mention Emma Jane Hogbin.

Emma is an Ubuntu Member, published author, documentation evangelist, conference organiser, Drupal theming expert, tireless conference presenter, and many more things besides.

Her enthusiasm is infectious, and her passion for solving problems for people is admirable. She is a constant source of inspiration to me, and that continues even as she branches out into new things.

(Hat tip for the title to the ever excellent Sharrow)

Read more

Commit access is no more

Many projects that I work on or follow the development of (granted, there may be a large selection bias here) are showing some of the same tendencies. Combined, these indicate to me that we need to change the way we look at becoming a trusted member of a project.

The obvious change here is the move to distributed version control. I'm a great fan of this change, for many reasons. One of those is the democratisation of the tools. There is no longer a special set of people that gets to use the best tools, with everyone else having to make do. Now you get to use the same tools whether you are the founder of the project or someone working on your first change. That's extremely beneficial as it means that we don't partition our efforts to improve the tools we use. It also means that new contributors have an easier time getting started, as they get to use better tools. These two influences combine as well: a long-time contributor can describe how they achieve something, and the new contributor can directly apply it, as they use the same tools.

This change does mean that getting "commit access" isn't about getting the ability to commit anymore; everyone can commit anytime to their own branch. Some projects, e.g. Bazaar, don't even hand out "commit access" in the literal sense: the project's blessed code is handled by a robot, and you just get the ability to have the robot merge a branch.

Getting "commit access" was never really about the tools; it was and is about being trusted to shepherd the shared code. A lot of projects still don't treat it that way, though: once a developer gets "commit access" they just start committing every half-cooked patch they have to trunk. The full use of distributed version control, with many branches, just emphasises the shared code aspect. Anyone is free to create a branch with their half-baked idea and see if anyone else likes it. The "blessed" branch is just that: one that the project as a whole decides they will collaborate on.

This leads to my second change, code review. This is something that I also deeply believe in; it is vital to quality, and a point at which open source software becomes a massive advantage, so something we should exploit to the full. I see it used increasingly in many projects, and many moving up jml's code review "ladder" towards pre-merge review of every change. There seems to be increasing acceptance that code review is valuable, or at least that it is something a good project does.

Depending on the project the relationship of code review and "commit access" can vary, but at the least, someone with "commit access" can make their code review count. Some projects will not allow even those with "commit access" to act unilaterally, requiring multiple reviews, and some may even relegate the concept, working off votes from whoever is interested in the change.

At the very least, most projects will have code review when a new contributor wishes to make a change. This typically means that when you are granted "commit access" you are able or expected to review other people's code, even though you may never have done so before. Some projects also require every contribution to be reviewed, meaning that "commit access" doesn't grant you the ability to do as you wish, it instead just puts the onus on you to review the code of others as well as write your own.

As code review becomes more prevalent we need to re-examine what we see as "commit access," and how people show that they are ready for it. It may be that the concept becomes "trusted reviewer" or similar, but at the least code review will be a large part of it. Therefore I feel that we shouldn't just be looking at a person's code contributions, but also their code review contributions. Code review is a skill: some people are very good at it, and some people are very, very bad at it. You can improve with practice and teaching, and you can set bad examples for others if you are not careful. We will have to make sure that review runs through the blood of a project: everyone reviews the code of everyone else, and the reviews are reviewed.

The final change that I see as related is that of empowering non-code contributors. More and more projects are valuing these contributors, and one important part of doing that is trusting them with responsibility. It may be that sometimes trusting them means giving them "commit access", if they are working on improving the inline help for instance. Yes, it may be that distributed version control and code review mean that they do not have to do this, but those arguments could be made for code contributors too.

This leads me to another, and perhaps the most important, aspect of the "commit access" idea: trust. The fundamental, though sometimes unspoken, measure we use to gauge if someone should get "commit access" is whether we believe them to be trustworthy. Do we trust them to introduce code without review? Do we trust them to review other people's changes? Do we trust them to change only those areas they are experts in, or to speak to someone else if they are not? This is the rule we should be applying when making this decision, and we should be sure to be aware that this is what we are doing. There will often be other considerations as well, but this decision will always factor.

These ideas are not new, and the influences described here did not create them. However the confluence of them, and the changes that will likely happen in our projects over the next few years, mean that we must be sure to confront them. We must discard the "commit access" idea as many projects have seen it, and come up with new responsibilities that better reflect the tasks people are doing, the new ways projects operate, and that reward the interactions that make our projects stronger.

Read more