Canonical Voices

Posts tagged with 'python3'

Barry Warsaw


Snappy Ubuntu Core is a new edition of the Ubuntu you know and love, with some interesting new features, including atomic, transactional updates, and a much more lightweight application deployment story than traditional Debian/Ubuntu packaging.  Much of this work grew out of our development of a mobile/touch based version of Ubuntu for phones and tablets, but now Ubuntu Core is available for clouds and devices.

I find the transactional nature of upgrades to be very interesting.  While you still get a perfectly normal Ubuntu system, your root file system is read-only, so traditional apt-get based upgrades don't work.  Instead, your system version is image based; today you are running image 231 and tomorrow a new image is released to get you to 232.  When you upgrade to the new image, you get all the system changes.  We support both full and delta upgrades (the latter of which reduces bandwidth), and even phased updates so that we can roll out new upgrades and quickly pull them from the server side if we notice a problem.  Snappy devices even support rolling back upgrades on a single device, by using a dual-partition root file system.  Phones generally don't support this due to lack of available space on the device.

Of course, the other really interesting thing about Snappy is the lightweight, flexible approach to deploying applications.  I still remember my early days learning how to package software for Debian and Ubuntu, and now that I'm both an Ubuntu Core Developer and Debian Developer, I understand pretty well how to properly package things.  There's still plenty of black art involved, even for relatively easy upstream packages such as distutils/setuptools-based Python packages available on the Cheeseshop (er, PyPI).  The Snappy approach on Ubuntu Core is much more lightweight and easy, and it doesn't require the magical approval of the archive elves, or the vagaries of PPAs, to make your applications quickly available to all your users.  There's even a robust online store for publishing your apps.

There's lots more about Snappy apps and Ubuntu Core that I won't cover here, so I encourage you to follow the links for more information.  You might also want to stop now and take the tour of Ubuntu Core (hey, I'm a poet and I didn't even realize it).

In this post, I want to talk about building and deploying snappy Python applications.  Python itself is not an officially supported development framework, but we have a secret weapon.  The system image client upgrader -- i.e. the component on the devices that checks for, verifies, downloads, and applies atomic updates -- is written in Python 3.  So the core system provides us with a full-featured Python 3 environment we can utilize.

The question that came to mind is this: given a command-line application available on PyPI, how easy is it to turn into a snap and install it on an Ubuntu Core system?  With some caveats I'll explore later, it's actually pretty easy!

Basic approach

The basic idea is this: let's take a package on PyPI, which may have additional dependencies also on PyPI, download them locally, and build them into a snap that we can install on an Ubuntu Core system.

The first question is, how do we build a local version of a fully-contained Python application?  My initial thought was to build a virtual environment using virtualenv or pyvenv, and then somehow turn that virtual environment into a snap.  This turns out to be difficult in practice because virtual environments aren't really designed for this.  They have issues with being relocated, for example, and they can contain a lot of extraneous stuff that's great for development (a virtual environment's actual purpose) but unnecessary baggage for our use case.

My second thought involved turning a Python application into a single file executable, and from there it would be fairly easy to snappify.  Python has a long tradition of such tools, many with varying degrees of cross platform portability and standalone-ishness.  After looking again at some oldies but goodies (e.g. cx_freeze) and some new offerings, I decided to start with pex.

pex is a nice tool developed by Brian Wickman and the Twitter folks which they use to deploy Python applications to their production environment.  pex takes advantage of modern Python's support for zip imports, and a clever trick of zip files.

Python supports direct imports (of pure Python modules) from zip files, and the python executable's -m option works even when the module is inside a zip file.  Further, the presence of a __main__.py file within a package can be used as shorthand for executing the package, e.g. python -m myapp will run myapp/__main__.py if it exists.
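To make that concrete, here's a minimal sketch that builds a tiny zip containing a made-up package (myapp and its contents are purely illustrative, not part of any project discussed here) and runs it with -m straight out of the archive:

```python
import os
import shutil
import subprocess
import sys
import tempfile
import zipfile

# Build a small archive holding a hypothetical "myapp" package that has
# both an __init__.py and a __main__.py.
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, 'bundle.zip')
with zipfile.ZipFile(archive, 'w') as zf:
    zf.writestr('myapp/__init__.py', 'GREETING = "hello from the zip"\n')
    zf.writestr('myapp/__main__.py',
                'from myapp import GREETING\nprint(GREETING)\n')

# Putting the archive on PYTHONPATH is enough for the import system to
# look inside it, so "python -m myapp" finds myapp/__main__.py there.
env = dict(os.environ, PYTHONPATH=archive)
output = subprocess.check_output(
    [sys.executable, '-m', 'myapp'], env=env, universal_newlines=True)
print(output.strip())

shutil.rmtree(workdir)
```

Nothing is ever unpacked to disk here; the interpreter reads both modules directly from the zip.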

Zip files are interesting because their index is at the end of the file.  This allows you to put whatever you want at the front of the file and it will still be considered a zip file.  pex exploits this by putting a shebang in the first line of the file, e.g. #!/usr/bin/python3 and thus the entire zip file becomes a single file executable of Python code.
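You can see the trick for yourself with nothing but the standard library.  This is only a sketch of the idea, not what pex actually does internally; it builds a zip in memory, glues a shebang line onto the front, and checks that zipfile still reads it:

```python
import io
import zipfile

# Build an ordinary zip archive in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as zf:
    zf.writestr('__main__.py', 'print("hello")\n')

# Prepend a shebang line.  The zip index still sits at the end of the data.
executable = b'#!/usr/bin/python3\n' + buffer.getvalue()

# zipfile locates the index by searching from the end, so the extra
# bytes at the front are skipped transparently.
with zipfile.ZipFile(io.BytesIO(executable)) as zf:
    names = zf.namelist()
    source = zf.read('__main__.py').decode('utf-8')

print(names)
print(executable.splitlines()[0])
```

Written to disk and marked executable, those same bytes are a single file that the kernel hands to /usr/bin/python3 and that Python then treats as a zip.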

There are, of course, plenty of caveats.  Probably the main one is that Python cannot import extension modules directly from the zip, because the dlopen() function call only takes a file system path.  pex handles this by marking the resulting file as not zip safe, so the zip is written out to a temporary directory first.
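The general shape of that workaround looks roughly like the sketch below.  It's an illustration only, not pex's real code, and it uses a pure-Python stand-in module (the name fakeext is invented) since a real compiled extension can't be generated on the fly:

```python
import importlib
import io
import os
import sys
import tempfile
import zipfile

# Stand-in archive.  A real one would carry a compiled .so file, which
# dlopen() can only load from an actual path on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as zf:
    zf.writestr('fakeext.py', 'ANSWER = 42\n')

# Unpack everything into a temporary directory and import from there
# instead of importing from the zip itself.
cache_dir = tempfile.mkdtemp()
with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as zf:
    zf.extractall(cache_dir)
sys.path.insert(0, cache_dir)
fakeext = importlib.import_module('fakeext')

print(fakeext.__file__)
```

The price is an extra copy on disk, which is exactly the file system cache mentioned among the caveats at the end of this post.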

The other issue, of course, is that the zip file must contain all the dependencies not present in the base Python.  pex is actually fairly smart here, in that it will chase dependencies, much like pip, and it will include those dependencies in the zip file.  You can also specify any missed dependencies explicitly on the pex command line.

Once we have the pex file, we need to add the required snappy metadata and configuration files, and run the snappy command to generate the .snap file, which can then be installed into Ubuntu Core.  Since we can extract almost all of the minimal required snappy metadata from the Python package metadata, we only need just a little input from the user, and the rest of work can be automated.

We're also going to avail ourselves of a convenient cheat.  Because Python 3 and its standard library are already part of Ubuntu Core on a snappy device, we don't need to worry about any of those dependencies.  We're only going to support Python 3, so we get its full stdlib for free.  If we needed access to Python 2, or any external libraries or add-ons that can't be made part of the zip file, we would need to create a snappy framework for that, and then utilize that framework for our snappy app.  That's outside the scope of this article though.


To build Python snaps, you'll need to have a few things installed.  If you're using Ubuntu 15.04, just apt-get install the appropriate packages.  Otherwise, you can get any additional Python requirements by building a virtual environment and installing tools like pex and wheel into it, then invoking pex from that virtual environment.  But let's assume you have the Vivid Vervet (Ubuntu 15.04); here are the packages you need:
  •  python3
  •  python-pex-cli
  •  python3-wheel
  •  snappy-tools
  •  git
You'll also want a local git clone of the snap git repository, which provides a convenient helper script for automating the building of Python snaps.  We'll refer to this script extensively in the discussion below.

For extra credit, you might want to get a copy of Python 3.5 (unreleased as of this writing).  I'll show you how to do some interesting debugging with Python 3.5 later on.

From PyPI to snap in one easy step

Let's start with a simple example: world is a very simple script that can provide forward and reverse mappings of ISO 3166 two letter country codes (at least as of before ISO once again paywalled the database).  So if you get an email from a .py address, you can find out where the BDFL has his secret lair:

$ world py
py originates from PARAGUAY

world is a pure-Python package with both a library and a command line interface. To get started with the script mentioned above, you need to create a minimal .ini file, such as:

name: world

verbose: true

Let's call this file world.ini.  (In fact, you'll find this very file under the examples directory in the snap git repository.)  What do the various sections and variables control?
  •  name is the name of the project on PyPI.  It's used to look up metadata about the project on PyPI via PyPI's JSON API.
  •  verbose just defines whether to pass -v to the underlying pex command.
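If you're curious how little it takes to consume a file like this, here's a rough sketch using configparser.  The [project] section name appears elsewhere in this post, but the [pex] section name is my assumption, and this is not the helper's actual code:

```python
import configparser

# Hypothetical file contents.  [project] is named in the text, while the
# [pex] section name is an assumption made for this sketch.
SAMPLE = """\
[project]
name: world

[pex]
verbose: true
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)
name = parser.get('project', 'name')
verbose = parser.getboolean('pex', 'verbose', fallback=False)

# Build (but don't run) a pex command line from the settings, using the
# -r, -o and -v options that appear in this post.
pex_args = ['pex', '-r', name, '-o', name + '.pex']
if verbose:
    pex_args.append('-v')
print(pex_args)
```

configparser accepts both "key: value" and "key = value", and getboolean() understands true, yes, on and 1, so either spelling of verbose works.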
Now, to create the snap, just run:

$ ./ examples/world.ini

You'll see a few progress messages and a warning which you can ignore.  Then out spits a file called world_3.1.1_all.snap.  Because this is pure Python, it's architecture independent.  That's a good thing because the snap will run on any device, such as a local amd64 kvm instance, or an ARM-based Ubuntu Core-compatible Lava Lamp.

Armed with this new snap, we can just install it on our device (in this case, a local kvm instance) and then run it:

$ snappy-remote --url=ssh://localhost:8022 install world_3.1.1_all.snap
$ ssh -p 8022 ubuntu@localhost
ubuntu@localhost:~$ py
py originates from PARAGUAY

From git repository to snap in one easy step

Let's look at another example, this time using a stupid project that contains an extension module. This aptly named package just prints a yes for every -y argument, and no for every -n argument.

The difference here is that stupid isn't on PyPI; it's only available via git.  The helper is smart enough to know how to build snaps from git repositories.  Here's what the stupid.ini file looks like:

name: stupid
origin: git

verbose: yes

Notice that there's a [project]origin variable.  This just says that the origin of the package isn't PyPI, but instead a git repository, and then the public repo url is given.  The first word is just an arbitrary protocol tag; we could eventually extend this to handle other version control systems or origin types.  For now, only git is supported.

To build this snap:

$ ./ examples/stupid.ini

This clones the repository into a temporary directory, builds the Python package into a wheel, and stores that wheel in a local directory.  pex has the ability to build its pex file from local wheels without hitting PyPI, which we use here.  Out spits a file called stupid_1.1a1_all.snap, which we can install in the kvm instance using the snappy-remote command as above, and then run it after ssh'ing in:

ubuntu@localhost:~$ stupid.stupid -ynnyn

Watch out though, because this snap is really not architecture-independent. It contains an extension module which is compiled on the host platform, so it is not portable to different architectures.  It works on my local kvm instance, but sadly not on my Lava Lamp.

Entry points

pex currently requires you to explicitly name the entry point of your Python application.  This is the function which serves as your main and it's what runs by default when the pex zip file is executed.

Usually, a Python package will define its entry point in its setup.py file, like so:

        'console_scripts': ['stupid = stupid.__main__:main'],

And if you have a copy of the package, you can run a command to generate the various package metadata files:

$ python3 setup.py egg_info

If you look in the resulting stupid.egg_info/entry_points.txt file, you see the entry point clearly defined there.  Ideally, either pex or the helper would just figure this out automatically.  As it turns out, there's already a feature request open on pex for this, but in the meantime, how can we auto-detect the entry point?

For the stupid example, it's pretty easy.  Once we've cloned its git repository, we just run the egg_info command and read the entry_points.txt file.  Later, we can build the project's binary wheel from the same git clone.
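Reading that file is straightforward because entry_points.txt uses ini syntax.  Here's a minimal sketch; find_entry_point() is a name I made up for illustration, and the sample text mirrors the console_scripts setting for stupid:

```python
import configparser

# Sample contents of stupid.egg_info/entry_points.txt, matching the
# console_scripts setting for the stupid package.
ENTRY_POINTS_TXT = """\
[console_scripts]
stupid = stupid.__main__:main
"""


def find_entry_point(text, script_name):
    """Return the 'module:function' target for a console script, or None."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep script names case-sensitive
    parser.read_string(text)
    return parser.get('console_scripts', script_name, fallback=None)


entry_point = find_entry_point(ENTRY_POINTS_TXT, 'stupid')
print(entry_point)
```

The returned string is in the same module:function form that pex's -e option takes.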

It's a bit more problematic with world though because the package isn't downloaded from PyPI until pex runs, but the pex command line requires that you specify the entry point before the download occurs.

We can handle this by supporting an entry_point variable in the snap's .ini file.  For example, here's the world.ini file with an explicit entry point setting:

name: world
entry_point: worldlib.__main__:main

verbose: true

What if we still wanted to auto-detect the entry point?  We could, of course, download the world package in the helper and run the egg_info command over that.  But pex also wants to download world and we don't want to have to download it twice.  Maybe we could download it in the helper and then build a local wheel file for pex to consume.

As it turns out there's an easier way.

Unfortunately, package egg-info metadata is not available on PyPI, although arguably it should be.  Fortunately, Vinay Sajip runs an external service that does make the metadata available, such as the metadata for world.  The helper makes the entry_point variable optional, and if it's missing, it will grab the package metadata from that service.  An error will be thrown if the file can't be found, in which case, for now, you'd just add the [project]entry_point variable to the .ini file.

A little more detail

The script is more or less a pure convenience wrapper around several independent tools.  pex of course for creating the single executable zip file, but also the snappy command for building the .snap file.  It also utilizes python3 setup.py egg_info where possible to extract metadata and construct the snappy facade needed for the snappy build command.  Less typing for you!  In the case of a snap built from a git repository, it also performs the git cloning, and the python3 setup.py bdist_wheel command to create the wheel file that pex will consume.

There's one other important thing the helper does: it fixes the resulting pex file's shebang line.  Because we're running these snaps on an Ubuntu Core system, we know that Python 3 will be available in /usr/bin/python3.  We want the pex file's shebang line to be exactly this.  While pex supports a --python option to specify the interpreter, it doesn't take the value literally.  Instead, it takes the last path component and passes it to /usr/bin/env so you end up with a shebang line like:

#!/usr/bin/env python3

That might work, but we don't want the pex file to be subject to the uncertainties of the $PATH environment variable.

One of the things that the helper does is repack the pex file.  Remember, it's just a zip file with some magic at the top (that magic is the shebang), so we just read the file that pex spits out, and rewrite it with the shebang we want.  Eventually, pex itself will handle this and we won't need to do that anymore.
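Here's one way such a repack could look.  This is a sketch under my own assumptions rather than the helper's actual code; the function name repack_with_shebang() is invented, and I copy the archive members into a fresh zip instead of splicing bytes so that the zip's internal offsets stay consistent:

```python
import os
import tempfile
import zipfile


def repack_with_shebang(src, dst, interpreter='/usr/bin/python3'):
    """Copy every member of the zip at src into dst behind a new shebang."""
    with zipfile.ZipFile(src) as zin, open(dst, 'wb') as out:
        out.write(b'#!' + interpreter.encode('utf-8') + b'\n')
        with zipfile.ZipFile(out, 'w') as zout:
            for info in zin.infolist():
                zout.writestr(info, zin.read(info))
    # Make the result executable for user, group and others.
    os.chmod(dst, os.stat(dst).st_mode | 0o111)


# Demo: fake a pex-like file with an env-style shebang, then repack it.
workdir = tempfile.mkdtemp()
original = os.path.join(workdir, 'app.pex')
with open(original, 'wb') as fp:
    fp.write(b'#!/usr/bin/env python3\n')
    with zipfile.ZipFile(fp, 'w') as zf:
        zf.writestr('__main__.py', 'print("hi")\n')

fixed = os.path.join(workdir, 'app-fixed.pex')
repack_with_shebang(original, fixed)
with open(fixed, 'rb') as fp:
    first_line = fp.readline()
with zipfile.ZipFile(fixed) as zf:
    fixed_names = zf.namelist()
    fixed_source = zf.read('__main__.py')
print(first_line)
```

The original file is left untouched, so you can compare the two if something goes wrong.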


While I was working out the code and techniques for this blog post, I ran into an interesting problem.  The world script would crash with some odd tracebacks.  I don't have the details anymore and they'd be superfluous, but suffice to say that the tracebacks really didn't help in figuring out the problem.  It would work in a local virtual environment build of world using either the (pip installed) PyPI package or run from the upstream git repository, but once the snap was installed in my kvm instance, it would traceback.  I didn't know if this was a bug in world, in the snap I built, or in the Ubuntu Core environment.  How could I figure that out?

Of course, the go to tool for debugging any Python problem is pdb.  I'll just assume you already know this.  If not, stop everything and go learn how to use the debugger.

Okay, but how was I going to get a pdb breakpoint into my snap?  This is where Python 3.5 comes in!

PEP 441, which has already been accepted and implemented in what will be Python 3.5, aims to improve support for zip applications.  Apropos this blog post, the new zipapp module can be used to zip up a directory into a single executable file, with an argument to specify the shebang line, and a few other options.  It's related to what pex does, but without all the PyPI interactions and dependency chasing.  Here's how we can use it to debug a pex file.

Let's ignore snappy for the moment and just create a pex of the world application:

$ pex -r world -o world.pex -e worldlib.__main__:main

Now let's say we want to set a pdb breakpoint in the main() function so that we can debug the program, even when it's a single executable file.  We start by unzipping the pex:

$ mkdir world
$ cd world
$ unzip ../world.pex

If you poke around, you'll notice a __main__.py file in the current directory.  This is pex's own main entry point.  There are also two hidden directories, .bootstrap and .deps.  The former is more pex scaffolding, but inside the latter you'll see the unpacked wheel directories for world and its single dependency.

Drilling down a little farther, you'll see that inside the world wheel is the full source code for world itself.  Set a break point by visiting .deps/world-3.1.1-py2.py3-none-any.whl/worldlib/ in your editor.  Find the main() function and put this right after the def line:

import pdb; pdb.set_trace()

Save your changes and exit your editor.

At this point, you'll want to have Python 3.5 installed or available.  Let's assume that by the time you read this, Python 3.5 has been released and is the default Python 3 on your system.  If not, you can always download a pre-release of the source code, or just build Python 3.5 from its Mercurial repository.  I'll wait while you do this...

...and we're back!  Okay, now armed with Python 3.5, and still inside the world subdirectory you created above, just do this:

$ python3.5 -m zipapp . -p /usr/bin/python3 -o ../world.dbg
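The same module can be driven from Python code too.  Here's a small self-contained sketch (it needs Python 3.5 or newer, and the app directory and its contents are invented for the demo) that builds an archive with zipapp.create_archive() and reads the shebang back with zipapp.get_interpreter():

```python
import os
import subprocess
import sys
import tempfile
import zipapp

# Create a throwaway source directory with a __main__.py in it.
workdir = tempfile.mkdtemp()
source_dir = os.path.join(workdir, 'app')
os.mkdir(source_dir)
with open(os.path.join(source_dir, '__main__.py'), 'w') as fp:
    fp.write('print("zipapp says hi")\n')

# Programmatic equivalent of "python3 -m zipapp app -p ... -o app.dbg".
target = os.path.join(workdir, 'app.dbg')
zipapp.create_archive(source_dir, target, interpreter='/usr/bin/python3')

# Read back the shebang, then run the archive with the current interpreter.
interpreter = zipapp.get_interpreter(target)
output = subprocess.check_output(
    [sys.executable, target], universal_newlines=True)
print(interpreter, output.strip())
```

Running the archive through sys.executable sidesteps the shebang, which is handy when /usr/bin/python3 on the build machine isn't the interpreter you want to test with.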

Now, before you can run ../world.dbg and watch the break point do its thing, you need to delete pex's own local cache, otherwise pex will execute the world dependency out of its cache, which won't have the break point set. This is a wart that might be worth reporting and fixing in pex itself.  For now:

$ rm -rf ~/.pex
$ ../world.dbg

And now you should be dropped into pdb almost immediately.

If you wanted to build this debugging pex into a snap, just use the snappy build command directly.  You'll need to add the minimal metadata yourself (since the helper currently doesn't preserve it).  See the Snappy developer documentation for more details.

Summary and Caveats

There's a lot of interesting technology here; pex for building single file executables of Python applications, and Snappy Ubuntu Core for atomic, transactional system updates and lightweight application deployment to the cloud and things.  These allow you to get started doing some basic deployments of Python applications.  No doubt there are lots of loose ends to clean up, and caveats to be aware of.  Here are some known ones:

  • All of the above only works with Python 3.  I think that's a feature, but you might disagree. ;)   This works on Ubuntu Core for free because Python 3 is an essential piece of the base image.  Working out how to deploy Python 2 as a Snappy framework would be an interesting exercise.
  • When we build a snap from a git repository for an application that isn't on PyPI, I don't currently have a way to also grab some dependencies from PyPI.  The stupid example shown here doesn't have any additional dependencies so it wasn't a problem.  Fixing this should be a fairly simple matter of engineering on the wrapper (pull requests welcome!)
  • We don't really have a great story for cross-compilation of extension modules. Solving this is probably a fairly complex initiative involving the distros, setuptools and other packaging tools, and upstream Python.  For now, your best bet might be to actually build the snap on the actual target hardware.
  • Importing extension modules requires a file system cache because of limitations in the dlopen() API.  There have been rumors of extensions to glibc which would provide a dlopen()-from-memory type of API which could solve this, or upstream Python's zip support may want to grow native support for caching.
Even with these caveats, it's pretty easy to turn a Python application into a Snappy Ubuntu Core application, publish it to the world, and profit!  So what are you waiting for?  Snap to it!

Barry Warsaw

I'm writing a bunch of new code these days for Ubuntu Touch's Image Based Upgrade system.  Think of it essentially as Ubuntu Touch's version of upgrading the phone/tablet (affectionately called phablet) operating system in a bulk way rather than piecemeal apt-gets the way you do it on a traditional Ubuntu desktop or server.  One of the key differences is that a phone has to detour through a reboot in order to apply an upgrade since its Ubuntu root file system is mounted read-only during the user session.

Anyway, those details aren't the focus of this article.  Instead, just realize that because it's a pile of new code, and because we want to rid ourselves of Python 2, at least on the phablet image if not everywhere else in Ubuntu, I am prototyping all this in Python 3, and specifically 3.3.  This means that I can use all the latest and greatest cool stuff in the most recent stable Python release.  And man, is there a lot of cool stuff!

One module in particular that I'm especially fond of is contextlib.  Context managers are objects implementing the protocol behind the with statement, and they are typically used to guarantee that some resource is cleaned up properly, even in the event of error conditions.  When you see code like this:

with open(somefile) as fp:
    data = fp.read()

you are invoking a context manager.  Python was clever enough to make file objects support the context manager protocol so that you never have to explicitly close the file; that happens automatically when the with statement completes, regardless of whether the code inside the with statement succeeds or raises an exception.

It's also very easy to define your own context managers to properly handle other kinds of resources.  I won't go into too much detail here, because this is all well-established; the with statement has been, er, with us since Python 2.5.
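For reference, the protocol is just two methods.  Here's a minimal sketch with a made-up Resource class that records what happens to it, so you can see that the cleanup runs even when the body raises:

```python
class Resource:
    """A hypothetical resource that records when it is acquired and released."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append('acquired')
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append('released')
        return False  # a false value means: don't swallow the exception


resource = Resource()
try:
    with resource:
        raise RuntimeError('boom')
except RuntimeError:
    pass

events = resource.events
print(events)
```

Returning a true value from __exit__() would suppress the exception instead, which is rarely what you want for plain cleanup code.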

You may be familiar with the contextlib module because of the @contextmanager decorator it provides.  This makes it trivial to define a new context manager without having to deal with all the intricacies of the protocol.  For example, here's how you would implement a context manager that temporarily changes the current working directory:

import os
from contextlib import contextmanager

@contextmanager
def chdir(dir):
    cwd = os.getcwd()
    try:
        os.chdir(dir)
        yield
    finally:
        os.chdir(cwd)

In this example, the yield cedes control back to the body of the with statement, and when that completes, the code after the yield is executed.  Because the yield is wrapped inside a try/finally, it is guaranteed that the original working directory is restored.  You would use this code like so:

with chdir('/tmp'):
    print(os.getcwd())

So far, so good, but this is nothing revolutionary.  Python 3.3 brings additional awesomeness to contextlib by way of the new ExitStack class.

The documentation for ExitStack is a bit dense, and even the examples didn't originally make it clear to me how amazing this new API is.  In my opinion, this is so powerful, it changes completely the way you think about deploying safe code.

So what is an ExitStack?  One way to think about it is as an extensible context manager.  It's used in with statements just like any other context manager:

from contextlib import ExitStack
with ExitStack() as stack:
    # do some magical stuff

Just like any other context manager, the ExitStack's "exit" code is guaranteed to be run at the end of the with statement.  It's the programmable extensibility of the ExitStack where the cool stuff happens.

The first interesting method of an ExitStack you might use is the callback() method.  Let's say for example that in your with statement, you are creating a temporary directory and you want to make sure that temporary directory gets deleted when the with statement exits.  You could do something like this:

import shutil, tempfile
with ExitStack() as stack:
    tempdir = tempfile.mkdtemp()
    stack.callback(shutil.rmtree, tempdir)

Now, when the with statement completes, it calls all of its callbacks, which includes removing the temporary directory.

So, what's the big deal?  Let's say you're actually creating three temporary directories and any of those calls could fail.  To guarantee that all successfully created directories are deleted at the end of the with statement, regardless of whether an exception occurred in the middle, you could do this:

with ExitStack() as stack:
    tempdirs = []
    for i in range(3):
        tempdir = tempfile.mkdtemp()
        stack.callback(shutil.rmtree, tempdir)
        tempdirs.append(tempdir)
    # Do something with the tempdirs

If you knew statically that you wanted three temporary directories, you could set this up with nested with statements, or a single with statement containing multiple backslash-separated targets, but that gets unwieldy very quickly.  And besides, that's impossible if you only know the number of directories you need dynamically at run time.  On the other hand, the ExitStack makes it easy to guarantee everything gets cleaned up and there are no leaks.
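To convince yourself that partial failures really are handled, here's a small sketch that simulates the third directory creation failing.  The remove() wrapper and the simulated OSError are my own additions for the demo:

```python
import os
import shutil
import tempfile
from contextlib import ExitStack

created = []
removed_order = []


def remove(path):
    """Delete a directory and remember the order in which we did so."""
    shutil.rmtree(path)
    removed_order.append(path)


try:
    with ExitStack() as stack:
        for i in range(3):
            if i == 2:
                # Pretend the third mkdtemp() call blew up.
                raise OSError('simulated failure')
            tempdir = tempfile.mkdtemp()
            created.append(tempdir)
            stack.callback(remove, tempdir)
except OSError:
    pass

leftovers = [path for path in created if os.path.exists(path)]
print(len(created), leftovers)
```

Both directories that did get created are gone afterwards, and the callbacks run in reverse order of registration, so the most recently created directory is removed first.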

That's powerful enough, but it's not all you can do!  Another very useful method is enter_context().

Let's say that you are opening a bunch of files and you want the following behavior: if all of the files open successfully, you want to do something with them, but if any of them fail to open, you want to make sure that the ones that did get open are guaranteed to get closed.  Using ExitStack.enter_context() you can write code like this:

files = []
with ExitStack() as stack:
    for filename in filenames:
        # Open the file and automatically add its context manager to the stack.
        # enter_context() returns the passed in context manager, i.e. the
        # file object.
        fp = stack.enter_context(open(filename))
        files.append(fp)
    # Capture the close method, but do not call it yet.
    close_all_files = stack.pop_all().close

(Note that the contextlib documentation contains a more efficient, but denser way of writing the same thing.)

So what's going on here?  First, the open(filename) does what it always does of course, it opens the file and returns a file object, which is also a context manager.  However, instead of using that file object in a with statement, we add it to the ExitStack by passing it to the enter_context() method.  For convenience, this method returns the passed in object.

So what happens if one of the open() calls fails before the loop completes?  The with statement will exit as normal and the ExitStack will exit all the context managers it knows about.  In other words, all the files that were successfully opened will get closed.  Thus, in an error condition, you will be left with no open files and no leaked file descriptors, etc.

What happens if the loop completes and all files got opened successfully?  Ah, that's where the next bit of goodness comes into play: the ExitStack's pop_all() method.

pop_all() creates a new ExitStack, and populates it from the original ExitStack, removing all the context managers from the original ExitStack.  So, after stack.pop_all() completes, the original ExitStack, i.e. the one used in the with statement, is now empty.  When the with statement exits, the original ExitStack contains no context managers so none of the files are closed.

Well, then, how do you close all the files once you're done with them?  That's the last bit of magic.  ExitStacks have a .close() method which unwinds all the registered context managers and callbacks and invokes their exit functionality.  So, after you're finally done with all the files and you want to clean everything up, you would just do:

close_all_files()

And that's it.
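Here's the whole round trip as one runnable sketch.  The open_all() wrapper and the throwaway files are invented for the demo; the pattern inside is the enter_context() plus pop_all() idiom described above:

```python
import os
import tempfile
from contextlib import ExitStack


def open_all(filenames):
    """Open every file or none of them; return (files, close_all)."""
    files = []
    with ExitStack() as stack:
        for filename in filenames:
            files.append(stack.enter_context(open(filename)))
        # Everything opened, so move the cleanup duty to a new stack.
        close_all = stack.pop_all().close
    return files, close_all


# Create two small files to play with.
workdir = tempfile.mkdtemp()
paths = []
for name in ('a.txt', 'b.txt'):
    path = os.path.join(workdir, name)
    with open(path, 'w') as fp:
        fp.write(name)
    paths.append(path)

# Success case: the files outlive the with statement until we say so.
files, close_all = open_all(paths)
closed_before = [fp.closed for fp in files]
close_all()
closed_after = [fp.closed for fp in files]

# Failure case: a missing file aborts the whole operation.
try:
    open_all(paths + [os.path.join(workdir, 'missing.txt')])
    failed = False
except FileNotFoundError:
    failed = True

print(closed_before, closed_after, failed)
```

In the failure case pop_all() is never reached, so the original stack still owns the two files that did open and closes them on the way out.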

Hopefully that all makes sense.  I know it took a while to sink in for me, but now that it has, it's clear the enormous power this gives you.  You can write much safer code, in the sense that it's easier to ensure much better guarantees that your resources are cleaned up at the right time.

The real power comes when you have many different disparate resources to clean up for a particular operation.  For example, in the test suite for the Image Based Upgrader, I have a test where I need to create a temporary directory and start an HTTP server in a thread.  Roughly, my code looks like this:

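@classmethod
def setUpClass(cls):
    cls._cleaner = ExitStack()
    try:
        cls._serverdir = tempfile.mkdtemp()
        cls._cleaner.callback(shutil.rmtree, cls._serverdir)
        # ...
        cls._stop = make_http_server(cls._serverdir)
    except:
        # Release whatever was set up so far, then re-raise.
        cls._cleaner.close()
        raise

@classmethod
def tearDownClass(cls):
    cls._cleaner.close()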

Notice there's no with statement there at all. :)   This is because the resources must remain open until tearDownClass() is called, unless some exception occurs during the setUpClass().  If that happens, the bare except will ensure that all the context managers are properly closed, leaving the original ExitStack empty.  (The bare except is acceptable here because the exception is re-raised after the resources are cleaned up.)  Even though the exception will prevent the tearDownClass() from being called, it's still safe to do so in case it is called for some odd reason, because the original ExitStack is empty.

But if no exception occurs, the original ExitStack will contain all the context managers that need to be closed, and calling .close() on it in the tearDownClass() does exactly that.

I have one more example from my recent code.  Here, I need to create a GPG context (the details are unimportant), and then use that context to verify the detached signature of a file.  If the signature matches, then everything's good, but if not, then I want to raise an exception and throw away both the data file and the signature (i.e. .asc) file.  Here's the code:

with ExitStack() as stack:
    ctx = stack.enter_context(Context(pubkey_path))
    if not ctx.verify(asc_path, channels_path):
        # The signature did not verify, so arrange for the .json and .asc
        # files to be removed before we raise the exception.
        stack.callback(os.remove, channels_path)
        stack.callback(os.remove, asc_path)
        raise FileNotFoundError

Here we create the GPG context, which itself is a context manager, but instead of using it in a with statement, we add it to the ExitStack.  Then we verify the detached signature (asc_path) of a data file (channels_path), and only arrange to remove those files if the verification fails.  When the FileNotFoundError is raised, the ExitStack in the with statement unwinds, removing both files and closing the GPG context.  Of course, if the signature matches, only the GPG context is closed -- the channels_path and asc_path files are not removed.

You can see how an ExitStack actually functions as a fairly generic resource manager!

To me, this revolutionizes the management of external resources.  The new ExitStack object, and the methods and semantics it exposes, make it so much easier to manage those resources, guaranteeing that they get cleaned up at the right time, once and only once, regardless of whether errors occur or not.

ExitStack takes the already powerful concept of context managers and turns it up to 11.  There's more you can do, and it's worth spending some time reading the contextlib documentation in Python 3.3, especially the examples and recipes.

As I mentioned on Twitter, it's features like this that make using Python 2 seem downright barbaric.

Barry Warsaw

UDS Update #1 - OAuth

For UDS-R for Raring (i.e. Ubuntu 13.04) in Copenhagen, I sponsored three blueprints.  These blueprints represent most of the work I will be doing for the next 6 months, as we're well on our way to the next LTS, Ubuntu 14.04.

I'll provide some updates to the other blueprints later, but for now, I want to talk about OAuth and Python 3.  OAuth is a protocol which allows you to programmatically interact with certain website APIs, in an authenticated manner, without having to provide your website password.  Essentially, it allows you to generate an authorization token which you can use instead, and it allows you to manage and share these tokens with applications, so that you can revoke them if you want, or decide how and which applications to trust to act on your behalf.

A good example of a site that uses OAuth is Launchpad, but many other sites also support OAuth, such as Twitter and Facebook.

There are actually two versions of OAuth out there.  OAuth version 1 is definitely the more prevalent, since it has been around for years, is relatively simple (at least on the client side), and is enshrined in RFC 5849.  There are tons of libraries available that support OAuth v1, in a multitude of languages, with Python being no exception.

OAuth v2 is much less common, since it is currently only a draft specification, and has had its share of design-by-committee controversy.  Still, some sites such as Facebook do require OAuth v2.

One of the very earliest Python libraries to support OAuth v1, on both the client and server side, was python-oauth (I'll use the Debian package names in this post), and on the Ubuntu desktop, you'll find lots of scripts and libraries that use python-oauth.  There are major problems with this library though, and I highly recommend not using it.  The biggest problems are that the code is abandoned by its upstream maintainer (it hasn't been updated on PyPI since 2009), and it is not Python 3 compatible.  Because the OAuth v2 draft came after this library was abandoned, it provides no support for the successor specification.

For this reason, one of the blueprints I sponsored was specifically to survey the alternatives available for Python programmers, and make a decision about which one we would officially endorse for Ubuntu.  By "official endorsement" I mean promote the library to other Python programmers (hence this post!) and to port all of our desktop scripts from python-oauth to the agreed upon library.

After some discussion, the attendees of the UDS session (both in-person and remote) unanimously chose python-oauthlib as our preferred library.

python-oauthlib has a lot going for it.  It's Python 3 compatible, has an active upstream maintainer, supports RFC 5849 for v1, and closely follows the draft for v2.  It's a well-tested, solid library, and it is available in Ubuntu for both Python 2 and Python 3.  Probably the only negative is that the library does not provide any support for the server side.  This is not a major problem for our immediate plans, since there aren't any server applications on the Ubuntu desktop requiring OAuth.  Eventually, yes, we'll need server side support, but we can punt on that recommendation for now.

Another cool thing about python-oauthlib is that it has been adopted by the python-requests library, meaning, if you want to use a modern replacement for the urllib2/httplib2 circus which supports OAuth out of the box, you can just use python-requests, provide the appropriate parameters, and you get request signing for free.

So, as you'll see from the blueprint, there are several bugs linked to packages which need porting to python-oauthlib for Ubuntu 13.04, and I am actively working on them, though contributions, as always, are welcome!  I thought I'd include a little bit of code to show you how you might port from python-oauth to python-oauthlib.  We'll stick with OAuth v1 in this discussion.

The first thing to recognize is that python-oauth uses different, older terminology that predates the RFC.  Thus, you'll see references to a consumer key and consumer secret, as well as a token key and token secret.  In the RFC, and in python-oauthlib, these terms are client key, client secret, resource owner key, and resource owner secret respectively.  After you get over that hump, the rest pretty much falls into place.  As an example, here is a code snippet from the piston-mini-client library which used the old python-oauth library:

class OAuthAuthorizer(object):
    """Authenticate to OAuth protected APIs."""
    def __init__(self, token_key, token_secret, consumer_key, consumer_secret,
                 oauth_realm="OAuth"):
        """Initialize a ``OAuthAuthorizer``.

        ``token_key``, ``token_secret``, ``consumer_key`` and
        ``consumer_secret`` are required for signing OAuth requests.  The
        ``oauth_realm`` to use is optional.
        """
        self.token_key = token_key
        self.token_secret = token_secret
        self.consumer_key = consumer_key
        self.consumer_secret = consumer_secret
        self.oauth_realm = oauth_realm

    def sign_request(self, url, method, body, headers):
        """Sign a request with OAuth credentials."""
        # Import oauth here so that you don't need it if you're not going
        # to use it.  Plan B: move this out into a separate oauth module.
        from oauth.oauth import (OAuthRequest, OAuthConsumer, OAuthToken,
                                 OAuthSignatureMethod_PLAINTEXT)
        consumer = OAuthConsumer(self.consumer_key, self.consumer_secret)
        token = OAuthToken(self.token_key, self.token_secret)
        oauth_request = OAuthRequest.from_consumer_and_token(
            consumer, token, http_url=url)
        # Sign with the PLAINTEXT method, then copy the resulting
        # Authorization header into the caller's headers.
        oauth_request.sign_request(OAuthSignatureMethod_PLAINTEXT(),
                                   consumer, token)
        headers.update(oauth_request.to_header(self.oauth_realm))

The constructor is pretty simple, and it uses the old OAuth terminology.  The key thing to notice is the way the old API required you to create a consumer, a token, and then a request object, then ask the request object to sign the request.  On top of all the other disadvantages, this isn't a very convenient API.  Let's look at the snippet after conversion to python-oauthlib.

class OAuthAuthorizer(object):
    """Authenticate to OAuth protected APIs."""
    def __init__(self, token_key, token_secret, consumer_key, consumer_secret,
                 oauth_realm="OAuth"):
        """Initialize a ``OAuthAuthorizer``.

        ``token_key``, ``token_secret``, ``consumer_key`` and
        ``consumer_secret`` are required for signing OAuth requests.  The
        ``oauth_realm`` to use is optional.
        """
        # 2012-11-19 BAW: python-oauthlib requires unicodes for its tokens and
        # secrets.  Assume utf-8 values.
        self.token_key = _unicodeify(token_key)
        self.token_secret = _unicodeify(token_secret)
        self.consumer_key = _unicodeify(consumer_key)
        self.consumer_secret = _unicodeify(consumer_secret)
        self.oauth_realm = oauth_realm

    def sign_request(self, url, method, body, headers):
        """Sign a request with OAuth credentials."""
        # 2012-11-19 BAW: In order to preserve API backward compatibility,
        # convert empty string body to None.  The old python-oauth library
        # would treat the empty string as "no body", but python-oauthlib
        # requires None.
        if not body:
            body = None
        # Import oauthlib here so that you don't need it if you're not going
        # to use it.  Plan B: move this out into a separate oauth module.
        from oauthlib.oauth1 import Client, SIGNATURE_PLAINTEXT
        oauth_client = Client(self.consumer_key, self.consumer_secret,
                              self.token_key, self.token_secret,
                              signature_method=SIGNATURE_PLAINTEXT,
                              realm=self.oauth_realm)
        uri, signed_headers, body = oauth_client.sign(
            url, method, body, headers)
        # Copy the signed Authorization header into the caller's headers.
        headers.update(signed_headers)

See how much nicer this is?  You need only create a client object, essentially using all the same bits of information.  Then you ask the client to sign the request, and update the request headers with the signature.  Much easier.
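
Both snippets ask for the PLAINTEXT signature method, so it may help to see how little is behind it.  The following is a rough, stdlib-only sketch of RFC 5849 sections 3.4.4 and 3.6, not the code either library actually runs; the function names percent_encode() and plaintext_signature() are my own.

```python
from urllib.parse import quote


def percent_encode(value):
    """Percent-encode everything except RFC 3986 unreserved characters."""
    # quote() always keeps letters, digits, and '_.-'; '~' is listed
    # explicitly so that older Python 3 versions keep it too.
    return quote(value, safe='~')


def plaintext_signature(client_secret, resource_owner_secret=''):
    """Build the oauth_signature value for the PLAINTEXT method.

    The signature is simply the encoded client secret, an ampersand, and
    the encoded resource owner (token) secret.
    """
    return (percent_encode(client_secret) + '&' +
            percent_encode(resource_owner_secret))
```

Since the secrets travel essentially in the clear, the RFC requires PLAINTEXT to be used only over TLS or a similarly secure channel.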

Two important things to note.  If you are doing an HTTP GET, there is no request body, and thus no request content which needs to contribute to the signature.  In python-oauth, you could specify an empty body by using either None or the empty string.  piston-mini-client uses the latter, and this is embodied in its public API.  python-oauthlib however, treats the empty string as a body being present, so it would require the Content-Type header to be set even for an HTTP GET which has no content (i.e. no body).  This is why the replacement code checks for an empty string being passed in (actually, any false-ish value), and coerces that to None.

The second issue is that python-oauthlib requires the keys and secrets to be Unicode objects; they cannot be bytes objects.  In code ported straight from Python 2 however, these values are usually 8-bit strings, and so become bytes objects in Python 3.  python-oauthlib will raise a ValueError during signing if any of these are bytes objects.  Thus the use of the _unicodeify() function to decode these values to unicodes.

def _unicodeify(s):
    if isinstance(s, bytes):
        return s.decode('utf-8')
    return s

The above works in both Python 2 and Python 3.  Of course, we don't know for sure that the bytes values are UTF-8, but it's the only sane encoding to expect, and if a client of piston-mini-client were to be so insane as to use an incompatible encoding (US-ASCII is fine because it's compatible with UTF-8), it would be up to the client to just pass in unicodes in the first place.  At the time of this writing, this is under active discussion with upstream, but for now, it's not too difficult to work around.

Anyway, I hope this helps, and I encourage you to help increase the popularity of python-oauthlib on the Cheeseshop, so that we can one day finally kill off the long defunct python-oauth library.

Barry Warsaw

Recently, as part of our push to ship only Python 3 on the Ubuntu 12.10 desktop, I've helped several projects update their internationalization (i18n) support.  I've seen lots of instances of suboptimal Python 2 i18n code, which leads to liberal sprinkling of cargo culted .decode() and .encode() calls simply to avoid the dreaded UnicodeErrors.  These get worse when the application or library is ported to Python 3 because then even the workarounds aren't enough to prevent nasty failures in non-ASCII environments (i.e. the non-English speaking world majority :).

Let's be honest though, the problem is not because these developers are crappy coders! In fact, far from it, the folks I've talked with are really really smart, experienced Pythonistas.  The fundamental problem is Python 2's 8-bit string type which doubles as a bytes type, and the terrible API of the built-in Python 2 gettext module, which does its utmost to sabotage your Python 2 i18n programs.  I take considerable blame for the latter, since I wrote the original version of that module.  At the time, I really didn't understand unicodes (this is probably also evident in the mess I made of the email package).  Oh, to really have access to Guido's time machine.

The good news is that we now know how to do i18n right, especially in a bilingual Python 2/3 world, and the Python 3 gettext module fixes the most egregious problems in the Python 2 version.  Hopefully this article does some measure of making up for my past sins.

Stop right here and go watch Ned Batchelder's talk from PyCon 2012 entitled Pragmatic Unicode, or How Do I Stop the Pain?  It's the single best description of the background and effective use of Unicode in Python you'll ever see.  Ned does a brilliant job of resolving all the FUD.


Welcome back.  Your Python application is multi-language friendly, right?  I mean, I'm as functionally monolinguistic as most Americans, but I love the diversity of languages we have in the world, and appreciate that people really want to use their desktop and applications in their native language.  Fortunately, once you know the tricks it's not that hard to write good i18n'd Python code, and there are many good FLOSS tools available for helping volunteers translate your application, such as Pootle, Launchpad translations, Translatewiki, Transifex, and Zanata.

So there really is no excuse not to i18n your Python application.  In fact, GNU Mailman has been i18n'd for many years, and pioneered the supporting code in Python's standard library, namely the gettext module.  As part of the Mailman 3 effort, I've also written a higher level library called flufl.i18n which makes it even easier to i18n your application, even in tricky multi-language contexts such as server programs, where you might need to get a German translation and a French translation in one operation, then turn around and get Japanese, Italian, and English for the next operation.

In one recent case, my colleague was having a problem with a simple command line program.  What's common about these types of applications is that you fire them up once, they run to completion then exit, and they only have to deal with one language during the entire execution of the program, specifically the language defined in the user's locale.  If you read the gettext module's documentation, you'd be inclined to do this at the very start of your application:

from gettext import gettext as _

then, you'd wrap translatable strings in code like this:

print _('Here is something I want to tell you')

What gettext does is look up the source string (i.e. the argument to the underscore function) in a translation catalog, returning the text in the appropriate language, which will then be printed.  There are some additional details regarding i18n that I won't go into here.  If you're curious, ask in the comments, and I'll try to fill things in.

Anyway, if you do write the above code, you'll be in for a heap of trouble, as my colleague soon found out.  Just running his program with --help in a French locale, he was getting the dreaded UnicodeEncodeError:

"UnicodeEncodeError: 'ascii' codec can't encode character"

I've also seen reports of such errors when trying to send translated strings to a log file (a practice which I generally discourage, since I think log messages usually shouldn't be translated).  In any case, I'm here to tell you why the above "obvious" code is wrong, and what you should do instead.

First, why is that code wrong, and why does it lead to the UnicodeEncodeErrors?  What might not be obvious from the Python 2 gettext documentation is that gettext.gettext() always returns 8-bit strings (a.k.a. byte strings in Python 3 terminology), and these 8-bit strings are encoded with the charset defined in the language's catalog file.

It's always best practice in Python to deal with human readable text using unicodes.  This is traditionally more problematic in Python 2, where English programs can cheat and use 8-bit strings and usually not crash, since their character range is compatible with ASCII and you only ever print to English locales.  As soon as your French friend uses your program though, you're probably going to run into trouble.  By using unicodes everywhere, you can generally avoid such problems, and in fact it will make your life much easier when you eventually switch to Python 3.
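
To make the failure mode concrete, here is a minimal Python 3 sketch of what happens when non-ASCII text meets the ASCII codec.  It is not my colleague's program; the try_ascii() helper and the sample message are made up for illustration.

```python
# Human readable text, kept as a unicode string.
message = 'Voil\u00e0, le r\u00e9sultat'


def try_ascii(text):
    """Encode text as ASCII, returning the error message on failure."""
    try:
        return text.encode('ascii')
    except UnicodeEncodeError as error:
        return str(error)


# Encoding explicitly at the boundary, with a codec that can represent
# the text, round-trips without loss.
round_tripped = message.encode('utf-8').decode('utf-8')
```

Pure English text sails through try_ascii(), which is exactly why this class of bug hides until someone runs your program in another language.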

So the 8-bit strings that gettext.gettext() hands you have already sunk you, and to avoid the pain, you'd want to convert them back to unicodes before you use them in any way.  However, converting to unicodes makes the i18n APIs much less convenient, so no one does it until there's way too much broken code to fix.

What you really want in Python 2 is something like this:

from gettext import ugettext as _

which you'd think you should be able to do, the "u" prefix meaning "give me unicode".  But for reasons I can only describe as based on our misunderstandings of unicode and i18n at the time, you can't actually do that, because ugettext() is not exposed as a module-level function.  It is available in the class-based API, but that's a more advanced API that again almost no one uses.  Sadly, it's too late to fix this in Python 2.  The good news is that in Python 3 it is fixed, not by exposing ugettext(), but by changing the most commonly used gettext module APIs to return unicode strings directly, as it always should have done.  In Python 3, the obvious code just works:

from gettext import gettext as _

What can you do in Python 2 then?  Here's what you should use instead of the two lines of code at the beginning of this article:

_ = gettext.translation(my_program_name).ugettext

and now you can wrap all your translatable strings in _('Foo') and it should Just Work.
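
If you need that one line to work in both Python 2 and Python 3, something like the following sketch should do.  The domain 'my_program_name' is a placeholder, and I've added fallback=True (not in the line above) so that the example still runs when no catalog is installed; in that case you get a NullTranslations object which returns the source string unchanged.

```python
import gettext
import sys

# 'my_program_name' is a placeholder translation domain.  With
# fallback=True a missing catalog yields NullTranslations instead of an
# exception.
translations = gettext.translation('my_program_name', fallback=True)

if sys.version_info[0] < 3:
    # Python 2: ask the class-based API for unicodes explicitly.
    _ = translations.ugettext
else:
    # Python 3: gettext() already returns unicode strings.
    _ = translations.gettext

greeting = _('Here is something I want to tell you')
```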

Perhaps more usefully, you can use the gettext.install() function to put _() into the built-in namespace, so that all your other code can just use that function without doing anything special.  Again, though we have to work around the boneheaded Python 2 API.  Here's how to write code which works correctly in both Python 2 and Python 3.

import sys, gettext
kwargs = {}
if sys.version_info[0] < 3:
    # In Python 2, ensure that the _() that gets installed into built-ins
    # always returns unicodes.  This matches the default behavior under Python
    # 3, although that keyword argument is not present in the Python 3 API.
    kwargs['unicode'] = True
gettext.install(my_program_name, **kwargs)

Or you can use the flufl.i18n API, which always returns unicode strings in both Python 2 and Python 3.

Also interesting was that I could never reproduce the crash when ssh'd into the French locale VM. It would only crash for me when I was logged into a terminal on the VM's graphical desktop.  The only difference between the two that I could tell was that in the desktop's terminal, locale(1) returned French values (e.g. fr_FR.UTF-8) for everything, but in the ssh console, it returned the French values for everything except the LC_CTYPE environment variable.  For the life of me, I could not get LC_CTYPE set to anything other than en_US.UTF-8 in the ssh context, so the reproducible test case would just return the English text, and not crash.  This happened even if I explicitly set that environment variable either as a separate export command in the shell, or as a prefix to the normally crashing command.  Maybe there's something in ssh that causes this, but I couldn't find it.

One last thing.  It's important to understand that Python's gettext module only handles Python strings, and other subsystems may be involved.  The classic example is GObject Introspection, the newest and recommended interface to the GNOME Object system.  If your Python-GI based project needs to translate strings too (e.g. in menus or other UI elements), you'll have to use both the gettext API for your Python strings, and set the locale for the C-based bits using locale.setlocale().  This is because Python's API does not set the locale automatically, and Python-GI exposes no other way to control the language it uses for translations.

Barry Warsaw

So, now all the world knows that my suggested code name for Ubuntu 12.10, Qwazy Quahog, was not chosen by Mark.  Oh well, maybe I'll have more luck with Racy Roadrunner.

In any case, Ubuntu 12.04 LTS is to be released any day now so it's time for my semi-annual report on Python plans for Ubuntu.  I seem to write about this every cycle, so 12.10 is no exception.  We've made some fantastic progress, but now it's time to get serious.

For Ubuntu 12.10, we've made it a release goal to have Python 3 only on the desktop CD images.  The usual caveats apply: Python 2.7 isn't going away; it will still probably always be available in the main archive.  This release goal also doesn't affect other installation CD images, such as server, or other Ubuntu flavors.  The relatively modest goal then only affects packages for the standard desktop CD images, i.e. the alternative installation CD and the live CD.

Update 20120425: To be crystal clear,  if you depend on Python 2.7, the only thing that changes for you is that after a fresh install from the desktop CD on a new machine, you'll have to explicitly apt-get install python2.7.  After that, everything else will be the same.

This is ostensibly an effort to port a significant chunk of Ubuntu to Python 3, but it really is a much wider, Python-community driven effort.  Ubuntu has its priorities, but I personally want to see a world where Python 3 rules the day, and we can finally start scoffing at Python 2 :).

Still, that leaves us with about 145 binary packages (and many fewer source packages) to port.  There are a few categories of packages to consider:

  • Already ported and available.  This is the good news, and covers packages such as dbus-python.  Unfortunately, there aren't too many others, but we need to check with Debian and make sure we're in sync with any packages there that already support Python 3 (python3-dateutil comes to mind).
  • Upstream supports Python 3, but it is not yet available in Debian or Ubuntu.  These packages should be fairly easy to port, since we have pretty good packaging guidelines for supporting both Python 2 and Python 3.
  • Packages with better replacements for Python 3.  A good example is the python-simplejson package.  Here, we might not care as much because Python 3 already comes with a json module in its standard library, so code which depends on python-simplejson and is required for the desktop CD, should be ported to use the stdlib json module.  python-gobject is another case where porting is a better option, since pygi (gobject-introspection) already supports Python 3.
  • Canonical is the upstream.  Many packages in the archive, such as python-launchpadlib and python-lazr.restfulclient are developed upstream by Canonical.  This doesn't mean you can't or shouldn't help out with the porting of those modules, it's just that we know who to lean on as a last resort.  By all means, feel free to contribute to these too!
  • Orphaned by upstream.  These are the most problematic, since there's essentially no upstream maintainer to contribute patches to.  An example is python-oauth.  In these cases, we need to look for alternatives that are maintained upstream, and open to porting to Python 3.  In the case of python-oauth, we need to investigate oauth2, and see if there are features we're using from the abandoned package that may not be available in the supported one.
  • Unknowns.  Well, this one's the big risky part because we don't know what we don't know.

We need your help!  First of all, there's no way I can personally port everything on our list, including both libraries and applications.  We may have to make some hard choices to drop some functionality from Ubuntu if we can't get it ported, and we don't want to have to do that.  So here are some ways you can contribute:
  • Fill in the spreadsheet with more information.  If you're aware of an upstream or Debian port to Python 3, let us know.  It may make it easier for someone else to enable the Python 3 version in Debian, or to shepherd the upstream patch to landing on their trunk.
  • Help upstream make a Python 3 port available.  There are lots of resources available to help you port some code, from quick references to in-depth guides.  There's also a mailing list (and Gmane newsgroup mirror) you can join to get help, report status, and have other related discussions. Some people have asked Python 3 porting questions on StackOverflow, using the tags #python, #python-3.x, and #porting
  • Join us on the #python3 IRC channel on Freenode.
  • Subscribe to the python-porting mailing list.
  • Get packages ported in Debian.  Once upstream supports Python 3, you can extend the existing Debian package to expose this support into Debian.  From there, you or we can make sure that gets sync'd into Ubuntu.
  • Spread the word!  Even if you don't have time to do any ports yourself, you can help publicize this effort through social media, mailing lists, and your local Python community.  This really is a Python-wide effort!

Python 3.3 is scheduled to be released later this year.  Please help make 2012 the year that Python 3 reached critical mass!


On a more personal note, I am also committed to making Mailman 3 a Python 3 application, but right now I'm blocked on a number of dependencies.  Here is the list of dependencies from the file, and their statuses.  I would love it if you help get these ported too!
Of course, these are only the direct dependencies.  Others that get pulled in include:

Barry Warsaw

Lessons in porting to Python 3

Yesterday, I completed my port of dbus-python to Python 3, and submitted my patch upstream.  While I've yet to hear any feedback from Simon about my patch, I'm fairly confident that it's going in the right direction.  This version should allow existing Python 2 applications to run largely unchanged, and minimizes the differences that clients will have to make to use the Python 3 version.

Some of the changes are specific to the dbus-python project, and I included a detailed summary of those changes and my rationale behind them.  There are lots of good lessons learned during this porting exercise that I want to share with you, have a discussion about, and see if there aren't things we core Python developers can do in Python 3.3 to make it even easier to migrate to Python 3.

First, some background.  D-Bus is a project for same-system interprocess communication, and it's an essential component of any Linux desktop.  The D-Bus system and C API are mature and well-defined, and there are bindings available for many programming languages, Python included of course.  The existing dbus-python package is only compatible with Python 2, and most recommendations are to use the Gnome version of Python bindings should you want to use D-Bus with Python 3.  For us in Ubuntu, this isn't acceptable though because we must have a solution that supports KDE and potentially even non-UI based D-Bus Python servers.  Several ports of dbus-python to Python 3 have been attempted in the past, but none have been accepted upstream, so naturally I took it as a challenge to work on a new version of the port.  After some discussion with the upstream maintainer Simon McVittie, I had a few requirements in mind:

  • One code base for both Python 2 and Python 3.  It's simply too difficult to support multiple development branches, so one branch must be compilable in both versions of Python.  Because dbus-python is not setuptools-based, I chose not to rely on 2to3 to auto-convert the Python layer.  This is more difficult, but given the next requirement, entirely possible.
  • Minimum Python versions to support are 2.6 and 3.2 (Python 2.7 is also supported).  Python 2.6 contains almost everything you need to do a high quality port of both the Python layer and the C extension layer with a single code base.  Python 2.7 has one or two additional helpers, but they aren't important enough to count Python 2.6 out.  For dbus-python, this specifically means dropping support for Python 2.5, which is more than 5 years old at the time of this writing.  Also, it makes no sense to support Python 3.0 or 3.1 as neither of those is in wide-spread use.
  • Minimize any API changes seen by Python 2 code, and minimize the changes needed to port clients to Python 3.  For the former, this means everything from keeping Python APIs unchanged to keeping the inheritance hierarchy the same.  Python 2 programs will see a few small changes after the application of my patches; I'll describe them below but they should be inconsequential for the vast majority of Python 2 applications.  While it's unavoidable that Python 3 applications will see a different API, these differences have been minimized.

There are two main issues that had to be sorted out for this port, and in general for most ports to Python 3: bytes vs. strings, and ints vs. longs.  For the latter, you probably know that where Python 2 has two integer types, Python 3 has only one. In Python 3, all integers are longs, and there is no L suffix for integer literals.  This turned out to be trickier in the dbus-python case because dbus supports a numeric stack of various integer widths, and in Python 2 these are implemented as subclasses of the built-in int and long types.  Because there are only longs in Python 3, the inheritance hierarchy a Python application will see changes between Python 2 and Python 3.  This is unavoidable.

I also made the decision to change some object types to longs in both versions of Python, where I thought it was highly unlikely that Python clients would care.  Specifically, many dbus objects have a variant_level attribute, which is usually zero, but can be any positive integer.  For implementation simplicity, I changed these to longs in Python 2 also.
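
If your own client code does type checks on integers, a sketch like the following keeps it working in both versions.  None of these names (PY3, integer_types, is_integer) come from dbus-python; they're just an illustration of coping with one integer type versus two.

```python
import sys

PY3 = sys.version_info[0] >= 3

if PY3:
    # Python 3 has a single, arbitrary precision integer type.
    integer_types = (int,)
else:
    # Python 2 has both; `long` only exists there, and this branch is
    # never evaluated under Python 3.
    integer_types = (int, long)  # noqa: F821


def is_integer(value):
    """True for any built-in integer, or a subclass of one."""
    return isinstance(value, integer_types)
```

Checking with isinstance() rather than comparing types directly also means subclasses, such as a numeric stack built on the built-in integers, are accepted.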

Ah, bytes vs. strings is always where things get interesting when porting to Python 3.  It's the single most brain hurty exercise you will have to go through.  Remember that Python 2 lets you cheat.  If you're not sure whether the entity you're dealing with is some bytes, or some (usually ASCII-encoded) string, just use a Python 2 str type (a.k.a. 8-bit string) and let Python's automatic conversion rules change it to a unicode when the two types meet.  You can't get away with this in Python 3 though, for very good reasons - it's error prone, and can lead to data corruption or the annoyingly ubiquitous and hard to predict UnicodeErrors.

In Python 3, you must be clear about what are bytes and what are strings (i.e. unicodes), and you must be explicit when converting between the two.  Yes, this can be painful at times but in my opinion, it's crucial that you do so.  It's that important to eliminate UnicodeErrors that you can't defend against and your users won't understand or be able to correct.  Once you're clear in your own mind as to which are strings and which are bytes, it's usually not that hard to reflect that clearly in your code, especially if you leave Python 2.5 and anything earlier behind, which I highly recommend.

dbus-python presented an interesting challenge here.  It has several data types in its C API that are defined as UTF-8 encoded char*'s.  At first blush, it seemed to me that these should be reflected in Python 3 as bytes objects to simplify the conversion in the extension module to and from char*'s.  It turns out that this was a bad idea from an implementation stand point, and dbus-python's upstream maintainer had already expressed his opinion that these data types should be exposed as unicodes in Python 3.  After having failed at my initial attempts at making them bytes, I now agree that they must be unicodes, both for implementation simplicity and for minimal impact on porting user code.

The biggest problem I ran into with the choice of bytes is that the callback dispatch code in dbus-python is complex, difficult to understand and debug, driven by external data, and written with a deep assumption of operating on strings.  For example, when the dbus C API receives a signal, it must determine whether there is a Python function registered to handle that signal, and it does this by comparing a number of client-registered parameters, such as the method name, the interface, and the object path.  If the dbus C API was turning these parameters into bytes, but the clients had registered strings, then the comparisons in the callback dispatch routines would fail, either loudly with an exception, or silently with failing comparisons.  The former were relatively easy to track down and fix, by explicitly encoding client-registered strings to bytes.  But the latter, silent failures, were nearly impossible to debug.  Add to that the fact that there were so many roads into the registration system, that it was also very difficult to coerce all incoming data early enough so that coercion wasn't necessary at comparison time.  I was left with the unappealing alternative of forcing all client code to also change their data from using strings to using bytes, which I realized would be much too high a burden on clients porting their applications to Python 3.  Simon was right, but it was a useful exercise to fail at anyway.
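
The silent flavor of failure is easy to demonstrate in a few lines of Python 3.  This toy registry is in no way dbus-python's real dispatch code, and the names register() and dispatch() are made up, but it shows how a bytes key simply never matches a string key.

```python
# Toy registry mapping a signal name to its handler.
handlers = {}


def register(member, callback):
    handlers[member] = callback


def dispatch(member):
    """Call the handler registered for member, if there is one."""
    callback = handlers.get(member)
    if callback is None:
        # No exception, no warning by default: the signal is just dropped.
        return None
    return callback()


# The client registers with a string...
register('NameOwnerChanged', lambda: 'handled')

# ...but the incoming name arrives as bytes, so the lookup silently misses.
missed = dispatch(b'NameOwnerChanged')

# Decoding at the boundary makes the lookup match again.
found = dispatch(b'NameOwnerChanged'.decode('utf-8'))
```

Multiply that by every path into the registration system and you can see why coercing everything to one type early was such a struggle.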

(By way of comparison, it took me the better part of a week and a half to try to get the test suite passing when these objects were bytes, which I was ultimately unable to do, and about a day to get them passing when everything was unicodes.  That's gotta tell you something right there, and hopefully not that "I suck" :).

Let's look at some practical advice that may help you in your own porting efforts.

  • Target nothing older than Python 2.6 or Python 3.2.  I mentioned this before, but it's really going to make your life easier.  Specifically, drop Python 2.5 and earlier and you will thank yourself[1].  If you absolutely cannot do this, consider waiting to port to Python 3.  Note that while Python 2.7 has a few additional conveniences for supporting both Python 2 and Python 3 in a single code base, I did not find them compelling enough to drop Python 2.6 support.
  • Where you have C types with reprs, make those reprs return unicodes in both versions.  Many dbus-python types have somewhat complicated reprs because they return different strings depending on whether their variant_levels are zero or non-zero.  #ifdef'ing all of these was just too much work. Because most code probably doesn't care about the specific type of the repr, and because Python 2 allows unicode reprs, and because I have a very clever hack for this[2], I decided to make all reprs return unicodes in both versions of Python.
  • Include the following __future__ imports in your Python code: print_function, absolute_import, and unicode_literals.  In Python 2.6 and 2.7, these enable features that are the default in Python 3, and so make it easier to support both with one codebase.  Specifically, change all your print statements to print() functions, and remove all your u'' prefixes from your unicode literals.  Be sure to b'' prefix all your byte literals[3].
  • Wherever possible, in your extension modules, change all your PyInts to PyLongs.  In dbus-python, this means that the variant_level attributes are longs in both Python versions, as are values that represent such things as UNIX file descriptors.  The only place where I kept PyInts in Python 2 (and their requisite #ifdefs to use PyLongs in Python 3) was in the numeric stack inheritance hierarchy, mostly so that Python 2 code which cares about such things would not have to change.
  • Define a Python variable and a C macro for determining whether you're running in Python 2 or Python 3.  The former is used in dbus-python because under Python 3, there is no UTF8String type any more, among other subtle differences.  The latter is used to simplify the #ifdef tests where they're needed[4].
  • In your C code, #include <bytesobject.h> .  This header exposes aliases for all PyString calls so that you can use the Python 3 idiom of PyBytes.  Then globally replace all PyString_Foo() calls with PyBytes_Foo() and the code will look clean and be compilable under both versions of Python.  You may need to add explicit PyUnicode calls where you need to discern between bytes and strings, but again, this code will be completely portable between Python 2 and Python 3.
  • Try to write your functions to accept both unicodes and bytes, but always normalize them to one type or the other for internal use, and choose one or the other to return.  Some Python stdlib methods are polymorphic in that they return bytes when handed bytes, and unicodes when handed unicodes.  This can be convenient in some cases, but problematic in others.  Choose carefully when porting your APIs.
  • Don't use trailing-L long literals if you can help it.
  • Switch to using Py_TYPE() everywhere instead of dereferencing ob_type explicitly.  The structures are laid out differently between Python 2 and Python 3, and this Python-supplied macro hides the ugliness from you.
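
As an illustration of the advice above about accepting both unicodes and bytes while normalizing to one type internally, here is a minimal sketch.  The names to_text() and normalize_signal_key() are hypothetical and are not part of dbus-python; the sketch also assumes UTF-8 is the right encoding for incoming bytes.

```python
import sys

# Hypothetical helpers for illustration; not part of dbus-python.
# On Python 3 the text type is str; on Python 2 it is unicode.
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821


def to_text(value, encoding='utf-8'):
    """Return value as a unicode string, decoding bytes if necessary."""
    if isinstance(value, text_type):
        return value
    if isinstance(value, bytes):
        return value.decode(encoding)
    raise TypeError('expected bytes or text, got %r' % type(value))


def normalize_signal_key(interface, member):
    """Build a comparison key whose parts are always text."""
    return (to_text(interface), to_text(member))
```

If every road into a registration system passes through a helper like this, a key registered as bytes and a key registered as text compare equal, which is one way to avoid the kind of silent comparison failure described earlier.
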
Here are a few other miscellaneous issues you should be aware of:

Metaclasses are defined differently in Python 2 and Python 3, and you cannot write any Python code snippet that is even compilable between the two.  That's because the syntax for defining a class that derives from a metaclass in Python 3 is illegal syntax in Python 2. Your module simply won't compile.  My solution was to use exec() on a string.  For this reason, I suggest keeping metaclass subclasses as simple as possible, so that string is nice and small.
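
To make the exec() approach concrete, here is a small sketch.  It is not the code from dbus-python; the Registry metaclass and the Base class are hypothetical names chosen for the example.  Only the tiny class statement lives in a string, and the right variant is picked at import time.

```python
import sys


class Registry(type):
    """Hypothetical metaclass that records the names of classes it creates."""
    created = []

    def __init__(cls, name, bases, namespace):
        super(Registry, cls).__init__(name, bases, namespace)
        Registry.created.append(name)


# Keep the version-specific syntax inside a short string so that this
# module still compiles under both Python 2 and Python 3.
if sys.version_info[0] >= 3:
    _source = "class Base(metaclass=Registry):\n    pass\n"
else:
    _source = "class Base(object):\n    __metaclass__ = Registry\n"

_namespace = {'Registry': Registry, '__name__': __name__}
exec(_source, _namespace)
Base = _namespace['Base']
```

Ordinary subclasses of Base can then be written with normal class statements, since they inherit the metaclass without needing any version-specific syntax.
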

Get rid of all your uses of iteritems(), iterkeys(), itervalues(), and xrange().  You probably don't need the optimization these provide, and they do not exist in Python 3.  You can conditionalize around them, but I think in most cases it's not worth it.  If you really need the optimization, then you'll have to figure out a way around the missing names in Python 3.  But note that Python 3 is already more efficient for the first three, since you get back dictview objects instead of concrete lists.
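
As a rough sketch of both options: the first half simply uses the names that exist in both versions, and the second half shows one possible way to conditionalize if you really need the lazy behavior on Python 2.  The handlers dictionary and the iteritems() helper are hypothetical names for illustration.

```python
# Hypothetical data, for illustration only.
handlers = {'NameOwnerChanged': 2, 'PropertiesChanged': 5}

# items(), keys(), values(), and range() exist in both Python 2 and
# Python 3.  Python 2 builds a list here; Python 3 returns a view.
total = 0
for name, count in handlers.items():
    total += count

squares = [n * n for n in range(5)]


def iteritems(mapping):
    """Iterate over items without building a list on Python 2."""
    if hasattr(mapping, 'iteritems'):
        # Python 2 dictionaries provide a lazy iterator directly.
        return mapping.iteritems()
    # Python 3: items() is already a lightweight view.
    return iter(mapping.items())
```
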

PyArg_Parse() and friends lack a 'y' code in Python 2.  In Python 3, the 'y' codes work with bytes objects.  Where I absolutely needed bytes in Python 3 and strs in Python 2, I just #ifdef'd around the PyArg_Parse() calls.  In Python 3, there's no equivalent of 'z' for bytes objects ('z' accepts None and sets the output variable to NULL in that case).  If this is important to you, you might need to write an O& converter.

Watch out for next() vs. __next__() when writing iterators.  Python 2 uses the former while Python 3 uses the latter.  Best to define the method once, and then support compatibility via `next = __next__` in your class definition.
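
A minimal sketch of that pattern follows; the Countdown class is a made-up example, not something from dbus-python.

```python
class Countdown(object):
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        # Python 3 calls this method to fetch the next item.
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    # Python 2 looks for next() instead, so alias it to the same method.
    next = __next__
```
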

operator.isSequenceType() is gone in Python 3.  Here's the code I use for compatibility:

def is_sequence(obj):
    try:
        from collections import Sequence
    except ImportError:
        from operator import isSequenceType
        return isSequenceType(obj)
    else:
        return isinstance(obj, Sequence)
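
One caveat: the abstract base classes have lived in collections.abc since Python 3.3, and the aliases in collections were removed in Python 3.10.  If you need to cover those releases as well, a variant along these lines may help.  The name is_sequence_compat() is hypothetical, and this is only a sketch of the same idea with one more fallback.

```python
def is_sequence_compat(obj):
    """Hypothetical variant of is_sequence() that also tries collections.abc."""
    try:
        # Python 3.3 and later.
        from collections.abc import Sequence
    except ImportError:
        try:
            # Python 2.6, 2.7, and early Python 3 releases.
            from collections import Sequence
        except ImportError:
            # Older Python 2 releases without the ABCs.
            from operator import isSequenceType
            return isSequenceType(obj)
    return isinstance(obj, Sequence)
```
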

If you by chance use PyCObjects in your extension module, you'll have to switch these to PyCapsules for Python 3.  If you're lucky enough to be able to drop Python 2.6, you can use PyCapsules everywhere, since they are available in Python 2.7.

Let me close by saying that you shouldn't be frightened off by the prospect either of porting your code to Python 3, or supporting both Python 2 and Python 3 in a single code base.  It's definitely doable, and we in the Python community are gaining more experience at it every day.  I strongly feel that we are well on the track of Guido's original goal of mainstream Python 3 acceptance within 5 years of Python 3's release.  I think we're soon going to see a critical mass of Python 3 ports, after which time, you'll just seem old and creaky if you don't port to Python 3.

There are some other excellent references for helping you port out there on the 'net, and for the most part, I've tried not to duplicate their information.  Here are some useful places to start:

    [1] It is not impossible to support both Python 3 and versions of Python 2 earlier than 2.6, just more difficult.  Michael Foord has had success doing this for libraries of his such as mock.  I just think it's more trouble than it's worth in most cases.

    [2] Here's the clever hack, but first a set-up.  The reprs of many of the dbus-python objects are conditional on whether the variant_level is zero or not.  The variant_level is only included in the repr when it is greater than zero (with zero being the typical value).  This just means there are usually two calls to PyUnicode_FromFormat() in each C repr implementation, and #ifdef'ing them to use PyString_FromFormat() in Python 2 would just double the pain.  In addition, the reprs all include the repr of their parent objects, i.e. their base class repr.  The problem is that these base-class reprs will be PyBytes in Python 2 and PyUnicodes in Python 3, and there's nothing we can do about that.  As it turns out, Python 2.6 and Python 3.2 have a %V format with some very interesting semantics.  %V consumes two arguments, a PyObject* and a char*, but it only uses one of them.  When the first argument is not NULL, it uses that and ignores the second argument.  But when the first argument is NULL, it will use the second argument.

    How can this help produce portable code?  I define the following macro and use this everywhere the %V format code is given:
    #define REPRV(obj) \
        (PyUnicode_Check(obj) ? (obj) : NULL), \
        (PyUnicode_Check(obj) ? NULL : PyBytes_AS_STRING(obj))
    which would be used at a call site something like this:

    return PyUnicode_FromFormat("...%V...", REPRV(parent_repr));

    In Python 2, where parent_repr is a PyBytes, REPRV() will return NULL as the first argument, and via PyBytes_AS_STRING(), a char* in the second argument. In Python 3, where parent_repr is a PyUnicode, the first argument will just be the object and the second argument will be NULL (but it is ignored by Python).  As long as parent_repr is either a PyUnicode or a PyBytes (a.k.a. PyString), this works perfectly, and keeps the call sites simple and sane.  Beware though because if parent_repr can be any other type, this will crash your program.  Fortunately, Python doesn't allow for arbitrary repr types - they must be bytes or unicodes, so in practice this is pretty safe.

    [3] A recent thread in python-dev points out that this recommendation may not be practical if you're building PEP 3333-compliant WSGI applications.  My take on it is that PEP 3333's definition of "native strings" is a mistake, but sadly one that we have to live with for now.

    [4] Here's what my Python-level flag looks like:
    import sys
    is_py3 = getattr(sys.version_info, 'major', sys.version_info[0]) == 3
    Now I can use this in other code to switch behavior between Python 2 and Python 3.  For example, in dbus-python to import the UTF8String type in Python 2 only:
    from dbus import is_py3
    if not is_py3:
       from _dbus_bindings import UTF8String
    This is much easier and less error prone than doing the sys.version_info test everywhere.  The other problem is that sys.version_info is a namedtuple only as of Python 2.7, so in Python 2.6, it has no attribute called 'major'.

    The C-level macro looks like this:
    #if PY_MAJOR_VERSION >= 3
    #define PY3K
    #endif
    So now C code only needs to do:

    #ifdef PY3K
    /* Do something Python 3-ish */
    #else
    /* Do something Python 2-ish */
    #endif

    You might also find the six package to be useful here, at least for writing portable Python code.
