Canonical Voices

Posts tagged with 'c'


As part of the ongoing effort we are doing in Canonical to snappify the world, we are trying to make available more and more software as easily-installable, secure, snaps. One of the things snapd does to isolate applications is to install snaps in separate folders in /snap/<mysnap>, and also creating $HOME/snap/<mysnap> for storing the snap data. Unfortunately, this changes where many applications expects data files to be as per the usual Unix conventions.

Solving this usually implies maintaining patches that change the file paths. If the $SNAP environment variable is set, the patches use it to find files in the right locations. However, this is cumbersome as upstreams are not always willing to accept the patches, at least until snap package format gets more popular. Also, in some cases we might be interested in snappifying proprietary software, where patches are not an option.

To solve these problems, Michael Terry created a library that uses the LD_PRELOAD trick to intercept calls to glibc. Although the library works really well, it has the disadvantage of not working in all cases, like on programs that perform syscalls directly without using the C library, or on statically compiled binaries.

In this post, I have explored an alternative to using LD_PRELOAD that would solve these issues by going to a lower level and intercepting syscalls using the ptrace syscall.

ptrace is used by programs like gdb or strace for debugging purposes. It is a swiss knife tool that lets us access a process’ memory and registers (the tracee) from another process (the tracer). There are many good tutorials around that explain how to use it, like this or this (from which I have borrowed parts of the code), so here I will focus on how it can be used to modify arbitrary syscall arguments. The concrete problem I had at hand was how to change a call to, say, open() so it ends up opening a file in a different path to the one originally specified by the tracee.

I have developed a proof of concept for this that can be found in github. The code is specific to x86_64, although the concepts behind are applicable to other architectures. As a word of caution, I have preferred to not do as many checks as I should to have cleaner code, so please do not consider it production-ready. I will go now through the code, function by function. Skipping the include directives and the forward declarations we get to main():

int main(int argc, char **argv)
    pid_t pid;
    int status;

    if (argc < 2) {
        fprintf(stderr, "Usage: %s <prog> <arg1> ... <argN>\n", argv[0]);
        return 1;

    if ((pid = fork()) == 0) {
        ptrace(PTRACE_TRACEME, 0, 0, 0);
        kill(getpid(), SIGSTOP);
        return execvp(argv[1], argv + 1);
    } else {
        waitpid(pid, &status, 0);
        return 0;

Here we do all the usual stuff needed when we want to trace a process from the start: we call fork(), then the child executes ptrace() with request PTRACE_TRACEME to indicate that it is willing to be traced. After that, it sends itself a SIGSTOP signal, which makes it stop (the execve call1 will be performed later). At that point, the parent process, which was waiting for a signal from the child, re-starts. The first thing it does is setting ptrace option PTRACE_O_TRACESYSGOOD, which makes the kernel set bit 7 in the signal numbers so we can easily distinguish between system call traps and normal traps. After that, it calls process_signal(), which is defined as

static void process_signals(pid_t child)
    const char *file_to_redirect = "ONE.txt";
    const char *file_to_avoid = "TWO.txt";

    while(1) {
        char orig_file[PATH_MAX];

        /* Wait for open syscall start */
        if (wait_for_open(child) != 0) break;

        /* Find out file and re-direct if it is the target */

        read_file(child, orig_file);

        if (strcmp(file_to_avoid, orig_file) == 0)
            redirect_file(child, file_to_redirect);

        /* Wait for open syscall exit */
        if (wait_for_open(child) != 0) break;

This function is the main loop of the tracer. It waits for the open() syscall from the child to be started, and for it to exit (ptrace signals us both events) by calling wait_for_open(). When an open() call is detected, we read the file that the child wants to open in read_file(), and then we compare with string “TWO.txt”. If there is a match, we change the syscall arguments so “ONE.txt” is opened instead. Next we analyze the different functions that perform the low level stuff, and that contain architecture specific parts:

static int wait_for_open(pid_t child)
    int status;

    while (1) {
        ptrace(PTRACE_SYSCALL, child, 0, 0);
        waitpid(child, &status, 0);
        /* Is it the open syscall (sycall number 2 in x86_64)? */
        if (WIFSTOPPED(status) && WSTOPSIG(status) & 0x80 &&
            ptrace(PTRACE_PEEKUSER, child, sizeof(long)*ORIG_RAX, 0) == 2)
            return 0;
        if (WIFEXITED(status))
            return 1;

wait_for_open() is executed until an open() system call is detected. By calling ptrace with argument PTRACE_SYSCALL, we let the child continue until the next signal or syscall enter/exit. The first time this happens the child, which was stopped after sending itself SIGSTOP, continues its execution and calls execve(). The parent then waits for signals from the child. If the child has stopped due to the signal, the signal number has the 7th bit set (should happen if the signal was triggered due to a syscall as we have set PTRACE_O_TRACESYSGOOD option), and it is the open() syscall (system call number 2 for x86_64), then we return with status 0. If the child has actually exited, the return value is 1. If nothing of this happens, we wait for the next signal. Here we are using PTRACE_PEEKUSER request, which lets us access the tracee user area. This area contains an array with the general purpose registers, and we use offsets defined in <sys/reg.h> to access them. When performing a syscall, RAX register contains the syscall number. However, we use ORIG_RAX offset to grab that number instead of the also existing RAX offset. We do this because RAX is also used to store the return value for the syscall, so the kernel stores the syscall number with offset ORIG_RAX and the return value with offset RAX. This needs to be taking into account especially when processing the exit of the system call. More information can be found here.

Next function is

static void read_file(pid_t child, char *file)
    char *child_addr;
    int i;

    child_addr = (char *) ptrace(PTRACE_PEEKUSER, child, sizeof(long)*RDI, 0);

    do {
        long val;
        char *p;

        val = ptrace(PTRACE_PEEKTEXT, child, child_addr, NULL);
        if (val == -1) {
            fprintf(stderr, "PTRACE_PEEKTEXT error: %s", strerror(errno));
        child_addr += sizeof (long);

        p = (char *) &val;
        for (i = 0; i < sizeof (long); ++i, ++file) {
            *file = *p++;
            if (*file == '\0') break;
    } while (i == sizeof (long));

The read_file() function uses PTRACE_PEEKUSER as in the previous function to retrieve the first argument to the call function, which is the address of the string with the file name. This parameter is stored in the RDI register. Then it uses PTRACE_PEEKTEXT ptrace request, which lets us copy over data from the traced process’ memory. This is performed by words, so we do this in a loop until we find a null byte that indicates the end of the string.

The last function is

static void redirect_file(pid_t child, const char *file)
    char *stack_addr, *file_addr;

    stack_addr = (char *) ptrace(PTRACE_PEEKUSER, child, sizeof(long)*RSP, 0);
    /* Move further of red zone and make sure we have space for the file name */
    stack_addr -= 128 + PATH_MAX;
    file_addr = stack_addr;

    /* Write new file in lower part of the stack */
    do {
        int i;
        char val[sizeof (long)];

        for (i = 0; i < sizeof (long); ++i, ++file) {
            val[i] = *file;
            if (*file == '\0') break;

        ptrace(PTRACE_POKETEXT, child, stack_addr, *(long *) val);
        stack_addr += sizeof (long);
    } while (*file);

    /* Change argument to open */
    ptrace(PTRACE_POKEUSER, child, sizeof(long)*RDI, file_addr);

The redirect_file() function is the most interesting one of this program. It modifies the argument to the open system call, forcing the child to open a different file to the one it originally specified. The main problem with this is that we need to modify the child’s memory space so the new file name is used by the kernel. We can change the tracee’s memory easily by using PTRACE_POKETEXT, the issue is, where can we store it?

A possible option is to save the original string using PTRACE_PEEKTEXT, then overwrite it by using PTRACE_POKETEXT. When we get called after the sycall exits, we copy back the original data. This can work fine in some cases, but it can be problematic if the new file name is longer than the original. We could be overwriting data that is used by other threads, which are not necessarily stopped so they could access that data while the kernel is processing the call. Or that data we are overwriting could be part of another parameter to the syscall, which would not happen for open(), but it is possible for other syscalls like link(). Finally, there is also the possibility that the string we are trying to modify is in a read only segment. Therefore, this is not a safe option.

After noticing this, I considered the option of adding a read-write segment to the binary under study, or to resize an existing one. However, I found there are not that many tools to do this, and those that apparently could do the job like ERESI, were not very intuitive2. Also, we would need to find out where the new segment gets loaded to know where to write, which would complicate the code. Furthermore, I wanted to avoid modifying the binary if possible.

Finally, I concluded that the stack was exactly what I needed: it is of course RW, and a reference address can be found by simply looking at the RSP register. What we have to do is make sure we write in a safe part of the stack. This can be performed by writing to addresses lower than the RSP (that is, the free part of the stack). To achieve this, we “reserve” stack memory so we can write a string of up to PATH_MAX length, and we add up 128 bytes for the red zone (this size is specified by the x86_64 ABI). Note also that syscalls do not write on the process stack: one of the very first things that is done by the Linux kernel syscalls entry point is to switch RSP to a kernel stack. This approach has also the advantage of being automatically thread-friendly, as each one has its own stack. On the other hand, there is the possibility of writing outside the stack. However that risk is quite small nowadays, as stacks of user space programs tend to be big and typically auto-grow on page fault. Another advantage is that we do not need to save and recover memory areas at enter/exit of the syscall, as the tracee should not write anything in the used memory area.

Once it is decided where to write, the implementation is straightforward: first we use PTRACE_PEEKUSER to get the RSP value of the tracee. Then, we write the new file name to a pointer lower than the RSP calculated as explained in the previous paragraph. The data is written by using PTRACE_POKETEXT, word by word. Finally, we change the child’s RDI register so it points to the new address.

Now we can give the program a try. I created a couple of files with content:

$ cat ONE.txt 
This is ONE.txt
$ cat TWO.txt 
This is TWO.txt

Executing the same cat command using redirect we have:

$ gcc redirect.c -o redirect
$ ./redirect cat ONE.txt 
This is ONE.txt
$ ./redirect cat TWO.txt 
This is ONE.txt

Things work as publicized: we modify the file opened by the cat in case it tries to show the content of “TWO.txt”.


As has been seen, the code to make this is remarkably small, which shows the power of the ptrace call. There are indeed parts of this that are very architecture specific, but that is mostly the name of the registers and maybe the red zone size, so it should be relatively straightforward to make it multi-architecture by adding some macros.

Another appreciation is that the example is for the open() syscall, but this technique can be applied to arbitrary arguments which are passed to any syscall as pointers to data in the traced process.

To finish, the main drawback for this solution is performance, as we have to stop (twice) for each syscall invoked by the child, with all the context switches that implies. A possible solution would be to use ptrace in combination with seccomp and seccomp Berkeley Packet Filters, which apparently make possible for the tracer to specify the syscalls that would provoke a trap. That would be, however, matter for another post.

Read more

Occasionally I find myself processing input data which arrives as a stream, like data from files or from a socket, but that has a known structure that can be modeled with C types. For instance, let’s say we are receiving from a socket a parcel that consists on a header of one byte, and a payload that is an integer. A naive way to handle this is the following (simplified for readability) code snippet:

int main(void)
    int fd;
    char *buff;
    struct sockaddr_in addr;
    int vint;
    char vchar;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    buff = malloc(BUFF_SIZE);
    /* Init socket address */
    connect(fd, (struct sockaddr *) &addr, sizeof(addr));

    read(fd, buff, BUFF_SIZE);

    vchar = buff[0];
    vint  = *(int *) &buff[1];
    /* Do something with extracted data, free resources */
    return 0;

Here we get the raw data with a read() call, we read the first byte, then we read an integer by taking a pointer to the second read byte and casting it to a pointer to an integer. (for this example we are assuming that the integer inserted in the stream has the same size and endianness as the CPU ones).

There is a big issue with this: the cast to int *, which is undefined behavior according to the C standard 1. And it is because things can go wrong in at least two ways, first due to pointer aliasing rules, second due to type alignment.

Strict pointer aliasing tells the compiler that it can assume that pointers to different types point to different places in memory. This allows some optimizations, like reordering. Therefore, we could be in trouble if, say, we take &buff[1] into a char * and use it to write a byte in that location, as reordering could hit us. So just do not do that. Let’s also hope that we have a compiler that is not completely insane and does not move our reading by int pointer before the read() system call. We could also disable strict aliasing if we are using GCC with option -fno-strict-aliasing, which by the way is something that the Linux kernel does. At any rate, this is a complex subject and I will not dig into it this time.

We will concentrate in this article on how to solve the other problem, that is, how to access safely types that are not stored in memory in their natural alignment.

The C Standard-Compliant Solution

Before moving further, keep in mind that it is always possible to be strictly compliant with the standard and access safely memory without breaking language rules or using compiler or machine specific tricks. In the example, we could retrieve vint by doing

    vint  =   buff[1] + (buff[2] << 8)
            + (buff[3] << 16) + (buff[4] << 24);

(supposing stored data is little endian).

The issue here is performance: we are implicitly transforming four bytes to integers, then we have to bit-shift three of them, and finally we have to add them up. Note however that this is what we want if data and CPU have different endianness.

Doing Unaligned Memory Accesses

In all machine architectures there is a natural alignment for the different data types. This alignment is usually the size of the types, for instance in 32 bits architectures the alignment for integers is 4, for doubles it is 8, etc. If instances of these types are not stored in memory positions that are multiple of their alignment, we are talking about unaligned access. If we try to access unaligned data either of these can happen:

  • The hardware let’s us access it – but always at a performance penalty.
  • An exception is triggered by the CPU. This type of exception is called bus error 2.

We might be willing to accept the performance penalty 3, which is mitigated by CPU caches and not that noticeable in certain architectures like x86-64 , but we certainly do not want our program to crash. How possible is this? To be honest it is not something I have seen that often. Therefore, as a first analysis step, I checked how easy it was to get bus errors. To do so, I created the following C++ program, access1.cpp (I could not resist to use templates here to reduce the code size):

#include <iostream>
#include <typeinfo>
#include <cstring>

using namespace std;

template <typename T>
void print_unaligned(char *ptr)
    T *val = reinterpret_cast<T *>(ptr);

    cout << "Type is \"" << typeid(T).name()
         << "\" with size " << sizeof(T) << endl;
    cout << val << " *val: " << *val << endl;

int main(void)
    char *mem = new char[128];

    memset(mem, 0, 128);

    print_unaligned<int>(mem + 1);
    print_unaligned<long long>(mem);
    print_unaligned<long long>(mem + 1);
    print_unaligned<long double>(mem);
    print_unaligned<long double>(mem + 1);

    delete[] mem;
    return 0;

The program allocates memory using new char[], which as malloc() in C is guaranteed to allocate memory with the same alignment as the strictest fundamental type. After zeroing the memory, we access mem and mem + 1 by casting to different pointer types, knowing that the second address is odd, and therefore unaligned except for char * access.

I compiled the file with g++ on my laptop, ran it, and got

$ g++ access1.cpp -o access1
$ file access1
access1: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/, for GNU/Linux 2.6.32, BuildID[sha1]=09d0fb19340a10941eef4c3dc4d6eb29383e717d, not stripped
$ ./access1
Type is "i" with size 4
0x16c3c20 *val: 0
Type is "i" with size 4
0x16c3c21 *val: 0
Type is "x" with size 8
0x16c3c20 *val: 0
Type is "x" with size 8
0x16c3c21 *val: 0
Type is "e" with size 16
0x16c3c20 *val: 0
Type is "e" with size 16
0x16c3c21 *val: 0

No error for x86-64. This was expected as Intel architecture is known to support unaligned access by hardware, at a performance penalty (which is apparently quite small these days, see 4).

The second try was with an ARM CPU, compiling for arm-32:

$ g++ access1.cpp -o access1
$ file access1
access1: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=8c3c3e7d77fddd5f95d18dbffe37d67edc716a1c, not stripped
$ ./access1
Type is "i" with size 4
0x47b008 *val: 0
Type is "i" with size 4
0x47b009 *val: 0
Type is "x" with size 8
0x47b008 *val: 0
Type is "x" with size 8
Bus error (core dumped)

Now we get what we were searching for, a legitimate bus error, in this case when accessing a long long from an unaligned address. Commenting out the offending line and letting the program run further showed the error also when accessing a long double from mem + 1.

Fixing Unaligned Memory Accesses

After proving that this could be a real problem, at least for some architectures, I tried to find a solution that would let me do unaligned memory accesses in the most generic way. I could not find anything safe that was strictly following the C standard. However, all C/C++ compilers have ways to define packed structures, and that came to the rescue.

Packed structures are intended to minimize the padding that is introduced by alignment needed by the structure members. They are used when minimizing storage is a big concern. But what is interesting for us is that its members can be unaligned due to the packing, so dereferencing them must take that into account. Therefore, if we are accessing a type in a CPU that does not support unaligned access for that type the compiler must synthesize code that handles this transparently from the point of view of the C program.

To test that this worked as expected, I wrote access2.cpp, which uses GCC attribute __packed__ to define a packed structure:

#include <iostream>
#include <typeinfo>
#include <cstring>

using namespace std;

template <typename T>
struct __attribute__((__packed__)) struct_safe
    T val;

template <typename T>
void print_unaligned(char *ptr)
    struct_safe<T> *safe = reinterpret_cast<struct_safe<T> *>(ptr);

    cout << "Type is \"" << typeid(T).name()
         << "\" with size " << sizeof(T) << endl;
    cout << safe << " safe->val: " << safe->val << endl;

int main(void)
    char *mem = new char[128];

    memset(mem, 0, 128);

    print_unaligned<int>(mem + 1);
    print_unaligned<long long>(mem);
    print_unaligned<long long>(mem + 1);
    print_unaligned<long double>(mem);
    print_unaligned<long double>(mem + 1);

    delete[] mem;
    return 0;

In this case, instead of directly casting to the type, I cast to a pointer to the packed struct and access the type through it.

Compiling and running for x86-64 got the expected result: no error, all worked as before. Then I compiled and ran it in an ARM device:

$ g++ access2.cpp -o access2
$ file access2
access2: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=9a1ee8c2fcd97393a4b53fe12563676d9f2327a3, not stripped
$ ./access2
Type is "i" with size 4
0x391008 safe->val: 0
Type is "i" with size 4
0x391009 safe->val: 0
Type is "x" with size 8
0x391008 safe->val: 0
Type is "x" with size 8
0x391009 safe->val: 0
Type is "e" with size 8
0x391008 safe->val: 0
Type is "e" with size 8
0x391009 safe->val: 0

No bus errors anymore! It worked as expected. To gain some understanding of what was happening behind the curtains, I disassembled the generated ARM binaries. For both access1 and access2, the same instruction was being used when I was getting a value after casting to int: LDR, which unsurprisingly loads a 32-bit word into a register. But for the long long, I found that access1 was using LDRD, which loads double words (8 bytes) from memory, while access2 was using two LDR instructions instead.

This all made a lot of sense, as ARM states that LDR supports access to unaligned data, while LDRD does not 5. Indeed the later is faster, but has this restriction. It was also good to check that there was no penalty for using the packed structure for integers: GCC does a good job to discriminate when the CPU really needs to handle differently unaligned accesses.

GCC cast-align Warning

GCC has a warning that can help to identify points in the code when we might be accessing unaligned data, which is activated with -Wcast-align. It is not part of the warnings that are activated by options -Wall or -Wextra, so we will have to add it explicitly if we want it. The warning is only triggered when compiling for architectures that do not support unaligned access for all types, so you will not see it if compiling only for x86.

When triggered, you will see something like

file.c:28:23: warning: cast increases required alignment of target type [-Wcast-align]
   int *my_int_ptr = (int *) &buf[i];


The moral of this post is that you need to be very careful when casting pointers to a type different to the original one 6. When you need to do that, think about alignment issues first, and also think on your target architectures. There are programs that we want to run on more than one CPU type and too many times we only test in our reference.

Unfortunately the C standard does not give us a standard way of doing efficient access to unaligned data, but most if not all compilers seem to provide ways to do this. If we are using GCC, __attribute__((__packed__)) can help us when we might be doing unaligned accesses. The ARM compiler has a __packed attribute for pointers 7, and I am sure other compilers provide similar machinery. I also recommend to activate -Wcast-align if using GCC, which makes easier to spot alignment issues.

Finally, a word of caution: in most cases you should not do this type of casts. Some times you can define structures and read directly data onto them, some times you can use unions. Bear in mind always the strict pointer aliasing rules, which can hit back. To summarize, think twice before using the sort of trick showed in the post, and use them only when really needed.

Read more

Over the last several months there has been noticeable and growing pain associated with the evolving integration tests around snapd, and given the project goal of being a cross-distribution platform, we are very keen on solving this problem appropriately so that stability is guaranteed everywhere.

With that mindset a more focused effort was made over the last few weeks to produce a tool that can get the project out of those problems, and onto a runway of more pleasant stability. Despite the short amount of time, I’m very happy about the Spread project which resulted from this effort.

Spread is not Jenkins or Travis, and is not a language or library either. Spread is a tool that will very conveniently ship your code to one or more systems, in parallel, and then offer the right set of options so you can run whatever you need to run to make sure the logic is working, and drive it all from the local system. That implies you can run Spread inside Travis, Jenkins, or your terminal, in a similar way to how your unit tests work.

Here is a short list of interesting facts about Spread:

  • Full-system tests with on demand machine allocation.
  • Multi-backend with Linode and LXD (for local runs) out of the box for now.
  • Multi-language since it can run arbitrary remote code.
  • Agent-less and driven via embedded ssh (kudos to Go team).
  • Convenient harness with project+backend+suite+test prepare and restore scripts.
  • Variants feature for test duplication without copy & paste.
  • Great debugging support – add -debug and stop with a shell inside every failure.
  • Reuse of servers – server allocation is fast, but not allocating is faster.
  • Reasonable test outputs with the shell’s +x mode on failures.
  • … and so forth.

This is all well documented, so I’ll just provide one example here to offer a real taste of how the system feels like.

This is spread.yaml, put in the project root to define the basics:

project: spread

            - ubuntu-16.04
            - ubuntu-14.04

path: /home/test

prepare: |
    echo Entering project...
restore: |
    echo Leaving project...

        summary: Integration tests
        prepare: |
            echo Entering suite...
        restore: |
            echo Leaving suite...

The suite name is also the path under which the tests are found.

Then, this is tests/hello/task.yaml:

summary: Greet the world
prepare: |
    echo "Entering task..."
restore: |
    echo "Leaving task..."
    FOO/a: one
    FOO/b: two
execute: |
    echo "Hello world!"
    [ $FOO = one ] || exit 1

The outcome should be almost obvious (intended feature :-). The one curious detail here is the FOO/a and FOO/b environment variables. This is how to introduce variants, which means this one test will in fact become two: first with FOO=one, and then with FOO=two. Now consider that such environment variables can be defined at any level – project, backend, suite, and task – and imagine how easy it is to test small variations without any copy & paste. After cascading takes place (project→backend→suite→task) all environment variables using a given variant key will be present at once on the same execution.

Now let’s try to run this configuration, including the -debug flag so we get a shell on the failures. Note how with a single test we get four different jobs, two variants over two systems, with the variant b failing as instructed:

$ spread -debug

2016/06/11 19:09:27 Allocating lxd:ubuntu-14.04...
2016/06/11 19:09:27 Allocating lxd:ubuntu-16.04...
2016/06/11 19:09:41 Waiting for LXD container to have an address...
2016/06/11 19:09:43 Waiting for LXD container to have an address...
2016/06/11 19:09:44 Allocated lxd:ubuntu-14.04.
2016/06/11 19:09:44 Connecting to lxd:ubuntu-14.04...
2016/06/11 19:09:48 Allocated lxd:ubuntu-16.04.
2016/06/11 19:09:48 Connecting to lxd:ubuntu-16.04...
2016/06/11 19:09:52 Connected to lxd:ubuntu-14.04.
2016/06/11 19:09:52 Sending project data to lxd:ubuntu-14.04...
2016/06/11 19:09:53 Connected to lxd:ubuntu-16.04.
2016/06/11 19:09:53 Sending project data to lxd:ubuntu-16.04...

2016/06/11 19:09:54 Error executing lxd:ubuntu-14.04:tests/hello:b :
+ echo Hello world!
Hello world!
+ [ two = one ]
+ exit 1

2016/06/11 19:09:54 Starting shell to debug...

lxd:ubuntu-14.04 ~/tests/hello# echo $FOO
lxd:ubuntu-14.04 ~/tests/hello# cat /etc/os-release | grep ^PRETTY
PRETTY_NAME="Ubuntu 14.04.4 LTS"
lxd:ubuntu-14.04 ~/tests/hello# exit

2016/06/11 19:09:55 Error executing lxd:ubuntu-16.04:tests/hello:b :
+ echo Hello world!
Hello world!
+ [ two = one ]
+ exit 1

2016/06/11 19:09:55 Starting shell to debug...

lxd:ubuntu-16.04 ~/tests/hello# echo $FOO
lxd:ubuntu-16.04 ~/tests/hello# cat /etc/os-release | grep ^PRETTY
PRETTY_NAME="Ubuntu 16.04 LTS"
lxd:ubuntu-16.04 ~/tests/hello# exit

2016/06/11 19:10:33 Discarding lxd:ubuntu-14.04 (spread-129)...
2016/06/11 19:11:04 Discarding lxd:ubuntu-16.04 (spread-130)...
2016/06/11 19:11:05 Successful tasks
2016/06/11 19:11:05 Aborted tasks: 0
2016/06/11 19:11:05 Failed tasks: 2
    - lxd:ubuntu-14.04:tests/hello:b
    - lxd:ubuntu-16.04:tests/hello:b
error: unsuccessful run

This demonstrates many of the stated goals (parallelism, clarity, convenience, debugging, …) while running on a local system. Running on a remote system is just as easy by using an appropriate backend. The snapd project on GitHub, for example, is hooked up on Travis to run Spread and then ship its tests over to Linode. Here is a real run output with the initial tests being ported, and a basic smoke test.

If you like what you see, by all means please go ahead and make good use of it.

We’re all for more stability and sanity everywhere.


Read more
Colin Ian King

Using PR_SET_PDEATHSIG to reap child processes

The prctl() system call provides a rather useful PR_SET_PDEATHSIG option to allow a signal to be sent to child processes when the parent unexpectedly dies. A quick and dirty mechanism is trigger the SIGHUP or SIGKILL signal to kill the child immediately, or perhaps more elegantly to invoke a resource tidy up before exiting.

In the trivial example below, we use the SIGUSR1 signal to inform the child that the parent has died. I know printf() should not be used in a signal handler, it just makes the example simpler.

 #include <stdlib.h>                                 
#include <unistd.h>
#include <signal.h>
#include <sys/prctl.h>
#include <err.h>

void sigusr1_handler(int dummy)
printf("Parent died, child now exiting\n");

int main()
pid_t pid;

pid = fork();
if (pid < 0)
err(1, "fork failed");
if (pid == 0) {
/* Child */
if (signal(SIGUSR1, sigusr1_handler) == SIG_ERR)
err(1, "signal failed");
if (prctl(PR_SET_PDEATHSIG, SIGUSR1) < 0)
err(1, "prctl failed");

for (;;)
if (pid > 0) {
/* Parent */
printf("Parent exiting...\n");

return 0;

..the child process sits in an infinite loop, performing 60 second sleeps.  The parent sleeps for 5 seconds and then exits.  The child is then sent a SIGUSR1 signal and the handler exits.  In practice the signal handler would be used to trigger a more sophisticated clean up of resources if required.

Anyhow, this is a useful Linux feature that seems to be overlooked.

Read more
Colin Ian King

Smackerel of Opinion

It is approaching the Christmas Holiday season, so it's that time again to write some slightly obfuscated C in a seasonal way.  This year I thought I would try some coloured ASCII art for the output for a little variety.

#define r(s) s[e%(sizeof s-1)]
#include /* */
#define S "%s"/* Have */
#define u printf(/* a */
#define c )J/* Merry */
#define W "H"/* Christmas */
#define e rand()/* and */
#define U(i) v[i]/* a */
#define C(q) q[]=/* Happy */
#define J ;;/* New Year */
#define O [v]/* Colin.I.King */

typedef a
; a m, v[6] ,
H;a main(
){char C(
"*Oo", C(t
)"^~#",Q[ ]=
)".x+*";u S"2"
"m",o,o, o
c while(U(!!m)
<22)u S"%dm%39s\n" ,
o,0 O++>19?42:',',""
c while(0 O++<'~') u S
%39,o,r(s)c for(J){1 O=1
-U(1),srand(v),u S"0;0"W S
"0;2;%dm",o,o,' 'c for(m=0
;m>>4<1;++m){u S"%d;%d"W,o
, m+2,20-m c;for(H=0;H<1+(m
<<1);H++){4 O=!H|H==m<<1 ,
2 O=!(e&05),U(3)=H>m*5/
3,5 O=r(D)J if(4 O|U(
2)){u S"%d;%d;3%cm"
:'*',3 O?2:1+(U(1)^(1&e;)),r(Q),U(5)c}else u S"42;32\
;%dm%c",o,1+3 O,r(t)c u S"0m",o c} }while(m<19)u S"\
%d;19"W S"33;2;7m #\n",o,1+ ++m,o c sleep(m>=-H c}}

The source can be downloaded from here and compiled and run as follows:

gcc snowman.c -o snowman

and press control-C to exit when you have seen enough.

Read more

As detailed in the preliminary release of qml.v1 for Go a couple of weeks ago, my next task was to finish the improvements in its OpenGL API. Good progress has happened since then, and the new API is mostly done and available for experimentation. At the same time, there’s still work to do on polishing edges and on documenting the extensive API. This blog post aims to present the improvements made, their internal design, and also to invite help for finishing the pending details.

Before diving into the new, let’s first have a quick look at how a Go application using OpenGL might look like with qml.v0. This is an excerpt from the old painting example:

import (

func (r *GoRect) Paint(p *qml.Painter) {
        width := gl.Float(r.Int("width"))
        height := gl.Float(r.Int("height"))
        // ...

The example imports both the qml and the gl packages, and then defines a Paint method that makes use of the GL types, functions, and constants from the gl package. It looks quite reasonable, but there are a few relevant shortcomings.

One major issue in the current API is that it offers no means to tell even at a basic level what version of the OpenGL API is being coded against, because the available functions and constants are the complete set extracted from the gl.h header. For example, OpenGL 2.0 has GL_ALPHA and GL_ALPHA4/8/12/16 constants, but OpenGL ES 2.0 has only GL_ALPHA. This simplistic choice was a good start, but comes with a number of undesired side effects:

  • Many trivial errors that should be compile errors fail at runtime instead
  • When the code does work, the developer is not sure about which API version it is targeting
  • Symbols for unsupported API versions may not be available for linking, even if unused

That last point also provides a hint of another general issue: portability. Every system has particularities for how to load the proper OpenGL API entry points. For example, which libraries should be linked with, where they are in the local system, which entry points they support, etc.

So this is the stage for the improvements that are happening. Before detailing the solution, let’s have a look at the new painting example in qml.v1, that makes use of the improved API:

import (

func (r *GoRect) Paint(p *qml.Painter) {
        gl := GL.API(p)
        width := float32(r.Int("width"))
        height := float32(r.Int("height"))
        // ...

With the new API, rather than importing a generic gl package, a version-specific gl/2.0 package is imported under the name GL. That choice of package name allows preserving familiar OpenGL terms for both the functions and the constants (gl.Enable and GL.BLEND, for example). Inside the new Paint method, the gl value obtained from GL.API holds only the functions that are defined for the specific OpenGL API version imported, and the constants in the GL package are also constrained to those available in the given version. Any improper references become build time errors.

To support all the various OpenGL versions and profiles, there are 23 independent packages right now. These packages are of course not being hand-built. Instead, they are generated all at once by a tool that gathers information from various sources. The process can be tersely described as:

  1. A ragel-based parser processes Qt’s qopenglfunctions_*.h header files to collect version-specific functions
  2. The Khronos OpenGL Registry XML is parsed to collect version-specific constants
  3. A number of tweaks defined in the tool’s code is applied to the state
  4. Packages are generated by feeding the state to text templates

Version-specific functions might also be extracted from the Khronos Registry, but there’s a good reason to use information from the Qt headers instead: Qt already solved the portability issue. It works in several platforms, and if somebody is using QML successfully, it means Qt is already using that system’s OpenGL capabilities. So rather than designing a new mechanism to solve the same problem, the qml package now leverages Qt for resolving all the GL function entry points and the linking against available libraries.

Going back to the example, it also demonstrates another improvement that comes with the new API: plain types that do not carry further meaning such as gl.Float and gl.Int were replaced by their native counterparts, float32 and int32. Richer types such as Enum were preserved, and as suggested by David Crawshaw some new types were also introduced to represent entities such as programs, shaders, and buffers. The custom types are all available under the common gl/glbase package that all version-specific packages make use of.

So this is all working and available for experimentation right now. What is left to do is almost exclusively improving the list of function tweaks with two goals in mind, which will be highlighted below as those are areas where help would be appreciated, mainly due to the footprint of the API.

Documentation importing

There are a few hundred functions to document, but a large number of these are variations of the same function. The previous approach was to simply link to the upstream documentation, but it would be much better to have polished documentation attached to the functions themselves. This is the new documentation for MultMatrixd, for example. For now the documentation is being imported manually, but the final process will likely consist of some automation and some manual polishing.

Function polishing

The standard C OpenGL API can often be translated automatically (see BindBuffer or BlendColor), but in other cases the function prototype has to be tweaked to become friendly to Go. The translation tool already has good support for defining most of these tweaks independently from the rest of the machinery. For example, the following logic changes the ShaderSource function from its standard from into something convenient in Go:

name: "ShaderSource",
params: paramTweaks{
        "glstring": {rename: "source", retype: "...string"},
        "length":   {remove: true},
        "count":    {remove: true},
before: `
        count := len(source)
        length := make([]int32, count)
        glstring := make([]unsafe.Pointer, count)
        for i, src := range source {
                length[i] = int32(len(src))
                if len(src) > 0 {
                        glstring[i] = *(*unsafe.Pointer)(unsafe.Pointer(&src))
                } else {
                        glstring[i] = unsafe.Pointer(uintptr(0))

Other cases may be much simpler. The MultMatrixd tweak, for instance, simply ensures that the parameter has the proper length, and injects the documentation:

name: "MultMatrixd",
before: `
        if len(m) != 16 {
                panic("parameter m must have length 16 for the 4x4 matrix")
doc: `
        multiplies the current matrix with the provided matrix.

and as an even simpler example, CreateProgram is tweaked so that it returns a glbase.Program instead of the default uint32.

name:   "CreateProgram",
result: "glbase.Program",

That kind of polishing is where contributions would be most appreciated right now. One valid way of doing this is picking a range of functions and importing and polishing their documentation manually, and while doing that keeping an eye on required tweaks that should be performed on the function based on its documentation and prototype.

If you’d like to help somehow, or just ask questions or report your experience with the new API, please join us in the project mailing list.

Read more

As part of the on going work on Ubuntu Touch phones, I was invited to contribute a Go package to interface with ubuntuoneauth, a C++ and Qt library that authenticates against Ubuntu One using the system account made available by the phone owner. The details of that library and its use case are not interesting for most people right now, but the work to interface with it is a good example to talk about because, besides the result (uoneauth) being an external and independent Go package that extends the qml package, ubuntuoneauth is not a QML library, but rather a plain Qt library. Some of the callbacks use even traditional C++ types that do not inherit from QObject and have no Qt metadata, so offering that functionality from Go nicely takes a bit more work.

What follows are some of the highlights of that integration logic, to serve as a reference for similar extensions in the future. Note that if your interest is in creating QML applications with Go, none of this is necessary and the documentation is a better place to start.

As an initial step, the following examples demonstrate how the original C++ library and the Go package being designed are meant to be used. The first snippet contains the relevant logic taken out of the examples/signing-main.cpp file, tweaked for clarity:

int main(int argc, char *argv[]) {
    UbuntuOne::SigningExample *example = new UbuntuOne::SigningExample(&a);
    QTimer::singleShot(0, example, SLOT(doExample()));

SigningExample::SigningExample(QObject *parent) : QObject(parent) {
    QObject::connect(&service, SIGNAL(credentialsFound(const Token&)),
                     this, SLOT(handleCredentialsFound(Token)));
    QObject::connect(&service, SIGNAL(credentialsNotFound()),
                     this, SLOT(handleCredentialsNotFound()));

void SigningExample::doExample() {

void SigningExample::handleCredentialsFound(Token token) {
    QString authHeader = token.signUrl(url, QStringLiteral("GET"));
void SigningExample::handleCredentialsNotFound() {
    qDebug() << "No credentials were found.";

The example hooks into various signals in the service, one for each possible outcome, and then calls the service’s getCredentials method to initiate the process. If successful, the credentialsFound signal is emitted with a Token value that is able to sign URLs, returning an HTTP header that can authenticate a request.

That same process is more straightforward when using the library from Go:

service := uoneauth.NewService(engine)
token, err := service.Token()
if err != nil {
        return err
signature := token.HeaderSignature("GET", url)

Again, this gets a service, a token from it, and signs a URL, in a “forward” way.

So the goal is turning the initial C++ workflow into this simpler Go API. A good next step is looking into how the NewService function is implemented:

func NewService(engine *qml.Engine) *Service {
        s := &Service{reply: make(chan reply, 1)}

        qml.RunMain(func() {
                s.obj = *qml.CommonOf(C.newSSOService(), engine)

        runtime.SetFinalizer(s, (*Service).finalize)

        s.obj.On("credentialsFound", s.credentialsFound)
        s.obj.On("credentialsNotFound", s.credentialsNotFound)
        s.obj.On("twoFactorAuthRequired", s.twoFactorAuthRequired)
        s.obj.On("requestFailed", s.requestFailed)
        return s

NewService creates the service instance, and then asks the qml package to run some logic in the main Qt thread via RunMain. This is necessary because a number of operations in Qt, including the creation of objects, are associated with the currently running thread. Using RunMain in this case ensures that the creation of the C++ object performed by newSSOService happens in the main Qt thread (the “GUI thread”).

Then, the address of the C++ UbuntuOne::SSOService type is handed to CommonOf to obtain a Common value that implements all the common logic supported by C++ types that inherit from QObject. This is an unsafe operation as there’s no way for CommonOf to guarantee that the provided address indeed points to a C++ value with a type that inherits from QObject, so the call site must necessarily import the unsafe package to provide the unsafe.Pointer parameter. That’s not a problem in this context, though, since such extension packages are necessarily dealing with unsafe logic in either case.

The obtained Common value is then assigned to the service’s obj field. In most cases, that value is instead assigned to an anonymous Common field, as done in qml.Window for example. Doing so means qml.Window values implement the qml.Object interface, and may be manipulated as a generic object. For the new Service type, though, the fact that this is a generic object won’t be disclosed for the moment, and instead a simpler API will be offered.

Following the function logic further, a finalizer is then registered to ensure the C++ value gets deallocated if the developer forgets to Close the service explicitly. When doing that, it’s important to ensure the Close method drops the finalizer when called, not only to facilitate the garbage collection of the object, but also to avoid deallocating the same value twice.

The next four lines in the function should be straightforward: they register methods of the service to be called when the relevant signals are emitted. Here is the implementation of two of these methods:

func (s *Service) credentialsFound(token *Token) {
        s.sendReply(reply{token: token})

func (s *Service) credentialsNotFound() {
        s.sendReply(reply{err: ErrNoCreds})

func (s *Service) sendReply(r reply) {
        select {
        case s.reply <- r:
                panic("internal error: multiple results received")

Handling the signals consists of just sending the reply over the channel to whoever initiated the request. The select statement in sendReply just ensures that the invariant of having a reply per request is not broken without being noticed, as that would require a slightly different design.

There’s one more point worth observing in this logic: the token value received as a parameter in credentialsFound was already converted into the local Token type. In most cases, this is unnecessary as the parameter is directly useful as a qml.Object or as another native type (int, etc), but in this case UbuntuOne::Token is a plain C++ type that does not inherit from QObject, so the default signal parameter that would arrive in the Go method has only a type name and the value address.

Instead of taking the plain value, it is turned into a more useful one by registering a converter with the qml package:

func convertToken(engine *qml.Engine, obj qml.Object) interface{} {
        // Copy as the one held by obj is passed by reference.
        addr := unsafe.Pointer(obj.Property("plainAddr").(uintptr))
        token := &Token{C.tokenCopy(addr)}
        runtime.SetFinalizer(token, (*Token).finalize)
        return token

func init() {
        qml.RegisterConverter("Token", convertToken)

Given that setup, the Service.Token method may simply call the getCredentials method from the underlying UbuntuOne::SSOService method to fire a request, and block waiting for a reply:

func (s *Service) Token() (*Token, error) {
        qml.RunMain(func() {
        reply := <-s.reply
        return reply.token, reply.err

The lock ensures that a second request won’t take place before the first one is done, forcing the correct sequencing of replies. Given the current logic, this isn’t strictly necessary since all requests are equivalent, but this will remain correct even if other methods from SSOService are added to this interface.

The returned Token value may then be used to sign URLs by simply calling the respective underlying method:

func (t *Token) HeaderSignature(method, url string) string {
        cmethod, curl := C.CString(method), C.CString(url)
        cheader := C.tokenSignURL(t.addr, cmethod, curl, 0)
        defer freeCStrings(cmethod, curl, cheader)
        return C.GoString(cheader)

No care about using qml.RunMain has to be taken in this case, because UbuntuOne::Token is a simple C++ type that does not interact with the Qt machinery.

This completes the journey of creating a package that provides access to the ubuntuoneauth library from Go code. In many cases it’s a better idea to simply rewrite the logic in Go, but there will be situations similar to this library, where either rewriting would take more time than reasonable, or perhaps delegating the maintenance and stabilization of the underlying logic to a different team is the best thing to do. In those cases, an approach such as the one exposed here can easily solve the problem.

Read more
Colin Ian King

Finding small bugs

Over the past few months I've been using static code analysis tools such as cppcheck, Coverity Scan and also smatch on various open source projects.   I've generally found that most open source code is fairly well written, however, most suffer a common pattern of bugs on the error handling paths.  Typically, these are not free'ing up memory or freeing up memory incorrectly.  Other frequent bugs are not initialising variables and overly complex code paths that introduce subtle bugs when certain rare conditions are occur.  Most of these bugs are small and very rarely hit; some of these just silently do things wrong while others can potentially trigger segmentation faults.

The --force option in cppcheck to force the checking of every build configuration has been very useful in finding code paths that are rarely built, executed or tested and hence are likely to contain bugs.

I'm coming to the conclusion that whenever I have to look at some new code I should take 5 minutes or so throwing it at various static code analysis tools to see what pops out and being a good citizen and fixing these and sending these upstream. It's not too much effort and helps reduce some of those more obscure bugs that rarely bite but do linger around in code.

Read more

As recently announced, my latest endeavor at Canonical is to enable graphical client-side development with the Go language via a new qml Go package that integrates the language with Qt’s QML framework.

The QML framework solves the problem of designing graphic applications via a language that offers a convenient mix of declarative and procedural features. As a very simple example, if the following QML content is loaded by itself, it will draw a square centralized inside a window:

import QtQuick 2.0

Rectangle {
        width: 640
        height: 280
        color: "black"

        Rectangle {
                width: 250; height: 250
                anchors.centerIn: parent
                color: "red"

                MouseArea {
                        anchors.fill: parent
                        onClicked: console.log("Stop poking me!")

As expected, it would look similar to:

Centralized Rectangle

The red rectangle will remain centered as the window is resized, and will print a message every time it is clicked. The clicked signal was hooked into logic implemented in JavaScript in this case, but it might as well have been hooked into custom Go logic with the same ease using the qml package. With very minor changes, it could also be something much more interesting than a red rectangle, such as a modern web browser:

import QtQuick 2.0
import QtWebKit 3.0

Rectangle {
        width: 640
        height: 280
        color: "black"

        WebView {
                width: 250; height: 250
                anchors.centerIn: parent
                url: ""

The result would be:

Web Page Rectangle

It’s worth noting that this wasn’t just a screenshot of a web page, but rather a tiny fully functional web browser accessing a web page, easily embedded into the QML application to satisfy whatever browsey needs the author might have.

Being able to leverage all these existing building blocks so comfortably is what makes a graphics platform attractive to develop in, and is what inspired the on going effort to have that platform working under the Go language. As we can observe on some of the work published, that side of things seems to be going well.

Now, the next level up is to enable people to create such building blocks without leaving the Go language. The initial step towards that is already committed to the code repository. It enables Go types to be created and seamlessly integrated into the QML language. For example, consider this simple Go type:

type GoType struct {
        Text string

func (v *GoType) OnTextChanged() {
        fmt.Println("Text changed...")

If this type is made available to QML content via the qml.RegisterTypes function, it may then be used as any other native type. This would work as a QML file, for instance:

import GoExtensions 1.0

GoType {
        text: "Have you signed up for GopherCon yet?"

There’s a relevant detail, though: this type has no visible content by itself. This means that displaying something must be done via interactions with other QML items, or via external systems (dbus, for example).

Solving that problem has been one of my objectives in the last month, and although it’s not yet publicly available, the work is in a good-enough state that I feel comfortable talking about it.

As described, the main goal is enabling these custom types to paint. This will be achieved by exposing a Go package that offers an OpenGL API, and defining methods that the custom types have to implement for being able to render at appropriate times. Although the details aren’t finalized, the current draft can run familiar OpenGL code similar to the following:

func (v *GoType) Paint(p *qml.Painter) {
        obj := p.Object()
        width := gl.Float(obj.Int("width"))
        height := gl.Float(obj.Int("height"))

        gl.BlendFunc(gl.SRC_ALPHA, gl.ONE_MINUS_SRC_ALPHA);
        gl.Color4f(1.0, 1.0, 1.0, 0.8)
        gl.Vertex2f(0, 0)
        gl.Vertex2f(width, 0)
        gl.Vertex2f(width, height)
        gl.Vertex2f(0, height)

        gl.Color4f(0.0, 0.0, 0.0, 1.0)
        gl.Vertex2f(0, 0)
        gl.Vertex2f(width, height)
        gl.Vertex2f(width, 0)
        gl.Vertex2f(0, height)

One interesting point to realize is that, again, this isn’t just rendering into a lonely OpenGL context. The rendered content goes into a framebuffer that lives within the overall QML scene graph, and has the same integration capabilities as every other QML item.

In the following animated image, for example, the white square was generated with the above GL code, and was rendered next to other native items with this QML source:

Go + OpenGL Example

Note the smooth blending and proper overlapping established in the scene graph based on the ordering of elements in the QML source. The animation was also driven by QML without special handling of the content rendered from Go.

Another relevant point in terms of the integration with Go is that the GL code demonstrated above is considered old-school these days. The modern version uses fewer calls, making better use of the graphics card memory by transferring, tracking, and handling data such as vertexes in contiguous arrays. This reduces the impact of the cross-context calls that depart and rejoin the Go runtime, and can unlock interesting use cases that the overhead might otherwise prevent.

In terms of availability of these features, we’re about to enter a holiday season, so they should be polished enough for a first public review at some point in January. Looking forward to it.

UPDATE (2014-01-27): These features are now publicly available. The painting example demonstrates the exact scenario pictured above.

UPDATE (2014-02-18): New screencast demonstrates some of the OpenGL features of Go QML.

Read more
Colin Ian King

Infinite Snowflake

So it's close to the Christmas Holiday season, so I thought I would spend some time writing something seasonal.   My son, who is rather mathematically minded, was asking me about finding reoccurring digits in reciprocals of primes, for example 1/7, 1/11, 1/13 etc., and somehow this got me looking at the binary digits of Pi, and after a little more browsing around wikipedia onto the Thue-Morse binary sequence.

The Thue-Morse binary sequence starts with zero and one successively appends to the existing sequence the boolean compliment of the sequence so far.   One interesting feature is that a turtle graphics program can be written to control the turtle by feeding it the successive digits from the sequence so that:

  • A zero moves the cursor forward by a step
  • A one turns the cursor anti-clockwise by 60 degress (or pi/3 radians)
 ..and a Koch snowflake is drawn.  So, I've implemented this in C and obfuscated it a little just for fun:

#include <math.h>
#define Q (1<<9)
#define M /**/for
#define E(b,a) {\
J q=1^(X+b)[3+Q*\
int /*/*/ typedef
#define W double
#define z (Q<<5)

J;J X[3+(Q *Q)]
,t[z ] ;J main(
){ W o= Q >>1,V=
o/5, a=0; M (1 [X
]=1; X[1] <z;X
[1 ] *=2) for(
X[2] =0;X [2]<
X[1] ;X[2 ]++,
t[X[ 2]+X [1]]
=1^t [X[2 ]]);
for( 0[X] =0;X
[0]< 3;0[ X]++
)for (X[2] =0;X
[2]< z;X[ 2]++
)if( t[X[ 2]])
a-=( M_PI /3.0
);else{o+= cos(a ); V+=

sin(a);X[3+( int)o+Q*(int)V]|=1;}printf(
"P6 %d %d 1 ",Q,Q);M(1[X] =0;X[1]<Q;X[1]
++)M(2[X]=0;X[ 2]<Q;X[2]++)E(2[X],X[1]);
return 0;} /* Colin Ian King 2013 */

To build and run:
gcc koch-binary.c -lm -o koch-binary 
./koch-binary | ppmtojpeg > koch-binary.jpg
And the result is a koch-snowflake:

The original source code can be found here. It contains a 512 x 512 turtle graphics plotter, a Thue-Morse binary sequence generator and a PPM output backend. Anyway, have fun figuring out how it works (it is only mildly obfuscated) and have a great Christmas!

Read more

Here is a small programming brain teaser for the weekend:

Assume uf is an unsigned integer with 64 bits that holds the IEEE-754 representation for a binary floating point number of that size.

The questions are:

1. How to tell if uf represents an integer number?

2. How to serialize the absolute value of such an integer number in the minimum number of bytes possible, using big-endian ordering and the 8th bit as a continuation flag? For example, float64(1<<70 + 3<<21) serializes as:


The background for this problem is that the current draft of the strepr specification mentions that serialization. Some languages, such as Python and Ruby, implement transparent arbitrary precision integers, and that makes implementing the specification easier.

For example, here is a simple Python interactive session that arrives at the result provided above exploring the native integer representation.

>>> f = float((1<<70) + (3<<21))
>>> v = int(f)
>>> l = [v&0x7f]
>>> v >>= 7
>>> while v > 0:
...     l.append(0x80 | (v&0x7f))
...     v >>= 7
>>> l.reverse()
>>> "".join("%02x" % i for i in l)

Python makes the procedure simpler because it is internally converting the float into an integer of appropriate precision via standard C functions, and then offering bit operations on the resulting value.

The suggested brain teaser can be efficiently solved using just the IEEE-754 representation, though, and it’s relatively easy because the problem is being constrained to the integer space.

A link to an implementation will be provided next week.

UPDATE: The logic is now available as part of the reference implementation of strepr.

Read more

Note: This is a candidate version of the specification. This note will be removed once v1 is closed, and any changes will be described at the end. Please get in touch if you’re implementing it.



This specification defines strepr, a stable representation that enables computing hashes and cryptographic signatures out of a defined set of composite values that is commonly found across a number of languages and applications.

Although the defined representation is a serialization format, it isn’t meant to be used as a traditional one. It may not be seen entirely in memory at once, or written to disk, or sent across the network. Its role is specifically in aiding the generation of hashes and signatures for values that are serialized via other means (JSON, BSON, YAML, HTTP headers or query parameters, configuration files, etc).

The format is designed with the following principles in mind:

Understandable — The representation must be easy to understand to increase the chances of it being implemented correctly.

Portable — The defined logic works properly when the data is being transferred across different platforms and implementations, independently from the choice of protocol and serialization implementation.

Unambiguous — As a natural requirement for producing stable hashes, there is a single way to process any supported value being held in the native form of the host language.

Meaning-oriented — The stable representation holds the meaning of the data being transferred, not its type. For example, the number 7 must be represented in the same way whether it’s being held in a float64 or in an uint16.

Supported values

The following values are supported:

  • nil: the nil/null/none singleton
  • bool: the true and false singletons
  • string: raw sequence of bytes
  • integers: positive, zero, and negative integer numbers
  • floats: IEEE754 binary floating point numbers
  • list: sequence of values
  • map: associative value→value pairs


nil = 'z'

The nil/null/none singleton is represented by the single byte 'z' (0x7a).

bool = 't' / 'f'

The true and false singletons are represented by the bytes 't' (0x74) and 'f' (0x66), respectively.

unsigned integer = 'p' <value>

Positive and zero integers are represented by the byte 'p' (0x70) followed by the variable-length encoding of the number.

For example, the number 131 is always represented as {0x70, 0x81, 0x03}, independently from the type that holds it in the host language.

negative integer = 'n' <absolute value>

Negative integers are represented by the byte 'n' (0x6e) followed by the variable-length encoding of the absolute value of the number.

For example, the number -131 is always represented as {0x6e, 0x81, 0x03}, independently from the type that holds it in the host language.

string = 's' <num bytes> <bytes>

Strings are represented by the byte 's' (0x73) followed by the variable-length encoding of the number of bytes in the string, followed by the specified number of raw bytes. If the string holds a list of Unicode code points, the raw bytes must contain their UTF-8 encoding.

For example, the string hi is represented as {0x73, 0x02, 'h', 'i'}

Due to the complexity involved in Unicode normalization, it is not required for the implementation of this specification. Consequently, Unicode strings that if normalized would be equal may have different stable representations.

binary float = 'd' <binary64>

32-bit or 64-bit IEEE754 binary floating point numbers that are not holding integers are represented by the byte 'd' (0x64) followed by the big-endian 64-bit IEEE754 binary floating point encoding of the number.

There are two exceptions to that rule:

1. If the floating point value is holding a NaN, it must necessarily be encoded by the following sequence of bytes: {0x64, 0x7f, 0xf8, 0x00 0x00, 0x00, 0x00, 0x00, 0x00}. This ensures all NaN values have a single representation.

2. If the floating point value is holding an integer number it must instead be encoded as an unsigned or negative integer, as appropriate. Floating point values that hold integer numbers are defined as those where floor(v) == v && abs(v) != ∞.

For example, the value 1.1 is represented as {0x64, 0x3f, 0xf1, 0x99, 0x99, 0x99, 0x99, 0x99, 0x9a}, but the value 1.0 is represented as {0x70, 0x01}, and -0.0 is represented as {0x70, 0x00}.

This distinction means all supported numbers have a single representation, independently from the data type used by the host language and serialization format.

list = 'l' <num items> [<item> ...]

Lists of values are represented by the byte 'l' (0x6c), followed by the variable-length encoding of the number of pairs in the list, followed by the stable representation of each item in the list in the original order.

For example, the value [131, -131] is represented as {0x6c, 0x70, 0x81, 0x03, 0x6e, 0x81, 0x03, 0x65}

map = 'm' <num pairs> [<item key> <item value>  ...]

Associative maps of values are represented by the byte 'm' (0x6d) followed by the variable-length encoding of the number of pairs in the map, followed by an ordered sequence of the stable representation of each key and value in the map. The pairs must be sorted so that the stable representation of the keys is in ascending lexicographical order. A map must not have multiple keys with the same representation.

For example, the map {"a": 4, 5: "b"} is always represented as {0x6d, 0x02, 0x70, 0x05, 0x73, 0x01, 'b', 0x73, 0x01, 'a', 0x70, 0x04}.

Variable-length encoding

Integers are variable-length encoded so that they can be represented in short space and with unbounded size. In an encoded number, the last byte holds the 7 least significant bits of the unsigned value, and zero as the eight bit. If there are remaining non-zero bits, the previous byte holds the next 7 bits, and the eight bit is set on to flag the continuation to the next byte. The process continues until there are non-zero bits remaining. The most significant bits end up in the first byte of the encoded value, which must necessarily not be 0x80.

For example, the number 128 is variable-length encoded as {0x81, 0x00}.

Reference implementation

A reference implementation is available, including a test suite which should be considered when implementing the specification.


draft1 → draft2

  • Enforce the use of UTF-8 for Unicode strings and explain why normalization is being left out.
  • Enforce a single NaN representation for floats.
  • Explain that map key uniqueness refers to the representation.
  • Don’t claim the specification is easy to implement; floats require attention.
  • Mention reference implementation.

Read more

A few years ago, when I started pondering about the possibility of porting juju to the Go language, one of the first pieces of the puzzle that were put in place was goyaml: a Go package to parse and serialize a yaml document. This was just an experiment and, as a sane route to get started, a Go layer that does all the language-specific handling was written on top of the libyaml C scanner, parser, and serializer library.

This was a good initial plan, but for a number of reasons the end goal was always to have a pure Go implementation. Having a C layer in a Go program slows down builds significantly due to the time taken to build the C code, makes compiling in other platforms and cross-compiling harder, has certain runtime penalties, and also forces the application to drop the memory safety guarantees offered by Go.

For these reasons, over the last couple of weeks I took a few hours a day to port the C backend to Go. The total time, considering full time work days, would be equivalent to about a week worth of work.

The work started on the scanner and parser side of the library. This took most of the time, not only because it encompassed more than half of the code base, but also because the shared logic had to be ported too, and there was a need to understand which patterns were used in the old code and how they would be converted across in a reasonable way.

The whole scanner and parser plus header files, or around 5000 code lines of C, were ported over in a single shot without intermediate runs. To steer the process in a sane direction, gofmt was called often to reformat the converted code, and then the project was compiled every once in a while to make sure that the pieces were hanging together properly enough.

It’s worth highlighting how useful gofmt was in that process. The C code was converted in the most convenient way to type it, and then gofmt would quickly put it all together in a familiar form for analysis. Not rarely, it would also point out trivial syntactic issues. A double win.

After the scanner and parser were finally converted completely, the pre-existing Go unmarshaling logic was shifted to the new pure implementation, and the reading side of the test suite could run as-is. Naturally, though, it didn’t work out of the box.

To quickly pick up the errors in the new implementation, the C logic and the Go port were put side-by-side to run the same tests, and tracing was introduced in strategic points of the scanner and parser. With that, it was easy to spot where they diverged and pinpoint the human errors.

It took about two hours to get the full suite to run successfully, with a handful of bugs uncovered. Out of curiosity, the issues were:

  • An improperly dropped parenthesis affected the precedence of an expression
  • A slice was being iterated with copying semantics where a reference was necessary
  • A pointer arithmetic conversion missed the base where there was base+offset addressing
  • An inner scoped variable improperly shadowed the outer scope

The same process of porting and test-fixing was then repeated on the the serializing side of the project, in a much shorter time frame for the reasons cited.

The resulting code isn’t yet idiomatic Go. There are several signs in it that it was ported over from C: the name conventions, the use of custom solutions for buffering and reader/writer abstractions, the excessive copying of data due to the need of tracking data ownership so the simple deallocating destructors don’t double-free, etc. It’s also been deoptimized, due to changes such as the removal of macros and in many cases its inlining, and the direct expansion of large unions which causes some core objects to grow significantly.

At this point, though, it’s easy to gradually move the code base towards the common idiom in small increments and as time permits, and cleaning up those artifacts that were left behind.

This code will be made public over the next few days via a new goyaml release. Meanwhile, some quick facts about the process and outcome follows.

Lines of code

According to cloc, there was a total of 7070 lines of C code in .c and .h files. Of those, 6727 were ported, and 342 were 12 functions that were left unconverted as being unnecessary right now. Those 6727 lines of C became 5039 lines of Go code in a mostly one-to-one dumb translation.

That difference comes mainly from garbage collection, lack of forward declarations, standard helpers such as append, range-based for loops, first class slice type with length and capacity, internal OOM handling, and so on.

Future work code can easily increase the difference further by replacing some of the logic ported with more sensible options available in Go, such as standard abstractions for readers and writers, buffered writing support as availalbe in the standard library, etc.

Code clarity and safety

In the specific context of the work done, which is of a scanner, parser and serializer, the slice abstraction is responsible for noticeable clarity gains in the code, when compared to the equivalent logic based on pointer arithmetic. It also gives a much more comforting guarantee of correctness of the written code due to bound-checking.


While curious, this shouldn’t be taken as a performance comparison between the two languages, as it is comparing a fine tuned C implementation with something that is worse than a direct one-to-one port: not only it hasn’t seen any time at all on preventing waste, but the original logic was deoptimized due to changes such as the removal of inlining macros and the expansion of large unions. There are many obvious changes to be done for improving performance.

With that out of the way, in a simple decoding benchmark the C-backed decoder runs on about 37% of the time taken by the out-of-the-box deoptimized Go port.

Output size

The previous goyaml.a Go package file had 1463kb. The new one has 1016kb. This difference includes glue code generated for the integration.

Considering only the .c and .h files involved in the port, the C object code generated with the standard flags used by the go build tool (-g -O2) sums up to 789kb. The equivalent Go code with the standard settings compiles to 664kb. The 12 functions not ported are also part of that difference, so the difference is pretty much negligible.

Build time

Building the 8 .c files alone takes 3.6 seconds with the standard flags used by the go build tool (-g -O2). After the port, building the entire Go project with the standard settings takes 0.3 seconds.

Mechanical changes

Many of the mechanical changes were done using regular expressions. Excluding the trivial ones, about a dozen regular expressions were used to swap variable and type names, drop parenthesis, place brackets in the right locations, convert function declarations, and so on.

Read more

Last week I was part of a rant with a couple of coworkers around the fact Go handles errors for expected scenarios by returning an error value instead of using exceptions or a similar mechanism. This is a rather controversial topic because people have grown used to having errors out of their way via exceptions, and Go brings back an improved version of a well known pattern previously adopted by a number of languages — including C — where errors are communicated via return values. This means that errors are in the programmer’s face and have to be dealt with all the time. In addition, the controversy extends towards the fact that, in languages with exceptions, every unadorned error comes with a full traceback of what happened and where, which in some cases is convenient.

All this convenience has a cost, though, which is rather simple to summarize:

Exceptions teach developers to not care about errors.

A sad corollary is that this is relevant even if you are a brilliant developer, as you’ll be affected by the world around you being lenient towards error handling. The problem will show up in the libraries that you import, in the applications that are sitting in your desktop, and in the servers that back your data as well.

Raymond Chen described the issue back in 2004 as:

Writing correct code in the exception-throwing model is in a sense harder than in an error-code model, since anything can fail, and you have to be ready for it. In an error-code model, it’s obvious when you have to check for errors: When you get an error code. In an exception model, you just have to know that errors can occur anywhere.

In other words, in an error-code model, it is obvious when somebody failed to handle an error: They didn’t check the error code. But in an exception-throwing model, it is not obvious from looking at the code whether somebody handled the error, since the error is not explicit.
When you’re writing code, do you think about what the consequences of an exception would be if it were raised by each line of code? You have to do this if you intend to write correct code.

That’s exactly right. Every line that may raise an exception holds a hidden “else” branch for the error scenario that is very easy to forget about. Even if it sounds like a pointless repetitive task to be entering that error handling code, the exercise of writing it down forces developers to keep the alternative scenario in mind, and pretty often it doesn’t end up empty.

It isn’t the first time I write about that, and given the controversy that surrounds these claims, I generally try to find one or two examples that bring the issue home. So here is the best example I could find today, within the pty module of Python’s 3.3 standard library:

def spawn(argv, master_read=_read, stdin_read=_read):
    """Create a spawned process."""
    if type(argv) == type(''):
        argv = (argv,)
    pid, master_fd = fork()
    if pid == CHILD:
        os.execlp(argv[0], *argv)

Every time someone calls this logic with an improper executable in argv there will be a new Python process lying around, uncollected, and unknown to the application, because execlp will fail, and the process just forked will be disregarded. It doesn’t matter if a client of that module catches that exception or not. It’s too late. The local duty wasn’t done. Of course, the bug is trivial to fix by adding a try/except within the spawn function itself. The problem, though, is that this logic looked fine for everybody that ever looked at that function since 1994 when Guido van Rossum first committed it!

Here is another interesting one:

$ make clean
Sorry, command-not-found has crashed! Please file a bug report at:

Please include the following information with the report:

command-not-found version: 0.3
Python version: 3.2.3 final 0
Distributor ID: Ubuntu
Description:    Ubuntu 13.04
Release:        13.04
Codename:       raring
Exception information:

unsupported locale setting
Traceback (most recent call last):
  File "/.../CommandNotFound/", line 24, in crash_guard
  File "/usr/lib/command-not-found", line 69, in main
  File "/usr/lib/command-not-found", line 40, in enable_i18n
    locale.setlocale(locale.LC_ALL, '')
  File "/usr/lib/python3.2/", line 541, in setlocale
    return _setlocale(category, locale)
locale.Error: unsupported locale setting

That’s a pretty harsh crash for the lack of locale data in a system-level application that is, ironically, supposed to tell users what packages to install when commands are missing. Note that at the top of the stack there’s a reference to crash_guard. This function has the intent of catching all exceptions right at the edge of the call stack, and displaying a detailed system specification and traceback to aid in fixing the problem.

Such “parachute catching” is a fairly common pattern in exception-oriented programming and tends to give developers the false sense of having good error handling within the application. Rather than actually guarding the application, though, it’s just a useful way to crash. The proper thing to have done in the case above would be to print a warning, if at all, and then let the program run as usual. This would have been achieved by simply wrapping that one line as in:

    locale.setlocale(locale.LC_ALL, '')
except Exception as e:
    print("Cannot change locale:", e)

Clearly, it was easy to handle that one. The problem, again, is that it was very natural to not do it in the first place. In fact, it’s more than natural: it actually feels good to not be looking at the error path. It’s less code, more linear, and what’s left is the most desired outcome.

The consequence, unfortunately, is that we’re immersing ourselves in a world of brittle software and pretty whales. Although more verbose, the error result style builds the correct mindset: does that function or method have a possible error outcome? How is it being handled? Is that system-interacting function not returning an error? What is being done with the problem that, of course, can happen?

A surprising number of crashes and plain misbehavior is a result of such unconscious negligence.

Read more

This weekend the proper environment settled out for sorting a pet peeve that shows up every once in a while when coding: writing logic that interacts with other applications in the system via their stdin and stdout streams is often more involved than it should be, which seems pretty ironic when sitting in front of a Unix-like system.

Rather than going over the trouble of setting up pipes and hooking them up in a custom way, often applications end up just delegating the job to /bin/sh, which is not ideal for a number of reasons: argument formatting isn’t straightforward, injecting custom application-defined logic is hard, which means even simple tasks that might be easily achieved by the language end up shelling out to further external applications, and so on.

In an attempt to address that, I’ve spent some time working on an experimental Go package that is being released today: pipe.

I hope you like it as well, and please drop me a note if you find any issues.

Read more

?Rob Pike just wrote an article/talk that is the best background on the origins of Go yet.

It surprises me how much his considerations match my world view pre-Go, and in a sense give me a fulfilling explanation about why I got hooked into the language. I still recall sitting in a hotel years ago with Jamu Kakar while we went through the upcoming C++0x standard (now C++11) and got perplexed about how someone could think that having details such as rvalue references and move constructors into the language specification was something reasonable.

Rob also expressed again the initial surprise that developers using languages such as Python and Ruby were more often the ones willing to migrate towards Go, rather than ones using C++, with some reasonable explanations about why that is so. While I agree with his considerations, I see Python going through the same kind of issue that caused C++ to be what it is today.

Consider this excerpt from PEP 0380 as evidence:

If yielding of values is the only concern, this can be performed without much difficulty using a loop such as

for v in g:
    yield v

However, if the subgenerator is to interact properly with the caller in the case of calls to send(), throw() and close(), things become considerably more difficult. As will be seen later, the necessary code is very complicated, and it is tricky to handle all the corner cases correctly.

A new syntax will be proposed to address this issue. In the simplest use cases, it will be equivalent to the above for-loop, but it will also handle the full range of generator behaviour, and allow generator code to be refactored in a simple and straightforward way.

This description has the same DNA that creates the C++ problem Rob talks about. Don’t get me wrong, I’m sure yield from will make a lot of people very happy, and that’s exactly the tricky part. It’s easy and satisfying to please a selection of users, but often that leads to isolated solutions that create new cognitive load and new corner cases that in turn lead to new requirements.

The history of generators in Python is specially telling:

  • PEP 0234 [30-Jan-2001] – Iterators – Accepted
  • PEP 0255 [18-May-2001] – Simple Generators – Accepted
  • PEP 0288 [21-Mar-2002] – Generators Attributes and Exceptions – Withdrawn
  • PEP 0289 [30-Jan-2002] – Generator Expressions – Accepted
  • PEP 0325 [25-Aug-2003] – Resource-Release Support for Generators – Rejected
  • PEP 0342 [10-May-2005] – Coroutines via Enhanced Generators – Accepted
  • PEP 0380 [13-Feb-2009] – Syntax for Delegating to a Subgenerator – Accepted

You see the rabbit hole getting deeper? I’ll clarify it further by rephrasing the previous quote from PEP 0380:

If [feature from PEP 0255] is the only concern, this can be performed without much difficulty using a loop [...] However, if the subgenerator is to interact properly with [changes from PEP 0342] things become considerably more difficult. [So we need feature from PEP 0380.]

Yet, while the language grows handling self-inflicted micro-problems, the real issue is still not solved. All of these features are simplistic forms of concurrency and communication, that don’t satisfy the developers, causing community fragmentation.

This happened to C++, to Python, and to many other languages. Go seems slightly special in that regard in the sense that its core development team has an outstanding respect for simplicity, yet dares to solve the difficult problems at their root, while keeping these solutions orthogonal so that they support each other. Less is more, and is not always straightforward.

Read more
Colin Ian King

open() using O_WRONLY | O_RDWR

One of the lesser known Linux features is that one can open a file with the flags O_WRONLY | O_RDWR.   One requires read and write permission to perform the open(), however, the flags indicate that no reading or writing is to be done on the file descriptor.   It is useful for operations such as ioctl() where we also want to ensure we don't actually do any reading or writing to a device.  A bunch of utilities such as LILO seem to use this obscure feature. 

LILO defines these flags as O_NOACCESS as follows:

 #ifdef O_ACCMODE  
 /* open a file for "no access" */  
 # define O_NOACCESS 3  
 #endif in this example, you may find these flags more widely known as O_NOACCESS even though they are not defined in the standard fcntl.h headers.

Below is a very simple example of the use of O_WRONLY | O_RDWR:

 #include <stdio.h>  
 #include <stdlib.h>  
 #include <unistd.h>  
 #include <sys/ioctl.h>  
 #include <fcntl.h>  
 int main(int argc, char **argv)  
      int fd;  
      struct winsize ws;  
      if ((fd = open("/dev/tty", O_WRONLY | O_RDWR)) < 0) {  
           perror("open /dev/tty failed");  
      if (ioctl(fd, TIOCGWINSZ, &ws) == 0)  
           printf("%d x %d\n", ws.ws_row, ws.ws_col);  

It is a little arcane and not portable but also an interesting feature to know about.

Read more
Gustavo Niemeyer

About 1 year after development started in Ensemble, today the stars finally aligned just the right way (review queue mostly empty, no other pressing needs, etc) for me to start writing the specification about the repository system we’ve been jointly planning for a long time. This is the system that the Ensemble client will communicate with for discovering which formulas are available, for publishing new formulas, for obtaining formula files for deployment, and so on.

We of course would have liked for this part of the project to have been specified and written a while ago, but unfortunately that wasn’t possible for several reasons. That said, there are also good sides of having an important piece flying around in minds and conversations for such a long time: sitting down to specify the system and describe the inner-working details has been a breeze. Even details such as the namespacing of formulas, which hasn’t been entirely clear in my mind, was just streamed into the document as the ideas we’ve been evolving finally got together in a written form.

One curious detail: this is the first long term project at Canonical that will be developed in Go, rather than Python or C/C++, which are the most used languages for projects within Canonical. Not only that, but we’ll also be using MongoDB for a change, rather than the traditional PostgreSQL, and will also use (you guessed) the mgo driver which I’ve been pushing entirely as a personal project for about 8 months now.

Naturally, with so many moving parts that are new to the company culture, this is still being seen as a closely watched experiment. Still, this makes me highly excited, because when I started developing mgo, the MongoDB driver for Go, my hopes that the Go, MongoDB, and mgo trio would eventually be used at Canonical were very low, precisely because they were all alien to the culture. We only got here after quite a lot of internal debate, experiments, and trust too.

All of that means these are happy times. Important feature in Ensemble being specified and written, very exciting tools, home grown software being useful..


Read more

Circular buffers are based on an algorithm well known by any developer who’s got past the “Hello world!” days. They offer a number of key characteristics with wide applicability such as constant and efficient memory use, efficient FIFO semantics, etc.

One feature which is not always desired, though, it the fact that circular buffers traditionally will either overwrite the last element, or raise an overflow error, since they are generally implemented as a buffer of constant size. This is an unwanted property when one is attempting to consume items from the buffer and it is not an option to blindly drop items, for instance.

This post presents an efficient (and potentially novel) algorithm for implementing circular buffers which preserves most of the key aspects of the traditional version, while also supporting dynamic expansion when the buffer would otherwise have its oldest entry overwritten. It’s not clear if the described approach is novel or not (most of my novel ideas seem to have been written down 40 years ago), so I’ll publish it below and let you decide.

Traditional circular buffers

Before introducing the variant which can actually expand during use, let’s go through a quick review on traditional circular buffers, so that we can then reuse the nomenclature when extending the concept. All the snippets provided in this post are written in Python, as a better alternative to pseudo-code, but the concepts are naturally portable to any other language.

So, the most basic circular buffer needs the buffer itself, its total capacity, and a position where the next write should occur. The following snippet demonstrates the concept in practice:

buf = [None, None, None, None, None]
bufcap = len(buf)
pushi = 0   

for elem in range(7):
    buf[pushi] = elem
    pushi = (pushi + 1) % bufcap
print buf # => [5, 6, 2, 3, 4]

In the example above, the first two elements of the series (0 and 1) were overwritten once the pointer wrapped around. That’s the specific feature of circular buffers which the proposal in this post will offer an alternative for.

The snippet below provides a full implementation of the traditional approach, this time including both the pushing and popping logic, and raising an error when an overflow or underflow would occur. Please note that these snippets are not necessarily idiomatic Python. The intention is to highlight the algorithm itself.

class CircBuf(object):

    def __init__(self):
        self.buf = [None, None, None, None, None]
        self.buflen = self.pushi = self.popi = 0
        self.bufcap = len(self.buf)

    def push(self, x):
        assert self.buflen == 0 or self.pushi != self.popi, 
               "Buffer overflow!"
        self.buf[self.pushi] = x
        self.pushi = (self.pushi + 1) % self.bufcap
        self.buflen += 1

    def pop(self):
        assert self.buflen != 0, "Buffer underflow!"
        x = self.buf[self.popi]
        self.buf[self.popi] = None
        self.buflen -= 1
        self.popi = (self.popi + 1) % self.bufcap
        return x

With the basics covered, let’s look at how to extend this algorithm to support dynamic expansion in case of overflows.

Dynamically expanding a circular buffer

The approach consists in imagining that the same buffer can contain both a circular buffer area (referred to as the ring area from here on), and an overflow area, and that it is possible to transform a mixed buffer back into a pure circular buffer again. To clarify what this means, some examples are presented below. The full algorithm will be presented afterwards.

First, imagine that we have an empty buffer with a capacity of 5 elements as per the snippet above, and then the following operations take place:

for i in range(5):

circbuf.pop() # => 0
circbuf.pop() # => 1


print circbuf.buf # => [5, 6, 2, 3, 4]

At this point we have a full buffer, and with the original implementation an additional push would raise an assertion error. To implement expansion, the algorithm will be changed so that those items will be appended at the end of the buffer. Following the example, pushing two additional elements would behave the following way:


print circbuf.buf # => [5, 6, 2, 3, 4, 7, 8]

In that example, elements 7 and 8 are part of the overflow area, and the ring area remains with the same capacity and length of the original buffer. Let’s perform a few additional operations to see how it would behave when items are popped and pushed while the buffer is split:

circbuf.pop() # => 2
circbuf.pop() # => 3

print circbuf.buf # => [5, 6, None, None, 4, 7, 8, 9]

In this case, even though there are two free slots available in the ring area, the last item pushed was still appended at the overflow area. That’s necessary to preserve the FIFO semantics of the circular buffer, and means that the buffer may expand more than strictly necessary given the space available. In most cases this should be a reasonable trade off, and should stop happening once the circular buffer size stabilizes to reflect the production vs. consumption pressure (if you have a producer which constantly operates faster than a consumer, though, please look at the literature for plenty of advice on the problem).

The remaining interesting step in that sequence of events is the moment when the ring area capacity is expanded to cover the full allocated buffer again, with the previous overflow area being integrated into the ring area. This will happen when the content of the previous partial ring area is fully consumed, as shown below:

circbuf.pop() # => 4
circbuf.pop() # => 5
circbuf.pop() # => 6

print circbuf.buf # => [10, None, None, None, None, 7, 8, 9]

At this point, the whole buffer contains just a ring area and the overflow area is again empty, which means it becomes a traditional circular buffer.

Sample algorithm

With some simple modifications in the traditional implementation presented previously, the above semantics may be easily supported. Note how the additional properties did not introduce significant overhead. Of course, this version will incur in additional memory allocation to support the buffer expansion, bu that’s inherent to the problem being solved.

class ExpandingCircBuf(object):

    def __init__(self):
        self.buf = [None, None, None, None, None]
        self.buflen = self.ringlen = self.pushi = self.popi = 0
        self.bufcap = self.ringcap = len(self.buf)

    def push(self, x):
        if self.ringlen == self.ringcap or 
           self.ringcap != self.bufcap:
            self.buflen += 1
            self.bufcap += 1
            if self.pushi == 0: # Optimization.
                self.ringlen = self.buflen
                self.ringcap = self.bufcap
            self.buf[self.pushi] = x
            self.pushi = (self.pushi + 1) % self.ringcap
            self.buflen += 1
            self.ringlen += 1

    def pop(self):
        assert self.buflen != 0, "Buffer underflow!"
        x = self.buf[self.popi]
        self.buf[self.popi] = None
        self.buflen -= 1
        self.ringlen -= 1
        if self.ringlen == 0 and self.buflen != 0:
            self.popi = self.ringcap
            self.pushi = 0
            self.ringlen = self.buflen
            self.ringcap = self.bufcap
            self.popi = (self.popi + 1) % self.ringcap
        return x

Note that the above algorithm will allocate each element in the list individually, but in sensible situations it may be better to allocate additional space for the overflow area in advance, to avoid potentially frequent reallocation. In a situation when the rate of consumption of elements is about the same as the rate of production, for instance, there are advantages in doubling the amount of allocated memory per expansion. Given the way in which the algorithm works, the previous ring area will be exhausted before the mixed buffer becomes circular again, so with a constant rate of production and an equivalent consumption it will effectively have its size doubled on expansion.

UPDATE: Below is shown a version of the same algorithm which not only allows allocating more than one additional slot at a time during expansion, but also incorporates it in the overflow area immediately so that the allocated space is used optimally.

class ExpandingCircBuf2(object):

    def __init__(self):
        self.buf = []
        self.buflen = self.ringlen = self.pushi = self.popi = 0
        self.bufcap = self.ringcap = len(self.buf)

    def push(self, x):
        if self.ringcap != self.bufcap:
            expandbuf = (self.pushi == 0)
            expandring = False
        elif self.ringcap == self.ringlen:
            expandbuf = True
            expandring = (self.pushi == 0)
            expandbuf = False
            expandring = False

        if expandbuf:
            self.pushi = self.bufcap
            expansion = [None, None, None]
            self.bufcap += len(expansion)
            if expandring:
                self.ringcap = self.bufcap

        self.buf[self.pushi] = x
        self.buflen += 1
        if self.pushi < self.ringcap:
            self.ringlen += 1
        self.pushi = (self.pushi + 1) % self.bufcap

    def pop(self):
        assert self.buflen != 0, "Buffer underflow!"
        x = self.buf[self.popi]
        self.buf[self.popi] = None
        self.buflen -= 1
        self.ringlen -= 1
        if self.ringlen == 0 and self.buflen != 0:
            self.popi = self.ringcap
            self.ringlen = self.buflen
            self.ringcap = self.bufcap
            self.popi = (self.popi + 1) % self.ringcap
        return x


This blog post presented an algorithm which supports the expansion of circular buffers while preserving most of their key characteristics. When not faced with an overflowing buffer, the algorithm should offer very similar performance characteristics to a normal circular buffer, with a few additional instructions and constant space for registers only. When faced with an overflowing buffer, the algorithm maintains the FIFO property and enables using contiguous allocated memory to maintain both the original circular buffer and the additional elements, and follows up reusing the full area as part of a new circular buffer in an attempt to find the proper size for the given use case.

Read more
Gustavo Niemeyer

ZooKeeper is a clever generic coordination server for distributed systems, and is one of the core softwares which facilitate the development of Ensemble (project for automagic IaaS deployments which we push at Canonical), so it was a natural choice to experiment with.

Gozk is a complete binding for ZooKeeper which explores the native features of Go to facilitate the interaction with a ZooKeeper server. To avoid reimplementing the well tested bits of the protocol in an unstable way, Gozk is built on top of the standard C ZooKeeper library.

The experience of integrating ZooKeeper with Go was certainly valuable on itself, and worked as a nice way to learn the details of integrating the Go language with a C library. If you’re interested in learning a bit about Go, ZooKeeper, or other details related to the creation of bindings and asynchronous programming, please fasten the seatbelt now.

Basics of C wrapping in Go

Creating the binding on itself was a pretty interesting experiment already. I have worked on the creation of quite a few bindings and language bridges before, and must say I was pleasantly surprised with the experience of creating the Go binding. With Cgo, the name given to the “foreign function interface” mechanism for C integration, one basically declares a special import statement which causes a pre-processor to look at the comment preceding it. Something similar to this:

// #include <zookeeper.h>
import "C"

The comment doesn’t have to be restricted to a single line, or to #include statements even. The C code contained in the comment will be transparently inserted into a helper C file which is compiled and linked with the final object file, and the given snippet will also be parsed and inclusions processed. In the Go side, that “C” import is simulated as if it were a normal Go package so that the C functions, types, and values are all directly accessible.

As an example, a C function with this prototype:

int zoo_wexists(zhandle_t *zh, const char *path, watcher_fn watcher,
                void *context, struct Stat *stat);

In Go may be used as:

cstat := C.struct_Stat{}
rc, cerr := C.zoo_wexists(zk.handle, cpath, nil, nil, &cstat)

When the C function is used in a context where two result values are requested, as done above, Cgo will save the well known errno variable after the function has finished executing and will return it wrapped into an os.Errno value.

Also, note how the C struct is defined in a way that can be passed straight to the C function. Interestingly, the allocation of the memory backing the structure is going to be performed and tracked by the Go runtime, and will be garbage collected appropriately once no more references exist within the Go runtime. This fact has to be kept in mind since the application will crash if a value allocated normally within Go is saved with a foreign C function and maintained after all the Go references are gone. The alternative in these cases is to call the usual C functions to get hold of memory for the involved values. That memory won’t be touched by the garbage collector, and, of course, must be explicitly freed when no longer necessary. Here is a simple example showing explicit allocation:

cbuffer := (*C.char)(C.malloc(bufferSize))

Note the use of the defer statement above. Even when dealing with foreign functionality, it comes in handy. The above call will ensure that the buffer is deallocated right before the current function returns, for instance, so it’s a nice way to ensure no leaks happen, even if in the future the function suddenly gets a new exit point which didn’t consider the allocation of resources.

In terms of typing, Go is more strict than C, and Cgo-based logic will also ensure that the types returned and passed into the foreign C functions are correctly typed, in the same way done for the native types. Note above, for instance, how the call to the free() function has to explicitly convert the value into an unsafe.Pointer, even though in C no casting would be necessary to pass a pointer into a void * parameter.

The unsafe.Pointer is in fact a very special type within Go. Using it, one can convert any pointer type into any other pointer type in an unsafe way (thus the package name), and also back and forth into a uintptr value with the address of the memory referenced by the pointer. For every other type conversion, Go will ensure at compilation time that doing the conversion at runtime is a safe operation.

With all of these resources, including the ability to use common Go syntax and functionality even when dealing with foreign types, values, and function calls, the integration task turns out to be quite a pleasant experience. That said, some of the things may still require some good thinking to get right, as we’ll see shortly.

Watch callbacks and channels

One of the most interesting (and slightly tricky) aspects of mapping the ZooKeeper concepts into Go was the “watch” functionality. ZooKeeper allows one to attach a “watch” to a node so that the server will report back when changes happen to the given node. In the C library, this functionality is exposed via a callback function which is executed once the monitored node aspect is modified.

It would certainly be possible to offer this functionality in Go using a similar mechanism, but Go channels provide a number of advantages for that kind of asynchronous notification: waiting for multiple events via the select statement, synchronous blocking until the event happens, testing if the event is already available, etc.

The tricky bit, though, isn’t the use of channels. That part is quite simple. The tricky detail is that the C callback function execution happens in a C thread started by the ZooKeeper library, and happens asynchronously, while the Go application is doing its business elsewhere. Right now, there’s no straightforward way to transfer the execution of this asynchronous C function back into the Go land. The solution for this problem was found with some help from the folks at the golang-nuts mailing list, and luckily it’s not that hard to support or understand. That said, this is a good opportunity to get some coffee or your preferred focus-enhancing drink.

The solution works like this: when the ZooKeeper C library gets a watch notification, it executes a C callback function which is inside a Gozk helper file. Rather than transferring control to Go right away, this C function simply appends data about the event onto a queue, and signals a pthread condition variable to notify that an event is available. Then, on the Go side, once the first ZooKeeper connection is initialized, a new goroutine is fired and loops waiting for events to be available. The interesting detail about this loop, is that it blocks within a foreign C function waiting for an event to be available, through the signaling of the shared pthread condition variable. In the Go side, that’s how the call looks like, just to give a more practical feeling:

// This will block until there's a watch available.
data := C.wait_for_watch()

Then, on the C side, here is the function definition:

watch_data *wait_for_watch() {
    watch_data *data = NULL;
    if (first_watch == NULL)
        pthread_cond_wait(&watch_available, &watch_mutex);
    data = first_watch;
    first_watch = first_watch->next;
    return data;

As you can see, not really a big deal. When that kind of blocking occurs inside a foreign C function, the Go runtime will correctly continue the execution of other goroutines within other operating system threads.

The result of this mechanism is a nice to use interface based on channels, which may be explored in different ways depending on the application needs. Here is a simple example blocking on the event synchronously, for instance:

stat, watch, err := zk.ExistsW("/some/path")
if stat == nil && err == nil {
    event := <-watch
    // Use event ...


Those were some of the interesting aspects of implementing the ZooKeeper binding. I would like to speak about some additional details, but this post is rather long already, so I'll keep that for a future opportunity. The code is available under the LGPL, so if you're curious about some other aspect, or would like to use ZooKeeper with Go, please move on and check it out!

Read more