Canonical Voices

Posts tagged with 'debugging'


So there I was. I did have to use a proprietary library, for which I had no sources and no real hope of support from the creators. I built my program against it, I ran it, and I got a segmentation fault. An exception that seemed to happen inside that insidious library, which was of course stripped of all debugging information. I scratched my head, changed my code, checked traces, tried valgrind, strace, and other debugging tools, but found no obvious error. Finally, I assumed that I had to dig deeper and do some serious debugging of the library’s assembly code with gdb. The rest of the post is dedicated to the steps I followed to find out what was happening inside the wily proprietary library that we will call libProprietary. Prerequisites for this article are some knowledge of gdb and ARM architecture.

Some background on the task I was doing: I am a Canonical employee that works as developer for Ubuntu for Phones. In most, if not all, phones, the BSP code is not 100% open and we have to use proprietary libraries built for Android. Therefore, these libraries use bionic, Android’s libc implementation. As we want to call them inside binaries compiled with glibc, we resort to libhybris, an ingenious library that is able to load and call libraries compiled against bionic while the rest of the process uses glibc. This will turn out to be critical in this debugging. Note also that we are debugging ARM 32-bits binaries here.

The Debugging Session

To start, I made sure I had installed glibc and other libraries symbols and started to debug by using gdb in the usual way:

$ gdb myprogram
GNU gdb (Ubuntu 7.9-1ubuntu1) 7.9
Starting program: myprogram
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/".
[New Thread 0xf49de460 (LWP 7101)]
[New Thread 0xf31de460 (LWP 7104)]
[New Thread 0xf39de460 (LWP 7103)]
[New Thread 0xf41de460 (LWP 7102)]
[New Thread 0xf51de460 (LWP 7100)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xf49de460 (LWP 7101)]
0x00000000 in ?? ()
(gdb) bt
#0  0x00000000 in ?? ()
#1  0xf520bd06 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) info proc mappings
process 7097
Mapped address spaces:

	Start Addr   End Addr       Size     Offset objfile
	   0x10000    0x17000     0x7000        0x0 /usr/bin/myprogram
	0xf41e0000 0xf49df000   0x7ff000        0x0 [stack:7101]
	0xf51f6000 0xf5221000    0x2b000        0x0 /android/system/lib/
	0xf5221000 0xf5222000     0x1000        0x0 
	0xf5222000 0xf5224000     0x2000    0x2b000 /android/system/lib/
	0xf5224000 0xf5225000     0x1000    0x2d000 /android/system/lib/

We can see here that we get the promised crash. I execute a couple of gdb commands after that to see the backtrace and part of the process address space that will be of interest in the following discussion. The backtrace shows that a segment violation happened when the CPU tried to execute instructions in address zero, and we can see by checking the process mappings that the previous frame lives inside the text segment of There is no backtrace beyond that point, but that should come as no surprise as there is no DWARF information in libProprietary, and also noting that usage of frame pointer is optimized away quite commonly these days.

After this I tried to get a bit more information on the CPU state when the crash happened:

(gdb) info reg
r0             0x0	0
r1             0x0	0
r2             0x0	0
r3             0x9	9
r4             0x0	0
r5             0x0	0
r6             0x0	0
r7             0x0	0
r8             0x0	0
r9             0x0	0
r10            0x0	0
r11            0x0	0
r12            0xffffffff	4294967295
sp             0xf49dde70	0xf49dde70
lr             0xf520bd07	-182403833
pc             0x0	0x0
cpsr           0x60000010	1610612752
(gdb) disassemble 0xf520bd02,+10
Dump of assembler code from 0xf520bd02 to 0xf520bd0c:
   0xf520bd02:	b	0xf49c9cd6
   0xf520bd06:	movwpl	pc, #18628	; 0x48c4	<UNPREDICTABLE>
   0xf520bd0a:	andlt	r4, r11, r8, lsr #12
End of assembler dump.

Hmm, we are starting to see weird things here. First, in 0xf520bd02 (which probably has been executed little before the crash) we get an unconditional branch to some point in the thread stack (see mappings in previous figure). Second, the instruction in 0xf520bd06 (which should be executed after returning from the procedure that provokes the crash) would load into the pc (program counter) an address that is not mapped: we saw that the first mapped address is 0x10000 in the previous figure. The movw instruction has also a “pl” suffix that makes the instruction execute only when the operand is positive or zero… which is obviously unnecessary as 0x48c4 is encoded in the instruction.

I resorted to doing objdump -d to disassemble the library and compare with gdb output. objdump shows, in that part of the file (subtracting the library load address gives us the offset inside the file: 0xf520bd02-0xf51f6000=0x15d02):

   15d02:	f7f3 eade 	blx	92c0 <__android_log_print@plt>;
   15d06:	f8c4 5304 	str.w	r5, [r4, #772]	; 0x304
   15d0a:	4628      	mov	r0, r5
   15d0c:	b00b      	add	sp, #44	; 0x2c
   15d0e:	e8bd 8ff0 	ldmia.w	sp!, {r4, r5, r6, r7, r8, r9, sl, fp, pc}

which is completely different from what gdb shows! What is happening here? Taking a look at addresses for both code chunks, we see that instructions are always 4 bytes in gdb output, while they are 2 or 4 in objdump‘s. Well, you have guessed, don’t you? We are seeing “normal” ARM instructions in gdb, while objdump is decoding THUMB-2 instructions. Certainly objdump seems to be right here as the output is more sensible: we have a call to an executable part of the process space in 0x15d02 (it is resolved to a known function, __android_log_print), and the following instructions seems like a normal function epilogue in ARM: a return value is stored in r0, the sp (stack pointer) is incremented (we are freeing space in the stack), and we restore registers.

If we get back to the register values, we see that cpsr (current program status register [1]) does not have the T bit set, so gdb thinks we are using ARM instructions. We can change this by doing

(gdb) set $cpsr=0x60000030
(gdb) disass 0xf520bd02,+15
Dump of assembler code from 0xf520bd02 to 0xf520bd11:
   0xf520bd02:	blx	0xf51ff2c0
   0xf520bd06:	str.w	r5, [r4, #772]	; 0x304
   0xf520bd0a:	mov	r0, r5
   0xf520bd0c:	add	sp, #44	; 0x2c
   0xf520bd0e:	ldmia.w	sp!, {r4, r5, r6, r7, r8, r9, r10, r11, pc}
End of assembler dump.

Ok, much better now [2]. The thumb bit in cpsr is determined by last bx/blx call: if the address is odd, the procedure to which we are calling contains THUMB instructions, otherwise they are ARM (a good reference for these instructions is [3]). In this case, after an exception the CPU moves to arm mode, and gdb is unable to know which is the right mode when disassembling. We can search for hints on which parts of the code are arm/thumb by looking at the values in registers used by bx/blx, or by looking at the lr (link register): we can see above that the value after the crash was 0xf520bd07, which is odd and indicates that 0xf520bd06 contains a thumb instruction. However, for some reason gdb is not able to take advantage of this information.

Of course this problem does not happen if we have debugging information: in that case we have special symbols that let gdb know if the section where the code is contains thumb instructions or not [4]. As those are not found, gdb uses the cpsr value. Here objdump seems to have better heuristics though.

After solving this issue with instruction decoding, I started to debug __android_log_print to check what was happening there, as it looked like the crash was happening in that call. I spent quite a lot of time there, but found nothing. All looked fine, and I started to despair. Until I inserted a breakpoint in address 0xf520bd06, right after the call to __android_log_print, run the program… and it stopped at that address, no crash happened. I started to execute the program instruction by instruction after that:

(gdb) b *0xf520bd06
(gdb) run
Breakpoint 1, 0xf520bd06 in ?? ()
(gdb) si
0xf520bd0a in ?? ()
(gdb) si
0xf520bd0c in ?? ()
(gdb) si
0xf520bd0e in ?? ()
Cannot insert breakpoint 0.
Cannot access memory at address 0x0

Something was apparently wrong with instruction ldmia, which restores registers, including the pc, from the stack. I took a look at the stack in that moment (taking into account that ldmia had already modified the sp after restoring 9 registers == 36 bytes):

(gdb) x/16xw $sp-36
0xf49dde4c:	0x00000000	0x00000000	0x00000000	0x00000000
0xf49dde5c:	0x00000000	0x00000000	0x00000000	0x00000000
0xf49dde6c:	0x00000000	0x00000000	0x00000000	0x00000000
0xf49dde7c:	0x00000000	0x00000000	0x00000000	0x00000000

All zeros! At this point it is clear that this is the real point where the crash is happening, as we are loading 0 into the pc. This looked clearly like a stack corruption issue.

But, before moving forward, why are we getting a wrong backtrace from gdb? Well, gdb is seeing a corrupted stack, so it is not able to unwind it. It would not be able to unwind it even if having full debug information. The only hint it has is the lr. This register contains the return address after execution of a bl/blx instruction [3]. If the called procedure is non-leaf, it is saved in the prologue, and restored in the epilogue, because it gets overwritten when branching to other procedures. In this case, it is restored on the pc and sometimes it is also saved back in the lr, depending on whether we have arm-thumb interworking built in the procedure or not [5]. It is not overwritten if we have a leaf procedure (as there are no procedure calls inside these).

As gdb has no additional information, it uses the lr to build the backtrace, assuming we are in a leaf procedure. However this is not true and the backtrace turns out to be wrong. Nonetheless, this information was not completely useless: lr was pointing to the instruction right after the last bl/blx instruction that was executed, which was not that far away from the real point where the program was crashing. This happened because fortunately __android_log_print has interworking code and restores the lr, otherwise the value of lr could have been from a point much far away from the point where the real crash happens. Believe or not, but it could have been even worse!

Having now a clear idea of where and why the crash was happening, things accelerated. The procedure where the crash happened, as disassembled by objdump, was (I include here only the more relevant parts of the code)

00015b1c <ProprietaryProcedure@@Base>:
   15b1c:	e92d 4ff0 	stmdb	sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
   15b20:	b08b      	sub	sp, #44	; 0x2c
   15b22:	497c      	ldr	r1, [pc, #496]	; (15d14 <ProprietaryProcedure@@Base+0x1f8>)
   15b24:	2500      	movs	r5, #0
   15b26:	9500      	str	r5, [sp, #0]
   15b28:	4604      	mov	r4, r0
   15b2a:	4479      	add	r1, pc
   15b2c:	462b      	mov	r3, r5
   15b2e:	f8df 81e8 	ldr.w	r8, [pc, #488]	; 15d18 <ProprietaryProcedure@@Base+0x1fc>
   15b32:	462a      	mov	r2, r5
   15b34:	f8df 91e4 	ldr.w	r9, [pc, #484]	; 15d1c <ProprietaryProcedure@@Base+0x200>
   15b38:	ae06      	add	r6, sp, #24
   15b3a:	f8df a1e4 	ldr.w	sl, [pc, #484]	; 15d20 <ProprietaryProcedure@@Base+0x204>
   15b3e:	200f      	movs	r0, #15
   15b40:	f8df b1e0 	ldr.w	fp, [pc, #480]	; 15d24 <ProprietaryProcedure@@Base+0x208>
   15b44:	f7f3 ef76 	blx	9a34 <prctl@plt>
   15b48:	44f8      	add	r8, pc
   15b4a:	4629      	mov	r1, r5
   15b4c:	44f9      	add	r9, pc
   15b4e:	2210      	movs	r2, #16
   15b50:	44fa      	add	sl, pc
   15b52:	4630      	mov	r0, r6
   15b54:	44fb      	add	fp, pc
   15b56:	f7f3 ea40 	blx	8fd8 <memset@plt>
   15b5a:	a807      	add	r0, sp, #28
   15b5c:	f7f3 ef70 	blx	9a40 <sigemptyset@plt>
   15b60:	4b71      	ldr	r3, [pc, #452]	; (15d28 <ProprietaryProcedure@@Base+0x20c>)
   15b62:	462a      	mov	r2, r5
   15b64:	9508      	str	r5, [sp, #32]
   15b66:	4631      	mov	r1, r6
   15b68:	447b      	add	r3, pc
   15b6a:	681b      	ldr	r3, [r3, #0]
   15b6c:	200a      	movs	r0, #10
   15b6e:	9306      	str	r3, [sp, #24]
   15b70:	f7f3 ef6c 	blx	9a4c <sigaction@plt>
   15d02:	f7f3 eade 	blx	92c0 <__android_log_print@plt>
   15d06:	f8c4 5304 	str.w	r5, [r4, #772]	; 0x304
   15d0a:	4628      	mov	r0, r5
   15d0c:	b00b      	add	sp, #44	; 0x2c
   15d0e:	e8bd 8ff0 	ldmia.w	sp!, {r4, r5, r6, r7, r8, r9, sl, fp, pc}

The addresses where this code is loaded can be easily computed by adding 0xf51f6000 to the file offsets shown in the first column. We see that a few calls to different external functions [6] are performed by ProprietaryProcedure, which is itself an exported symbol.

I restarted the debug session, added a breakpoint at the start of ProprietaryProcedure, right after stmdb saves the state, and checked the stack values:

(gdb) b *0xf520bb20
Breakpoint 1 at 0xf520bb20
(gdb) cont
Breakpoint 1, 0xf520bb20 in ?? ()
(gdb) p $sp
$1 = (void *) 0xf49dde4c
(gdb) x/16xw $sp
0xf49dde4c:	0xf49de460	0x0007df00	0x00000000	0xf49dde70
0xf49dde5c:	0xf49de694	0x00000000	0xf77e9000	0x00000000
0xf49dde6c:	0xf75b4491	0x00000000	0xf49de460	0x00000000
0xf49dde7c:	0x00000000	0xfd5b4eba	0xfe9dd4a3	0xf49de460

We can see that the stack contains something, including a return address that looks valid (0xf75b4491). Note also that the procedure must never touch this part of the stack, as it belongs to the caller of ProprietaryProcedure.

Now it is a simply a matter of bisecting the code between the beginning and the end of ProprietaryProcedure to find out where we are clobbering the stack. I will save you of developing here this tedious process. Instead, I will just show, that, in the end, it turned out that the call to sigemptyset() is the culprit [7]:

(gdb) b *0xf520bb5c
Breakpoint 1 at 0xf520bb5c
(gdb) b *0xf520bb60
Breakpoint 2 at 0xf520bb60
(gdb) run
Breakpoint 1, 0xf520bb5c in ?? ()
(gdb) x/16xw 0xf49dde4c
0xf49dde4c:	0xf49de460	0x0007df00	0x00000000	0xf49dde70
0xf49dde5c:	0xf49de694	0x00000000	0xf77e9000	0x00000000
0xf49dde6c:	0xf75b4491	0x00000000	0xf49de460	0x00000000
0xf49dde7c:	0x00000000	0xfd5b4eba	0xfe9dd4a3	0xf49de460
(gdb) cont
Breakpoint 2, 0xf520bb60 in ?? ()
(gdb) x/16xw 0xf49dde4c
0xf49dde4c:	0x00000000	0x00000000	0x00000000	0x00000000
0xf49dde5c:	0x00000000	0x00000000	0x00000000	0x00000000
0xf49dde6c:	0x00000000	0x00000000	0x00000000	0x00000000
0xf49dde7c:	0x00000000	0x00000000	0x00000000	0x00000000

Note here that I am printing the part of the stack not reserved by the function (0xf49dde4c is the value of the sp before execution of the line at offset 0x15b20, see the code).

What is going wrong here? Now, remember that at the beginning of the article I mentioned that we were using libhybris. libProprietary assumes a bionic environment, and the libc functions it calls are from bionic’s libc. However, libhybris has hooks for some bionic functions: for them bionic is not called, instead the hook is invoked. libhybris does this to avoid conflicts between bionic and glibc: for instance having two allocators fighting for process address space is a recipe for disaster, so malloc() and related functions are hooked and the hooks call in the end the glibc implementation. Signals related functions were hooked too, including sigemptyset(), and in this case the hook simply called glibc implementation.

I looked at glibc and bionic implementations, in both cases sigemptyset() is a very simple utility function that clears with memset() a sigset_t variable. All pointed to different definitions of sigset_t depending on the library. Definition turned out to be a bit messy when looking at the code as it depended on build time definitions, so I resorted to gdb to print the type. For a executable compiled for glibc, I saw

(gdb) ptype sigset_t
type = struct {
    unsigned long __val[32];

and for one using bionic

(gdb) ptype sigset_t
type = unsigned long

This finally confirms where the bug is, and explains it: we are overwriting the stack because libProprietary reserves in the stack memory for bionic’s sigset_t, while we are using glibc’s sigemptyset(), which uses a different definition for it. As this definition is much bigger, the stack gets overwritten after the call to memset(). And we get the crash later when trying to restore registers when the function returns.

After knowing this, the solution was simple: I removed the libhybris hooks for signal functions, recompiled it, and… all worked just fine, no crashes anymore!

However, this is not the final solution: as signals are shared resources, it makes sense to hook them in libhybris. But, to do it properly, the hooks have to translate types between bionic in glibc, thing that we were not doing (we were simply calling glibc implementation). That, however, is “just work”.

Of course I wondered why the heck a library that is kind of generic needs to mess around with signals, but hey, that is not my fault ;-).


I can say I learned several things while debugging this:

  1. Not having the sources is terrible for debugging (well, I already knew this). Unfortunately not open sourcing the code is still a standard practice in part of the industry.
  2. The most interesting technical bit here is IMHO that we need to be very cautious with the backtrace that debuggers shows after a crash. If you start to see things that do not make sense it is possible that registers or stack have been messed up and the real crash happens elsewhere. Bear in mind that the very first thing to do when a program crashes is to make sure that we know the exact point where that happens.
  3. We have to be careful in ARM when disassembling, because if there is no debug information we could be seeing the wrong instruction set. We can check evenness of addresses used by bx/blx and of the lr to make sure we are in the right mode.
  4. Some times taking a look at assembly code can help us when debugging, even when we have the sources. Note that if I had had the C sources I would have seen the crash happening right when returning from a function, and it might not have been that immediate to find out that the stack was messed up. The assembly clearly pointed to an overwritten stack.
  5. Finally, I personally learned some bits of ARM architecture that I did not know, which was great.

Well, this is it. I hope you enjoyed the (lengthy, I know) article. Thanks for your reading!

[2] We can get the same result by executing in gdb set arm fallback-mode thumb, but changing the register seemed more pedagogical here.
[6] In fact the calls are to the PLT section, which is inside the library. The PLT calls in turn, by using addresses in the GOT data section, either directly the function or the dynamic loader, as we are doing lazy loading. See, for instance.
[7] I had to use two breakpoints between consecutive instructions because the “ni” gdb command was not working well here.

Read more
Barry Warsaw


Snappy Ubuntu Core is a new edition of the Ubuntu you know and love, with some interesting new features, including atomic, transactional updates, and a much more lightweight application deployment story than traditional Debian/Ubuntu packaging.  Much of this work grew out of our development of a mobile/touch based version of Ubuntu for phones and tablets, but now Ubuntu Core is available for clouds and devices.

I find the transactional nature of upgrades to be very interesting.  While you still get a perfectly normal Ubuntu system, your root file system is read-only, so traditional apt-get based upgrades don't work.  Instead, your system version is image based; today you are running image 231 and tomorrow a new image is released to get you to 232.  When you upgrade to the new image, you get all the system changes.  We support both full and delta upgrades (the latter which reduces bandwidth), and even phased updates so that we can roll out new upgrades and quickly pull them from the server side if we notice a problem.  Snappy devices even support rolling back upgrades on a single device, by using a dual-partition root file system.  Phones generally don't support this due to lack of available space on the device.

Of course, the other part really interesting thing about Snappy is the lightweight, flexible approach to deploying applications.  I still remember my early days learning how to package software for Debian and Ubuntu, and now that I'm both an Ubuntu Core Developer and Debian Developer, I understand pretty well how to properly package things.  There's still plenty of black art involved, even for relatively easy upstream packages such as distutils/setuptools-based Python package available on the Cheeseshop (er, PyPI).  The Snappy approach on Ubuntu Core is much more lightweight and easy, and it doesn't require the magical approval of the archive elves, or the vagaries of PPAs, to make your applications quickly available to all your users.  There's even a robust online store for publishing your apps.

There's lots more about Snappy apps and Ubuntu Core that I won't cover here, so I encourage you to follow the links for more information.  You might also want to stop now and take the tour of Ubuntu Core (hey, I'm a poet and I didn't even realize it).

In this post, I want to talk about building and deploying snappy Python applications.  Python itself is not an officially supported development framework, but we have a secret weapon.  The system image client upgrader -- i.e. the component on the devices that checks for, verifies, downloads, and applies atomic updates -- is written in Python 3.  So the core system provides us with a full-featured Python 3 environment we can utilize.

The question that came to mind is this: given a command-line application available on PyPI, how easy is it to turn into a snap and install it on an Ubuntu Core system?  With some caveats I'll explore later, it's actually pretty easy!

Basic approach

The basic idea is this: let's take a package on PyPI, which may have additional dependencies also on PyPI, download them locally, and build them into a snap that we can install on an Ubuntu Core system.

The first question is, how do we build a local version of a fully-contained Python application?  My initial thought was to build a virtual environment using virtualenv or pyvenv, and then somehow turn that virtual environment into a snap.  This turns out to be difficult in practice because virtual environments aren't really designed for this.  They have issues with being relocated for example, and they can contain a lot of extraneous stuff that's great for development (virtual environment's actual purpose ) but unnecessary baggage for our use case.

My second thought involved turning a Python application into a single file executable, and from there it would be fairly easy to snappify.  Python has a long tradition of such tools, many with varying degrees of cross platform portability and standalone-ishness.  After looking again at some oldies but goodies (e.g. cx_freeze) and some new offerings, I decided to start with pex.

pex is a nice tool developed by Brian Wickman and the Twitter folks which they use to deploy Python applications to their production environment.  pex takes advantage of modern Python's support for zip imports, and a clever trick of zip files.

Python supports direct imports (of pure Python modules) from zip files, and the python executable's -m option works even when the module is inside a zip file.  Further, the presence of a file within a package can be used as shorthand for executing the package, e.g. python -m myapp will run myapp/ if it exists.

Zip files are interesting because their index is at the end of the file.  This allows you to put whatever you want at the front of the file and it will still be considered a zip file.  pex exploits this by putting a shebang in the first line of the file, e.g. #!/usr/bin/python3 and thus the entire zip file becomes a single file executable of Python code.

There are of course, plenty of caveats.  Probably the main one is that Python cannot import extension modules directly from the zip, because the dlopen() function call only takes a file system path.  pex handles this by marking the resulting file as not zip safe, so the zip is written out to a temporary directory first.

The other issue of course, is that the zip file must contain all the dependencies not present in the base Python.  pex is actually fairly smart here, in that it will chase dependencies, much like pip and it will include those dependencies in the zip file.  You can also specify any missed dependencies explicitly on the pex command line.

Once we have the pex file, we need to add the required snappy metadata and configuration files, and run the snappy command to generate the .snap file, which can then be installed into Ubuntu Core.  Since we can extract almost all of the minimal required snappy metadata from the Python package metadata, we only need just a little input from the user, and the rest of work can be automated.

We're also going to avail ourselves of a convenient cheat.  Because Python 3 and its standard library are already part of Ubuntu Core on a snappy device, we don't need to worry about any of those dependencies.  We're only going to support Python 3, so we get its full stdlib for free.  If we needed access to Python 2, or any external libraries or add-ons that can't be made part of the zip file, we would need to create a snappy framework for that, and then utilize that framework for our snappy app.  That's outside the scope of this article though.


To build Python snaps, you'll need to have a few things installed.  If you're using Ubuntu 15.04, just apt-get install the appropriate packages.  Otherwise, you can get any additional Python requirements by building a virtual environment and installing tools like pex and wheel into their, then invoking pex from that virtual environment.  But let's assume you have the Vivid Vervet (Ubuntu 15.04); here are the packages you need:
  •  python3
  •  python-pex-cli
  •  python3-wheel
  •  snappy-tools
  •  git
You'll also want a local git clone of which provides a convenient script called for automating the building of Python snaps.  We'll refer to this script extensively in the discussion below.

For extra credit, you might want to get a copy of Python 3.5 (unreleased as of this writing).  I'll show you how to do some interesting debugging with Python 3.5 later on.

From PyPI to snap in one easy step

Let's start with a simple example: world is a very simple script that can provide forward and reverse mappings of ISO 3166 two letter country codes (at least as of before ISO once again paywalled the database).  So if you get an email from you can find out where the BDFL has his secret lair:

$ world py
py originates from PARAGUAY

world is a pure-Python package with both a library and a command line interface. To get started with the script mentioned above, you need to create a minimal .ini file, such as:

name: world

verbose: true

Let's call this file world.ini.  (In fact, you'll find this very file under the examples directory in the snap git repository.)  What do the various sections and variables control?
  •  name is the name of the project on PyPI.  It's used to look up metadata about the project on PyPI via PyPI's JSON API.
  •  verbose variable just defines whether to pass -v to the underlying pex command.
Now, to create the snap, just run:

$ ./ examples/world.ini

You'll see a few progress messages and a warning which you can ignore.  Then out spits a file called world_3.1.1_all.snap.  Because this is pure Python, it's architecture independent.  That's a good thing because the snap will run on any device, such as a local amd64 kvm instance, or an ARM-based Ubuntu Core-compatible Lava Lamp.

Armed with this new snap, we can just install it on our device (in this case, a local kvm instance) and then run it:

$ snappy-remote --url=ssh://localhost:8022 install world_3.1.1_all.snap
$ ssh -p 8022 ubuntu@localhost
ubuntu@localhost:~$ py
py originates from PARAGUAY

From git repository to snap in one easy step

Let's look at another example, this time using a stupid project that contains an extension module. This aptly named package just prints a yes for every -y argument, and no for every -n argument.

The difference here is that stupid isn't on PyPI; it's only available via git.  The helper is smart enough to know how to build snaps from git repositories.  Here's what the stupid.ini file looks like:

name: stupid
origin: git

verbose: yes

Notice that there's a [project]origin variable.  This just says that the origin of the package isn't PyPI, but instead a git repository, and then the public repo url is given.  The first word is just an arbitrary protocol tag; we could eventually extend this to handle other version control systems or origin types.  For now, only git is supported.

To build this snap:

$ ./ examples/stupid.ini

This clones the repository into a temporary directory, builds the Python package into a wheel, and stores that wheel in a local directory.  pex has the ability to build its pex file from local wheels without hitting PyPI, which we use here.  Out spits a file called stupid_1.1a1_all.snap, which we can install in the kvm instance using the snappy-remote command as above, and then run it after ssh'ing in:

ubuntu@localhost:~$ stupid.stupid -ynnyn

Watch out though, because this snap is really not architecture-independent. It contains an extension module which is compiled on the host platform, so it is not portable to different architectures.  It works on my local kvm instance, but sadly not on my Lava Lamp.

Entry points

pex currently requires you to explicitly name the entry point of your Python application.  This is the function which serves as your main and it's what runs by default when the pex zip file is executed.

Usually, a Python package will define its entry point in its file, like so:

        'console_scripts': ['stupid = stupid.__main__:main'],

And if you have a copy of the package, you can run a command to generate the various package metadata files:

$ python3 egg_info

If you look in the resulting stupid.egg_info/entry_points.txt file, you see the entry point clearly defined there.  Ideally, either pex or would just figure this out explicitly.  As it turns out, there's already a feature request open on pex for this, but in the meantime, how can we auto-detect the entry point?

For the stupid example, it's pretty easy.  Once we've cloned its git repository, we just run the egg_info command and read the entry_points.txt file.  Later, we can build the project's binary wheel from the same git clone.

It's a bit more problematic with world though because the package isn't downloaded from PyPI until pex runs, but the pex command line requires that you specify the entry point before the download occurs.

We can handle this by supporting an entry_point variable in the snap's .ini file.  For example, here's the world.ini file with an explicit entry point setting:

name: world
entry_point: worldlib.__main__:main

verbose: true

What if we still wanted to auto-detect the entry point?  We could of course, download the world package in and run the egg-info command over that.  But pex also wants to download world and we don't want to have to download it twice.  Maybe we could download it in and then build a local wheel file for pex to consume.

As it turns out there's an easier way.

Unfortunately, package egg-info metadata is not availble on PyPI, although arguably it should be.  Fortunately, Vinay Sajip runs an external service that does make the metadata available, such as the metadata for world. makes the entry_point variable optional, and if it's missing, it will grab the package metadata from a link like that given above.  An error will be thrown if the file can't be found, in which case, for now, you'd just add the [project]entry_point variable to the .ini file.

A little more detail

The script is more or less a pure convenience wrapper around several independent tools.  pex of course for creating the single executable zip file, but also the snappy command for building the .snap file.  It also utilizes python3 egg_info where possible to extract metadata and construct the snappy facade needed for the snappy build command.  Less typing for you!  In the case of a snap built from a git repository, it also performs the git cloning, and the python3 bdist_wheel command to create the wheel file that pex will consume.

There's one other important thing does: it fixes the resulting pex file's shebang line.  Because we're running these snaps on an Ubuntu Core system, we know that Python 3 will be available in /usr/bin/python3.  We want the pex file's shebang line to be exactly this.  While pex supports a --python option to specify the interpreter, it doesn't take the value literally.  Instead, it takes the last path component and passes it to /usr/bin/env so you end up with a shebang line like:

#!/usr/bin/env python3

That might work, but we don't want the pex file to be subject to the uncertainties of the $PATH environment variable.

One of the things that does is repack the pex file.  Remember, it's just a zip file with some magic at the top (that magic is the shebang), so we just read the file that pex spits out, and rewrite it with the shebang we want.  Eventually, pex itself will handle this and we won't need to do that anymore.


While I was working out the code and techniques for this blog post, I ran into an interesting problem.  The world script would crash with some odd tracebacks.  I don't have the details anymore and they'd be superfluous, but suffice to say that the tracebacks really didn't help in figuring out the problem.  It would work in a local virtual environment build of world using either the (pip installed) PyPI package or run from the upstream git repository, but once the snap was installed in my kvm instance, it would traceback.  I didn't know if this was a bug in world, in the snap I built, or in the Ubuntu Core environment.  How could I figure that out?

Of course, the go to tool for debugging any Python problem is pdb.  I'll just assume you already know this.  If not, stop everything and go learn how to use the debugger.

Okay, but how was I going to get a pdb breakpoint into my snap?  This is where Python 3.5 comes in!

PEP 441, which has already been accepted and implemented in what will be Python 3.5, aims to improve support for zip applications.  Apropos this blog post, the new zipapp module can be used to zip up a directory into single executable file, with an argument to specify the shebang line, and a few other options.  It's related to what pex does, but without all the PyPI interactions and dependency chasing.  Here's how we can use it to debug a pex file.

Let's ignore snappy for the moment and just create a pex of the world application:

$ pex -r world -o world.pex -e worldlib.__main__:main
Now let's say we want to set a pdb breakpoint in the main() function so that we can debug the program, even when it's a single executable file.  We start by unzipping the pex:
$ mkdir world
$ cd world
$ unzip ../world.pex
If you poke around, you'll notice a file in the current directory.  This is pex's own main entry point.  There are also two hidden directories, .bootstrap and .deps.  The former is more pex scaffolding, but inside the latter you'll see the unpacked wheel directories for world and its single dependency.

Drilling down a little farther, you'll see that inside the world wheel is the full source code for world itself.  Set a break point by visiting .deps/world-3.1.1-py2.py3-none-any.whl/worldlib/ in your editor.  Find the main() function and put this right after the def line:

import pdb; pdb.set_trace()

Save your changes and exit your editor.

At this point, you'll want to have Python 3.5 installed or available.  Let's assume that by the time you read this, Python 3.5 has been released and is the default Python 3 on your system.  If not, you can always download a pre-release of the source code, or just build Python 3.5 from its Mercurial repository.  I'll wait while you do this...

...and we're back!  Okay, now armed with Python 3.5, and still inside the world subdirectory you created above, just do this:

$ python3.5 -m zipapp . -p /usr/bin/python3 -o ../world.dbg

Now, before you can run ../world.dbg and watch the break point do its thing, you need to delete pex's own local cache, otherwise pex will execute the world dependency out of its cache, which won't have the break point set. This is a wart that might be worth reporting and fixing in pex itself.  For now:

$ rm -rf ~/.pex
$ ../world.dbg

And now you should be dropped into pdb almost immediately.

If you wanted to build this debugging pex into a snap, just use the snappy build command directly.  You'll need to add the minimal metadata yourself (since currently doesn't preserve it).  See the Snappy developer documentation for more details.

Summary and Caveats

There's a lot of interesting technology here; pex for building single file executables of Python applications, and Snappy Ubuntu Core for atomic, transactional system updates and lightweight application deployment to the cloud and things.  These allow you to get started doing some basic deployments of Python applications.  No doubt there are lots of loose ends to clean up, and caveats to be aware of.  Here are some known ones:

  • All of the above only works with Python 3.  I think that's a feature, but you might disagree. ;)   This works on Ubuntu Core for free because Python 3 is an essential piece of the base image.  Working out how to deploy Python 2 as a Snappy framework would be an interesting exercise.
  • When we build a snap from a git repository for an application that isn't on PyPI, I don't currently have a way to also grab some dependencies from PyPI.  The stupid example shown here doesn't have any additional dependencies so it wasn't a problem.  Fixing this should be a fairly simple matter of engineering on the wrapper (pull requests welcome!)
  • We don't really have a great story for cross-compilation of extension modules. Solving this is probably a fairly complex initiative involving the distros, setuptools and other packaging tools, and upstream Python.  For now, your best bet might be to actually build the snap on the actual target hardware.
  • Importing extension modules requires a file system cache because of limitations in the dlopen() API.  There have been rumors of extensions to glibc which would provide a dlopen()-from-memory type of API which could solve this, or upstream Python's zip support may want to grow native support for caching.
Even with these caveats, it's pretty easy to turn a Python application into a Snappy Ubuntu Core application, publish it to the world, and profit!  So what are you waiting for?  Snap to it!

Read more
Colin Ian King

health-check revisited

Earlier this month I wrote about health-check, a tool to sanity check application resource usage.  Since then I've re-worked and simplified the system call tracing and added some new features.

Originally, health-check was designed to attach itself to running processes. To make the tool even easier to use it can now start applications and follow any subsequent new processes spawned off from fork() or clone(). For example:

sudo health-check -u joeuser -f thunderbird
..will start thunderbird (running as user "joeuser") and follow any new processes it creates.

Sometimes applications will force flushing of data or metadata to disk by excessive use of fsync(), fdatasync() and sync().  Health-check will now keep track of these system calls and provide feedback on their use.

Health-check already has some memory checking techniques - however, it now has been extended to check explicitly for heap changes by examining the brk() system call and also keeping track of mmap() and munmap() mappings.  This allows better tracking of potential memory leaks.

Source code can be found at: git://

Packages found in my White PPA in ppa:colin-king/white so to install on Ubuntu systems use:
 sudo add-apt-repository ppa:colin-king/white  
sudo apt-get update
sudo apt-get install health-check

Read more
Colin Ian King

Kernel tracing using lttng

LTTng (Linux Trace Toolkit - next generation) is a highly efficient system tracer that allows tracing of the kernel and userspace. It also provides tools to view and analyse the gathered trace data.  So let's see how to install and use LTTng kernel tracing in Ubuntu. First, one has to install the LTTng userspace tools:

 sudo apt-get update  
sudo apt-get install lttng-tools babeltrace
LTTng was already recently added into the Ubuntu 13.10 Saucy kernel, however, with earlier releases one needs to install the LTTng kernel driver using lttng-modules-dkms as follows:

 sudo apt-get install lttng-modules-dkms  
It is a good idea to sanity check to see if the tools and driver are installed correctly, so first check to see the available kernel events on your machine:

 sudo lttng list -k  
And you should get a list similar to the following:
 Kernel events:  
mm_vmscan_kswapd_sleep (loglevel: TRACE_EMERG (0)) (type: tracepoint)
mm_vmscan_kswapd_wake (loglevel: TRACE_EMERG (0)) (type: tracepoint)
mm_vmscan_wakeup_kswapd (loglevel: TRACE_EMERG (0)) (type: tracepoint)
mm_vmscan_direct_reclaim_begin (loglevel: TRACE_EMERG (0)) (type: tracepoint)
mm_vmscan_memcg_reclaim_begin (loglevel: TRACE_EMERG (0)) (type: tracepoint)
Next, we need to create a tracing session:
 sudo lttng create examplesession  
..and enable events to be traced using:
 sudo lttng enable-event sched_process_exec -k  
One can also specify multiple events as a comma separated list. Next, start the tracing using:
 sudo lttng start  
and to stop and complete the tracing use:
 sudo lttng stop  
sudo lttng destroy
and the trace data will be saved in the directory ~/lttng-traces/examplesession-[date]-[time]/.  One can examine the trace data using the babeltrace tool, for example:
 sudo babeltrace ~/lttng-traces/examplesession-20130517-125533  
And you should get a list similar to the following:
 [12:56:04.490960303] (+?.?????????) x220i sched_process_exec: { cpu_id = 2 }, { filename = "/usr/bin/firefox", tid = 4892, old_tid = 4892 }  
[12:56:04.493116594] (+0.002156291) x220i sched_process_exec: { cpu_id = 0 }, { filename = "/usr/bin/which", tid = 4895, old_tid = 4895 }
[12:56:04.496291224] (+0.003174630) x220i sched_process_exec: { cpu_id = 2 }, { filename = "/usr/lib/firefox/firefox", tid = 4892, old_tid = 4892 }
[12:56:05.472770438] (+0.976479214) x220i sched_process_exec: { cpu_id = 2 }, { filename = "/usr/lib/libunity-webapps/unity-webapps-service", tid = 4910, old_tid = 4910 }
[12:56:05.478117340] (+0.005346902) x220i sched_process_exec: { cpu_id = 2 }, { filename = "/usr/bin/ubuntu-webapps-update-index", tid = 4912, old_tid = 4912 }
[12:56:10.834043409] (+5.355926069) x220i sched_process_exec: { cpu_id = 3 }, { filename = "/usr/bin/top", tid = 4937, old_tid = 4937 }
[12:56:13.668306764] (+2.834263355) x220i sched_process_exec: { cpu_id = 3 }, { filename = "/bin/ps", tid = 4938, old_tid = 4938 }
[12:56:16.047191671] (+2.378884907) x220i sched_process_exec: { cpu_id = 3 }, { filename = "/usr/bin/sudo", tid = 4939, old_tid = 4939 }
[12:56:16.059363974] (+0.012172303) x220i sched_process_exec: { cpu_id = 3 }, { filename = "/usr/bin/lttng", tid = 4940, old_tid = 4940 }
The LTTng wiki contains many useful worked examples and is well worth exploring.

As it stands, LTTng is relatively light weight.   Research by Romik Guha Anjoy and Soumya Kanti Chakraborty shows that LTTng describes how the CPU overhead is ~1.6% on a Intel® CoreTM 2 Quad with four 64 bit Q9550 cores.  With measurements I've made with oprofile on a Nexus 4 with 1.5 GHz quad-core Snapdragon S4 Pro processor shows a CPU overhead of < 1% for kernel tracing.  In flight recorder mode, one can generate a lot of trace data. For example, with all tracing enabled running multiple stress tests I was able to generate ~850K second of trace data, so this will obviously impact disk I/O.

Read more
Colin Ian King

Valgrind stack traces

Sometimes when debugging an application it is useful to generate a stack dump when a specific code path is being executed.  The valgrind tool provides a very useful and easy to use mechanism to do this:

1. Add in the following to the source file:

 #include <valgrind/valgrind.h>  
2. Generate the stack trace at the point you desire (and print a specific message) using VALGRIND_PRINTF_BACKTRACE(), for example:
 VALGRIND_PRINTF_BACKTRACE("Stack trace @ %s(), %d", __func__, __LINE__);  
3. Run the program with valgrind.  You may wish to use the --tool=none option to make valgrind run a little faster:
  valgrind --tool=none ./generate/unix/bin64/acpiexec *.dat  
4. Observe the strack trace. For example, I added this to the ACPICA acpiexec in AcpiDsInitOneObject() and got stack traces such as:
 ACPI: SSDT 0x563a480 00249 (v01 LENOVO TP-SSDT2 00000200 INTL 20061109)  
**7129** Stack trace @ AcpiDsInitOneObject(), 174 at 0x416041: VALGRIND_PRINTF_BACKTRACE (in /home/king/repos/acpica/generate/unix/bin64/acpiexec)
==7129== by 0x4160A6: AcpiDsInitOneObject (in /home/king/repos/acpica/generate/unix/bin64/acpiexec)
==7129== by 0x441F76: AcpiNsWalkNamespace (in /home/king/repos/acpica/generate/unix/bin64/acpiexec)
==7129== by 0x416312: AcpiDsInitializeObjects (in /home/king/repos/acpica/generate/unix/bin64/acpiexec)
==7129== by 0x43D84D: AcpiNsLoadTable (in /home/king/repos/acpica/generate/unix/bin64/acpiexec)
==7129== by 0x450448: AcpiTbLoadNamespace (in /home/king/repos/acpica/generate/unix/bin64/acpiexec)
==7129== by 0x4502F6: AcpiLoadTables (in /home/king/repos/acpica/generate/unix/bin64/acpiexec)
==7129== by 0x405D1A: AeInstallTables (in /home/king/repos/acpica/generate/unix/bin64/acpiexec)
==7129== by 0x4052E8: main (in /home/king/repos/acpica/generate/unix/bin64/acpiexec)

There are a collection of very useful tricks to be found in the Valgrind online manual which I recommend perusing at your leisure.

Read more
Colin Ian King

Striving for better code quality.

Software is complex and is never bug free, but fortunately there are many different tools and techniques available to help to identify and catch a large class of common and obscure bugs.

Compilers provide build options that can help drive up code quality by being particularly strict to detect questionable code constructions, for example gcc's -Wall and -pedantic flags.  The gcc -Werror flag is useful during code development to ensure compilation halts with an error on warning messages, this ensures the developer will stop and fix code.

Static analysis during compilation is also a very useful technique, tools such as smatch and Concinelle can identify bugs such as deferencing of NULL pointers, checks for return values and ranges,  incorrect use of && and ||, bad use of unsigned or signed values and many more beside.  These tools were aimed for use on the Linux kernel source code, but can be used on C application source too.  Let's take a moment to see how to use smatch when building an application.

Download the dependencies:

 sudo apt-get install libxml2-dev llvm-dev libsqlite3-dev

Download and build smatch:
 mkdir ~/src  
cd ~/src
git clone git://
cd smatch

Now build your application using smatch:
 cd ~/your_source_code  
make clean
make CHECK="~/src/smatch/smatch --full-path" \
CC=~/src/smatch/cgcc | tee warnings.log

..and inspect the warnings and errors in the file warnings.log.  Smatch will produce false-positives, so not every warning or error is necessarily buggy code.

Of course, run time profiling of programs also can catch errors.  Valgrind is an excellent run time profiler that I regularly use when developing applications to catch bugs such as memory leaks and incorrect memory read/writes. I recommend starting off using the following valgrind options:
 --leak-check=full --show-possibly-lost=yes --show-reachable=yes --malloc-fill=  

For example:
 valgrind --leak-check=full --show-possibly-lost=yes --show-reachable=yes \
--malloc-fill=ff your-program

Since the application is being run on a synthetic software CPU execution can be slow, however it is amazingly thorough and produces detailed output that is extremely helpful in cornering buggy code.

The gcc compiler also provides mechanism to instrument code for run-time analysis.  The -fmudflap family of options instruments risky pointer and array dereferencing operations, some standard library string and heap functions as well as some other range + validity tests.   For threaded applications use -fmudflapth instead of -fmudflap.   The application also needs to be linked with libmudflap.

Here is a simple example:
 int main(int argc, char **argv)  
static int x[100];
return x[100];

Compile with:
 gcc example.c -o example -fmudflap -lmudflap  

..and mudflap detects the error:
mudflap violation 1 (check/read): time=1347817180.586313 ptr=0x701080 size=404
pc=0x7f98d3d17f01 location=`example.c:5:2 (main)'
/usr/lib/x86_64-linux-gnu/ [0x7f98d3d17f01]
./example(main+0x7a) [0x4009c6]
/lib/x86_64-linux-gnu/ [0x7f98d397276d]
Nearby object 1: checked region begins 0B into and ends 4B after
mudflap object 0x190a370: name=`example.c:3:13 x'
bounds=[0x701080,0x70120f] size=400 area=static check=3r/0w liveness=3
alloc time=1347817180.586261 pc=0x7f98d3d175f1
number of nearby objects: 1

These are just a few examples, however there are many other options too. Electric Fence is a useful malloc debugger, and gcc's -fstack-protector produces extra code to check for buffer overflows, for example in stack smashing. Tools like bfbtester allow us to brute force check command line overflows - this is useful as I don't know many developers who try to thoroughly validate all the options in their command line utilities.

No doubt there are many more tools and techniques available.  If we use these wisely and regularly we can reduce bugs and drive up code quality.

Read more
Colin Ian King

Debugging using Visualisation.

There are many different classes of bugs and hence many different debugging techniques can be used.   Sometimes there is a lot of complexity in a problem and it is hard to make a mental model of what exactly is going on between multiple interdependent components, especially when non-deterministic behaviour is occurring.

Unfortunately the human mind is limited; there is only so much debugging state that it can follow.  The problem is compounded when a bug manifests itself after several hours or days of continuous execution time - there just seems like there is too much state to track and to make sense from.

Looking at thousands of lines of debug trace and trying to spot any lurking evidence that may offer some hint to why code is failing is not trivial. However the brain is highly developed at making sense of visual input, so it makes sense to visualise copious amounts of debug data to spot anomalies.

The kernel offers many ways to gather data, be it via different trace mechanisms or just by using plain old printk().   The steps to debug are thus:

1.  Try and find a way to reliably reproduce the problem to be debugged.
2.  Where possible, try to remove complexity from the problem and still end up with a reliable way of reproducing the problem.
3.  Rig up a way of collecting internal state or data on system activity.   Obviously the type of data to be collected is dependant on the type of problem being worked on.   Be careful that any instrumentation does not affect the behaviour of the system.
4.  Start collecting data and try to reproduce the bug.  You may have to do this multiple times to collect enough data to allow one to easily spot trends over different runs.
5.  Visualise the data.

Iterating on steps 2 to 5 allow one to keep on refining a problem down to the bare minimum required to corner a bug.

Now step 5 is the fun part.  Sometimes one has to lightly parse the output to collect specific items of data. I generally use tools such as awk to extract specific fields or to re-munge data into a format that can be easily processed by graphing tools.  It can be useful to also collect time stamps with the data as some bugs are timing dependant and seeing interactions between components is key to understanding why issues occur.  If one gathers multiple sets of data from different sources then being able to correlate the data on a timestamp  can be helpful.

If I have just a few tens of hundreds items of data to visualise I generally just collate my data into tab or comma separated output and then import it into LibreOffice Calc and then generate graphs from the raw data.   However, for more demanding graphing I normally resort to using gnuplot.   Gnuplot is an incredibly powerful plotting tool - however for more complex graphs one often needs to delve deep into the manual and perhaps crib from the very useful worked examples.

Graphing data allows one to easily spot trends or correlate between seemingly unrelated parts of a system. What was originally an overwhelmingly huge mass of debug data turns into a powerful resource.   Sometimes I find it useful to run multiple tests over a range of tweaked kernel tuneables to see if bugs change behaviour - often this aids understanding when there is significant amounts of inter-component complexity occurring.

Perhaps it is just the way I like to think about issues, but I do recommend experimenting with collecting large data sets and creatively transforming the data into visualise representations to allow one to easily spot issues.  It can be surprising just how much one can glean from thousands of seemingly unrelated  traces of debug data.

Read more
Colin Ian King

A new Ubuntu portal is a jump-start page containing links to pages and documents useful for Original Design Manufactures (ODMs), Original Equipment Manufacturers (OEMs) and Independent BIOS vendors.

Some of the highlights include:

  • A BIOS/UEFI requirements document that containing recommendations to ensure firmware is compatible with the Linux kernel.
  • Getting started links describing how to download, install, configure and debug Ubuntu.
  • Links to certified hardware, debugging tools, SystemTap guides, packaging guides, kernel building notes.
  • Debugging tips, covering: hotkeys, suspend/resume, sound, X and wireless and an A5 sized Ubuntu Debugging booklet.
  • Link to fwts-live, the Firmware Test Suite live image. lots of useful technical resources to call upon.

Kudos to Chris Van Hoof for organizing this useful portal.

Read more
Jussi Pakkanen

I played around with btrfs snapshots and discovered two new interesting uses for them. The first one deals with unreliable operations. Suppose you want to update a largish SVN checkout but your net connection is slightly flaky. The reason can be anything, bad wires, overloaded server, electrical outages, and so on.

If SVN is interrupted mid-transfer, it will most likely leave your checkout in a non-consistent state that can’t be fixed even with ‘svn cleanup’. The common wisdom on the Internet is that the way to fix this is to delete or rename the erroneous directory and do a ‘svn update’, which will either work or not. With btrfs snapshots you can just do a snapshot of your source tree before the update. If it fails, just nuke the broken directory and restore your snapshot. Then try again. If it works, just get rid of the snapshot dir.

What you essentially gain are atomic operations on non-atomic tasks (such as svn update). This has been possible before with ‘cp -r’ or similar hacks, but they are slow. Btrfs snapshots can be done in the blink of an eye and they don’t take extra disk space.

The other use case is erroneous state preservation. Suppose you hack on your stuff and encounter a crashing bug in your tools (such as bzr or git). You file a bug on it and then get back to doing your own thing. A day or two later you get a reply on your bug report saying “what is the output of command X”. Since you don’t have the given directory tree state around any more, you can’t run the command.

But if you snapshot your broken tree and store it somewhere safe, you can run any analysis scripts on it any time in the future. Even possibly destructive ones, because you can always run the analysis scripts in a fresh snapshot. Earlier these things were not feasible because making copies took time and possibly lots of space. With snapshots they don’t.

Read more
Colin Ian King

Monitoring /proc/timer_stats

The /proc/timer_stats interface allows one to check on timer usage in a Linux system and hence detect any misuse of timers that can cause excessive wake up events (and also waste power).  /proc/timer_stats reports the process id (pid) of a task that initialised the timer, the name of the task, the name of the function that initialised the timer and the name of the timer callback function.  To enable timer sampling, write "1\n" to /proc/timer_stats and to disable write "0\n".

While this interface is simple to use, collecting multiple samples over a long period of time to monitor overall system behaviour takes a little more effort.   To help with this, I've written a very simple tool called eventstat that calculates the rate of events per second and can dump the data in a .csv (comma separated values) format for importing into a spreadsheet such as LibreOffice for further analysis (such as graphing).

In its basic form, eventstat will run ad infinitum and can be halted by control-C. One can also specify the sample period and number of samples to gather, for example:

 sudo eventstat 10 60  

.. this gathers samples every 10 seconds for 60 samples (which equates to 10 minutes).

The -t option specifies an events/second threshold to discard events less than this threshold, for example:  sudo cpustat -t 10 will show events running at 10Hz or higher.

To dump the samples into a .csv file, use the -r option followed by the name of the .csv file.  If you just want to collect just the samples into a .csv file and not see the statistics during the run, use also the -q option, e.g.

 sudo eventstat -q -r event-report.csv

With eventstat you can quickly identify rouge processes that cause a high frequency of wake ups.   Arguably one can do this with tools such as PowerTop, but eventstat was written to allow one to collect the event statistics over a very long period of time and then help to analyse or graph the data in tools such as Libre Office spreadsheet.

The source is available in the following git repository:  git:// and in my power management tools PPA:

In an ideal world, application developers should check their code with tools like eventstat or PowerTop to ensure that the application is not misbehaving and causing excessive wake ups especially because abuse of timers could be happening in the supporting libraries that applications may be using.

Read more
Colin Ian King

Dumping the contents of the Embedded Controller (EC) can be useful when debugging some x86 BIOS/kernel related issues.  At a hardware level to get access to the EC memory one goes via the EC command/status and data port.   As a side note, one can determine these ports as follows:

cat /proc/ioports  | grep EC

The preferred way to access these is via the ACPI EC driver in drivers/acpi/ec.c which is used by the ACPI driver to handle read/write operations to the EC memory region.

In addition to this driver, there the ec_sys module that provides a useful debugfs interface to allow one to read + write to the EC memory.  Write support is enabled with the ec_sys module parameter 'write_support' but it is generally discouraged as one may be poking data into memory may break things in an unpredictable manner, hence by default write support is disabled.

So, to dump the contents of the first EC (assuming debugfs is mounted), do:

sudo modprobe ec_sys
sudo od -t x1 /sys/kernel/debug/ec/ec0/io


As a bonus, the General Purpose Event bits are also readable from /sys/kernel/debug/ec/ec0/gpe.

Before I stumbled upon ec_sys.c I used a SystemTap script to execute ec_read() in ec.c to do the reading directly.  Yes it's ugly and stupid, but it does prove SystemTap is a very useful tool.

Read more
Colin Ian King

SystemTap is a very useful and powerful tool that enables one to insert kernel debug into a running kernel.  Today I wanted to inspect the I/O read/write operations occurring when running some ACPI AML, so it was a case of hacking up a few lines of system tap to dump out the relevant state (e.g. which port being accessed, width of the I/O operation in bits and value being written or read).

So instead of spinning a bespoke kernel with a few lines of debug in, I use SystemTap to quickly write a re-usable script.  Simple and easy!

I've put the SystemTap script in my git repository for any who are interested.

Read more
Colin Ian King

The ACPI engine in the kernel can be debugged by building with CONFIG_ACPI_DEBUG and configuring /sys/module/acpi/parameters/debug_layer and /sys/module/acpi/parameters/debug_level appropriately.   This can provide a wealth of data and is generally a very powerful debug state tracing mechanism.  However, there are times when one wants to get a little more debug data out or perhaps just drill down on a specific core area of functionality without being swamped by too much ACPI debug. This is where tools like SystemTap are useful.

SystemTap is a very powerful tool that allows one to add extra debug instrumentation into a running kernel without the hassle and overhead of rebuilding a kernel with debug printk() statements in. It allows very quick turnaround in writing debug and one does not have to reboot a machine to load a new kernel since the debug is loaded and unloaded dynamically.

SystemTap has its own scripting language for writing debug scripts, but for specialised hackery it provides a mechanism ('guru mode') to embed C directly which can be called from the SystemTap script.   The SystemTap language is fairly small and easy to understand and one easily becoming proficient with the language in a day.

The only downside is that one requires a .ddeb kernel package which is huge since it contains all the necessary kernel debug information. 

Over the past week  I have been looking at debugging various aspects of the ACPI core, such as fulling tracing suspend/resume and dumping out executed AML code at run time.   I was able to quickly prototype a SystemTap script that dumps out AML opcodes on the Oneiric kernel - this saved me the usual build of a debug kernel with CONFIG_ACPI_DEBUG enabled and then capturing the appropriate debug and wading through copious amounts of debug data.

Conclusion: Some initial investment in time and effort is required to understand SystemTap (and to get to grips with the more useful features in 'guru mode'). However, one can be far more productive because the debug cycle is made far more efficient. Also, SystemTap provides plenty of functionality to allow very detailed and targeted debugging scripts.

Read more
Colin Ian King

PC Speaker weirdnesses

Today we were trying to do some debugging by getting some tones out of a laptop speaker by frobbing bit 1 of port 0x61 on the keyboard controller.  Rather unexpectedly I got no sound whatsoever out of the speaker, yet I had managed to do so the day before.   So I double checked what had changed since the day before:

1. Was it because I upgraded my kernel?
2. Did I unexpected disabled the speaker when tweaking BIOS settings?
3. Was it something interfering with my port 0x61 bit twiddling?
4. Was the hardware now broken?

As per usual, I first assumed that the most complex parts of the system were to blame as they normally can go wrong in the most subtle way.  After a lot of fiddling around I discovered that the PC speaker only worked when I plugged the AC power into the laptop.  Now that wasn't obvious.

I suspect I should have applied Occam's Razor to this problem to begin with. We live and learn...

Read more
Colin Ian King

Back in February I wrote about turning off a PC using the Intel I/O Controller Hub and had some example code to do this, and it was a dirty hack.  I've revised this code to be more aligned with how the Linux kernel does this, namely:

1. Cater for the possible existence of PM1b_EVT_BLK and PM1b_CNT_BLK registers.
2. Clear WAKE_STATUS before transitioning to S5
3. Instead of setting SLP_TYP and SLP_EN on, one sets SLP_TYPE and then finally sets SLP using separate writes.

The refined program requires 4 arguments, namely the port address of the PM1a_EVT_BLK, PM1b_EVT_BLK, PM1a_CNT_BLK and PM1b_CNT_BLK.  If the PM1b_* ports are not defined, then these should be 0.

To find these ports run either:

cat /proc/ioports | grep PM1

or do:

sudo fwts acpidump - | grep PM1 | grep Block:

For example, on a Sandybridge laptop, I have PM1a_EVT_BLK = 0x400, PM1a_CNT_BLK = 0x404 and the PM1b_* ports are not defined, so I use:

sudo ./halt 0x400 0 0x404 0

..and this will transition the laptop to the S5 state very quickly. Needless to say, make sure you have sync'd and/or unmounted your filesystems before doing this.

The PM1_* port addresses from /proc/ioport and the fwts acpidump come from the PM1* configuration data from the ACPI FACP, so if this is wrong, then powering down the machine won't work.  So, if you can't shutdown your machine using this example code then it's possible the FACP is wrong. At this point, one should sanity check the port addresses using the appropriate Southbridge data sheet for your machine - generally speaking look for:

PM1_STS—Power Management 1 Status Register (aka PM1a_EVT_BLK)
PM1_CNT—Power Management 1 Control  (aka PM1a_CNT_BLK)

..however these are offsets from PMBASE which are defined in the LPC Interface PCI Register Address Map so you may require a little bit of work to figure out the addresses of these registers on your hardware.

Read more
Barry Warsaw

For the last couple of days I've been debugging a fun problem in the Ubuntu tool called Jockey. Jockey is a tool for managing device drivers on Ubuntu. It actually contains both a command-line and a graphical front-end, and a dbus backend service that does all the work (with proper authentication, since it modifies your system). None of that is terribly relevant to the problem, although the dbus bit will come back to haunt us later.

What is important is that Jockey is a Python application, written using many Python modules interfacing to low-level tools such as apt and dbus. The original bug report was mighty confusing. Aside from not being reproducible by myself and others, the actual exception made no fricken sense! Basically, it was code like this that was throwing a TypeError:

_actions = []
# _actions gets appended to at various times and later...
for item in _actions[:]:
# do something

Everyone who reported the problem said the TypeError was getting thrown on the for-statement line. The exception message indicated that Python was getting some object that it was trying to convert to an integer, but was failing. How could you possible get that exception when either making a copy of a list or iterating over that copy? Was the list corrupted? Was it not actually a list but some list-like object that was somehow returning non-integers for its min and max indexes?

To make matters worse, this little code snippet was in Python's standard library, in the subprocess module. A quick search of Python's bug database did reveal some recent threads about changes here, made to ensure that popen objects got properly cleaned up by the garbage collector if they weren't cleaned up explicitly by the program. Note that we're using Python 2.7 here, and after some reading of the tracker issues and the Python code, I just couldn't see any problem here, or at least none that could possibly lead to the error being seen.

When I originally looked at the Launchpad bug report, that was about as far as I got. I couldn't see any possible way this bug could happen, and I couldn't reproduce it, so I set the bug to Incomplete. Unfortunately, it kept hitting Ubuntu Natty beta testers so it wasn't going to go away. Fortunately, Martin Pitt found a test recipe, with which I could reproduce the bug 100% of the time. Yay! At least it probably wasn't going to be a race condition.

How to debug this though? Normally, I'd just attach gdb to the thing and start tracing, but the problem was that when I set the Jockey dbus backend to use the debug version of Python, the error went away (or rather morphed into something that was really unrelated). Here's where I started to sharpen up my favorite yak shaving blade.

The parable about shaving yaks is so entirely appropriate to problems like this that I'll take a quick detour. One of my (and my brother's) favorite shows in the 90's was Ren and Stimpy. That show was groundbreaking, and you can see elements of it in just about every NickToons cartoon on cable today. Some of the episodes are brilliant and others are horrible but as a whole, Ren and Stimpy are an undeniable classic of American animation. In one particular awesome episode, Ren and Stimpy celebrate Kilted Yaks-eve where a Yak comes up from the tub drain and shaves his stubble, leaving behind a present of the highly desired shaving scum for Stimpy in the morning. Based on this episode (most likely!) the programming term "yak shaving" typically means having to take detour after meaningless detour before you can actually solve the problem you're having.

In the context of this bug, the attempt to use the debug build of Python in the Jockey dbus backend was the first yak to shave. Because I wanted to get more information out of the process, I tried to attach to a running backend process, but this proved to be quite difficult. I was debugging this in a 64bit virtual machine, and gdb+debug-python was just not cooperating.

Now the question is: do you debug that problem (or at least get to a place where you can decide to file a bug or not), or do you punt and try a different approach? The former is most definitely shaving the yak; it gets you no closer to fixing the original problem, but certain wastes enough of your time to make you seem productive. Of course, it's inevitable that if you do follow that secondary bug, it'll lead to a third one, and a fourth one, and so on until you're deep in the shaving scum and no closer to a solution to the original problem. You need to be constantly on guard against these types of yak, er, rabbit holes.

I'll spare you the gory details about hitting bugs in my text editor, bugs in building Python on a multiarch system, and other paths that all lead to smooth chinned yaks, but not to happy Pythons. When I punted on trying to attach to the Jockey backend, I actually decided to try to figure out exactly where the TypeError was occurring by using a twist on old-fashioned print statements. By grepping the Python source I found about a dozen occurrences of the error message "an integer is required". Which one was getting tripped? I added a little marker text to every such occurrence and re-built a Python package to aid in debugging.

Here's where I did detour to yak shaving. The Ubuntu source package for Python itself, runs the full test suite on every build. That's great for ensuring a high quality Python package, but it's terrible for turnaround time when experimentally hacking Python. Building the package with DEB_BUILD_OPTIONS="nocheck nobench" is supposed to work, but for some reason did not with my sbuild environment. I guess debugging that would have been like shaving the yak's left check, but instead it was so much easier to shave his right check. So I spent some time with my razor chopping out huge sections of the debian/rules file so that Python 2.7 would build as fast as possible. While a seemingly meaningless task, this did actually help quite a bit since I was able to try out ideas with a much shorter turnaround.

Anyway, with many iterations on the exception marking idea, I finally nailed down the culprit. It was in the Python C API function PyInt_AsLong(), but even here I was unsure which arm of the conditional was getting triggered. Another round of hack-build-scp-reset-test and I found what I suspected: PyInt_AsLong() was getting passed an object that could not be turned into an integer. But what was that object?

So now back to the original problem with gdb. To solve this I downloaded and built a 32bit virtual machine, and was able to reproduce the bug there. Fortunately, in the 32bit environment I was much more successful in attaching to a running backend Jockey dbus process, and even though I didn't have the Python source available (and no, I won't talk about where that particular yak shaving detour led), I could fairly easily print the objects in the debugger at the offending code, where I learned that PyInt_AsLong() was getting called with None as its argument. And yeah, you can't turn None into an integer in Python!

But, how was this function getting called with None? Popping up the call stack led me eventually to PyArg_Parse(), an older C API function that parses a Python tuple into a set of objects based on some format flags. This is used when implementing Python functions in C, to parse an argument list. Popping up the stack again led me into some python-apt code. python-apt is a C++ library that exposes the APT system to Python programs. It's fairly mature and robust, but I wasn't as familiar with all its dark corners as I was with Python.

Now, my first instinct is never that I've found a bug in Python. That's not to say they don't exist, but just that Python has been around so long, is so well tested, and used so extensively that I'm always suspicious in cases like this (i.e. strange, inexplicable bugs that make no sense) of extension modules and third party code. And indeed, my sleuthing has led me to python-apt, exactly the kind of complex Python extension module that can have odd lurking bugs in it. Still, the problem I was now faced with was this: the call stack led me into a code path that had nothing to do with iterating over a list, or copying that list. So, what gives?

Well, it's useful to know how Python exceptions work at the C level. In general terms, when some C code raises an exception, it basically sets some global state, and then returns error codes up the stack until either something catches it and deals with it, or it percolates up to Python's top eval loop. The key thing here is that there are usually two states that exist: the global exception value currently in effect, and an error code that gets returned up the C call stack. Usually this return code is a zero or one, but it can also be NULL or -1 in certain cases. The Python C API documentation is very good at describing these.

So now that I looked closely at the python-apt code, I could see what what going on and it all began to make sense! The python-apt package was pulsing a progress meter, set up as a callback by the client of the python-apt code. Meaning, python-apt couldn't really control what this callback would return. python-apt expected the callback to return either Python's True or False, but it could return anything, including nothing! In Python, True can be coerced into the integer 1 and False into the integer 0, and python-apt wanted an integer, so indeed the call stack lead right to a call to PyArg_Parse() to turn the callback's return object into an integer. What happens if the callback didn't return something that could be turned into an integer, or worse, didn't return anything at all?

In Python, a function always returns something, even if there is no explicit return statement. In those cases, None is implicitly returned. Yes, you see it now. And if not, there was this clue in the python-apt code: "most of the time the user who subclasses the pulse() method forgot to add a return {True,False} so we just assume he wants a True." Translated: Hey guy! You forgot to add a "return True" or "return False" to your pulse() method, and it probably fell of the end, giving us None, which we dutifully passed to PyArg_Parse().

PyArg_Parse() did its thing when getting None, by correctly setting the global exception state to the TypeError, and returning a zero code to indicate an error occurred. But looking at the python-apt code, it recognizes the error code, but forgets that some global exception state was set! Meaning, even though python-apt was ignoring the exception, Python still knew about it. But because control wasn't being passed on up to the Python eval loop, the exception was just lurking there like a angry unshaven yak, waiting to be discovered. And in fact, the next time Python itself checked the exception state was in, yep, the for-loop iterating over the perfectly fine list object. Python hits the for-loop, finds this orphaned TypeError lurking there and raises it, in a place that has literally nothing to do with the original exception.

The fix is a one-liner. In python-apt, where its ignoring any exception returned by PyArg_Parse(), it must both swallow the error code (which it was doing), and clear the global exception state (which it was notdoing). By adding a call to PyErr_Clear(), python-apt was keeping the interpreter's state consistent, and properly ignoring the argument parsing error, thus fixing the bug.

As I mentioned to my colleague Colin Watson, this did turn out to be a fun one to debug, though not as "fun" as one he recently worked on.

Happy Yak Shaving.

Read more
Colin Ian King

Poking around the HD-Audio Configuration

HD-Audio can be reconfigured without having to re-load the driver using the special sysfs files - this enables one to twiddle and debug HD-audio in a relatively pain free way.

Each codec-hwdep device has a sysfs directory in /sys/class/sound populated with a bunch of files which can be read. Some of these files can be written to, enabling one to over-ride the default. For example on my Lenovo I have /sys/class/sound/hwC0D0 which contains:

32 bit codec vendor id (hex) (read-write)

32 bit subsystem id (hex) (read-write)

32 bit revision id (hex) (read-write)

name of chipset

AFG id (read-only)

MFG id (read-only)

code name string (read-write)

current model option (read-write)

verbs to be execute at initialisation time. Extra verbs can be added by writing to init_verbs as numbers in the form: nid verb parameter

hint strings in format 'key= value', e.g. eapd_switch = yes (for example, this particular hint is picked up by a call to snd_hda_get_bool_hint(codec, "eapd_switch") in the patch_sigmatel.c source)

show the default initial pin configurations as set up by the BIOS at boot time

show the default pin configurations as set by the codec parser. Only pin configurations changed by the parser are shown.

show ping configurations used to override the BIOS set up configurations. One can append new values by writing a nid and value tuple to this file.

triggers codec reconfiguration by writing any value to this file

codec reset, remove mixer elements, clear all init verbs and hints

For example to see the BIOS pin configurations on my Lenovo laptop:

$ cat /sys/class/sound/hwC0D0/init_pin_configs
0x14 0x99130110
0x15 0x411111f0
0x16 0x411111f0
0x17 0x411111f0
0x18 0x03a19820
0x19 0x99a3012f
0x1a 0x411111f0
0x1b 0x0321401f
0x1c 0x411111f0
0x1d 0x40178e2d
0x1e 0x411111f0

And to identify my audio hardware, I can use:

$ cat /sys/class/sound/hwC0D0/ \

..showing that my Lenovo has Realtek ALC861-VD and the vendor and subsystem ids.


Read more
Colin Ian King

Kernel Oops page fault error codes

The x86 Linux kernel Oops messages provide normally just enough information to help a kernel developer corner and fix critical bugs. The start of a typical Oops message may look like the following:

kernel BUG at kernel/signal.c:1599!
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pc = 84427f6a
*pde = 00000000
Oops: 0001 [#1]

The 4 digit value after the "Oops:" message dumps out the page fault error code in hexadecimal which in turn can help one deduce what caused the oops. The page fault error code is encoded as follows:

bit 0 - 0 = no page found, 1 = protection fault
bit 1 - 0 = read access, 1 = write access
bit 2 - 0 = kernel-mode access, 1 = user mode access
bit 3 - 0 = n/a, 1 = use of reserved bit detected
bit 4 - 0 = n/a, 1 = fault was an instruction fetch

So, in the above example, the Oops error code was 0x0001 which means it was a page protection fault, read access in kernel mode.

A lot of Oops error codes are 0x0000, which means a page was not found by a read access in kernel mode.

For more information, consult arch/x86/mm/fault.c

Read more
Colin Ian King

Kernel Early Printk Messages

I've been messing around with the earlyprintk kernel options to allow me to get some form of debug out before the console drivers start later on in the kernel init phase. The earlyprintk kernel option supports debug output via the VGA, serial port and USB debug port.

The USB debug port is of interest - most modern systems seem to provide a debug port capability which allows one to send debug over USB to another machine. To check if your USB controller has this capability, use:

sudo lspci -vvv | grep "Debug port"

and look for a string such as "Capabilities: [58] Debug port: BAR=1 offset=00a0". You may have more than one of these on your system, so beware you use the correct one.

One selects this mode of earlyprintk debugging using:


for the default first port, or select the Nth debug enabled port using:


One also needs to build a kernel with the following config option enabled:


On my debug set-up I used a NET20DC-USB Hi-Speed USB 2.0 Host-to-Host Debug Device connecting the target machine and a host with which I capture the USB debug using /dev/ttyUSB0 with minicom. So that I won't bore you with the details, this is all explained in the kernel documentation in Documentation/x86/earlyprintk.txt

As it was, I needed to tweak the earlyprintk driver to put in some delays in the EHCI probing and reset code to get it working on my fairly fast target laptop.
My experience with this approach wasn't great - I had to plug/unplug the debug device quite frequently for the earlyprintk EHCI reset and probe to work. Also, the EHCI USB driver initialisation later on in the kernel initialisation hung which wasn't useful.

Overall, I may have had problems with the host/target and/or the NET20DC-USB host-to-host device, but it did allow me to get some debug out, be it rather unreliably.

Probably an easier way to get debug out is just using the boot option:


however this has the problem that the messages are eventually overwritten by the real console.

Finally, for anyone with old legacy serial ports on their machine (which is quite unlikely nowadays with newer hardware), one can use:


where ttySn is the nth tty serial port.

One can also append the ",keep" option to not disable the earlyprintk once the real console is up and running.

So, with earlyprintk, there is some chance of being able to get some form of debug out to a device to allow one to debug kernel problems that occur early in the initialisation phase.

Read more
Colin Ian King

Debugging grub2 with gdb

?ubomír Rintel has written a detailed and very helpful debugging guide for grub2. The guide covers how to debug with gdb using emulators such as QEMU and Bochs as well as traditional serial line debugging using a null-modem.

The tricky part is to be able to pull in the debug and symbol files for dynamically loaded modules, however this has been solved by ?ubomír with a gdb and perl script.

The guide gives some useful tricks which can be used not just with grub2 but other boot loaders too. It's well worth a look just to learn some useful gdb hacks.

Read more