Canonical Voices

What Alex Chiang talks about

alex

Appy polly loggies for the super long delay between episodes, but I finally carved out some time for our exciting dénouement in the memory leak detection series. Past episodes included detection and analysis.

As a gentle reminder, during analysis, we saw the following block of code:

 874                 GSList *dupes = NULL;
 875                 const char *path;
 876 
 877                 dupes = g_object_get_data (G_OBJECT (dup_data.found), "dupes");
 878                 path = nm_object_get_path (NM_OBJECT (ap));
 879                 dupes = g_slist_prepend (dupes, g_strdup (path));
 880 #endif
 881                 return NULL;

And we concluded with:

Is it safe to just return NULL without doing anything to dupes? maybe that’s our leak?
We can definitively say that it is not safe to return NULL without doing anything to dupes. We definitely allocated memory, stuck it into dupes, and then threw dupes away. This is our smoking gun.

But there’s a twist! Eagle-eyed reader Dave Jackson (a former colleague of mine from HP, natch) spotted a second leak! It turns out that line 879 was exceptionally leaky during its inception. As Dave points out, the call to g_slist_prepend() passes g_strdup() as an argument. And as the documentation says:

Duplicates a string. If str is NULL it returns NULL. The returned string should be freed with g_free() when no longer needed.

In memory-managed languages like Python, the above idiom of passing the result of one function call directly as an argument to another function is quite common. However, one needs to be more careful about doing so in C and C++, taking great care to observe whether the inner call allocates memory and returns it. There is no mechanism in the language itself to automatically free memory in the above situation, and thus the call to g_strdup() seems like it also leaks memory. Yowza!
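To make the ownership question concrete, here is a minimal sketch in plain C rather than glib. Everything in it is made up for illustration: dup_string() stands in for g_strdup(), and struct menu_item with its two helpers is a hypothetical container that takes ownership of the string it is handed. The point is only that an inline allocating call is fine as long as somebody ends up owning, and eventually freeing, the result.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for g_strdup(): returns a heap copy that
 * somebody has to free later. Returns NULL if s is NULL. */
static char *dup_string(const char *s)
{
    if (s == NULL)
        return NULL;
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Hypothetical container that owns whatever label it is given. */
struct menu_item {
    char *label;
};

/* Takes ownership of label: the caller must not free it afterwards. */
static void menu_item_set_label(struct menu_item *item, char *label)
{
    free(item->label);      /* release any previous label first */
    item->label = label;
}

/* Releases the label owned by the item. */
static void menu_item_clear(struct menu_item *item)
{
    free(item->label);
    item->label = NULL;
}
```

With these helpers, a call such as menu_item_set_label(&item, dup_string(path)) does not leak, provided menu_item_clear(&item) runs before the item goes away. Drop that final call and the copy is lost for good.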

So, what to do about it?

The basic goal here is that we don’t want to throw dupes away. We need to actually do something with it. Here again are the 3 most pertinent lines.

 877                 dupes = g_object_get_data (G_OBJECT (dup_data.found), "dupes");
 878                 path = nm_object_get_path (NM_OBJECT (ap));
 879                 dupes = g_slist_prepend (dupes, g_strdup (path));
 881                 return NULL;

Let’s break these lines down.

  1. On line 877, we retrieve the dupes list from the dup_data.found object
  2. Line 878 gets a path to the duplicate wifi access point
  3. Finally, line 879 adds the duplicate access point to the old dupes list
  4. Line 881 throws it all away!

To me, the obvious thing to do is to change the code between lines 879 and 881, so that after we modify the duplicates list, we save it back into the dup_data object. That way, the next time around, the list stored inside of dup_data will have our updated list. Makes sense, right?

As long as you agree with me conceptually (and I hope you do), I’m going to take a quick shortcut and show you the end result of how to store the new list back into the dup_data object. The reason for the shortcut is that we are now deep in the details of how to program using the glib API, and like many powerful APIs, the key is to know which functions are necessary to accomplish your goal. Since this is a memory leak tutorial and not a glib API tutorial, just trust me that the patch hunk will properly store the dupes list back into the dup_data object. And if it’s confusing, as always, read the documentation for g_object_steal_data and g_object_set_data_full.

@@ -706,14 +706,15 @@
 +		GSList *dupes = NULL;
 +		const char *path;
 +
-+		dupes = g_object_get_data (G_OBJECT (dup_data.found), "dupes");
++		dupes = g_object_steal_data (G_OBJECT (dup_data.found), "dupes");
 +		path = nm_object_get_path (NM_OBJECT (ap));
 +		dupes = g_slist_prepend (dupes, g_strdup (path));
++		g_object_set_data_full (G_OBJECT (dup_data.found), "dupes", (gpointer) dupes, (GDestroyNotify) clear_dupes_list);
 +#endif
  		return NULL;
  	}

If the above patch format looks funny to you, it’s because we are changing a patch.

-+		dupes = g_object_get_data (G_OBJECT (dup_data.found), "dupes");
++		dupes = g_object_steal_data (G_OBJECT (dup_data.found), "dupes");

This means the old patch had the line calling g_object_get_data() and the refreshed patch now calls g_object_steal_data() instead. Likewise…

++		g_object_set_data_full (G_OBJECT (dup_data.found), "dupes", (gpointer) dupes, (GDestroyNotify) clear_dupes_list);

The above call to g_object_set_data_full is a brand new line in the new and improved patch.

Totally clear, right? Don’t worry, the more sitting and contemplating of the above you do, the fuller and more awesomer your neckbeard grows. Don’t forget to check it every once in a while for small woodland creatures who may have taken up residence there.

And thus concludes our series on how to detect, analyze, and fix memory leaks. All good? Good.

waiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiit!!!!!!!11!

I can hear the observant readers out there already frantically scratching their necks and getting ready to point out the mistake I made. After all, our newly refreshed patch still contains this line:

 +		dupes = g_slist_prepend (dupes, g_strdup (path));

And as we determined earlier, that’s our incepted memory leak, right? RIGHT!??

Not so fast. Take a look at the new line in our updated patch:

++		g_object_set_data_full (G_OBJECT (dup_data.found), "dupes", (gpointer) dupes, (GDestroyNotify) clear_dupes_list);

See that? The last argument to g_object_set_data_full() looks quite interesting indeed. It is in fact, a cleanup function named clear_dupes_list(), which according to the documentation, will be called

when the association is destroyed, either by setting it to a different value or when the object is destroyed.

In other words, when we are ready to get rid of the dup_data.found object, as part of cleaning up that object, we’ll call the clear_dupes_list() function. And what does clear_dupes_list() do, pray tell? Why, let me show you!

static void
clear_dupes_list (GSList *list)
{
	g_slist_foreach (list, (GFunc) g_free, NULL);
	g_slist_free (list);
}

Très interesante! You can see that we iterate across the dupes list, and call g_free on each of the strings we did a g_strdup() on before. So there wasn’t an inception leak after all. Tricky tricky.
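If the steal / set_data_full dance still feels like magic, here is a rough sketch of the idea in plain C. It is not how glib implements it; struct slot, slot_set_full(), slot_steal(), slot_clear() and free_and_count() are all hypothetical names, and the "object" has exactly one data slot. What it does mirror is the documented behavior: storing a new value destroys the old one through its callback, while stealing hands the value back without running the callback.

```c
#include <stdlib.h>

typedef void (*destroy_fn)(void *data);

/* Hypothetical one-slot version of an object's keyed data. */
struct slot {
    void *data;
    destroy_fn destroy;
};

/* Store data; any previous value is destroyed through its callback first. */
static void slot_set_full(struct slot *s, void *data, destroy_fn destroy)
{
    if (s->data != NULL && s->destroy != NULL)
        s->destroy(s->data);
    s->data = data;
    s->destroy = destroy;
}

/* Hand the value back to the caller WITHOUT running the destroy callback. */
static void *slot_steal(struct slot *s)
{
    void *data = s->data;
    s->data = NULL;
    s->destroy = NULL;
    return data;
}

/* What happens when the owning object goes away. */
static void slot_clear(struct slot *s)
{
    slot_set_full(s, NULL, NULL);
}

static int destroyed_count = 0;

/* Example destroy callback: frees the buffer and counts the call. */
static void free_and_count(void *data)
{
    free(data);
    destroyed_count++;
}
```

This also hints at why the patch switches from get to steal. If the old list head were still registered in the slot with a destroy callback, storing the new head would free the old nodes, and those nodes are still linked into the new list. Stealing first takes the old value out of the slot so nothing gets destroyed behind our back.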

A quick digression is warranted here. Contrary to popular belief, it is possible to write object oriented code in plain old C, with inheritance, method overrides, and even some level of “automatic” memory management. You don’t need to use C++ or python or whatever the web programmers are using these days. It’s just that in C, you build the OO features you want yourself, using primitives such as structs and function pointers and smart interface design.

Notice above we have specified that whenever the dup_data object is destroyed, it will free the memory that was stuffed into it. Yes, we had to specify the cleanup function manually, but we are thinking of our data structures in terms of objects.

In fact, the fancy features of many dynamic languages are implemented just this way, with the language keeping track of your objects for you, allocating them when you need, and freeing them when you’re done with them.
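For the skeptics, here is a small sketch of what "build the OO features yourself" can look like. It has nothing to do with nm-applet or GObject; the shape, rect and triangle types are invented for the example. "Inheritance" is done by embedding the base struct as the first member, and "method override" is just a function pointer that each derived type fills in differently.

```c
#include <stddef.h>

/* "Base class": every shape must provide an area method. */
struct shape {
    int (*area)(const struct shape *self);   /* overridable "method" */
};

/* "Derived classes": the base struct comes first, so a pointer to the
 * derived struct can be converted to a pointer to the base and back. */
struct rect {
    struct shape base;
    int w, h;
};

struct triangle {
    struct shape base;
    int w, h;
};

static int rect_area(const struct shape *self)
{
    const struct rect *r = (const struct rect *) self;
    return r->w * r->h;
}

static int triangle_area(const struct shape *self)
{
    const struct triangle *t = (const struct triangle *) self;
    return t->w * t->h / 2;
}

/* Works on any shape without knowing its concrete type. */
static int total_area(const struct shape *const *shapes, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += shapes[i]->area(shapes[i]);
    return total;
}
```

A caller would initialize a rect as { { rect_area }, 3, 4 }, pass &r.base around, and let total_area() dispatch through the function pointer. A destroy callback like clear_dupes_list is the same trick applied to cleanup instead of area.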

Because at the end of the day, it is decidedly not turtles all the way down to the CPU. When you touch memory in Python or Ruby or JavaScript, I guarantee that something is doing the bookkeeping on your memory, and since CPUs only understand assembly language, and C is really just pretty assembly, you now have a decent idea of how those fancy languages actually manage memory on your behalf.

And finally now that you’ve seen just how tedious and verbose it is to track all this memory, it should no longer be a surprise to you that most fancy languages are slower than C. Paperwork. It’s always paperwork.

And here we come to the upshot, which is, tracking down memory leaks can be slow and time consuming and trickier than first imagined (sorry for the early head fake). But with the judicious application of science and taking good field notes, it’s ultimately just like putting a delicious pork butt in the slow cooker for 24 hours. Worth the wait, worth the effort, and it has a delicious smoky sweet payoff.

Happy hunting!


kalua pork + homemade mayo and cabbage

alex

In our last exciting episode, we learned how to capture a valgrind log. Today we’re going to take the next step and learn how to actually use it to debug memory leaks.

There are a few prerequisites:

  1. know C. If you don’t know it, go read The C Programming Language, which is often referred to as K&R C. Be sure to understand the sections on pointers, and after you do, come back to my blog. See you in 2 weeks!
  2. a nice supply of your favorite beverages and snacks. I prefer coffee and bacon, myself. Get ready because you’re about to read an epic 2276 word blog entry.

That’s it. Ok, ready? Let’s go!

navigate the valgrind log
Open the valgrind log that you collected. If you don’t have one, you can grab one that I’ve already collected. Take a deep breath. It looks scary but it’s not so bad. I like to skip straight to the good part near the bottom. Search the file for “LEAK SUMMARY”. You’ll see something like:

==13124== LEAK SUMMARY:
==13124==    definitely lost: 916,130 bytes in 37,528 blocks
==13124==    indirectly lost: 531,034 bytes in 12,735 blocks
==13124==      possibly lost: 82,297 bytes in 891 blocks
==13124==    still reachable: 2,578,733 bytes in 42,856 blocks
==13124==         suppressed: 0 bytes in 0 blocks
==13124== Reachable blocks (those to which a pointer was found) are not shown.
==13124== To see them, rerun with: --leak-check=full --show-reachable=yes

You can see that valgrind thinks we’ve definitely leaked some memory. So let’s go figure out what leaked.

Valgrind lists all the leaks, in order from smallest to largest. The leaks are also categorized as “possibly” or “definitely”. We’ll want to focus on “definitely” for now. Right above the summary, you’ll see the worst, definite leak:

==13124== 317,347 (77,312 direct, 240,035 indirect) bytes in 4,832 blocks are definitely lost in loss record 10,353 of 10,353
==13124==    at 0x4C2B6CD: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==13124==    by 0x74E3A78: g_malloc (gmem.c:159)
==13124==    by 0x74F6CA2: g_slice_alloc (gslice.c:1003)
==13124==    by 0x74F7ABD: g_slist_prepend (gslist.c:265)
==13124==    by 0x4275A4: get_menu_item_for_ap (applet-device-wifi.c:879)
==13124==    by 0x427ACE: wireless_add_menu_item (applet-device-wifi.c:1138)
==13124==    by 0x41815B: nma_menu_show_cb (applet.c:1643)
==13124==    by 0x4189EC: applet_update_indicator_menu (applet.c:2218)
==13124==    by 0x74DDD52: g_main_context_dispatch (gmain.c:2539)
==13124==    by 0x74DE09F: g_main_context_iterate.isra.23 (gmain.c:3146)
==13124==    by 0x74DE499: g_main_loop_run (gmain.c:3340)
==13124==    by 0x414266: main (main.c:106)

Wow, we lost 300K of memory in just a few hours. Now imagine if you don’t reboot your laptop for a week. Yeah, that’s not so good. Time for a coffee and bacon break, the next part is about to get fun.

read the stack trace
What you saw above is a stack trace, and it’s printed chronologically “backwards”. In this example, malloc() was called by g_malloc(), which was called by g_slice_alloc(), which in turn was called by g_slist_prepend(), which itself was called by get_menu_item_for_ap() and so forth. The first function ever called was main(), which should hopefully make sense.

At this point, we need to use a little bit of extra knowledge to understand what is happening. The first function, main() is in our program, nm-applet. That’s fairly easy to understand. However, the next few functions that begin with g_main_ don’t actually live inside nm-applet. They are part of glib, which is a library that nm-applet depends on. I happened to have just known this off the top of my head, but if you’re ever unsure, you can just google for the function name. After searching, we can see that those functions are in glib, and while there is some magic that is happening, we can blissfully ignore it because we see that we soon jump back into nm-applet code, starting with applet_update_indicator_menu().

a quick side note
Many Linux programs will have a stack trace similar to the above. The program starts off in its own main(), but will call various other libraries on your system, such as glib, and then jump back to itself. What’s going on? Well, glib provides a feature known as a “main loop” which is used by the program to look for inputs and events, and then react to them. It’s a common programming paradigm, and rather than have every application in the world write their own main loop, it’s easier if everyone just uses the one provided by glib.
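For a feel of what a main loop does, here is a toy version in plain C. This is only a sketch under heavy assumptions: struct main_loop, loop_post(), loop_run() and add_to_total() are invented names, the queue is a tiny fixed-size ring, and unlike a real main loop it returns as soon as the queue is empty instead of sleeping until the next event arrives.

```c
#include <stddef.h>

typedef void (*handler_fn)(int value, void *user_data);

struct event {
    handler_fn handler;   /* application callback to run */
    int value;            /* payload for the callback */
};

#define QUEUE_MAX 8

/* Hypothetical toy main loop: a small ring buffer of pending events. */
struct main_loop {
    struct event queue[QUEUE_MAX];
    size_t head, count;
    void *user_data;
};

/* Queue an event; returns 0 on success, -1 if the queue is full. */
static int loop_post(struct main_loop *loop, handler_fn handler, int value)
{
    if (loop->count == QUEUE_MAX)
        return -1;
    size_t tail = (loop->head + loop->count) % QUEUE_MAX;
    loop->queue[tail].handler = handler;
    loop->queue[tail].value = value;
    loop->count++;
    return 0;
}

/* Dispatch queued events in order; returns how many were handled. */
static size_t loop_run(struct main_loop *loop)
{
    size_t dispatched = 0;
    while (loop->count > 0) {
        struct event ev = loop->queue[loop->head];
        loop->head = (loop->head + 1) % QUEUE_MAX;
        loop->count--;
        ev.handler(ev.value, loop->user_data);   /* library calls back into the app */
        dispatched++;
    }
    return dispatched;
}

/* Example application callback. */
static void add_to_total(int value, void *user_data)
{
    *(int *) user_data += value;
}
```

The line that calls ev.handler() is the interesting one. It is the library calling back into application code, which is exactly why a stack trace shows main(), then library frames, then application functions again.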

The other observation is to note how the function names appear prominently in the stack trace. Pundits wryly say that naming things is one of the hardest things in computer science, and I completely agree. So take care when naming your functions, because people other than you will definitely see them and need to understand them!

Alright, let’s get back to the stack trace. We can see a few functions that look like they belong to nm-applet, based on their names and their associated filenames. For example, the function wireless_add_menu_item() is in the file applet-device-wifi.c on line 1138. Now you see why we wanted symbols from the last episode. Without the debug symbols, all we would have seen would have been a bunch of useless ??? and we’d be gnashing our teeth and wishing for more bacon right now.

Finally, we see a few more g_* functions, which means we’re back in the memory allocation functions provided by glib. It’s important to understand at this point that g_malloc() is not the memory leak. g_malloc() is simply doing whatever nm-applet asks it to do, which is to allocate memory. The leak is highly likely to be in nm-applet losing a reference to the pointer returned by g_malloc().

What does it mean?
Now we’re ready to start the real debugging. We know approximately where we are leaking memory inside nm-applet: get_menu_item_for_ap() which is the last function before calling the g_* memory functions. Time to top off on coffee because we’re about to get our hands dirty.

reading the source
The whole point of open source is being able to read the source. Are you as excited as I am? I know you are!

First, let’s get the source to nm-applet. Assuming you are using Ubuntu and you are using 12.04, you’d simply say:

$ cd Projects
$ mkdir network-manager-gnome
$ cd network-manager-gnome
$ apt-get source network-manager-gnome
$ cd network-manager-applet-0.9.4.1

Woo hoo! That wasn’t hard, right?

side note #2
Contrary to popular belief, reading code is harder than writing code. When you write code, you are transmitting the thoughts of your messy brain into an editor, and as long as it kinda works, you’re happy. When you read code, now you’re faced with the problem of trying to understand exactly what the previous messy brain wrote down and making sense of it. Depending on how messy that previous brain was, you may have real trouble understanding the code. This is where pencil and paper and plenty of coffee come into play, where you literally trace through what the program is doing to try and understand it.

Luckily there are at least a few tools to help you do this. My favorite tools are cscope and ctags, which help me to rapidly understand the skeleton of a program and navigate around its complex structure.

Assuming you are in the network-manager-applet-0.9.4.1 source tree:

$ apt-get install cscope ctags
$ cscope -bqR
$ ctags -R
$ cscope -dp4


You are now presented with a menu. Use control-n and control-p to navigate input fields at the bottom. Try navigating to “Find this C symbol:” and then type in get_menu_item_for_ap, and press enter. The search results are returned, and you can press '0' or '1' to jump to either of the locations where the function is referenced. You can also press the space bar to see the final search result. Play around with some of the other search types and see what happens. I’ll talk about ctags in a bit.

Alrighty, let’s go looking for our suspicious nm-applet function. Start up cscope as described above. Navigate to “Find this global definition:” and search for get_menu_item_for_ap. cscope should just directly put you in the right spot.

Based on our stack trace, it looks like we’re doing something suspicious on line 879, so let’s go see what it means.

 869         if (dup_data.found) {
 870 #ifndef ENABLE_INDICATOR
 871                 nm_network_menu_item_best_strength (dup_data.found, nm_acce
 872                 nm_network_menu_item_add_dupe (dup_data.found, ap);
 873 #else
 874                 GSList *dupes = NULL;
 875                 const char *path;
 876 
 877                 dupes = g_object_get_data (G_OBJECT (dup_data.found), "dupe
 878                 path = nm_object_get_path (NM_OBJECT (ap));
 879                 dupes = g_slist_prepend (dupes, g_strdup (path));
 880 #endif
 881                 return NULL;
 882         }

Cool, we can now see where the source code is matching up with the valgrind log.

Let’s start doing some analysis. The first thing to note is the preprocessor conditional: #ifndef on line 870, #else on line 873, and #endif on line 880. You should know that ENABLE_INDICATOR is defined, meaning we do not execute the code in lines 871 and 872. Instead, we do lines 874 to 879, and then we do 881. Why do we do 881 if it is after the #endif? That’s because we fell off the end of the conditional block, and then we do whatever is next, after we fall off, namely returning NULL.
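If the preprocessor gymnastics are hard to picture, here is a stripped-down sketch of the same shape. The function handle_duplicate() and the two counters are hypothetical, and the #define at the top is an assumption that mirrors the build described above. Only one of the two branches survives compilation, and the return sits outside the conditional, so it is always compiled in.

```c
#include <stddef.h>

#define ENABLE_INDICATOR 1   /* assumption: mirrors the build described above */

/* Hypothetical counters, for illustration only. */
static int recorded_dupes = 0;
static int merged_items = 0;

static const char *handle_duplicate(int found)
{
    if (found) {
#ifndef ENABLE_INDICATOR
        merged_items++;      /* compiled only when ENABLE_INDICATOR is NOT defined */
#else
        recorded_dupes++;    /* compiled when ENABLE_INDICATOR IS defined */
#endif
        return NULL;         /* after the #endif: always compiled */
    }
    return "new item";
}
```

Delete the #define and rebuild, and the other branch is compiled instead. The return NULL stays put either way.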

Don’t worry, I don’t know what’s going on yet, either. Time for a refill!

Back? Great. Alright, valgrind says that we’re doing something funky with g_slist_prepend().

==13124==    by 0x74F7ABD: g_slist_prepend (gslist.c:265)

And our relevant code is:

 874                 GSList *dupes = NULL;
 875                 const char *path;
 876 
 877                 dupes = g_object_get_data (G_OBJECT (dup_data.found), "dupe
 878                 path = nm_object_get_path (NM_OBJECT (ap));
 879                 dupes = g_slist_prepend (dupes, g_strdup (path));
 880 #endif
 881                 return NULL;

We can see that we declare the pointer *dupes on line 874, but we don’t do anything with it. Then, we assign something to it on line 877. Then, we assign something to it again on line 879. Finally, we end up not doing anything with *dupes at all, and just return NULL on line 881.

This definitely seems weird and worth a second glance. At this point, I’m asking myself the following questions:

  • did g_object_get_data() allocate memory?
  • did g_slist_prepend() allocate memory?
  • are we overwriting *dupes on line 879? that might be a leak.
  • is it safe to just return NULL without doing anything to dupes? maybe that’s our leak?

Let’s take them in order.

did g_object_get_data() allocate memory?
g_object_get_data has online documentation, so that’s our first stop. The documentation says:

Returns :
the data if found, or NULL if no such data exists. [transfer none]

Since I am not 100% familiar with glib terminology, I guess [transfer none] means that g_object_get_data() doesn’t actually allocate memory on its own. But let’s be 100% sure. Time to grab the glib source and find out for ourselves.

$ apt-get source libglib2.0-0
$ cd glib2.0-2.32.1
$ cscope -bqR
$ ctags -R
$ cscope -dp4
search for global definition of g_object_get_data

Pretty simple function.

gpointer
g_object_get_data (GObject     *object,
                   const gchar *key)
{
  g_return_val_if_fail (G_IS_OBJECT (object), NULL);
  g_return_val_if_fail (key != NULL, NULL);

  return g_datalist_get_data (&object->qdata, key);
}

Except I have no idea what g_datalist_get_data() does. Maybe that guy is allocating memory. Now I’ll use ctags to make my life easier. In vim, put your cursor over the “g” in “g_datalist_get_data” and then press control-]. This will “step into” the function. Magic!

 844 gpointer
 845 g_datalist_get_data (GData       **datalist,
 846                      const gchar *key)
 847 {
 848   gpointer res = NULL; 
 ... 
 856   d = G_DATALIST_GET_POINTER (datalist);
 ...
 859       data = d->data;
 860       data_end = data + d->len;
 861       while (data < data_end)
 862         {
 863           if (strcmp (g_quark_to_string (data->key), key) == 0)
 864             {
 865               res = data->data;
 866               break;
 867             }
 868           data++;
 869         }
 ... 
 874   return res;
 875 }

This is a pretty simple loop, walking through an existing list of pointers which have already been allocated somewhere else, starting on line 861. We do our comparison on line 863, and if we get a match, we assign whatever we found to res on line 865. Note that all we are doing here is a simple assignment. We are not allocating any memory!

Finally, we return our pointer on line 874. Press control-t in vim to pop back to your last location.

Now we know for sure that g_object_get_data() and g_datalist_get_data() do not allocate any memory at all, so there can be no possibility of a leak here. Let’s try the next function.
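Here is the same idea boiled down to a plain C sketch, with none of glib's quarks or locking. The struct entry type and the lookup() function are made up for illustration. The thing to notice is that the function only ever copies a pointer that already lives in the table. There is no malloc in sight, so nothing here can leak.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical key/value entry; data points at memory owned elsewhere. */
struct entry {
    const char *key;
    void *data;
};

/* Walk the table and return the stored pointer for key, or NULL.
 * Nothing is allocated: the result is just a copy of a pointer. */
static void *lookup(const struct entry *entries, size_t len, const char *key)
{
    void *res = NULL;
    const struct entry *e = entries;
    const struct entry *end = entries + len;

    while (e < end) {
        if (strcmp(e->key, key) == 0) {
            res = e->data;   /* plain assignment, no allocation */
            break;
        }
        e++;
    }
    return res;
}
```

Since the caller gets back a pointer it does not own, it must not free it either. That matches my guess about what [transfer none] means in the glib documentation.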

did g_slist_prepend() allocate memory?
First, read the documentation, which says:

The return value is the new start of the list, which may have changed, so make sure you store the new value.

This probably means it allocates memory for us, but let’s double-check just to be sure. Back to cscope!

 259 GSList*
 260 g_slist_prepend (GSList   *list,
 261                  gpointer  data)
 262 {
 263   GSList *new_list;
 264 
 265   new_list = _g_slist_alloc ();
 266   new_list->data = data;
 267   new_list->next = list;
 268 
 269   return new_list;
 270 }

Ah ha! Look at line 265. We are 100% definitely allocating memory, and returning it on line 269. Things are looking up! Let’s keep going with our questions.
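To see the allocation without glib in the way, here is a bare-bones sketch of a singly linked list in plain C. The names struct node, list_prepend(), list_length() and list_free() are hypothetical, and plain malloc() stands in for glib's slice allocator. One simplification to be aware of: on allocation failure this version quietly returns the old list.

```c
#include <stdlib.h>

struct node {
    void *data;
    struct node *next;
};

/* Allocates ONE new node and links it in front of the old list.
 * The caller must keep the returned pointer: it is the new head. */
static struct node *list_prepend(struct node *list, void *data)
{
    struct node *new_list = malloc(sizeof *new_list);
    if (new_list == NULL)
        return list;         /* simplification: list unchanged on failure */
    new_list->data = data;
    new_list->next = list;
    return new_list;
}

/* Counts the nodes in the list. */
static size_t list_length(const struct node *list)
{
    size_t n = 0;
    for (; list != NULL; list = list->next)
        n++;
    return n;
}

/* Frees the nodes only; whatever data points at belongs to the caller. */
static void list_free(struct node *list)
{
    while (list != NULL) {
        struct node *next = list->next;
        free(list);
        list = next;
    }
}
```

Every call to list_prepend() creates a node that only the returned pointer knows about. Lose that pointer and the node, plus everything hanging off it, is unreachable.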

are we overwriting *dupes on line 879? that might be a leak.
Remember:

 877                 dupes = g_object_get_data (G_OBJECT (dup_data.found), "dupe
 878                 path = nm_object_get_path (NM_OBJECT (ap));
 879                 dupes = g_slist_prepend (dupes, g_strdup (path));

We’ve already proven to ourselves that line 877 doesn’t allocate any memory. It just sets dupes to some value. However, on line 879, we do allocate memory. It is equivalent to this code:

  int *dupes;
  dupes = (int *) 0x12345678;    /* just some address, nothing allocated */
  dupes = malloc(128);           /* ok, nothing was lost */

So simply setting dupes to the return value of g_object_get_data() and later overwriting it with the return value of malloc() does not inherently cause a leak.

By way of counter-example, the below code is a memory leak:

  int *dupes;
  dupes = malloc(64);
  dupes = malloc(128);    /* leak! */

The above essentially illustrates the scenario I was worried about. I was worried that g_object_get_data() allocated memory, and then g_slist_prepend() also allocated memory which would have been a leak because the first value of dupes got scribbled over by the second value. My worry turned out to be incorrect, but that is the type of detective work you have to think about.

As a clearer example of why the above is a leak, consider the next snippet:

  int *dupes1, *dupes2;
  dupes1 = malloc(64);     /* ok */
  dupes2 = malloc(128);    /* ok */
  dupes1 = dupes2;         /* leak! */

First we allocate dupes1. Then allocate dupes2. Finally, we set dupes1 = dupes2, and now we have a leak. No one knows what the old value of dupes1 was, because we scribbled over it, and it is gone forever.
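For completeness, here is a sketch of the non-leaky way to do that kind of overwrite. The helper replace_buffer() is a made-up name. The idea is simply that the old block gets freed before its address is scribbled over, and that the new block is allocated first so a failed malloc() does not cost us the old one.

```c
#include <stdlib.h>

/* Replace the buffer *slot points at with a fresh one of new_size bytes.
 * Returns 0 on success, or -1 if allocation failed (old buffer is kept). */
static int replace_buffer(void **slot, size_t new_size)
{
    void *fresh = malloc(new_size);
    if (fresh == NULL)
        return -1;
    free(*slot);     /* release the old block while we still know where it is */
    *slot = fresh;
    return 0;
}
```

Usage would look like: void *dupes = malloc(64); followed later by replace_buffer(&dupes, 128); and eventually free(dupes). At every point exactly one pointer knows about each live block.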

is it safe to just return NULL without doing anything to dupes? maybe that’s our leak?
We can definitively say that it is not safe to return NULL without doing anything to dupes. We definitely allocated memory, stuck it into dupes, and then threw dupes away. This is our smoking gun.

Next time, we’ll see how to actually fix the problem.

alex


leaky plumbing?

An important piece of optimizing the Ubuntu core on the Nexus 7 is slimming down Ubuntu’s memory requirements. It turns out this focus area has plenty of opportunity to help contribute, and today, I’ll talk about how to find memory leaks in an individual application using valgrind.

The best part? You don’t even have to be a developer to help. The second best part? You don’t even need a Nexus 7! What I describe below works on any Ubuntu machine. Let’s get started!

The first step is to find an application to profile. This is the easiest step. Maybe you have an app you use all the time and really care about making it perform as well as possible. Or maybe you’re experiencing a strange behavior problem in an app that takes a little while to show up. Or maybe you just pick a random application from the dash because you’re in a great mood. They’re all good.

In my case, I’ll use nm-applet as my example, since I’ve been struggling with LP: #780602 for a while, where the list of wifi access points would stop displaying after a day or two. Très annoying!

Next, install valgrind if it is not already installed.

sudo apt-get install valgrind

Pay attention to the next bit because it is important. In order for your valgrind report to be as helpful as possible for developers, you will also need to install debug packages related to your app. The debug packages contain information to help developers narrow in on exactly where problems might be. “Great” you say, “what do I need to install?”

UPDATE: 29 January 2013

After a bit more thinking and discussing with smart folks like infinity, xnox, and pitti, we realized that I was essentially reinventing a lot of code that already exists in apport-retrace, as that tool already knows how to go from a binary to a package and then solve dependencies.

I tossed the idea (and a really rough crappy version of a prototype) to Kyle Nitzsche who took the idea, ran with it, and fixed all my crap! Woo hoo! With a little bit of effort, we ended up with apport-valgrind which has already landed in raring (along with the required valgrind support patch). Even better, Kyle wrote a great apport-valgrind introduction explaining how it works.

So ignore the script below and use apport-valgrind instead (unfortunately only available in raring).

Today is your lucky day because I’ve written a small script to help you figure out which debug packages you’ll need. Go ahead and grab the python version of valgrind-ubuntu-dbg-packages. (Ignore the go version for now, that’s just something I’m playing with in my other spare time!)

Ok, now comes the tricky part. We have to do a quick valgrind run to see what libraries your app uses. Then we’ll use the helper script to see if there are debug packages for those libraries. Ready?

To run valgrind, use this command:

G_SLICE=always-malloc G_DEBUG=gc-friendly valgrind -v --tool=memcheck --leak-check=full --num-callers=40 --log-file=valgrind.log --track-origins=yes <your-app>

Replace <your-app> with the name of your app.


Let this run long enough for your app to launch (which may take a while under valgrind), and then play with your app just a bit, exercising the feature where your bug shows up but stopping before the bug actually reproduces. In the case of nm-applet, I did the following sequence:

killall nm-applet	# stop earlier instances of nm-applet

G_SLICE=always-malloc G_DEBUG=gc-friendly valgrind -v --tool=memcheck --leak-check=full --num-callers=40 --log-file=valgrind.log --track-origins=yes nm-applet


Then I clicked the “More networks” menu item in the applet just to get it to display the other wifi access points, since this is the thing that was breaking for me. After doing that just once, I stopped my valgrind run completely by pressing control-c in the terminal where I launched it.


A valgrind log file should now exist, and you can run the helper script on the log:

./valgrind-ubuntu-dbg-packages.py valgrind.log

You will see quite a bit of output, but at the end, you will get a list of recommended extra packages to install.

It is recommended to install the following packages:
libnss3-dbg libdbus-glib-1-2-dbg libdconf-dbg gvfs-dbg libcanberra-gtk3-module-dbg libatk1.0-dbg librsvg2-dbg libfontconfig1-dbg

Go ahead and install the packages.

Now we are finally ready to collect our real logs.

Update: 29 January 2013

Instead of doing all that janky stuff above, just:

  1. apt-get install apport-valgrind
  2. run: apport-valgrind <executable>
  3. Do step 2 for as long as it takes to reproduce the bug. There is no step 3!

Re-run valgrind exactly as above, but this time, let the app run as long as it needs to reproduce the bug. In the case of nm-applet, I had to let it just sit there and run normally for 24 hours before I saw the bug again. Hopefully your bug reproduces faster! Patience is key. I recommend eating a delicious sandwich if you can’t think of anything better to do.

After your bug has reproduced itself, kill the valgrind run. File a bug — you can use the Ubuntu Nexus7 project — and be sure to attach the valgrind log. It would also be great if you could describe how you reproduced the bug. Be sure to read the bug filing guidelines for more detail.

Huzzah, you’ve contributed something extremely valuable to making Ubuntu leaner and meaner — a great log file. With any luck, a developer will be able to pick up your bug and fix the problem.

And… if we’re even luckier, maybe that developer will be you! Next time I’ll show you how to actually analyze the valgrind log. Stay tuned.

alex

Week ending 16 November 2012

Accomplishments

  • New kernel has been uploaded to raring archive. Now we’re just waiting on a fix for nux to land before everyone can dist-upgrade to raring. Look for an announcement soon. Thanks to Jani Monoses for doing the heavy lifting here and the Ubuntu kernel team for taking care of the last mile.
  • New benchmarking packages — ubuntu-benchmarking-tools and ubuntu-remote-debug-host-tools — uploaded to raring. Once your Nexus 7 is on raring, you’ll be able to install these convenient metapackages and help us start Ubuntu Pilates! Well done, Chris Wayne!
  • A juju pbuilder charm has been submitted to the charm store. Once this is accepted, developers will be able to easily build ARM packages in the cloud. Thanks Scott Sweeny.
  • Performance optimizations are already landing. Our onscreen keyboard, onboard, recently reduced its startup time from 45 seconds down to 6 seconds. Check out all the gory details in the bug, and big thanks to marmuta and Francesco Fumanti!
  • We had our first weekly status meeting in #ubuntu-meeting. Come back every week for more.

Worst 5 Bugs

Upcoming Plans

  • We are working with Platform QA on creating a set of guidelines and tools for the community to help us start benchmarking memory consumption and usage. Expect an announcement around 23 November.
  • Converting our FAQ to a more friendly and community maintainable AskUbuntu format.

Grab Bag
Brave souls can try upgrading to raring today with this apt-pinning recipe. Create the following file /etc/apt/preferences.d/ubuntu-nexus7-ppa and add the contents below. Thanks to Colin Watson for the tidbit.

Package: *
Pin: release o=LP-PPA-ubuntu-nexus7
Pin-Priority: 600

alex

how i email

Since I’ve been asked this several times, on the flight back from UDS-R, I decided to document my email workflow.

I have a medium-sophisticated mutt setup that I’ve been using and refining for the past 12 years or so. It used to be a lot more complicated but over time, I’ve been attempting to reduce the delta between my quirks and the mutt defaults, and this is about where I am today.

The remaining reasons for my quirks are:

  • keybindings that are vim-ish
  • supports 2 separate IMAP accounts (including google apps for your domain)
  • color scheme and visual layout quirks
  • fix some annoying default behaviors

In any case, if you’re interested, you can grab my setup over at github.

alex

gone hacking

Out of the country for 2 weeks. See you in November!

alex

charging into copenhagen

I’ve been somewhat hard to find lately, and I apologize for that. By way of a minor bit of explanation, I’ve been driving a little squad preparing for Copenhagen.

Two not-unrelated pieces of information –

Mark writes:

So what will we be up to in the next six months? We have two short cycles before we’re into the LTS, and by then we want to have the phone, tablet and TV all lined up. So I think it’s time to look at the core of Ubuntu and review it through a mobile lens: let’s measure our core platform by mobile metrics, things like battery life, number of running processes, memory footprint, and polish the rough edges that we find when we do that. The tighter we can get the core, the better we will do on laptops and the cloud, too.

So bring along a Nexus 7 if you’re coming to Copenhagen, because it makes a rumpty reference for our rootin’ tootin’ radionic razoring. The raving Rick and his merry (wo)men will lead us to a much leaner, sharper, more mobile world. We’ll make something… wonderful, and call it the Raring Ringtail. See you there soon.

And Victor just posted this:

See you in Copenhagen!

alex

Today I spent a little bit of time playing with sbuild and after an hour or so, decided I hated it. Tried to figure out why people recommend it, and it seems like the best answer is, “it’s the closest to what the buildds use”. I guess that’s a fair answer, but out of the box, sbuild feels clunky to me.

Luckily, Michael Terry is jawesome and wrote these really great pbuilder wrapper scripts and now they’ve landed in Quantal.

If you want to know why I ♥ them so, check out my contra answer on askubuntu:

Why use sbuild over pbuilder?

And if you want to speed up your pbuilder even moAR, then check out PbuilderHowto.

Maybe I don’t know what I’m doing so if you have tips or corrections, add them over there. If you see mterry out somewhere, buy him a beer!

alex

ubuntu 12.10 remote greeter

Back in early spring of 2012, I was living in Buenos Aires and decided to torture my team by asking for more work without really knowing at all what we might be asked to pick up. Call it “aspirations of a rookie manager”.

The result of my blind query was that we were told to “go figure out Windows remote desktops”, a strange path to walk down when working at a Linux distro company. The drawback when asking for new work is that you typically have to go do it.

So my team went and figured out a whole bunch about VDI and got our hands dirty down in the plumbing layer, while also learning and adopting scrum and TDD and oh by the way vala just for grins along the way, all while driving towards an extremely aggressive internal demo date.

Requirements changed on us … often, including a major design change in the client-server architecture one month before the demo. I know we did at least three complete rewrites of the architecture and stopped counting after that.

But hey, we got it done because we worked with some really great teams inside Canonical including the design team, the Orange squad, and quite a number of others.

We transitioned our work in June to Ted’s extremely capable team, and wished them luck as we got asked to go fix other problems. A few rewrites later, it’s great to see the feature finally land in 12.10.

While only a tiny bit of my team’s actual code survived all the rewrites, I like to think that we at least provided some fruitful inspiration (or perhaps bloody corpses serving as warning signs) for the final implementation.

So chapeau to all the folks involved who helped make this happen, I’m super proud of all of you.

(and if you’re reading this in Google Reader, you should click to my actual blog if you can’t see the embedded youtube video below)

alex

In the linux.com interview with gregkh is the following q&a:

What’s the most amused you’ve ever been by the collaborative development process (flame war, silly code submission, amazing accomplishment)?

I think the most amazing thing is that you never know when you will run into someone you have interacted with through email, in person. A great example of this was one year in the Czech Republic, at a Linux conference. A number of the developers all went to a climbing gym one evening, and I found myself climbing with another kernel developer who worked for a different company, someone whose code I had rejected in the past for various reasons, and then eventually accepted after a number of different iterations. So I’ve always thought after that incident, “always try to be nice in email, you never know when the person on the other side of the email might be holding onto a rope ensuring your safety.”

The other wonderful thing about this process is that it is centered around the individual, not the company they work for. People change jobs all the time, yet, we all still work together, on the same things, and see each other all around the world in different locations, no matter what company we work for.

I was the “other kernel developer” and we were probably talking about Physical PCI slot objects, which took 16 rounds of revision before it was accepted.

The great myth of open source is that it’s a complete meritocracy. While there’s more truth there than not, the fact is that as with any shared human endeavor, the personalities in your community are just as important as the raw intellectual output of that community.

This is not to say Rusty is wrong, but rather to remind that if you’re both smart and easy to get along with, life is a lot easier.

Or perhaps if you’re a jerk, you should stick to safer sports like golf.

alex

After wandering around for a bit, I’ve settled back in San Francisco on a more or less permanent basis. Part of the moving process was finding an ISP and it seems like Comcast is the best option (for my situation). I signed up for their standard residential service, and remote teleworking continued on quite merrily… except for one tiny wart.

We use Google Plus hangouts quite extensively on my team, including a daily standup with attendance that hovers between 5 and 10 people. The first time I tried a hangout with my new Comcast service, it was unusable, with extreme lag everywhere, connection timeouts, and general unhappiness.

I had a strong hunch that I was suffering from bufferbloat, and a quick ping test confirmed it (more on that later). Obviously I wanted to fix the problem, but there is a lot of text to digest for someone that just wants to make the problem go away.

After a bit of irc whingeing and generous help from people smarter than me, here are my bufferbloat notes for the impatient.

background
Bufferbloat is a complex topic, go read the wiki page for excruciating detail.

But the basic conceptual outline is:

  • a too large buffer on your upstream may cause latency for sensitive applications like video chat
  • you must manage your upstream bandwidth to reduce latency (which typically means you intentionally reduce upstream bandwidth)
  • use QoS in your router to globally reduce upstream bandwidth (not for traffic shaping!)

diagnosis
Ensure your internet connection is idle. Then, start pinging google.com. Observe the “time” field, which will give you a value in ms. Watch this long enough to get an intuitive feel for what is a normal amount of latency on your link. For me, it hovered consistently around 20ms, with some intermittent spikes. You don’t need to be exact. If the values swing wildly, then you’ve got other problems that need to be fixed first. Stop reading this blog and call your ISP.

While the ping is running, visit http://testmy.net/upload and kick off a large upload, say 15MB or more.

If your ping times increase by an order of magnitude and stay there (like mine did to around 300ms), then you have bufferbloat.
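If you'd rather let a script do the squinting, here is a minimal Python sketch of the comparison above. It assumes you saved the text output of two ping runs (one idle, one during the upload) and that your ping prints the usual "time=... ms" field. The function names and the 10x threshold are my own illustrative choices, not part of any tool.

```python
import re
import statistics

# Matches the "time=23.4 ms" field printed by the common Linux ping.
TIME_RE = re.compile(r"time=([\d.]+)\s*ms")


def ping_times_ms(ping_output):
    """Extract every round-trip time (in ms) from captured ping output."""
    return [float(value) for value in TIME_RE.findall(ping_output)]


def looks_like_bufferbloat(idle_output, loaded_output, factor=10.0):
    """Compare median latency under load against the idle baseline.

    `factor` is an assumed stand-in for "an order of magnitude".
    """
    idle = statistics.median(ping_times_ms(idle_output))
    loaded = statistics.median(ping_times_ms(loaded_output))
    return loaded >= factor * idle
```

Using the median rather than the mean keeps the occasional intermittent spike from skewing the baseline.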

This isn’t as rigorous as setting up smokeping and making pretty graphs, but trust me, it’s a lot faster and way easier. Thanks to Alex Williamson for this tip.

mitigation
You will need a router that can do QoS.

The easiest solution is to spend $100 and buy a Netgear WNDR3700 which is capable of running CeroWRT. Get that going and presumably you’re done, although I can’t verify it since I am el cheapo.

I didn’t want to spend $100 and I had an old Linksys WRT54GL lying around, so I installed Tomato onto it. (Big thanks to Paul Bame for helping me (remotely!!) recover a semi-bricked router.) Now it’s time to tune QoS.

In the Tomato admin interface, navigate to QoS => Basic Settings. Check the “Enable QoS” box and for the “Default class” dropdown list, change it to “highest”.

Figure out your maximum upload speed. You should be able to obtain this number after a few upload tests at testmy.net that you did in the previous step. Enter your max upload speed into the “Outbound Rate / Limit” => “Max Bandwidth” field. Make sure you use the right units, kbits/s please!

Finally, in the “Highest” QoS setting under Outbound, set your lower and upper bounds. I started with 50% as a lower bound and 60% as an upper bound.
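To get a feel for what those percentages mean in real bandwidth, here is a small Python sketch of the arithmetic. The helper names are made up for illustration, and the conversion assumes your speed test reports Mbit/s with 1 Mbit/s = 1000 kbit/s.

```python
def mbit_to_kbit(mbit):
    """Convert Mbit/s to kbit/s, assuming 1 Mbit/s = 1000 kbit/s."""
    return mbit * 1000


def qos_bounds_kbit(max_upload_kbit, lower_pct=50, upper_pct=60):
    """Return the (lower, upper) outbound rates in kbit/s.

    The defaults mirror the 50% / 60% starting point described above.
    """
    if not 0 < lower_pct <= upper_pct <= 100:
        raise ValueError("expected 0 < lower_pct <= upper_pct <= 100")
    lower = max_upload_kbit * lower_pct / 100
    upper = max_upload_kbit * upper_pct / 100
    return lower, upper
```

For example, with a hypothetical 4 Mbit/s uplink, `qos_bounds_kbit(mbit_to_kbit(4))` works out to 2000 and 2400 kbit/s, which is the price you initially pay for low latency.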

Put a large fake number in for “Inbound Limit” and change all the settings there to “None”. These settings don’t seem to affect latency.

Click “save” at the bottom of the page — you do not need to reboot your router.

Re-run the google.com ping test + large upload test at testmy.net. Your ping times under load should remain relatively unchanged vs. an idle line. Congrats, you’ve solved your bufferbloat problem to 80%.

Update (7/29/2012): Thanks to John Taggart for pointing out a more rigorous page on QoS tuning for tomato.

Now you can experiment with increasing the lower and upper bounds of your QoS settings to get more upstream bandwidth. As always, make a change, save, re-run the ping + upload test, and check the results. Remember, the goal is to keep latency under load about equal to what it is on an idle line.
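The change, save, re-test loop can be sketched in Python as below. This is only an outline of the manual procedure: `measure_latency_ms` is a hypothetical callback standing in for "set this percentage in Tomato, re-run the ping + upload test, and report the latency under load", and the step size and tolerance are assumed values. For simplicity it tracks a single upper-bound percentage.

```python
def tune_upper_bound(measure_latency_ms, idle_ms, start_pct=60, step_pct=5,
                     tolerance=1.5, max_pct=100):
    """Raise the bound step by step while latency under load stays near idle.

    Returns the last percentage whose loaded latency stayed within
    `tolerance * idle_ms`, or None if even the starting value was too high.
    """
    best = None
    pct = start_pct
    while pct <= max_pct:
        if measure_latency_ms(pct) > tolerance * idle_ms:
            break  # latency has drifted away from idle: back off
        best = pct
        pct += step_pct
    return best
```

In practice you are the callback: each call is one round of clicking save and watching ping.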

Now your colleagues will thank you for the increased smoothness of your video chats, although remembering to brush your teeth and put pants on is the “last mile” problem I can’t solve for you.

alex

South Korea is a land of details. From motion sensor escalators that only turn on when someone steps on, to the elevator user interface, where pressing the button takes you to the floor, but pressing it again cancels the action (how often have you wished for something like that when obnoxious children mash all the buttons for fun?).

There is minimal Engrish; for the most part, signage is well translated. The strange paradox is that for many people, and I’m talking about young people, their command of spoken English isn’t that great. This was somewhat surprising to me, considering that to interact with much of the business world today, English is the standard.

Upon a bit of reflection, perhaps I am guilty of misunderestimating the vast, sheer, numbers of people in Asia, a region in ascendancy. It was a bit of a reality check on where the west currently stands in relation to the east in terms of importance. It’s a little early to claim we’re in the death throes of pax Americana but it’s still food for thought.

Another surprising aspect for me was how dirty the air was. Nowhere near as dirty as the air in Beijing, Shanghai, and Nanjing — visibility in those cities averaged approximately 400m when I was there, whereas you could see several km into the distance in Seoul. Still, the omnipresent haze was jarring to someone who spends a lot of time in the American Rockies, where visibility is essentially limited by geographic features, such as ridgelines or say, the curvature of the earth.

We’re experiencing a gigantic wildfire right now, and people in Fortlandia are rightly complaining about the air quality.

Imagine if you woke up to the above every single day.

Finally, axolotls are some of the best animals on earth. Ever.

I’m since back from my week-long work trip there, stopped in at Summit County to do laundry, and then off again. This blog post comes to you from London.

Some useful links:

  • the rest of my Korean photo album — enjoy
  • Learn to read Korean in 15 minutes — driving along in South Korea is actually a great place to practice this, because the signage is dual posted in both Hangul and English. I impressed my hosts with kindergarten reading proficiency (although of course I was just sounding out the words phonetically with nary a clue of what I was actually saying)

alex

alpenlust

lungs, legs, a-sploding
fairly priced stunning vistas
beloved rockies

Back from Buenos Aires, thanks to salgado and beuno for being excellent hosts. I’ll be back one day.

In the meantime, it’s good to be back on a bike again.

Feeling fat, slow, and out of shape. Let’s see where we are 6 weeks from now.

alex

buenos aires, redux

San Francisco has been lovely. And I’m coming back, pretty soon, actually.

But first, I’m taking an Argentinean interlude. Tickets are already booked, leaving 11 March, and returning 5 May.

Even better, I’ve already got a place lined up to live in Buenos Aires, in the Recoleta neighborhood. Looking forward to hanging out with my buddy Salgado.

My plan is to eat empanadas and steak and drink red wine until I gain 20 kilos.

If anyone wants to visit, I’ll have a guest bed.

Ciao!

alex


san francisco santacon, 2011

I’m happy to announce that a few packages I’ve been working on over the past year have finally landed in Ubuntu Precise[1].

If you have a 3G USB modem, and it currently doesn’t work well (or at all) in Debian or Ubuntu, you should check this list of modems[2]. If it is listed, then you may be a candidate to try an alternative 3G networking stack.

$ sudo apt-get install wader-core

This command will remove ModemManager and install wader-core. It should be an entirely transparent operation to you, except that after you reboot, your modem should appear as a connection option in the network manager applet.

Yay!

###
1: naturally, I was a good boy and uploaded the packages to Debian unstable first
2: this list is predominantly composed of Vodafone-branded modems, but there are others in there as well.

Thanks to the Debian python team for mentoring me and to Al Stone and dann frazier for even more mentoring in addition to sponsoring me.

alex

Last January through April, I pretty much fell off the face of the earth, in real life as well as online. For those that asked, I alluded to some long hours at work, but of course couldn’t say much publicly.

Well, we finally launched, and I’m quite proud of all our team accomplished.

Without question, this was the hardest project I’ve taken on in my career to date. But I was part of a great team, and we pulled together to ship.

We’re bringing free, open software to the world. This is the mission I signed up for.

Some links for your reading pleasure, take with a grain of salt:

alex

life changes

SF prep #1
forsooth, a brake!

I’m moving. Travelling, really.

Around the world.

In 3 to 4 month chunks.

A city at a time.

Really, it’s about time. I’ve been thinking about it for several years now, planning piecemeal, laying down disjointed bits of foundation. But it’s happening. For real.

One of the best perquisites of Canonical is the inherent assumption of remote working. As long as you have a laptop and wifi, you could really work from anywhere in the world (modulo a tiny bit of reality, but for the most part true), assuming you remain productive and available for your colleagues.

It’s time to get while the getting’s good, and take advantage of the freedom. Have laptop, sense of adventure, and strong GI tract; hitting the road, in search of wifi and the perfect bánh mi (or empanada, I’m not terribly picky).

I love Fort Collins. It’s the perfect Pleasantville, and I’ve never been happier living here for 8 years. But Penelope claims that you cannot have both a happy life and an interesting life; you have to choose one.

So, I choose interesting.

FAQ
When are you leaving?
I leave Ft. Collins on 30 September 2011.

Where are you going?
First stop is San Francisco.

San Francisco is hilly, isn’t it?
Right-o. Hence the recent addition of a rear brake on my fixie. I’m not too scared of pedaling a 54×19 up hills, but I am scared of riding down them without additional stopping power.

For how long?
My sublease runs until 31 December 2011. I’ll probably extend it by an extra month and stay til 31 January 2012 because moving on New Year’s Eve sucks. Unless the world ends, of course, in which case the move will be permanent.

Then what?
I’ll come back to Ft. Collins to make sure my house hasn’t burnt down. Maybe gather a few things, maybe sell some other things, maybe do a bit of skiing (February is the best ski month in Colorado anyway), and figure out where I’m going next.

Oh, you’re not selling your house?
No, I’m too lazy to pack yet, or to fix the small nagging things that need to be fixed in order to sell a house.

Are you renting it out then?
Yes, I’ve some friends renting it out for the first stretch, but nothing lined up after that. Would you like to rent a nice house in early 2012?

How about your car?
My lovely renters will run it once in a while to keep the battery from dying. But I plan on leaving it garaged in Ft. Collins mostly.

Ok, so what’s next?
I’m not sure. I really want to go to Taipei, but it kinda depends on how my current work project is going. We currently have staff in two major timezones, the Americas and Europe. Stretching staff across 3 timezones into Asia is horrible. I did that for my last project, and it meant that someone always had a 2am meeting, which sucked. So, if current project is winding up as expected, Taipei is next. If not, then the next strongest candidate will be Buenos Aires.

What factors into your choices?
I’d really like to improve my Mandarin. I plan on taking lessons in San Francisco, and continuing them in Taipei if I end up there. Otherwise, my Spanish could use some tuning up as well. And I fucking love empanadas. Seriously. A lot.

One factor to consider is the length of the tourist visa. Most countries will give US citizens a 90-day stamp without too much hassle, so those countries are more appealing. But to be honest, this whole trip is an experiment in playing it by ear.

Why keep coming back to Ft. Collins? Why not just a ’round-the-world ticket?
I wouldn’t exactly call myself commitment-averse, but I’ve noticed a common pattern in my life heretofore has involved a lot of hedging. Also see above note re: ear-playing (which sounds a whole lot worse than the longer phrase).

Will you blog? Tweet? Facebook?
Yes. Yes. No.

Email works too.

Will we still get platypus Friday?
I shall endeavor to please.

Don’t you think fake-asking yourself questions on your own blog is a little pretentious?
At times, I hate me too.

And clichéd?
Ok, ok, I get the point.

In any case, if you have travel suggestions, tips, whathaveyou, I’m happy to hear them all.

Stay tuned to this space for the latest and greatest.

Cheers!

SPACE BAGS

alex

16 months later

new digs

April 2010, just hired.

clean office, clean mind

August 2011.

Yes, it took me 16 months to get a bookshelf and hang that photo. There’s been a lot of life in-between.

(click for large versions)

alex

As of this writing, it is a little painful to use pbuilder to create a Debian chroot on an Ubuntu host due to LP: #599695.

The easiest workaround I could figure out was the following:

$ cat ~/.pbuilderrc-debian
COMPONENTS="main contrib non-free"
DEBOOTSTRAPOPTS=("${DEBOOTSTRAPOPTS[@]}" "--keyring=/usr/share/keyrings/debian-archive-keyring.gpg")

And then you can issue:

sudo pbuilder create --basetgz /var/cache/pbuilder/sid.tgz --distribution sid --mirror http://mirrors.kernel.org/debian --configfile ~/.pbuilderrc-debian

The better way to fix this, of course, would be to fix the above bug. But this works for now.

alex

It is a fact of life that everyone receives more email than they can handle.

It is also a fact that email is a skill, and there are varying levels of proficiency out there.

So, it is only a matter of time before you find yourself on the annoying end of an email thread gone awry. Perhaps it is a discussion on the wrong mailing list, or perhaps it is the infamous 1 grillion people in the To: or Cc: fields problem.

Before long, a “take me off this list” / “stop replying to all” storm ensues, and then something horrible like Facebook gets invented to “solve” this “problem”.

Of course mail filters can be deployed, shielding you from the idiocy. But what if you want to be more proactive? Is there a way to stop the insanity without having to hax0r into the mail server and just start BOFH‘ing luser accounts?

Yes, there is an easy solution that works most (but not all) of the time.

Put all the unintended recipients in the Bcc: field. Put the correct recipients in the To: field.

In the case of discussion on the wrong mailing list, this is easy; just put the correct list in the To: field. Include a note in the mail body, such as “Redirecting to foo list, which is more appropriate.” Respondents will then typically automatically respond to the correct list.

In the case of “too many Cc:s”, there’s no easy answer. You could move all the Cc: to Bcc:, and then put something like none@none.invalid in the To: address. You will get a single bounce, but then so will everyone else who attempts to respond to you. This trick only works because the people who tend to cause the problem also tend to be lazy and just respond to the last mail received. They can’t spam everyone else because their addresses are obfuscated via the Bcc:. If you feel brave, you could socially engineer the recipients by writing something inflammatory, in order to entice them to respond to you, rather than other mails in the thread, which will then result in a bounce.
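If you script your mail, the Bcc: trick might look something like this minimal Python sketch using the standard library's email package. The function name and the example addresses are hypothetical; the default To: is the none@none.invalid placeholder from above, and you would pass the correct list address instead for the wrong-mailing-list case.

```python
from email.message import EmailMessage


def redirect_thread(sender, subject, body, unintended, to="none@none.invalid"):
    """Build a reply that hides the unintended recipients in Bcc:.

    `unintended` is the list of addresses lifted from the original
    To:/Cc: fields; `to` is the correct recipient or the .invalid placeholder.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = to
    msg["Bcc"] = ", ".join(unintended)
    msg["Subject"] = subject
    msg.set_content(body)
    return msg
```

If you hand the result to `smtplib.SMTP.send_message()`, the Bcc: addresses are used as envelope recipients but the Bcc: header itself is not transmitted, which is exactly the obfuscation the trick relies on.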

Hope this helps.

[nb, .invalid is actually a reserved domain, read rfc2606 for more details.]
