Canonical Voices

Posts tagged with 'nexus7'

Daniel Holbach

Going mobile

Many asked me in the last time what became of the Ubuntu on Nexus7 project. I’m happy to say that it’s going really well. Some weeks ago it was already very easy to install Ubuntu on a Nexus7, since then things got better and better. Many bugs were ironed out, but the piece most folks have been concentrating on recently was the desktop-r-reduced-power-ram blueprint.

The spec says:

In the past few cycles, we saw that our desktop took more and more RAM to run the full session. Also, more daemons mean more interruptions on the CPU, and less battery file. We will get services to not run when not needed and work on improving the code of those components to consume less resources

Why is this so relevant in a mobile setting? Simple. Most mobile devices are less well-equipped than the common Desktop or Laptop, and every interruption, every bit of CPU usage, every disk access costs precious battery life. Fixing this kind of bugs will have a great and positive impact for all devices running Ubuntu.

Here’s a quick summary of the work which has been done:

  • Robert Ancell: look at why lightdm is using 30MB (it’s due to the memory locking – without locking it drops to 3.7M)
  • Michael Terry: Make lightdm selectively lock memory instead of using mlockall
  • Sébastien Bacher: look if gnome-keyring needs to be running all the time (needs to, restarting would mean having to unlock it again, e.g ask user for password every time)
  • Sébastien Bacher: look at what is making goa run for some users (it’s e-d-s)
  • Sébastien Bacher: set up follow-up meetings about the topics we didn’t cover during the session
  • Ken vanDine: check with online team if signond needs to be running all the time
  • Ken vanDine: investigate long running telepathy-indicator/mission-control
  • Iain Lane: drop g-c-c recommends on goa so it’s not installed by default
  • Oliver Grawert: seed zram-conf
  • Brian Murray: look at what update-notifier is used for nowadays, identify if those functionalities could be replaced/moved to upstart jobs [http://wiki.ubuntu.com/UpdateNotifier]
  • Colin Watson: fix upower memory leaks
  • Colin Watson: reduce update-notifier memory use

Update: Sébastien also mailed the ubuntu-devel@ list with a nice summary of the work.

We need your help

If you have a look at the desktop-r-reduced-power-ram blueprint you can see that there is still quite a bit of work which need to be done. There are assignees for some of the work items, but all of them will be happy to hear you offer help. The effort is coordinated on #ubuntu-desktop, so you best head there and start chatting with the team.

More information – live hangout

Tomorrow, 7 Feb 2013, at 9 UTC I am going to talk with my friend Sébastien Bacher on http://ubuntuonair.com about this initiative, so if you want to find out more, be sure to tune in or watch the recording in the ubuntuonair youtube channel afterwards.

Read more
alex

Appy polly loggies for the super long delay between episodes, but I finally carved out some time for our exciting dénouement in the memory leak detection series. Past episodes included detection and analysis.

As a gentle reminder, during analysis, we saw the following block of code:

 874                 GSList *dupes = NULL;
 875                 const char *path;
 876 
 877                 dupes = g_object_get_data (G_OBJECT (dup_data.found), "dupes");
 878                 path = nm_object_get_path (NM_OBJECT (ap));
 879                 dupes = g_slist_prepend (dupes, g_strdup (path));
 880 #endif
 881                 return NULL;

And we concluded with:

Is it safe to just return NULL without doing anything to dupes? maybe that’s our leak?
We can definitively say that it is not safe to return NULL without doing anything to dupes. We definitely allocated memory, stuck it into dupes, and then threw dupes away. This is our smoking gun.

But there’s a twist! Eagle-eyed reader Dave Jackson (a former colleague of mine from HP, natch) spotted a second leak! It turns out that line 879 was exceptionally leaky during its inception. As Dave points out, the call to g_slist_prepend() passes g_strdup() as an argument. And as the documentation says:

Duplicates a string. If str is NULL it returns NULL. The returned string should be freed with g_free() when no longer needed.

In memory-managed languages like python, the above idiom of passing a function as an argument to another function is quite common. However, one needs to be more careful about doing so in C and C++, taking great care to observe if your function-as-argument allocates memory and returns it. There is no mechanism in the language itself to automatically free memory in the above situation, and thus the call to g_strdup() seems like it also leaks memory. Yowza!

So, what to do about it?

The basic goal here is that we don’t want to throw dupes away. We need to actually do something with it. Here again are the 3 most pertinent lines.

 877                 dupes = g_object_get_data (G_OBJECT (dup_data.found), "dupes");
 878                 path = nm_object_get_path (NM_OBJECT (ap));
 879                 dupes = g_slist_prepend (dupes, g_strdup (path));
 881                 return NULL;

Let’s break these lines down.

  1. On line 877, we retrieve the dupes list from the dup_data.found object
  2. Line 878 gets a path to the duplicate wifi access point
  3. Finally, line 879 adds the duplicate access point to the old dupes list
  4. Line 881 throws it all away!

To me, the obvious thing to do is to change the code between lines 879 and 881, so that after we modify the duplicates list, we save it back into the dup_data object. That way, the next time around, the list stored inside of dup_data will have our updated list. Makes sense, right?

As long as you agree with me conceptually (and I hope you do), I’m going to take a quick shortcut and show you the end result of how to store the new list back into the dup_data object. The reason for the shortcut is that we are now deep in the details of how to program using the glib API, and like many powerful APIs, the key is to know which functions are necessary to accomplish your goal. Since this is a memory leak tutorial and not a glib API tutorial, just trust me that the patch hunk will properly store the dupes list back into the dup_data object. And if it’s confusing, as always, read the documentation for g_object_steal_data and g_object_set_data_full.

@@ -706,14 +706,15 @@
 +		GSList *dupes = NULL;
 +		const char *path;
 +
-+		dupes = g_object_get_data (G_OBJECT (dup_data.found), "dupes");
++		dupes = g_object_steal_data (G_OBJECT (dup_data.found), "dupes");
 +		path = nm_object_get_path (NM_OBJECT (ap));
 +		dupes = g_slist_prepend (dupes, g_strdup (path));
++		g_object_set_data_full (G_OBJECT (dup_data.found), "dupes", (gpointer) dupes, (GDestroyNotify) clear_dupes_list);
 +#endif
  		return NULL;
  	}

If the above patch format looks funny to you, it’s because we are changing a patch.

-+		dupes = g_object_get_data (G_OBJECT (dup_data.found), "dupes");
++		dupes = g_object_steal_data (G_OBJECT (dup_data.found), "dupes");

This means the old patch had the line calling g_object_get_data() and the refreshed patch now calls g_object_steal_data() instead. Likewise…

++		g_object_set_data_full (G_OBJECT (dup_data.found), "dupes", (gpointer) dupes, (GDestroyNotify) clear_dupes_list);

The above call to g_object_set_data_full is a brand new line in the new and improved patch.

Totally clear, right? Don’t worry, the more sitting and contemplating of the above you do, the fuller and more awesomer your neckbeard grows. Don’t forget to check it every once in a while for small woodland creatures who may have taken up residence there.

And thus concludes our series on how to detect, analyze, and fix memory leaks. All good? Good.

waiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiit!!!!!!!11!

I can hear the observant readers out there already frantically scratching their necks and getting ready to point out the mistake I made. After all, our newly refreshed patch still contains this line:

 +		dupes = g_slist_prepend (dupes, g_strdup (path));

And as we determined earlier, that’s our incepted memory leak, right? RIGHT!??

Not so fast. Take a look at the new line in our updated patch:

++		g_object_set_data_full (G_OBJECT (dup_data.found), "dupes", (gpointer) dupes, (GDestroyNotify) clear_dupes_list);

See that? The last argument to g_object_set_data_full() looks quite interesting indeed. It is in fact, a cleanup function named clear_dupes_list(), which according to the documentation, will be called

when the association is destroyed, either by setting it to a different value or when the object is destroyed.

In other words, when we are ready to get rid of the dup_data.found object, as part of cleaning up that object, we’ll call the clear_dupes_list() function. And what does clear_dupes_list() do, praytell? Why, let me show you!

static void
clear_dupes_list (GSList *list)
{
	g_slist_foreach (list, (GFunc) g_free, NULL);
	g_slist_free (list);
}

Trés interesante! You can see that we iterate across the dupes list, and call g_free on each of the strings we did a g_strdup() on before. So there wasn’t an inception leak after all. Tricky tricky.

A quick digression is warranted here. Contrary to popular belief, it is possible to write object oriented code in plain old C, with inheritance, method overrides, and even some level of “automatic” memory management. You don’t need to use C++ or python or whatever the web programmers are using these days. It’s just that in C, you build the OO features you want yourself, using primitives such as structs and function pointers and smart interface design.

Notice above we have specified that whenever the dup_data object is destroyed, it will free the memory that was stuffed into it. Yes, we had to specify the cleanup function manually, but we are thinking of our data structures in terms of objects.

In fact, the fancy features of many dynamic languages are implemented just this way, with the language keeping track of your objects for you, allocating them when you need, and freeing them when you’re done with them.

Because at the end of the day, it is decidedly not turtles all the way down to the CPU. When you touch memory in in python or ruby or javascript, I guarantee that something is doing the bookkeeping on your memory, and since CPUs only understand assembly language, and C is really just pretty assembly, you now have a decent idea of how those fancy languages actually manage memory on your behalf.

And finally now that you’ve seen just how tedious and verbose it is to track all this memory, it should no longer be a surprise to you that most fancy languages are slower than C. Paperwork. It’s always paperwork.

And here we come to the upshot, which is, tracking down memory leaks can be slow and time consuming and trickier than first imagined (sorry for the early head fake). But with the judicious application of science and taking good field notes, it’s ultimately just like putting a delicious pork butt in the slow cooker for 24 hours. Worth the wait, worth the effort, and it has a delicious smoky sweet payoff.

Happy hunting!


kalua pork + homemade mayo and cabbage

Read more
Matt Fischer

Last week I was running some cairo perf traces on the Nexus7. Cairo-perf traces are a great way to measure 2d graphics performance and to use those numbers to measure the effects of code, hardware, or driver changes. One other cool thing is that with this tool you can do a benchmark on something like Chromium or Firefox without even needing the application installed.

The purpose of this post is to briefly explain how to build the traces, how to run the tools on Ubuntu, and finally a quick look at some results on the Nexus7.

Before running the tools you need to get setup and build the traces. A full clone and build will use several gigs of disk space. Since the current N7 image only builds a 6G or so filesystem, you may want to build the traces in a pbuilder. The disk I/O on the N7 is also pretty slow, so I found that building in the pbuilder, even though it runs inside a qemu, is much faster on my Core i5 + SSD.

In the steps below I’ve tried to call out the things you can do to reduce the disk space.

Building the traces

1. Setup the build environment

sudo apt­-get install libcairo2­dev lzma git

2. Grab the traces from git

git clone git://anongit.freedesktop.org/cairo­traces

3. (Optional) Remove unused files to save on disk space. Don’t do this if you plan on submitting changes back upstream.

cd cairo­traces
rm -­rf .git

4. Build the benchmarks, I used -j4 on my laptop and -j2 on the Nexus7. I didn’t really investigate the optimal value.

make -j4 benchmarks

5. The benchmark directory is now ready to use for traces. If you built it on a different system, you only need to copy over this directory. You can delete the lzma files if you want.

The traces you are pixman version specific, so if you have a Raring based system like the Nexus7, you can’t re-use them on a Precise based box.

Running cairo-perf-trace

1, Before you start, delete the ocitysmap trace from the benchmarks folder. It uses too much RAM and ended up locking up my N7.

2. If you are at the command line, connected via ssh for example, you need to set the display or it will segfault, simply run export DISPLAY=:0

3. Run the tool, I’d start first with a simple trace to make sure that everything is working.

CAIRO_TEST_TARGET=image cairo-­perf-­trace ­i3 ­r ./benchmark/gvim.trace > ~/result_image.txt

In that command above we did a few things, first we set the cairo backend. Image is a software renderer, you probably want to use xlib or xcb to test hardware. If you don’t set the CAIRO_TEST_TARGET it will try all the back-ends, this will take a long long time and I don’t recommend doing it. A simple way to get the tool to list them all is to set it to a bad value, for example

mfisch@caprica:~$ CAIRO_TEST_TARGET=mfisch cairo-perf-trace
Cannot find target 'mfisch'.
Known targets: script, xcb, xcb-window, xcb-window&, xcb-render-0.0, xcb-fallback, xlib, xlib-window, xlib-render-0_0, xlib-fallback, image, image16, recording

The next argument, -i3 tells it to run 3 iterations, this gives us a good set of data to work with. -r asks for raw output, which is literally just the amount of time the trace took to run. Finally ./benchmark/gvim.trace shows which trace to run. You can pass in a directory here and it will run them all, but I’d recommend trying that just one until you know that it’s working. When you’re running a long set of traces doing a tail -f on the result file can help assure you that it’s working without placing too heavy of a load on the system. The hardware backend runs took almost all day to finish, so you should always be plugged into a power source when doing this.

The output should look something like this:
[ # ] backend.content test-size ticks-per-ms time(ticks) ...
[*] xlib.rgba chromium-tabs.0 1e+06 1962036000 1948712000 1938894000

Making Pretty Graphs

Once you have some traces you can make charts with cairo-perf-chart. This undocumented tool has several options which I determined by reading the code. I did send a patch to add a usage() statement to this tool, but nobody has accepted it yet. First, the basic usage, then the options:

cairo-perf-chart nexus7_fbdev_xlib.txt nexus7_tegra3_xlib.txt

cairo-perf-chart will build two charts with that command, one will be an absolute chart, on that chart, larger bars indicate worse performance. The second chart, the relative chart takes the first argument as the baseline and compares the rest of the results files against it. On the relative chart, a number below the zero line indicates that the results are slower than the baseline (which is the first argument to cairo-perf-chart.

Now a quick note about the useful arguments. cairo-perf-chart can take as many results files as you want to supply it when building graphs, if you’d like to compare more than two files. If you want to resize the chart, just pass –width= and –height=, defaults are 640×480. Another useful option is –html which generates an HTML comparison chart from the data. The only issue with this option is that you manually need to make a table header and stick it in to a basic HTML document.

Some Interesting Results

Now some results from the Nexus7 and they are actually pretty interesting. I compared the system with and without the tegra3 drivers enabled. Actually I just plain uninstalled the tegra3 drivers to get some numbers with fbdev. My first run used the image backend, pure software rendering. As expected the numbers are almost identical, since the software rendering is just using the same CPU+NEON.

Absolute Results - Tegra3 vs fbdev drivers, image (software) backend

Absolute Results – Tegra3 vs fbdev drivers, image (software) backend

Relative Results - Tegra3 vs fbdev drivers, image (software) backend

Relative Results – Tegra3 vs fbdev drivers, image (software) backend

The second set of results are more interesting. I switched to the xlib backend so we would get hardware rendering. With the tegra3 driver enabled we should expect a massive performance gain, right?

Absolute Results - Tegra3 vs fbdev drivers, xlib backend

Absolute Results – Tegra3 vs fbdev drivers, xlib backend

Relative Results - Tegra3 vs fbdev drivers, xlib backend

Relative Results – Tegra3 vs fbdev drivers, xlib backend

So as it turns out the tegra3 is actually way slower than fbdev and I don’t know why. I think that this could be for a variety of reasons, such as unoptimized 2d driver code or hardware (CPU+NEON vs Tegra3 GPU).

Now that we have a method for gathering data, perhaps we can solve that mystery?

If you want to know more about the benchmarks or see some more analysis, you should read this great post which is where I found out most of the info on running the tools. If you want to know more background about the cairo-perf trace tools you might want to read this excellent blog post.

Read more
Daniel Holbach

Raring!

Many have asked me what’s been going on with the work on Ubuntu on the Nexus 7 recently. A lot of people put work into getting the raring images ready for public consumption. 12.10 worked great on the Nexus, but there were a few blockers on getting 13.04 to work as well. On this road among other things these issues were fixed:

  • A new onboard pre-release made it into 13.04 which fixes many bugs already and makes our on-screen keyboard a lot easier to use. Thanks a lot to the onboard team.
  • The new Unity stack got into raring, which is now automatically tested after commits and auto-released into 13.04. This is a huge milestone from the Unity team. Among the issues fixed was a nux problem, which constituted a blocker.
  • The new raring images use oem-config to present us with an installer window, where you can specify a user name, the wireless network you want to use and other bits.
  • Many many other issues were fixed as well.
Ubuntu on the Nexus 7

Ubuntu on the Nexus 7

So what does this mean for you now? You can now very easily put Ubuntu 13.04 on your Nexus 7. It won’t need any additional PPA, it’s stock raring, you won’t have to reflash, but can just do your regular updates and enjoy the latest and greatest improvements day by day.

This is a huge achievement and will allow us to do better and more immediate testing and hacking on the device.

 

Hacking

One thing we want to improve on the Nexus 7 (and in Ubuntu in general) is memory consumption. Alex Chiang has put together some great blog posts on how to help with finding memory issues and debugging them. They are absolutely worth a read and an effort worth getting involved with. Here are the links:

If you want to make Ubuntu better and have a bit of a development background, be sure to check them out.

 

Meeting the team

Everybody who has been working on Ubuntu on the Nexus 7 has documented things on the wiki pages, so if you are excited about this, be sure head there first. Also does the team hang out in 24/7 in #ubuntu-arm on irc.freenode.net, so feel free to drop by, say Hi and get to know the others.

These are exciting times for Ubuntu and you can be part of it. :-)

Read more
alex

In our last exciting episode, we learned how to capture a valgrind log. Today we’re going to take the next step and learn how to actually use it to debug memory leaks.

There are a few prerequisites:

  1. know C. If you don’t know it, go read the C programming language which is often referred to as K&R C. Be sure to understand the sections on pointers, and after you do, come back to my blog. See you in 2 weeks!
  2. a nice supply of your favorite beverages and snacks. I prefer coffee and bacon, myself. Get ready because you’re about to read an epic 2276 word blog entry.

That’s it. Ok, ready? Let’s go!

navigate the valgrind log
Open the valgrind log that you collected. If you don’t have one, you can grab one that I’ve already collected. Take a deep breath. It looks scary but it’s not so bad. I like to skip straight to the good part near the bottom. Search the file for “LEAK SUMMARY”. You’ll see something like:

==13124== LEAK SUMMARY:
==13124==    definitely lost: 916,130 bytes in 37,528 blocks
==13124==    indirectly lost: 531,034 bytes in 12,735 blocks
==13124==      possibly lost: 82,297 bytes in 891 blocks
==13124==    still reachable: 2,578,733 bytes in 42,856 blocks
==13124==         suppressed: 0 bytes in 0 blocks
==13124== Reachable blocks (those to which a pointer was found) are not shown.
==13124== To see them, rerun with: --leak-check=full --show-reachable=yes

You can see that valgrind thinks we’ve definitely leaked some memory. So let’s go figure out what leaked.

Valgrind lists all the leaks, in order from smallest to largest. The leaks are also categorized as “possibly” or “definitely”. We’ll want to focus on “definitely” for now. Right above the summary, you’ll see the worst, definite leak:

==13124== 317,347 (77,312 direct, 240,035 indirect) bytes in 4,832 blocks are definitely lost in loss record 10,353 of 10,353
==13124==    at 0x4C2B6CD: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==13124==    by 0x74E3A78: g_malloc (gmem.c:159)
==13124==    by 0x74F6CA2: g_slice_alloc (gslice.c:1003)
==13124==    by 0x74F7ABD: g_slist_prepend (gslist.c:265)
==13124==    by 0x4275A4: get_menu_item_for_ap (applet-device-wifi.c:879)
==13124==    by 0x427ACE: wireless_add_menu_item (applet-device-wifi.c:1138)
==13124==    by 0x41815B: nma_menu_show_cb (applet.c:1643)
==13124==    by 0x4189EC: applet_update_indicator_menu (applet.c:2218)
==13124==    by 0x74DDD52: g_main_context_dispatch (gmain.c:2539)
==13124==    by 0x74DE09F: g_main_context_iterate.isra.23 (gmain.c:3146)
==13124==    by 0x74DE499: g_main_loop_run (gmain.c:3340)
==13124==    by 0x414266: main (main.c:106)

Wow, we lost 300K of memory in just a few hours. Now imagine if you don’t reboot your laptop for a week. Yeah, that’s not so good. Time for a coffee and bacon break, the next part is about to get fun.

read the stack trace
What you saw above is a stack trace, and it’s printed chronologically “backwards”. In this example, malloc() was called by g_malloc(), which was called by g_slice_alloc(), which in turn was called by g_slist_prepend(), which itself was called by get_menu_item_for_ap() and so forth. The first function ever called was main(), which should hopefully make sense.

At this point, we need to use a little bit of extra knowledge to understand what is happening. The first function, main() is in our program, nm-applet. That’s fairly easy to understand. However, the next few functions that begin with g_main_ don’t actually live inside nm-applet. They are part of glib, which is a library that nm-applet depends on. I happened to have just known this off the top of my head, but if you’re ever unsure, you can just google for the function name. After searching, we can see that those functions are in glib, and while there is some magic that is happening, we can blissfully ignore it because we see that we soon jump back into nm-applet code, starting with applet_update_indicator_menu().

a quick side note
Many Linux programs will have a stack trace similar to the above. The program starts off in its own main(), but will call various other libraries on your system, such as glib, and then jump back to itself. What’s going on? Well, glib provides a feature known as a “main loop” which is used by the program to look for inputs and events, and then react to them. It’s a common programming paradigm, and rather than have every application in the world write their own main loop, it’s easier if everyone just uses the one provided by glib.

The other observation is to note how the function names appear prominently in the stack trace. Pundits wryly say that naming things is one of the hardest things in computer science, and I completely agree. So take care when naming your functions, because people other than you will definitely see them and need to understand them!

Alright, let’s get back to the stack trace. We can see a few functions that look like they belong to nm-applet, based on their names and their associated filenames. For example, the function wireless_add_menu_item() is in the file applet-device-wifi.c on line 1138. Now you see why we wanted symbols from the last episode. Without the debug symbols, all we would have seen would have been a bunch of useless ??? and we’d be gnashing our teeth and wishing for more bacon right now.

Finally, we see a few more g_* functions, which means we’re back in the memory allocation functions provided by glib. It’s important to understand at this point that g_malloc() is not the memory leak. g_malloc() is simply doing whatever nm-applet asks it to do, which is to allocate memory. The leak is highly likely to be in nm-applet losing a reference to the pointer returned by g_malloc().

What does it mean?
Now we’re ready to start the real debugging. We know approximately where we are leaking memory inside nm-applet: get_menu_item_for_ap() which is the last function before calling the g_* memory functions. Time to top off on coffee because we’re about to get our hands dirty.

reading the source
The whole point of open source is being able to read the source. Are you as excited as I am? I know you are!

First, let’s get the source to nm-applet. Assuming you are using Ubuntu and you are using 12.04, you’d simply say:

$ cd Projects
$ mkdir network-manager-gnome
$ cd network-manager-gnome
$ apt-get source network-manager-gnome
$ cd network-manager-applet-0.9.4.1

Woo hoo! That wasn’t hard, right?

side note #2
Contrary to popular belief, reading code is harder than writing code. When you write code, you are transmitting the thoughts of your messy brain into an editor, and as long as it kinda works, you’re happy. When you read code, now you’re faced with the problem of trying to understand exactly what the previous messy brain wrote down and making sense of it. Depending on how messy that previous brain was, you may have real trouble understanding the code. This is where pencil and paper and plenty of coffee come into play, where you literally trace through what the program is doing to try and understand it.

Luckily there are at least a few tools to help you do this. My favorite tools are cscope and ctags, which help me to rapidly understand the skeleton of a program and navigate around its complex structure.

Assuming you are in the network-manager-applet-0.9.4.1 source tree:

$ apt-get install cscope ctags
$ cscope -bqR
$ ctags -R
$ cscope -dp4


You are now presented with a menu. Use control-n and control-p to navigate input fields at the bottom. Try navigating to “Find this C symbol:” and then type in get_menu_item_for_ap, and press enter. The search results are returned, and you can press ’0′ or ’1′ to jump to either of the locations where the function is referenced. You can also press the space bar to see the final search result. Play around with some of the other search types and see what happens. I’ll talk about ctags in a bit.

Alrighty, let’s go looking for our suspicious nm-applet function. Start up cscope as described above. Navigate to “Find this global definition:” and search for get_menu_item_for_ap. cscope should just directly put you in the right spot.

Based on our stack trace, it looks like we’re doing something suspicious on line 879, so let’s go see what it means.

 869         if (dup_data.found) {
 870 #ifndef ENABLE_INDICATOR
 871                 nm_network_menu_item_best_strength (dup_data.found, nm_acce
 872                 nm_network_menu_item_add_dupe (dup_data.found, ap);
 873 #else
 874                 GSList *dupes = NULL;
 875                 const char *path;
 876 
 877                 dupes = g_object_get_data (G_OBJECT (dup_data.found), "dupe
 878                 path = nm_object_get_path (NM_OBJECT (ap));
 879                 dupes = g_slist_prepend (dupes, g_strdup (path));
 880 #endif
 881                 return NULL;
 882         }

Cool, we can now see where the source code is matching up with the valgrind log.

Let’s start doing some analysis. The first thing to note are the #ifdef blocks on lines 870, 873, and 880. You should know that ENABLE_INDICATOR is defined, meaning we do not execute the code in lines 871 and 872. Instead, we do lines 874 to 879, and then we do 881. Why do we do 881 if it is after the #endif? That’s because we fell off the end of the #ifdef block, and then we do whatever is next, after we fall off, namely returning NULL.

Don’t worry, I don’t know what’s going on yet, either. Time for a refill!

Back? Great. Alright, valgrind says that we’re doing something funky with g_slist_prepend().

==13124==    by 0x74F7ABD: g_slist_prepend (gslist.c:265)

And our relevant code is:

 874                 GSList *dupes = NULL;
 875                 const char *path;
 876 
 877                 dupes = g_object_get_data (G_OBJECT (dup_data.found), "dupe
 878                 path = nm_object_get_path (NM_OBJECT (ap));
 879                 dupes = g_slist_prepend (dupes, g_strdup (path));
 880 #endif
 881                 return NULL;

We can see that we declare the pointer *dupes on line 874, but we don’t do anything with it. Then, we assign something to it on line 877. Then, we assign something to it again on line 879. Finally, we end up not doing anything with *dupes at all, and just return NULL on line 881.

This definitely seems weird and worth a second glance. At this point, I’m asking myself the following questions:

  • did g_object_get_data() allocate memory?
  • did g_slist_prepend() allocate memory?
  • are we overwriting *dupes on line 879? that might be a leak.
  • is it safe to just return NULL without doing anything to dupes? maybe that’s our leak?

Let’s take them in order.

did g_object_get_data() allocate memory?
g_object_get_data has online documentation, so that’s our first stop. The documentation says:

Returns :
the data if found, or NULL if no such data exists. [transfer none]

Since I am not 100% familiar with glib terminology, I guess [transfer none] means that g_object_get_data() doesn’t actually allocate memory on its own. But let’s be 100% sure. Time to grab the glib source and find out for ourselves.

$ apt-get source libglib2.0-0
$ cd glib2.0-2.32.1
$ cscope -bqR
$ ctags -R
$ cscope -dp4
search for global definition of g_object_get_data

Pretty simple function.

3208 gpointer
3209 g_object_get_data (GObject     *object,
3210                    const gchar *key)
3211 {
3212   g_return_val_if_fail (G_IS_OBJECT (object), NULL);
3213   g_return_val_if_fail (key != NULL, NULL);
3214 
3215   return g_datalist_get_data (&object->qdata, key);
3216 }

Except I have no idea what g_datalist_get_data() does. Maybe that guy is allocating memory. Now I’ll use ctags to make my life easier. In vim, put your cursor over the “g” in “g_datalist_get_data” and then press control-]. This will “step into” the function. Magic!

 844 gpointer
 845 g_datalist_get_data (GData       **datalist,
 846                      const gchar *key)
 847 {
 848   gpointer res = NULL; 
 ... 
 856   d = G_DATALIST_GET_POINTER (datalist);
 ...
 859       data = d->data;
 860       data_end = data + d->len;
 861       while (data < data_end)
 862         {
 863           if (strcmp (g_quark_to_string (data->key), key) == 0)
 864             {
 865               res = data->data;
 866               break;
 867             }
 868           data++;
 869         }
 ... 
 874   return res;
 875 }

This is a pretty simple loop, walking through an existing list of pointers which have already been allocated somewhere else, starting on line 861. We do our comparison on line 863, and if we get a match, we assign whatever we found to res on line 865. Note that all we are doing here is a simple assignment. We are not allocating any memory!

Finally, we return our pointer on line 874. Press control-t in vim to pop back to your last location.

Now we know for sure that g_object_get_data() and g_datalist_get_data() do not allocate any memory at all, so there can be no possibility of a leak here. Let’s try the next function.

did g_slist_prepend() allocate memory?
First, read the documentation, which says:

The return value is the new start of the list, which may have changed, so make sure you store the new value.

This probably means it allocates memory for us, but let’s double-check just to be sure. Back to cscope!

 259 GSList*
 260 g_slist_prepend (GSList   *list,
 261                  gpointer  data)
 262 {
 263   GSList *new_list;
 264 
 265   new_list = _g_slist_alloc ();
 266   new_list->data = data;
 267   new_list->next = list;
 268 
 269   return new_list;
 270 }

Ah ha! Look at line 265. We are 100% definitely allocating memory, and returning it on line 269. Things are looking up! Let’s keep going with our questions.

are we overwriting *dupes on line 879? that might be a leak.
Remember:

 877                 dupes = g_object_get_data (G_OBJECT (dup_data.found), "dupe
 878                 path = nm_object_get_path (NM_OBJECT (ap));
 879                 dupes = g_slist_prepend (dupes, g_strdup (path));

We’ve already proven to ourselves that line 877 doesn’t allocate any memory. It just sets dupes to some value. However, on line 879, we do allocate memory. It is equivalent to this code:

  int *dupes;
  dupes = 0x12345678;
  dupes = malloc(128);

So simply setting dupes to the return value of g_object_get_data() and later overwriting it with the return value of malloc() does not inherently cause a leak.

By way of counter-example, the below code is a memory leak:

  int *dupes;
  dupes = malloc(64);
  dupes = malloc(128);    /* leak! */

The above essentially illustrates the scenario I was worried about. I was worried that g_object_get_data() allocated memory, and then g_slist_prepend() also allocated memory which would have been a leak because the first value of dupes got scribbled over by the second value. My worry turned out to be incorrect, but that is the type of detective work you have to think about.

As a clearer example of why the above is a leak, consider the next snippet:

  int *dupes1, *dupes2;
  dupes1 = malloc(64);     /* ok */
  dupes2 = malloc(128);    /* ok */
  dupes1 = dupes2;         /* leak! */

First we allocate dupes1. Then allocate dupes2. Finally, we set dupes1 = dupes2, and now we have a leak. No one knows what the old value of dupes1 was, because we scribbled over it, and it is gone forever.

is it safe to just return NULL without doing anything to dupes? maybe that’s our leak?
We can definitively say that it is not safe to return NULL without doing anything to dupes. We definitely allocated memory, stuck it into dupes, and then threw dupes away. This is our smoking gun.

Next time, we’ll see how to actually fix the problem.

Read more
Matt Fischer

This is a a brief post to alert everyone that the 32Gb Nexus7 with 3g cannot currently install Ubuntu.  The problems have been fixed, but we’re not going to re-roll a Quantal based image for this. You’ll get a fix when the Raring builds are ready, which should be soon, hopefully this week. The issue, if you’re curious, is that the new radio changed the device ID where we write the rootfs into. The new code will not make assumptions about the layout and will determine it dynamically.

Read more
alex


leaky plumbing?

An important piece of optimizing the Ubuntu core on the Nexus 7 is slimming down Ubuntu’s memory requirements. It turns out this focus area has plenty of opportunity to help contribute, and today, I’ll talk about how to find memory leaks in an individual application using valgrind.

The best part? You don’t even have to be a developer to help. The second best part? You don’t even need a Nexus 7! What I describe below works on any Ubuntu machine. Let’s get started!

The first step is to find an application to profile. This is the easiest step. Maybe you have an app you use all the time and really care about making it perform as well as possible. Or maybe you’re experiencing a strange behavior problem in an app that takes a little while to show up. Or maybe you just pick a random application from the dash because you’re in a great mood. They’re all good.

In my case, I’ll use nm-applet as my example, since I’ve been struggling with LP: #780602 for a while, where the list of wifi access points would stop displaying after a day or two. Trés annoying!

Next, install valgrind if it is not already installed.

sudo apt-get install valgrind

Pay attention to the next bit because it is important. In order for your valgrind report to be as helpful as possible for developers, you will also need to install debug packages related to your app. The debug packages contain information to help developers narrow in on exactly where problems might be. “Great” you say, “what do I need to install?”

UPDATE: 29 January 2013

After a bit more thinking and discussing with smart folks like infinity, xnox, and pitti, we realized that I was essentially reinventing a lot of code that already exists in apport-retrace, as that tool already knows how to go from a binary to a package and then solve dependencies.

I tossed the idea (and a really rough crappy version of a prototype) to Kyle Nitzsche who took the idea, ran with it, and fixed all my crap! Woo hoo! With a little bit of effort, we ended up with apport-valgrind which has already landed in raring (along with the required valgrind support patch). Even better, Kyle wrote a great apport-valgrind introduction explaining how it works.

So ignore the script below and use apport-valgrind instead (unfortunately only available in raring).

Today is your lucky day because I’ve written a small script to help you figure out which debug packages you’ll need. Go ahead and grab the python version of valgrind-ubuntu-dbg-packages. (Ignore the go version for now, that’s just something I’m playing with in my other spare time!)

Ok, now comes the tricky part. We have to do a quick valgrind run to see what libraries your app uses. Then we’ll use the helper script to see if there are debug packages for those libraries. Ready?

To run valgrind, use this command:

G_SLICE=always-malloc G_DEBUG=gc-friendly valgrind -v --tool=memcheck --leak-check=full --num-callers=40 --log-file=valgrind.log --track-origins=yes 


Replace with the name of your app.


Let this run long enough for your app to launch (which may take a while under valgrind) and then play with your app just a bit where you would reproduce your bug but without actually reproducing the bug. In the case of nm-applet, I did the following sequence:

killall nm-applet	# stop earlier instances of nm-applet

G_SLICE=always-malloc G_DEBUG=gc-friendly valgrind -v --tool=memcheck --leak-check=full --num-callers=40 --log-file=valgrind.log --track-origins=yes nm-applet


Then I clicked the “More networks” menu item in the applet just to get it to display the other wifi access points, since this is the thing that was breaking for me. After doing that just once, I stopped my valgrind run completely by pressing control-c in the terminal where I launched it.


A valgrind log file should now exist, and you can run the helper script on the log:

./valgrind-ubuntu-dbg-packages.py valgrind.log

You will see quite a bit of output, but at the end, you will get a list of recommended extra packages to install.

It is recommended to install the following packages:
libnss3-dbg libdbus-glib-1-2-dbg libdconf-dbg gvfs-dbg libcanberra-gtk3-module-dbg libatk1.0-dbg librsvg2-dbg libfontconfig1-dbg

Go ahead and install the packages.

Now we are finally ready to collect our real logs.

Update: 29 January 2013

Instead of doing all that janky stuff above, just:

  1. apt-get install apport-valgrind
  2. run: apport-valgrind <executable>
  3. Do step 2 for as long as it takes to reproduce the bug. There is no step 3!

Re-run valgrind exactly as above, but this time, let the app run as long as it needs to reproduce the bug. In the case of nm-applet, I had to let it just sit there and run normally for 24 hours before I saw the bug again. Hopefully your bug reproduces faster! Patience is key. I recommend eating a delicious sandwich if you can’t think of anything better to do.

After your bug has reproduced itself, kill the valgrind run. File a bug — you can use the Ubuntu Nexus7 project — and be sure to attach the valgrind log. It would also be great if you could describe how you reproduced the bug. Be sure to read the bug filing guidelines for more detail.

Huzzah, you’ve contributed something extremely valuable to making Ubuntu leaner and meaner — a great log file. With any luck, a developer will be able to pick up your bug and fix the problem.

And… if we’re even luckier, maybe that developer will be you! Next time I’ll show you how to actually analyze the valgrind log. Stay tuned.

Read more
Daniel Holbach

Ubuntu Core on the Nexus 7

There was lots of buzz around Ubuntu on the Nexus 7 and I was lucky enough to get my hands on a Nexus 7 myself, so I had to try out getting Ubuntu on it. The instructions were incredibly clear and everything worked just perfectly. I couldn’t believe how quickly the the Nexus 7 booted into a standard Ubuntu Desktop.

Ubuntu on the Nexus7

When I started to play around with it, I noticed how much of the vision which shaped Ubuntu in the last years works really well on such a small device. Unity is great, the Ubuntu font looks very crisp, the touch gestures, etc. It was a pleasure to see this and play around with it. Naturally I ran into a couple of issues, which was totally expected. We want to know what these issues are and want to work on them to make Ubuntu ready for ranges of hardware. It was great to see that almost every issue I noticed was filed as a bug report already and the most pressing ones have assignees already.

When I decided to dive into it some more, I was glad I had a a Micro USB Host Cable (OTG Cable) and the small USB hub, Intel gave to UDS participants. I was all set to use the tablet as a small computer. Awesome. (To everyone who has hacked on small devices before, this might be boring – I found it quite exciting to be honest. ;-) )

The current image uses 12.10 plus some patches, but it will soon move over to the 13.04 development release, where we can see bug fixes in a more immediate fashion and benefit from all the new goodness which goes into Ubuntu. Expect an announcement of the move to ‘raring’ very very soon.

This effort is a fantastic opportunity for Ubuntu, as we can all look at one reference device and fix whatever needs fixing and let Ubuntu benefit as a whole. If you think this is a worthwhile project, you can help out. What is most needed right now is people who help test the device, report bugs, triage the list of bugs, forward them to the upstream projects if necessary. If you know how to debug memory consumption and how to improve it, you’d be a great fit for the team as well. Please join #ubuntu-arm on irc.freenode.net and talk to the folks in there.

Our documentation is up at https://wiki.ubuntu.com/Nexus7 and there soon will be more meetings, more announces and more chances to get involved. These are exciting times.

Read more
Matt Fischer

I was asked last Friday about what type of bugs we see the most on Nexus7.  Right now we have about 75 unfixed bugs, and from having walked through that bugs list about twenty times, I know that we can break the bugs down into some categories. These are arbitrary categories and subject to debate, but they show the patterns I see in the bug list:

Kernel/Kernel Config/Drivers: 9 bugs

The initial kernel we used was the Android 3.1 kernel, which included binary drivers. This kernel has configuration and code differences from the standard Ubuntu kernel. The kernel team at Canonical is working on trying to get the kernel we’re using as close to Ubuntu as possible. This includes things as simple as enabling more modules and as complex as merging in patches so we can support things like overlayfs.

Onboard Related Issues: 9 bugs

For mobility impaired users, Onboard is a part of daily life. For most of us, we only have to use it when we’re playing with a tablet device, so I think it’s great that these bugs are getting some attention. One bug that was recently fixed by marmuta that should lead to Onboard launching 7x faster with certain themes, including the default theme. Some of the bugs in this category are not issues with Onboard itself, but impact Onboard specifically.

Unity/Nux: 6 bugs

Unity and nux have some bugs that impair the usability of the device and a couple bugs that lead to crashes or lock-ups. This is the area I know the least about. Many of the bugs here are sitting in the upstream project as “New”, if you can help confirm them upstream or even find an older dupe, please do.

Tegra3/nVidia: 6 bugs

We found several bugs that seem to be Tegra3 related, Tegra drivers related, or bugs that we need some input from nVidia on, for example, the sound only works after suspend/resume issue.  Not all of these may really be tegra related issues in the end, many require more investigation.

—–

So these are my top four categories, but that still leaves over half the bugs out. There are some smaller categories, which I’d list as Touch, Performance, and Bluetooth, and then there is Misc aka Everything Else.

Categories aside, another interesting thing I’ve noticed that aside from bugs that are specific to the kernel, drivers, or chipset, almost all the bugs we’ve found were confirmed on other platforms; usually they’re confirmed on a someone’s amd64/i386 laptop. Finding, bringing attention to, and fixing these bugs means shows that we’re achieving one of our goals, which is to fix issues in Ubuntu Core. These fixes will benefit all platforms. There are only a small number of bugs that we have not been able to confirm on other platforms yet, can any of my readers do so?  Here are the ones that stand-out in my mind:

Also I’d like to thank all the new contributors we’ve had in the past couple of weeks, we’re glad for all your help on bugs in any form you can provide it.

Below are the full bug lists for my generated categories:

Kernel:

  • 1068672 webcam support
  • 1072320 please consider adding OTG charging support to kernel
  • 1075549 please include fw_bcmdhd.bin and bcm4330.hcd in linux-firmware for support of the nexus7
  • 1076317 overlayfs support
  • 1070770 bluetoothd dies with glibc malloc memory corruption when used with brcm_patchram
  • 1071259 Setting brightness all the way down actually switches off the display completely
  • 1073499 please consider turning on all possible modules for external USB devices
  • 1073840 Sync kernel configuration with the one from the Ubuntu kernel
  • 1074673 JACK server fails to start

OnBoard:

  • 960537 Dash search box doesn’t unhide Onboard on-screen keyboard
  • 1078554 Onboard doesn’t respect launcher icon size
  • 1081227 onboard should optionally stay hidden if a keyboard is present
  • 1075326 On screen keyboard doesn’t re-position in order to see input
  • 421660 gksu’s and gksudo’s modal password prompt prevents OnBoard’s virtual keyboard input, causing accessibility issues
  • 1079591 onboard can be made thin to the point of unusable
  • 1071508 Onboard onscreen keyboard isn’t always shown when text input selected
  • 1077260 When using software center search, onboard goes away until text box is reslected after entering 2 chars
  • 1077277 Keyboard can’t type into Firefox bookmark dialog

Unity/Nux:

  • 1065638 Unity panels don’t display visuals
  • 1072249 Using desktop switcher via touchscreen causes Unity launcher to stop working
  • 1045256 Dash – It should be possible to vertically scroll the Dash left clicking and dragging
  • 1055949 Unity panel shadow appears as solid black bar on GLES/ARM (Pandaboard, Nexus 7)
  • 1075417 Unity panel/launcher width don’t scale with system DPI/font settings
  • 1070374 unity cannot be cleanly restarted from the command-line on Nexus7

Tegra3/nVidia:

  • 1065644 plymouth causes a hard reset of the nexus
  • 1068804 sound only works after suspend/resume cycle
  • 1070283 after reboot, framebuffer of previous boot appears on screen
  • 1073096 Screen is corrupted between rotations
  • 1067954 control-alt-f1 to bring up a VT shows a blank (black) screen
  • 1070755 screen rotates to portrait sometimes

PS – Thanks to Chris Wayne for vetting my bug category list.

Read more
alex

Week ending 16 November 2012

Accomplishments

  • New kernel has been uploaded to raring archive. Now we’re just waiting on a fix for nux to land before everyone can dist-upgrade to raring. Look for an announcement soon. Thanks to Jani Monoses for doing the heavy lifting here and the Ubuntu kernel team for taking care of the last mile.
  • New benchmarking packages — ubuntu-benchmarking-tools and ubuntu-remote-debug-host-tools — uploaded to raring. Once your Nexus 7 is on raring, you’ll be able to install these convenient metapackages and help us start Ubuntu Pilates! Well done, Chris Wayne!
  • A juju pbuilder charm has been submitted to the charm store. Once this is accepted, developers will be able to easily build ARM packages in the cloud. Thanks Scott Sweeny.
  • Performance optimizations are already landing. Our onscreen keyboard, onboard, recently reduced its startup time from 45 seconds down to 6 seconds. Check out all the gory details in the bug, and big thanks to marmuta and Francesco Fumanti!
  • We had our first weekly status meeting in #ubuntu-meeting. Come back every week for more.

Worst 5 Bugs

Upcoming Plans

  • We are working with Platform QA on creating a set of guidelines and tools for the community to help us start benchmarking memory consumption and usage. Expect an announcement around 23 November.
  • Converting our FAQ to a more friendly and community maintainable AskUbuntu format.

Grab Bag
Brave souls can try upgrading to raring today with this apt-pinning recipe. Create the following file /etc/apt/preferences.d/ubuntu-nexus7-ppa and add the contents below. Thanks to Colin Watson for the tidbit.

Package: *
Pin: release o=LP-PPA-ubuntu-nexus7
Pin-Priority: 600

Read more
Daniel Holbach

For the 13.04 cycle a team of people got together to make sure that the standard Ubuntu Desktop works great on the Nexus 7 device. This is a great opportunity for Ubuntu as we can all refer to one device and one chipset and make sure that Ubuntu is capable of dealing with the device. This will lay the foundations for making Ubuntu ready on many other devices.

The team meets on the 16th November 2012, 16:00 UTC in #ubuntu-meeting.

On the agenda are:

  • general Q&A – the team will answer all your questions,
  • an overview of the current work

We are actively looking for help, we need you to make Ubuntu even better. If you like to test, work with bug reports, measure, debug or fix doesn’t matter, we need you and want you on board.

We put together documentation which should provide pointers to installing Ubuntu on the Nexus 7, how to use it, how to debug and measure certain things like power consumption or memory usage and which bugs we want to fix.

If you have a Nexus 7, plan to get one or are generally excited about the initiative, or just want to find out more, make sure you’re there. Please also let your friends now.

Read more
Matt Fischer

There are dozens of ways that the rest of the community can participate in the Nexus7 project, but I’d like to call out one facet that I’ve been working hard on since I got back from Copenhagen. Chris Wayne and I spend a large portion of our day going through the Nexus7 bug list doing bug triage. In brief, what we do is:

  1. If the bug is unclear, ask the submitter for more info in a comment
  2. Checking for duplicate bug reports inside the Nexus7 project
  3. Confirm that the bug really happens on the Nexus7 device
  4. See if the bug occurs on our laptops/dev boxes (x86 running Quantal)
  5. Search the upstream launchpad projects to see if the bug is known already, and mark the Nexus7 one as a duplicate if so
  6. Search gnome bug reports to see if we can link to one
  7. File upstream LP and gnome bug reports if none exist
  8. Testing fixes from upstream
  9. Set bug priority
All of these are standard Ubuntu Bug Triage tasks that anyone from the community can do. Better yet, all of these except for #3 and maybe #8 can be done without even owning a Nexus7.

So today, I’m officially asking the community to help, if you’re interested in helping with the Nexus7 work and don’t know where to start, bug triage is a great place and we could use your help.

So how do you start?  A good place to start is by reviewing the New bug queue and then following the Triaging guide and try to get enough information so that a developer can get started on it. You should also join the #ubuntu-arm channel on freenode, where we can discuss bug status and priority. Feel free to ping me (mfisch) or Chris (cwayne) if you have questions about a bug or how to help.

You may also consider joining the Ubuntu Bug Squad, and later Ubuntu Bug Control if you want to keep doing this work generally in Ubuntu.

On a personal note: when I first started working on Ubuntu, I joined the Ubuntu Bug Squad because there’s really no better way to learn about the components of Ubuntu than to dive into a bunch of bug reports. After a while doing Bug triage work, I was very comfortable with launchpad and bug triage procedures. I also had a better feel for how the components of the system worked together, it’s amazing how much you can learn from digging into bug reports!

Read more
Matt Fischer

A new Nexus7 image just posted and can installed using the standard installer and install process. However before you rush out and re-install you should note the changes, only one of which requires you to do a full re-install.

The only fix you need to really reinstall for is this hostname fix:

The other changes are all in the kernel. These changes below can be installed by doing sudo apt-get update && sudo apt-get install linux-nexus7. This will upgrade you to version 3.1.10-7.11 of the kernel:

  • Enable ISO support
  • Enable NFS support
  • Add battery information (upower –dump now works)
  • Enable LXC support
  • Enable SND_USB_AUDIO, disable SND_HDA_INTEL

Over the next few weeks you should expect more changes to come through the standard apt-get upgrade process. These will include syncing the kernel config with the standard Ubuntu one and bug fixes in non-kernel packages.
We may not release a new image again until we have nightlies working. Please note that you should NOT enable any of the standard quantal archives and upgrade things from there. That will supercede some of our fixes like the ones in Nux and you will likely end up with an unusable system.

Read more
Matt Fischer

WARNING: This process has changed since the switch to Raring. I don’t yet have a new update for the process.

EDIT: I just added some notes about extracting files from the boot.img in the post.

Several people asked me some questions last week abut how the Nexus7 image is built and how they can hack it. Hopefully this post will help to answer some of those questions. Note that nothing described here is supported, it is just presented here to enable people interested in hacking the image to get going. This process requires the tools simg2img and make_ext4fs which you can download pre-compiled binaries for from here.

Hacking a pre-built image rootfs.img

    1. Take the rootfs.img file as input and use the tool simg2img to unpack it. It will be a large file when unpacked, 28G for the 32GB tablet, 13G for the 16GB tablet, and 6G for the 8GB tablet.

mfisch@caprica:~/build$ ./simg2img rootfs.img rootfs.ext4

    1. Mount the rootfs.ext4 file

mfisch@caprica:~/build$ sudo mount -o loop rootfs.ext4 tmpmnt/

    1. Inside tmpmnt, you’ll find the original rootfs.tar.gz. Copy this file out and unmount the directory. You can also remove the rootfs.ext4 file.


mfisch@caprica:~/build/tmpmnt$ cp rootfs.tar.gz ..
mfisch@caprica:~/build/tmpmnt$ cd ..
mfisch@caprica:~/build$ sudo umount tmpmnt/
mfisch@caprica:~/build$ rm rootfs.ext4

    1. Extract the rootfs.tar.gz


mfisch@caprica:~/build$ tar -xvzf rootfs.tar.gz

    1. The extracted filesystem is in ./binary/casper/filesystem.dir. You can copy files into and out of here or modify files.

Once you’re done with the changes, you need to rebuild the rootfs.img file. The first step is to re-tar and recompress the unpacked files.

mfisch@caprica:~/build$ tar -cvzf rebuilt.tar.gz binary/

From this point you can use the same process we use to build images and then flash them, following the process below.

Building or rebuilding a rootfs.img file

Given a tarball image, our image building script basically does a couple things:

  1. Extract the kernel and initrd from the rootfs.tar.gz
  2. Write a bootimg.cfg file out using the right values for the Nexus 7.
  3. Create a boot.img file using abootimg using the inputs of the kernel, the initrd, the bootimg.cfg. Note: We had to do some work here to make sure that the initrd was small. I think the limit was 2MB.
  4. Take the rootfs.tar.gz, and using the tool make_ext4fs, create a sparse ext4 filesystem and call the output rootfs.img.

We wrote a script to do all this which makes life easier. This process may change as we implement these image builds on cdimage.ubuntu.com and this script may not be updated, but it should be enough to get people hacking. If you do anything cool with this or have fixes for Ubuntu, please let me know or send a patch to one of our bugs.

Hacking/Rebuilding boot.img

After a few questions I decided to add a brief note about how to hack and rebuild the boot.img. It’s pretty simple and uses the abootimg tool which is in universe for quantal.

To extract the files, use abootimg -x

mfisch@caprica:~/upload-scripts$ abootimg -x boot.img
writing boot image config in bootimg.cfg
extracting kernel in zImage
extracting ramdisk in initrd.img

To rebuild, you see see the script I referenced above, or just run abootimg –create:

mfisch@caprica:~/upload-scripts$ abootimg --create newboot.img -k zImage -f ./bootimg.cfg -r initrd.img
reading config file ./bootimg.cfg
reading kernel from zImage
reading ramdisk from initrd.img
Writing Boot Image newboot.img

Read more
Matt Fischer

Soon

According to Alex Chiang, being in Copenhagen just means “more work while being tired”. Well we’ve been hard at work, and so this will be coming soon…

Read more