Canonical Voices

Posts tagged with 'tools'

Colin Ian King

One of my ongoing projects is to reduce system activity where possible to shave off wasted power consumption.  One of the more interesting problems is when very short-lived processes are spawned and die, and traditional tools such as ps and top sometimes don't catch that activity.  Over last weekend I wrote the bulk of the forkstat tool to track down these processes.

Forkstat uses the kernel proc connector interface to detect process activity.  Proc connector allows forkstat to receive notifications of process events such as fork, exec, exit, core dump and changing the process name in the comm field over a socket connection.

By default, forkstat will just log fork, exec and exit events, but the -e option allows one to specify one or more of the fork, exec, exit, core dump or comm events.  When a fork event occurs, forkstat will log the PID and process name of the parent and child, allowing one to easily identify where processes are originating.  Where possible, forkstat attempts to track the lifetime of a process and will log the duration of a process when it exits (note: this is not an estimate of the CPU used).

The -S option to forkstat will dump out a statistical summary of activity.  This is useful to identify the frequency of process activity and hence the top offenders.
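
To illustrate the kind of bookkeeping described above, here is a minimal Python sketch (not forkstat's actual code) that pairs fork and exit events to get process lifetimes and tallies forks per parent to find the top offenders.  The event tuple layout and the summarise() helper are hypothetical; the real tool receives its events from the proc connector.

```python
from collections import Counter

def summarise(events):
    """events: iterable of (time_s, kind, pid, parent_name) tuples (hypothetical format).

    Returns (lifetimes, forks_per_parent): lifetimes maps pid -> seconds between
    its fork and exit, forks_per_parent counts fork events by parent name.
    """
    started = {}                 # pid -> time the fork was seen
    lifetimes = {}               # pid -> elapsed wall-clock time, not CPU time
    forks_per_parent = Counter()
    for time_s, kind, pid, parent_name in events:
        if kind == "fork":
            started[pid] = time_s
            forks_per_parent[parent_name] += 1
        elif kind == "exit" and pid in started:
            lifetimes[pid] = time_s - started.pop(pid)
    return lifetimes, forks_per_parent

events = [
    (0.00, "fork", 101, "cron"),
    (0.02, "fork", 102, "cron"),
    (0.05, "exit", 101, "cron"),
    (0.50, "fork", 103, "bash"),
    (0.52, "exit", 102, "cron"),
]
lifetimes, forks = summarise(events)
print(forks.most_common(1))      # parent responsible for the most forks
print(lifetimes)                 # pid 103 has not exited, so it is absent
```

Processes whose fork was never seen are skipped on exit, which is one reason a lifetime can only be reported "where possible".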

Forkstat is now available in Ubuntu 14.04 Trusty Tahr LTS.  To install forkstat use:

 sudo apt-get install forkstat  

For more information on the tool and examples of the forkstat output, visit the forkstat quick start page.

Colin Ian King

Keeping cool with thermald

The push for higher performance desktops and laptops has inevitably led to higher power dissipation.  Laptops have also shrunk in size, leading to increasing problems with removing excess heat and with thermal overrun on heavily loaded high-end machines.

Intel's thermald prevents machines from overheating and has been recently introduced in the Ubuntu Trusty 14.04 LTS release.  Thermald actively monitors thermal sensors and will attempt to keep the hardware cool by modifying a variety of cooling controls:
 

* Active or passive cooling devices as presented in sysfs
* The Running Average Power Limit (RAPL) driver (Sandy Bridge upwards)
* The Intel P-state CPU frequency driver (Sandy Bridge upwards)
* The Intel PowerClamp driver

Thermald has been found to be especially useful when using the Intel P-state CPU frequency scaling driver since this can push the CPU harder than other CPU frequency scaling drivers.
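
For a quick look at the kind of sensor data involved, the kernel exposes thermal zones under /sys/class/thermal, with temperatures reported in millidegrees Celsius.  The following Python sketch simply reads them; the helper name read_thermal_zones() is my own and this only illustrates the sysfs layout, it is not how thermald itself is implemented.

```python
from pathlib import Path

def read_thermal_zones(base="/sys/class/thermal"):
    """Return {"<zone>:<type>": degrees_celsius} for each readable thermal zone."""
    temps = {}
    root = Path(base)
    if not root.is_dir():
        return temps                      # no thermal sysfs on this machine
    for zone in sorted(root.glob("thermal_zone*")):
        try:
            kind = (zone / "type").read_text().strip()
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue                      # zone vanished or gave no valid reading
        temps[f"{zone.name}:{kind}"] = millideg / 1000.0
    return temps

for name, celsius in read_thermal_zones().items():
    print(f"{name}: {celsius:.1f} C")
```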

Over the past several weeks I've been working with Intel to shake out some final bugs and get thermald included in Ubuntu 14.04 LTS, so kudos to Srinivas Pandruvada for handling my patches and for providing a lot of timely fixes too.

By default, thermald works without any need for configuration.  However, if one has incorrect thermal trip settings or other firmware related thermal zone bugs, one can write one's own thermald configuration.

For further details, consult the Ubuntu thermald wiki page.

Colin Ian King

Over the past months I have been using static code analysis tools such as smatch and Coverity Scan on various open source projects that I am involved with.  These, combined with gcc's -Wall and -Wextra flags, have proved useful in tracking down and eliminating various bugs.

Recently I stumbled on cppcheck and gave it a spin on several larger projects.  One of the cppcheck project's aims is to find errors that the compiler won't spot while keeping the number of false positives to a minimum.

cppcheck is very easy to use; the default settings just work out of the box. However, for extra checking I enabled the --force option to check all configurations and --enable=all to turn on every check, to be totally thorough and pedantic.

The --enable option is especially useful. It allows one to select different types of checking, for example, coding style, execution performance, portability, unused functions and missing include files.

Even though my code has been through smatch and Coverity Scan, cppcheck still managed to find a few issues using --enable=all:

1. unused functions
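2. a potential memory leak with realloc(), for example:

buf = realloc(buf, new_size);
if (!buf)
     return NULL;

If realloc() fails it returns NULL and leaves the original block allocated, so overwriting buf loses the only pointer to that block and it is leaked.  A potential fix is:

tmp = realloc(buf, new_size);
if (!tmp) {
     free(buf);     /* the original block is still valid, so release it */
     return NULL;
}
buf = tmp;
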
3. some potential sscanf buffer overflows
4. some coding style improvements, for example, local auto variables could be moved to a deeper scope

So cppcheck worked well for me.  I recommend referring to the cppcheck project wiki to check out the features and then subjecting your code to it and seeing if it can find any bugs.

Colin Ian King

fwts 13.12.00 released

Version 13.12.00 of the Firmware Test Suite has been released today.  This latest release includes some of the following new features:

  • ACPI method test, add _IPC, _IFT, _SRV tests
  • Update to version 20131115 of ACPICA
  • Check for thermal overrun messages in klog test
  • Test for CPU frequency maximum limits on cpufreq test
  • Add Ivybridge and Haswell CPU specific MSR checks
  • UEFI variable dump:
    • add more subtype support on acpi device path type  
    • handle the End of Hardware Device Path sub-type 0x01
    • add Fibre Channel Ex subtype-21 support on messaging device path type
    • add SATA subtype-18 support on messaging device path type
    • add USB WWID subtype-16 support on messaging device path type
    • add VLAN subtype-20 support on messaging device path type
    • add Device Logical Unit subtype-17 support on messaging device path type
    • add SAS Ex subtype-22 support on messaging device path type
    • add the iSCSI subtype-19 support on messaging device path type
    • add the NVM Express namespace subtype-23 support on messaging device path type
..and also the usual bug fixes. 

For more details, please consult the release notes
 
The source tarball is available at:  http://fwts.ubuntu.com/release/fwts-V13.12.00.tar.gz and can be cloned from the repository: git://kernel.ubuntu.com/hwe/fwts.git

The fwts 13.12.00 .debs are available in the firmware testing PPA and will soon appear in Ubuntu Trusty 14.04

And thanks to Alex Hung, Ivan Hu and Keng-Yu Lin for their contributions and keen eye for reviewing the numerous patches, and also to Robert Moore for the on-going work on ACPICA.

Colin Ian King

Finding spelling mistakes in source code

There are quite a few useful open source utilities around for finding spelling mistakes in source code.  Most recently I've been using codespell which works well for me.

Codespell is regularly being updated and comes with a dictionary originally derived from Wikipedia. I normally pull the latest updates from the repository before running it against my source code.

Fetching it and using it is relatively simple:

 git clone git://git.profusion.mobi/users/lucas/codespell  
cd your-project-dir
/path-to-codespell/codespell.py
..and it will find common spelling mistakes in the entire project directory. Easy!

Colin Ian King

health-check revisited

Earlier this month I wrote about health-check, a tool to sanity check application resource usage.  Since then I've re-worked and simplified the system call tracing and added some new features.

Originally, health-check was designed to attach itself to running processes. To make the tool even easier to use it can now start applications and follow any subsequent new processes spawned off from fork() or clone(). For example:

sudo health-check -u joeuser -f thunderbird
..will start thunderbird (running as user "joeuser") and follow any new processes it creates.

Sometimes applications will force flushing of data or metadata to disk by excessive use of fsync(), fdatasync() and sync().  Health-check will now keep track of these system calls and provide feedback on their use.

Health-check already has some memory checking techniques - however, it now has been extended to check explicitly for heap changes by examining the brk() system call and also keeping track of mmap() and munmap() mappings.  This allows better tracking of potential memory leaks.

Source code can be found at: git://kernel.ubuntu.com/cking/health-check

Packages can be found in my White PPA, ppa:colin-king/white, so to install on Ubuntu systems use:
 sudo add-apt-repository ppa:colin-king/white  
sudo apt-get update
sudo apt-get install health-check

Colin Ian King

health-check: a tool to diagnose resource usage

Recently I have been focused on ways to reduce power consumption on Ubuntu phones. To identify resource hungry issues one can use tools such as top, strace and gdb, or inspect per process properties in /proc/$pid.  However, this can be time consuming and is not practical with many processes in a system. Instead, I decided it would be profitable to write a tool to inspect and monitor a running program and report on areas where it appeared to be sub-optimal.  And so health-check was born.

One provides health-check with a list of one or more processes and it will monitor all the associated threads (and child processes) and then report back on the resources used. Health-check will report on:

  • CPU utilisation
  • Wakeup events
  • Context Switches
  • File I/O operations (Open/Read/Write/Close using fanotify)
  • System calls (using ptrace)
  • Analysis of polling system calls
  • Memory utilisation (including memory growth)
  • Network connections
  • Wakelock activity
CPU utilisation, wakeup events and context switches are useful just to see how much CPU related activity is occurring.  For example, a program with multiple threads that frequently ping-pongs between them will have a high context switch rate, which may indicate that it is busy passing data or messages between threads.  A process that frequently polls with short timer delays may show up as generating a high level of wakeup events.
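
As a rough illustration of measuring a context switch rate (a sketch of the idea, not health-check's own code), the per-process counters voluntary_ctxt_switches and nonvoluntary_ctxt_switches can be read from /proc/$pid/status at two points in time and compared.  The helper names and the sample text below are made up for the example.

```python
def context_switches(status_text):
    """Extract the context switch counters from the text of /proc/<pid>/status."""
    counts = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches"):
            counts[key] = int(value)
    return counts

def switch_rate(before, after, interval_s):
    """Total context switches per second between two samples interval_s apart."""
    return (sum(after.values()) - sum(before.values())) / interval_s

# Two made-up samples of the same process, taken 5 seconds apart.
first = "Name:\tdemo\nvoluntary_ctxt_switches:\t1500\nnonvoluntary_ctxt_switches:\t40\n"
second = "Name:\tdemo\nvoluntary_ctxt_switches:\t6500\nnonvoluntary_ctxt_switches:\t90\n"
rate = switch_rate(context_switches(first), context_switches(second), 5.0)
print(f"{rate:.1f} context switches/s")
```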

Some applications may be sub-optimally writing out data frequently, causing dirty pages and metadata that need to be written back to the file system.  Health-check will capture file I/O activity and report on the names of the files being opened, read, written and closed.

To help identify excessive or heavy system call usage, health-check uses ptrace to trap and monitor all the system calls that the program makes.  For example, it has been observed that some applications excessively call poll() and nanosleep() with poorly chosen timeouts, causing excessive CPU utilisation.  For system calls such as these, which can wait until an event or a timeout occurs, health-check has some deeper monitoring.  It inspects the given timeout delay and checks to see if the call timed out.  For example, health-check can identify CPU sucking repeated polling where zero timeouts are being used, or excessive nanosleeps with zero or negative delays.
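
The polling analysis can be pictured with a small Python sketch.  This is only an illustration under assumptions: the (syscall, timeout_ms, timed_out) record format and the polling_report() helper are hypothetical, standing in for whatever the ptrace layer actually captures.

```python
def polling_report(calls):
    """calls: list of (syscall, timeout_ms, timed_out) records (hypothetical format).

    A zero timeout returns immediately, so repeated zero-timeout calls amount to
    a busy loop; positive timeouts that expire are wakeups that found no event.
    """
    report = {}
    for syscall, timeout_ms, timed_out in calls:
        stats = report.setdefault(
            syscall, {"calls": 0, "zero_timeout": 0, "expired": 0})
        stats["calls"] += 1
        if timeout_ms == 0:
            stats["zero_timeout"] += 1
        elif timeout_ms > 0 and timed_out:
            stats["expired"] += 1
    return report

calls = [
    ("poll", 0, True), ("poll", 0, True), ("poll", 0, False),
    ("poll", 10, True), ("nanosleep", 0, True), ("poll", -1, False),
]
for name, stats in polling_report(calls).items():
    print(name, stats)
```

A negative poll() timeout means "wait forever", so it is counted as a call but never as a zero or expired timeout.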

The ptrace ability of health-check also allows it to monitor per-process wakelock writes. Abuse of wakelocks can keep the kernel from suspending into deep sleep, so it is useful to keep track of wakelock activity on some processes. This is not enabled by default as it is an expensive operation to monitor this via ptrace and also some kernels may not have wakelocks, so one has to use the -W option to enable this.

Health-check also inspects /proc/$pid/smaps and will determine if memory utilisation has grown or shrunk.  Unusually high heap growth over time may indicate that an application has a memory leak.
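
To show what comparing smaps snapshots might look like, here is a minimal Python sketch that sums the Size: fields per mapping and reports the heap growth between two samples.  The helper names and the sample snapshots are invented for illustration and this is not health-check's implementation.

```python
import re

# Mapping header: address range, permissions, offset, device, inode, optional name.
HEADER = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s+\S+\s+\S+\s+\S+\s+\d+\s*(.*)$")

def mapping_sizes(smaps_text):
    """Sum the Size: fields (in kB) per mapping name from /proc/<pid>/smaps text."""
    sizes = {}
    name = None
    for line in smaps_text.splitlines():
        header = HEADER.match(line)
        if header:
            name = header.group(1) or "[anonymous]"
        elif line.startswith("Size:") and name is not None:
            sizes[name] = sizes.get(name, 0) + int(line.split()[1])
    return sizes

def heap_growth_kb(before_text, after_text):
    """Change in total [heap] size between two smaps snapshots."""
    before = mapping_sizes(before_text).get("[heap]", 0)
    after = mapping_sizes(after_text).get("[heap]", 0)
    return after - before

before = """\
01c3d000-01c5e000 rw-p 00000000 00:00 0                          [heap]
Size:                132 kB
Rss:                  64 kB
"""
after = """\
01c3d000-01c9f000 rw-p 00000000 00:00 0                          [heap]
Size:                392 kB
Rss:                 300 kB
"""
print(heap_growth_kb(before, after), "kB of heap growth")
```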

Finally, health-check will inspect /proc/$pid/fd and from this determine any open sockets and then try and resolve the host names of the IP addresses.  For example, it is entirely possible for an application to be making spurious or unwanted connections to various machines, so it is helpful to check up on this kind of activity.
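
The first step of that socket discovery can be sketched in Python as below.  Entries in /proc/$pid/fd are symbolic links, and for sockets the link target has the form socket:[inode].  The helper names are my own and this is a sketch of the idea rather than health-check's code.

```python
import os
import re

SOCKET_LINK = re.compile(r"^socket:\[(\d+)\]$")

def socket_inode(link_target):
    """Return the socket inode for a /proc/<pid>/fd link target, else None."""
    match = SOCKET_LINK.match(link_target)
    return int(match.group(1)) if match else None

def socket_inodes(fd_dir):
    """List the inodes of all sockets open in the given /proc/<pid>/fd directory."""
    inodes = []
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue                      # fd was closed while we were looking
        inode = socket_inode(target)
        if inode is not None:
            inodes.append(inode)
    return sorted(inodes)

if os.path.isdir("/proc/self/fd"):
    print(socket_inodes("/proc/self/fd"))
```

The inodes can then be matched against the inode column of files such as /proc/net/tcp to recover the remote addresses before any host name lookup.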

Health-check is still very early alpha quality, so beware of possible bugs.   However, it has been helpful in identifying some misbehaving applications, so it is already proving to be rather useful.

Source code can be found at: git://kernel.ubuntu.com/cking/health-check

Packages can be found in my White PPA, ppa:colin-king/white, so to install on Ubuntu systems use:

 sudo add-apt-repository ppa:colin-king/white  
sudo apt-get update
sudo apt-get install health-check
..and go and track down some resource sucking apps..

Colin Ian King

The Firmware Test Suite (fwts) portal page is the first place to visit for all fwts related links.   It has links to:

  • Where to get the latest source code (git repository and tarballs)
  • PPAs for the latest and stable packages
  • Release notes (always read these to see what is new!)
  • Reference Guide / Documentation
  • How to report a bug (against firmware or fwts)
  • Release schedule, cadence and versioning
Thanks to Keng-Yu Lin for setting this up.

Colin Ian King

Kernel tracing using lttng

LTTng (Linux Trace Toolkit - next generation) is a highly efficient system tracer that allows tracing of the kernel and userspace. It also provides tools to view and analyse the gathered trace data.  So let's see how to install and use LTTng kernel tracing in Ubuntu. First, one has to install the LTTng userspace tools:

 sudo apt-get update  
sudo apt-get install lttng-tools babeltrace
LTTng was recently added to the Ubuntu 13.10 Saucy kernel.  With earlier releases, however, one needs to install the LTTng kernel driver using lttng-modules-dkms as follows:

 sudo apt-get install lttng-modules-dkms  
It is a good idea to sanity check to see if the tools and driver are installed correctly, so first check to see the available kernel events on your machine:

 sudo lttng list -k  
And you should get a list similar to the following:
 Kernel events:  
-------------
mm_vmscan_kswapd_sleep (loglevel: TRACE_EMERG (0)) (type: tracepoint)
mm_vmscan_kswapd_wake (loglevel: TRACE_EMERG (0)) (type: tracepoint)
mm_vmscan_wakeup_kswapd (loglevel: TRACE_EMERG (0)) (type: tracepoint)
mm_vmscan_direct_reclaim_begin (loglevel: TRACE_EMERG (0)) (type: tracepoint)
mm_vmscan_memcg_reclaim_begin (loglevel: TRACE_EMERG (0)) (type: tracepoint)
..
Next, we need to create a tracing session:
 sudo lttng create examplesession  
..and enable events to be traced using:
 sudo lttng enable-event sched_process_exec -k  
One can also specify multiple events as a comma separated list. Next, start the tracing using:
 sudo lttng start  
and to stop and complete the tracing use:
 sudo lttng stop  
sudo lttng destroy
and the trace data will be saved in the directory ~/lttng-traces/examplesession-[date]-[time]/.  One can examine the trace data using the babeltrace tool, for example:
 sudo babeltrace ~/lttng-traces/examplesession-20130517-125533  
And you should get a list similar to the following:
 [12:56:04.490960303] (+?.?????????) x220i sched_process_exec: { cpu_id = 2 }, { filename = "/usr/bin/firefox", tid = 4892, old_tid = 4892 }  
[12:56:04.493116594] (+0.002156291) x220i sched_process_exec: { cpu_id = 0 }, { filename = "/usr/bin/which", tid = 4895, old_tid = 4895 }
[12:56:04.496291224] (+0.003174630) x220i sched_process_exec: { cpu_id = 2 }, { filename = "/usr/lib/firefox/firefox", tid = 4892, old_tid = 4892 }
[12:56:05.472770438] (+0.976479214) x220i sched_process_exec: { cpu_id = 2 }, { filename = "/usr/lib/libunity-webapps/unity-webapps-service", tid = 4910, old_tid = 4910 }
[12:56:05.478117340] (+0.005346902) x220i sched_process_exec: { cpu_id = 2 }, { filename = "/usr/bin/ubuntu-webapps-update-index", tid = 4912, old_tid = 4912 }
[12:56:10.834043409] (+5.355926069) x220i sched_process_exec: { cpu_id = 3 }, { filename = "/usr/bin/top", tid = 4937, old_tid = 4937 }
[12:56:13.668306764] (+2.834263355) x220i sched_process_exec: { cpu_id = 3 }, { filename = "/bin/ps", tid = 4938, old_tid = 4938 }
[12:56:16.047191671] (+2.378884907) x220i sched_process_exec: { cpu_id = 3 }, { filename = "/usr/bin/sudo", tid = 4939, old_tid = 4939 }
[12:56:16.059363974] (+0.012172303) x220i sched_process_exec: { cpu_id = 3 }, { filename = "/usr/bin/lttng", tid = 4940, old_tid = 4940 }
The LTTng wiki contains many useful worked examples and is well worth exploring.
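
Since the babeltrace output shown above is plain text, it is easy to post-process.  The following Python sketch counts how many times each executable appears in sched_process_exec events; the regular expression is written against the sample lines in this post only, and the exec_counts() helper is hypothetical.

```python
import re
from collections import Counter

LINE = re.compile(
    r'^\[(?P<time>[\d:.]+)\] \(\+(?P<delta>[\d.?]+)\) (?P<host>\S+) '
    r'(?P<event>\w+): .*filename = "(?P<filename>[^"]*)", tid = (?P<tid>\d+)'
)

def exec_counts(trace_text):
    """Count sched_process_exec events per executable in babeltrace text output."""
    counts = Counter()
    for line in trace_text.splitlines():
        match = LINE.match(line.strip())
        if match and match.group("event") == "sched_process_exec":
            counts[match.group("filename")] += 1
    return counts

trace = """\
[12:56:04.490960303] (+?.?????????) x220i sched_process_exec: { cpu_id = 2 }, { filename = "/usr/bin/firefox", tid = 4892, old_tid = 4892 }
[12:56:04.493116594] (+0.002156291) x220i sched_process_exec: { cpu_id = 0 }, { filename = "/usr/bin/which", tid = 4895, old_tid = 4895 }
[12:56:13.668306764] (+2.834263355) x220i sched_process_exec: { cpu_id = 3 }, { filename = "/bin/ps", tid = 4938, old_tid = 4938 }
"""
for filename, count in exec_counts(trace).most_common():
    print(count, filename)
```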

As it stands, LTTng is relatively lightweight.  Research by Romik Guha Anjoy and Soumya Kanti Chakraborty shows that the CPU overhead is ~1.6% on an Intel® Core™ 2 Quad with four 64 bit Q9550 cores.  Measurements I've made with oprofile on a Nexus 4 with a 1.5 GHz quad-core Snapdragon S4 Pro processor show a CPU overhead of < 1% for kernel tracing.  In flight recorder mode, one can generate a lot of trace data. For example, with all tracing enabled and running multiple stress tests I was able to generate ~850K/second of trace data, so this will obviously impact disk I/O.

Colin Ian King

Oprofile is a powerful system wide profiler for Linux.  It can profile all running code on a system with minimal overhead.   Running oprofile requires the uncompressed vmlinux image, so one has to also install the kernel .ddeb images.

To install oprofile:

 sudo apt-get update && sudo apt-get install oprofile
..and then install the kernel .ddebs:
 echo "deb http://ddebs.ubuntu.com $(lsb_release -cs) main restricted universe multiverse" | \
sudo tee -a /etc/apt/sources.list.d/ddebs.list
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 428D7C01
sudo apt-get update
sudo apt-get install linux-image-$(uname -r)-dbgsym
 ..the installed vmlinux image can be found in /usr/lib/debug/boot/vmlinux-$(uname -r)

Oprofile is now ready to be used.  Let's assume we want to profile the following command:
 dd if=/dev/urandom of=/dev/null bs=4K  
First, before running opcontrol, one may have to stop the NMI watchdog to free up counter 0 using the following:
 echo "0" | sudo tee /proc/sys/kernel/watchdog  
Next, we tell opcontrol the location of vmlinux, separate out kernel samples, initialize, reset profiling and start profiling:
 sudo opcontrol --vmlinux=/usr/lib/debug/boot/vmlinux-$(uname -r)  
sudo opcontrol --separate=kernel
sudo opcontrol --init
sudo opcontrol --reset
sudo opcontrol --start
 ..and run the command we want to profile for the desired duration. Next we stop profiling, generate a report for the executable we are interested in and de-initialize oprofile using:
 sudo opcontrol --stop  
sudo opreport image:/bin/dd -gl
sudo opcontrol --deinit
The resulting output from opreport is as follows:
 Using /var/lib/oprofile/samples/ for samples directory.  
warning: /kvm could not be found.
CPU: Intel Ivy Bridge microarchitecture, speed 2.501e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        image name               symbol name
55868    59.8973  vmlinux-3.9.0-0-generic  sha_transform
14942    16.0196  vmlinux-3.9.0-0-generic  random_poll
10971    11.7622  vmlinux-3.9.0-0-generic  ftrace_define_fields_random__mix_pool_bytes
3977     4.2638   vmlinux-3.9.0-0-generic  extract_buf
1905     2.0424   vmlinux-3.9.0-0-generic  __mix_pool_bytes
1596     1.7111   vmlinux-3.9.0-0-generic  _mix_pool_bytes
900      0.9649   vmlinux-3.9.0-0-generic  __ticket_spin_lock
737      0.7902   vmlinux-3.9.0-0-generic  copy_user_enhanced_fast_string
574      0.6154   vmlinux-3.9.0-0-generic  perf_trace_random__extract_entropy
419      0.4492   vmlinux-3.9.0-0-generic  extract_entropy_user
336      0.3602   vmlinux-3.9.0-0-generic  random_fasync
146      0.1565   vmlinux-3.9.0-0-generic  sha_init
133      0.1426   vmlinux-3.9.0-0-generic  wait_for_completion
129      0.1383   vmlinux-3.9.0-0-generic  __ticket_spin_unlock
72       0.0772   vmlinux-3.9.0-0-generic  default_spin_lock_flags
69       0.0740   vmlinux-3.9.0-0-generic  _copy_to_user
35       0.0375   dd                       /bin/dd
23       0.0247   vmlinux-3.9.0-0-generic  __srcu_read_lock
22       0.0236   vmlinux-3.9.0-0-generic  account
15       0.0161   vmlinux-3.9.0-0-generic  fsnotify
...
This example just scratches the surface of the capabilities of oprofile. For further reading I recommend the oprofile manual as it contains some excellent examples.
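
If one wants to post-process a report like the one above, the rows are simple whitespace separated fields.  Here is a small Python sketch that picks out the symbols above a given percentage; it assumes the four-column layout shown in this post (samples, %, image name, symbol name) and the helper names are my own.

```python
def parse_opreport(text):
    """Parse opreport rows of the form: samples, percent, image name, symbol name."""
    rows = []
    for line in text.splitlines():
        fields = line.split(None, 3)
        if len(fields) != 4 or not fields[0].isdigit():
            continue                      # skip headers, warnings and blank lines
        try:
            percent = float(fields[1])
        except ValueError:
            continue
        rows.append((int(fields[0]), percent, fields[2], fields[3]))
    return rows

def hot_symbols(rows, threshold_percent):
    """Symbols whose share of the samples is at least threshold_percent."""
    return [(symbol, percent) for _, percent, _, symbol in rows
            if percent >= threshold_percent]

report = """\
samples % image name symbol name
55868 59.8973 vmlinux-3.9.0-0-generic sha_transform
14942 16.0196 vmlinux-3.9.0-0-generic random_poll
10971 11.7622 vmlinux-3.9.0-0-generic ftrace_define_fields_random__mix_pool_bytes
3977 4.2638 vmlinux-3.9.0-0-generic extract_buf
"""
for symbol, percent in hot_symbols(parse_opreport(report), 10.0):
    print(f"{percent:8.4f}  {symbol}")
```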

Colin Ian King

The Firmware Test Suite (fwts) is a tool containing a large set of tests to exercise and diagnose firmware related bugs in x86 PC firmware.  So what new shiny features have appeared in the new Ubuntu Raring 13.04 release?

UEFI specific tests to exercise and stress test various UEFI run time services:
 
  * Stress test for miscellaneous run time service interfaces.
  * Test get/set time interfaces.
  * Test get/set wakeup time interfaces.
  * Test get variable interface.
  * Test get next variable name interface.
  * Test set variable interface.
  * Test query variable info interface. 
  * Set variable interface stress test.
  * Query variable info interface stress test.
  * Test Miscellaneous runtime service interfaces.

These use a new kernel driver to allow fwts to access the kernel UEFI run time interfaces.  The driver is built and installed using DKMS.

ACPI specific improvements:

  * Improved ACPI 5.0 support
  * Annotated ACPI _CRS (Current Resource Settings) dumping.

Kernel log scanning (finds and diagnoses errors as reported by the kernel):

  * Improved kernel log scanning with an additional 450 tests.

This release also includes many small bug fixes as well as minor improvements to the layout of the output of some of the tests.

Many thanks to Alex Hung, Ivan Hu, Keng-Yu Lin and Matt Fleming for all the improvements to fwts for this release.

Colin Ian King

Valgrind stack traces

Sometimes when debugging an application it is useful to generate a stack dump when a specific code path is being executed.  The valgrind tool provides a very useful and easy to use mechanism to do this:

1. Add in the following to the source file:

 #include <valgrind/valgrind.h>  
2. Generate the stack trace at the point you desire (and print a specific message) using VALGRIND_PRINTF_BACKTRACE(), for example:
 VALGRIND_PRINTF_BACKTRACE("Stack trace @ %s(), %d", __func__, __LINE__);  
3. Run the program with valgrind.  You may wish to use the --tool=none option to make valgrind run a little faster:
  valgrind --tool=none ./generate/unix/bin64/acpiexec *.dat  
4. Observe the stack trace. For example, I added this to the ACPICA acpiexec in AcpiDsInitOneObject() and got stack traces such as:
 ACPI: SSDT 0x563a480 00249 (v01 LENOVO TP-SSDT2 00000200 INTL 20061109)  
**7129** Stack trace @ AcpiDsInitOneObject(), 174 at 0x416041: VALGRIND_PRINTF_BACKTRACE (in /home/king/repos/acpica/generate/unix/bin64/acpiexec)
==7129== by 0x4160A6: AcpiDsInitOneObject (in /home/king/repos/acpica/generate/unix/bin64/acpiexec)
==7129== by 0x441F76: AcpiNsWalkNamespace (in /home/king/repos/acpica/generate/unix/bin64/acpiexec)
==7129== by 0x416312: AcpiDsInitializeObjects (in /home/king/repos/acpica/generate/unix/bin64/acpiexec)
==7129== by 0x43D84D: AcpiNsLoadTable (in /home/king/repos/acpica/generate/unix/bin64/acpiexec)
==7129== by 0x450448: AcpiTbLoadNamespace (in /home/king/repos/acpica/generate/unix/bin64/acpiexec)
==7129== by 0x4502F6: AcpiLoadTables (in /home/king/repos/acpica/generate/unix/bin64/acpiexec)
==7129== by 0x405D1A: AeInstallTables (in /home/king/repos/acpica/generate/unix/bin64/acpiexec)
==7129== by 0x4052E8: main (in /home/king/repos/acpica/generate/unix/bin64/acpiexec)

There is a collection of very useful tricks to be found in the Valgrind online manual, which I recommend perusing at your leisure.

Colin Ian King

Striving for better code quality.

Software is complex and is never bug free, but fortunately there are many different tools and techniques available to help to identify and catch a large class of common and obscure bugs.

Compilers provide build options that can help drive up code quality by being particularly strict to detect questionable code constructions, for example gcc's -Wall and -pedantic flags.  The gcc -Werror flag is useful during code development to ensure compilation halts with an error on warning messages; this ensures the developer will stop and fix code.

Static analysis during compilation is also a very useful technique.  Tools such as smatch and Coccinelle can identify bugs such as dereferencing of NULL pointers, checks for return values and ranges, incorrect use of && and ||, bad use of unsigned or signed values and many more besides.  These tools were aimed for use on the Linux kernel source code, but can be used on C application source too.  Let's take a moment to see how to use smatch when building an application.

Download the dependencies:

 sudo apt-get install libxml2-dev llvm-dev libsqlite3-dev

Download and build smatch:
 mkdir ~/src  
cd ~/src
git clone git://repo.or.cz/smatch
cd smatch
make

Now build your application using smatch:
 cd ~/your_source_code  
make clean
make CHECK="~/src/smatch/smatch --full-path" \
CC=~/src/smatch/cgcc | tee warnings.log

..and inspect the warnings and errors in the file warnings.log.  Smatch will produce false-positives, so not every warning or error is necessarily buggy code.

Of course, run time profiling of programs also can catch errors.  Valgrind is an excellent run time profiler that I regularly use when developing applications to catch bugs such as memory leaks and incorrect memory read/writes. I recommend starting off using the following valgrind options:
 --leak-check=full --show-possibly-lost=yes --show-reachable=yes --malloc-fill=  

For example:
 valgrind --leak-check=full --show-possibly-lost=yes --show-reachable=yes \
--malloc-fill=ff your-program

Since the application is being run on a synthetic software CPU, execution can be slow.  However, it is amazingly thorough and produces detailed output that is extremely helpful in cornering buggy code.

The gcc compiler also provides a mechanism to instrument code for run-time analysis.  The -fmudflap family of options instruments risky pointer and array dereferencing operations, some standard library string and heap functions as well as some other range + validity tests.  For threaded applications use -fmudflapth instead of -fmudflap.  The application also needs to be linked with libmudflap.

Here is a simple example:
 int main(int argc, char **argv)
{
    static int x[100];
    return x[100];    /* out of bounds: valid indices are 0..99 */
}

Compile with:
 gcc example.c -o example -fmudflap -lmudflap  

..and mudflap detects the error:
 ./example   
*******
mudflap violation 1 (check/read): time=1347817180.586313 ptr=0x701080 size=404
pc=0x7f98d3d17f01 location=`example.c:5:2 (main)'
/usr/lib/x86_64-linux-gnu/libmudflap.so.0(__mf_check+0x41) [0x7f98d3d17f01]
./example(main+0x7a) [0x4009c6]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f98d397276d]
Nearby object 1: checked region begins 0B into and ends 4B after
mudflap object 0x190a370: name=`example.c:3:13 x'
bounds=[0x701080,0x70120f] size=400 area=static check=3r/0w liveness=3
alloc time=1347817180.586261 pc=0x7f98d3d175f1
number of nearby objects: 1
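
The numbers in the mudflap report can be checked by hand.  The short Python sketch below redoes the arithmetic using only values from the output above: the object bounds are inclusive, so x occupies 400 bytes (100 elements, which implies 4 byte ints on this machine), and the checked region of 404 bytes starting at ptr runs 4 bytes past the end, matching "ends 4B after".  The helper name overrun_bytes() is just for illustration.

```python
def overrun_bytes(ptr, size, upper):
    """Bytes by which the checked region [ptr, ptr + size) runs past an object
    whose last valid byte is at address upper (inclusive bound)."""
    checked_end = ptr + size - 1
    return max(0, checked_end - upper)

ptr, size = 0x701080, 404                 # ptr= and size= from the violation line
lower, upper = 0x701080, 0x70120f         # bounds=[...] of object x

print(upper - lower + 1)                  # → 400
print(overrun_bytes(ptr, size, upper))    # → 4
```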

These are just a few examples, however there are many other options too. Electric Fence is a useful malloc debugger, and gcc's -fstack-protector produces extra code to check for buffer overflows, for example in stack smashing. Tools like bfbtester allow us to brute force check command line overflows - this is useful as I don't know many developers who try to thoroughly validate all the options in their command line utilities.

No doubt there are many more tools and techniques available.  If we use these wisely and regularly we can reduce bugs and drive up code quality.

Colin Ian King

Counting code size with SLOCCount

David A. Wheeler's SLOCCount is a useful tool for counting lines of code in a software project.  It is simple to use, just provide it with the path to the source code and let it grind through all the source files.  The resulting output is a break down of code line count for each type of source based on the programming language.

SLOCCount also estimates development time in person-years as well as the number of developers and the cost to develop.  One can override the defaults and specify parameters such as costs per person, overhead and effort to make it match to your development model.
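
To give a feel for how such an estimate is produced, here is a Python sketch of the Basic COCOMO calculation as I understand SLOCCount's defaults to be: effort in person-months = 2.4 * KSLOC^1.05, schedule in months = 2.5 * effort^0.38, and cost = person-years * salary * overhead, with a salary of $56,286 per year and an overhead factor of 2.4.  Treat these constants as assumptions to be checked against the SLOCCount documentation; the function name is my own.

```python
def sloccount_estimate(sloc, salary=56286.0, overhead=2.4,
                       effort_factor=2.4, effort_exponent=1.05,
                       schedule_factor=2.5, schedule_exponent=0.38):
    """Basic COCOMO style estimate; all default constants are assumptions."""
    ksloc = sloc / 1000.0
    person_months = effort_factor * ksloc ** effort_exponent
    schedule_months = schedule_factor * person_months ** schedule_exponent
    person_years = person_months / 12.0
    return {
        "person_years": person_years,
        "schedule_years": schedule_months / 12.0,
        "developers": person_months / schedule_months,
        "cost": person_years * salary * overhead,
    }

estimate = sloccount_estimate(10_000)
print(f"effort:     {estimate['person_years']:.2f} person-years")
print(f"schedule:   {estimate['schedule_years']:.2f} years")
print(f"developers: {estimate['developers']:.2f}")
print(f"cost:       ${estimate['cost']:,.0f}")
```

Because the effort exponent is greater than 1, the estimated cost per line rises as a project grows, so the figure for a whole code base cannot be split evenly across its lines.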

Of course, like all tools that produce metrics it can be abused, for example by using it as a meaningless metric of programmer productivity.  Counting lines of code does not really measure project complexity: a vexing bug that took 2 days to figure out and resulted in a 1 line fix is obviously more expensive than a poorly written 500 line function that introduces no new noticeable functionality.  As a rule of thumb, SLOCCount is a useful tool to get an idea of the size of a project and some idea of the cost to develop it.  There are of course more complex ways to examine project source code, such as cyclomatic complexity metrics, and there are specific tools such as Panopticode that do this.

As a small exercise, I gave SLOCCount the task of counting the lines of code in the Linux kernel from version 2.6.12 to 3.6 and used the default settings to produce an estimated cost to develop each version.


It is interesting to see that the rate of code being added seemed to increase around the 2.6.28 release.   So what about the estimated cost to develop?..


This is of course pure conjecture.  The total lines of code does not account for patches that remove code, and the estimate assumes that the cost is directly related to lines of code.  Also, code complexity makes some lines of code far more expensive to develop than others.  It is interesting to see that each release adds an average of 184,000 lines of code, which SLOCCount estimates to cost about $8.14 million, or ~$44.24 per line of code; not sure how realistic that really is.
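
The per-line figure is simply the estimated cost divided by the lines added, which a couple of lines of Python can confirm using the two numbers quoted above (the helper name is just for illustration):

```python
def cost_per_line(total_cost, lines):
    """Average estimated cost of one line of code."""
    return total_cost / lines

print(f"${cost_per_line(8.14e6, 184_000):.2f} per line")   # → $44.24 per line
```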

Anyhow, SLOCCount is easy to use and provides some very useful rule-of-thumb analysis on project size and costs.

Colin Ian King

Firmware Test Suite Live (fwts-live) is a USB live image that will automatically boot and run the Firmware Test Suite (fwts) - it will run on legacy BIOS and also on UEFI firmware (x86_64) systems.

fwts-live will run a range of fwts tests and store the results on the USB stick - these can be reviewed while running fwts-live or at a later time on another computer if required.

To install fwts-live on to a USB first download either a 32 or 64 bit image from http://odm.ubuntu.com/fwts-live/ and then uncompress the image using:

 bunzip2 fwts-live-*.img.bz2  

Next insert a USB stick into your machine and unmount it. Now one has to copy the fwts-live image to the USB stick - one can find the USB device using:

 dmesg | tail -10 | grep Attached  
 [ 2525.654620] sd 6:0:0:0: [sdb] Attached SCSI removable disk  

..so in the above example it is /dev/sdb, and copy using:

 sudo dd if=fwts-live-oneiric-*.img of=/dev/sdb  
 sync  

..and then remove the USB stick.
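
Picking the wrong device for dd can destroy data, so it can be worth scripting the device lookup shown above.  The following Python sketch extracts the device name from kernel log text; it matches only the "Attached SCSI removable disk" message format shown in this post and the helper name is hypothetical.

```python
import re

ATTACHED = re.compile(r"\[(\w+)\] Attached SCSI removable disk")

def removable_devices(dmesg_text):
    """Return /dev paths for removable disks reported in kernel log text."""
    return [f"/dev/{name}" for name in ATTACHED.findall(dmesg_text)]

log = "[ 2525.654620] sd 6:0:0:0: [sdb] Attached SCSI removable disk\n"
print(removable_devices(log))   # → ['/dev/sdb']
```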

To run, insert the USB stick into the machine you want to test and then boot the machine.  This will start up fwts-live and then you will be shown a set of options - to either run all the fwts batch tests, to select individual tests to run, or abort testing and shutdown.


If you choose to run all the fwts batch tests then fwts will automatically run through a series of tests which will take a few minutes to complete:


and when complete one can choose to view the results log:


if "Yes" is selected then one can view the results. The cursor up/down and page up/down keys can be used to navigate the results log file.  When you have completed viewing the results log, fwts-live will inform you where the results have been saved on the USB stick (so that one can review them later by plugging the USB stick into a different machine).


A full user guide to fwts-live is available at: https://wiki.ubuntu.com/HardwareEnablementTeam/Documentation/FirmwareTestSuiteLive

To help interpret any errors or warnings found by fwts we recommend visiting the fwts reference guide - this has a comprehensive description of each test and detailed explanations of warnings and error messages.

Below is a demo of fwts-live running inside QEMU:

 
 
Kudos to Chris Van Hoof for producing fwts-live

Read more
Colin Ian King

Monitoring /proc/timer_stats

The /proc/timer_stats interface allows one to check on timer usage in a Linux system and hence detect any misuse of timers that can cause excessive wake up events (and also waste power).  /proc/timer_stats reports the process id (pid) of a task that initialised the timer, the name of the task, the name of the function that initialised the timer and the name of the timer callback function.  To enable timer sampling, write "1\n" to /proc/timer_stats and to disable write "0\n".
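
As a minimal sketch of the enable/disable step, assuming the running kernel still exposes /proc/timer_stats and that the script is run as root (the function names here are just for illustration):

```python
def set_timer_stats(enabled, path="/proc/timer_stats"):
    # Writing "1\n" starts timer sampling, writing "0\n" stops it.
    with open(path, "w") as f:
        f.write("1\n" if enabled else "0\n")

def read_timer_stats(path="/proc/timer_stats"):
    # Return the raw text currently reported by the interface.
    with open(path) as f:
        return f.read()
```

One sample can then be gathered by calling set_timer_stats(True), sleeping for the sample period, calling read_timer_stats() and finally set_timer_stats(False).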

While this interface is simple to use, collecting multiple samples over a long period of time to monitor overall system behaviour takes a little more effort.   To help with this, I've written a very simple tool called eventstat that calculates the rate of events per second and can dump the data in a .csv (comma separated values) format for importing into a spreadsheet such as LibreOffice for further analysis (such as graphing).

In its basic form, eventstat will run ad infinitum and can be halted by control-C. One can also specify the sample period and number of samples to gather, for example:

 sudo eventstat 10 60  

.. this gathers samples every 10 seconds for 60 samples (which equates to 10 minutes).

The -t option specifies an events/second threshold to discard events less than this threshold, for example:  sudo eventstat -t 10 will show events running at 10Hz or higher.
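
The rate and threshold calculation can be sketched as below.  This is not the eventstat source, just an illustration of the arithmetic; the dictionary layout, the key fields and the sample counts are all hypothetical:

```python
def event_rates(before, after, interval, threshold=0.0):
    # before/after map a timer key to its cumulative event count at the
    # start and end of a sample period lasting `interval` seconds.
    # Returns {key: events per second}, dropping rates below the threshold.
    rates = {}
    for key, count in after.items():
        rate = (count - before.get(key, 0)) / interval
        if rate >= threshold:
            rates[key] = rate
    return rates

# Hypothetical counts keyed by (pid, task name, timer callback)
before = {(1234, "firefox", "hrtimer_wakeup"): 100,
          (1, "init", "process_timeout"): 5}
after = {(1234, "firefox", "hrtimer_wakeup"): 350,
         (1, "init", "process_timeout"): 15}

print(event_rates(before, after, interval=10, threshold=10))
```

Here the first timer fired 250 times in 10 seconds (25 events/second) and is kept, while the second fired 10 times (1 event/second) and is discarded by the threshold of 10.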

To dump the samples into a .csv file, use the -r option followed by the name of the .csv file.  If you just want to collect the samples into a .csv file and not see the statistics during the run, also use the -q option, e.g.

 sudo eventstat -q -r event-report.csv


With eventstat you can quickly identify rogue processes that cause a high frequency of wake ups.   Arguably one can do this with tools such as PowerTop, but eventstat was written to allow one to collect the event statistics over a very long period of time and then help to analyse or graph the data in tools such as a LibreOffice spreadsheet.


The source is available in the following git repository:  git://kernel.ubuntu.com/cking/eventstat.git and in my power management tools PPA: https://launchpad.net/~colin-king/+archive/powermanagement

In an ideal world, application developers should check their code with tools like eventstat or PowerTop to ensure that the application is not misbehaving and causing excessive wake ups, especially because abuse of timers could be happening in the supporting libraries that applications may be using.

Read more
Colin Ian King

The UEFI Compatibility Support Module (CSM) provides compatibility support for traditional legacy BIOS.  This allows the booting of an operating system that requires traditional option ROM support, such as BIOS Int 10h video calls.

While looking at boot and runtime misbehaviour on UEFI systems I would like to know if CSM is enabled or not, but the question is how does one detect CSM support?   Well, making the assumption that CSM is generally enabled to support Int 10h video calls, we look for any video option ROMs and see if the real mode Int 10h vector is set to jump to a handler in one of the ROMs.  

Option ROMs are found in the region 0xc0000 to 0xe0000 and normally the video option ROM is found at 0xc0000.  Option ROMs are found on 512 byte boundaries with header bytes containing 0x55, 0xaa and the ROM length (divided by 512), so we just mmap in 0xc0000..0xe0000 and then scan the memory for headers to locate option ROM images.  

My assumption for CSM being enabled is that Int 10h vectors into one of these option ROMs, and we can assume it is a video option ROM if it contains the string "VGA" somewhere in the ROM image.  Yes, it is a hack, but it seems to work on the range of UEFI enabled systems I've so far used.
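
The scan and the Int 10h check can be sketched as follows.  This is a Python illustration of the approach described above rather than the code in my repository; it works on two byte buffers (the first 1K of physical memory holding the real mode vector table, and the 0xc0000..0xe0000 region) and how those are obtained, for example from /dev/mem as root where the kernel allows it, is left as an assumption.  All function names are hypothetical:

```python
import struct

ROM_BASE = 0xC0000

def find_option_roms(region, base=ROM_BASE):
    # Scan the region on 512 byte boundaries for 0x55 0xaa headers.
    # Returns a list of (start_address, length_in_bytes).
    roms = []
    offset = 0
    while offset + 3 <= len(region):
        if region[offset] == 0x55 and region[offset + 1] == 0xAA:
            length = region[offset + 2] * 512
            if length:
                roms.append((base + offset, length))
                offset += length      # length is a multiple of 512
                continue
        offset += 512
    return roms

def int10_target(ivt):
    # Each real mode vector is a 16 bit offset followed by a 16 bit segment.
    off, seg = struct.unpack_from("<HH", ivt, 0x10 * 4)
    return (seg << 4) + off

def csm_looks_enabled(ivt, region, base=ROM_BASE):
    # True if Int 10h vectors into an option ROM containing the string "VGA".
    target = int10_target(ivt)
    for start, length in find_option_roms(region, base):
        image = region[start - base:start - base + length]
        if start <= target < start + length and b"VGA" in image:
            return True
    return False
```

For example, a vector of C000:0014 gives a target of 0xc0014, which lands inside a ROM image that starts at 0xc0000.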

For reference, I've put the code in my debug-code git repository where it is available for anyone to use.


Read more
Colin Ian King

Some problems are a little challenging to debug and sometimes require a bit of lateral thinking to solve.   One particular issue is when suspend/resume locks up and one has no idea where or why because the console is suspended and any debug messages just don't appear.

In the past I've had to use techniques like flashing keyboard LEDs, making the PC speaker beep or even forcibly rebooting the machine at known points to be able to get some idea of roughly where a hang has occurred.   This is fine, but it is tedious since we can only emit a few bits of state per iteration.   Saving state is difficult since when a machine locks up one has to reboot it and one loses debug state.   One technique is to squirrel away debug state in the real time clock (RTC) which allows one to store twenty or so bits of state, which is still quite tough going.

One project I've been working on is to use the power of SystemTap to instrument the entire suspend/resume code paths - every time a function is entered a hash of the name is generated and stored in the RTC.  If the machine hangs, one can then grab this hash out of the RTC and compare this to the known function names in /proc/kallsyms, and hopefully this will give some idea of where we got to before the machine hung.
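
The lookup side of this can be sketched in Python as below.  The hash function here is a stand-in I have picked purely for illustration (it is not the one used in the SystemTap script), as is the choice of 20 bits; the only thing taken from the kernel is the /proc/kallsyms line layout of address, type and symbol name:

```python
def name_hash(name, bits=20):
    # Hypothetical hash: fold the bytes of a function name down to `bits` bits.
    h = 5381
    for byte in name.encode():
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h & ((1 << bits) - 1)

def candidates(rtc_hash, kallsyms_lines, bits=20):
    # Return every symbol name whose hash matches the value found in the RTC.
    matches = []
    for line in kallsyms_lines:
        fields = line.split()
        if len(fields) >= 3 and name_hash(fields[2], bits) == rtc_hash:
            matches.append(fields[2])
    return matches
```

With only twenty or so bits to play with, several function names can hash to the same value, so the result is a short list of suspects rather than a single answer.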

However, what would be really useful is the ability to print out more debug state during suspend/resume in real time.   Normally I approach this by using a USB/serial cable and capturing console messages via this mechanism.  However, once USB is suspended, this provides no more information.

One solution I'm now using is with Kamal Mostafa's minimodem.  This wonderful tool is an implementation of a software modem and can send and receive data by emulating a Bell-type or RTTY FSK modem.  It allows me to transmit characters at 110 to 300 baud over a standard PC speaker and reliably receive them on a host machine.  If the wind is in the right direction, one can transmit at higher speeds with an audio cable plugged in the headphone jack of the transmitter and into the microphone socket on the receiver if hardware allows.

The 8254 Programmable Interval-timer on a PC can be used to generate a square wave at a predefined frequency and can be connected to the PC speaker to emit a beep.  Sending data using the speaker to minimodem is a case of sending a 500ms leader tone, then emitting characters.  Each character starts with a space tone lasting one bit period, followed by 8 bits (least significant bit first) with a zero being a space tone of one bit period and a 1 being a mark tone of one bit period, and then a trailing bunch of stop bits.
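
The framing of a single character can be sketched as below.  This is a Python illustration and not the kernel driver; the use of two stop bits and of mark tones for the stop bits are assumptions that follow the usual asynchronous serial convention:

```python
def frame_char(ch, stop_bits=2):
    # Return the sequence of tones for one character, one entry per bit period.
    value = ord(ch) & 0xFF
    tones = ["space"]                    # start bit
    for bit in range(8):                 # 8 data bits, least significant first
        tones.append("mark" if (value >> bit) & 1 else "space")
    tones += ["mark"] * stop_bits        # stop bits (count is an assumption)
    return tones

def bit_time_ms(baud):
    # Duration of one bit period in milliseconds.
    return 1000.0 / baud

print(frame_char("A"))
```

At 300 baud each tone lasts about 3.3ms, so with 11 bit periods per character this works out at roughly 27 characters per second.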

So using a prototype driver written by Kamal, I tweaked the code and put it into my suspend/resume SystemTap script and now I can dump out messages over the PC speaker and decode them using minimodem.  300 baud may not be speedy, but I am able to now instrument and trace through the entire suspend/resume path.

The SystemTap scripts are "work-in-progress" (i.e. if it breaks you keep the pieces), but can be found in my pmdebug git repo git://kernel.ubuntu.com/cking/pmdebug.git.  The README file gives a quick run down of how to use this script and I have written up a full set of instructions.

The caveat to this is that one requires a PC where one can beep the PC speaker using the PIT.  Lots of modern machines seem to either have this disabled, or the volume somehow under the control of the Intel HDA audio driver.  Anyhow, kudos to Kamal for providing minimodem and giving me the prototype kernel driver to allow me to plug this into a SystemTap script.


Read more
Colin Ian King

Today my colleague Chris Van Hoof pointed me to a Gource visualization of the work I've been doing on the Firmware Test Suite.  Gource animates the software development sources as a tree with the root in the centre of the display and directories as branches and source files as leaves.


Static pictures do this no justice. I've uploaded an mp4 video of the entire software development history of fwts so you can see Gource in action.

To generate the video, the following incantation was used:

 gource -s 0.03 --auto-skip-seconds 0.1 --file-idle-time 500 \
 --multi-sampling -1280x720 --stop-at-end \
 --output-ppm-stream - | ffmpeg -y -r 24 \
 -f image2pipe -vcodec ppm -i - -b 2048K fwts.mp4

..kudos to Chris for this rune.


Read more
Colin Ian King

Dumping UEFI variables

UEFI variables in Linux can be found in /sys/firmware/efi/vars on UEFI firmware based machines.  However, the raw variable data is in a binary format and hence not in a human readable form.   The Ubuntu Natty firmware test suite contains the uefidump tool to extract and decode the binary data into a more human readable form.

To run, use:

sudo fwts uefidump -


and you will see something similar to the following:

Name: AuthVarKeyDatabase.
  GUID: aaf32c78-947b-439a-a180-2e144ec37792
  Attr: 0x17 (NonVolatile,BootServ,RunTime).
  Size: 1 bytes of data.
  Data: 0000: 00                                               .

Name: Boot0000.
  GUID: 8be4df61-93ca-11d2-aa0d-00e098032b8c
  Attr: 0x7 (NonVolatile,BootServ,RunTime).
  Active: Yes
  Info: Primary Master Harddisk
  Path: \BIOS(2,0,Primary Master Harddisk).

Name: Boot0001.
  GUID: 8be4df61-93ca-11d2-aa0d-00e098032b8c
  Attr: 0x7 (NonVolatile,BootServ,RunTime).
  Active: Yes
  Info: EFI Internal Shell
  Path: \Unknown-MEDIA-DEV-PATH(0x7)\Unknown-MEDIA-DEV-PATH(0x6).

Name: Boot0003.
  GUID: 8be4df61-93ca-11d2-aa0d-00e098032b8c
  Attr: 0x7 (NonVolatile,BootServ,RunTime).
  Active: Yes
  Info: ubuntu
  Path: \HARDDRIVE(1,22,9897,0f52a6e132775546,ab,f6)\FILE('\EFI\ubuntu\grubx64.efi').

Name: Boot0004.
  GUID: 8be4df61-93ca-11d2-aa0d-00e098032b8c
  Attr: 0x7 (NonVolatile,BootServ,RunTime).
  Active: Yes
  Info: EFI DVD/CDROM
  Path: \ACPI(0xa0341d0,0x0)\PCI(0x2,0x1f)\ATAPI(0x0,0x1,0x0).

Name: BootOptionSupport.
  GUID: 8be4df61-93ca-11d2-aa0d-00e098032b8c
  Attr: 0x6 (BootServ,RunTime).
  BootOptionSupport: 0x0303.

Name: BootOrder.
  GUID: 8be4df61-93ca-11d2-aa0d-00e098032b8c
  Attr: 0x7 (NonVolatile,BootServ,RunTime).
  Boot Order: 0x0003,0x0000,0x0001,0x0004,0x0005,0x0006.

Name: ConIn.
  GUID: 8be4df61-93ca-11d2-aa0d-00e098032b8c
  Attr: 0x7 (NonVolatile,BootServ,RunTime).
  Device Path: \ACPI(0xa0341d0,0x0)\PCI(0x0,0x1f)\ACPI(0x50141d0,0x0)\UART(115200 baud,8,1,1)\VENDOR(11d2f9be-0c9a-9000-273f-c14d7f010400)\USBCLASS(0xffff,0xffff,0x3,0x1,0x1).

Name: ConInDev.
  GUID: 8be4df61-93ca-11d2-aa0d-00e098032b8c
  Attr: 0x6 (BootServ,RunTime).
  Device Path: \ACPI(0xa0341d0,0x0)\PCI(0x0,0x1f)\ACPI(0x50141d0,0x0)\UART(115200 baud,8,1,1)\VENDOR(11d2f9be-0c9a-9000-273f-c14d7fff0400).

Name: Setup.
  GUID: 038bcef0-21e2-49d1-a47c-b7257296b980
  Attr: 0x7 (NonVolatile,BootServ,RunTime).
  Size: 114 bytes of data.
  Data: 0000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Data: 0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Data: 0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Data: 0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Data: 0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Data: 0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Data: 0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Data: 0070: 01 00                                            ..

The tool will try to decode the binary data.  However, if it cannot identify the variable type it will resort to doing a hex dump of the data instead.
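
To give a flavour of the decoding involved, here is a minimal Python sketch (not the uefidump source) that decodes the attribute bits and the BootOrder variable, and falls back to a hex dump laid out like the output above.  The function names are hypothetical and only the three attribute bits that appear in the example output are handled:

```python
import struct

# Attribute bit names as they appear in the uefidump output
ATTR_NAMES = [(0x1, "NonVolatile"), (0x2, "BootServ"), (0x4, "RunTime")]

def decode_attr(attr):
    return ",".join(name for bit, name in ATTR_NAMES if attr & bit)

def decode_boot_order(data):
    # BootOrder is a packed array of little endian 16 bit boot option numbers.
    count = len(data) // 2
    values = struct.unpack("<%dH" % count, data[:count * 2])
    return ["0x%04x" % v for v in values]

def hex_dump(data):
    # Fallback for unknown variables: 16 bytes per line, hex then ASCII.
    lines = []
    for off in range(0, len(data), 16):
        chunk = data[off:off + 16]
        hexpart = " ".join("%02x" % b for b in chunk)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append("Data: %04x: %-47s  %s" % (off, hexpart, text))
    return lines

print(decode_attr(0x7))
print(decode_boot_order(bytes.fromhex("0300000001000400")))
```

So an attribute value of 0x6 decodes to BootServ,RunTime, matching the BootOptionSupport and ConInDev entries in the output above.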


Read more