Canonical Voices

Colin Ian King

UEFI EDK II Revisited

My colleague Manoj Iyer has written up a guide on how to download EDK II and build the UEFI firmware for QEMU.   This requires older versions of gcc found in Natty (since the newer Oneiric version is more pedantic and uses -Werror=unused-but-set-variable by default).

With a chroot I was able get it downloaded, built and tested in less than 40 minutes.   Here is a sample UEFI helloworld application running in QEMU using the firmware using Manoj's instructions.


Now I can rig up some tests to exercise Ubuntu and the Firmware Test Suite without the need for any real UEFI hardware..


Read more
Colin Ian King

PCI Interrupt Routing

Understanding PCI Interrupt Routing on the x86 platform is a not entirely straight forward.  The underlining principle is determining which interrupt is being asserted when a PCI interrupt signal occurs. Unfortunately this is generally platform specific and so firmware tables of various types have been used over the many years to describe the routing configuration.

While looking into the legacy PCI interrupt routing tables I found an excellent article by FreeBSD kernel hacker John Baldwin that explains PCI interrupt routing in a clear an succinct manner.   Although written for FreeBSD, this article is contains a lot of Linux relevant information.


Read more
Colin Ian King

Dennis Ritchie, R.I.P.

Dennis Ritchie has passed away. He gave us C and UNIX and much more beside. My tribute to Dennis Ritchie is as follows:

 #include <stdio.h>  
 #include <stdlib.h>  
 #define K continue  
 #define t /*|+$-*/9  
 #define _l /*+$*/25  
 #define s/*&|+*/0xD  
 #define _/*&|+*/0xC  
 #define _o/*|+$-*/2  
 #define _1/*|+$-*/3  
 #define _0/*|+$*/16  
 #define J/*&|*/case  
 char typedef signed   
  B;typedef H;H main(  
  ){B I['F'],V=0,E[]=  
   {s,0,s,31,t,1,s,111  
   ,_,t,-3,_,s,50,_l,-  
    1,t,1,s,0x48>>2,_l,  
    -2,_,_1,5,_o,s,0,_1  
     ,-8,s,0,s,-65,t,75,  
     s,100,_,t,8,_,_1,-5  
      ,s,82,t,32,s,111,s,  
      20,_l,-2,t,7,_,_1,5  
       ,_o,s,0,_1,-8,s,0,\  
      _0,};B*P=E;while(P)  
      {B L=*P,l=*(P+1),U=  
     I[V-1],A=(L>>2)&1,C  
     =(V-(1-A)),i;switch  
    (L)while(0){J _l:i=  
    l>0?U>>l:U<<-l;K;J\  
   t:i=U+l;K;J _:i=U;  
   K;J s:i=l;K;J _o:  
  putchar(U);K;J _1:  
  P+=U?0:l;K;J _0:e\  
 xit(0);}C[I]=(L&8)?  
 i:I[C];P+=(L&1)+1;V  
 +=A-((L&2)>>1);}re\  
 turn/*c.i.king*/0;}  

You can download the source here.


Read more
Colin Ian King

Today my colleague Chris Van Hoof pointed me to a Gource visualization of the work I've been doing on the Firmware Test Suite.  Gource animates the software development sources as a tree with the root in the centre of the display and directories as branches and source files as leaves.


Static pictures do this no justice. I've uploaded an mp4 video of the entire software development history of fwts so you can see Gource in action.

To generate the video, the following incantation was used:

 gource -s 0.03 --auto-skip-seconds 0.1 --file-idle-time 500 \  
 --multi-sampling -1280x720 --stop-at-end \  
 --output-ppm-stream - | ffmpeg -y -r 24 \  
 -f image2pipe -vcodec ppm -i - -b 2048K fwts.mp4  

..kudos to Chris for this rune.


Read more
Colin Ian King

Exponential Growth of Patents

The United States Patent and Trademark Office (USPTO) recently publish an interesting article about the millions of patents issued by the United States of America.  Using the current numbering system, patent #1 was issued in 1836 and patent #8,000,000 was recently issued in August this year.   I plotted issue date against patent number and lo and behold we get exponential growth of patents since the turn of the 20th century:

At this rate, we will see 8 million more patents issued by the end of 2016.  Intellectual property is abundant, perhaps too much so.  Are all these patents totally valid?  Is there any kind of quality control being applied?  Personally, I doubt it.  I don't want to be alarmist, but I really think this is getting totally out of control.

Patents are used as trading tokens as big businesses wage war against each other.  Companies are loading their war-chests with patent portfolios to block rivals from bringing to market innovative new products which leads to a product monoculture.   More perversely, patents are being used so sue users of technology rather than manufacturers.  Ultimately the consumer is the loser and patent lawyers and big business are the winners -  that's the price for patents protecting innovation.


Read more
Colin Ian King

More interesting uses for SystemTap

Now and again while debugging systems I would like to be able to evaluate an ACPI method or object and see what it returns.    To solve this problem in a generic way I put a little bit of effort today in developing a short SystemTap script that allows me to run acpi_evaluate_object() on a given named object and to dump out any returned data.

I've wrapped the gory details up into a small wrapper function called which is passed just the name of the object to evaluate and if necessary an acpi_object_list containing the arguments to be passed into a method call.   For methods that don't require arguments, we end up with a simple call like the  following:

                acpi_eval("_BIF", NULL);

..and for more complex examples, with arguments, we have:

        struct acpi_object_list arg_list;
        union acpi_object args[1];

        args[0].type = ACPI_TYPE_INTEGER;
        args[0].integer.value = 1; 
        arg_list.count = 1;
        arg_list.pointer = args; 
        acpi_eval("_WAK", &arg_list);

The script is designed to allow one to hack away and add in the calls to the objects that one requires to be evaluated, quick-n-dirty, but it does the job.

It was surprisingly easy to get this up and running with SystemTap once I had figured how to dump out the evalated objects to the tty - the script is fairly compact and the main bulk of the code is a terse error message table.

The script can be found it my SystemTap git repo: git://kernel.ubuntu.com/cking/systemtap-scripts.git


Read more
Colin Ian King

Proprietary Code - Where do we draw the line?

I can't help being amused when users say they chose not to use a specific Open Source distribution because it contains binary drivers and hence is not totally free.  I too really care that we have software freedom and try to work towards a totally free Operating System but where do we draw the line?

Some users state that they won't touch a specific brand of hardware such as Wireless or Video because one has to use a binary driver or that it contains firmware that is not Open Source.  While this is an admirable philosophical stance it has its blind-spots. For example,  laptops contain Embedded Controllers to do a variety of hardware interfacing tasks - do we refuse to use these laptops because the firmware in the Embedded Controllers are not Open Source?  Or how about the ACPI AML code that appears in the DSDT and SSDTs - so should we boot the machine with ACPI disabled because this code is not Open Source?

Taking it further, what about the Microcode inside the processor?  This binary blob is loaded by BIOS updated by the Operating System to fix subtle features in Microprocessors post-release.  So, should we stop using this because Intel won't supply us the source?

So at what point do we stop using a system because it is not fully Open Source?  OK, so I've taking the argument to its logical conclusion to stretch the point.   I fully understand that it is totally desirable to avoid using Closed Source binary blobs where possible and trying to keep a system totally Open Source keeps us honest.  However, sometimes I find the purest viewpoint rather blinkered if it refuses to use a specific distribution when their machine is riddled with Closed Source binary firmware blobs.  Perhaps they should start working on the BIOS vendors and Intel to release their code..


Read more
Colin Ian King

Why is my CPU Frequency Limited?

Sometimes the Scaling Maximum Frequency of a CPU is reduced below that of the possible top frequency and finding out why this is can be problematic.  The limitation could have been imposed by:

* Thermal limits
* Hardware limitations (e.g. ACPI _PPC object).
* Program that wrote to a /sys/devices/cpu/cpu*/cpufreq/scaling_max_freq

Fortunately Thomas Renninger introduced /sys/devices/cpu*/cpufreq/bios_limit that exports to user space the BIOS limited maximum frequency for each CPU.  This feature is available in Ubuntu Maverick 10.10 upwards.

So, if you have a machine that you believe should have CPUs running at a higher frequency, inspect the bios_limit files to see if the BIOS is mis-configured.


Read more
Colin Ian King

SystemTap provides a flexible programming language to prototype debugging scripts very quickly.  Sometimes however, one has to use "embedded C" functions in a SystemTap script to interface more deeply with the kernel. 

Today I was writing a script to dump out ACPI object names and required some embedded C in my SystemTap script to walk the ACPI namespace and this required a C callback function.   However, inside the C callback I wanted to print the handle and name of the ACPI object but couldn't figure out how to use the native SystemTap print() functions from within embedded C code.    So I crufted up a simple "HelloWorld" SystemTap script and ran it with -k to keep the temporary sources and then had a look at the automagically generated code.

It appears that SystemTap converts the script print statements into _stp_printf()  C calls, so I just plugged these into my C callback instead of using printk().  Now my output goes via the underlying SystemTap print mechanism and appears on the tty rather than going to the kernel log.  Bit of a hack, but the result is easy to use.  I wish it was documented though.

Here is a sample of the original script to illustrate the point:

 %{  
 #include <acpi/acpi.h>  
   
 static acpi_status dump_name(acpi_handle handle, u32 lvl, void *context, void **rv)  
 {  
     struct acpi_buffer buffer = {ACPI_ALLOCATE_BUFFER};  
     int *count = (int*)context;  
   
     if (!ACPI_FAILURE(acpi_get_name(handle, ACPI_FULL_PATHNAME, &buffer))) {  
         _stp_printf(" %lx %s\n", handle, (char*)buffer.pointer);  
         kfree(buffer.pointer);  
         (*count)++;  
     }  
     return AE_OK;  
 }  
   
 ...  
 %}  


Read more
Colin Ian King

Mac Mini rebooting tweaks: setpci -s 0:1f.0 0xa4.b=0

Last night I was asked why Mac Minis require "setpci -s 0:1f.0 0xa4.b=0" to force the Mac to auto-reboot in the event of a power failure.  Well, after a lot of Googling around I found that this setpci rune is quoted in a lot of places and at a guess probably originated from advice on the Mythical Beasts website. However, the explanation of what this rune actually did was distinctly lacking.

So, why is it required?

After some more searching around I found that device 00:1f.0 on the Mac Mini refers to:

00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge (rev 02)

..so my next step was to figure out why writing a zero byte to register 0xa4 on this device allows the Mac Mini to reboot. I located and download the ICH7 PDF from Intel and register at offset 0xa4 can be found in section 10.8.1.3.  This refers to GEN_PMCON_3—General PM Configuration 3 Register.   Even though the setpci command is clearing this whole register, I suspect we are just interested in clearing bit zero. The PDF states:

"AFTERG3_EN — R/W. This bit determines what state to go to when power is re-applied after a power failure (G3 state). This bit is in the RTC well and is not cleared by any type of reset except writes to CF9h or RTCRST#.

0 = System will return to S0 state (boot) after power is re-applied.
1 = System will return to the S5 state (except if it was in S4, in which case it will return to S4). In the S5 state, the only enabled wake event is the Power Button or any enabled wake event that was preserved through the power failure."

So, it looks like the "setpci -s 0:1f.0 0xa4.b=0" magic is just to return to a S0 (boot state) after power is re-applied after a power failure.   All is explained, so not so magical after all.


Read more
Colin Ian King

Tweaking partitions for optimal use of the HDD

By default Ubuntu is installed with the root filesystem at the start of the disk drive and with swap right at the end.    If one analyses the read/write performance of a hard disk drive (HDD) one will quickly spot that the I/O rates differ depending on the physical location of the data.

From the relatively small sample of laptop and desktop drives that I've looked at it seems that reads from the logical start of the drive are fastest and drop off down to roughly half that rate near the end of the drive.    The rate is higher for data on the outer tracks (because there are more data sectors) and lower toward the inner tracks (fewer data sectors).

Since my new 7200rpm 250GB drive performs fastest at the lowest logical block locations,  it makes sense to construct my partitions to utilise this.  For my configuration, I want to load my kernels and initrd in quickly and be able to swap and hibernate fairly quickly too.  Next I want applications to load quickly, and my user data (such as mp3s, cached Email, etc) I care less about for performance.   So, with these constraints, I created separate partitions in this order:

1st /boot (ext4), 2nd swap, 3rd / (ext4) and 4th /home (ext4).

Some quick'n'dirty write benchmarks show me that:


/boot : 84.74 MB/s
swap  : 84.44 MB/s
/     : 82.26 MB/s
/home : 73.30 MB/s

..so this should make booting, swapping and hibernating just slightly faster.   Over the lifetime of the drive the random file writes and deletions in /home won't cause /boot new kernels and initrd images to be fragmented because the are on separate partitions.   Also I can avoid over-writing all my user data in /home if I do a clean installation of Ubuntu into /boot and / at a later date.


Read more
Colin Ian King

Laptop HDD woes

I do quite a bit of international travelling and my old klunky Lenovo 3000N200 takes a few knocks and consequently I've had to purchase my 2nd HDD for this laptop in the past 3.5 years.


Last week my laptop hung for tens of seconds while logging in - and once more again today.  Looking at the kernel log I was able to see repeated time-outs on read errors which was a little alarming.   The palimpsest utility showed that I had a few bad sectors and there were a few pending to be remapped.   I had a quick look at the S.M.A.R.T. data using:

sudo smartctl -d ata -a /dev/sda

..and saw that I'd got 5311 hours of use out of the drive and considering I bought it about 400 days ago works out to be ~13.25 hours of usage per day on average.  Peeking at  /sys/fs/ext4/sda*/lifetime_write_kbytes it appeared I had written 1.4TB of data, which works out to be 0.27GB of writes per hour of use on average - which sounds fair as my laptop is mainly used for Web, Email and the occasional bit of compilation (as I do most kernel builds on large servers).

So what do I replace it with?  Well, being a cheapskate, I did not want to splash out on an expensive SSD on this relatively old laptop (which I will palm off to my kids fairly soon), so I went for an spinny disk upgrade.  My original drive was a 160GB 5400rpm WD1600BEVT - this time I spent an extra £5 and got a 2500GB 7200rpm WD2500BEKT with double the internal cache and improved read performance - the postage was free from dabs.com so double win.


Read more
Colin Ian King

Formatting Source Code in Blogs

At last, I've found a useful tool for producing correctly formatted source code for the inclusion into my blog.

Thanks to codeformatter, one can paste in source, select the appropriate formatting style options and produce blog formatted output to paste into one's blog articles! Easy!


Read more
Colin Ian King

Reading MTRRs via the MTRRIOC_GET_ENTRY ioctl()

The MTRRIOC_GET_ENTRY ioctl() is a useful but under-used ioctl() for reading the MTRR configuration from /proc/mtrr.  Instead of having to read and parse /proc/mtrr, the ioctl() provides a simple interface to easily fetch each MTRR.

A struct mtrr_gentry is passed to the ioctl() with the regnum member set to the MTRR register one wants to read. After a successful ioctl() call, size member of struct mtrr_gentry is less than 1 if the MTRR is disabled, otherwise it is populated with the MTRR register configuration.

Below is an example showing how to use MTRRIOC_GET_ENTRY:
pre.CICodeFormatter{ font-family:arial; font-size:12px; border:1px dashed #CCCCCC; width:99%; height:auto; overflow:auto; background:#f0f0f0; line-height:20px; background-image:URL(http://2.bp.blogspot.com/_z5ltvMQPaa8/SjJXr_U2YBI/AAAAAAAAAAM/46OqEP32CJ8/s320/codebg.gif); padding:0px; color:#000000; text-align:left; } pre.CICodeFormatter code{ color:#000000; word-wrap:normal; }

 #include <stdio.h>  
 #include <stdlib.h>  
 #include <string.h>  
 #include <sys/types.h>  
 #include <sys/stat.h>  
 #include <sys/ioctl.h>  
 #include <asm/mtrr.h>  
 #include <fcntl.h>  
   
 #define LONGSZ    ((int)(sizeof(long)<<1))  
   
 int main(int argc, char *argv[])  
 {  
     struct mtrr_gentry gentry;  
     int fd;  
   
     static char *mtrr_type[] = {  
         "Uncachable",  
         "Write Combining",  
         "Unknown",  
         "Unknown",  
         "Write Through",  
         "Write Protect",  
         "Write Back"  
     };  
   
     if ((fd = open("/proc/mtrr", O_RDONLY, 0)) < 0) {  
         fprintf(stderr, "Cannot open /proc/mtrr!\n");  
         exit(EXIT_FAILURE);  
     }  
           
     memset(&gentry, 0, sizeof(gentry));  
   
     while (!ioctl(fd, MTRRIOC_GET_ENTRY, &gentry)) {  
         if (gentry.size < 1)   
             printf("%u: Disabled\n", gentry.regnum);  
         else  
             printf("%u: 0x%*.*lx..0x%*.*lx %s\n", gentry.regnum,  
                 LONGSZ, LONGSZ, gentry.base,   
                 LONGSZ, LONGSZ, gentry.base + gentry.size,  
                 mtrr_type[gentry.type]);  
         gentry.regnum++;  
     }  
     close(fd);  
   
     exit(EXIT_SUCCESS);  
 }  


Read more
Colin Ian King

This week I'm attending the Linux Plumbers Conference in Santa Rosa, CA.  Yesterday I gave a brief presentation of the Firmware Test Suite in the Development Tools, and for reference, I've uploaded the slides here.


Read more
Colin Ian King

Dumping UEFI variables

UEFI variables in Linux can be found in /sys/firmware/efi/vars on UEFI firmware based machine, however, the raw variable data is in a binary format and hence not in a human readable form.   The Ubuntu Natty firmware test suite contains the uefidump tool to extract and decode the binary data into a more human readable form.

To run, use:

sudo fwts uefidump -


and you will see something similar to the following:

Name: AuthVarKeyDatabase.
  GUID: aaf32c78-947b-439a-a180-2e144ec37792
  Attr: 0x17 (NonVolatile,BootServ,RunTime).
  Size: 1 bytes of data.
  Data: 0000: 00                                               .

Name: Boot0000.
  GUID: 8be4df61-93ca-11d2-aa0d-00e098032b8c
  Attr: 0x7 (NonVolatile,BootServ,RunTime).
  Active: Yes
  Info: Primary Master Harddisk
  Path: \BIOS(2,0,Primary Master Harddisk).

Name: Boot0001.
  GUID: 8be4df61-93ca-11d2-aa0d-00e098032b8c
  Attr: 0x7 (NonVolatile,BootServ,RunTime).
  Active: Yes
  Info: EFI Internal Shell
  Path: \Unknown-MEDIA-DEV-PATH(0x7)\Unknown-MEDIA-DEV-PATH(0x6).

Name: Boot0003.
  GUID: 8be4df61-93ca-11d2-aa0d-00e098032b8c
  Attr: 0x7 (NonVolatile,BootServ,RunTime).
  Active: Yes
  Info: ubuntu
  Path: \HARDDRIVE(1,22,9897,0f52a6e132775546,ab,f6)\FILE('\EFI\ubuntu\grubx64.efi').

Name: Boot0004.
  GUID: 8be4df61-93ca-11d2-aa0d-00e098032b8c
  Attr: 0x7 (NonVolatile,BootServ,RunTime).
  Active: Yes
  Info: EFI DVD/CDROM
  Path: \ACPI(0xa0341d0,0x0)\PCI(0x2,0x1f)\ATAPI(0x0,0x1,0x0).

Name: BootOptionSupport.
  GUID: 8be4df61-93ca-11d2-aa0d-00e098032b8c
  Attr: 0x6 (BootServ,RunTime).
  BootOptionSupport: 0x0303.

Name: BootOrder.
  GUID: 8be4df61-93ca-11d2-aa0d-00e098032b8c
  Attr: 0x7 (NonVolatile,BootServ,RunTime).
  Boot Order: 0x0003,0x0000,0x0001,0x0004,0x0005,0x0006.

Name: ConIn.
  GUID: 8be4df61-93ca-11d2-aa0d-00e098032b8c
  Attr: 0x7 (NonVolatile,BootServ,RunTime).
  Device Path: \ACPI(0xa0341d0,0x0)\PCI(0x0,0x1f)\ACPI(0x50141d0,0x0)\UART(115200 baud,8,1,1)\VENDOR(11d2f9be-0c9a-9000-273f-c14d7f010400)\USBCLASS(0xffff,0xffff,0x3,0x1,0x1).

Name: ConInDev.
  GUID: 8be4df61-93ca-11d2-aa0d-00e098032b8c
  Attr: 0x6 (BootServ,RunTime).
  Device Path: \ACPI(0xa0341d0,0x0)\PCI(0x0,0x1f)\ACPI(0x50141d0,0x0)\UART(115200 baud,8,1,1)\VENDOR(11d2f9be-0c9a-9000-273f-c14d7fff0400).

Name: Setup.
  GUID: 038bcef0-21e2-49d1-a47c-b7257296b980
  Attr: 0x7 (NonVolatile,BootServ,RunTime).
  Size: 114 bytes of data.
  Data: 0000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Data: 0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Data: 0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Data: 0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Data: 0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Data: 0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Data: 0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Data: 0070: 01 00   
..

The tool will try to decode the binary data, however, if it cannot identify the variable type it will resort to doing a hex dump of the data instead.


Read more
Colin Ian King

Firmware Test Suite reference guide.

I've been working away polishing up the Firmware Test Suite for Ubuntu Natty 11.04 and to complement the tool I have eventually got around to writing up a reference guide for the tool.   The guide can be found at:

https://wiki.ubuntu.com/Kernel/Reference/fwts

This complements the rather terse man page and ever terser fwts --help output.


Read more
Colin Ian King

I've now competed the documentation of the Firmware Test Suite and this include documenting each of the 50+ tests (which was a bit of a typing marathon).  Each test has a brief description of what the test covers, example output from the test, how to run the test (possibly with different options) and explanations of test failure messages.

For example of the per-test documentation, check out the the suspend/resume test page and the ACPI tables test page.

I hope this is useful!


Read more
Colin Ian King

The Firmware Test Suite (fwts) is still a relatively new tool and hence this cycle I've still been adding some features and fixing bugs.  I've been running fwts against large data sets to soak test the tool to catch a lot of stupid corner cases (e.g. broken ACPI tables). Also, I am focused on getting some better documentation written (this is still "work in progress").

New tests for the Oneiric 11.10 release are as follows:

mpcheck:
    Sanity check tables against the MultiProcessor Specification (MPS). For more information about MPS, see the wikipedia MPS page.

mpdump:
    Dump annotated MPS tables.

msr:
    Sanity check Model Specific Registers across all CPUs. Does some form of MSR default field sanity checking.

s3power:
    Very simple suspend (S3) power checks.  This puts the machine into suspend and attempts to measure the power consumed while suspended. Generally this test gives more accurate results the longer one suspends the machine.  Your mileage may vary on this test.

ebdadump:
     Hex dump of the Extended BIOS Data Area.

In addition to the above, the fwts "method" test is now expanded to evaluate and exercise over 90 ACPI objects and methods.

One can also join the fwts mailing list by going to the project page and subscribing.


Read more
Colin Ian King

The dwarves package contains a set of useful tools that use the DWARF information placed in the ELF binaries by the compiler. Utilities in the dwarves package include:

pahole: This will find alignment holes in structs and classes in languages such as C/C++.  With careful repacking one can achieve better cache hits.  I could have done with this when optimising some code a few years back...

codiff:  This is a diff like tool use to compare the effect a change in source code can create on the compiled code.

pfunct:  This displays information about functions, inlines, goto labels, function size, number of variables and much more.

pdwtags: A DWARF information pretty-printer.

pglobal:  Dump out global symbols.

prefcnt:  A DWARF tags usage count.

dtagnames: Will lists tag names.

So, using pglobal, I was able to quickly check which variables I had made global (or accidentally not made them static!) on some code that I was developing as follows:

pglobal -v progname

and the same for functions:

pglobal -f progname

Easy!  Obviously these tools only work if the DWARF information is not stripped out.

All in all, these are really useful tools to know about and will help me in producing better code in the future.


Read more