Canonical Voices

Colin Ian King

Why is my CPU Frequency Limited?

Sometimes the scaling maximum frequency of a CPU is set below the highest frequency the processor supports, and finding out why can be problematic.  The limitation could have been imposed by:

* Thermal limits.
* Hardware limitations (e.g. the ACPI _PPC object).
* A program that wrote to /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq.

Fortunately Thomas Renninger introduced /sys/devices/system/cpu/cpu*/cpufreq/bios_limit, which exports the BIOS-limited maximum frequency for each CPU to user space.  This feature is available in Ubuntu Maverick 10.10 upwards.

So, if you have a machine that you believe should have CPUs running at a higher frequency, inspect the bios_limit files to see if the BIOS is mis-configured.
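The check described above can be sketched in C. This is only a minimal sketch: it assumes the usual cpufreq sysfs layout under /sys/devices/system/cpu/cpuN/cpufreq/ and that the files hold a single value in kHz. The helper names read_khz() and bios_is_limiting() are made up for illustration and are not part of any kernel or library API.

```c
#include <assert.h>
#include <stdio.h>

/* Read a single integer (a frequency in kHz) from a sysfs-style file.
 * Returns -1 if the file cannot be opened or parsed. */
long read_khz(const char *path)
{
    FILE *fp;
    long khz = -1;

    assert(path != NULL);

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    if (fscanf(fp, "%ld", &khz) != 1)
        khz = -1;
    fclose(fp);
    return khz;
}

/* Returns 1 if the BIOS limit is below the hardware maximum,
 * 0 if it is not, and -1 if either value is unavailable. */
int bios_is_limiting(long bios_limit_khz, long cpuinfo_max_khz)
{
    if (bios_limit_khz < 0 || cpuinfo_max_khz < 0)
        return -1;
    return bios_limit_khz < cpuinfo_max_khz;
}
```

For CPU 0 one would pass in the values read from cpu0/cpufreq/bios_limit and cpu0/cpufreq/cpuinfo_max_freq; a result of 1 suggests the BIOS is holding the CPU back.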


Colin Ian King

SystemTap provides a flexible programming language to prototype debugging scripts very quickly.  Sometimes however, one has to use "embedded C" functions in a SystemTap script to interface more deeply with the kernel. 

Today I was writing a script to dump out ACPI object names.  It needed some embedded C to walk the ACPI namespace, and that in turn required a C callback function.  Inside the callback I wanted to print the handle and name of each ACPI object, but I couldn't figure out how to use the native SystemTap print() functions from within embedded C code.  So I crufted up a simple "HelloWorld" SystemTap script, ran it with -k to keep the temporary sources and then had a look at the automagically generated code.

It appears that SystemTap converts the script print statements into _stp_printf()  C calls, so I just plugged these into my C callback instead of using printk().  Now my output goes via the underlying SystemTap print mechanism and appears on the tty rather than going to the kernel log.  Bit of a hack, but the result is easy to use.  I wish it was documented though.

Here is a sample of the original script to illustrate the point:

 %{
 #include <acpi/acpi.h>

 /* Callback invoked for each object found while walking the ACPI namespace */
 static acpi_status dump_name(acpi_handle handle, u32 lvl, void *context, void **rv)
 {
     struct acpi_buffer buffer = {ACPI_ALLOCATE_BUFFER};
     int *count = (int*)context;

     if (!ACPI_FAILURE(acpi_get_name(handle, ACPI_FULL_PATHNAME, &buffer))) {
         /* _stp_printf() sends output via SystemTap rather than the kernel log */
         _stp_printf(" %lx %s\n", (unsigned long)handle, (char*)buffer.pointer);
         kfree(buffer.pointer);
         (*count)++;
     }
     return AE_OK;
 }

 ...
 %}


Colin Ian King

Mac Mini rebooting tweaks: setpci -s 0:1f.0 0xa4.b=0

Last night I was asked why Mac Minis require "setpci -s 0:1f.0 0xa4.b=0" to force the Mac to auto-reboot in the event of a power failure.  Well, after a lot of Googling around I found that this setpci rune is quoted in a lot of places and at a guess probably originated from advice on the Mythical Beasts website. However, the explanation of what this rune actually did was distinctly lacking.

So, why is it required?

After some more searching around I found that device 00:1f.0 on the Mac Mini refers to:

00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge (rev 02)

..so my next step was to figure out why writing a zero byte to register 0xa4 on this device allows the Mac Mini to reboot.  I located and downloaded the ICH7 PDF from Intel; the register at offset 0xa4 is described in section 10.8.1.3 as GEN_PMCON_3 (General PM Configuration 3 Register).  Even though the setpci command clears this whole register, I suspect we are only interested in clearing bit zero.  The PDF states:

"AFTERG3_EN — R/W. This bit determines what state to go to when power is re-applied after a power failure (G3 state). This bit is in the RTC well and is not cleared by any type of reset except writes to CF9h or RTCRST#.

0 = System will return to S0 state (boot) after power is re-applied.
1 = System will return to the S5 state (except if it was in S4, in which case it will return to S4). In the S5 state, the only enabled wake event is the Power Button or any enabled wake event that was preserved through the power failure."

So, it looks like the "setpci -s 0:1f.0 0xa4.b=0" magic simply makes the machine return to the S0 (boot) state when power is re-applied after a power failure.  All is explained, so not so magical after all.
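To make the difference between zeroing the whole byte and clearing only bit zero concrete, here is a small sketch in C.  It only manipulates a byte value in memory and does not touch PCI configuration space.  The macro and function names are my own labels for illustration, based on the AFTERG3_EN description quoted above.

```c
#include <stdint.h>

#define GEN_PMCON_3_OFFSET 0xa4   /* register offset used by the setpci rune */
#define AFTERG3_EN         0x01   /* bit 0 of GEN_PMCON_3 */

/* Clear only AFTERG3_EN, leaving the other bits of the byte untouched. */
uint8_t clear_afterg3_en(uint8_t reg)
{
    return (uint8_t)(reg & ~AFTERG3_EN);
}

/* Returns 1 if this register value means "return to S0 (boot) after
 * power is re-applied", 0 if it means "return to S5". */
int boots_after_power_loss(uint8_t reg)
{
    return (reg & AFTERG3_EN) == 0;
}
```

Writing zero to the whole byte, as the setpci command does, also satisfies boots_after_power_loss(), but it clears the other seven bits of the byte as well.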


Colin Ian King

Tweaking partitions for optimal use of the HDD

By default Ubuntu is installed with the root filesystem at the start of the disk drive and with swap right at the end.    If one analyses the read/write performance of a hard disk drive (HDD) one will quickly spot that the I/O rates differ depending on the physical location of the data.

From the relatively small sample of laptop and desktop drives that I've looked at, it seems that reads from the logical start of the drive are fastest and drop off to roughly half that rate near the end of the drive.  The logical start of the drive maps to the outer tracks, where the rate is higher because each track holds more data sectors; the inner tracks hold fewer sectors and so are slower.

Since my new 7200rpm 250GB drive performs fastest at the lowest logical block locations, it makes sense to construct my partitions to utilise this.  For my configuration, I want my kernels and initrd to load quickly and I want to be able to swap and hibernate fairly quickly too.  Next I want applications to load quickly, while performance matters less for my user data (such as mp3s, cached Email, etc).  So, with these constraints, I created separate partitions in this order:

1st /boot (ext4), 2nd swap, 3rd / (ext4) and 4th /home (ext4).

Some quick'n'dirty write benchmarks show me that:


/boot : 84.74 MB/s
swap  : 84.44 MB/s
/     : 82.26 MB/s
/home : 73.30 MB/s

..so this should make booting, swapping and hibernating just slightly faster.  Over the lifetime of the drive the random file writes and deletions in /home won't cause new kernels and initrd images in /boot to become fragmented, because they are on separate partitions.  Also, I can avoid over-writing all my user data in /home if I do a clean installation of Ubuntu into /boot and / at a later date.


Colin Ian King

Laptop HDD woes

I do quite a bit of international travelling and my old klunky Lenovo 3000N200 takes a few knocks and consequently I've had to purchase my 2nd HDD for this laptop in the past 3.5 years.


Last week my laptop hung for tens of seconds while logging in - and once more again today.  Looking at the kernel log I was able to see repeated time-outs on read errors which was a little alarming.   The palimpsest utility showed that I had a few bad sectors and there were a few pending to be remapped.   I had a quick look at the S.M.A.R.T. data using:

sudo smartctl -d ata -a /dev/sda

..and saw that I'd got 5311 hours of use out of the drive and considering I bought it about 400 days ago works out to be ~13.25 hours of usage per day on average.  Peeking at  /sys/fs/ext4/sda*/lifetime_write_kbytes it appeared I had written 1.4TB of data, which works out to be 0.27GB of writes per hour of use on average - which sounds fair as my laptop is mainly used for Web, Email and the occasional bit of compilation (as I do most kernel builds on large servers).
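The two averages above are easy to re-check.  Below is a small sketch in C that assumes lifetime_write_kbytes counts units of 1024 bytes and that the result is wanted in binary gigabytes; the function names are invented for this example.

```c
#include <assert.h>

/* Average powered-on hours per day of ownership. */
double hours_per_day(double power_on_hours, double days_owned)
{
    assert(days_owned > 0.0);
    return power_on_hours / days_owned;
}

/* Average gigabytes (2^30 bytes) written per powered-on hour,
 * given a lifetime total in kilobytes (2^10 bytes). */
double gib_written_per_hour(double lifetime_write_kbytes, double power_on_hours)
{
    assert(power_on_hours > 0.0);
    return (lifetime_write_kbytes / (1024.0 * 1024.0)) / power_on_hours;
}
```

Feeding in 5311 hours with a round 400 days gives a little under 13.3 hours per day, and 1.4TB taken in binary units over 5311 hours gives roughly 0.27GB per hour.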

So what do I replace it with?  Well, being a cheapskate, I did not want to splash out on an expensive SSD for this relatively old laptop (which I will palm off to my kids fairly soon), so I went for a spinny disk upgrade.  My original drive was a 160GB 5400rpm WD1600BEVT; this time I spent an extra £5 and got a 250GB 7200rpm WD2500BEKT with double the internal cache and improved read performance.  The postage was free from dabs.com, so double win.


Colin Ian King

Formatting Source Code in Blogs

At last, I've found a useful tool for producing correctly formatted source code for inclusion in my blog.

Thanks to codeformatter, one can paste in source, select the appropriate formatting style options and produce blog formatted output to paste into one's blog articles! Easy!


Colin Ian King

Reading MTRRs via the MTRRIOC_GET_ENTRY ioctl()

The MTRRIOC_GET_ENTRY ioctl() is a useful but under-used ioctl() for reading the MTRR configuration from /proc/mtrr.  Instead of having to read and parse /proc/mtrr, the ioctl() provides a simple interface to easily fetch each MTRR.

A struct mtrr_gentry is passed to the ioctl() with the regnum member set to the MTRR register one wants to read.  After a successful ioctl() call, the size member of struct mtrr_gentry is less than 1 if the MTRR is disabled; otherwise the struct is populated with the MTRR register configuration.

Below is an example showing how to use MTRRIOC_GET_ENTRY:

 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <sys/ioctl.h>
 #include <asm/mtrr.h>
 #include <fcntl.h>

 /* Number of hex digits needed to print an unsigned long */
 #define LONGSZ    ((int)(sizeof(long)<<1))

 int main(int argc, char *argv[])
 {
     struct mtrr_gentry gentry;
     int fd;

     /* MTRR memory type names, indexed by type number */
     static const char *mtrr_type[] = {
         "Uncachable",
         "Write Combining",
         "Unknown",
         "Unknown",
         "Write Through",
         "Write Protect",
         "Write Back"
     };
     const unsigned int ntypes = sizeof(mtrr_type) / sizeof(mtrr_type[0]);

     if ((fd = open("/proc/mtrr", O_RDONLY, 0)) < 0) {
         fprintf(stderr, "Cannot open /proc/mtrr!\n");
         exit(EXIT_FAILURE);
     }

     /* Start at register 0; the ioctl() fails once regnum is out of range */
     memset(&gentry, 0, sizeof(gentry));

     while (!ioctl(fd, MTRRIOC_GET_ENTRY, &gentry)) {
         if (gentry.size < 1)
             printf("%u: Disabled\n", gentry.regnum);
         else
             printf("%u: 0x%*.*lx..0x%*.*lx %s\n", gentry.regnum,
                 LONGSZ, LONGSZ, (unsigned long)gentry.base,
                 LONGSZ, LONGSZ, (unsigned long)(gentry.base + gentry.size),
                 gentry.type < ntypes ? mtrr_type[gentry.type] : "Unknown");
         gentry.regnum++;
     }
     close(fd);

     exit(EXIT_SUCCESS);
 }


Colin Ian King

This week I'm attending the Linux Plumbers Conference in Santa Rosa, CA.  Yesterday I gave a brief presentation of the Firmware Test Suite in the Development Tools, and for reference, I've uploaded the slides here.


Colin Ian King

Dumping UEFI variables

UEFI variables in Linux can be found in /sys/firmware/efi/vars on UEFI firmware based machines.  However, the raw variable data is in a binary format and hence not human readable.  The Ubuntu Natty firmware test suite contains the uefidump tool to extract and decode the binary data into a more human readable form.

To run, use:

sudo fwts uefidump -


and you will see something similar to the following:

Name: AuthVarKeyDatabase.
  GUID: aaf32c78-947b-439a-a180-2e144ec37792
  Attr: 0x17 (NonVolatile,BootServ,RunTime).
  Size: 1 bytes of data.
  Data: 0000: 00                                               .

Name: Boot0000.
  GUID: 8be4df61-93ca-11d2-aa0d-00e098032b8c
  Attr: 0x7 (NonVolatile,BootServ,RunTime).
  Active: Yes
  Info: Primary Master Harddisk
  Path: \BIOS(2,0,Primary Master Harddisk).

Name: Boot0001.
  GUID: 8be4df61-93ca-11d2-aa0d-00e098032b8c
  Attr: 0x7 (NonVolatile,BootServ,RunTime).
  Active: Yes
  Info: EFI Internal Shell
  Path: \Unknown-MEDIA-DEV-PATH(0x7)\Unknown-MEDIA-DEV-PATH(0x6).

Name: Boot0003.
  GUID: 8be4df61-93ca-11d2-aa0d-00e098032b8c
  Attr: 0x7 (NonVolatile,BootServ,RunTime).
  Active: Yes
  Info: ubuntu
  Path: \HARDDRIVE(1,22,9897,0f52a6e132775546,ab,f6)\FILE('\EFI\ubuntu\grubx64.efi').

Name: Boot0004.
  GUID: 8be4df61-93ca-11d2-aa0d-00e098032b8c
  Attr: 0x7 (NonVolatile,BootServ,RunTime).
  Active: Yes
  Info: EFI DVD/CDROM
  Path: \ACPI(0xa0341d0,0x0)\PCI(0x2,0x1f)\ATAPI(0x0,0x1,0x0).

Name: BootOptionSupport.
  GUID: 8be4df61-93ca-11d2-aa0d-00e098032b8c
  Attr: 0x6 (BootServ,RunTime).
  BootOptionSupport: 0x0303.

Name: BootOrder.
  GUID: 8be4df61-93ca-11d2-aa0d-00e098032b8c
  Attr: 0x7 (NonVolatile,BootServ,RunTime).
  Boot Order: 0x0003,0x0000,0x0001,0x0004,0x0005,0x0006.

Name: ConIn.
  GUID: 8be4df61-93ca-11d2-aa0d-00e098032b8c
  Attr: 0x7 (NonVolatile,BootServ,RunTime).
  Device Path: \ACPI(0xa0341d0,0x0)\PCI(0x0,0x1f)\ACPI(0x50141d0,0x0)\UART(115200 baud,8,1,1)\VENDOR(11d2f9be-0c9a-9000-273f-c14d7f010400)\USBCLASS(0xffff,0xffff,0x3,0x1,0x1).

Name: ConInDev.
  GUID: 8be4df61-93ca-11d2-aa0d-00e098032b8c
  Attr: 0x6 (BootServ,RunTime).
  Device Path: \ACPI(0xa0341d0,0x0)\PCI(0x0,0x1f)\ACPI(0x50141d0,0x0)\UART(115200 baud,8,1,1)\VENDOR(11d2f9be-0c9a-9000-273f-c14d7fff0400).

Name: Setup.
  GUID: 038bcef0-21e2-49d1-a47c-b7257296b980
  Attr: 0x7 (NonVolatile,BootServ,RunTime).
  Size: 114 bytes of data.
  Data: 0000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Data: 0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Data: 0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Data: 0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Data: 0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Data: 0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Data: 0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Data: 0070: 01 00   
..

The tool will try to decode the binary data; if it cannot identify the variable type it will resort to doing a hex dump of the data instead.
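As an illustration of one small piece of that decoding, the Attr values in the output above are bit masks.  The sketch below, in C, turns the three common bits into the labels shown in the dump.  The bit values are those defined by the UEFI specification; the function name attr_to_string() is invented here and this is not the fwts implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Variable attribute bits as defined by the UEFI specification. */
#define EFI_VARIABLE_NON_VOLATILE        0x00000001
#define EFI_VARIABLE_BOOTSERVICE_ACCESS  0x00000002
#define EFI_VARIABLE_RUNTIME_ACCESS      0x00000004

/* Write a comma separated list of the attribute names that are set
 * in attr into buf, which holds len bytes.  Other bits are ignored. */
void attr_to_string(uint32_t attr, char *buf, size_t len)
{
    static const struct {
        uint32_t bit;
        const char *name;
    } names[] = {
        { EFI_VARIABLE_NON_VOLATILE,       "NonVolatile" },
        { EFI_VARIABLE_BOOTSERVICE_ACCESS, "BootServ"    },
        { EFI_VARIABLE_RUNTIME_ACCESS,     "RunTime"     },
    };
    size_t i;

    if (len == 0)
        return;
    buf[0] = '\0';

    for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        if (!(attr & names[i].bit))
            continue;
        if (buf[0] != '\0')
            strncat(buf, ",", len - strlen(buf) - 1);
        strncat(buf, names[i].name, len - strlen(buf) - 1);
    }
}
```

With this, 0x7 decodes to all three names and 0x6 to the last two, matching the Attr lines in the dump.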


Colin Ian King

Firmware Test Suite reference guide.

I've been working away polishing up the Firmware Test Suite for Ubuntu Natty 11.04, and to complement the tool I have finally got around to writing up a reference guide for it.  The guide can be found at:

https://wiki.ubuntu.com/Kernel/Reference/fwts

This complements the rather terse man page and even terser fwts --help output.


Colin Ian King

I've now completed the documentation of the Firmware Test Suite and this included documenting each of the 50+ tests (which was a bit of a typing marathon).  Each test has a brief description of what the test covers, example output from the test, how to run the test (possibly with different options) and explanations of test failure messages.

For examples of the per-test documentation, check out the suspend/resume test page and the ACPI tables test page.

I hope this is useful!


Colin Ian King

The Firmware Test Suite (fwts) is still a relatively new tool and hence this cycle I've still been adding some features and fixing bugs.  I've been running fwts against large data sets to soak test the tool to catch a lot of stupid corner cases (e.g. broken ACPI tables). Also, I am focused on getting some better documentation written (this is still "work in progress").

New tests for the Oneiric 11.10 release are as follows:

mpcheck:
    Sanity check tables against the MultiProcessor Specification (MPS). For more information about MPS, see the Wikipedia MPS page.

mpdump:
    Dump annotated MPS tables.

msr:
    Sanity check Model Specific Registers across all CPUs. Does some form of MSR default field sanity checking.

s3power:
    Very simple suspend (S3) power checks.  This puts the machine into suspend and attempts to measure the power consumed while suspended. Generally this test gives more accurate results the longer one suspends the machine.  Your mileage may vary on this test.

ebdadump:
    Hex dump of the Extended BIOS Data Area.

In addition to the above, the fwts "method" test is now expanded to evaluate and exercise over 90 ACPI objects and methods.

One can also join the fwts mailing list by going to the project page and subscribing.


Colin Ian King

The dwarves package contains a set of useful tools that use the DWARF information placed in the ELF binaries by the compiler. Utilities in the dwarves package include:

pahole: This will find alignment holes in structs and classes in languages such as C/C++.  With careful repacking one can achieve better cache hits.  I could have done with this when optimising some code a few years back...

codiff:  This is a diff-like tool used to compare the effect that a change in source code has on the compiled code.

pfunct:  This displays information about functions, inlines, goto labels, function size, number of variables and much more.

pdwtags: A DWARF information pretty-printer.

pglobal:  Dump out global symbols.

prefcnt:  A DWARF tags usage count.

dtagnames: Lists tag names.

So, using pglobal, I was able to quickly check which variables I had made global (or accidentally not made them static!) on some code that I was developing as follows:

pglobal -v progname

and the same for functions:

pglobal -f progname

Easy!  Obviously these tools only work if the DWARF information is not stripped out.

All in all, these are really useful tools to know about and will help me in producing better code in the future.
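To show the kind of thing pahole reports, here is a small sketch in C of a struct with alignment holes and a repacked version of it.  The structs and helper functions are invented for this example; the exact sizes depend on the compiler and ABI, so none are hard-coded.

```c
#include <stddef.h>
#include <stdint.h>

/* Members ordered carelessly: on typical ABIs the compiler has to
 * insert padding after 'a' and after 'c' to align 'b' and 'd'. */
struct padded {
    char     a;
    uint64_t b;
    char     c;
    uint32_t d;
};

/* Same members, largest first, so the two chars sit together. */
struct repacked {
    uint64_t b;
    uint32_t d;
    char     a;
    char     c;
};

/* Total size of the member data, which is the same for both structs. */
static size_t member_bytes(void)
{
    return sizeof(uint64_t) + sizeof(uint32_t) + 2 * sizeof(char);
}

/* Bytes in each struct that hold no member data (holes plus tail padding). */
size_t padded_wasted_bytes(void)
{
    return sizeof(struct padded) - member_bytes();
}

size_t repacked_wasted_bytes(void)
{
    return sizeof(struct repacked) - member_bytes();
}
```

Compiling this with debug information (for example gcc -g) and running pahole on the result should list the holes in struct padded; the repacked struct is never larger and is usually smaller.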


Colin Ian King

Assume Nothing.

When I was a very junior software engineer working on Fortran 77 signal processing modules on MicroVaxes, PDP-11s, Masscomps and old 286 PCs I was given some very wise words by the owner of the company:   "Assume nothing".   This has stuck with me for nearly a quarter of a century.  It is pithy, easy to remember and is so true for software engineering.

1. "Assume nothing" makes me look up details when I'm not 100% sure.
2. "Assume nothing" means that I double check my facts when I think I'm 100% sure.
3. "Assume nothing" makes me question even the so called 'obvious'.  "Of course it will work.." turns into "are we sure it will work for every possible case?"
4. "Assume nothing" makes me dot the i's and cross the t's.
5. "Assume nothing" keeps me sceptical, which is useful as there is a lot of stupidity masquerading as knowledge on the internet.

I could ramble on. However, enough said. Just assume nothing; it will keep you out of a lot of trouble.


Colin Ian King

How x86 computers boot up

Gustavo Duarte has written a concise and very readable article describing how computers boot up.  Well worth reading.


Colin Ian King

Which version of GCC compiled my program?

Here's a quick one-liner to find out which version of GCC was used to compile some code:

readelf -p .comment a.out

String dump of section '.comment':
  [     0]  GCC: (Ubuntu/Linaro 4.6.1-7ubuntu1) 4.6.1
  [    2a]  GCC: (Ubuntu/Linaro 4.6.1-6ubuntu6) 4.6.1


Colin Ian King

Fragmentation on ext2/ext3/ext4 filesystems

If you want to know how fragmented a file is on an ext2/ext3/ext4 filesystem, there are a couple of methods of finding out.  One method is to use hdparm, but one needs the CAP_SYS_RAWIO capability to do so, hence run it with sudo:

sudo hdparm --fibmap /boot/initrd.img-2.6.38-9-generic

/boot/initrd.img-2.6.38-9-generic:
 filesystem blocksize 4096, begins at LBA 24000512; assuming 512 byte sectors.
 byte_offset  begin_LBA    end_LBA    sectors
           0   35747840   35764223      16384
     8388608   39942144   39958527      16384
    16777216   40954664   40957423       2760

Alternatively, one can use the filefrag utility (part of the e2fsprogs package) for reporting the number of extents:

filefrag /boot/initrd.img-2.6.38-9-generic
/boot/initrd.img-2.6.38-9-generic: 3 extents found

..or more verbosely:

filefrag -v /boot/initrd.img-2.6.38-9-generic
Filesystem type is: ef53
File size of /boot/initrd.img-2.6.38-9-generic is 18189027 (4441 blocks, blocksize 4096)
 ext logical physical expected length flags
   0       0  1468416            2048
   1    2048  1992704  1470463   2048
   2    4096  2119269  1994751    345 eof
/boot/initrd.img-2.6.38-9-generic: 3 extents found

Well, that's useful. So, going one step further, how many free extents are available on the filesystem? Well, e2freefrag is the tool for this:

sudo e2freefrag /dev/sda1
Device: /dev/sda1
Blocksize: 4096 bytes
Total blocks: 2999808
Free blocks: 1669815 (55.7%)

Min. free extent: 4 KB
Max. free extent: 1370924 KB
Avg. free extent: 6160 KB
Num. free extent: 1084

HISTOGRAM OF FREE EXTENT SIZES:
Extent Size Range :  Free extents   Free Blocks  Percent
    4K...    8K-  :           450           450    0.03%
    8K...   16K-  :            97           227    0.01%
   16K...   32K-  :            99           527    0.03%
   32K...   64K-  :           134          1475    0.09%
   64K...  128K-  :            96          2193    0.13%
  128K...  256K-  :            50          2235    0.13%
  256K...  512K-  :            30          2643    0.16%
  512K... 1024K-  :            36          6523    0.39%
    1M...    2M-  :            36         13125    0.79%
    2M...    4M-  :            14          9197    0.55%
    4M...    8M-  :            15         18841    1.13%
    8M...   16M-  :             8         17515    1.05%
   16M...   32M-  :             7         38276    2.29%
   32M...   64M-  :             3         39409    2.36%
   64M...  128M-  :             3         67865    4.06%
  512M... 1024M-  :             4        802922   48.08%
    1G...    2G-  :             2        646392   38.71%

..thus reminding me to do some housekeeping and remove some junk from my file system... :-)
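For completeness, a per-file extent count like the one filefrag prints can also be fetched programmatically.  The sketch below, in C, uses the Linux FS_IOC_FIEMAP ioctl() with fm_extent_count set to zero, which asks the kernel only for the number of extents.  The function name count_extents() is invented for this example, and it assumes a filesystem that supports FIEMAP (ext4 does).

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>      /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>  /* struct fiemap, FIEMAP_* */

/* Return the number of extents backing 'path', or -1 on any error
 * (file missing, or a filesystem without FIEMAP support). */
long count_extents(const char *path)
{
    struct fiemap fm;
    long n = -1;
    int fd;

    assert(path != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    memset(&fm, 0, sizeof(fm));
    fm.fm_start = 0;
    fm.fm_length = FIEMAP_MAX_OFFSET;  /* map the whole file */
    fm.fm_flags = FIEMAP_FLAG_SYNC;    /* flush dirty data first */
    fm.fm_extent_count = 0;            /* count only, return no extents */

    if (ioctl(fd, FS_IOC_FIEMAP, &fm) == 0)
        n = (long)fm.fm_mapped_extents;

    close(fd);
    return n;
}
```

Unlike the hdparm approach, this needs only read access to the file rather than CAP_SYS_RAWIO.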


Colin Ian King

Making my 2nd Webcam the default for Empathy

It just so happens that I have two Webcams on my machine, one being a rather poor one built into the laptop and a 2nd better quality Logitech webcam.

Using the 2nd webcam by default in Empathy for video conference calls required a little bit of hackery with gconf-editor by changing /system/gstreamer/0.10/default/videosrc from v4l2src to v4l2src device="/dev/video1"


This wasn't entirely the most user friendly way to configure the default. Ho hum..


Colin Ian King

The semantics of halt.

It appears that the semantics of halt mean it will stop the machine but it may or may not shut it down.  Back when I used UNIX boxes, halt basically stopped the machine but never powered it down; to power it down one had to explicitly use "halt -p".

So things change. With upstart, halt is a symbolic link to reboot and reboot calls shutdown -h.  The man page to shutdown states for the -h option:

"Requests that the system be either halted or powered off after it has been brought down, with the choice as to which left up to the system."

Hrm, so this vaguely explains why halting some machines may just halt and on others it may also shut the system down.  I've not dug into this thoroughly yet, but one suspects that for different processor architectures we get different implementations.  Even for x86 we have variations in CPUs and boards/platforms, so it really is hard to say if halt will power down a machine.

The best bet is to assume halt just halts and if you want it to power down always use "halt -p".


Colin Ian King

The Linux PCI core driver provides a useful (and probably overlooked) sysfs interface to read PCI ROM resources.  A PCI device that has a ROM resource will have a "rom" sysfs file associated with it.  Writing anything other than 0 to this file enables the ROM, after which the ROM image can be read from the same file.

For example, on my laptop, to find PCI devices that have ROM images associated with them I used:

find /sys/devices -name "rom"
/sys/devices/pci0000:00/0000:00:02.0/rom

and this corresponds to my Integrated Graphics Controller:

lspci | grep 02.0
00:02.0 VGA compatible controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (primary) (rev 0c)

To dump the ROM I used:

echo 1 | sudo tee /sys/devices/pci0000\:00/0000\:00\:02.0/rom
sudo cat /sys/devices/pci0000\:00/0000\:00\:02.0/rom > vbios.rom

To disassemble this I used ndisasm:

sudo apt-get install nasm
ndisasm -k 0,3 vbios.rom | less

..and I just used strings on the ROM image to dump out interesting text, e.g.

strings vbios.rom
000000000000
00IBM VGA Compatible BIOS.
PCIR
(00`
*@0p
H?@0b
..


..and then used a tool like bvi to edit the ROM.
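Before poking at a dumped image it is worth checking that it looks like a PCI expansion ROM at all.  The sketch below, in C, works on an image already loaded into memory (for example the contents of vbios.rom).  It relies on the standard expansion ROM layout: the image starts with the bytes 0x55 0xAA, and offset 0x18 holds a little-endian pointer to a data structure tagged "PCIR", the same string that shows up in the strings output above.  The function names are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A PCI expansion ROM image starts with the bytes 0x55 0xAA. */
int rom_has_signature(const uint8_t *rom, size_t len)
{
    return len >= 2 && rom[0] == 0x55 && rom[1] == 0xAA;
}

/* Offset 0x18 holds a little-endian 16-bit pointer to the PCI data
 * structure, which begins with the ASCII tag "PCIR".  Returns that
 * offset, or -1 if the image is too short or the tag is missing. */
long rom_pcir_offset(const uint8_t *rom, size_t len)
{
    size_t off;

    if (!rom_has_signature(rom, len) || len < 0x1a)
        return -1;

    off = (size_t)rom[0x18] | ((size_t)rom[0x19] << 8);
    if (off + 4 > len || memcmp(rom + off, "PCIR", 4) != 0)
        return -1;

    return (long)off;
}
```

If rom_pcir_offset() returns -1 for a freshly dumped file, the dump is probably empty or truncated, which can happen if the ROM was not enabled before reading it.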

