Canonical Voices

Colin Ian King

I've been using QEMU and KVM for quite a while now for general kernel testing, for example, sanity checking eCryptfs and Ceph.  It can be argued that the best kind of testing is performed on real hardware; however, there are times when it is much more convenient (and faster) to exercise kernel fixes on a virtual machine.

I used to use command line incantations to run QEMU and KVM, but recently I've moved over to using virt-manager because it is so much simpler to use and caters for most of my configuration needs.

Virt-manager provides a very usable GUI and allows one to create, manage, clone and destroy virtual machine instances with ease.

virt-manager view of virtual machines
Each virtual machine can be easily reconfigured in terms of CPU configuration (number and type of CPUs), memory size, boot options, disk and CD-ROM selection, NIC selection, display server (VNC or Spice), sound device, serial port config, video hardware and USB and IDE controller config.

One can add and remove additional hardware, such as serial ports, parallel ports, USB and PCI host devices, watchdog controllers and much more besides.

Configuring a virtual machine

..so reconfiguring a test from a single core CPU to multi-core is simply a case of shutting down the virtual machine, bumping up the number of CPUs and booting it up again.
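
The same reconfiguration can also be done from the command line with virsh; a minimal sketch, assuming a domain named "natty-test" (the domain name and CPU count here are purely illustrative):

 virsh shutdown natty-test
 virsh setvcpus natty-test 4 --maximum --config
 virsh setvcpus natty-test 4 --config
 virsh start natty-test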

By default one can view the virtual machine's console via a VNC viewer in virt-manager and there is provision to scale the screen to the window size, set to full size or resize the virt-manager window to the screen size.  For ease of use, I generally just ssh into the virtual machines and ignore the console unless I can't get the kernel to boot.

virt-manager viewing a 64 bit Natty server (for eCryptfs testing)
Virt-manager is a great tool and well worth giving a spin. For more information on virt-manager visit virt-manager.org

Read more
Colin Ian King

A new Ubuntu portal http://odm.ubuntu.com is a jump-start page containing links to pages and documents useful for Original Design Manufacturers (ODMs), Original Equipment Manufacturers (OEMs) and Independent BIOS vendors.

Some of the highlights include:

  • A BIOS/UEFI requirements document containing recommendations to ensure firmware is compatible with the Linux kernel.
  • Getting started links describing how to download, install, configure and debug Ubuntu.
  • Links to certified hardware, debugging tools, SystemTap guides, packaging guides, kernel building notes.
  • Debugging tips, covering: hotkeys, suspend/resume, sound, X and wireless and an A5 sized Ubuntu Debugging booklet.
  • Link to fwts-live, the Firmware Test Suite live image.
 ..so lots of useful technical resources to call upon.

Kudos to Chris Van Hoof for organizing this useful portal.

Read more
Colin Ian King

I've been fortunate to get my hands on an Intel ® 520 2.5" 240GB Solid State Drive so I thought I'd put it through some relatively simple tests to see how well it performs.

Power Consumption


My first round of tests involved seeing how well it performs in terms of power consumption compared to a typical laptop spinny Hard Disk Drive.  I rigged up a Lenovo X220i (i3-2350M @ 2.30GHz) running Ubuntu Precise 12.04 LTS (x86-64) to a Fluke 8846A precision digital multimeter and then compared the SSD with a 320GB Seagate ST320LT020-9YG142 HDD using some simple I/O tests.  Each test scenario was run 5 times and I based my results on the average of these 5 runs.

The Intel ® 520 2.5" SSD fits into conventional drive bays but comes with a black plastic shim attached to one side that has to be first removed to reduce the height so that it can be inserted into the Lenovo X220i low profile drive bay. This is a trivial exercise and takes just a few moments with a suitable Phillips screwdriver.   (As a bonus, the SSD also comes with a 3.5" adapter bracket and SATA 6.0 signal and power cables allowing it to be easily added into a Desktop too).

In an idle state, the HDD pulled ~25mA more than the SSD, so in overall power consumption terms the SSD saves ~5%, (e.g. adds ~24 minutes life to an 8 hour battery).

I then exercised the ext4 file system with Bonnie++ and measured the average current drawn during the run, and using the idle "baseline" calculated the power consumed for the duration of the test.  The SSD draws more current than the HDD; however, it ran the Bonnie++ test ~4.5 times faster and so the total power consumed to get the same task completed was less, typically 1/3 of the power of the HDD.

Using dd, I next wrote 16GB to the devices and found the SSD was ~5.3 times faster than the HDD and consumed ~ 1/3 the power of the HDD.    For a 16GB read, the SSD was ~5.6 times faster than the HDD and used about 1/4 the power of the HDD.
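
For reference, the dd runs were of this general form - a sketch only, since the device name is a placeholder and the exact block size and flags used in my tests are not recorded here:

 # 16GB sequential write, then a 16GB sequential read
 sudo dd if=/dev/zero of=/dev/sdX bs=1M count=16384 oflag=direct
 sudo dd if=/dev/sdX of=/dev/null bs=1M count=16384 iflag=direct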

Finally, using tiobench I calculated that the SSD was ~7.6 times faster than the HDD and again used about 1/4 the power of the HDD.

So, overall, very good power savings.  The caveat is that since the SSD consumes more power than the HDD per second (but gets way more I/O completed) one can use more power with the SSD if one is using continuous I/O all the time.    You do more, and it costs more; but you get it done faster, so like for like the SSD wins in terms of reducing power consumption.

 

Boot Speed


Although ureadahead tries hard to optimize the inode and data reads during boot, the HDD is always going to perform badly because of seek latency and slow data transfer rates compared to any reasonable SSD.  Using bootchart over five runs, the average time to boot was ~7.9 seconds for the SSD and ~25.8 seconds for the HDD, so the SSD improved boot times by a factor of about 3.2.  Read rates were topping ~420 MB/sec which was good, but could have been higher for some (as yet unknown) reason.

 

Palimpsest Performance Test


Palimpsest (aka "Disk Utility") has a quick and easy to use drive benchmarking facility that I used to measure the SSD read/write rates and access times.  Since writing to the drive destroys the file system I rigged the SSD up in a SATA3 capable desktop as a 2nd drive and then ran the tests.  Results are very impressive:

Average Read Rate: 535.8 MB/sec
Average Write Rate: 539.5 MB/sec
Average Access Time: sub 0.1 milliseconds.

This is ~7 x faster in read/write speed and ~200-300 x faster in access time compared to the Seagate HDD.

File System Benchmarks


So which file system performs best on the SSD?  Well, it depends on the use case. There are many different file system benchmarking tools available and each one addresses different types of file system behaviour.  Whichever test I use, it most probably won't match your use case(!)  Since SSDs have very small latency overhead it is worth exercising various file systems with multiple threaded I/O reads/writes and seeing how well these perform.  I rigged up the threaded I/O benchmarking tool tiobench to exercise ext2, ext3, ext4, xfs and btrfs while varying the number of threads from 1 to 128 in powers of 2.  In theory the SSD can do multiple random seeks very efficiently, so this type of testing should show the point where the SSD has optimal performance with multiple I/O requests.
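
As a rough sketch (the mount point and the exact tiobench options are assumptions here, not the precise invocation used), each file system was exercised along these lines:

 # threaded I/O benchmark, thread counts in powers of 2
 for threads in 1 2 4 8 16 32 64 128; do
     tiobench --dir /mnt/ssd-test --threads $threads
 done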

 

Sequential Read Rates

Throughput peaks at 32-64 threads; xfs performs best, followed by ext4, and both are fairly close to the maximum device read rate.  Interestingly, btrfs performance is almost level throughout.

Sequential Write Rates


xfs is consistently best, whereas btrfs performs badly at low thread counts.

 

Sequential Read Latencies



These scale linearly with the number of threads and all file systems follow the same trend.

 

Sequential Write Latencies



Again, linear scaling of latencies with number of threads.

Random Read Rates


Again, the best transfer rates seem to occur with 32-64 threads, and btrfs does not seem to perform that well compared to ext2, ext3, ext4 and xfs.

Random Write Rates



Interestingly, ext2 and ext3 fare well, with ext4 and xfs performing very similarly and btrfs performing worst again.

 

Random Read Latencies



Again, latencies scale linearly as the thread count increases, with very similar performance between all the file systems.  In this case, btrfs performs best.

Random Write Latencies


With random writes the latency is consistently flat, apart from the final data point for ext4 at 128 threads which could be just due to an anomaly.

Which I/O scheduler should I use?

 

Anecdotal evidence suggests using the noop scheduler should be best for an SSD.  In this test I exercised ext4, xfs and btrfs with Bonnie++ using the CFQ, Noop and Deadline schedulers.   The tests were run 5 times and below are the averages of the 5 test runs.
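
For each run the scheduler was selected via sysfs before invoking Bonnie++; a minimal sketch, assuming the SSD is /dev/sdb and has a file system mounted on /mnt/ssd-test:

 for sched in cfq noop deadline; do
     echo $sched | sudo tee /sys/block/sdb/queue/scheduler
     sudo bonnie++ -d /mnt/ssd-test -u root
 done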

ext4:




                                      CFQ       Noop      Deadline
 Sequential Block Write (K/sec):      506046    513349    509893
 Sequential Block Re-Write (K/sec):   213714    231265    217430
 Sequential Block Read (K/sec):       523525    551009    508774


So for ext4 on this SSD, Noop is a clear winner for sequential I/O.

xfs:




                                      CFQ       Noop      Deadline
 Sequential Block Write (K/sec):      514219    514367    514815
 Sequential Block Re-Write (K/sec):   229455    230845    252210
 Sequential Block Read (K/sec):       526971    550393    553543


Deadline appears to perform best for sequential I/O on xfs.

 

btrfs:




                                      CFQ       Noop      Deadline
 Sequential Block Write (K/sec):      511799    431700    430780
 Sequential Block Re-Write (K/sec):   252210    253656    242291
 Sequential Block Read (K/sec):       629640    655361    659538


And for btrfs, Noop is marginally better for sequential writes and re-writes but Deadline is best for reads.

So it appears that for sequential I/O operations CFQ is the least optimal choice, with Noop being a good choice for ext4, Deadline for xfs and either for btrfs.   However, this is just based on sequential I/O testing and we should explore random I/O testing before drawing any firm conclusions.

Conclusion

 

As can be seen from the data, SSDs provide excellent transfer rates and incredibly short latencies, as well as reduced power consumption.   At the time of writing the cost per GB for an SSD is typically slightly more than £1 per GB, which is around 5-7 times more expensive than a HDD.    Since I travel quite frequently and have damaged a couple of HDDs in the last few years, the shock resistance, performance and power savings of the SSD are worth paying for.

Read more
Colin Ian King

Dell 1525 battery not charging

My wife's Dell 1525 Ubuntu laptop started having battery problems last year and eventually we ended up with a totally dead Li-ion battery.   Fortunately I was able to acquire a clone replacement for about £25 which charged fine and worked for a week before becoming totally drained.

According to some users, this happens because the charging circuitry has died, which was a little alarming since the machine was way out of warranty.  So I had a machine that ran fine on AC power but whose battery wouldn't charge.   I slept on the problem and this morning thought I'd try a spare Dell AC adapter just to factor out the AC power supply.  To my surprise the battery started charging, so I had to conclude the problem was simply a broken AC power supply.

So if the AC power supply had not been charging the battery, perhaps the original battery wasn't dead after all.  I plugged in the old battery and gave it an hour to charge, but found it really was dead and useless.

I've compared the characteristics of the working power supply against the broken one with a multimeter and I cannot see any difference, which strikes me as a little curious.   If anyone has any ideas why one works and the other doesn't, please let me know!

UPDATE

After a bit of research I found a relevant article at laptop-junction.com [1] that describes the AC adapter battery charging issue.   So it seems that this is a common issue [2]  for a bunch of AC adapters and the author suggests a possible design issue [3].

References:

[1] http://www.laptop-junction.com/toast/content/battery-not-charging
[2] http://www.laptop-junction.com/toast/content/dell-ac-power-adapter-not-recognized
[3] http://www.laptop-junction.com/toast/content/dell-ac-power-adapter-id-chip-died

Read more
Colin Ian King

The Ubuntu Kernel Team has uploaded a new kernel (3.2.0-17.27) which contains an additional fix to resolve the remaining issues seen with the RC6 power saving enabled. We would appreciate it if users with Sandy Bridge based hardware could run the tests described on https://wiki.ubuntu.com/Kernel/PowerManagementRC6 and add their results to that page.

Read more
Colin Ian King

The Ubuntu Kernel Team has released a call for testing for a set of RC6 power saving patches for Ubuntu 12.04 Precise Pangolin LTS. Quoting Leann Ogasawara's email to the ubuntu kernel team and ubuntu-devel mailing lists:

"Hi All,

RC6 is a technology which allows the GPU to go into a very low power consumption state when the GPU is idle (down to 0V). It results in considerable power savings when this stage is activated. When comparing under idle loads with machine state where RC6 is disabled, improved power usage of around 40-60% has been witnessed [1].

Up until recently, RC6 was disabled by default for Sandy Bridge systems due to reports of hangs and graphics corruption issues when RC6 was enabled. Intel has now asserted that RC6p (deep RC6) is responsible for the RC6 related issues on Sandy Bridge. As a result, a patch has recently been submitted upstream to disable RC6p for Sandy Bridge [2].

In an effort to provide more exposure and testing for this proposed patch, the Ubuntu Kernel Team has applied this patch to 3.2.0-17.26 and newer Ubuntu 12.04 Precise Pangolin kernels. We have additionally enabled plain RC6 by default on Sandy Bridge systems so that users can benefit from the improved power savings by default.

We have decided to post a widespread call for testing from Sandy Bridge owners running Ubuntu 12.04. We hope to capture data which supports the claims of power saving improvements and therefore justify keeping these patches in the Ubuntu 12.04 kernel. We also want to ensure we do not trigger any issues due to plain RC6 being enabled by default for Sandy Bridge.

If you are running Ubuntu 12.04 (Precise Pangolin) and willing to test and provide feedback, please refer to our PowerManagementRC6 wiki for detailed instructions [3]. Additionally, instructions for reporting any issues with RC6 enabled are also noted on the wiki. We would really appreciate any testing and feedback users are able to provide.

Thanks in advance,
The Ubuntu Kernel Team"

So please contribute to this call for testing by visiting https://wiki.ubuntu.com/Kernel/PowerManagementRC6 and following the instructions.  Thank you!

Read more
Colin Ian King

Firmware Test Suite Live (fwts-live) is a USB live image that will automatically boot and run the Firmware Test Suite (fwts) - it will run on legacy BIOS and also UEFI firmware (x86_64) systems.

fwts-live will run a range of fwts tests and store the results on the USB stick - these can be reviewed while running fwts-live or at a later time on another computer if required.

To install fwts-live onto a USB stick, first download either a 32 or 64 bit image from http://odm.ubuntu.com/fwts-live/ and then uncompress the image using:

 bunzip2 fwts-live-*.img.bz2  

Next insert a USB stick into your machine and unmount it. Now one has to copy the fwts-live image to the USB stick - one can find the USB device using:

 dmesg | tail -10 | grep Attached  
 [ 2525.654620] sd 6:0:0:0: [sdb] Attached SCSI removable disk  

..so in the above example it is /dev/sdb; copy the image using:

 sudo dd if=fwts-live-oneiric-*.img of=/dev/sdb  
 sync  

..and then remove the USB stick.

To run, insert the USB stick into the machine you want to test and then boot the machine.  This will start up fwts-live and you will be shown a set of options - either to run all the fwts batch tests, to select individual tests to run, or to abort testing and shut down.


If you choose to run all the fwts batch tests then fwts will automatically run through a series of tests which will take a few minutes to complete:


and when complete one can choose to view the results log:


if "Yes" is selected then one can view the results. The cursor up/down and page up/down keys can be used to navigate the results log file.  When you have completed viewing the results log, fwts-live will inform you where the results have been saved on the USB stick (so that one can review them later by plugging the USB stick into a different machine).


A full user guide to fwts-live is available at: https://wiki.ubuntu.com/HardwareEnablementTeam/Documentation/FirmwareTestSuiteLive

To help interpret any errors or warnings found by fwts we recommend visiting the fwts reference guide - this has a comprehensive description of each test and detailed explanations of warnings and error messages.

Below is a demo of fwts-live running inside QEMU:

 
 
Kudos to Chris Van Hoof for producing fwts-live

Read more
Colin Ian King

3G using a Huawei E1552/E1800 (HSPA modem) on Ubuntu

So my internet service provider is rolling out a programme of speed upgrades and over the past few weeks I've suffered from various connectivity issues, most probably because of infrastructure upgrades.   I lost connectivity today at 6am and was told to expect to be reconnected by 9pm, so I popped down town and acquired a 3G USB dongle and a suitable data plan/contract for my needs.

Typically these USB dongles are designed to appear as USB media devices (e.g. a pseudo CD-ROM) and one has to mode switch them to a USB modem.   Unfortunately I had a Huawei E1552/E1800 which required some USB mode switching magic, but to find this I first required internet connectivity.   Fortunately I had a sacrificial laptop on which I installed an old version of Windows XP, which allowed me to connect to the internet using the 3G USB dongle and track down the appropriate runes.  OK, I feel bad about installing Windows XP, but I was being pragmatic - I needed connectivity!

The procedure to get this device working on Ubuntu wasn't too bad.  First I identified the USB dongle using lsusb to get the vendor and product IDs (12d1:1446):

Bus 002 Device 013: ID 12d1:1446 Huawei Technologies Co., Ltd. E1552/E1800 (HSPA modem)

Then I added the following runes to /etc/usb_modeswitch.conf -

 DefaultVendor= 0x12d1  
 DefaultProduct=0x1446  
 TargetVendor= 0x12d1  
 TargetProductList="1001,1406,140b,140c,141b,14ac"  
 CheckSuccess=20  
 MessageContent="55534243123456780000000000000011060000000000000000000000000000"  

..this appears in many forums on the internet, kudos to whoever figured this out.

Then I ran "sudo usb_modeswitch -c /etc/usb_modeswitch.conf" and this switched the dongle into:

Bus 002 Device 012: ID 12d1:14ac Huawei Technologies Co., Ltd.

..and I was then able to simply connect using network manager.   Result!

** UPDATE **

Mathieu Trudel-Lapierre fixed this (9th Feb 2012) and now Ubuntu Precise works perfectly with the  Huawei E1552/E1800.  Thanks Mathieu!

Read more
Colin Ian King

open() using O_WRONLY | O_RDWR

One of the lesser known Linux features is that one can open a file with the flags O_WRONLY | O_RDWR.   One requires read and write permission to perform the open(); however, the flags indicate that no reading or writing is to be done on the file descriptor.   It is useful for operations such as ioctl() where we also want to ensure we don't actually do any reading or writing to a device.  A bunch of utilities such as LILO seem to use this obscure feature.

LILO defines these flags as O_NOACCESS as follows:

 #ifdef O_ACCMODE  
 # define O_NOACCESS O_ACCMODE  
 #else  
 /* open a file for "no access" */  
 # define O_NOACCESS 3  
 #endif  

..as in this example, you may find these flags more widely known as O_NOACCESS even though they are not defined in the standard fcntl.h headers.

Below is a very simple example of the use of O_WRONLY | O_RDWR:

 #include <stdio.h>  
 #include <stdlib.h>  
 #include <unistd.h>  
 #include <sys/ioctl.h>  
 #include <fcntl.h>  

 int main(int argc, char **argv)  
 {  
      int fd;  
      struct winsize ws;  

      /* open the tty for "no access" - we only intend to issue an ioctl() */  
      if ((fd = open("/dev/tty", O_WRONLY | O_RDWR)) < 0) {  
           perror("open /dev/tty failed");  
           exit(EXIT_FAILURE);  
      }  
      /* fetch and print the terminal window size */  
      if (ioctl(fd, TIOCGWINSZ, &ws) == 0)  
           printf("%d x %d\n", ws.ws_row, ws.ws_col);  
      close(fd);  
      exit(EXIT_SUCCESS);  
 }  

It is a little arcane and not portable but also an interesting feature to know about.

Read more
Colin Ian King

C ternary operator hack

Here is a simple bit of C that sets either x or y to value v depending on the value of c..

if (c)   
    x = v;  
else  
    y = v;  

..but why not "improve" this by using the C ternary operator ? : as follows:

 *(c ? &x : &y) = v;  

Now, how does this shape up when compiled on an x86 with gcc -O2 ?  Well, the first example compiles down to a test and a branch, whereas the second example uses a conditional move instruction (cmov), avoiding the test and branch and yielding faster code.  Result!
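
Below is a minimal harness for comparing the two forms (the file and function names are mine, purely for illustration); compiling with gcc -O2 -S and inspecting the assembly should show the test-and-branch in the first function and the conditional move in the second:

 /* compare.c - build with: gcc -O2 -S compare.c */
 int x, y;

 /* plain if/else: typically compiles to a test and a branch */
 void set_branch(int c, int v)
 {
     if (c)
         x = v;
     else
         y = v;
 }

 /* ternary on the address: typically compiles to a cmov and a store */
 void set_ternary(int c, int v)
 {
     *(c ? &x : &y) = v;
 }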

OK, so this isn't rocket science, but it does show that a little bit of abuse of the ternary operator can save me a few cycles if the compiler is clued up enough to use cmov.

Read more
Colin Ian King

Part of my focus this cycle is to see where we can make power saving improvements for Ubuntu Precise 12.04 LTS. There has been a lot of anecdotal evidence of specific machines or power saving features behaving poorly over the past few cycles.   So, armed with a 6.5 digit precision multimeter from Fluke I've been measuring the power consumption on various laptops in different test scenarios to try and answer some outstanding questions:

* Is it safe to enable Matthew Garrett's PCIe ASPM fix?
* Are the power savings suggested by PowerTop useful and can we reliably enable any of these in pm-utils?
* How accurate are the ACPI battery readings to estimate power consumption?
* Do the existing pm-utils power.d scripts still make sense?
* Which is better for power saving: i386, i386-pae or amd64?
* How much power does the laptop backlight really use?
* Does halving the mouse input rate really save that much more power?
* Should we re-enable Aggressive Link Power Management (ALPM)?
* Are there any misbehaving applications that are consuming too much power?
* What are the root causes of HDD wake-ups?
* Which applications and daemons are creating unnecessary wake events?
* How much does the MSR_IA32_ENERGY_PERF_BIAS save us?

..and many more besides!

From some of the analysis and "crowd sourcing" tests it is clear that the PCIe ASPM fix works well, so we've already incorporated that into Precise.

Aggressive Link Power Management (ALPM) is a mechanism where a SATA AHCI controller can put the SATA link that connects to the disk into a very low power mode during periods of zero I/O activity and into an active power state when work needs to be done. Tests show that this can save around 0.5-1.5 Watts of power on a typical system. However, it has been known in the past to not work on some devices, so I've put a call for testing of ALPM out to the community so we can get a better understanding of the power savings vs reliability.
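
ALPM is controlled per SATA host via sysfs; a quick sketch of checking and setting the policy (host0 is just an example):

 cat /sys/class/scsi_host/host0/link_power_management_policy
 echo min_power | sudo tee /sys/class/scsi_host/host0/link_power_management_policy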

Some of the PowerTop analysis has shown we can save another 1-2 Watts of power by putting the USB and PCI controllers of devices like webcams, SD card controllers, wireless, Ethernet and Bluetooth into a lower power state.  Again, we would like to understand the range of power savings across a large set of hardware and to see how reliable this is, so another crowd-sourced call for testing has also been set up.
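
These device power states are toggled through the runtime power management "power/control" sysfs files; for example (the device paths below are illustrative, not from a specific machine):

 # allow a USB device and a PCI device to autosuspend when idle
 echo auto | sudo tee /sys/bus/usb/devices/1-1.4/power/control
 echo auto | sudo tee /sys/bus/pci/devices/0000:02:00.0/power/control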

So, if you want to contribute to the testing, please visit the above links and spend just a few tens of minutes to see if we can extend the battery life of your laptop or netbook.  And periodically visit https://wiki.ubuntu.com/Kernel/PowerManagement to see if there are any new tests you can participate in.

[UPDATE]

I've written some brief notes on power saving tweaks and also some simple recommendations for application developers to follow too.

The thread continues here (part 2)

Read more
Colin Ian King

Last month I wrote about the investigations being undertaken to identify any suitable power savings for Ubuntu Precise 12.04 LTS.  Armed with a suitably accurate 6.5 digit precision Fluke digital multimeter I worked my way through the Kernel Team Power Management Blueprint, measuring numerous configurations and ways to possibly save power.

A broad range of areas was examined, from kernel tweaks and hardware settings to disk wake-ups and application wake-up events.

Quite a handful of misbehaving applications have been identified ranging from frequent unnecessary wake-ups on poll() and select() calls to rather verbose logging of debug messages that stop the disk from going into power saving states.

We also managed to identify and remove pm-utils power.d scripts that didn't actually save any power and even consumed more power on newer Solid State Drives.    By carefully analysing all the PowerTop recommendations we also identified a subset of device power management savings that are shown to be useful and save power across a wide range of machines.   After crowd-sourced testing of these tweaks we have now added them into pm-utils for Ubuntu Precise 12.04 LTS by default.  I'm very grateful to the Ubuntu community for participating in the testing and feedback.

I've written a brief summary of all the test results, however, the full results can be found in the various subdirectories here.   I've also written a very simple set of recommendations to help application developers avoid mistakes that lead to power wasting applications.

We've also set up a Power Management Wiki page that has links to the following:

* Identifying Power Sucking Applications
* Aggressive Link Power Management call for testing
* PCIe Active State Power Management call for testing (now complete)
* Updates to pm utils scripts call for testing (now complete)

..and probably the most useful:

* Power Saving Tweaks

The Power Saving Tweaks page lists a selection of tweaks that can be employed to save power on various machines.  Unfortunately with some hardware these tweaks cause lock-ups or rendering bugs, so they cannot be rolled out by default unless we can find either a definitive list of the broken hardware or a large enough whitelist to enable these on a useful set of working hardware.  Some of the tweaks cannot be rolled out for all machines as users want specific functionality enabled by default; for example, we need to enable Bluetooth for users with Bluetooth keyboards, and so it is up to the user to choose to disable Bluetooth to save 1-2 Watts of power.

I've also set up a PPA with a few tools to help measure power and track down misbehaving wake-up events and CPU intensive applications.  These tools don't replace tools like PowerTop and top, but do allow me to track trends on a system over a long running period.   You may find these useful too.

We also have a Ubuntu Power Consumption Project set up to help us track bugs related to power consumption issues and regressions.  

Last, but by no means least, I'd like to thank Steve Langasek and Martin Pitt for all their help with pm-utils and various fixes to power sucking applications.

Read more
Colin Ian King

Commodore 64 is 30

The C64 boot screen (running in vice)
30 years ago this week Commodore unveiled the Commodore 64 (C64) - a MOS 6510 based 8 bit microcomputer with 64K of RAM. I was given a C64 and 1530 C2N cassette deck for Christmas when I was 15 years old and I eventually acquired a 1541 floppy drive. The C64's VIC II graphics chip was a powerful device that had various graphics modes, 8 pixels of smooth scrolling and eight 21x24 pixel sprites. The sound chip (SID) sported 3 voices with 4 different waveform generators and fine control of the amplitude envelopes as well as filtering and tricks like ring modulation and synchronization.

The lack of a powerful BASIC interpreter directed my attention to learning 6502 assembler so I could start writing 3D wire frame vector graphics. I learned how to write cycle accurate timing code to drive the VIC II to make side borders disappear and with raster interrupts to make the top and bottom borders disappear too. I also wedged in my own BASIC tokenizer and interpreter to extend the BASIC to provide better structured programming (while/wend, procedures, repeat/until) and sound, graphics and disk support - all this taught me how to structure large projects in assembler and how to write compact and efficient code.

I spent hours poring over the disassembled C64 BASIC and Kernal ROMs and learned the art of reverse engineering from the object code. I figured out the tape format, analyzed the read/write characteristics of the tape drive head and re-wrote my own tape turbo loaders.
With the aid of an annotated ROM disassembly of the 1541 floppy drive I figured out how to write disk turbos and I hacked up my own fast formatting tools and my own file system.

By the time I was 17 I had acquired the Super C Compiler and I learned how to write C on a system that had a 15 minute edit-compile-link-run turnaround cycle(!).

Elite on the C64.
All this 1MHz 8 bit goodness taught me valuable lessons in programming efficient code and the trade-off between compact code and fast code. I learned how to twiddle hardware, bit bang data down wires and push a system to squeeze a little more performance out of it.







I was fortunate to have the time and energy and the right hardware available in my formative years, so I am grateful to Commodore for producing the quirky and hackable C64.

See also  http://www.reghardware.com/2012/01/02/commodore_64_30_birthday

Read more
Colin Ian King

Monitoring /proc/timer_stats

The /proc/timer_stats interface allows one to check on timer usage in a Linux system and hence detect any misuse of timers that can cause excessive wake up events (and also waste power).  /proc/timer_stats reports the process id (pid) of a task that initialised the timer, the name of the task, the name of the function that initialised the timer and the name of the timer callback function.  To enable timer sampling, write "1\n" to /proc/timer_stats and to disable write "0\n".
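
For example, a quick manual sample can be taken as follows:

 echo 1 | sudo tee /proc/timer_stats    # start collecting
 sleep 10                               # sample period
 sudo cat /proc/timer_stats             # dump the statistics
 echo 0 | sudo tee /proc/timer_stats    # stop collecting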

While this interface is simple to use, collecting multiple samples over a long period of time to monitor overall system behaviour takes a little more effort.   To help with this, I've written a very simple tool called eventstat that calculates the rate of events per second and can dump the data in a .csv (comma separated values) format for importing into a spreadsheet such as LibreOffice for further analysis (such as graphing).

In its basic form, eventstat will run ad infinitum and can be halted by control-C. One can also specify the sample period and number of samples to gather, for example:

 sudo eventstat 10 60  

.. this gathers samples every 10 seconds for 60 samples (which equates to 10 minutes).

The -t option specifies an events-per-second threshold below which events are discarded; for example, sudo eventstat -t 10 will show only events occurring at 10Hz or higher.

To dump the samples into a .csv file, use the -r option followed by the name of the .csv file.  If you just want to collect the samples into a .csv file and not see the statistics during the run, also use the -q option, e.g.

 sudo eventstat -q -r event-report.csv


With eventstat you can quickly identify rogue processes that cause a high frequency of wake-ups.   Arguably one can do this with tools such as PowerTop, but eventstat was written to allow one to collect the event statistics over a very long period of time and then help to analyse or graph the data in tools such as the LibreOffice spreadsheet.


The source is available in the following git repository:  git://kernel.ubuntu.com/cking/eventstat.git and in my power management tools PPA: https://launchpad.net/~colin-king/+archive/powermanagement

In an ideal world, application developers should check their code with tools like eventstat or PowerTop to ensure that the application is not misbehaving and causing excessive wake ups especially because abuse of timers could be happening in the supporting libraries that applications may be using.

Read more
Colin Ian King

Google's _nomap SSID madness

Now, I try to write positive comments on my blog, but now and again things really irk me and I need to comment about them.  

Google is using wireless access point SSIDs to construct a database to enable devices to determine their location using wireless and hence not relying on GPS.   If you want to opt out of this database, Google is suggesting that one should simply append _nomap to the SSID.   Google also hopes that this will become a standard SSID opt-out for any location service database.

This basically means that if you want your desired SSID you get opted into Google's database (so much for privacy), otherwise you have to put up with some utterly stupid name that Google mandates.  Thanks for the choice, Google.  And there is nothing to stop other location service providers suggesting different naming schemes, making it impossible to opt out of one or more of them.

Now, if the UK government mandated that all SSIDs needed to be named in a specific way to opt out of their special database, there would be uproar.  However, Google just ploughs ahead with more of their data gathering and nobody seems to complain.

Read more
Colin Ian King

The UEFI Compatibility Support Module (CSM) provides compatibility support for traditional legacy BIOS.  This allows the booting of an operating system that requires traditional option ROM support, such as BIOS Int 10h video calls.

While looking at boot and runtime misbehaviour on UEFI systems I would like to know if CSM is enabled or not, but the question is how does one detect CSM support?   Well, making the assumption that CSM is generally enabled to support Int 10h video calls, we look for any video option ROMs and see if the real mode Int 10h vector is set to jump to a handler in one of the ROMs.  

Option ROMs are found in the region 0xc0000 to 0xe0000 and normally the video option ROM is found at 0xc0000.  Option ROMs are found on 512 byte boundaries with header bytes containing 0x55, 0xaa and the ROM length (divided by 512), so we just mmap in 0xc0000..0xe0000 and then scan the memory for headers to locate option ROM images.

My assumption for CSM being enabled is that Int 10h vectors into one of these option ROMs, and we can assume it is a video option ROM if it contains the string "VGA" somewhere in the ROM image.  Yes, it is a hack, but it seems to work on the range of UEFI enabled systems I've so far used.
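
A minimal sketch of the option ROM scan is shown below - this is not the code from my debug-code repository, just an illustration, and it needs root privileges to map /dev/mem:

 #include <stdio.h>
 #include <stdint.h>
 #include <fcntl.h>
 #include <unistd.h>
 #include <sys/mman.h>

 #define ROM_START 0xc0000
 #define ROM_END   0xe0000
 #define ROM_SIZE  (ROM_END - ROM_START)

 int main(void)
 {
     uint8_t *mem;
     size_t offset;
     int fd;

     if ((fd = open("/dev/mem", O_RDONLY)) < 0) {
         perror("open /dev/mem failed");
         return 1;
     }
     mem = mmap(NULL, ROM_SIZE, PROT_READ, MAP_SHARED, fd, ROM_START);
     if (mem == MAP_FAILED) {
         perror("mmap failed");
         return 1;
     }
     /* option ROMs start on 512 byte boundaries with a 0x55, 0xaa
        signature followed by the ROM length in units of 512 bytes */
     for (offset = 0; offset < ROM_SIZE; offset += 512) {
         if (mem[offset] == 0x55 && mem[offset + 1] == 0xaa)
             printf("Option ROM at 0x%zx, %d bytes\n",
                 (size_t)ROM_START + offset, mem[offset + 2] * 512);
     }
     munmap(mem, ROM_SIZE);
     close(fd);
     return 0;
 }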

For reference, I've put the code in my debug-code git repository where it is available for anyone to use.


Read more
Colin Ian King

UEFI Secure Boot and Linux

There has been a lot of (heated) discussion in the past weeks concerning UEFI Secure Boot and how this can impact the ability of a user to install their operating system of choice.

To address this, today has seen no fewer than two papers published on this hot topic.  Canonical along with Red Hat have published a white paper that describes how UEFI Secure Boot will impact users and manufacturers.   The paper also provides recommendations on the implementation of UEFI Secure Boot in a way that allows users to be in control of their own PC hardware.

Meanwhile the Linux Foundation has also published a paper giving technical guidance on how to implement UEFI Secure Boot to allow operating systems other than Windows 8 to operate on new Windows 8 PCs.

So there is lots to read and good technical guidance all round.  Let's hope that this constructive set of papers will push the argument towards a positive outcome.


Read more
Colin Ian King

C vararg macros are very useful and I've generally used them a lot for wrapping C vararg functions.  However, at times it would be very useful to be able to determine the number of arguments being passed into the vararg macro, and this is not as straightforward as it first seems.

Anyhow, this question has been asked many times on usenet and the internet, and I stumbled on a very creative solution by Laurent Deniau posted on comp.std.c back in 2006.

 #define PP_NARG(...) \  
      PP_NARG_(__VA_ARGS__,PP_RSEQ_N())  
 #define PP_NARG_(...) \  
      PP_ARG_N(__VA_ARGS__)  
 #define PP_ARG_N( \  
      _1, _2, _3, _4, _5, _6, _7, _8, _9,_10, \  
      _11,_12,_13,_14,_15,_16,_17,_18,_19,_20, \  
      _21,_22,_23,_24,_25,_26,_27,_28,_29,_30, \  
      _31,_32,_33,_34,_35,_36,_37,_38,_39,_40, \  
      _41,_42,_43,_44,_45,_46,_47,_48,_49,_50, \  
      _51,_52,_53,_54,_55,_56,_57,_58,_59,_60, \  
      _61,_62,_63,N,...) N  
 #define PP_RSEQ_N() \  
      63,62,61,60,          \  
      59,58,57,56,55,54,53,52,51,50, \  
      49,48,47,46,45,44,43,42,41,40, \  
      39,38,37,36,35,34,33,32,31,30, \  
      29,28,27,26,25,24,23,22,21,20, \  
      19,18,17,16,15,14,13,12,11,10, \  
      9,8,7,6,5,4,3,2,1,0  
   
 /* Some test cases */  
 PP_NARG(A) -> 1  
 PP_NARG(A,B) -> 2  
 PP_NARG(A,B,C) -> 3  
 PP_NARG(A,B,C,D) -> 4  
 PP_NARG(A,B,C,D,E) -> 5   

However, passing no arguments to this macro yields 1, which is not what we expect. So last night I tweaked the macro to fix this problem by checking the length of the stringified macro arguments and adjusting the return value for an empty __VA_ARGS__  - as follows:

 #define PP_NARG(...)  (PP_NARG_(__VA_ARGS__,PP_RSEQ_N()) - \  
     (sizeof(#__VA_ARGS__) == 1))  
 #define PP_NARG_(...)  PP_ARG_N(__VA_ARGS__)  
   
 #define PP_ARG_N( \  
    _1, _2, _3, _4, _5, _6, _7, _8, _9,_10, \  
   _11,_12,_13,_14,_15,_16,_17,_18,_19,_20, \  
   _21,_22,_23,_24,_25,_26,_27,_28,_29,_30, \  
   _31,_32,_33,_34,_35,_36,_37,_38,_39,_40, \  
   _41,_42,_43,_44,_45,_46,_47,_48,_49,_50, \  
   _51,_52,_53,_54,_55,_56,_57,_58,_59,_60, \  
   _61,_62,_63, N, ...) N  
   
 #define PP_RSEQ_N() \  
     63,62,61,60,          \  
     59,58,57,56,55,54,53,52,51,50, \  
     49,48,47,46,45,44,43,42,41,40, \  
     39,38,37,36,35,34,33,32,31,30, \  
     29,28,27,26,25,24,23,22,21,20, \  
     19,18,17,16,15,14,13,12,11,10, \  
     9,8,7,6,5,4,3,2,1,0  
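
A quick sanity check of the tweaked macro (this assumes the definitions above are pasted into the same source file):

 #include <stdio.h>

 int main(void)
 {
     /* expected output: 0 1 3 */
     printf("%d %d %d\n", PP_NARG(), PP_NARG(A), PP_NARG(A, B, C));
     return 0;
 }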

The purists may point out that PP_NARG() only handles 64 arguments.  For just integer arguments, a better solution for any number of arguments has been proposed by user qrdl on stackoverflow:

 #define NUMARGS(...) (int)(sizeof((int[]){0, ##__VA_ARGS__})/sizeof(int)-1)

..which is appealing as it is more immediately understandable than the PP_NARG() macro; however, it is less generic since it only works for ints.

Anyhow, it's great to find such novel solutions even if they may be at first a little bit non-intuitive.


Read more
Colin Ian King

Forcing a CMOS reset from userspace

Resetting CMOS memory on x86 platforms is normally achieved by either removing the CMOS battery or by setting a CMOS clear motherboard jumper in the appropriate position.  However, both these methods require access to the motherboard, which is time consuming, especially when dealing with a laptop or netbook.

An alternative method is to twiddle specific bits in the CMOS memory so that the checksum is no longer valid and on the next boot the BIOS detects this and this generally forces a complete CMOS reset.

I've read several ways to do this, however the CMOS memory layout varies from machine to machine so some suggested solutions may be unreliable across all platforms.  Apart from the Real Time Clock (writing to which won't force a CMOS reset), the only CMOS addresses consistently used across most machines are 0x10 (Floppy Drive Type), 0x2e (CMOS checksum high byte) and 0x2f (CMOS checksum low byte).  With this in mind, it seems that the best way to force a CMOS reset is to corrupt the checksum bytes, so my suggested solution is to totally invert each bit of the checksum bytes.

To be able to read the contents of CMOS memory we need to write the address of the memory to port 0x70 then delay a small amount of time and then read the contents by reading port 0x71.    To write to CMOS memory we again write the address to port 0x70, delay a little, and then write the value to port 0x71.   A small delay of 1 microsecond (independent of CPU speed)  can be achieved by writing to port 0x80 (the Power-On-Self-Test (POST) code debug port).

 static inline uint8_t cmos_read(uint8_t addr)  
 {  
     outb(addr, 0x70);    /* specify address to read */  
     outb(0, 0x80);       /* tiny delay */  
     return inb(0x71);    /* read value */  
 }  
   
 static inline void cmos_write(uint8_t addr, uint8_t val)  
 {  
     outb(addr, 0x70);    /* specify address to write */  
     outb(0, 0x80);       /* tiny delay */  
     outb(val, 0x71);     /* write value */  
 }  

And hence inverting CMOS memory at a specified address is thus:

 static inline void cmos_invert(uint8_t addr)  
 {  
     cmos_write(addr, 255 ^ cmos_read(addr));  
 }  

To ensure we are the only process accessing the CMOS memory we should also turn off interrupts, so we use iopl(3) and asm("cli") to do this and then asm("sti") and iopl(0) to undo this.   We also need to use ioperm() to get access to ports 0x70, 0x71 and 0x80 for cmos_read() and cmos_write() to work and we need to run the program with root privileges.  The final program is as follows:

 #include <stdio.h>  
 #include <stdlib.h>  
 #include <stdint.h>  
 #include <unistd.h>  
 #include <sys/io.h>  
   
 #define CMOS_CHECKSUM_HI (0x2e)  
 #define CMOS_CHECKSUM_LO (0x2f)  
   
 static inline uint8_t cmos_read(uint8_t addr)  
 {  
     outb(addr, 0x70);    /* specify address to read */  
     outb(0, 0x80);       /* tiny delay */  
     return inb(0x71);    /* read value */  
 }  
   
 static inline void cmos_write(uint8_t addr, uint8_t val)  
 {  
     outb(addr, 0x70);    /* specify address to write */  
     outb(0, 0x80);       /* tiny delay */  
     outb(val, 0x71);     /* write value */  
 }  
   
 static inline void cmos_invert(uint8_t addr)  
 {  
     cmos_write(addr, 255 ^ cmos_read(addr));  
 }  
   
 int main(int argc, char **argv)  
 {  
     if (ioperm(0x70, 2, 1) < 0) {  
         fprintf(stderr, "ioperm failed on ports 0x70 and 0x71\n");  
         exit(1);  
     }  
     if (ioperm(0x80, 1, 1) < 0) {  
         fprintf(stderr, "ioperm failed on port 0x80\n");  
         exit(1);  
     }  
     if (iopl(3) < 0) {  
         fprintf(stderr, "iopl failed\n");  
         exit(1);  
     }  
   
     asm("cli");  
     /* Invert CMOS checksum, high and low bytes*/  
     cmos_invert(CMOS_CHECKSUM_HI);  
     cmos_invert(CMOS_CHECKSUM_LO);  
     asm("sti");  
   
     (void)iopl(0);  
     (void)ioperm(0x80, 1, 0);  
     (void)ioperm(0x70, 2, 0);  
   
     exit(0);  
 }  

You can find this source in my debug-code git repo.

Before you run this program, make sure you know which key should be pressed to jump into the BIOS settings on reboot (such as F2, delete, backspace, ESC, etc.) as some machines may just display a warning message on reboot and need you to press this key to progress further.

So to reset, simply run the program with sudo and reboot.  Easy.  (Just don't complain to me if your machine isn't easily bootable after running this!)


Read more
Colin Ian King

Some problems are a little challenging to debug and sometimes require a bit of lateral thinking to solve.   One particular issue is when suspend/resume locks up and one has no idea where or why, because the console is suspended and any debug messages just don't appear.

In the past I've had to use techniques like flashing keyboard LEDs, making the PC speaker beep or even forcibly rebooting the machine at known points to be able to get some idea of roughly where a hang has occurred.   This is fine, but it is tedious since we can only emit a few bits of state per iteration.   Saving state is difficult since when a machine locks up one has to reboot it and one loses debug state.   One technique is to squirrel away debug state in the real time clock (RTC) which allows one to store twenty or so bits of state, which is still quite tough going.

One project I've been working on is to use the power of SystemTap to instrument the entire suspend/resume code paths - every time a function is entered a hash of the name is generated and stored in the RTC.  If the machine hangs, one can then grab this hash out of the RTC and compare it to the known function names in /proc/kallsyms, and hopefully this will give some idea of where we got to before the machine hung.

However, what would be really useful is the ability to print out more debug state during suspend/resume in real time.   Normally I approach this by using a USB/serial cable and capturing console messages via this mechanism.  However, once USB is suspended, this provides no more information.

One solution I'm now using is with Kamal Mostafa's minimodem.  This wonderful tool is an implementation of a software modem and can send and receive data by emulating a Bell-type or RTTY FSK modem.  It allows me to transmit characters at 110 to 300 baud over a standard PC speaker and reliably receive them on a host machine.  If the wind is in the right direction, one can transmit at higher speeds with an audio cable plugged into the headphone jack of the transmitter and into the microphone socket on the receiver, if hardware allows.

The 8254 Programmable Interval Timer on a PC can be used to generate a square wave at a predefined frequency and can be connected to the PC speaker to emit a beep.  Sending data from the speaker to minimodem is a case of sending a 500ms leader tone, then emitting characters.  Each character has a 1 baud space tone (the start bit), followed by 8 bits (least significant bit first) with a zero being a 1 baud space tone and a 1 being represented by a 1 baud mark tone, and then a trailing bunch of stop bits.
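
The framing is essentially just asynchronous serial done with tones; a small illustrative sketch of the per-character framing (this is not Kamal's driver - emit_tone() here is a stand-in for the code that programs the PIT for one bit period):

 #include <stdio.h>
 #include <stdint.h>

 enum tone { SPACE, MARK };

 /* stand-in: real code would program the 8254 PIT to drive the speaker
    at the mark or space frequency for one bit period */
 static void emit_tone(enum tone t)
 {
     printf("%s ", t == MARK ? "mark" : "space");
 }

 static void send_char(uint8_t c)
 {
     int i;

     emit_tone(SPACE);               /* start bit */
     for (i = 0; i < 8; i++)         /* data bits, least significant first */
         emit_tone((c >> i) & 1 ? MARK : SPACE);
     emit_tone(MARK);                /* stop bit(s) */
     printf("\n");
 }

 int main(void)
 {
     send_char('U');
     return 0;
 }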

So using a prototype driver written by Kamal, I tweaked the code and put it into my suspend/resume SystemTap script and now I can dump out messages over the PC speaker and decode them using minimodem.  300 baud may not be speedy, but I am able to now instrument and trace through the entire suspend/resume path.

The SystemTap scripts are "work-in-progress" (i.e. if it breaks you keep the pieces), but can be found in my pmdebug git repo git://kernel.ubuntu.com/cking/pmdebug.git.  The README file gives a quick run down of how to use this script and I have written up a full set of instructions.

The caveat to this is that one requires a PC where one can beep the PC speaker using the PIT.  Lots of modern machines seem to either have this disabled, or have the volume somehow under the control of the Intel HDA audio driver.  Anyhow, kudos to Kamal for providing minimodem and giving me the prototype kernel driver that allowed me to plug this into a SystemTap script.


Read more