Canonical Voices

Colin Ian King

Fragmentation on ext2/ext3/ext4 filesystems

If you want know how fragmented a file is on an ext2/ext3/ext4 filesystem there are a couple of methods of finding out.  One method is to use hdparm, but one needs CAP_SYS_RAWIO capability to do so, hence run with sudo:

sudo hdparm --fibmap /boot/initrd.img-2.6.38-9-generic

/boot/initrd.img-2.6.38-9-generic:
 filesystem blocksize 4096, begins at LBA 24000512; assuming 512 byte sectors.
 byte_offset  begin_LBA    end_LBA    sectors
           0   35747840   35764223      16384
     8388608   39942144   39958527      16384
    16777216   40954664   40957423       2760

Alternatively, one can use the filefrag utility (part of the e2fsprogs package) for reporting the number of extents:

filefrag /boot/initrd.img-2.6.38-9-generic
/boot/initrd.img-2.6.38-9-generic: 3 extents found

..or more verbosely:

filefrag -v /boot/initrd.img-2.6.38-9-generic
Filesystem type is: ef53
File size of /boot/initrd.img-2.6.38-9-generic is 18189027 (4441 blocks, blocksize 4096)
 ext logical physical expected length flags
   0       0  1468416            2048
   1    2048  1992704  1470463   2048
   2    4096  2119269  1994751    345 eof
/boot/initrd.img-2.6.38-9-generic: 3 extents found

Well, that's useful. So, going one step further, how many free extents are available on the filesystem? Well, e2freefrag is the tool for this:

sudo e2freefrag /dev/sda1
Device: /dev/sda1
Blocksize: 4096 bytes
Total blocks: 2999808
Free blocks: 1669815 (55.7%)

Min. free extent: 4 KB
Max. free extent: 1370924 KB
Avg. free extent: 6160 KB
Num. free extent: 1084

HISTOGRAM OF FREE EXTENT SIZES:
Extent Size Range :  Free extents   Free Blocks  Percent
    4K...    8K-  :           450           450    0.03%
    8K...   16K-  :            97           227    0.01%
   16K...   32K-  :            99           527    0.03%
   32K...   64K-  :           134          1475    0.09%
   64K...  128K-  :            96          2193    0.13%
  128K...  256K-  :            50          2235    0.13%
  256K...  512K-  :            30          2643    0.16%
  512K... 1024K-  :            36          6523    0.39%
    1M...    2M-  :            36         13125    0.79%
    2M...    4M-  :            14          9197    0.55%
    4M...    8M-  :            15         18841    1.13%
    8M...   16M-  :             8         17515    1.05%
   16M...   32M-  :             7         38276    2.29%
   32M...   64M-  :             3         39409    2.36%
   64M...  128M-  :             3         67865    4.06%
  512M... 1024M-  :             4        802922   48.08%
    1G...    2G-  :             2        646392   38.71%

..thus reminding me to do some housekeeping and remove some junk from my file system... :-)


Read more
Colin Ian King

Making my 2nd Webcam the default for Empathy

It just so happens that I have two Webcams on my machine, one being a rather poor one built into the laptop and a 2nd better quality Logitech webcam.

Using the 2nd webcam by default in Empathy for video conference calls required a little bit of hackery with gconf-editor by changing /system/gstreamer/0.10/default/videosrc from v4l2src to vl4l2src device="/dev/video1"


This wasn't entirely the most user friendly way to configure the default. Ho hum..


Read more
Colin Ian King

The semantics of halt.

It appears that the semantics of halt mean it will stop the machine but it may or may not shut it down.  Back when I used UNIX boxes, halt basically stopped the machine but never powered it down; to power it down one had to explicitly use "halt -p".

So things change. With upstart, halt is a symbolic link to reboot and reboot calls shutdown -h.  The man page to shutdown states for the -h option:

"Requests that the system be either halted or powered off after it has been brought down, with the choice as to which left up to the system."

Hrm, so this vaguely explains why halting some machines may just halt and on others it may also shut the system down.  I've not digged into this thoroughly yet, but one suspects that for different processor architectures we get different implementations.  Even for x86 we have variations in CPUs and boards/platforms, so it really it is hard to say if halt will power down a machine.

The best bet is to assume halt just halts and if you want it to power down always use "halt -p".


Read more
Colin Ian King

The Linux PCI core driver provides a useful (and probably overlooked) sysfs interface to read PCI ROM resources.  A PCI device that has a ROM resource will have a "rom" sysfs file associated with it, writing anything other than 0 to it will enable one to then read the ROM image from this file.

For example, on my laptop, to find PCI devices that have ROM images associated with them I used:

find /sys/devices -name "rom"
/sys/devices/pci0000:00/0000:00:02.0/rom

and this corresponds to my Integrated  Graphics Controller:

lspci | grep 02.0
00:02.0 VGA compatible controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (primary) (rev 0c)

To dump the ROM I used:

echo 1 | sudo tee /sys/devices/pci0000\:00/0000\:00\:02.0/rom
sudo cat /sys/devices/pci0000\:00/0000\:00\:02.0/rom > vbios.rom

To disassemble this I used ndisasm:

sudo apt-get install nasm
ndisasm -k 0,3 vbios.rom | less

..and just use strings on the ROM image to dump out interesting text, e.g.

strings vbios.rom
000000000000
00IBM VGA Compatible BIOS.
PCIR
(00`
*@0p
H?@0b
..


..and then used a tool like bvi to edit the ROM.


Read more
Colin Ian King

Editing binaries using bvi

I don't want to start a vi vs emacs holy war, but I have found a wonderful binary editor based on the vi command set.  The bvi editor is great for tweaking binary files - one can tab between the hex and ASCII representations of the binary data to re-edit the data.

To install use:

sudo apt-get install bvi

As well as basic vi commands, bvi contains bit-level operations to and/or/xor/not/shift bits too.   One has to use ':set memmove' to insert/delete data as this is not enabled by default.

Anyhow, install, consult the man page and get editing binary files!


Read more
Colin Ian King

Dumping the contents of the Embedded Controller (EC) can be useful when debugging some x86 BIOS/kernel related issues.  At a hardware level to get access to the EC memory one goes via the EC command/status and data port.   As a side note, one can determine these ports as follows:

cat /proc/ioports  | grep EC

The preferred way to access these is via the ACPI EC driver in drivers/acpi/ec.c which is used by the ACPI driver to handle read/write operations to the EC memory region.

In addition to this driver, there the ec_sys module that provides a useful debugfs interface to allow one to read + write to the EC memory.  Write support is enabled with the ec_sys module parameter 'write_support' but it is generally discouraged as one may be poking data into memory may break things in an unpredictable manner, hence by default write support is disabled.

So, to dump the contents of the first EC (assuming debugfs is mounted), do:

sudo modprobe ec_sys
sudo od -t x1 /sys/kernel/debug/ec/ec0/io

Simple!

As a bonus, the General Purpose Event bits are also readable from /sys/kernel/debug/ec/ec0/gpe.

Before I stumbled upon ec_sys.c I used a SystemTap script to execute ec_read() in ec.c to do the reading directly.  Yes it's ugly and stupid, but it does prove SystemTap is a very useful tool.


Read more
Colin Ian King

SystemTap is a very useful and powerful tool that enables one to insert kernel debug into a running kernel.  Today I wanted to inspect the I/O read/write operations occurring when running some ACPI AML, so it was a case of hacking up a few lines of system tap to dump out the relevant state (e.g. which port being accessed, width of the I/O operation in bits and value being written or read).

So instead of spinning a bespoke kernel with a few lines of debug in, I use SystemTap to quickly write a re-usable script.  Simple and easy!

I've put the SystemTap script in my git repository for any who are interested.


Read more
Colin Ian King

The ACPI engine in the kernel can be debugged by building with CONFIG_ACPI_DEBUG and configuring /sys/module/acpi/parameters/debug_layer and /sys/module/acpi/parameters/debug_level appropriately.   This can provide a wealth of data and is generally a very powerful debug state tracing mechanism.  However, there are times when one wants to get a little more debug data out or perhaps just drill down on a specific core area of functionality without being swamped by too much ACPI debug. This is where tools like SystemTap are useful.

SystemTap is a very powerful tool that allows one to add extra debug instrumentation into a running kernel without the hassle and overhead of rebuilding a kernel with debug printk() statements in. It allows very quick turnaround in writing debug and one does not have to reboot a machine to load a new kernel since the debug is loaded and unloaded dynamically.

SystemTap has its own scripting language for writing debug scripts, but for specialised hackery it provides a mechanism ('guru mode') to embed C directly which can be called from the SystemTap script.   The SystemTap language is fairly small and easy to understand and one easily becoming proficient with the language in a day.

The only downside is that one requires a .ddeb kernel package which is huge since it contains all the necessary kernel debug information. 

Over the past week  I have been looking at debugging various aspects of the ACPI core, such as fulling tracing suspend/resume and dumping out executed AML code at run time.   I was able to quickly prototype a SystemTap script that dumps out AML opcodes on the Oneiric kernel - this saved me the usual build of a debug kernel with CONFIG_ACPI_DEBUG enabled and then capturing the appropriate debug and wading through copious amounts of debug data.

Conclusion: Some initial investment in time and effort is required to understand SystemTap (and to get to grips with the more useful features in 'guru mode'). However, one can be far more productive because the debug cycle is made far more efficient. Also, SystemTap provides plenty of functionality to allow very detailed and targeted debugging scripts.


Read more
Colin Ian King

PC Speaker weirdnesses

Today we were trying to do some debugging by getting some tones out of a laptop speaker by frobbing bit 1 of port 0x61 on the keyboard controller.  Rather unexpectedly I got no sound whatsoever out of the speaker, yet I had managed to do so the day before.   So I double checked what had changed since the day before:

1. Was it because I upgraded my kernel?
2. Did I unexpected disabled the speaker when tweaking BIOS settings?
3. Was it something interfering with my port 0x61 bit twiddling?
4. Was the hardware now broken?

As per usual, I first assumed that the most complex parts of the system were to blame as they normally can go wrong in the most subtle way.  After a lot of fiddling around I discovered that the PC speaker only worked when I plugged the AC power into the laptop.  Now that wasn't obvious.

I suspect I should have applied Occam's Razor to this problem to begin with. We live and learn...


Read more
Colin Ian King

Back in February I wrote about turning off a PC using the Intel I/O Controller Hub and had some example code to do this, and it was a dirty hack.  I've revised this code to be more aligned with how the Linux kernel does this, namely:

1. Cater for the possible existence of PM1b_EVT_BLK and PM1b_CNT_BLK registers.
2. Clear WAKE_STATUS before transitioning to S5
3. Instead of setting SLP_TYP and SLP_EN on, one sets SLP_TYPE and then finally sets SLP using separate writes.

The refined program requires 4 arguments, namely the port address of the PM1a_EVT_BLK, PM1b_EVT_BLK, PM1a_CNT_BLK and PM1b_CNT_BLK.  If the PM1b_* ports are not defined, then these should be 0.

To find these ports run either:

cat /proc/ioports | grep PM1

or do:

sudo fwts acpidump - | grep PM1 | grep Block:

For example, on a Sandybridge laptop, I have PM1a_EVT_BLK = 0x400, PM1a_CNT_BLK = 0x404 and the PM1b_* ports are not defined, so I use:

sudo ./halt 0x400 0 0x404 0

..and this will transition the laptop to the S5 state very quickly. Needless to say, make sure you have sync'd and/or unmounted your filesystems before doing this.

The PM1_* port addresses from /proc/ioport and the fwts acpidump come from the PM1* configuration data from the ACPI FACP, so if this is wrong, then powering down the machine won't work.  So, if you can't shutdown your machine using this example code then it's possible the FACP is wrong. At this point, one should sanity check the port addresses using the appropriate Southbridge data sheet for your machine - generally speaking look for:

PM1_STS—Power Management 1 Status Register (aka PM1a_EVT_BLK)
PM1_CNT—Power Management 1 Control  (aka PM1a_CNT_BLK)

..however these are offsets from PMBASE which are defined in the LPC Interface PCI Register Address Map so you may require a little bit of work to figure out the addresses of these registers on your hardware.


Read more
Colin Ian King

Normally when I see a problem to do with CPU, I/O, network or memory resource hogging I turn to my trusty tools such as vmstat, iostate or ifstat to check system behaviour.

I stumbled upon dstat today, which boasts to be "a versatile replacement for vmstat, iostat and ifstat.".  So let's see what it can do. First, install with:

sudo apt-get install dstat

Running dstat with no arguments displays CPU stats (user,system,idle,wait,hardware interrupts, software interrupts), Disk I/O stats (read/write), Network stats (receive,send), Paging stats (in/out) and System stats (interrupts, context switches).   Not bad at all. 

Dstat even highlights in colour the active values, so you don't miss relevant deltas in the statistics being churned out by the tool.  Colours can be disabled with the --nocolor option.

But there is more. There are a plethora of options to show various system activities, such as:

-l       load average
-m       memory stats (used, buffers, cache, free)
-p       process stats (runnable, uninterruptible, new)
--aio    aio stats (asynchronous I/O)
--fs     filesystem stats (open files, inodes)
--ipc    ipc stats (message queue, semaphores, shared memory)
--socket socket stats (total, tcp, udp, raw, ip-fragments)
--tcp    TCP stats (listen, established, syn, time_wait, close)
--udp    UDP stats (listen, active)
--vm          virtual memory stats (hard pagefaults, soft pagefaults, allocated, free)

..to name but a few.  As it is, this already is more functional that vmstat, iostat and ifstat.   But that's not all - dstat is extensible by the use of dstat plugins and the packaged version of dstat comes shipped with a few already, for example:

--battery          battery capacity
--cpufreq          CPU frequency
--top-bio-adv  top block I/O activity by process
--top-cpu-adv  most expensive CPU process
--top-io      most expensive I/O process
--top-latency  process with highest latency

..and many more besides - consult the manual page for dstat to see more.

So, dstat really is a Swiss army knife - a useful tool to get to know and use whenever you need to quickly spot misbehaving processes or devices.


Read more
Colin Ian King

GNU EFI lib and Hello World

Writing UEFI applications takes a little bit of effort to get started, fortunately the GNU EFI library provides enough infrastructure to make it far less painful.  The crunch points are:

1. Understanding which UEFI protocols to use, fortunately these are well documented in the UEFI specification. Unfortunately the specification is a couple of thousand pages long, so be prepared to spend some time working out how it all hangs together.

2. Figuring out how to call the protocols using the GNU EFI wrapper shim.

3. Generating a UEFI PE COFF image using some objcopy runes.

So, the best way to start was to write a "Hello World" program and to work up from there.  First, I installed the gnu-efi development library using:

sudo apt-get install gnu-efi

Next I hacked up the classic "Hello World", this can be found here.  Then considerable amount of Googling was required to figure out how to use objcopy to help create the desired executable image.  The Makefile containing these runes can be found here.

So what next?  Well, this little adventure is my first step in a quest to write a bunch of tests so ensure UEFI firmware has functional protocols to allow grub to load a kernel.  I was hoping to re-use the firmware test suite framework but it may take some effort since I don't have any POSIX support on UEFI, so I'm going to have to scope out how much effort is required to port the framework to UEFI.

Anyhow, it's a start.


Read more
Colin Ian King

The ext4 file system has a bunch of per-device /sys entries in /sys/fs/ext4/ that can be used to inspect and change ext4 tuning parameters.   One of the read-only values available is lifetime_write_kbytes which shows the number of kilobytes of data written to the file system since it was created.   

For example, to see how much data was written on an ext4 filesystem on /dev/sda5, one uses:

cat /sys/fs/ext4/sda5/lifetime_write_kbytes

To see how much data in has been written in kilobytes since mount time, read  session_write_kbytes, for example:

cat /sys/fs/ext4/sda5/session_write_kbytes

For a full description of all the tunables, consult Documentation/ABI/testing/sysfs-fs-ext4 in the Linux source.


Read more
Colin Ian King

The hwloc (hardware locality) package contains the useful tool lstopo. To install use:

sudo apt-get install hwloc

By default, lstopo will display a logical view of the system caches and CPU cores, for example:


To get a non-graphical output use:

lstopo -

Machine (1820MB) + Socket #0 + L3 #0 (3072KB)
  L2 #0 (256KB) + L1 #0 (32KB) + Core #0
    PU #0 (phys=0)
    PU #1 (phys=2)
  L2 #1 (256KB) + L1 #1 (32KB) + Core #1
    PU #2 (phys=1)
    PU #3 (phys=3)

lstopo is also able to output the toplogy image in a variety of formats (Xfig, PDF, Postscript, PNG, SVG and XML) by specifying the output filename and extension, e.g.

lstopo topology.pdf

For more information, consult the manual for hwloc and lstopo.


Read more
Colin Ian King

Making sense of PCIe ASPM

PCI Express based serial linked devices can be managed by Active State Power Management (ASPM) to extend battery life on mobile devices such as laptops and netbooks.  ASPM is a power management protocol that allows an operating system's power management to place the link physical layer into a low power mode and it has the ability to instruct other devices on the link to go into a lower power mode too.

The plus side is that we save power with ASPM, however, it will introduce some latency as the bus needs time to be woken up when in a low power state.
The PCIe specification (version 2.0) defines two power modes:

  • L0s, which set low power mode in in direction on the link (usually from the physical link layer controller downstream)
  • L1, which sets low power mode in both directions on the link, however there is greater wakeup latency.
The Linux ASPM driver implements the nitty-gritty details of ASPM - by default it reads the ASPM config from the PCI configuration space. The ASPM config is contained in one of the Capabilities List items, the head of this is pointed to by CapPntr (offset 0x34 in the Configuration Header).

However, the firmware (BIOS) may have misconfigured the PCI configuration space, and apparently this can cause issues, such as hangs if the link is put into low power and the device cannot handle this.   I managed to dig up some information concerning the ASPM implementation on Windows Vista and this states that the BIOS can indicate that ASPM should be disabled by setting the PCIe ASPM Controls bit in the IACP_BOOT_ARCH flag in the Fixed ACPI Description Table (FADT).
Poking around a bit deeper, it seems that this is a requirement in the ACPI specification to set this bit appropriately.   Version 4.0a of the ACPI specification, Table 5-11 Fixed ACPI Description Table Boot Architecture Flags states for following for bit 4 (PCIe ASPM Controls) of the BOOT_ARCH flag:

"If set, indicates to OSPM that it must not enable OSPM ASPM control on this platform."

Linux checks for this in acpi_pci_init() and will disable ASPM if the bit 4 is set.  But what happens if the BIOS is misconfigured, how can one disable ASPM?
Well, Linux does provide some ASPM driver kernel parameters to allow some level of tweakability.  The following kernel parameters can be used:

  • "pcie_aspm=off" - disables ASPM
  • "pcie_aspm=default" - use default firmware configuration as set in the PCI Express Capabalities list item with ID 0x10
  • "pcie_aspm=performance"  - disables ASPM and clock power management
  • "pcie_aspm=powersave" - highest power saving mode, enable ASPM and clock power management
"pcie_aspm=off" has seemed to help some users with some PCIe devices that case Linux hang at boot time.  I deeply suspect that the IACP_BOOT_ARCH flag may be telling Linux to do this but it's being ignored.

I measured "pcie_aspm=powersave" on one of my netbooks and I believe I get a small amount of power saving, powertop seemed to indicate about 0.2 Watts being saved, but this could be within the margin of error in my measurements.

Anyhow, if you want to tinker with more power savings, maybe overriding the firmware configured defaults with "pcie_aspm=powersave" may help. It's worth a try.
 
p { margin-bottom: 0.28cm; }


Read more
Colin Ian King

Firmware Test Suite for Ubuntu 11.04

The  Firmware Test Suite (fwts) is a tool I've been working on over the past few months that does automated testing of PC firmware.  The main aim is to check for errors in BIOS code and ACPI tables.  I have put most effort into ACPI testing as my analysis shows that it constitutes over 90% of firmware related errors reported against the kernel in LaunchPad.

Most of the key features for Ubuntu 11.04 are now in the Firmware Test Suite, so now seems an appropriate time to mention them.

Automated ACPI Method Testing.

The ACPI DSDT and SSDT tables contain ACPI Machine Language (AML) bytecode.  This code is executed by the Linux kernel by the ACPI driver and is an abstract interface between the operating system and the underlying machine hardware, such as the embedded controller, the closed proprietary BIOS and various hardware registers.   The AML can describe hardware layout and contains methods that operate in a manner that should conform to the ACPI specification.   For example, there are methods to handle power management,  thermal control, battery status and much more besides.   The main problem we face is when these implementations are buggy and may not conform to the specification.   The kernel can execute the AML code and so if the AML is broken you may get a machine that does not work correctly.

I've added a method test to fwts that will extract the AML byte code from the DSDT and then using the ACPICA execution engine run these in user space and check methods are returning the expected return types and in some cases (where appropriate) check expected return values.  Over 70 standard methods described in version 4.0a of the ACPI specification are sanity checked.  The code also checks mutex acquire/release counts to see if methods are incorrectly not releasing locks.

To run these test using 11.04 fwts, use:


sudo fwts method

..and the results are appended to the file results.log.  (One has to use sudo to be able to read the ACPI tables).

Another way to sanity check the AML code is to extract it, dis-assemble and then re-assemble it using the ACPICA iasl compiler.  A lot of AML is compiled using the Microsoft AML compiler and this appears to be less strict. Hence, recompiling with a more strict compiler can pick up errors.   The syntaxcheck test existed in fwts for Ubuntu 10.10, but I've re-worked this by integrating in the ACPICA core into the tool and then enhancing the error reporting.  For most error messages produced by the compiler fwts now also outputs 8 lines of disassembled code around the error (to provide context) and also outputs extra feedback advice.  This advice is based on looking at all the reasons why this error can be emitted by the compiler and explaining these with more context.

To run this test, use:

sudo fwts syntaxcheck

..again, output is appended to results.log

Since the ACPICA core is integrated in, we can also use this to easily dump out the disassembled AML byte code from all the ACPI tables.  To do so use:

sudo fwts --disassemble-aml

and the AML code for each table is dumped into files DSDT.dsl and SSDTn.dsl (where n is the Nth SSDT).  This is much easier than running acpixtract, acpidump and the iasl -d on the DSDT and SSDT tables.

The final ACPI table test added to fwts in Ubuntu 11.04 performs about 40 sanity checks on configuration data in common ACPI tables.  To run this test use:

sudo fwts acpitables
 
..and this checks various configurations, e.g. of HPET, SCI interrupt, GPE blocks, reset register + reset values, APIC flags, etc.

Although it's strictly not a test, fwts now includes a tool that will dump out the ACPI tables in an annotated form so that one can examine the configuration settings in a more human readable form than just looking at the raw data in a hex editor. To do use:

sudo fwts acpidump

UEFI

The Firmware Test Suite also needs to work on UEFI firmware based machines which are now becoming far more common.  I've re-worked some of the underlying code in fwts to work with UEFI and standard BIOS.  My next step is to extend fwts to add in more UEFI only tests, but this is still "work-in-progress".

Other goodness

As it is an evolving tool, fwts includes a lot more refinements (and bug fixes) compared the first cut that came out in Ubuntu 10.10.   The log format to be less verbose and there is the new "-" option that dumps the log straight to stdout rather than appending it to a log file.

To see all the tests built into fwts, you can use the "--show-tests-full" option, rather than the more terse option "--show-tests".  

Originally fwts ran silently and this caused some confusion as it was not obvious what was happening.  For Ubuntu 11.04  fwts now runs in verbose mode and reports a brief per test progress information.  The silent mode can be enabled using the "-q" or "--quiet" options.

One can also run the tool on dumped ACPI tables from other machines allowing to remotely diagnose bugs just from raw table data.  Dump the tables using:

sudo fwts --dump

..this produces a dump of kernel log, dmidecode, lspci and the ACPI tables. One can read in the ACPI tables into fwts and run appropriate ACPI specific tests on this data, e.g.

fwts --dumpfile=acpidump.log syntaxcheck method acpitables

There are many more refinements and minor new features.  I recommend reading the manual page to get more familiar with the tool.

To install fwts in Ubuntu 11.04 use:

sudo apt-get install fwts

Enjoy!

[ Update ]

I've written a fwts reference guide, available here: https://wiki.ubuntu.com/Kernel/Reference/fwts


Read more
Colin Ian King

Shutting down a PC is normally performed by transitioning the machine into an ACPI S5 state which is achieved by writing some magic value into an ACPI PM control register.  Implementing this is very straight forward once one has navigated a couple of rather information dense documents.

Armed with a copy of an Intel ICH manual one see that section 8.8.3.3 the PM1_CNT (Power Management 1 Control) register is the direct implementation of the ACPI PM1a_CNT_BLK register as described in section 4.7.3.2.1 of version 4.0 of the ACPI specification.


Bits 10:12 control the sleep state type (SLP_TYP), and setting these to 111 selects soft power off. To signal this, one has to set bit 13 to transition into the required sleep state.

On an Intel system, finding the PM1_CNT register is simple, use:

cat /proc/ioports | grep PM1a_CNT_BLK

..on my Dell 1525 this results in:

1004-1005 : ACPI PM1a_CNT_BLK

..so a soft shutdown can be invoked running the following code with root privileges:

#include <unistd.h>
#include <sys/io.h>
#include <stdint.h>
#include <stdio.h>

/* PM1 Sleep types */
#define SLP_ON       0
#define SLP_ST_PCLK  1
#define SLP_S3       5
#define SLP_S5       6
#define SLP_SOFT_OFF 7

int main(int argc, char **argv)
{
        uint16_t val;

        if (ioperm(0x1004, 2, 1) < 0) {
                printf("Cannot access port 0x1004\n");
                exit(0);
        }
        val = inw(0x1004);
        val &= ~(7 << 10); /* Clear SLP_TYPE */
        val |= (SLP_SOFT_OFF << 10); /* Soft power off */
        val |= (1 << 13);  /* Trigger SLP_EN */
        outw(val, 0x1004);
}

..this is akin to the Vulcan death grip, so make sure that one has sync'd and unmounted file systems first! It's quite impressive to see how quickly can shutdown a machine using this method.

Of course, more rugged implementation would be to determine the PM1a_CNT_BLK register from the Fixed ACPI Description Table (FADT), but happily for us Linux reports this register in /proc/ioports, so it's simple to do from user space.


Read more
Colin Ian King

Int 0x15 e820 and libx86

One of the more obscure libraries I've looked at recently is libx86 which provides some x86 real mode functionality to do tricks such as BIOS calls from user space. The library provides routines to do real mode malloc/free and also a structure to pass/return CPU register to/from an Interrupt call. One uses:

sudo apt-get install libx86-dev

..to install the libx86 devel library.

I experimented with libx86 by using the Int 0x15 AX=0xe820 call to query the system address map. One loads EAX with 0xe820, EBX initially with zero (for the first address map structure), ES:DI with a pointer a 20 byte real mode buffer, ECX with 20 (size of the buffer) and EDX with the signature 'SMAP'. If the carry is set or EBX is zero after the call, then either an error occurred or one has reached the end of the table. After the call, EBX contains a continuation value that is passed back in EBX on the next call. The 20 byte real mode buffer is populated as follows:

 uint32     BaseAddrLo   Lo 32 Bits of Base Address
 uint32     BaseAddrHi   Hi 32 Bits of Base Address
 uint32     LengthLo     Lo 32 Bits of Length in Bytes
 uint32     LengthHi     Hi 32 Bits of Length in Bytes
 uint32     Type         Address type 
 
My implementation to dump out the entire system address map can be found here. Compile and link with libx86 and then run the executable using sudo:

sudo ./e820
Base Address Length Type
0000000000000000 000000000009fc00 RAM
000000000009fc00 0000000000000400 Reserved
00000000000e0000 0000000000020000 Reserved
0000000000100000 000000003f6b0000 RAM
000000003f7b0000 000000000000e000 ACPI Reclaim
000000003f7be000 0000000000032000 ACPI NVS memory
000000003f7f0000 0000000000010000 Reserved
00000000fee00000 0000000000001000 Reserved
00000000fff00000 0000000000100000 Reserved

Anyhow, I hope this example shows how to do BIOS interrupts using libx86 from userspace in Linux.


Read more
Colin Ian King

Video Mode Interrogation via BIOS calls.

While investigating the video modes on my laptop to do some sanity checking I thought I'd have a go at using libx86 to allow me to call into the BIOS directly from userspace.  After short consultation with the trusty Ralph Brown Interrupt List, I hacked up some code to get the list of all the video modes and interrogate each one to get the resolution.

I was getting random crashes before I realised that clearing the LRMI_regs struct before populating it with the desired x86 register settings fixed the problem.

Anyhow, the code is available in my git repository here - and it requires one to install libx86-dev and run with root privileges.   I'm still amazed that the code can jump from userspace into the BIOS, gather data and return without totally messing up the system.

Interestingly, a lot of UEFI systems still provide a legacy mode BIOS, so this code still works on these newer systems.


Read more
Colin Ian King

Resetting a PC using the Reset Control Register

Last night I was reading a data sheet about the ICH10 I/O Controller Hub (Section 13.7.5) and got a little more insight on the workings of the Reset Control Register (port 0xcf9).

Bits 1 and 3 determine the type of reset being requested and bit 2 initiates the reset.   When bit 2 (SYS_RST) transitions from 0 to 1 a reset is initiated as determined by the policy of bits 1 and 3.

Bit 1, System Reset (SYS_RST) determines a soft reset (0) or hard reset (1).
Bit 3, Full Reset (FULL_RST) if set to 1 causes a full power cycle.

On some systems, the ACPI FACP RESET_REG and RESET_VALUE are set 0xcf9 and 0x06 respectively which essentially triggers a hard system reset when doing a reboot using the reboot=acpi kernel option.

The kernel also has a reboot=pci option that will force a reset via the Reset Control Register and does this in three stages. First it sets bit 1 (SYS_RST=hard reset), waits 50 microseconds and then transitions SYS_RST from 0 to 1 to initiate the reset. 

However, a full system reset can be initiated by also setting bit three by writing 0x0e to port 0xcf9.  I've noticed that some newer laptops on the market seem to be doing a full system reset on reboot, so I wonder if this is the mechanism they are using nowadays.

The beauty of Linux is that one can string a bunch of commands together to do this from user space:

echo -e '\xe' | sudo dd of=/dev/port bs=1 seek=3321

..so make sure you data is sync'd before hand as this is destructive as power cycling the machine while it's on.


Read more