Canonical Voices

Posts tagged with 'cpu'

Colin Ian King

Linaro's idlestat is another useful tool in the arsenal of CPU monitoring utilities. Idlestat monitors and captures CPU C-state and P-state transitions using the kernel Ftrace tracer and outputs statistics based on entering/exiting each state for each CPU. Idlestat also captures IRQ activity, including which IRQs caused a CPU to exit an idle state - knowing why a processor came out of a deep C-state is always a very useful way to help diagnose power consumption issues.

Using idlestat is easy. To capture 20 seconds of activity into a log file called example.log, run:

 sudo idlestat --trace -f example.log -t 20

This will display the per-CPU C-state, P-state and IRQ statistics for that run.

One can also take the saved log file and parse it again to recalculate the statistics using:

 idlestat --import -f example.log

One can get the source from here and I've packaged version 0.3 (plus a bunch of minor fixes that will land in 0.4) for Ubuntu 14.10 Utopic Unicorn.

Colin Ian King

Keeping cool with thermald

The push for higher performance desktops and laptops has inevitably led to higher power dissipation. Laptops have also shrunk in size, leading to increasing problems with removing excess heat and with thermal overrun on heavily loaded high-end machines.

Intel's thermald prevents machines from overheating and was recently introduced in the Ubuntu Trusty 14.04 LTS release. Thermald actively monitors thermal sensors and will attempt to keep the hardware cool by modifying a variety of cooling controls:

* Active or passive cooling devices as presented in sysfs
* The Running Average Power Limit (RAPL) driver (Sandy Bridge upwards)
* The Intel P-state CPU frequency driver (Sandy Bridge upwards)
* The Intel PowerClamp driver

Thermald has been found to be especially useful when using the Intel P-state CPU frequency scaling driver since this can push the CPU harder than other CPU frequency scaling drivers.

Over the past several weeks I've been working with Intel to shake out some final bugs and get thermald included in Ubuntu 14.04 LTS, so kudos to Srinivas Pandruvada for handling my patches and for providing a lot of timely fixes too.

By default, thermald works without any need for configuration. However, if one has incorrect thermal trip settings or other firmware-related thermal zone bugs, one can write one's own thermald configuration.

For further details, consult the Ubuntu thermald wiki page.

Colin Ian King

Simple performance test of rdrand

My new Lenovo X230 laptop is equipped with an Intel(R) i5-3210M CPU (2.5 GHz, with 3.1 GHz Turbo) which supports the new Digital Random Number Generator (DRNG) - a high performance entropy and random number generator. The DRNG is read using the new Intel rdrand instruction, which can return 64, 32 or 16 bit random numbers.

The DRNG is described in detail in this article, which provides very useful code examples in assembler and C. I used these to write a simple and naive test to see how well rdrand performs on my i5-3210M.

For my test, I simply read 100 million 64 bit random numbers on a single thread. The Intel literature states one can get up to about 70 million rdrand invocations per second on 8 threads, so my simple test is rather naive as it only exercises rdrand on one thread. For a set of 10 iterations of my test, I'm getting around 40-45 nanoseconds per rdrand, or about 22-25 million rdrands per second, which is really impressive. The test is a mix of assembler and C, and is not totally optimal, so I am sure I can squeeze a little more performance out with some extra work.
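
The conversion between the two figures quoted above can be checked with a minimal Python sketch. The function name is made up for illustration, and this is not the assembler and C test described in the post:

```python
def calls_per_second(ns_per_call: float) -> float:
    """Convert a per-call latency in nanoseconds into calls per second."""
    return 1e9 / ns_per_call


# The two ends of the measured range quoted in the post.
for ns in (40.0, 45.0):
    rate = calls_per_second(ns)
    print(f"{ns:.0f} ns per rdrand -> {rate / 1e6:.1f} million rdrands per second")
```

This gives 25.0 million per second at 40 ns and 22.2 million per second at 45 ns, which matches the 22-25 million range above.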

The next test, I suspect, is to see just how random the data is and how well it compares to other software random number generators... but I will tinker with that after my vacation.

Anyhow, for reference, the test can be found here in my git repository.

Colin Ian King

The hwloc (hardware locality) package contains the useful tool lstopo. To install use:

sudo apt-get install hwloc

By default, lstopo will display a graphical, logical view of the system caches and CPU cores.


To get a non-graphical output use:

lstopo -

Machine (1820MB) + Socket #0 + L3 #0 (3072KB)
  L2 #0 (256KB) + L1 #0 (32KB) + Core #0
    PU #0 (phys=0)
    PU #1 (phys=2)
  L2 #1 (256KB) + L1 #1 (32KB) + Core #1
    PU #2 (phys=1)
    PU #3 (phys=3)
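
If one wants to post-process this text output, a small parser is enough. The following is a minimal Python sketch that assumes the exact line format shown above (other hwloc versions may print the topology differently); the function name is hypothetical:

```python
import re

# The lstopo text output shown above, copied verbatim.
SAMPLE = """\
Machine (1820MB) + Socket #0 + L3 #0 (3072KB)
  L2 #0 (256KB) + L1 #0 (32KB) + Core #0
    PU #0 (phys=0)
    PU #1 (phys=2)
  L2 #1 (256KB) + L1 #1 (32KB) + Core #1
    PU #2 (phys=1)
    PU #3 (phys=3)
"""


def pus_per_core(text):
    """Map each core number to the list of physical PU numbers listed under it."""
    cores = {}
    current = None
    for line in text.splitlines():
        core = re.search(r"Core #(\d+)", line)
        if core:
            current = int(core.group(1))
            cores[current] = []
            continue
        pu = re.search(r"PU #(\d+) \(phys=(\d+)\)", line)
        if pu and current is not None:
            cores[current].append(int(pu.group(2)))
    return cores


print(pus_per_core(SAMPLE))
```

For the sample above this prints {0: [0, 2], 1: [1, 3]}, i.e. each core carries two hyperthreads.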

lstopo is also able to output the topology image in a variety of formats (Xfig, PDF, PostScript, PNG, SVG and XML) by specifying the output filename and extension, e.g.

lstopo topology.pdf

For more information, consult the manual for hwloc and lstopo.


Colin Ian King

Reading the TSC from userspace

Recently I've been poking around looking at some Time Stamp Counter (TSC) anomalies when coming out of suspend. So what is the TSC? It's a 64 bit high resolution tick counter found on x86 processors (since the Pentium) and can be read using the rdtsc instruction.

It is intended to be a fast method of getting a high resolution timer. However, it is known to be problematic on multi-core and hyperthreaded CPUs - one needs to be locked to one CPU to get reliable results since the TSC may be different on each CPU. It is also known to reset when coming out of suspend, which means time can look like it goes backwards in a huge jump.
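
Locking a process to one CPU can be done from userspace via the scheduler affinity calls. Below is a minimal Python sketch, assuming a Linux system (os.sched_setaffinity is only available there); the helper name is made up for illustration and it does not itself read the TSC:

```python
import os


def pin_to_one_cpu():
    """Pin the calling process to a single CPU and return that CPU's number.

    Returns None where the scheduler-affinity calls are unavailable
    (they are Linux-specific in Python's os module).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    cpu = min(os.sched_getaffinity(0))  # lowest-numbered CPU we may run on
    os.sched_setaffinity(0, {cpu})      # 0 means "the calling process"
    return cpu


cpu = pin_to_one_cpu()
print("pinned to CPU", cpu)
```

Any timing done after this call runs on one CPU only, so successive readings come from the same counter.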

If the CPU speed is changed then the TSC rate can change too. If you have a more recent Intel CPU where the constant_tsc flag is set (see /proc/cpuinfo) then the TSC will run at a constant rate no matter the CPU speed - but this means that benchmarking with the constant TSC may make programs look like they use more CPU cycles than in reality!
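
Checking for the constant_tsc flag can be scripted. This is a minimal Python sketch with hypothetical helper names; the sample text is a made-up excerpt, not output from a real machine:

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Return the flags listed on the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_constant_tsc(cpuinfo_text: str) -> bool:
    """True if the constant_tsc flag appears in the flags line."""
    return "constant_tsc" in cpu_flags(cpuinfo_text)


# Illustrative, made-up excerpt rather than output from a real machine.
SAMPLE = "processor\t: 0\nflags\t\t: fpu vme tsc msr constant_tsc\n"
print(has_constant_tsc(SAMPLE))

# On a Linux machine one could instead check the real file:
# with open("/proc/cpuinfo") as f:
#     print(has_constant_tsc(f.read()))
```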

Anyhow, getting the 64 bit TSC value is a simple case of using the rdtsc instruction. I've got some example code to do this here with the necessary inlined assembler magic to handle this correctly for 32 and 64 bit builds.


Colin Ian King

Atom Z530 identity crisis

Last week I peeked at /proc/cpuinfo on an Atom Z530 netbook and got the following model name information:

model name : Intel(R) Core (TM) CPU Z530 @ 1.60GHz

Is the kernel mistaken? It's not a Core CPU, it's an Atom! In fact it is the Z530 that is mistaken. If one examines page 29 of http://download.intel.com/design/processor/specupdt/319536.pdf one will see erratum AAE29:

"AAE29 CPUID Instruction Returns Incorrect Brand String

When a CPUID instruction is executed with EAX = 80000002H, 80000003H and 80000004H on an Intel® Atom(TM) processor, the return value contains the brand string Intel(R) Core(TM)2 CPU when it should have Intel(R) Atom(TM) CPU."

Doh! That's a rather poor mistake in the silicon.
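
For context, each of the three CPUID leaves named in the erratum returns 16 bytes of the brand string in EAX, EBX, ECX and EDX, giving 48 bytes in total. Here is a minimal Python sketch of how those registers are turned into text. The register values are built from the model name shown earlier purely for illustration and were not read from real hardware; both helper names are hypothetical:

```python
import struct

# The CPUID leaves that carry the brand string, as named in the erratum.
LEAVES = (0x80000002, 0x80000003, 0x80000004)


def decode_brand_string(leaves):
    """leaves: three (eax, ebx, ecx, edx) tuples, one per CPUID leaf in LEAVES order."""
    raw = b"".join(struct.pack("<4I", *regs) for regs in leaves)
    return raw.split(b"\0", 1)[0].decode("ascii").strip()


def encode_brand_string(text):
    """Inverse helper, used only to build illustrative register values."""
    raw = text.encode("ascii").ljust(48, b"\0")[:48]
    return [struct.unpack_from("<4I", raw, i * 16) for i in range(len(LEAVES))]


regs = encode_brand_string("Intel(R) Core (TM) CPU Z530 @ 1.60GHz")
for leaf, values in zip(LEAVES, regs):
    print(f"leaf {leaf:#x}:", " ".join(f"{v:08x}" for v in values))
print(decode_brand_string(regs))
```

The kernel simply reports whatever string the processor returns, which is why the wrong name shows up in /proc/cpuinfo.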

Apparently this affects Intel® Atom(TM) processors Z550, Z540, Z530, Z520, Z515, Z510, and Z500 on 45-nm process technology. It is fixable with a microcode fix, which normally involves getting a BIOS upgrade.

The errata make interesting reading - especially errata AAE44 and AAE46 - for older kernels I suggest booting with kernel boot option mem=nopentium to work around any bizarre kernel oopses caused by these particular processor bugs. In fact, I recommend this for any Atom processor as these bugs seem to also apply to the Atom Nxxx series too.


Colin Ian King

Installing Intel Microcode Updates

It is not unknown for personal computers to be shipped with subtle and rarely occurring bugs in the x86 processor. Normally these bugs can be fixed with microcode updates that get loaded by the BIOS at boot time. However, re-flashing a BIOS may be deemed too risky (since it can brick a machine) or perhaps one can no longer get BIOS updates for an older machine.


This is where the Intel microcode updates come in useful. To install these on Ubuntu use:

sudo apt-get install intel-microcode

These may then fix subtle bugs, so it's always worth a try when you see strange processor related issues such as inexplicable memory related oopses.

The caveat is that the microcode is loaded late in boot time, so you may not be able to work around bugs in the early boot phase. For example, when coming out of hibernate you may hit a processor related bug that's fixed with the microcode update - however, the microcode is loaded late in the resume from hibernate phase, so the bug cannot be fixed this way.


Colin Ian King

Sensors reporting hot CPU in Maverick

My clunky old Lenovo has an Intel(R) Core(TM) Duo CPU (model 15). In Lucid 10.04 coretemp assumed its TjMax was 85 degrees C, but now in Maverick 10.10 it believes TjMax is 100 degrees C.

Kernel commit a321cedb12904114e2ba5041a3673ca24deb09c9 attempts to get TjMax from MSR 0x1a2. If it fails to read this MSR it defaults TjMax to 100 degrees C for CPU models 14, 15, 22 and 26, and one will see the following warning message:

[ 9.650025] coretemp coretemp.0: TjMax is assumed as 100 C!
[ 9.650322] coretemp coretemp.1: TjMax is assumed as 100 C!

For CPU models 23 and 28 (Atoms) TjMax will be 90 or 100 depending on whether it's a nettop or a netbook. Otherwise the patch will default TjMax to 100 degrees C.

One can check the value of TjMax (reported in millidegrees C) using:

cat /sys/devices/platform/coretemp*/temp1_crit

Coretemp calculates the core temperature of the CPU by subtracting the thermal status from TjMax. Since the default has been increased from 85 to 100 degrees between Lucid and Maverick, the apparent core temperature now reads 15 degrees higher.
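
The effect of the changed default can be shown with a minimal Python sketch of the subtraction described above. The thermal status value of 40 is hypothetical and chosen only for illustration; the function name is made up:

```python
def core_temp_c(tjmax_c: int, thermal_status: int) -> int:
    """Core temperature as described above: TjMax minus the thermal status readout."""
    return tjmax_c - thermal_status


readout = 40  # hypothetical thermal status value, identical in both cases
lucid = core_temp_c(85, readout)      # Lucid's assumed TjMax
maverick = core_temp_c(100, readout)  # Maverick's assumed TjMax
print(lucid, maverick, maverick - lucid)
```

This prints 45 60 15: the same hardware readout appears 15 degrees hotter purely because of the assumed TjMax.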

Now, if my machine really was running 15 degrees hotter between Lucid and Maverick I would see more power consumption. I checked the power consumption for Lucid and Maverick kernels on my Lenovo in idle and fully loaded CPU states with a power meter and observed that Maverick uses less power, so that's encouraging.

As for the correct value, why did the default change? Well, from what I can understand from several forums that discuss the setting of TjMax, it is not well documented and not disclosed by Intel, hence the values are rule-of-thumb guesswork.

So, the bottom line is that if your CPU appears to run hotter according to the core temperature readings after moving from Lucid 10.04 to Maverick 10.10, first check to see if TjMax has changed on your hardware.


Colin Ian King

Hot Laptop

My Lenovo 3000N200 laptop has been playing me up. When I've been fully loading the processor or driving video hard it's been shutting down because of overheating. I suspect periodic SMIs are detecting an overheated CPU and the BIOS just stops the machine to avoid it turning into toast.

Suspecting that the latest 2.6.35 Maverick kernel was the cause, I booted with a 2.6.32 Lucid kernel, but that didn't help, so it didn't look like an obvious kernel regression.

Well, perhaps it's getting old and cranky - it's nearly 3 years old. Perhaps the thermal paste between the CPU and the heatsink is not working like it should. Since it was most probably a hardware issue I downloaded the service manual and got out the trusty screwdriver and opened it up. Lo and behold 5mm of dust had accumulated over the fan grill which wasn't going to help the poor machine offload all that heat out of the laptop case. I removed the fan, gave it a good clean and removed all the dust from the fan outlet grill.

After reassembly the laptop was good as new. Instead of rebooting at 95+ degrees Celsius the Lenovo now runs happily.

The moral of the story is that I should regularly service the fans on my machines. Cooking the CPU is something I would like to avoid in the future.


Colin Ian King

There are times when I drive my laptop CPU really hard, for example compressing gigabytes of data or running QEMU, and it would be useful to see how hot my processor is actually getting. This is where sensors-applet is useful - it has the ability to show the core temperature of the CPU and HDD if one has the appropriate hardware sensors and drivers installed. However, getting it working requires a little bit of hand-holding.

Firstly, install sensors-applet using:

sudo apt-get install sensors-applet

This will also install the lm-sensors tools.

Next, one needs to probe the hardware to find the appropriate drivers required to be able to sense CPU and HDD temperatures. To do this use:

sudo sensors-detect

This will ask you if you want to probe and scan various I2C, PCI and SMBus adaptors, so answer the probing questions with respect to the hardware you have in your machine. On my machine I answered "YES" to every question; your mileage may vary.

At the end of the probing, sensors-detect will print out some lines that you need to add to /etc/modules. Using sudo, edit /etc/modules and add these lines. Then reboot your machine.

Once you are logged in again, right click on the top GNOME panel, select "Add to Panel...", then scroll down and select the "Hardware Sensors Monitor". Once it's added to the panel, right click on it and select "Preferences". On the Sensors Applet Preferences panel, select the "Sensors" tab and then select the appropriate CPU and HDD devices to monitor.


Once this is done, you will hopefully be able to see your CPU and HDD temperatures rise and fall as you work on your machine.


From the command line one can also get the current sensor data using the sensors command, e.g.:

$ sensors
coretemp-isa-0000
Adapter: ISA adapter
Core 0: +58.0°C (high = +85.0°C, crit = +85.0°C)

coretemp-isa-0001
Adapter: ISA adapter
Core 1: +57.0°C (high = +85.0°C, crit = +85.0°C)

Hopefully I won't see my CPU get to 85 degrees C, but now at least I can keep my eye on how hot it's getting :-)
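
To keep an eye on these numbers from a script, the sensors output can be parsed. This is a minimal Python sketch that assumes the coretemp line format shown above (other chips and lm-sensors versions may format lines differently); the function name is hypothetical:

```python
import re

# The sensors output shown above, copied verbatim.
SAMPLE = """\
coretemp-isa-0000
Adapter: ISA adapter
Core 0: +58.0°C (high = +85.0°C, crit = +85.0°C)

coretemp-isa-0001
Adapter: ISA adapter
Core 1: +57.0°C (high = +85.0°C, crit = +85.0°C)
"""

# Captures the core number, current temperature and critical temperature.
LINE = re.compile(r"^Core (\d+):\s*\+?(-?[\d.]+)°C.*crit = \+?(-?[\d.]+)°C", re.M)


def core_temps(text):
    """Return {core: (current_c, critical_c)} parsed from sensors output."""
    return {int(c): (float(t), float(crit)) for c, t, crit in LINE.findall(text)}


temps = core_temps(SAMPLE)
for core, (now, crit) in sorted(temps.items()):
    print(f"Core {core}: {now:.1f} C, {crit - now:.1f} C below critical")
```

For the sample above this reports Core 0 at 27.0 C below critical and Core 1 at 28.0 C below critical.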
