Canonical Voices

Posts tagged with 'arm'

abeato

Occasionally I find myself processing input data that arrives as a stream, like data from files or from a socket, but that has a known structure that can be modeled with C types. For instance, let’s say we are receiving from a socket a parcel that consists of a one-byte header and an integer payload. A naive way to handle this is the following (simplified for readability) code snippet:

#include <stdlib.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define BUFF_SIZE 128   /* arbitrary size for the example */

int main(void)
{
    int fd;
    char *buff;
    struct sockaddr_in addr;
    int vint;
    char vchar;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    buff = malloc(BUFF_SIZE);
    /* Init socket address */
    ...
    connect(fd, (struct sockaddr *) &addr, sizeof(addr));

    read(fd, buff, BUFF_SIZE);

    vchar = buff[0];
    vint  = *(int *) &buff[1];
    /* Do something with extracted data, free resources */
    ...
    return 0;
}

Here we get the raw data with a read() call, read the first byte, and then read an integer by taking a pointer to the second byte and casting it to a pointer to an integer. (For this example we are assuming that the integer inserted in the stream has the same size and endianness as the CPU’s.)

There is a big issue with this: the cast to int *, which is undefined behavior according to the C standard [1]. Things can go wrong in at least two ways: first due to pointer aliasing rules, second due to type alignment.

The strict aliasing rule lets the compiler assume that pointers to different types point to different places in memory. This allows some optimizations, like reordering. Therefore, we could be in trouble if, say, we take &buff[1] into a char * and use it to write a byte in that location, as reordering could hit us. So just do not do that. Let’s also hope that we have a compiler that is not completely insane and does not move our read through the int pointer before the read() system call. We could also disable strict aliasing in GCC with the option -fno-strict-aliasing, which, by the way, is something the Linux kernel does. At any rate, this is a complex subject and I will not dig into it this time.

In this article we will concentrate on how to solve the other problem: how to safely access types that are not stored in memory at their natural alignment.

The C Standard-Compliant Solution

Before moving further, keep in mind that it is always possible to be strictly compliant with the standard and access memory safely, without breaking language rules or using compiler- or machine-specific tricks. In the example, we could retrieve vint by doing

    /* Mask with 0xff so a signed char cannot sign-extend. */
    vint  =   (buff[1] & 0xff)
            + ((buff[2] & 0xff) << 8)
            + ((buff[3] & 0xff) << 16)
            + ((unsigned) (buff[4] & 0xff) << 24);

(supposing stored data is little endian).

The issue here is performance: we are implicitly converting four bytes to integers (masking them so that a signed char cannot sign-extend), then bit-shifting three of them, and finally adding everything up. Note, however, that this byte-by-byte assembly is exactly what we need anyway when the data and the CPU have different endianness.
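
If, say, the stream were big-endian instead, the same technique works: we only change which byte lands in which position. A hedged sketch of that variant:

    /* Big-endian stream: most significant byte comes first. */
    vint  =   ((unsigned) (buff[1] & 0xff) << 24)
            + ((buff[2] & 0xff) << 16)
            + ((buff[3] & 0xff) << 8)
            +  (buff[4] & 0xff);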

Doing Unaligned Memory Accesses

In all machine architectures there is a natural alignment for the different data types. This alignment is usually the size of the type: for instance, in 32-bit architectures the alignment for integers is 4, for doubles it is 8, and so on. If instances of these types are not stored at memory addresses that are multiples of their alignment, we are talking about unaligned access. If we try to access unaligned data, either of these things can happen:

  • The hardware lets us access it – but always at a performance penalty.
  • An exception is triggered by the CPU. This type of exception is called a bus error [2].

We might be willing to accept the performance penalty [3], which is mitigated by CPU caches and not that noticeable on certain architectures like x86-64, but we certainly do not want our program to crash. How likely is that, though? To be honest, it is not something I have seen that often. Therefore, as a first analysis step, I checked how easy it was to get bus errors. To do so, I created the following C++ program, access1.cpp (I could not resist using templates here to reduce the code size):

#include <iostream>
#include <typeinfo>
#include <cstring>

using namespace std;

template <typename T>
void print_unaligned(char *ptr)
{
    T *val = reinterpret_cast<T *>(ptr);

    cout << "Type is \"" << typeid(T).name()
         << "\" with size " << sizeof(T) << endl;
    cout << val << " *val: " << *val << endl;
}

int main(void)
{
    char *mem = new char[128];

    memset(mem, 0, 128);

    print_unaligned<int>(mem);
    print_unaligned<int>(mem + 1);
    print_unaligned<long long>(mem);
    print_unaligned<long long>(mem + 1);
    print_unaligned<long double>(mem);
    print_unaligned<long double>(mem + 1);

    delete[] mem;
    return 0;
}

The program allocates memory using new char[], which, like malloc() in C, is guaranteed to return memory aligned for the strictest fundamental type. After zeroing the memory, we access mem and mem + 1 by casting to different pointer types, knowing that the second address is odd, and therefore unaligned for anything except char * access.

I compiled the file with g++ on my laptop, ran it, and got

$ g++ access1.cpp -o access1
$ file access1
access1: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=09d0fb19340a10941eef4c3dc4d6eb29383e717d, not stripped
$ ./access1
Type is "i" with size 4
0x16c3c20 *val: 0
Type is "i" with size 4
0x16c3c21 *val: 0
Type is "x" with size 8
0x16c3c20 *val: 0
Type is "x" with size 8
0x16c3c21 *val: 0
Type is "e" with size 16
0x16c3c20 *val: 0
Type is "e" with size 16
0x16c3c21 *val: 0

No error on x86-64. This was expected, as the Intel architecture is known to support unaligned access in hardware, at a performance penalty (which is apparently quite small these days, see [4]).

The second try was with an ARM CPU, compiling for 32-bit ARM:

$ g++ access1.cpp -o access1
$ file access1
access1: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=8c3c3e7d77fddd5f95d18dbffe37d67edc716a1c, not stripped
$ ./access1
Type is "i" with size 4
0x47b008 *val: 0
Type is "i" with size 4
0x47b009 *val: 0
Type is "x" with size 8
0x47b008 *val: 0
Type is "x" with size 8
Bus error (core dumped)

Now we get what we were searching for: a legitimate bus error, in this case when accessing a long long at an unaligned address. Commenting out the offending line and letting the program run further showed the same error when accessing a long double at mem + 1.

Fixing Unaligned Memory Accesses

After proving that this could be a real problem, at least for some architectures, I tried to find a solution that would let me do unaligned memory accesses in the most generic way. I could not find anything both safe and efficient that strictly follows the C standard. However, all C/C++ compilers have ways to define packed structures, and that came to the rescue.

Packed structures are intended to minimize the padding introduced by the alignment requirements of the structure members. They are used when minimizing storage is a big concern. What is interesting for us, though, is that their members can be unaligned due to the packing, so dereferencing them must take that into account. Therefore, if we are accessing a type on a CPU that does not support unaligned access for that type, the compiler must synthesize code that handles this transparently from the point of view of the C program.

To test that this worked as expected, I wrote access2.cpp, which uses the GCC __packed__ attribute to define a packed structure:

#include <iostream>
#include <typeinfo>
#include <cstring>

using namespace std;

template <typename T>
struct __attribute__((__packed__)) struct_safe
{
    T val;
};

template <typename T>
void print_unaligned(char *ptr)
{
    struct_safe<T> *safe = reinterpret_cast<struct_safe<T> *>(ptr);

    cout << "Type is \"" << typeid(T).name()
         << "\" with size " << sizeof(T) << endl;
    cout << safe << " safe->val: " << safe->val << endl;
}

int main(void)
{
    char *mem = new char[128];

    memset(mem, 0, 128);

    print_unaligned<int>(mem);
    print_unaligned<int>(mem + 1);
    print_unaligned<long long>(mem);
    print_unaligned<long long>(mem + 1);
    print_unaligned<long double>(mem);
    print_unaligned<long double>(mem + 1);

    delete[] mem;
    return 0;
}

In this case, instead of casting directly to the type, I cast to a pointer to the packed struct and access the type through it.

Compiling and running for x86-64 got the expected result: no error, all worked as before. Then I compiled and ran it in an ARM device:

$ g++ access2.cpp -o access2
$ file access2
access2: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=9a1ee8c2fcd97393a4b53fe12563676d9f2327a3, not stripped
$ ./access2
Type is "i" with size 4
0x391008 safe->val: 0
Type is "i" with size 4
0x391009 safe->val: 0
Type is "x" with size 8
0x391008 safe->val: 0
Type is "x" with size 8
0x391009 safe->val: 0
Type is "e" with size 8
0x391008 safe->val: 0
Type is "e" with size 8
0x391009 safe->val: 0

No bus errors anymore! It worked as expected. To gain some understanding of what was happening behind the curtains, I disassembled the generated ARM binaries. For both access1 and access2, the same instruction was used to fetch the value after the cast to int: LDR, which unsurprisingly loads a 32-bit word into a register. But for the long long, access1 was using LDRD, which loads a double word (8 bytes) from memory, while access2 was using two LDR instructions instead.

This all made a lot of sense, as ARM states that LDR supports access to unaligned data while LDRD does not [5]. Indeed the latter is faster, but it has this restriction. It was also good to check that there was no penalty for using the packed structure for integers: GCC does a good job of discriminating when the CPU really needs special handling for unaligned accesses.

GCC cast-align Warning

GCC has a warning that can help to identify points in the code where we might be accessing unaligned data; it is activated with -Wcast-align. It is not among the warnings enabled by -Wall or -Wextra, so we have to add it explicitly if we want it. The warning is only triggered when compiling for architectures that do not support unaligned access for all types, so you will not see it when compiling only for x86.

When triggered, you will see something like

file.c:28:23: warning: cast increases required alignment of target type [-Wcast-align]
   int *my_int_ptr = (int *) &buf[i];
                     ^

Conclusion

The moral of this post is that you need to be very careful when casting pointers to a type different from the original one [6]. When you need to do that, think about alignment issues first, and also think about your target architectures. There are programs that we want to run on more than one CPU type, and too many times we test only on our reference platform.

Unfortunately the C standard does not give us a standard way of doing efficient access to unaligned data, but most if not all compilers seem to provide ways to do this. If we are using GCC, __attribute__((__packed__)) can help us when we might be doing unaligned accesses. The ARM compiler has a __packed attribute for pointers [7], and I am sure other compilers provide similar machinery. I also recommend activating -Wcast-align if using GCC, which makes it easier to spot alignment issues.
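
To reduce boilerplate, the packed-struct trick can be wrapped in a tiny helper. This is just a sketch following the same approach as access2.cpp; the name get_unaligned is mine, not any standard API:

template <typename T>
struct __attribute__((__packed__)) packed_wrapper
{
    T val;
};

/* Read a T from a possibly unaligned address: the compiler emits safe
   loads where the CPU needs them, and plain loads where it does not. */
template <typename T>
T get_unaligned(const char *ptr)
{
    return reinterpret_cast<const packed_wrapper<T> *>(ptr)->val;
}

/* Usage: int vint = get_unaligned<int>(&buff[1]); */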

Finally, a word of caution: in most cases you should not do this type of cast. Sometimes you can define structures and read data directly onto them, sometimes you can use unions. Always bear in mind the strict pointer aliasing rules, which can hit back. To summarize, think twice before using the sort of tricks shown in this post, and use them only when really needed.
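
For instance, for the parcel from the beginning of the post, the “read directly onto a structure” alternative could look like the following sketch (it keeps the original assumption that the wire integer matches the CPU’s size and endianness):

#include <unistd.h>

struct __attribute__((__packed__)) parcel {
    char header;
    int  payload;
};

void handle_parcel(int fd)
{
    struct parcel p;

    if (read(fd, &p, sizeof(p)) == (ssize_t) sizeof(p)) {
        /* p.header and p.payload are directly usable: the compiler
           takes care of the unaligned payload member. */
    }
}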

Read more
abeato

So there I was. I did have to use a proprietary library, for which I had no sources and no real hope of support from the creators. I built my program against it, I ran it, and I got a segmentation fault. An exception that seemed to happen inside that insidious library, which was of course stripped of all debugging information. I scratched my head, changed my code, checked traces, tried valgrind, strace, and other debugging tools, but found no obvious error. Finally, I assumed that I had to dig deeper and do some serious debugging of the library’s assembly code with gdb. The rest of the post is dedicated to the steps I followed to find out what was happening inside the wily proprietary library that we will call libProprietary. Prerequisites for this article are some knowledge of gdb and ARM architecture.

Some background on the task I was doing: I am a Canonical employee that works as developer for Ubuntu for Phones. In most, if not all, phones, the BSP code is not 100% open and we have to use proprietary libraries built for Android. Therefore, these libraries use bionic, Android’s libc implementation. As we want to call them inside binaries compiled with glibc, we resort to libhybris, an ingenious library that is able to load and call libraries compiled against bionic while the rest of the process uses glibc. This will turn out to be critical in this debugging. Note also that we are debugging ARM 32-bits binaries here.

The Debugging Session

To start, I made sure I had the debug symbols for glibc and other libraries installed, and started to debug by using gdb in the usual way:

$ gdb myprogram
GNU gdb (Ubuntu 7.9-1ubuntu1) 7.9
...
Starting program: myprogram
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
[New Thread 0xf49de460 (LWP 7101)]
[New Thread 0xf31de460 (LWP 7104)]
[New Thread 0xf39de460 (LWP 7103)]
[New Thread 0xf41de460 (LWP 7102)]
[New Thread 0xf51de460 (LWP 7100)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xf49de460 (LWP 7101)]
0x00000000 in ?? ()
(gdb) bt
#0  0x00000000 in ?? ()
#1  0xf520bd06 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) info proc mappings
process 7097
Mapped address spaces:

	Start Addr   End Addr       Size     Offset objfile
	   0x10000    0x17000     0x7000        0x0 /usr/bin/myprogram
	...
	0xf41e0000 0xf49df000   0x7ff000        0x0 [stack:7101]
	...
	0xf51f6000 0xf5221000    0x2b000        0x0 /android/system/lib/libProprietary.so
	0xf5221000 0xf5222000     0x1000        0x0 
	0xf5222000 0xf5224000     0x2000    0x2b000 /android/system/lib/libProprietary.so
	0xf5224000 0xf5225000     0x1000    0x2d000 /android/system/lib/libProprietary.so
	...
(gdb)

We can see here that we get the promised crash. I executed a couple of gdb commands after that to see the backtrace and the part of the process address space that will be of interest in the following discussion. The backtrace shows that a segment violation happened when the CPU tried to execute instructions at address zero, and by checking the process mappings we can see that the previous frame lives inside the text segment of libProprietary.so. There is no backtrace beyond that point, but that should come as no surprise: there is no DWARF information in libProprietary, and the frame pointer is quite commonly optimized away these days.

After this I tried to get a bit more information on the CPU state when the crash happened:

(gdb) info reg
r0             0x0	0
r1             0x0	0
r2             0x0	0
r3             0x9	9
r4             0x0	0
r5             0x0	0
r6             0x0	0
r7             0x0	0
r8             0x0	0
r9             0x0	0
r10            0x0	0
r11            0x0	0
r12            0xffffffff	4294967295
sp             0xf49dde70	0xf49dde70
lr             0xf520bd07	-182403833
pc             0x0	0x0
cpsr           0x60000010	1610612752
(gdb) disassemble 0xf520bd02,+10
Dump of assembler code from 0xf520bd02 to 0xf520bd0c:
   0xf520bd02:	b	0xf49c9cd6
   0xf520bd06:	movwpl	pc, #18628	; 0x48c4	<UNPREDICTABLE>
   0xf520bd0a:	andlt	r4, r11, r8, lsr #12
End of assembler dump.
(gdb) 

Hmm, we are starting to see weird things here. First, at 0xf520bd02 (which was probably executed a little before the crash) we have an unconditional branch to some point in the thread stack (see the mappings in the previous figure). Second, the instruction at 0xf520bd06 (which should be executed after returning from the procedure that provokes the crash) would load into the pc (program counter) an address that is not mapped: we saw that the first mapped address is 0x10000 in the previous figure. The movw instruction also has a “pl” suffix that makes it execute only when the operand is positive or zero… which is obviously unnecessary, as 0x48c4 is encoded in the instruction.

I resorted to running objdump -d libProprietary.so to disassemble the library and compare with gdb’s output. objdump shows, in that part of the file (subtracting the library load address gives us the offset inside the file: 0xf520bd02-0xf51f6000=0x15d02):

   15d02:	f7f3 eade 	blx	92c0 <__android_log_print@plt>;
   15d06:	f8c4 5304 	str.w	r5, [r4, #772]	; 0x304
   15d0a:	4628      	mov	r0, r5
   15d0c:	b00b      	add	sp, #44	; 0x2c
   15d0e:	e8bd 8ff0 	ldmia.w	sp!, {r4, r5, r6, r7, r8, r9, sl, fp, pc}

which is completely different from what gdb shows! What is happening here? Taking a look at the addresses of both code chunks, we see that instructions are always 4 bytes in gdb’s output, while they are 2 or 4 bytes in objdump’s. Well, you have guessed it, haven’t you? We are seeing “normal” ARM instructions in gdb, while objdump is decoding THUMB-2 instructions. Certainly objdump seems to be right here, as its output makes more sense: we have a call to an executable part of the process space at 0x15d02 (it is resolved to a known function, __android_log_print), and the following instructions look like a normal function epilogue in ARM: a return value is stored in r0, the sp (stack pointer) is incremented (we are freeing space on the stack), and we restore registers.

If we go back to the register values, we see that cpsr (the current program status register [1]) does not have the T bit set, so gdb thinks we are using ARM instructions. We can change this by doing

(gdb) set $cpsr=0x60000030
(gdb) disass 0xf520bd02,+15
Dump of assembler code from 0xf520bd02 to 0xf520bd11:
   0xf520bd02:	blx	0xf51ff2c0
   0xf520bd06:	str.w	r5, [r4, #772]	; 0x304
   0xf520bd0a:	mov	r0, r5
   0xf520bd0c:	add	sp, #44	; 0x2c
   0xf520bd0e:	ldmia.w	sp!, {r4, r5, r6, r7, r8, r9, r10, r11, pc}
End of assembler dump.

Ok, much better now [2]. The thumb bit in the cpsr is determined by the last bx/blx call: if the target address is odd, the procedure we are calling contains THUMB instructions, otherwise they are ARM (a good reference for these instructions is [3]). In this case, after an exception the CPU moves to arm mode, and gdb is unable to know which is the right mode when disassembling. We can search for hints on which parts of the code are arm/thumb by looking at the values in registers used by bx/blx, or by looking at the lr (link register): we can see above that the value after the crash was 0xf520bd07, which is odd and indicates that 0xf520bd06 contains a thumb instruction. However, for some reason gdb is not able to take advantage of this information.

Of course this problem does not happen if we have debugging information: in that case we have special symbols that let gdb know whether the section containing the code has thumb instructions or not [4]. As those are not found here, gdb uses the cpsr value. objdump seems to have better heuristics, though.

After solving this issue with instruction decoding, I started to debug __android_log_print to check what was happening there, as it looked like the crash was happening inside that call. I spent quite a lot of time there, but found nothing. All looked fine, and I started to despair. Until I inserted a breakpoint at address 0xf520bd06, right after the call to __android_log_print, ran the program… and it stopped at that address, no crash happened. I started to execute the program instruction by instruction after that:

(gdb) b *0xf520bd06
(gdb) run
...
Breakpoint 1, 0xf520bd06 in ?? ()
(gdb) si
0xf520bd0a in ?? ()
(gdb) si
0xf520bd0c in ?? ()
(gdb) si
0xf520bd0e in ?? ()
Warning:
Cannot insert breakpoint 0.
Cannot access memory at address 0x0

Something was apparently wrong with the ldmia instruction, which restores registers, including the pc, from the stack. I took a look at the stack at that moment (taking into account that ldmia had already advanced the sp after restoring 9 registers == 36 bytes):

(gdb) x/16xw $sp-36
0xf49dde4c:	0x00000000	0x00000000	0x00000000	0x00000000
0xf49dde5c:	0x00000000	0x00000000	0x00000000	0x00000000
0xf49dde6c:	0x00000000	0x00000000	0x00000000	0x00000000
0xf49dde7c:	0x00000000	0x00000000	0x00000000	0x00000000

All zeros! At this point it is clear that this is the real point where the crash is happening, as we are loading 0 into the pc. This looked clearly like a stack corruption issue.

But, before moving forward, why are we getting a wrong backtrace from gdb? Well, gdb is seeing a corrupted stack, so it is not able to unwind it. It would not be able to unwind it even if it had full debug information. The only hint it has is the lr. This register contains the return address after execution of a bl/blx instruction [3]. If the called procedure is non-leaf, it is saved in the prologue and restored in the epilogue, because it gets overwritten when branching to other procedures. In this case, it is restored into the pc, and sometimes it is also saved back into the lr, depending on whether the procedure contains arm-thumb interworking code or not [5]. It is not overwritten if we have a leaf procedure (as there are no procedure calls inside those).

As gdb has no additional information, it uses the lr to build the backtrace, assuming we are in a leaf procedure. However, this is not true, and the backtrace turns out to be wrong. Nonetheless, this information was not completely useless: the lr was pointing to the instruction right after the last bl/blx instruction that was executed, which was not that far away from the real point where the program was crashing. This happened because, fortunately, __android_log_print has interworking code and restores the lr; otherwise the value of the lr could have come from a point much farther away from the real crash. Believe it or not, it could have been even worse!

Having now a clear idea of where and why the crash was happening, things accelerated. The procedure where the crash happened, as disassembled by objdump, was (I include here only the more relevant parts of the code)

00015b1c <ProprietaryProcedure@@Base>:
   15b1c:	e92d 4ff0 	stmdb	sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
   15b20:	b08b      	sub	sp, #44	; 0x2c
   15b22:	497c      	ldr	r1, [pc, #496]	; (15d14 <ProprietaryProcedure@@Base+0x1f8>)
   15b24:	2500      	movs	r5, #0
   15b26:	9500      	str	r5, [sp, #0]
   15b28:	4604      	mov	r4, r0
   15b2a:	4479      	add	r1, pc
   15b2c:	462b      	mov	r3, r5
   15b2e:	f8df 81e8 	ldr.w	r8, [pc, #488]	; 15d18 <ProprietaryProcedure@@Base+0x1fc>
   15b32:	462a      	mov	r2, r5
   15b34:	f8df 91e4 	ldr.w	r9, [pc, #484]	; 15d1c <ProprietaryProcedure@@Base+0x200>
   15b38:	ae06      	add	r6, sp, #24
   15b3a:	f8df a1e4 	ldr.w	sl, [pc, #484]	; 15d20 <ProprietaryProcedure@@Base+0x204>
   15b3e:	200f      	movs	r0, #15
   15b40:	f8df b1e0 	ldr.w	fp, [pc, #480]	; 15d24 <ProprietaryProcedure@@Base+0x208>
   15b44:	f7f3 ef76 	blx	9a34 <prctl@plt>
   15b48:	44f8      	add	r8, pc
   15b4a:	4629      	mov	r1, r5
   15b4c:	44f9      	add	r9, pc
   15b4e:	2210      	movs	r2, #16
   15b50:	44fa      	add	sl, pc
   15b52:	4630      	mov	r0, r6
   15b54:	44fb      	add	fp, pc
   15b56:	f7f3 ea40 	blx	8fd8 <memset@plt>
   15b5a:	a807      	add	r0, sp, #28
   15b5c:	f7f3 ef70 	blx	9a40 <sigemptyset@plt>
   15b60:	4b71      	ldr	r3, [pc, #452]	; (15d28 <ProprietaryProcedure@@Base+0x20c>)
   15b62:	462a      	mov	r2, r5
   15b64:	9508      	str	r5, [sp, #32]
   15b66:	4631      	mov	r1, r6
   15b68:	447b      	add	r3, pc
   15b6a:	681b      	ldr	r3, [r3, #0]
   15b6c:	200a      	movs	r0, #10
   15b6e:	9306      	str	r3, [sp, #24]
   15b70:	f7f3 ef6c 	blx	9a4c <sigaction@plt>
   ...
   15d02:	f7f3 eade 	blx	92c0 <__android_log_print@plt>
   15d06:	f8c4 5304 	str.w	r5, [r4, #772]	; 0x304
   15d0a:	4628      	mov	r0, r5
   15d0c:	b00b      	add	sp, #44	; 0x2c
   15d0e:	e8bd 8ff0 	ldmia.w	sp!, {r4, r5, r6, r7, r8, r9, sl, fp, pc}

The addresses where this code is loaded can be easily computed by adding 0xf51f6000 to the file offsets shown in the first column. We see that a few calls to different external functions [6] are performed by ProprietaryProcedure, which is itself an exported symbol.

I restarted the debug session, added a breakpoint at the start of ProprietaryProcedure, right after stmdb saves the state, and checked the stack values:

(gdb) b *0xf520bb20
Breakpoint 1 at 0xf520bb20
(gdb) cont
...
Breakpoint 1, 0xf520bb20 in ?? ()
(gdb) p $sp
$1 = (void *) 0xf49dde4c
(gdb) x/16xw $sp
0xf49dde4c:	0xf49de460	0x0007df00	0x00000000	0xf49dde70
0xf49dde5c:	0xf49de694	0x00000000	0xf77e9000	0x00000000
0xf49dde6c:	0xf75b4491	0x00000000	0xf49de460	0x00000000
0xf49dde7c:	0x00000000	0xfd5b4eba	0xfe9dd4a3	0xf49de460

We can see that the stack contains something, including a return address that looks valid (0xf75b4491). Note also that the procedure must never touch this part of the stack, as it belongs to the caller of ProprietaryProcedure.

Now it is simply a matter of bisecting the code between the beginning and the end of ProprietaryProcedure to find out where we are clobbering the stack. I will spare you this tedious process. Instead, I will just show that, in the end, it turned out that the call to sigemptyset() is the culprit [7]:

(gdb) b *0xf520bb5c
Breakpoint 1 at 0xf520bb5c
(gdb) b *0xf520bb60
Breakpoint 2 at 0xf520bb60
(gdb) run
Breakpoint 1, 0xf520bb5c in ?? ()
(gdb) x/16xw 0xf49dde4c
0xf49dde4c:	0xf49de460	0x0007df00	0x00000000	0xf49dde70
0xf49dde5c:	0xf49de694	0x00000000	0xf77e9000	0x00000000
0xf49dde6c:	0xf75b4491	0x00000000	0xf49de460	0x00000000
0xf49dde7c:	0x00000000	0xfd5b4eba	0xfe9dd4a3	0xf49de460
(gdb) cont
Continuing.
Breakpoint 2, 0xf520bb60 in ?? ()
(gdb) x/16xw 0xf49dde4c
0xf49dde4c:	0x00000000	0x00000000	0x00000000	0x00000000
0xf49dde5c:	0x00000000	0x00000000	0x00000000	0x00000000
0xf49dde6c:	0x00000000	0x00000000	0x00000000	0x00000000
0xf49dde7c:	0x00000000	0x00000000	0x00000000	0x00000000

Note here that I am printing the part of the stack not reserved by the function (0xf49dde4c is the value of the sp before the instruction at offset 0x15b20 executes; see the code).

What is going wrong here? Remember that at the beginning of the article I mentioned that we were using libhybris. libProprietary assumes a bionic environment, and the libc functions it calls are bionic’s. However, libhybris has hooks for some bionic functions: for those, bionic is not called and the hook is invoked instead. libhybris does this to avoid conflicts between bionic and glibc: for instance, having two allocators fighting for the process address space is a recipe for disaster, so malloc() and related functions are hooked, and the hooks end up calling the glibc implementation. Signal-related functions were hooked too, including sigemptyset(), and in this case the hook simply called the glibc implementation.

I looked at the glibc and bionic implementations; in both cases sigemptyset() is a very simple utility function that clears a sigset_t variable with memset(). Everything pointed to different definitions of sigset_t depending on the library. The definition turned out to be a bit messy to follow in the code, as it depends on build-time definitions, so I resorted to gdb to print the type. For an executable compiled against glibc, I saw

(gdb) ptype sigset_t
type = struct {
    unsigned long __val[32];
}

and for one using bionic

(gdb) ptype sigset_t
type = unsigned long

This finally confirms where the bug is, and explains it: we are overwriting the stack because libProprietary reserves stack memory for bionic’s sigset_t, while we are using glibc’s sigemptyset(), which uses a different definition for it. As this definition is much bigger, the stack gets overwritten by the call to memset(). And we get the crash later, when trying to restore registers as the function returns.
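
To make the mismatch concrete, here is a minimal sketch of what effectively happened (the typedefs are simplified stand-ins derived from the ptype output above, not the real headers):

#include <cstring>

typedef unsigned long bionic_sigset_t;                       /* 4 bytes on 32-bit ARM */
typedef struct { unsigned long __val[32]; } glibc_sigset_t;  /* 128 bytes */

/* What the libhybris hook effectively did with the caller's pointer. */
static void hooked_sigemptyset(void *set)
{
    memset(set, 0, sizeof(glibc_sigset_t));   /* clears 128 bytes */
}

void proprietary_caller(void)
{
    bionic_sigset_t set;        /* only 4 bytes reserved on the stack */

    hooked_sigemptyset(&set);   /* ...but 128 bytes get zeroed: the saved
                                   registers above it on the stack are wiped out */
}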

After learning this, the solution was simple: I removed the libhybris hooks for the signal functions, recompiled, and… all worked just fine, no crashes anymore!

However, this is not the final solution: as signals are shared resources, it makes sense to hook them in libhybris. But, to do it properly, the hooks have to translate types between bionic and glibc, something we were not doing (we were simply calling the glibc implementation). That, however, is “just work”.
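
A proper hook would look more like the following sketch (the function name and the copy-back are my own illustration, not actual libhybris code): do the work on a glibc sigset_t, then hand back only the word that the bionic caller reserved:

#include <signal.h>
#include <cstring>

typedef unsigned long bionic_sigset_t;   /* bionic's 32-bit ARM definition */

extern "C" int hybris_hook_sigemptyset(bionic_sigset_t *set)
{
    sigset_t glibc_set;                   /* glibc's much larger sigset_t */
    int ret = sigemptyset(&glibc_set);

    /* Copy back only the bytes the bionic caller reserved; this assumes
       the low signal bits live in the first word on both sides. */
    memcpy(set, &glibc_set, sizeof(*set));
    return ret;
}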

Of course I wondered why the heck a library that is kind of generic needs to mess around with signals, but hey, that is not my fault ;-).

Conclusions

I can say I learned several things while debugging this:

  1. Not having the sources is terrible for debugging (well, I already knew this). Unfortunately, not open sourcing the code is still standard practice in parts of the industry.
  2. The most interesting technical bit here is IMHO that we need to be very cautious with the backtrace that debuggers show after a crash. If you start to see things that do not make sense, it is possible that registers or the stack have been messed up and the real crash happened elsewhere. Bear in mind that the very first thing to do when a program crashes is to make sure we know the exact point where it happens.
  3. We have to be careful in ARM when disassembling, because if there is no debug information we could be seeing the wrong instruction set. We can check the evenness of addresses used by bx/blx and of the lr to make sure we are in the right mode.
  4. Sometimes taking a look at the assembly can help us when debugging, even when we have the sources. Note that if I had had the C sources I would have seen the crash happening right when returning from a function, and it might not have been that immediate to find out that the stack was messed up. The assembly clearly pointed to an overwritten stack.
  5. Finally, I personally learned some bits of ARM architecture that I did not know, which was great.

Well, this is it. I hope you enjoyed the (lengthy, I know) article. Thanks for reading!

[1] http://www.heyrick.co.uk/armwiki/The_Status_register
[2] We can get the same result by executing in gdb set arm fallback-mode thumb, but changing the register seemed more pedagogical here.
[3] http://infocenter.arm.com/help/topic/com.arm.doc.dui0068b/DUI0068.pdf
[4] http://reverseengineering.stackexchange.com/questions/6080/how-to-detect-thumb-mode-in-arm-disassembly
[5] http://www.mcternan.me.uk/ArmStackUnwinding/
[6] In fact the calls are to the PLT section, which is inside the library. The PLT calls in turn, by using addresses in the GOT data section, either directly the function or the dynamic loader, as we are doing lazy loading. See https://www.technovelty.org/linux/plt-and-got-the-key-to-code-sharing-and-dynamic-libraries.html, for instance.
[7] I had to use two breakpoints between consecutive instructions because the “ni” gdb command was not working well here.

Read more
Dustin Kirkland


I happen to have a full mirror of the entire Ubuntu Xenial archive here on a local SSD, and I took the opportunity to run a few numbers...
  • 6: This is our 6th Ubuntu LTS
    • 6.06, 8.04, 10.04, 12.04, 14.04, 16.04
  • 7: With Ubuntu 16.04 LTS, we're supporting 7 CPU architectures
    • armhf, arm64, i386, amd64, powerpc, ppc64el, s390x
  • 25,671: Ubuntu 16.04 LTS is comprised of 25,671 source packages
    • main, universe, restricted, multiverse
  • 150,562+: Over 150,562 (and counting!) cloud instances of Xenial have launched to date
    • and we haven't even officially released yet!
  • 216,475: A complete archive of all binary .deb packages in Ubuntu 16.04 LTS consists of 216,475 debs.
    • 24,803 arch independent
    • 27,159 armhf
    • 26,845 arm64
    • 28,730 i386
    • 28,902 amd64
    • 27,061 powerpc
    • 26,837 ppc64el
    • 26,138 s390x
  • 1,426,792,926: A total line count of all source packages in Ubuntu 16.04 LTS using cloc yields 1,426,792,926 total lines of source code
  • 250,478,341,568: A complete archive of all debs, for all architectures, in Ubuntu 16.04 LTS requires 250GB of disk space
Yes, that's 1.4 billion lines of source code comprising the entire Ubuntu 16.04 LTS archive.  What an amazing achievement of open source development!

Perhaps my fellow nerds here might be interested in a breakdown of all 1.4 billion lines across 25K source packages, and throughout 176 different programming languages, as measured by Al Danial's cloc utility.  Interesting data!


You can see the full list here.  What further insight can you glean?

:-Dustin

Read more
Dustin Kirkland


Forget about The Year of the Linux Desktop...This is The Year of the Linux Countertop!

I'm talking about Linux on every form of Internet-connected embedded devices.  The Internet-of-Things is already upon us.  Sensors, smart watches, TVs, thermostats, security cameras, drones, printers, routers, switches, robots -- you name it.  

And with that backdrop, we are thrilled to introduce Snappy Ubuntu for Devices.  Ubuntu is now a possibility, on almost any device, anywhere.  Now that's exciting!

This is the same Snappy Ubuntu, with its atomic, transactional updates that we launched on each major public cloud last month -- extended and updated for 64-bit Intel, AMD and ARM devices.


Now, if you want a detailed, developer's look at building a Snappy Ubuntu image and running it on a BeagleBone, you're in luck!  I shot this little instructional video (using Cheese, GTK-RecordMyDesktop, and OpenShot).  Enjoy!


A transcript of the video follows...


  1. What is Snappy Ubuntu?
    • A few weeks ago, we introduced a new flavor of Ubuntu that we call “Snappy” -- an atomically, transactionally updated Operating System -- and showed how to launch, update, rollback, and install apps in cloud instances of Snappy Ubuntu in Amazon EC2, Microsoft Azure, and Google Compute Engine public clouds.
    • And now we’re showing how that same Snappy Ubuntu experience is the perfect operating system for today’s Cambrian Explosion of smart devices that some people are calling “the Internet of Things”!
    • Snappy Ubuntu Core bundles only the essentials of a modern, appstore-powered Linux OS stack, and hence leaves room, both in size and in flexibility, to build, maintain, and monetize your very own device solution without having to care about the overhead of inventing and maintaining your own OS and tools from scratch. Snappy Ubuntu Core comes right in time for you to put your very own stake into the still unconquered world of things
    • We think you’ll love Snappy on your smart devices for many of the same reasons that there are already millions of Ubuntu machine instances in hundreds of public and private clouds, as well as the millions of your own Ubuntu desktops, tablets, and phones!
  2. Unboxing the BeagleBone
    • Our target hardware for this Snappy Ubuntu demo is the BeagleBone Black -- an inexpensive, open platform for hardware and software developers.
    • I paid $55 for the board, and $8 for a USB to TTL Serial Cable
    • The board is about the size of a credit card, has a 1GHz ARM Cortex A8 processor, 512MB RAM, and on board ethernet.
    • While Snappy Ubuntu will run on most any armhf or amd64 hardware (including the Intel NUC), the BeagleBone is perhaps the most developer friendly solution.
  3. The easiest way to get your Snappy Ubuntu running on your Beaglebone
    • The world of Devices has so many opportunities that it won’t be possible to give everyone the perfect vertical stack centrally. Hence Canonical is trying to enable all of you and provide you with the elements that get you started doing your innovation as quickly as possible. Since there will be many devices that won’t need a screen and input devices, we have developed “webdm”. webdm gives you the ability to manage your snappy device and consume apps without any development effort.
    • To install, you simply download our prebuilt webdm .img and dd it to your SD card.
    • After that, all you have to do is connect your BeagleBone to a DHCP-enabled local network and power it on.
    • After 1-2 minutes, go to http://webdm.local:8080 and you can start installing apps from the snappy appstore without any further effort
    • Of course, we are still in beta and will continue to give you more features and a greater experience over time; we will not only make the UI better, but also work on various customization options that let you deliver your own app store powered product without investing your development resources in something that has already been solved.
  4. Downloading Snappy and writing to an sdcard
    • Now we’re going to build a Snappy Ubuntu image to run on our device.
    • Soon, we’ll publish a library of Snappy Ubuntu images for many popular devices, but for this demo, we’re going to roll our own using the tool, ubuntu-device-flash.
    • ls -halF mysnappy.img
    • sudo dd if=mysnappy.img of=/dev/mmcblk0 bs=1M oflag=dsync
  5. Hooking up the BeagleBone
    • Insert the microsd card
    • Network cable
    • USB debug
    • Power/USB
  6. Booting Snappy and command line experience
    • Okay, so we’re ready for our first boot of Snappy!
    • Let’s attach to the USB/serial console using screen
    • Now, I’ll attach the power, and if you watch very carefully, you might get to see a few boot messages.
    • snappy help
    • ifconfig
    • ssh ubuntu@10.0.0.105
  7. WebDM experience
    • snappy info
    • Shows we have the webdm framework installed
    • point browser to http://10.0.0.105:8080
    • Configuration
    • Store
  8. Conclusion
    • Hey how cool is that!  Snappy Ubuntu running on devices :-)
    • I’ve spent plenty of time and money geeking out over my Nest and Dropcam and Netatmo and WeMo lightswitches, playing with their APIs and hooking them up to If-This-Then-That.
    • But I’m really excited about a world where those types of devices are as accessible to me as my Ubuntu servers and desktops!
    • And from what I’ve shown you here, with THIS, I think we can safely say that we’ve blown right past the year of the Linux desktop.
    • This is the year of the Linux countertop!

Cheers,
Dustin

Read more
mandel

In the ubuntu download manager we are using the new connection style syntax so that if there are errors in the signal connections we will be notified at compile time. However, in recent versions of udm we have noticed that the udm tests that ensure that the qt signals are emitted correctly have started failing randomly in the build servers.
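
For reference, the new-style syntax checks signal and slot signatures at compile time and returns a connection handle we can test at runtime. A minimal, self-contained sketch using stock Qt classes (illustrative, not udm code):

#include <QCoreApplication>
#include <QTimer>

int main(int argc, char *argv[])
{
    QCoreApplication app(argc, argv);
    QTimer timer;

    // Pointer-to-member-function connection: a typo here is a compile
    // error rather than a silent runtime failure.
    QMetaObject::Connection c =
        QObject::connect(&timer, &QTimer::timeout, &QCoreApplication::quit);
    Q_ASSERT(c);    // the kind of per-connect assert added in udm

    timer.start(100);
    return app.exec();
}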

As it can be seen in the following build logs the compilation does finish with no errors but the tests raise errors at runtime (an assert was added for each of the connect calls in the project):

Some of the errors are the same across the different architectures, but this feels like a coincidence. The unity-scope-click project has had the same issue and solved it in the following way:

    // NOTE: using SIGNAL/SLOT macros here because new-style
    // connections are flaky on ARM.
    c = connect(impl->systemDownloadManager.data(), SIGNAL(downloadCreated(Download*)),
                this, SLOT(handleDownloadCreated(Download*)));

    if (!c) {
        qDebug() << "failed to connect to systemDownloadManager::downloadCreated";
    }

I am not the only one that has encountered this bug within Canonical (check out this bug). Apparently -Bsymbolic breaks PMF (Pointer to Member Function) comparison on ARM, as was reported to Linaro. As explained on the Linaro mailing list, a workaround (since the correct fix would be in the linker) is to build with PIE support. The Qt developers have decided to drop -Bsymbolic* on anything but x86 and x86-64. I hope all this info helps others who might hit the same problem.

Read more
Federico Lucifredi

Today we’re introducing some new features into Ubuntu’s systems management and monitoring tool, Landscape. Organisations will now be able to use Landscape to manage Hyperscale environments ranging from ARM to x86 low-power designs, adding to Landscape’s existing coverage of Ubuntu in the cloud, data centre server, and desktop environments. There’s an update to the Dedicated Server too, bringing the SAAS and Dedicated Server versions into alignment.

Calxeda 'Serial Number 0' in Canonical's lab

Hyperscale is set to address today’s infrastructure challenges by providing compute capacity with less power for lower cost. Canonical is at the forefront of the trend. Ubuntu already powers scale-out workloads on a new wave of low-cost ultradense hardware based on x86 and ARM processors including Calxeda EnergyCore and Intel Atom designs. Ubuntu is also the default OS for HP’s Project Moonshot servers.

Calxeda 01 - Wide

This update includes support for ARM processors and allows organisations to manage thousands of Hyperscale machines as easily as one, making it more cost-effective to run growing networks spanning tens of thousands of devices. The same patch management and compliance features are available for ARM as they are for x86 environments, making Landscape the first systems management tool of a leading Linux vendor to introduce ARM support – and we are doing so on a level of feature parity across architectures.

Calxeda is the leading innovator engaged in bringing ARM chips to servers and partnered with us early on to bring Ubuntu to their new platform. “Landscape system management support for ARM is a huge step forward”, said Larry Wikelius, co-founder and Vice President at Calxeda. “Adding datacenter-class management to the Ubuntu platform for ARM demonstrates Canonical’s commitment to innovation for Hyperscale customers, who are looking to Calxeda to help improve their power efficiency.”

Calxeda 'Serial number 0' in Canonical's Boston lab

“Landscape’s support for the ARM architecture extends to all ARM SoCs supported by Ubuntu, but we adopted the Calxeda EnergyCore systems in our labs as the reference design in light of both their early arrival to market and their maturity”, said Federico Lucifredi, Product Manager for Landscape and Ubuntu Server at Canonical, adding “we are excited to be bringing Landscape to Hyperscale systems on both ARM and x86 Atom architectures.” CIOs and System Administrators considering implementing Hyperscale environments on Ubuntu will now have access to the same enterprise-grade systems management and monitoring capabilities they enjoy in their data centres today with Landscape.

Kurt Keville, HPC Researcher at Massachusetts Institute of Technology (MIT) commented: “MIT’s interest in low power computing designs aims to achieve the execution of production HPC codes at the same level of numerical performance, yet within a smaller power envelope.”  He added: “With Landscape, we can manage our ARM development clusters with the same kind of granularity we are accustomed to on x86 systems. We are able to manage ARM compute clusters without affecting our production network bandwidth in any way”.

Parallella Gen0 prototypes stack

The Parallella Board project aims to make parallel computing ubiquitous through an affordable Open Hardware platform equipped with Open Source tools. Andreas Olofsson, CEO, Adapteva said: “We selected Ubuntu as our default platform because of its popularity with the developer Community and relentless pace of updating, regularly providing our users with the newest builds for any project.”  He added: “The availability of a management and monitoring platform like Landscape is essential to managing complexity as the scale of Parallella clusters rapidly reaches into the hundreds or even thousands of nodes.”

Parallella 01 - Processes

As we talk to customers building cloud infrastructure or big data computing environments, it’s clear that power consumption and efficient scaling are key drivers to their architectural decisions. When these considerations are coupled with Landscape’s efficiency and scalable management characteristics, we believe enterprises will be able to achieve a significant shift in both scalability and manageability in their data centre through Hyperscale architecture.

Ubuntu is the default OS for HP’s Project Moonshot cartridges, and ships with, or is available for download to, every Moonshot customer, with direct support from HP backed by Canonical’s worldwide support organization.  The Landscape update today also means that the full bundle of Ubuntu Advantage support and services becomes available to Moonshot customers.

“Canonical continues to lead the way in the Hyperscale OS arena introducing full enterprise-grade support services for Ubuntu on Hyperscale hardware”, remarked Martin Stadtler, Director of Support Services at Canonical.

Landscape’s Dedicated Server edition has also been refreshed in this update. This means that those businesses choosing to keep the service onsite (rather than hosted) will benefit from the same functionality and a series of updates already available to SAAS customers, including the new audit log facility and performance enhancements, while retaining full local control of their management infrastructure.

Read more
Prakash

Samsung announced an 8-core (yes, an 8-core!) ARM processor, which may power the Samsung Galaxy S4.

Rather than a single eight-core chip, it has two quad-cores inside – one being a quad-core ARM Cortex A15 and the other a quad-core Cortex A7. The Cortex A15 deals with the tough stuff but passes off the easy tasks to the Cortex A7, or they can both be fired up to really show off. This means it’s strong enough to provide all the power you may need, while at the same time being smart enough to conserve energy when it can. If you’re wondering just how much difference the Exynos 5 Octa and other big.LITTLE chips will make when used in a device, ARM’s CEO Warren East said he expects “twice the performance and half the power consumption” compared to today’s best offerings.

Read More.

Read more
Prakash

While ARM is gaining a lot of momentum, the challenge with ARM until now was that every vendor’s architecture is very different and requires a separate kernel and an entire OS stack.

With Linux Kernel 3.7, this has changed for the better.

ARM’s problem was that, unlike the x86 architecture, where one Linux kernel could run on almost any PC or server, almost every ARM system required its own customized Linux kernel. Now with 3.7, ARM architectures can use one single vanilla Linux kernel while keeping their special device sauce in device trees.

The end result is that ARM developers will be able to boot and run Linux on their devices and then worry about getting all the extras to work. This will save them, and the Linux kernel developers, a great deal of time and trouble.

Just as good for those ARM architects and programmers who are working on high-end, 64-bit ARM systems: Linux now supports 64-bit ARM processors. 64-bit ARM CPUs won’t ship in commercial quantities until 2013. When they do arrive, though, programmers eager to try 64-bit ARM processors on servers will have Linux ready for them.

Read More.

Read more
pitti

I just released Apport 2.7.

The main new feature is support for foreign architectures in apport-retrace. If apport-retrace runs in sandbox mode on a crash that was not produced on the same architecture that apport-retrace itself is running on, it will now build a sandbox for the report’s architecture and invoke gdb with the necessary magic options to produce a proper stack trace (and the other gdb information).
Right now this works for i386, x86_64, and ARMv7, but if someone is interested in making this work for other architectures, please ping me.

This is rolled out to the Launchpad retracers, see for example Bug #1088428. So from now on you can report your armhf crashes to Launchpad and they ought to be processed. Note that I did a mass-cleanup of old armhf crash bugs this morning, as the existing ones were way too old to be retraced.

For those who are running their own retracers for their project: You need to add an armhf-specific apt sources list to your per-release configuration directory, e. g. Ubuntu 12.04/armhf/sources.list, as armhf is on ports.ubuntu.com instead of archive.ubuntu.com. Also, you need to add an armhf crash database to your crashdb.conf and add a cron job for the new architecture. You can see what all this looks like in the configuration files for the Launchpad retracers.
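
For illustration, such an armhf sources.list would point at ports.ubuntu.com, along these lines (a sketch; adjust release, pockets, and components to your setup):

deb http://ports.ubuntu.com/ubuntu-ports/ precise main restricted universe multiverse
deb http://ports.ubuntu.com/ubuntu-ports/ precise-updates main restricted universe multiverse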

The other improvement concerns package hooks. So far, when a package hook crashed, the exception was only printed to stderr, where most people would never see it when using the GTK or KDE frontends. With 2.7 these exceptions are also added to the report itself (HookError_filename), so that they appear in the bug reports.

The release also fixes a couple of bugs, see the release notes for details.

Read more

The Ubuntu Developer Summit was held in Copenhagen last week, to discuss plans for the next six-month cycle of Ubuntu. This was the most productive UDS that I've been to — maybe it was the shorter four-day schedule, or the overlap with Linaro Connect, but it sure felt like a whirlwind of activity.

I thought I'd share some details about some of the sessions that cover areas I'm working on at the moment. In no particular order:

Improving cross-compilation

Blueprint: foundations-r-improve-cross-compilation

This plan is part of a multi-cycle effort to improve cross-compilation support in Ubuntu. Progress is generally going well — the consensus from the session was that the components are fairly close to complete, but we still need some work to pull those parts together into something usable.

So, this cycle we'll be working on getting that done. While we have a few bugfixes and infrastructure updates to do, one significant part of this cycle’s work will be to document the “best-practices” for cross builds in Ubuntu, on wiki.ubuntu.com. This process will be heavily based on existing pages on the Linaro wiki. Because most of the support for cross-building is already done, the actual process for cross-building should be fairly straightforward, but needs to be defined somewhere.

I'll post an update when we have a working draft on the Ubuntu wiki, stay tuned for details.

Rapid archive bringup for new hardware

Blueprint: foundations-r-rapid-archive-bringup

I'd really like for there to be a way to get an Ubuntu archive built “from scratch”, to enable custom toolchain/libc/other system components to be built and tested. This is typically useful when bringing up new hardware, or testing rebuilds with new compiler settings. Because we may be dealing with new hardware, doing this bootstrap through cross-compilation is something we'd like too.

Eventually, it would be great to have something as straightforward as the OpenEmbedded or OpenWRT build process to construct a repository with a core set of Ubuntu packages (say, minbase), for previously-unsupported hardware.

The archive bootstrap process isn't done often, and can require a large amount of manual intervention. At present, there's only a couple of folks who know how to get it working. The plan here is to document the bootstrap process in this cycle, so that others can replicate the process, and possibly improve the bits that are a little too janky for general consumption.

ARM64 / ARMv8 / aarch64 support

Blueprint: foundations-r-aarch64

This session is an update for progress on the support for ARMv8 processors in Ubuntu. While no general-purpose hardware exists at the moment, we want to have all the pieces ready for when we start seeing initial implementations. Because we don't have hardware yet, this work has to be done in a cross-build environment; another reason to keep on with the foundations-r-improve-cross-compilation plan!

So far, toolchain progress is going well, with initial cross toolchains available for quantal.

Although kernel support isn’t urgent at the moment, we’ll be building an initial kernel-headers package for aarch64. There's also a plan to get a page listing the aarch64-cross build status of core packages, so we'll know what is blocked for 64-bit ARM enablement.

We’ve also got a bunch of workitems for volunteers to fix cross-build issues as they arise. If you're interested, add a workitem in the blueprint, and keep an eye on it for updates.

Secure boot support in Ubuntu

Blueprint: foundations-r-secure-boot

This session covered progress of secure boot support as at the 12.10 Quantal Quetzal release, items that are planned for 13.04, and backports for 12.04.2.

As for 12.10, we’ve got the significant components of secure boot support into the release — the signed boot chain. The one part that hasn't hit 12.10 yet is the certificate management & update infrastructure, but that is planned to reach 12.10 by way of a not-too-distant-future update.

The foundations team also mentioned that they were starting the 12.04.2 backport right after UDS, which will bring secure boot support to our current “Long Term Support” (LTS) release. Since the LTS release is often preferred in Ubuntu preinstall situations, it may be used as a base for hardware enablement on secure boot machines. Combined with the certificate management tools (described at sbkeysync & maintaining uefi key databases), and the requirement for “custom mode” in general-purpose hardware, this will allow for user-defined trust configuration in an LTS release.

As for 13.04, we're planning to update the shim package to a more recent version, which will have Matthew Garrett's work on the Machine Owner Key plan from SuSE.

We're also planning to figure out support for signed kernel modules, for users who wish to verify all kernel-level code. Of course, this will mean some changes to things like DKMS, which run custom module builds outside of the normal Ubuntu packages.

Netboot with secure boot is still in progress, and will require some fixes to GRUB2.

And finally, the sbsigntools codebase could do with some new testcases, particularly for the PE/COFF parsing code. If you're interested in contributing, please contact me at jeremy.kerr@canonical.com.

Read more

Cross-compile-o-naut wookey has recently posted a set of Debian & Ubuntu packages for an aarch64 (ie, 64-bit ARMv8) cross compiler:

The repositories here contain sources and binaries for the arm64 bootstrap in Debian (unstable) and Ubuntu (quantal). There are both toolchain and tools packages for amd64 build machines and arm64 binaries built with them. And corresponding sources.

For the lazy:

[jk@pecola ~]$ wget -O - http://people.debian.org/~wookey/bootstrap/bootstrap-archive.key |
    sudo apt-key add -
[jk@pecola ~]$ sudo apt-add-repository 'deb http://people.debian.org/~wookey/bootstrap/ubunturepo/ quantal-bootstrap main universe'
[jk@pecola ~]$ sudo apt-get update
[jk@pecola ~]$ sudo apt-get install --install-recommends gcc-4.7-aarch64-linux-gnu

... which gives you a cross compiler for aarch64:

[jk@pecola ~]$ cat helloworld.c 
#include <stdio.h>
int main(void) { printf("hello world\n"); return 0; }
[jk@pecola ~]$ aarch64-linux-gnu-gcc-4.7 -Wall -O2 -o helloworld helloworld.c
[jk@pecola ~]$ aarch64-linux-gnu-objdump -f helloworld 
helloworld:     file format elf64-littleaarch64
architecture: aarch64, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x0000000000400450

Read more

Most of the components of the 64-bit ARM toolchain have been released, so I've put together some details on building a cross compiler for aarch64. At present, this is only binutils & compiler (ie, no libc), so is probably not useful for applications. However, I have a 64-bit ARM kernel building without any trouble.

Update: looking for an easy way to install a cross-compiler on Ubuntu or debian? Check out aarch64 cross compiler packages for Ubuntu & Debian.

pre-built toolchain

If you're simply looking to download a cross compiler, here's one I've built earlier: aarch64-cross.tar.gz (.tar.gz, 85MB). It's built for an x86_64 build machine, using Ubuntu 12.04 LTS, but should work with other distributions too.

The toolchain is configured for paths in /opt/cross/. To install it, do a:

[jk@pecola ~]$ sudo mkdir /opt/cross
[jk@pecola ~]$ sudo chown $USER /opt/cross
[jk@pecola ~]$ tar -C /opt/cross/ -xf aarch64-x86_64-cross.tar.gz
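Then a quick sanity check (assuming the tools land under /opt/cross/aarch64/bin/, matching the build instructions below):

[jk@pecola ~]$ /opt/cross/aarch64/bin/aarch64-none-linux-gcc --version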

If you'd like to build your own, here's how:

initial setup

We're going to be building in ~/build/arm64-toolchain/, and installing into /opt/cross/aarch64/. If you'd prefer to use other paths, simply change these in the commands below.

[jk@pecola ~]$ mkdir -p ~/build/arm64-toolchain/
[jk@pecola ~]$ cd ~/build/arm64-toolchain/
[jk@pecola ~]$ prefix=/opt/cross/aarch64/

We'll also need a few packages for the build:

[jk@pecola ~]$ sudo apt-get install bison flex libmpfr-dev libmpc-dev texinfo

binutils

I have a git repository with a recent version of ARM's aarch64 support, plus a few minor updates at git://kernel.ubuntu.com/jk/arm64/binutils.git (or browse the gitweb view). To build:

Update: arm64 support has been merged into upstream binutils, so you can now use the official git repository. The commit 02b16151 builds successfully for me.

[jk@pecola arm64-toolchain]$ git clone git://gcc.gnu.org/git/binutils.git
[jk@pecola arm64-toolchain]$ cd binutils
[jk@pecola binutils]$ ./configure --prefix=$prefix --target=aarch64-none-linux
[jk@pecola binutils]$ make
[jk@pecola binutils]$ make install
[jk@pecola binutils]$ cd ..

kernel headers

Next up, the kernel headers. I'm using Catalin Marinas' kernel tree on kernel.org here. We don't need to do a full build (we don't have a compiler yet..), just the headers_install target.

[jk@pecola arm64-toolchain]$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64.git
[jk@pecola arm64-toolchain]$ cd linux-aarch64
[jk@pecola linux-aarch64]$ git reset --hard b6fe1645
[jk@pecola linux-aarch64]$ make ARCH=arm64 INSTALL_HDR_PATH=$prefix headers_install
[jk@pecola linux-aarch64]$ cd ..

gcc

And now we should have things ready for the compiler build. I have a git tree up at git://kernel.ubuntu.com/jk/arm64/gcc.git (gitweb), but this is just the aarch64 branch of upstream gcc.

[jk@pecola arm64-toolchain]$ git clone git://kernel.ubuntu.com/jk/arm64/gcc.git
[jk@pecola arm64-toolchain]$ cd gcc/aarch64-branch/
[jk@pecola aarch64-branch]$ git reset --hard d6a1e14b
[jk@pecola aarch64-branch]$ ./configure --prefix=$prefix \
    --target=aarch64-none-linux --enable-languages=c \
    --disable-threads --disable-shared --disable-libmudflap \
    --disable-libssp --disable-libgomp --disable-libquadmath
[jk@pecola aarch64-branch]$ make
[jk@pecola aarch64-branch]$ make install
[jk@pecola aarch64-branch]$ cd ../..

That's it! You should have a working compiler for arm64 kernels in /opt/cross/aarch64.
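As a quick test, here's a sketch of a kernel cross-build with the new toolchain, reusing the linux-aarch64 tree we cloned for the headers (I'm assuming the arm64 defconfig present in that tree):

[jk@pecola arm64-toolchain]$ export PATH=/opt/cross/aarch64/bin:$PATH
[jk@pecola arm64-toolchain]$ cd linux-aarch64
[jk@pecola linux-aarch64]$ make ARCH=arm64 CROSS_COMPILE=aarch64-none-linux- defconfig
[jk@pecola linux-aarch64]$ make ARCH=arm64 CROSS_COMPILE=aarch64-none-linux-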

Read more
rsalveti

This week I’m proudly participating in the Ubuntu Developer Summit, helping to plan and define what the Quantal Quetzal (12.10) release will be over the coming months.

As usual I’m wearing not only the Linaro hat, but also my Ubuntu and Canonical ones, interested in and actively participating in most topics related to ARM in general.

And what can I say after the first 3 days at UDS-Q? Well, I’ve been busier than ever, with great opportunities to help make Ubuntu rock even more on ARM, both on current devices/platforms and on the exciting new ones coming in the next few months.

Here are a few highlights from the first days:

Monday – May 7th

  • Introduction and Keynote
    • Great start as usual by Mark, showing the great opportunities for both Canonical and Ubuntu, describing the new targets and use cases, and also showing how important Cloud now is for Ubuntu. After that we finally had the announcement of real hardware availability from Calxeda, proving that ARM servers are indeed real! (which is quite an important accomplishment)
  • Schedule displays all working with our member’s boards
    • This was the first time that all the schedule displays available at UDS were driven by ARM boards provided by Linaro. This time we had Pandaboard, Origen and also Snowball constantly showing the schedule throughout the day. Low-power yet powerful devices all around :-)
  • Plans for a minimum filesystem for embedded devices
    • Discussion covering all the possible embedded-related use cases for Ubuntu, trying to understand the real requirements for a minimum filesystem (rootfs) for those devices. While we decided not to generate the smallest-still-apt/dpkg-compatible rootfs ourselves (as ubuntu-core already covers most of the cases), we’ll provide enough tools and documentation for users to easily generate one (see the sketch after this list). On the Linaro side, the Ubuntu Nano image should reflect these suggestions.
  • Identify impact of the switch to pure live images for ARM platforms
    • Here the focus was to review whether we should really continue providing pre-installed images instead of just supporting live-based ones. Shipping pre-installed images on SD cards makes bootstrap and installation quite easy, but it hurts performance badly. As we’re now getting ARM boards that are very powerful in many ways, SD card I/O shouldn’t limit what users can get from them. The decision for Quantal is to drop support for the pre-installed images and provide live-based ones on the SD cards (think of a live SD, like the live CD on other archs), where the user installs Ubuntu the same way as on x86, with USB/SATA-based devices as rootfs by default.
  • OpenStack Deployment on ARM Server
    • The focus of this session was to better understand the missing pieces for proper OpenStack support on ARM. Quite a few open questions remain, but enabling the missing packages, testing and supporting LXC, and KVM for a few platforms will help make sure the support is correctly in place. After initial support, continuous testing and validation should happen to make sure the ARM platforms stay well supported over time (which will be better stressed and tested once MAAS/Juju is also properly supported on ARM).
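On the minimum filesystem topic above, here's a rough sketch of generating such a rootfs with debootstrap (the suite, mirror and options are just examples; the result still needs a kernel and board-specific bits on top):

# build a minimal armhf root filesystem for quantal; when running on a
# non-ARM host, add --foreign and finish the second stage under QEMU
# user emulation
sudo debootstrap --arch=armhf --variant=minbase \
    quantal ./rootfs http://ports.ubuntu.com/ubuntu-ports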

Tuesday – May 8th

  • Detail and begin the arm64/aarch64 port in Ubuntu
    • Clearly the most important session of the day for ARM. Great discussion on how to prepare and start the ARMv8 port of Ubuntu and Debian, starting with cross-build support via multiarch and later with Fast Models and QEMU. A lot is still to be covered once ARM is able to publish ARMv8 support for the toolchain and kernel, and the session will be revisited at Linaro Connect at the end of this month.
  • Ubuntu Kernel Delta Review
    • The usual review of the patches the Ubuntu Kernel team maintains in the Ubuntu kernel tree. For Linaro this is important as we also enable the Ubuntu-specific patch set in the packages provided by the LEB, for proper kernel and user-space support. Luckily, this time the delta seems really minimal, and it should probably also become part of Linux Linaro in the following month.
  • Integrate Linaro hwpacks for ARM with the Ubuntu image build infrastructure
    • The usual discussion about avoiding duplicated work for each ARM board we support at both Ubuntu and Linaro. The decision is to finally sync with the latest flash-kernel available in Debian and to get a common project/package with the hardware-specific bits in place, so it can be used by linaro-image-tools, flash-kernel and debian-cd.

Wednesday – May 9th

  • MAAS Next Steps
    • Session to review and plan the next steps for the MAAS project, which is still missing proper ARM support. Great discussions on understanding all the requirements, as they will not necessarily match the usual ARM devices we have at the moment. The goal for ARM is to continue improving PXE support in U-Boot (even with UEFI chainload later), and to understand what might be missing to also have IPMI support (even if not entirely provided by the hardware).
  • System Compositor
    • Great session covering the improvements and development planned on the graphics side for the next release. The goal is a system compositor started right at the beginning of boot, which is then taken over and used properly once lightdm is up (with X11). This will greatly improve the user experience on normal x86-based desktops, and luckily on ARM we’re also in a quite nice situation, with Linaro’s work helping to get proper DRM/KMS support for the boards we support, so I hope ARM will be in great shape here :-)
  • ARM Server general enhancements (for ARMv7 and perhaps v8)
    • In this session we covered what seems to be the most recurrent problem in supporting ARM servers: the lack of a single, supported boot method and boot loader. UEFI should be able to help on this front soon, but until then the focus will be on making sure the current PXE implementation in U-Boot works as expected (chainloading UEFI from U-Boot is another possibility Linaro is investigating). There is also a request for IPMI support, and it is still unclear how that will be done generically.
  • Integration testing for the bootloader
    • As Ubuntu is moving towards continuously validating and testing all important components, the bootloader needs proper validation too, including its effect on the user experience while booting the system. ARM is a special case here, as U-Boot is still the main bootloader used across the boards. Test case descriptions are in place, and the discussion will probably continue at Linaro Connect, as this is also an area where we want to help with validation/testing.
  • ARM Server Benchmarking and Performance
    • Here the Ubuntu Server team presented how they benchmark and measure performance at the server level on x86, and covered what is still needed to run and validate the ARM boards the same way. For ARM the plan is to run the same test cases on the available scenarios, and to get Linaro involved by making this part of the continuous validation and testing done with LAVA. Another important topic, which will probably be extended at Linaro Connect, is finding a way to collect power consumption data while running the test cases/benchmarks, so it can be further optimised later on.
  • Compiz GLES2 Handover
    • Last session of the day, trying to close the remaining gaps to finally get OpenGL ES 2.0 support merged into the Compiz and Unity upstream branches used by the entire Ubuntu desktop (across all archs). The follow-up work is basically to fix the remaining important plugins after merging the changes, and to add a few test cases to properly validate the support in Ubuntu. Once that’s done, it should be merged ASAP.

These are just a few of the topics in which I was able to participate. There is a lot more exciting work going on, which can all be found at http://summit.ubuntu.com/uds-q/. Remember that you can still participate in a few of the sessions tomorrow and Friday, as remote access is provided for all of them.

I’m sure a lot more exciting stuff will be discussed for ARM support before the end of this week, and at Linaro Connect, at the end of the month, we’ll be able to review and get our hands dirty as well :-)

Exciting times for ARM!


Read more
Mark Baker

On Monday, Calxeda, one of the leading innovators bringing revolutionary efficiency to the datacenter, unveiled their new EnergyCore reference server live onstage with Mark Shuttleworth at the Ubuntu Developer Summit (UDS) in Oakland, California.

 

Calxeda CTO and Founder Larry Wikelius with Mark Shuttleworth at UDS

The choice of UDS as the venue to unveil the new hardware was flattering, and underlines how the innovators in next-generation computing are building a compelling platform together. Ubuntu and Calxeda have been working together for several years to bring Ubuntu on Calxeda to market in the form now being shown at UDS. The collaboration of Canonical and the Ubuntu community with Calxeda has been vital to delivering a solution that can easily deploy an OpenStack-based cloud using MAAS and Juju on such innovative hardware.

The EnergyCore reference server unveiled at UDS can house up to 48 quad-core nodes at under 300 watts, with up to 24 SATA drives. In this configuration it is possible to house 1,000 server instances in a single rack, and other server form factors being developed by OEMs may enable several times this density. It is precisely this type of power-efficient technology that will accelerate the adoption of next-generation hyperscale services such as cloud, and we are proud to be at the very core of it.

So congratulations to Calxeda on the arrival of the EnergyCore and congratulations to Canonical and the Ubuntu Community for providing the platform that will power it.

Read more
rsalveti

For those following the development of the next Ubuntu release (12.04 – Precise Pangolin), you all know we’re quite close to the release date already, and to make sure Precise rocks from day 0, we all need to work hard to get most of the bugs sorted out during the next few weeks.

At Linaro, the Linaro Developer Platform team will be organizing an ARM porting Jam this Friday, with the goal of gathering all developers interested in fixing bugs and portability issues related to the Ubuntu ARM port (mostly ARMHF issues at the moment).

The idea of having the porting Jam on Friday is to make it a joint effort with Ubuntu’s Fix Friday and the Ubuntu Global Jam, so expect quite a few other developers helping to improve Ubuntu as well!

It’s quite easy to participate:

Remember that this release will be a huge milestone for ARM, as it’ll be the first LTS release supporting ARM, besides delivering support for ARM servers and ARMHF as the default, so let’s make sure it rocks!

Looking forward to a great porting Jam!

Happy bug fixing!


Read more
rsalveti

Yesterday Canonical announced the first UI concept for Ubuntu TV. Together with the announcement, the first code drop was released, so we could read the code and better understand the technologies used, and how it would behave in an ARM environment, mostly on a Pandaboard (where we already have OpenGL ES 2 and video decode working).

Getting Ubuntu TV to work

If you are still using Oneiric, you can just follow the guide at https://wiki.ubuntu.com/UbuntuTV/Contributing, where you’ll find all the steps needed to try Ubuntu TV on your machine.

As it’s quite close to Unity 2D (similar code base), and also based on Qt, I decided to follow the steps described at the wiki page and see if it would work correctly.

The first issue we found with Qt was that it wasn’t rendering at full screen with the latest PowerVR SGX drivers, so any application using Qt OpenGL would only show on a small part of the screen. Luckily TI (Nicolas Dechesne and Xavier Boudet) quickly provided me a new release of the driver fixing this issue (a version that should land later today in the Linaro overlay), so I could continue my journey :-)

The next problem was that Qt was enabling brokenTexSubImage and brokenFBOReadBack for the SGX drivers, based on the old versions available for Beagle, and it seems this is no longer needed with the current version available on the Pandaboard (still to be reviewed with TI, so a proper fix can be forwarded to Qt).

Code removed, patch applied and package built (after many hours), and I was finally able to open the Ubuntu TV interface on my Panda :-)

UI Navigation on a Pandaboard, with Qt and OpenGL ES2.0

Running Ubuntu TV is quite simple if you’re already running the Unity 2D interface. All you need to do is make sure you kill all unity-2d components and that you’re running metacity without compositing enabled. Other than that you just run ”unity-2d-shell -opengl” and voilà ;-)
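Something along these lines should do it (a rough sketch; the exact unity-2d process names and the gconf key may differ on your image):

# stop the running Unity 2D components (process names are assumptions)
killall unity-2d-shell unity-2d-panel unity-2d-launcher

# make sure metacity runs without compositing, then restart it
gconftool-2 --set /apps/metacity/general/compositing_manager --type bool false
metacity --replace &

# finally, launch the shell with the OpenGL backend
unity-2d-shell -opengl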

Here’s a video of the current interface running on my Panda:

As you can see from the video, I didn’t actually play any video, and that’s because we’re currently lacking a generic texture handler for OpenGL ES with GStreamer in Qtmobility (there’s only one available, but it’s specific to MeeGo). Once that’s fixed, video playback should behave similarly to XBMC (but with fewer hacks, as it’s a native GStreamer backend).

Next steps, enabling proper video decode

Looking at what would be needed to finally play the videos, and to make this something useful on your Pandaboard, the first thing is to improve Qtmobility to have a more generic (but unfortunately still OMAP-specific) way to handle texture streaming with GStreamer and OpenGL ES. Rob Clark added similar functionality to XBMC, creating support for ”eglImage”, so we just need to port that work and make sure it works properly with Qtmobility.

Once that’s ported, the video should be streamed as a texture to the video surface, making it also work transparently with QML (the way it’s done with Ubuntu TV).

If you know Qt and GStreamer, and also want to help get this working properly on your Panda, here are a few resources:

As soon as video decoding is working properly, a new blog post will follow explaining the details and how to reproduce it on your own Panda with the Ubuntu LEB :-)

Cheers!


Read more
rsalveti

As described in my previous post about Ubuntu TV support on a Pandaboard, we were still missing proper support for texture streaming on the Pandaboard to get video playback working and fully accelerated.

This weekend Rob Clark managed to create the first version of the TI-specific eglImage support in Qtmobility, posting the code at his gitorious account, and for the first time we’re fully able to use Ubuntu TV on an ARM device, using a Pandaboard.

Demo video with the Ubuntu TV UI (accelerated with Qt and OpenGL ES 2.0) and with video decode support of 720p and 1080p:

The TI eglImage support still needs a few clean-ups, but we hope to push it into Ubuntu in the following weeks (making it good enough to at least try as a package patch).

For people wanting to try it out, a few packages are already available in Linaro’s overlay PPA, and the remaining ones (Qt and Qtmobility) should be available later today, so people can easily run it with our images.
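If you want to give it a try, adding the overlay is the usual PPA dance (treat the PPA name here as an assumption, and double-check it against the Linaro wiki):

sudo add-apt-repository ppa:linaro-maintainers/overlay
sudo apt-get update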

Hope you enjoy it. We’ll keep working on maintaining and improving the current support, so Ubuntu TV also rocks on ARM :-)

Cheers!


Read more
rsalveti

During the end of October and beginning of November we had the last Linaro Connect of the year. This time it was held together with the Ubuntu Developer Summit, giving us the opportunity to better discuss the roadmap with both Linaro and the Ubuntu team.

From the Developer Platform team perspective, we had quite a nice week, with demos on Monday and Friday (showing people what we’ve been working on), and also some great news shared with the Ubuntu team, now that Mark Shuttleworth has announced that Ubuntu will go to tablets, TVs and phones (and ARM will surely be a huge part of that).

Some nice links and videos of what happened during that week (related to our team):
* Sessions related with the Developer Platform Team (Ubuntu)
* Linaro Demo: Ubuntu Unity with OpenGL ES on Pandaboard
* Linaro Developer Platform Tech Lead Ricardo Salveti Interview at Linaro Connect
* Linaro Connect Q4.11 – Ubuntu LEB tutorial
* Linaro Connect Q4.11 – Interview with Marcin Juszkiewicz

Linaro 11.11 Release

Another quite good achievement for us during November was the 11.11 release.

During this release we had quite a few great highlights, including some that had been planned for quite a while:
* Ability to cross build Firefox using Multiarch
* OMAP4 SPL USB Booting, enabling USB boot at Pandaboard
* ARM DS-5 support for the 5.8 release
* CI Builds for Linaro GCC both for cross and native
* And a lot of bug fixes

Now it’s time to get ready to develop the blueprints we’re planning for 11.12, to make December another great and solid month :-) (I’ll do another post about the 11.12 planning later this month).


Read more
Chris Kenyon


Today HP announced Project Moonshot, a programme to accelerate the use of low-power processors in the data centre.

The three elements of the announcement are the launch of Redstone, a development platform that harnesses low-power processors (both ARM & x86); the opening of the HP Discovery Lab in Houston; and the Pathfinder partnership programme.

Canonical is delighted to be involved in all three elements of HP’s Moonshot programme to reduce both power and complexity in data centres.

The HP Redstone platform unveiled in Palo Alto showcases HP’s thinking around highly federated environments and Calxeda’s EnergyCore ARM processors. The Calxeda system-on-chip (SoC) design is powered by Calxeda’s own ARM-based processor and combines mobile-phone-like power consumption with the attributes required to run a tangible proportion of hyperscale data centre workloads.

HP Redstone Platform

The promise of server-grade SoCs running at less than 5W and achieving per-rack density of 2800+ nodes is impressive, but what about the software stacks used to run the web and analyse big data? When will they be ready for this new architecture?

Ubuntu Server is increasingly the operating system of choice for web, big data and cloud infrastructure workloads. Films like Avatar are rendered on Ubuntu, Hadoop is run on it and companies like Rackspace and HP are using Ubuntu Server as the foundation of their public cloud offerings.

The good news is that Canonical has been working with ARM and Calxeda for several years now, and we released the first version of Ubuntu Server ported to ARM Cortex-A9 class processors last month.

The Ubuntu 11.10 release (download) is a functioning port, and over the next six months we will be working hard to benchmark and optimize Ubuntu Server and the workloads that our users prioritize on ARM. This work, by us and by upstream open source projects, is going to be accelerated by today’s announcement and by access to hardware in the HP Discovery Lab.

As HP stated today, this is the beginning of a journey to re-inventing a power-efficient and less complex data center. We look forward to working with HP and Calxeda on that journey.

 

Read more
Dustin Kirkland

for Ubuntu Servers

I've long had a personal interest in the energy efficiency of the Ubuntu Server.  This interest has manifested in several ways.  From founding the PowerNap Project to using tiny netbooks and notebooks as servers, I'm just fascinated with the idea of making computing more energy efficient.

It wasn't so long ago, in December 2008 at UDS-Jaunty in Mountain View, that I proposed just such a concept, and was nearly ridiculed out of the room.  (Surely no admin in his right mind would want enterprise infrastructure running on ARM processors!!! ... Um, well, yeah, I do, actually....)  Just a little over two years ago, in July 2009, I longed for the day when Ubuntu ARM servers might actually be a reality...

My friends, that day is here at last!  Ubuntu ARM Servers are now quite real!


The affable Chris Kenyon first introduced the world to Canonical's efforts in this space with his blog post, Ubuntu Server for ARM Processors a little over a week ago.  El Reg picked up the story quickly, and broke the news in Canonical ARMs Ubuntu for microserver wars.   And ZDNet wasn't far behind, running an article this week, Ubuntu Linux bets on the ARM server.  So the cat is now officially out of the bag -- Ubuntu Servers on ARM are here :-)

Looking for one?  This Geek.com article covers David Mandala's 42-core ARM cluster, based around TI Panda boards.  I also recently came across the ZT Systems R1801e Server, boasting 8 x Dual ARM Cortex A9 processors.  The most exciting part is that this is just the tip of the iceberg.  We've partnered with companies like Calxeda (here in Austin) and others to ensure that the ARM port of the Ubuntu Server will be the most attractive OS option for quite some time.

A huge round of kudos goes to the team of outstanding engineers at Canonical (and elsewhere) doing this work.  I'm sure I'm leaving off a ton of people (feel free to leave comments about who I've missed), but the work that's been most visible to me has been by:


So I'm looking forward to reducing my servers' energy footprint...are you?

:-Dustin

Read more