Malloc and Linux

If you read discussions on the Internet about memory allocation (and who doesn’t, really), one surprising tidbit that always comes up is that in Linux, malloc never returns null because the kernel does a thing called memory overcommit. This is easy to verify with a simple test application.

#include<stdio.h>
#include<malloc.h>

int main(int argc, char **argv) {
  while(1) {
    char *x = malloc(1);
    if(!x) {
      printf("Malloc returned null.\n");
      return 0;
    }
    *x = 0;
  }
  return 1;
}

This app tries to malloc memory one byte at a time and writes to it. It keeps doing this until either malloc returns null or the process is killed by the OOM killer. When run, the latter happens. Thus we have now proved conclusively that malloc never returns null.

Or have we?

Let’s change the code a bit.

#include<stdio.h>
#include<malloc.h>

int main(int argc, char **argv) {
  long size=1;
  while(1) {
    char *x = malloc(size*1024);
    if(!x) {
      printf("Malloc returned null.\n");
      printf("Tried to alloc: %ldk.\n", size);
      return 0;
    }
    *x = 0;
    free(x);
    size++;
  }
  return 1;
}

In this application we try to allocate a block of ever increasing size. If the allocation is successful, we release the block before trying to allocate a bigger one. This program does receive a null pointer from malloc.

When run on a machine with 16 GB of memory, the program will fail once the allocation grows to roughly 14 GB. I don’t know the exact reason for this, but it may be that the kernel reserves some part of the address space for itself and trying to allocate a chunk bigger than all remaining memory fails.

Summarizing: malloc under Linux can either return null or not and the non-null pointer you get back is either valid or invalid and there is no way to tell which one it is.

Happy coding.

Is malloc slow?

We had a discussion some time ago about the speed of malloc and whether you should malloc/free small objects as needed or reuse them with e.g. memory pools. There were strong opinions on either side which made me search the net for some hard numbers on malloc speed.

It turns out there weren’t any. Search engines only threw up tons of discussions about what the speed of malloc would theoretically be.

So I wrote a test program that allocates chunks of different sizes, holds on to them for a while and frees them at random times in a random order. Then I made it multithreaded to add lock contention in the mix. I used 10 threads.

On quad core laptop with 4GB of memory glibc can do roughly 1.5 million malloc/free pairs per second.

On a crappy ARM board with a single core glibc can do 300 000 malloc/free pairs per second.

What does this mean in practice? If you are coding any kind of non-super-high-performance app, the only reasons you would care about reducing mallocs are:

  • you do hundreds of malloc/free calls per second for long periods at a time (can lead to severe memory fragmentation)
  • you have absolute latency requirements in the sub-millisecond range (very rare)
  • your app is used actively for hours on end (e.g. Firefox)
  • you know, through measurement, that the mallocs you are removing constitute a notable part of all memory allocations

But, as Knuth says, over 97% percent of the time malloc is so fast that you don’t have to care.

No, really! You don’t!

Update: The total memory pool size I had was relatively small to reflect the fact that the working set is usually small. I re-ran the test with 10x and 100x pool size. The first was almost identical to the original test, the latter about 10 times slower. That is still ~175 000 allocations per second, which should be plenty fast. I have also uploaded the code here for your enjoyment.