Is malloc slow?

We had a discussion some time ago about the speed of malloc and whether you should malloc/free small objects as needed or reuse them with e.g. memory pools. There were strong opinions on either side which made me search the net for some hard numbers on malloc speed.

It turns out there weren’t any. Search engines only threw up tons of discussions about what the speed of malloc would theoretically be.

So I wrote a test program that allocates chunks of different sizes, holds on to them for a while and frees them at random times in a random order. Then I made it multithreaded to add lock contention in the mix. I used 10 threads.

On quad core laptop with 4GB of memory glibc can do roughly 1.5 million malloc/free pairs per second.

On a crappy ARM board with a single core glibc can do 300 000 malloc/free pairs per second.

What does this mean in practice? If you are coding any kind of non-super-high-performance app, the only reasons you would care about reducing mallocs are:

  • you do hundreds of malloc/free calls per second for long periods at a time (can lead to severe memory fragmentation)
  • you have absolute latency requirements in the sub-millisecond range (very rare)
  • your app is used actively for hours on end (e.g. Firefox)
  • you know, through measurement, that the mallocs you are removing constitute a notable part of all memory allocations

But, as Knuth says, over 97% percent of the time malloc is so fast that you don’t have to care.

No, really! You don’t!

Update: The total memory pool size I had was relatively small to reflect the fact that the working set is usually small. I re-ran the test with 10x and 100x pool size. The first was almost identical to the original test, the latter about 10 times slower. That is still ~175 000 allocations per second, which should be plenty fast. I have also uploaded the code here for your enjoyment.

4 thoughts on “Is malloc slow?

  1. Just out of curiosity, could you publish the benchmark program?

  2. Seems like there’s a lot of vector use and .erase()ing elements in the middle of a vector. What percentage of the running time is actually spent in the allocator?

  3. Changing calloc to malloc+memset gives a speed of 172 000 allocs/sec, so it is pretty much the same.

    I did not measure which fraction is taken by the allocator, because that would only make the measured allocation speed faster and the current speed is already fast enough.

Comments are closed.