A Trip Down Memory Management Lane

June 11, 2007

Recently I was working on some AJAX code in Firefox and ran across an old friend: deleting 10,000 DOM nodes at once takes a long, long time. Memory management strikes again!

A long time ago I wrote a commercial C/C++ memory allocator library and testing tool for Mac OS 7-9 (i.e. not OS X) called HeapManager, long since abandoned. I competed with SmartHeap, and eventually released it to open source. At that point a lot of people, especially in the Mac game community, adopted it, as it was free, fast, and came complete with a great memory error debugging tool. Some people even ported it to other platforms. In the end it was only suitable for single-threaded applications, and that era passed. I don't even have the source any more.

Most people think little of what goes on under the hood of new() or malloc(). You ask for memory and you get it. When you are done you toss it (or in the case of languages like Java or Javascript the runtime tosses it for you at some point). Out of sight, out of mind.

Until it bites you in the ass that is.

Memory corruption and crashes (in pointer-based languages anyway), poor performance, and the bugaboo of modern life, arbitrary code execution, can all chomp into your application's reputation and execution speed. One of the joys of Java is that its memory management avoids many of these issues, but performance can still become a problem.

Responding to requests for blocks of memory and returning them for reuse is surprisingly complicated. Applications (both client and server) ask for memory in varying sizes and at unpredictable times, usually in response to the user's interactions. Add in the additional complexity of multiple threads (which I didn't have to support on the single-threaded Mac OS) and building a general-purpose memory allocator (and memory debugging allocator) is more of an art form than you might think.

Half of the work I did on HeapManager was writing torturous memory testing applications, simulating different kinds of usage under all sorts of scenarios. Lots of tweak, test, tweak, test, etc. I actually wrote the allocator itself in C++, which was a clever feat in its own right and made it a tiny bit easier to work with.

The most surprising thing about memory allocation is that the key isn't so much allocation, it's deletion. A lot of strong memory allocators have existed over the last 50 years, but many perform miserably when you delete a lot of objects. I remember working with the malloc() in HP/UX some years back (I'm sure it's different now), and the free() calls were so pathetic that I adapted some HeapManager code to suballocate so I could avoid them completely.

The problem with deletion is that it creates "holes" in the memory, in between allocated blocks that are still in use. The simplest way to track these holes for reuse is to put them on a list, and scan the list to see if you can coalesce them into bigger chunks. If your memory block data structure doesn't support offsets (i.e. you can't jump from one block to the next directly from a given block's start address), then lists are necessary to find the unused blocks. Scanning lists to find potentially coalescable blocks is really, really slow. In HeapManager my block headers were a bit larger than in the common malloc() implementations, so I could quickly look at a newly deleted block's immediate neighbors and combine them with very few instructions. Being able to keep the free areas completely combined makes allocation a whole lot easier.
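The neighbor-combining idea can be sketched with boundary tags. This is just an illustration of the general technique, not HeapManager's actual layout: each header stores its own size plus the size of the block immediately before it, so free() can reach both neighbors in O(1) with no list scan.

```cpp
#include <cstdint>

// Illustrative block header: slightly larger than a bare size field,
// but it lets us jump to both neighbors directly.
struct BlockHeader {
    uint32_t size;       // size of this block, including the header
    uint32_t prev_size;  // size of the block just before us (0 if first)
    bool     free;       // is this block currently unused?
};

// Jump to the immediate right-hand neighbor using our own size.
BlockHeader* next_block(BlockHeader* b) {
    return reinterpret_cast<BlockHeader*>(
        reinterpret_cast<char*>(b) + b->size);
}

// Jump to the immediate left-hand neighbor using the stored prev_size.
BlockHeader* prev_block(BlockHeader* b) {
    return reinterpret_cast<BlockHeader*>(
        reinterpret_cast<char*>(b) - b->prev_size);
}

// Free a block and merge it with any free neighbors: a handful of
// pointer operations instead of a free-list scan. heap_end marks the
// first address past the managed region.
void free_and_coalesce(BlockHeader* b, void* heap_end) {
    b->free = true;
    // Absorb the right-hand neighbor if it exists and is free.
    BlockHeader* right = next_block(b);
    if (static_cast<void*>(right) < heap_end && right->free) {
        b->size += right->size;
    }
    // Let a free left-hand neighbor absorb us.
    if (b->prev_size != 0) {
        BlockHeader* left = prev_block(b);
        if (left->free) {
            left->size += b->size;
            b = left;
        }
    }
    // Keep the next block's back-reference consistent.
    BlockHeader* after = next_block(b);
    if (static_cast<void*>(after) < heap_end) {
        after->prev_size = b->size;
    }
}
```

Because every delete immediately merges adjacent holes, the free space stays in the largest possible chunks and allocation never has to sweep a fragmented list.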

It's not the only trick you can do. Small allocations tend to be very common, so I had a separate set of allocators, one for each block size at some multiple (I forget what I picked; something like 4, 8, 12, 16 ... 256 bytes). These allocations were provided by an array allocator: a block of memory where each potential chunk was exactly the same size. This made deletion and allocation way faster and used less overhead. Allocations beyond the max size were handled by the normal allocator. Truly huge allocations were handled directly by the underlying OS (as was allocating my memory pools).
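An array allocator for one size class can be sketched in a few lines (names and sizes here are illustrative, not from HeapManager). Since every chunk is identical, a free chunk can hold the free list's next pointer in its own bytes: no per-block headers, no coalescing, and both operations are a couple of pointer moves.

```cpp
#include <cstddef>
#include <vector>

// One fixed-size pool, e.g. for all 16-byte allocations. The chunk size
// must be at least sizeof(void*) so a free chunk can store a link.
class ArrayAllocator {
public:
    ArrayAllocator(std::size_t chunk_size, std::size_t count)
        : storage_(chunk_size * count) {
        // Thread every chunk onto the free list up front.
        for (std::size_t i = count; i > 0; --i) {
            push(storage_.data() + (i - 1) * chunk_size);
        }
    }
    // O(1): pop the free-list head.
    void* allocate() {
        if (!head_) return nullptr;  // pool exhausted
        void* p = head_;
        head_ = *static_cast<void**>(head_);
        return p;
    }
    // O(1): push the chunk back onto the list.
    void deallocate(void* p) { push(p); }

private:
    void push(void* p) {
        *static_cast<void**>(p) = head_;
        head_ = p;
    }
    std::vector<char> storage_;  // the backing block of memory
    void* head_ = nullptr;       // free-list head, threaded through chunks
};
```

A full small-object allocator would keep an array of these, one per size class, and round each request up to the nearest class.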

Another trick was to use memory pools, which maintained another set of information about the blocks inside each pool. Finding space for a newly requested block was faster because the pools were kept in a priority list, so the allocation would wind up in the best possible place, eliminating a lot of manual searching.
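One way to picture the priority-list idea (a guess at the general shape, not HeapManager's actual structure): keep pools keyed by their largest free run, so a request goes straight to the smallest pool that can satisfy it instead of probing pools one by one.

```cpp
#include <cstddef>
#include <map>

// Hypothetical per-pool summary; a real pool would carry block bookkeeping.
struct Pool { std::size_t largest_free; };

// Pools ordered by largest free run. lower_bound finds the smallest pool
// that can still satisfy the request (a best-fit choice) in O(log n).
using PoolIndex = std::multimap<std::size_t, Pool*>;

Pool* pick_pool(PoolIndex& index, std::size_t request) {
    auto it = index.lower_bound(request);
    return it == index.end() ? nullptr : it->second;
}
```

The index has to be updated whenever a pool's largest free run changes, which is the price paid for skipping the search on every allocation.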

Today's memory allocators are more sophisticated, especially if you can transparently move data out from under the application to rearrange memory. Java (especially in later versions) uses a generational allocator, which starts all objects out in a new pool where allocation basically just returns the next open space at the end of the pool (assuming the request fits). Eventually, as memory is used up, objects which are still active are moved en masse (or partially in later versions) to a "mature" pool, and the original pool is simply started over. Of course there are many variations on this, and especially in Java 6, I barely understand all the varied choices you have now.
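The young-generation allocation path is essentially a bump allocator. Here's a minimal sketch of the general idea (not Java's actual implementation): allocation bumps a pointer through a contiguous region, and once survivors have been evacuated to the mature pool, "collecting" is just resetting that pointer.

```cpp
#include <cstddef>

// A toy nursery: a contiguous region consumed front to back.
class Nursery {
public:
    explicit Nursery(std::size_t bytes)
        : begin_(new char[bytes]), top_(begin_), end_(begin_ + bytes) {}
    ~Nursery() { delete[] begin_; }

    // Allocation is just "return the end of used space", if it fits.
    void* allocate(std::size_t n) {
        if (top_ + n > end_) return nullptr;  // nursery full: time to collect
        void* p = top_;
        top_ += n;
        return p;
    }

    // After live objects are copied out, start the pool over.
    void reset() { top_ = begin_; }

    std::size_t used() const { return static_cast<std::size_t>(top_ - begin_); }

private:
    char* begin_;
    char* top_;   // bump pointer: first unused byte
    char* end_;
};
```

Compared to a free-list malloc, there is no per-allocation search and no per-object free() at all, which is why allocation in a generational collector can be so cheap.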

There are almost as many ways to do this as there are programming languages. For something most people never think about, it is a rocket science of sorts. The allocator has to be designed to work not only with the language it supports, but with the type of application that will run on it and how that application will use memory. That's why a lot of languages and application types typically use a custom allocator. Even frameworks (like the STL) support custom suballocators.
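To show how an STL container accepts a custom suballocator, here is a minimal allocator that just counts its calls and forwards to malloc/free; a real suballocator would hand out memory from its own pools instead. (This uses the standard C++ allocator interface; the counting class itself is made up for the example.)

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Minimal standard-conforming allocator that tallies allocation calls.
template <class T>
struct CountingAllocator {
    using value_type = T;
    static inline std::size_t allocations = 0;  // calls seen so far

    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) {}  // rebind support

    T* allocate(std::size_t n) {
        ++allocations;
        return static_cast<T*>(std::malloc(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { std::free(p); }
};

template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) {
    return true;   // all instances share the same (global) source
}
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) {
    return false;
}
```

Plugging it in is just a template parameter: `std::vector<int, CountingAllocator<int>>` routes every growth of the vector through your code.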

Of course there is a whole additional topic, debugging memory allocators/tools, whose job is to help you catch memory overruns, underruns, and stack and random memory overwriting, hopefully during testing before the user finds them (or a clever hacker exploits them). Thankfully in Java I don't have to worry about these, but not every language has that option.

So back to Firefox. I don't know why it's so slow to delete nodes en masse (maybe it simply calls the OS X allocator under the hood); this was one of my many memory tests for HeapManager. Safari (2 or 3), on the other hand, has little trouble with the 10,000 nodes, but oddly enough has trouble allocating new Option objects as fast as Firefox does. Of course using 10,000 DOM elements is pretty stupid anyway, but it's interesting to see how the applications react.

I'm glad I don't worry as much about memory management as I used to. My brain needs all the memory it can get for more important things.