David Bakin’s programming blog.


A Memory-Efficient Queue for Primitive Types (in Java)

Data structures in Java can take more memory than equivalent structures in C++ or C#, for various reasons, including general per-object overhead, and the dichotomy between primitive types and objects. For many applications that doesn’t really matter, but for some the excess memory usage in Java is critical and can mean the difference between success and failure.

I’m studying combinatorial search techniques now, using (for some reason) Java. At this point I’m using retrograde analysis to compute pattern databases for the 15-puzzle. (Retrograde analysis = searching backward from the goal state.) The easiest algorithm for this uses a breadth-first search of all positions from the goal.

Breadth-first search is done by enqueuing states-next-to-search onto a queue, and processing them one by one off the queue. For a lot of problems the queue size gets prohibitively large and can’t be used, which is why IDA* and other algorithms that go depth-first have been developed.

For building pattern databases for the 15-puzzle the queue can get quite long, but should still be tractable if care is taken. For Java, in particular, you can’t just blindly use the standard collection classes.

The two main Java classes that implement the interface Queue<E> are ArrayDeque, which is based on a Java array, and LinkedList, which is a typical linked list with external links.

Suppose, for the sake of argument, that we’re working with a 32-bit OS, and with queues of 100 million elements.

Well, if you’re talking 100 million distinct objects you’re already in trouble. Each object takes 16 bytes, minimum, and your 100M objects will need at least 1.6G of memory, more heap than you can get (with the Sun Hotspot VM). But maybe you’re talking primitive types, like long. (A 15-puzzle state will fit into a long – a 64-bit word: 16 tile positions of 4 bits each.) 100M longs will fit in 800M bytes of memory, if placed in an array, so your queue could work.

Except for a few things. First, both ArrayDeque and LinkedList use as their representation type the type that they’re instantiated with. Generics in Java can’t be instantiated with primitive types, so you need to use, e.g., Long instead of long. This means boxing all of the longs you want to put in the queue, which means you’re back to separate objects of 16 bytes instead of array slots of 8 bytes. (And of course, things are worse, proportionally, if you want a queue of int or byte. Why would you want a queue of 100M bytes given that there are only 256 unique values for a byte? Answer: If you really want to have a queue of a tuple of long and byte, and you’re going to run it as two queues of primitive objects, rather than one queue of a reference type. Which is the case for the breath-first search in the 15-puzzle.)

In addition to your boxed elements, the LinkedList has a 16-byte object for every element in the queue. So each element in the queue actually takes 32-bytes, and furthermore, is a separate object to manage.

That point is also important: In an experiment I ran with a 1Gb heap space, the test program started thrashing in the garbage collector and made no further progress after only 33,740,000 Longs were allocated and put into an ArrayDeque. (That’s only 540Mb, there should have been plenty of space left.)

Anyway, to make a long story short, I implemented a class, CompactQueue, that works with either primitive types or reference types, and, if using a primitive type, stores the enqueued elements directly into an array without boxing them. The operations of add() and remove() are constant time. (By the way, this isn’t true of ArrayDeque where add() is only amortized constant time because of the need to occasionally reallocate the array holding the elements, if the queue size increases.)

Using this class, and the heap size set to 1000Mb, I was able to put 126M longs, or 1012M bytes, into the queue. And there’s no GC performance problem caused by having 126M separate objects to manage (252M for the LinkedList!).

The class uses two techniques: First, it is parameterized by a array wrapper class that provides a factory for arrays of primitive (or reference) type, and array-like get() and set() operations. And second, it uses many smaller arrays (“blocks”) to store the elements of the queue, instead of one large array (as in ArrayDeque) or an object-per-element (as in LinkedList). This means that it can smoothly expand to take all necessary (available?) memory without hitting a roadblock when the array size doubles (as in ArrayDeque).

Code is provided here. (Note that only the array wrapper classes for byte and long are provided, array wrapper classes for the other primitive types are left as an exercise for the reader.)