~pperalta

Thoughts on software development and other stuff

How to get an OutOfMemoryError without trying

In the past few months, I’ve seen customers run into mysterious OutOfMemoryErrors that seem to come out of nowhere. For the most part their apps work fine; then, out of the blue, the heap blows up, and the failure is never reproducible. In each case, the culprit turned out to be something like the following:

import java.util.HashMap;
import java.util.Map;
import java.util.Random;

public class HashMapTest
    {
    public static void main(String[] asArgs)
        {
        final int    nCount = 5;
        final int    nRange = 1000;
        final Map    map    = new HashMap();
        final Random random = new Random();

        final Runnable r = new Runnable()
            {
            public void run()
                {
                while (true)
                    {
                    int nKey = random.nextInt(nRange);
                    if (random.nextBoolean())
                        {
                        map.put(nKey, System.currentTimeMillis());
                        }
                    else
                        {
                        map.remove(nKey);
                        }
                    }
                }
            };

        Thread[] threads = new Thread[nCount];

        System.out.println("Starting " + nCount +
                " threads, range = " + nRange);

        for (int i = 0; i < threads.length; i++)
            {
            threads[i] = new Thread(r, "Thread " + i);
            threads[i].start();
            }
        }
    }

See the bug? The problem here is with multiple threads using a java.util.HashMap in the absence of synchronization. One would imagine that at worst this usage would result in inaccurate data in the map. However, this turns out not to be the case.
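One way to make this usage safe (a sketch of one option, not the only fix) is to replace the raw HashMap with a ConcurrentHashMap, which supports concurrent access without external synchronization; Collections.synchronizedMap(new HashMap()) is another choice. The class name SafeMapTest and the bounded loop below are mine, added so the demo terminates:

```java
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;

public class SafeMapTest
    {
    static Map<Integer, Long> runTest() throws InterruptedException
        {
        final int nCount = 5;
        final int nRange = 1000;
        // ConcurrentHashMap tolerates concurrent puts and removes
        // without external synchronization
        final Map<Integer, Long> map = new ConcurrentHashMap<Integer, Long>();

        final Runnable r = new Runnable()
            {
            public void run()
                {
                Random random = new Random();
                // bounded loop (unlike the original while (true))
                // so the program finishes
                for (int i = 0; i < 100000; i++)
                    {
                    int nKey = random.nextInt(nRange);
                    if (random.nextBoolean())
                        {
                        map.put(nKey, System.currentTimeMillis());
                        }
                    else
                        {
                        map.remove(nKey);
                        }
                    }
                }
            };

        Thread[] threads = new Thread[nCount];
        for (int i = 0; i < threads.length; i++)
            {
            threads[i] = new Thread(r, "Thread " + i);
            threads[i].start();
            }
        for (Thread t : threads)
            {
            t.join();
            }
        return map;
        }

    public static void main(String[] asArgs) throws InterruptedException
        {
        System.out.println("Map size after run: " + runTest().size());
        }
    }
```

The keys are drawn from a range of 1000, so the map can never hold more than 1000 entries, and the program exits cleanly every time.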

Running under Java 1.5 on OS X, the program runs for a few seconds before getting stuck in an infinite loop (evidenced by CPU usage spiking to 100%). A thread dump shows where each thread is spinning:

"Thread 4" prio=5 tid=0x0100c350 nid=0x853000 runnable [0xb0e8e000..0xb0e8ed90]
        at java.util.HashMap.put(HashMap.java:420)
        at com.tangosol.examples.misc.HashMapTest$1.run(HashMapTest.java:30)
        at java.lang.Thread.run(Thread.java:613)
 
"Thread 3" prio=5 tid=0x0100bde0 nid=0x852200 runnable [0xb0e0d000..0xb0e0dd90]
        at java.util.HashMap.removeEntryForKey(HashMap.java:614)
        at java.util.HashMap.remove(HashMap.java:584)
        at com.tangosol.examples.misc.HashMapTest$1.run(HashMapTest.java:34)
        at java.lang.Thread.run(Thread.java:613)
 
"Thread 2" prio=5 tid=0x0100ba20 nid=0x851200 runnable [0xb0d8c000..0xb0d8cd90]
        at java.util.HashMap.removeEntryForKey(HashMap.java:614)
        at java.util.HashMap.remove(HashMap.java:584)
        at com.tangosol.examples.misc.HashMapTest$1.run(HashMapTest.java:34)
        at java.lang.Thread.run(Thread.java:613)
 
"Thread 1" prio=5 tid=0x0100b610 nid=0x850400 runnable [0xb0d0b000..0xb0d0bd90]
        at java.util.HashMap.removeEntryForKey(HashMap.java:614)
        at java.util.HashMap.remove(HashMap.java:584)
        at com.tangosol.examples.misc.HashMapTest$1.run(HashMapTest.java:34)
        at java.lang.Thread.run(Thread.java:613)
 
"Thread 0" prio=5 tid=0x0100b430 nid=0x84f600 runnable [0xb0c8a000..0xb0c8ad90]
        at java.util.HashMap.removeEntryForKey(HashMap.java:614)
        at java.util.HashMap.remove(HashMap.java:584)
        at com.tangosol.examples.misc.HashMapTest$1.run(HashMapTest.java:34)
        at java.lang.Thread.run(Thread.java:613)

Under Java 1.6, the same program runs for about a minute before dying with:

java.lang.OutOfMemoryError: Java heap space
        at java.util.HashMap.resize(HashMap.java:462)
        at java.util.HashMap.addEntry(HashMap.java:755)
        at java.util.HashMap.put(HashMap.java:385)
        at com.tangosol.examples.misc.HashMapTest$1.run(HashMapTest.java:30)
        at java.lang.Thread.run(Thread.java:637)

I configured the VM to generate a heap dump upon OutOfMemoryError. Here are some screenshots from Eclipse MAT:

[Eclipse MAT screenshot 1]

[Eclipse MAT screenshot 2]

Both of these behaviors can be explained by race conditions that corrupt HashMap's internal data structures: a corrupted bucket chain can form a cycle, causing threads to loop forever, and in the second case the corruption surfaces during a resize, where runaway allocation ends in an OOME. This behavior is described in this Stack Overflow thread, which links to this blog post describing one of the possible race conditions in detail.
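To see why a corrupted bucket chain hangs the JVM, consider a toy version of the problem (my own sketch, not HashMap's actual code): once a racing resize links two entries into a cycle, any traversal of that chain never reaches null. The cap below exists only so the demo terminates:

```java
public class CycleDemo
    {
    // A stripped-down stand-in for a hash bucket entry:
    // a node in a singly-linked chain
    static class Entry
        {
        final int key;
        Entry next;
        Entry(int key) { this.key = key; }
        }

    // Walk the chain looking for its end; cap the iterations so the
    // demo terminates even when the chain contains a cycle
    static int traverse(Entry head, int nCap)
        {
        int n = 0;
        for (Entry e = head; e != null && n < nCap; e = e.next)
            {
            n++;
            }
        return n;
        }

    public static void main(String[] asArgs)
        {
        Entry a = new Entry(1);
        Entry b = new Entry(2);
        a.next = b;
        b.next = a; // the kind of cycle an unsynchronized resize can create

        // Without the cap, this walk would spin forever at 100% CPU,
        // exactly like the stuck threads in the dump above
        System.out.println(traverse(a, 1000)); // hits the cap: prints 1000
        }
    }
```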

The lessons to be learned here are:

  • When using non-thread-safe data structures, ensure that only one thread accesses them at a time, or switch to a thread-safe alternative (such as Collections.synchronizedMap or ConcurrentHashMap).
  • Configure production JVMs to generate a heap dump upon an OutOfMemoryError (this has helped us track down various OOMEs for customers), and consider configuring the JVM to shut down when this error is thrown. The Coherence production checklist provides information on how to configure these settings on various JVMs.
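For reference, these are the relevant HotSpot flags on Sun JVMs of this era (other vendors' VMs use different names, so treat this as a sketch; the paths and jar name are placeholders):

```shell
# Write a heap dump to the given directory when an OutOfMemoryError is thrown
java -XX:+HeapDumpOnOutOfMemoryError \
     -XX:HeapDumpPath=/path/to/dumps \
     -jar app.jar

# Optionally run a command on OOME, e.g. kill the process
# (%p expands to the JVM's pid)
java -XX:OnOutOfMemoryError="kill -9 %p" -jar app.jar
```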

Written by Patrick Peralta

August 10th, 2009 at 10:16 am

Posted in Development