~pperalta

Thoughts on software development and other stuff

Coherence 3.5: Service Guardian (Deadlock Detection)

without comments

This article is part 2 of a 3 part series on my favorite new features in Coherence 3.5.

One of the great benefits of using a modern JVM is deadlock detection. At my previous job I remember helping to track down an intermittent issue with our Swing desktop client that was eventually solved by providing instructions to our support/QA staff on how to generate a thread dump when the issue surfaced (which is much harder on Windows than on Unix/Linux based OSes.) Once they sent us the thread dump (which so conveniently printed the threads that were deadlocked at the bottom), fixing the issue was trivial.

Deadlocks can and do happen in distributed systems, and unfortunately there isn’t a good mechanism to detect distributed deadlocks. However, Oracle Coherence 3.5 does bring us closer with a new feature we call the Service Guardian. The concept behind the guardian is to ensure that each of the threads under our control are responsive; and when they’re not then the cluster node should take action. Out of the box you can configure it to remove the node from the cluster (default) or shut down the JVM. You can also provide an implementation of ServiceFailurePolicy to provide custom handling of detected deadlocks.

Deadlocks can have especially bad consequences in a distributed system since there are inherent dependencies between nodes. In my experience, I’ve seen deadlocks in clusters due to one of three reasons:

Bugs in customer code

Concurrent programming is difficult enough; mix it in with distributed computing and you can get into some sticky situations. Several times in the past I’ve seen deadlocks occur within event handling code. Here’s one way that event handlers can deadlock:

14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
/**
 * @author pperalta Jul 20, 2009
 */
public class GuardianDemo
        implements MapListener
    {
    public static void main(String[] args)
        {
        NamedCache cache = CacheFactory.getCache("test");
        cache.addMapListener(new GuardianDemo());
        while (true)
            {
            int nKey = RANDOM.nextInt(10);
            try
                {
                cache.lock(nKey, -1);
                List listValue = (List) cache.get(nKey);
                if (listValue == null)
                    {
                    listValue = new ArrayList();
                    }
                listValue.add(System.currentTimeMillis());
                cache.put(nKey, listValue);
                }
            finally
                {
                cache.unlock(nKey);
                }
            }
        }
 
    public void entryInserted(MapEvent evt)
        {
        }
 
    public void entryUpdated(MapEvent evt)
        {
        NamedCache cache = (NamedCache) evt.getSource();
        Object     nKey  = evt.getKey();
 
        try
            {
            cache.lock(nKey, -1);
            List listValue = (List) cache.get(nKey);
            if (listValue.size() > 0)
                {
                Object lValue = listValue.remove(0);
                cache.put(nKey, listValue);
                System.out.println("Removed " + lValue + " from " + nKey);
                }
            }
        finally
            {
            cache.unlock(nKey);
            }
        }
 
    public void entryDeleted(MapEvent evt)
        {
        }
 
    private static Random RANDOM = new Random(System.currentTimeMillis());
    }

When registering a map listener with Coherence, a background thread will be spawned to handle events. Upon receiving an event, Coherence will queue it up for the event handler (the customer provided implementation of MapListener) to process. If we notice that events are being handled slower than they are being generated, then we will attempt to throttle the creation of new events so as to not allow the event queue to grow unbounded (and eventually exhaust the heap.)

A bit of a digression: the event throttling is not a new feature of Coherence; it has been around since at least 3.2.

When I ran this code with Coherence 3.4, it ran for a while but eventually stopped:

Oracle Coherence Version 3.4.2/411p7
 Grid Edition: Development mode
Copyright (c) 2000-2009 Oracle. All rights reserved.
...
Removed 1248134614674 from 9
Removed 1248134614692 from 9
Removed 1248134614697 from 9
Removed 1248134614699 from 9
Removed 1248134614703 from 9
Removed 1248134614706 from 9
Removed 1248134614717 from 9
Removed 1248134614708 from 6
Removed 1248134614713 from 3
Removed 1248134614719 from 6
Removed 1248134614727 from 6
Removed 1248134614723 from 3
Removed 1248134614701 from 5
Removed 1248134614709 from 8
Removed 1248134614732 from 8
Removed 1248134614736 from 3
Removed 1248134614725 from 7
Removed 1248134614729 from 5
Removed 1248134614745 from 3
Removed 1248134614733 from 8
...

When it stopped running, I captured a thread dump:

"DistributedCache:EventDispatcher" daemon prio=5 tid=0x01019e40 nid=0x83e200 in Object.wait() [0xb1113000..0xb1113d90]
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:474)
	at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.poll(Grid.CDB:31)
	- locked <0x295b8d88> (a com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache$LockRequest$Poll)
	at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.poll(Grid.CDB:11)
	at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache$BinaryMap.lock(DistributedCache.CDB:37)
	at com.tangosol.util.ConverterCollections$ConverterConcurrentMap.lock(ConverterCollections.java:2024)
	at com.tangosol.util.ConverterCollections$ConverterNamedCache.lock(ConverterCollections.java:2539)
	at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache$ViewMap.lock(DistributedCache.CDB:1)
	at com.tangosol.coherence.component.util.SafeNamedCache.lock(SafeNamedCache.CDB:1)
	at com.tangosol.examples.guardian.GuardianDemo.entryUpdated(GuardianDemo.java:56)
	at com.tangosol.util.MapEvent.dispatch(MapEvent.java:195)
	at com.tangosol.util.MapEvent.dispatch(MapEvent.java:164)
	at com.tangosol.util.MapListenerSupport.fireEvent(MapListenerSupport.java:556)
	at com.tangosol.coherence.component.util.SafeNamedCache.translateMapEvent(SafeNamedCache.CDB:7)
	at com.tangosol.coherence.component.util.SafeNamedCache.entryUpdated(SafeNamedCache.CDB:1)
	at com.tangosol.util.MapEvent.dispatch(MapEvent.java:195)
	at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache$ViewMap$ProxyListener.dispatch(DistributedCache.CDB:22)
	at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache$ViewMap$ProxyListener.entryUpdated(DistributedCache.CDB:1)
	at com.tangosol.util.MapEvent.dispatch(MapEvent.java:195)
	at com.tangosol.coherence.component.util.CacheEvent.run(CacheEvent.CDB:18)
	at com.tangosol.coherence.component.util.daemon.queueProcessor.Service$EventDispatcher.onNotify(Service.CDB:19)
	at com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:37)
	at java.lang.Thread.run(Thread.java:613)
 
...
 
"main" prio=5 tid=0x01001480 nid=0xb0801000 waiting on condition [0xb07ff000..0xb0800148]
	at java.lang.Thread.sleep(Native Method)
	at com.tangosol.coherence.component.util.Daemon.sleep(Daemon.CDB:9)
	at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid$EventDispatcher.drainOverflow(Grid.CDB:15)
	at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.post(Grid.CDB:17)
	at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.send(Grid.CDB:1)
	at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.poll(Grid.CDB:12)
	at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.poll(Grid.CDB:11)
	at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache$BinaryMap.unlock(DistributedCache.CDB:32)
	at com.tangosol.util.ConverterCollections$ConverterConcurrentMap.unlock(ConverterCollections.java:2032)
	at com.tangosol.util.ConverterCollections$ConverterNamedCache.unlock(ConverterCollections.java:2555)
	at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache$ViewMap.unlock(DistributedCache.CDB:1)
	at com.tangosol.coherence.component.util.SafeNamedCache.unlock(SafeNamedCache.CDB:1)
	at com.tangosol.examples.guardian.GuardianDemo.main(GuardianDemo.java:40)

We can see that the event dispatcher thread is waiting to acquire a lock for a key. However, the main thread has that key locked and (in a bit of an ironic twist) is attempting to release the lock. However, the throttling mechanism has kicked in, and it won’t allow for any more operations on the cache until the event queue is drained, which will never happen since the queue responsible for draining the event queue is stuck waiting for a lock to be released.

Now, let’s run it with Coherence 3.5:

Oracle Coherence Version 3.5/459
 Grid Edition: Development mode
Copyright (c) 2000, 2009, Oracle and/or its affiliates. All rights reserved.
...
Removed 1248136418346 from 2
Removed 1248136418361 from 2
Removed 1248136418363 from 6
Removed 1248136418365 from 3
Removed 1248136418366 from 6
Removed 1248136418369 from 2
Removed 1248136418367 from 3
Removed 1248136418371 from 6
Removed 1248136418376 from 6
Removed 1248136418389 from 2
Removed 1248136418383 from 6
Removed 1248136418384 from 3
...
Removed 1248136419975 from 3
Removed 1248136420113 from 2
Removed 1248136420114 from 7
Removed 1248136420116 from 2
2009-07-20 20:33:40.473/6.683 Oracle Coherence GE 3.5/459 <Warning> (thread=main, member=1): The event queue appears to be stuck.
Removed 1248136420076 from 12009-07-20 20:33:40.475/6.685 Oracle Coherence GE 3.5/459 <Error> (thread=main, member=1): Full Thread Dump
...
Thread[DistributedCache:EventDispatcher,5,Cluster]
        java.lang.Object.wait(Native Method)
        java.lang.Object.wait(Object.java:474)
        com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.poll(Grid.CDB:31)
        com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.poll(Grid.CDB:11)
        com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache$BinaryMap.lock(DistributedCache.CDB:37)
        com.tangosol.util.ConverterCollections$ConverterConcurrentMap.lock(ConverterCollections.java:2024)
        com.tangosol.util.ConverterCollections$ConverterNamedCache.lock(ConverterCollections.java:2539)
        com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache$ViewMap.lock(DistributedCache.CDB:1)
        com.tangosol.coherence.component.util.SafeNamedCache.lock(SafeNamedCache.CDB:1)
        com.tangosol.examples.guardian.GuardianDemo.entryUpdated(GuardianDemo.java:56)
        com.tangosol.util.MapEvent.dispatch(MapEvent.java:210)
        com.tangosol.util.MapEvent.dispatch(MapEvent.java:166)
        com.tangosol.util.MapListenerSupport.fireEvent(MapListenerSupport.java:556)
        com.tangosol.coherence.component.util.SafeNamedCache.translateMapEvent(SafeNamedCache.CDB:7)
        com.tangosol.coherence.component.util.SafeNamedCache.entryUpdated(SafeNamedCache.CDB:1)
        com.tangosol.util.MapEvent.dispatch(MapEvent.java:210)
        com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache$ViewMap$ProxyListener.dispatch(DistributedCache.CDB:22)
        com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache$ViewMap$ProxyListener.entryUpdated(DistributedCache.CDB:1)
        com.tangosol.util.MapEvent.dispatch(MapEvent.java:210)
        com.tangosol.coherence.component.util.CacheEvent.run(CacheEvent.CDB:18)
        com.tangosol.coherence.component.util.daemon.queueProcessor.Service$EventDispatcher.onNotify(Service.CDB:26)
        com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
        java.lang.Thread.run(Thread.java:613)
...
Thread[main,5,main]
        java.lang.Thread.dumpThreads(Native Method)
        java.lang.Thread.getAllStackTraces(Thread.java:1460)
        sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        java.lang.reflect.Method.invoke(Method.java:585)
        com.tangosol.net.GuardSupport.logStackTraces(GuardSupport.java:791)
        com.tangosol.coherence.component.util.daemon.queueProcessor.Service$EventDispatcher.drainOverflow(Service.CDB:45)
        com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid$EventDispatcher.drainOverflow(Grid.CDB:9)
        com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.post(Grid.CDB:17)
        com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.send(Grid.CDB:1)
        com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.poll(Grid.CDB:12)
        com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.poll(Grid.CDB:11)
        com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache$BinaryMap.unlock(DistributedCache.CDB:32)
        com.tangosol.util.ConverterCollections$ConverterConcurrentMap.unlock(ConverterCollections.java:2032)
        com.tangosol.util.ConverterCollections$ConverterNamedCache.unlock(ConverterCollections.java:2555)
        com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache$ViewMap.unlock(DistributedCache.CDB:1)
        com.tangosol.coherence.component.util.SafeNamedCache.unlock(SafeNamedCache.CDB:1)
        com.tangosol.examples.guardian.GuardianDemo.main(GuardianDemo.java:40)

Here the guardian took the following actions:

  1. Detected the stuck event dispatcher thread: <Warning> (thread=main, member=1): The event queue appears to be stuck.
  2. Printed the stacks of each thread
  3. Restarted the cluster threads in order to keep the node up and running

Much nicer than having a deadlocked node!

Bugs in Coherence code

Despite our best efforts, bugs (including deadlocks) do occasionally appear in Coherence (just like any other product.) In particular, the kind of deadlock that has the worse consequences is a deadlock that involves a Service thread. Everything in Coherence (the clustering logic, replicated caches, distributed caches, remote invocations, statistics, etc) is implemented internally using queues and threads that are responsible for processing messages in queues. These are the service threads, and they are the lifeblood of Coherence. If this type of defect should slip into any future versions of Coherence, the guardian will detect this condition and take corrective action to allow the node (and the cluster) to continue to function.

Bugs in the JVM/Operating System

In the absence of bugs in customer or Coherence code, we do occasionally see bugs in the JVM and/or the operating system that result in locked up service threads. Perhaps the most notorious of these is with early versions of NPTL on Linux. In a nutshell, we saw that threads occasionally missed notifications (in other words, threads that were in Object.wait() would never receive the Object.notify() or Object.notifyAll() that we sent to it.) I’ve also seen older JVMs with buggy implementations of the wait/notify mechanism with the same results.

One of our goals with Coherence is to keep your application up and running at all times, even when components of the system fail (hardware, databases, etc.) This is yet one more tool in our arsenal to bring us closer to that goal.

Written by Patrick Peralta

July 20th, 2009 at 9:49 pm

Coherence 3.5: Improved Partition Distribution Algorithm

without comments

This article is part 1 of a 3 part series on my favorite new features in Coherence 3.5.

A partition is a unit of storage and transfer in a Coherence partitioned/distributed cache. Out of the box we ship with 257 partitions, which means that clusters with 16 or fewer storage enabled nodes do not need to have this setting modified. Larger clusters require more partitions, as this will ensure even distribution of data across the grid. Prior to Coherence 3.5, our guideline for setting the partition count was to take the number of storage enabled nodes, square it, and round up to the next prime number. (See partition-count setting here).

When new nodes are added to a cluster, the senior node (the oldest in the grid) is responsible for coordinating ownership of the partitions. Note that this is not a single point of failure since the departure of a senior node will result in the promotion of the next oldest node which will seamlessly take over the responsibilities of the senior. However, when a large cluster with a large partition count is starting up, the senior node does have a lot of (CPU) work to do regarding the management of partitions, and this becomes noticeable when starting up many nodes simultaneously. For this reason, large clusters are typically started up a few nodes at a time, the exact number depending on the CPU capabilities of the box.

I decided to test this out with a hypothetical cluster consisting of 100 storage nodes. Using the old partitioning guidelines, this would require 10,007 partitions. I started up 4 nodes at the same (since there are four cores on this box) time with this partition count:

Coherence 3.4 with 10,007 partitions

As you can see, we experienced about 95% CPU usage for about 15 seconds, followed by a couple of minutes of 5% usage.

In Coherence 3.5, we improved the distribution algorithm to require fewer partitions for large clusters. The new guidelines indicate that each partition should store no more than 50MB. In my hypothetical example, I’m running 100 nodes, each with a 512MB heap. Taking backups and headroom into account I can store 170MB of primary data on each node. If I take a conservative approach and store about 25MB per partition, that takes me to 7 partitions per node. 7 * 100 rounded up to the next prime is 701.

To summarize, we’re going from 10,001 partitions in 3.4 to 701 partitions in 3.5.

Here is the same CPU chart when starting up 4 nodes in 3.5 with this partition count:

35_all-cpu

This time we experience a very brief momentary spike of 65% CPU usage, after which we are using no CPU. Both charts cover about a four minute period.

This enhancement will drastically reduce the amount of time required to start up large clusters, and it will result in less CPU consumption by the senior member. The larger the cluster, the larger the benefit!

Written by Patrick Peralta

July 20th, 2009 at 4:46 pm

My Favorite New Features in Coherence 3.5

without comments

Coherence 3.5 is the third major release of Coherence since we joined Oracle back in mid 2007. We’re quite excited about this release as it includes many new features. We especially gave Coherence*Web lots of love by adding:

It turns out we’re not the only ones giving Coherence lots of love. Check out this report by Gartner covering Data Grid solutions from The Big Three (Oracle, IBM, & Microsoft.)

Those interested in finding out more can watch Cam’s Live Webcast on July 29th.

The next three posts will cover my favorite new Coherence 3.5 features:

Stay tuned for more!

Written by Patrick Peralta

July 20th, 2009 at 3:50 pm

Migrating to my new MacBook Pro

without comments

I switched to the Mac and OS X soon after joining Tangosol. I’ve been very happy with OS X as it has the best of both worlds: the support of a mainstream OS (for peripherals such as the iPhone) along with the power of the Unix shell, and it looks good to boot. After running on a MacBook Pro for for 2 1/2 years, I’ve been fortunate enough to receive a new 15″ MBP at work. I was happy with the old MBP; the biggest (insurmountable) problem was that I had maxed out on 3GB of ram. Trying to run IDEA, Firefox, Parallels, and a bunch of other apps on 3GB doesn’t work very well.

As is always the case when getting new hardware, I had the daunting task of moving everything over to the new machine and setting it up to my liking. In an attempt to save myself a week of work, I decided to try out restoring my account and apps via Migration Assistant and Time Machine. I was a bit hesitant as I previously did an upgrade to Leopard from Tiger (instead of a fresh install, as I was then also trying to save time) but I figured that I didn’t have much to lose. If it didn’t work, I could just reinstall OS X on the new Mac and start over.

I’ve been running on the new machine for a few days now, and so far things seem to be working well. I did run into a few glitches that I was able to overcome:

  • Reinstalls of Parallels and the Cisco VPN client were required, I suspect this was the case because these apps require kernel extensions of some sort.
  • Java 6 was nowhere to be found which I thought was strange (doesn’t Java 6 ship OOTB with Leopard?) I installed the Java package from the DVD and also did a software update; this included a Java update which included Java 6.
  • The biggest challenge I encountered: when bringing up the screensaver options in the System Preferences app, it would spin up to 100% CPU and require a hard kill from the Activity Monitor. I noticed some messages in the Console about some deprecated API, so my first attempt to fix this issue was to follow the instructions in this thread. This did remove the warnings from the console, but the lockups continued. After reading this thread I decided to try installing iLife Support 9.0.3; this ended up fixing the problem for good.

So far I have to say that the effort has been worth it. Fixing the glitches above took far less time than reinstalling everything, and all of my preferences and settings migrated over with no problem (saved passwords via keychain, OS and network settings, etc.)

Written by Patrick Peralta

July 5th, 2009 at 6:40 pm

Posted in General

Remake of “Pelham 123″

without comments

My wife and I went to see “The Taking of Pelham 123″ today. Since I own the original on DVD I was quite interested to check out this new version, especially after having read this review by Randy Kennedy.

Like most of Mr. Scott’s movies, “Pelham 1 2 3,” which is set to open June 12, is a finely calibrated box-office machine with plenty of fireworks and star power. But for anyone who has spent time past a turnstile, one of its chief attractions will also undoubtedly be how devoted it is to the look and feel of the subway and how well it pulls it off. In early reviews since the trailer was released on the Internet, even the most bloodthirsty critics — New York City subway buffs — have had a few nice things to say.

As most fellow geeks that like to laugh and point out blatant inaccuracies on film and TV, I was interested to see for myself how well they would portray the system, especially given how well it was done in the 1974 edition. I was actually quite pleased with it; it seems that the MTA provided a lot of background to make many parts of the story quite plausible. The thing that piqued my interest was the escape scene where they ended up in the Waldorf-Astoria. It turns out that there indeed was a platform at Grand Central for this hotel. This sort of thing was not all that uncommon back then. However, the station where they filmed the escape scene was labeled “Roosevelt” which is nowhere near Manhattan. I’m thinking that it must have been in the abandoned section of the Roosevelt Ave station in Queens.

Another interesting plot is sending the runaway train to Coney Island instead of South Ferry. As Coney Island is outdoors that would make the filming more interesting; but I’m not sure how a 6 could get switched to BMT tracks to take it to Coney Island. At a minimum it would have to be switched to the 4/5 express tracks to send it over to Brooklyn in the first place. I’m thinking that the train would have derailed long before making it in any event. ;)

The 70′s edition had more interaction between the hijackers and the passengers; there is a lot more character development in that story line. In this edition, it is clearly all about the stars Travolta and Denzel which also works well.

The intro for the 2009 version features the high pitched whine that anyone that’s ever ridden the 6 would be familiar with; I thought that was a nice touch. However, the original has the best opening credits score of any movie! I’ve embedded it below for your listening pleasure.

Written by Patrick Peralta

June 21st, 2009 at 10:06 pm

Posted in New York