Thoughts on software development and other stuff

Archive for the ‘Coherence’ Category

Springtime for Coherence

As I type this I am 35,000 feet over Alaska (no I can’t see Putin’s backyard from here) en route to QCon Tokyo. Brian was scheduled to talk but was unable to make it so I’m stepping in to present a talk on spanning multiple data grids across disparate data centers. (You can see Brian describing this talk at QCon SF here.) In addition to this presentation, I’m looking forward to sampling the food and culture in Japan. As a big city guy I’ve always wanted to visit Tokyo so I’m quite fortunate to have the chance.

As they say, when it rains it pours. I just came back from the New York SIG where I talked about real world challenges in customer deployments and passed along lessons that we learned in solving these problems. After leaving Tokyo, I will be joining Cameron and Noah in Toronto for the inaugural SIG where I will deliver the same presentation I gave in NY. Cameron will talk about the past and future of Coherence, and Noah will give an update on the latest innovations in the Coherence Incubator project.

Has everyone received their copy of the Coherence book yet? One of my favorite parts of the book is the very beginning where Aleks describes what it takes to build scalable applications. In fact a good portion of the first chapter doesn’t even mention Coherence; it just talks about aspects of scalability that developers and architects of high scale systems should be familiar with. A copy of this book was raffled at the NY SIG, and more copies will be given away at the Bay SIG and Toronto SIG.

Written by Patrick Peralta

April 18th, 2010 at 5:50 am

Posted in Coherence

What’s happening in the world of Coherence?

It has been a while since I’ve posted, so I figured it would be a good time to give an update on what is happening in Coherence land.

New Coherence Bloggers

We have two new bloggers sharing their experiences with Coherence!

The first is Oracle JDBC expert Pas Apicella, who recently took on Coherence. Upon his introduction to Coherence he immediately proceeded to create a CacheStore example using PL/SQL, followed by an example of using the Oracle JDBC Data Change Notification mechanism to push updates from the database to a cache.

Additionally we have Coherence architect Andy Nguyen debuting with a detailed description of a sophisticated distributed bulk loading technique he’s employed on several customer projects.

Coherence Book in March

After many months of blood, sweat, and blisters from too much typing, Aleksandar Seovic has completed the highly anticipated Coherence book published by Packt. Having worked closely with Aleks on reviews and contributions, I believe this book will be a terrific resource for developers and architects who need to write scalable applications. Both experienced users of Coherence and new users will find relevant and useful content. Cameron Purdy recently interviewed Aleks about the book; the interview can be downloaded as an MP3.

User Group Meetings

The UK SIG in London, coming up on February 26th, is the last Coherence user group meeting for the winter. The spring events are currently being planned; stay tuned for details!

Also coming up on February 24th is the first Boston SUG meeting of the year. Although the topic won’t be Coherence this time, it will be of interest for developers and architects interested in scalable systems. We’ll be meeting up for drinks and snacks at Bertucci’s afterwards. And I’ll be there if anyone wants to chat about Coherence or any other topic!

Written by Patrick Peralta

February 19th, 2010 at 2:28 pm

Next NY Coherence SIG on October 1st

The next NY Coherence SIG is on October 1st (two weeks from today) and it promises to be a great event. If you follow my blog, you may remember the first two speakers: I introduced them a few months ago when they started blogging.

The first talk will be by Aleksandar Seovic who has an extensive Coherence resume. In addition to implementing POF in .NET, he is also the author of an upcoming book on Coherence. In his spare time (insert tongue in cheek) he runs S4HC, a consulting company specializing in Spring, Coherence, and other technologies.

If you’re in Tampa, you can also check out his upcoming talk at Tampa JUG on September 29th.

We will also have Mark Falco, one of our rock star engineers who concentrates on our network protocol (TCMP), C++, and other areas. Mark is usually the first (and last) person I reach out to when I have questions about Coherence networking, so if you have any questions of your own be sure to bring them. He will talk to us about TCMP and how to tune your machines and network for optimum performance.

Finally, yours truly will talk about configuring Coherence to work with an external data source (usually a relational database). I’ll describe in detail how each of the external connectivity features works (including many features you’ve probably never heard of), best practices, and good old fashioned war stories. (Shout out to Rob for helping with the OmniGraffle diagrams!) This is the same talk that I will present in mid October at Oracle Open World in San Francisco. I’ll provide more detail on this later; for now you can check out the Application Grid lineup, which includes WebLogic Server, Coherence, JRockit, Tuxedo and Enterprise Manager.

Written by Patrick Peralta

September 17th, 2009 at 8:35 am

Coherence 3.5: POF Extractor/Updater

This article is part 3 of a 3 part series on my favorite new features in Coherence 3.5.

Ever since Coherence added support for .NET (and more recently C++, which can be implied when discussing .NET below) clients, we’ve always been asked this question:

When do I have to provide both .NET and Java implementations of my classes?

With each new release of Coherence, it becomes less of a requirement to provide .NET and Java implementations of cache objects. Here is a timeline of the evolution of multi language support:

Coherence 3.2/3.3

Support for .NET clients. .NET objects are serialized into a platform neutral serialization format (POF) and sent over TCP to a proxy server. The proxy server deserializes these objects and serializes them into Java format before sending into the grid for storage, thus the requirement for .NET and Java versions of each type.

Coherence 3.4

Support for .NET and C++ clients. Grid is enhanced to allow for POF binaries to be stored natively in the grid, thus removing the deserialization/serialization step previously required in the proxy servers. .NET and Java versions of cached objects are required for:

  • Entry processors
  • Queries
  • Cache Store
  • Key association

For applications with .NET clients that only do puts and gets, there is no need for Java versions of their objects in the grid.

Coherence 3.5

New in 3.5 is the ability for cache servers to extract and update data in POF binaries without deserializing the binary into an object. This is done via PofExtractors and PofUpdaters. A PofExtractor is an implementation of ValueExtractor, an interface that defines how to extract data from objects. The most common extractor in use today is ReflectionExtractor, which invokes the provided method on the target object and returns the result of that method.

This means that operations that rely on extractors (such as queries and some entry processors) can now be executed on the server side without needing Java classes to represent the data types.
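To make the difference concrete, here is a minimal sketch that uses only the JDK. The Extractor interface, ReflectiveExtractor, and IndexedExtractor are simplified stand-ins made up for illustration; they are not the Coherence classes, and a Map keyed by property index stands in for a POF binary. The only point is that the index-based path never needs the Person class to be loadable.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

public class ExtractorSketch {
    /** Hypothetical stand-in for the idea behind ValueExtractor. */
    interface Extractor {
        Object extract(Object target);
    }

    /** Calls a named no-arg method on the target, so the target's class must be loadable. */
    static class ReflectiveExtractor implements Extractor {
        private final String methodName;

        ReflectiveExtractor(String methodName) {
            this.methodName = methodName;
        }

        public Object extract(Object target) {
            try {
                Method method = target.getClass().getMethod(methodName);
                return method.invoke(target);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("cannot invoke " + methodName, e);
            }
        }
    }

    /** Reads a property by index from an already-serialized form; no user class needed. */
    static class IndexedExtractor implements Extractor {
        private final int index;

        IndexedExtractor(int index) {
            this.index = index;
        }

        @SuppressWarnings("unchecked")
        public Object extract(Object target) {
            return ((Map<Integer, Object>) target).get(index);
        }
    }

    /** A plain Java type, needed only for the reflective path. */
    public static class Person {
        private final String email;

        public Person(String email) {
            this.email = email;
        }

        public String getEmail() {
            return email;
        }
    }

    /** Builds the stand-in "binary": property index -> value (0, 1, 2 as in the Person example). */
    static Map<Integer, Object> serialized(String first, String last, String email) {
        Map<Integer, Object> fields = new HashMap<>();
        fields.put(0, first);
        fields.put(1, last);
        fields.put(2, email);
        return fields;
    }

    public static void main(String[] args) {
        Object viaReflection = new ReflectiveExtractor("getEmail")
                .extract(new Person("fred.james@oracle.com"));
        Object viaIndex = new IndexedExtractor(2)
                .extract(serialized("Amy", "Jones", "amy.jones@oracle.com"));
        System.out.println(viaReflection + " / " + viaIndex);
    }
}
```

Both paths return the email address, but only the reflective one depends on a Java class that matches the cached data.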

Here is an example. Let’s say you have the following type (I wrote it in Java, but this could also be done in .NET)

import java.io.IOException;

import com.tangosol.io.pof.PofReader;
import com.tangosol.io.pof.PofWriter;
import com.tangosol.io.pof.PortableObject;

public class Person
        implements PortableObject {
    public Person() {
    }

    public Person(String sFirstName, String sLastName, String sEmail) {
        m_sFirstName = sFirstName;
        m_sLastName = sLastName;
        m_sEmail = sEmail;
    }

    // getters and setters omitted..

    // Read each property back in by its POF index
    public void readExternal(PofReader in)
            throws IOException {
        m_sFirstName = in.readString(FIRST_NAME);
        m_sLastName  = in.readString(LAST_NAME);
        m_sEmail     = in.readString(EMAIL);
    }

    // Write each property under its POF index
    public void writeExternal(PofWriter out)
            throws IOException {
        out.writeString(FIRST_NAME, m_sFirstName);
        out.writeString(LAST_NAME, m_sLastName);
        out.writeString(EMAIL, m_sEmail);
    }

    private String m_sFirstName;
    private String m_sLastName;
    private String m_sEmail;

    // POF property indexes
    public static final int FIRST_NAME = 0;
    public static final int LAST_NAME  = 1;
    public static final int EMAIL      = 2;
}

Now for some sample code on executing a query:

NamedCache pofCache = CacheFactory.getCache("pof");
// These names are fictitious: any resemblance to real people
// is coincidental!
pofCache.put(1, new Person("Bob", "Smith", "bob.smith@google.com"));
pofCache.put(2, new Person("Jane", "Doe", "jane.doe@yahoo.com"));
pofCache.put(3, new Person("Fred", "James", "fred.james@oracle.com"));
pofCache.put(4, new Person("Amy", "Jones", "amy.jones@oracle.com"));
pofCache.put(5, new Person("Ted", "Black", "ted.black@google.com"));
// Query for oracle.com addresses
Set keys = pofCache.keySet(new LikeFilter(new PofExtractor(Person.EMAIL), 
        "%@oracle.com", '\\', false));
assert keys.size() == 2;
assert keys.contains(3);
assert keys.contains(4);

The cache configuration (note the system-property override in the serializer config; this comes into play later):

<?xml version="1.0"?>
<!DOCTYPE cache-config SYSTEM "cache-config.dtd">
            <param-value system-property="pof.config">pof-config.xml</param-value>
        <local-scheme />

And, the POF configuration on the client:

<!DOCTYPE pof-config SYSTEM "pof-config.dtd">

To run this, I set up a cache server without adding any extra classes to the classpath. I only provided the above cache configuration, and I supplied the following to the command line:


Why did I do this? The server side does not need to know about the client’s POF configuration, since it does not need to deserialize the objects. Therefore I’m simply supplying the default POF configuration that ships with Coherence.

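Given the addition of this new feature, we can modify the list from 3.4 as such:

  • Entry processors (no longer required when POF extractors/updaters are used)
  • Queries (no longer required when POF extractors are used)
  • Cache Store
  • Key association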

To summarize, the introduction of POF extractors and updaters means that .NET clients only need Java implementations of their respective classes when performing CacheStore operations and/or key association.

Written by Patrick Peralta

July 21st, 2009 at 10:59 am

Coherence 3.5: Service Guardian (Deadlock Detection)

This article is part 2 of a 3 part series on my favorite new features in Coherence 3.5.

One of the great benefits of using a modern JVM is deadlock detection. At my previous job I remember helping to track down an intermittent issue with our Swing desktop client. We eventually solved it by giving our support/QA staff instructions on how to generate a thread dump when the issue surfaced (which is much harder on Windows than on Unix/Linux based OSes). Once they sent us the thread dump (which so conveniently printed the deadlocked threads at the bottom), fixing the issue was trivial.

Deadlocks can and do happen in distributed systems, and unfortunately there isn’t a good mechanism to detect distributed deadlocks. However, Oracle Coherence 3.5 does bring us closer with a new feature we call the Service Guardian. The concept behind the guardian is to ensure that each of the threads under our control is responsive; when one is not, the cluster node should take action. Out of the box you can configure it to remove the node from the cluster (default) or shut down the JVM. You can also provide an implementation of ServiceFailurePolicy to provide custom handling of detected deadlocks.
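The general idea of watching threads for responsiveness can be sketched with a simple heartbeat table. This is only an illustration of the watchdog concept, not how the Service Guardian is implemented: the class GuardianSketch and its methods are hypothetical, and timestamps are passed in explicitly so the behavior is deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GuardianSketch {
    private final long timeoutMillis;
    private final Map<String, Long> lastHeartbeat = new HashMap<>();

    public GuardianSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** A guarded thread calls this periodically to show it is still making progress. */
    public synchronized void heartbeat(String threadName, long nowMillis) {
        lastHeartbeat.put(threadName, nowMillis);
    }

    /** Returns the names of guarded threads that have been silent longer than the timeout. */
    public synchronized List<String> findUnresponsive(long nowMillis) {
        List<String> stuck = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastHeartbeat.entrySet()) {
            if (nowMillis - entry.getValue() > timeoutMillis) {
                stuck.add(entry.getKey());
            }
        }
        return stuck;
    }

    public static void main(String[] args) {
        GuardianSketch guardian = new GuardianSketch(1000);
        guardian.heartbeat("EventDispatcher", 0);
        guardian.heartbeat("Service", 0);
        guardian.heartbeat("Service", 1500); // only the service thread checks in again
        // At t=2000 the dispatcher has been silent for 2000 ms, the service for 500 ms
        System.out.println(guardian.findUnresponsive(2000));
    }
}
```

Whatever is returned from findUnresponsive is where a failure policy would kick in, whether that means leaving the cluster, shutting down the JVM, or something custom.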

Deadlocks can have especially bad consequences in a distributed system since there are inherent dependencies between nodes. In my experience, deadlocks in clusters happen for one of three reasons:

Bugs in customer code

Concurrent programming is difficult enough; mix it in with distributed computing and you can get into some sticky situations. Several times in the past I’ve seen deadlocks occur within event handling code. Here’s one way that event handlers can deadlock:

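import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.util.MapEvent;
import com.tangosol.util.MapListener;

/**
 * @author pperalta Jul 20, 2009
 */
public class GuardianDemo
        implements MapListener {
    public static void main(String[] args) {
        NamedCache cache = CacheFactory.getCache("test");
        cache.addMapListener(new GuardianDemo());
        while (true) {
            int nKey = RANDOM.nextInt(10);
            try {
                cache.lock(nKey, -1);
                List listValue = (List) cache.get(nKey);
                if (listValue == null) {
                    listValue = new ArrayList();
                }
                listValue.add(System.currentTimeMillis()); // the timestamps seen in the output
                cache.put(nKey, listValue);
            } finally {
                cache.unlock(nKey);
            }
        }
    }
    public void entryInserted(MapEvent evt) { }
    public void entryUpdated(MapEvent evt) {
        NamedCache cache = (NamedCache) evt.getSource();
        Object     nKey  = evt.getKey();
        try {
            cache.lock(nKey, -1); // locking inside the event handler is what sets up the deadlock
            List listValue = (List) cache.get(nKey);
            if (listValue.size() > 0) {
                Object lValue = listValue.remove(0);
                cache.put(nKey, listValue);
                System.out.println("Removed " + lValue + " from " + nKey);
            }
        } finally {
            cache.unlock(nKey);
        }
    }
    public void entryDeleted(MapEvent evt) { }
    private static Random RANDOM = new Random(System.currentTimeMillis());
}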

When a map listener is registered with Coherence, a background thread is spawned to handle events. Upon receiving an event, Coherence will queue it up for the event handler (the customer provided implementation of MapListener) to process. If we notice that events are being handled more slowly than they are being generated, then we will attempt to throttle the creation of new events so that the event queue cannot grow unbounded (and eventually exhaust the heap).
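As a rough illustration of that kind of back-pressure, here is a JDK-only sketch built on a bounded queue. EventThrottleSketch and its method names are invented for this example and say nothing about Coherence internals; the sketch also reports a full backlog as a boolean to stay deterministic, whereas a real implementation would more likely make the caller wait.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class EventThrottleSketch {
    private final BlockingQueue<String> events;

    public EventThrottleSketch(int capacity) {
        this.events = new ArrayBlockingQueue<>(capacity);
    }

    /** Producer side: returns false when the backlog is full, i.e. the caller must back off. */
    public boolean tryPublish(String event) {
        return events.offer(event);
    }

    /** Consumer side: a dispatcher thread would call this before invoking the listener. */
    public String nextEvent() {
        return events.poll();
    }

    /** Number of events still waiting to be handled. */
    public int backlog() {
        return events.size();
    }

    public static void main(String[] args) {
        EventThrottleSketch queue = new EventThrottleSketch(2);
        System.out.println(queue.tryPublish("update-1")); // accepted
        System.out.println(queue.tryPublish("update-2")); // accepted
        System.out.println(queue.tryPublish("update-3")); // rejected: backlog is full
        queue.nextEvent();                                // the listener handles one event
        System.out.println(queue.tryPublish("update-3")); // accepted again
    }
}
```

The important property is that producers can only make progress if the consumer keeps draining the queue.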

A bit of a digression: the event throttling is not a new feature of Coherence; it has been around since at least 3.2.

When I ran this code with Coherence 3.4, it ran for a while but eventually stopped:

Oracle Coherence Version 3.4.2/411p7
 Grid Edition: Development mode
Copyright (c) 2000-2009 Oracle. All rights reserved.
Removed 1248134614674 from 9
Removed 1248134614692 from 9
Removed 1248134614697 from 9
Removed 1248134614699 from 9
Removed 1248134614703 from 9
Removed 1248134614706 from 9
Removed 1248134614717 from 9
Removed 1248134614708 from 6
Removed 1248134614713 from 3
Removed 1248134614719 from 6
Removed 1248134614727 from 6
Removed 1248134614723 from 3
Removed 1248134614701 from 5
Removed 1248134614709 from 8
Removed 1248134614732 from 8
Removed 1248134614736 from 3
Removed 1248134614725 from 7
Removed 1248134614729 from 5
Removed 1248134614745 from 3
Removed 1248134614733 from 8

When it stopped running, I captured a thread dump:

"DistributedCache:EventDispatcher" daemon prio=5 tid=0x01019e40 nid=0x83e200 in Object.wait() [0xb1113000..0xb1113d90]
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:474)
	at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.poll(Grid.CDB:31)
	- locked <0x295b8d88> (a com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache$LockRequest$Poll)
	at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.poll(Grid.CDB:11)
	at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache$BinaryMap.lock(DistributedCache.CDB:37)
	at com.tangosol.util.ConverterCollections$ConverterConcurrentMap.lock(ConverterCollections.java:2024)
	at com.tangosol.util.ConverterCollections$ConverterNamedCache.lock(ConverterCollections.java:2539)
	at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache$ViewMap.lock(DistributedCache.CDB:1)
	at com.tangosol.coherence.component.util.SafeNamedCache.lock(SafeNamedCache.CDB:1)
	at com.tangosol.examples.guardian.GuardianDemo.entryUpdated(GuardianDemo.java:56)
	at com.tangosol.util.MapEvent.dispatch(MapEvent.java:195)
	at com.tangosol.util.MapEvent.dispatch(MapEvent.java:164)
	at com.tangosol.util.MapListenerSupport.fireEvent(MapListenerSupport.java:556)
	at com.tangosol.coherence.component.util.SafeNamedCache.translateMapEvent(SafeNamedCache.CDB:7)
	at com.tangosol.coherence.component.util.SafeNamedCache.entryUpdated(SafeNamedCache.CDB:1)
	at com.tangosol.util.MapEvent.dispatch(MapEvent.java:195)
	at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache$ViewMap$ProxyListener.dispatch(DistributedCache.CDB:22)
	at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache$ViewMap$ProxyListener.entryUpdated(DistributedCache.CDB:1)
	at com.tangosol.util.MapEvent.dispatch(MapEvent.java:195)
	at com.tangosol.coherence.component.util.CacheEvent.run(CacheEvent.CDB:18)
	at com.tangosol.coherence.component.util.daemon.queueProcessor.Service$EventDispatcher.onNotify(Service.CDB:19)
	at com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:37)
	at java.lang.Thread.run(Thread.java:613)
"main" prio=5 tid=0x01001480 nid=0xb0801000 waiting on condition [0xb07ff000..0xb0800148]
	at java.lang.Thread.sleep(Native Method)
	at com.tangosol.coherence.component.util.Daemon.sleep(Daemon.CDB:9)
	at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid$EventDispatcher.drainOverflow(Grid.CDB:15)
	at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.post(Grid.CDB:17)
	at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.send(Grid.CDB:1)
	at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.poll(Grid.CDB:12)
	at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.poll(Grid.CDB:11)
	at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache$BinaryMap.unlock(DistributedCache.CDB:32)
	at com.tangosol.util.ConverterCollections$ConverterConcurrentMap.unlock(ConverterCollections.java:2032)
	at com.tangosol.util.ConverterCollections$ConverterNamedCache.unlock(ConverterCollections.java:2555)
	at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache$ViewMap.unlock(DistributedCache.CDB:1)
	at com.tangosol.coherence.component.util.SafeNamedCache.unlock(SafeNamedCache.CDB:1)
	at com.tangosol.examples.guardian.GuardianDemo.main(GuardianDemo.java:40)

We can see that the event dispatcher thread is waiting to acquire a lock for a key. The main thread has that key locked and (in a bit of an ironic twist) is attempting to release the lock. However, the throttling mechanism has kicked in, and it won’t allow any more operations on the cache until the event queue is drained. That will never happen, since the thread responsible for draining the event queue is stuck waiting for the lock to be released.
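The circular wait in this thread dump can be written down as a tiny wait-for graph. The sketch below is plain JDK code with made-up names (WaitForGraphSketch, addWait, isDeadlocked); it is not something Coherence exposes, just a way to see the cycle.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class WaitForGraphSketch {
    // waiter -> the party it is blocked on (each party waits on at most one other here)
    private final Map<String, String> waitsFor = new HashMap<>();

    public void addWait(String waiter, String blocker) {
        waitsFor.put(waiter, blocker);
    }

    /** Follows the chain of waits from 'start'; revisiting a party means a circular wait. */
    public boolean isDeadlocked(String start) {
        Set<String> seen = new HashSet<>();
        String current = start;
        while (current != null) {
            if (!seen.add(current)) {
                return true;
            }
            current = waitsFor.get(current);
        }
        return false;
    }

    public static void main(String[] args) {
        WaitForGraphSketch graph = new WaitForGraphSketch();
        // The dispatcher needs the lock that main holds
        graph.addWait("EventDispatcher", "main");
        // main's unlock is throttled until the dispatcher drains the event queue
        graph.addWait("main", "EventDispatcher");
        System.out.println(graph.isDeadlocked("main"));
    }
}
```

With only the first edge the chain simply ends at main; it is the second edge, added by the throttle, that closes the loop.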

Now, let’s run it with Coherence 3.5:

Oracle Coherence Version 3.5/459
 Grid Edition: Development mode
Copyright (c) 2000, 2009, Oracle and/or its affiliates. All rights reserved.
Removed 1248136418346 from 2
Removed 1248136418361 from 2
Removed 1248136418363 from 6
Removed 1248136418365 from 3
Removed 1248136418366 from 6
Removed 1248136418369 from 2
Removed 1248136418367 from 3
Removed 1248136418371 from 6
Removed 1248136418376 from 6
Removed 1248136418389 from 2
Removed 1248136418383 from 6
Removed 1248136418384 from 3
Removed 1248136419975 from 3
Removed 1248136420113 from 2
Removed 1248136420114 from 7
Removed 1248136420116 from 2
2009-07-20 20:33:40.473/6.683 Oracle Coherence GE 3.5/459 <Warning> (thread=main, member=1): The event queue appears to be stuck.
Removed 1248136420076 from 1
2009-07-20 20:33:40.475/6.685 Oracle Coherence GE 3.5/459 <Error> (thread=main, member=1): Full Thread Dump
        java.lang.Object.wait(Native Method)
        java.lang.Thread.dumpThreads(Native Method)
        sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

Here the guardian took the following actions:

  1. Detected the stuck event dispatcher thread: <Warning> (thread=main, member=1): The event queue appears to be stuck.
  2. Printed the stacks of each thread
  3. Restarted the cluster threads in order to keep the node up and running

Much nicer than having a deadlocked node!

Bugs in Coherence code

Despite our best efforts, bugs (including deadlocks) do occasionally appear in Coherence (just like any other product). In particular, the kind of deadlock that has the worst consequences is one that involves a Service thread. Everything in Coherence (the clustering logic, replicated caches, distributed caches, remote invocations, statistics, etc.) is implemented internally using queues and threads that are responsible for processing messages in queues. These are the service threads, and they are the lifeblood of Coherence. If this type of defect should slip into any future versions of Coherence, the guardian will detect this condition and take corrective action to allow the node (and the cluster) to continue to function.
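For readers who have not seen this style before, the queue-plus-thread pattern looks roughly like the following. This is a generic JDK sketch with hypothetical names (ServiceThreadSketch, post, the STOP marker), not Coherence code; it shows why a single stuck thread stalls everything queued behind it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ServiceThreadSketch {
    private static final String STOP = "__stop__"; // hypothetical shutdown marker
    private final BlockingQueue<String> inbox = new LinkedBlockingQueue<>();
    private final List<String> processed =
            Collections.synchronizedList(new ArrayList<String>());
    private final Thread worker = new Thread(this::drain, "ServiceSketch");

    public void start() {
        worker.start();
    }

    /** Any thread may post; only the service thread ever handles messages. */
    public void post(String message) {
        inbox.add(message);
    }

    /** Lets the worker finish what is already queued, then waits for it to exit. */
    public List<String> stopAndGetProcessed() {
        inbox.add(STOP);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while stopping", e);
        }
        return new ArrayList<>(processed);
    }

    // The single service thread: take one message at a time, in order, until told to stop
    private void drain() {
        try {
            for (String msg = inbox.take(); !STOP.equals(msg); msg = inbox.take()) {
                processed.add("handled " + msg);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        ServiceThreadSketch service = new ServiceThreadSketch();
        service.start();
        service.post("put");
        service.post("get");
        System.out.println(service.stopAndGetProcessed());
    }
}
```

If the loop in drain ever blocked forever while handling one message, nothing after it in the inbox would be processed, which is exactly the situation the guardian is there to catch.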

Bugs in the JVM/Operating System

In the absence of bugs in customer or Coherence code, we do occasionally see bugs in the JVM and/or the operating system that result in locked up service threads. Perhaps the most notorious of these is with early versions of NPTL on Linux. In a nutshell, we saw that threads occasionally missed notifications (in other words, threads that were in Object.wait() would never receive the Object.notify() or Object.notifyAll() that we sent to them). I’ve also seen older JVMs with buggy implementations of the wait/notify mechanism with the same results.

One of our goals with Coherence is to keep your application up and running at all times, even when components of the system fail (hardware, databases, etc.) This is yet one more tool in our arsenal to bring us closer to that goal.

Written by Patrick Peralta

July 20th, 2009 at 9:49 pm