Archive for the ‘Development’ Category
@ SpringOne
This morning I presented alongside Randy Stafford and Michael Chen in the Oracle session on integration between Spring and Oracle middleware products. If you missed the session, feel free to stop by the Oracle booth and say hi to us!
Errata
Today I read an article comparing Terracotta to Coherence. IMHO these are different products that serve different needs and have very different approaches to clustering. I normally don’t talk about our competitors since I honestly know very little about them. However, I do know Coherence and I thought it would be appropriate to point out the factual errors in this document.
This analysis is not meant to be a point-for-point rebuttal or an evaluation of which product is “better” (since every use case is unique and I’m a firm believer in using the right tool for the job). It simply highlights what I perceive to be the most glaring inaccuracies.
Unlike Oracle Coherence, Terracotta maintains a coherent data model at scale. To get scale with Oracle Coherence, you have to switch from synchronous updates (slow, but coherent) to asynchronous updates (faster, not coherent).
If you are evaluating Oracle Coherence right now – ask yourself how much out of sync your data can be?
Most users of Coherence use the partitioned cache, which does not perform any asynchronous operations: updates are always made synchronously to both the primary node and the backup node. This means the cost of putting items into the cache remains constant, whether across 5 nodes or 50. It also means the data in the cache is never “out of sync.”
Terracotta works with Spring, Hibernate, Quartz, Compass, Lucene, Guice, Camel, Struts, Joda time and more.
This somehow implies that Coherence cannot be used with these other frameworks. Many Coherence customers do in fact use these frameworks (especially Spring and Hibernate) as part of their stack.
Terracotta doesn’t force you to pick your cache strategy upfront. With Oracle Coherence, you need to carefully choose between a Near Cache, a Far Cache, a Replicated Cache, or Partitioned Cache (there’s more, but they have so many models I can’t keep track).
The caching strategy chosen does not require any code changes, as it is entirely driven by configuration. The statement above seems to imply that once you’ve chosen a topology you’re stuck with it, which is wholly inaccurate. It is also a bit odd to me that this could be considered a weakness, as I doubt that any off-the-shelf product can be all things to all people out of the box without taking usage patterns into account and configuring the product appropriately.
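As a rough sketch of how a topology change looks in practice (the scheme and service names below are mine, for illustration), moving a cache from a partitioned topology to a replicated one is just a matter of pointing its mapping at a different scheme; the code calling CacheFactory.getCache("person") never changes:

<cache-mapping>
    <cache-name>person</cache-name>
    <!-- point this at a different scheme to change topologies; no code changes -->
    <scheme-name>person-replicated-scheme</scheme-name>
</cache-mapping>
...
<replicated-scheme>
    <scheme-name>person-replicated-scheme</scheme-name>
    <service-name>PersonReplicatedCache</service-name>
    <backing-map-scheme>
        <local-scheme />
    </backing-map-scheme>
</replicated-scheme>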
In fact, the partitioned cache is a core strength of Coherence, as it allows for:
- increased storage capacity in the grid simply by adding nodes
- automatic partitioning of data, which is a key component of scalability
- the ability to send invocations into the grid, directly to where the data is stored (a sketch of this follows the list)
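As a rough sketch of that last point (the processor class, cache name, and key below are illustrative, not from any product documentation), an EntryProcessor is invoked against a key and executes on the cluster member that owns the entry, so the data never has to travel to the client:

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.util.InvocableMap;
import com.tangosol.util.processor.AbstractProcessor;

// hypothetical processor: transforms a String value in place, executing on
// the storage node that owns the entry rather than on the client
public class UppercaseProcessor extends AbstractProcessor {
    public Object process(InvocableMap.Entry entry) {
        String sValue = (String) entry.getValue();
        entry.setValue(sValue.toUpperCase());
        return sValue;
    }
}

// usage ("example" and "someKey" are illustrative)
NamedCache cache = CacheFactory.getCache("example");
cache.invoke("someKey", new UppercaseProcessor());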
Terracotta is the only product that gives you High Availability of your data by not only replicating it across servers for redundancy, but also by providing a simultaneous path to disk.
This means unlike Coherence, you can trust your heap, even in the event of a complete cluster failure. Terracotta can truly replace temporal data in the database, to alleviate the bottleneck that databases introduce. Oracle Coherence needs you to buy a database too. I wonder why.
As a matter of fact, there are many Coherence users who don’t use a database at all. To provide redundancy, scalability, and no single point of failure, Coherence does not require any disk writes (to an expensive SAN or otherwise) whatsoever. However, Coherence does provide a means to asynchronously write to a database or any other store.
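As a sketch of what that asynchronous write looks like in configuration (the scheme and class names here are mine), a read-write backing map with a non-zero write-delay batches updates and writes them to your CacheStore in the background:

<distributed-scheme>
    <scheme-name>person-db-scheme</scheme-name>
    <backing-map-scheme>
        <read-write-backing-map-scheme>
            <internal-cache-scheme>
                <local-scheme />
            </internal-cache-scheme>
            <cachestore-scheme>
                <class-scheme>
                    <!-- hypothetical CacheStore implementation -->
                    <class-name>com.example.PersonDatabaseStore</class-name>
                </class-scheme>
            </cachestore-scheme>
            <!-- a non-zero write-delay enables asynchronous write-behind -->
            <write-delay>10s</write-delay>
        </read-write-backing-map-scheme>
    </backing-map-scheme>
</distributed-scheme>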
I am happy to see a competitive landscape in this area, as it is an indication of the vitality of this market. However (as idealistic as this sounds) I would prefer to have more accurate and factual data out there and less FUD.
First NY Coherence SIG A Success
Last Thursday was the first NY Coherence SIG. Roughly 70 people came out to see Cameron, Brian, and Steve Jacobs from Merrill Lynch. The key announcement was the introduction of the Coherence Incubator, the home for patterns and advanced examples built on Coherence, including multi-cluster replication, which is already being used in production by a few customers.
Coherence query cache technique using refresh-ahead
A few days ago I read an excellent post by Martin Elwin on creating a query cache using Coherence. To summarize, he uses a Coherence Filter (an object that contains query criteria) as a key and the results of the query (in this case, a set of keys for entries that match the criteria) as the cached value. In Coherence (as in any data grid product), key-based access is orders of magnitude faster (and, of greater importance, far more scalable) than data retrieval with a query, so I think this is a great technique.
Reading through his code, I thought about ways in which Coherence can do even more of the work. This can be accomplished through the use of a CacheLoader. A CacheLoader in Coherence is used to load data for a cache from an external resource (normally a database) in the case of a cache miss. Therefore subsequent requests for that key result in a cache hit.
Many Coherence users of CacheLoaders and CacheStores take advantage of the asynchronous write-behind capabilities to scale writes to a database or a file system. However, one of the most underused (and, IMHO, one of the best) features of Coherence is the ability to refresh the cache from the data source asynchronously. For example, if a cache entry is read from the database and the cache is configured to expire entries after 5 minutes, Coherence can be configured to fetch the data from the database asynchronously before the 5-minute expiry. This is known as the refresh-ahead factor. If I configure the factor to .5, any client request for the key that occurs after the entry is 2 minutes and 30 seconds old will (a) return the current value for the key and (b) request that the entry be refreshed in a background thread. If no client requests happen between the 2.5 and 5 minute mark, the item expires as normal and the next request results in a synchronous load.
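In configuration terms, the 5-minute example above corresponds roughly to this fragment (the enclosing distributed scheme and the cachestore-scheme that supplies the CacheLoader are omitted):

<read-write-backing-map-scheme>
    <internal-cache-scheme>
        <local-scheme>
            <!-- entries expire 5 minutes after they are loaded -->
            <expiry-delay>5m</expiry-delay>
        </local-scheme>
    </internal-cache-scheme>
    <!-- reads in the last half of the expiry window (after 2.5 minutes)
         trigger an asynchronous reload in a background thread -->
    <refresh-ahead-factor>.5</refresh-ahead-factor>
</read-write-backing-map-scheme>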
So if we put these concepts together, we can create a generic CacheLoader that executes queries for any cache, expires the results after a configurable amount of time, and (after the initial request) refreshes the query results ahead of time so that no client thread has to wait for them. Best of all, this all happens in the grid, so clients don’t have to worry about the mechanics behind it.
Here is an example: consider an application with a Person object that looks like this:
public class Person implements Serializable {

    public Person(String sName, String sPhone) {
        m_sName  = sName;
        m_sPhone = sPhone;
    }

    public String getName() {
        return m_sName;
    }

    public String getPhone() {
        return m_sPhone;
    }

    public String toString() {
        return "Person{" + "'" + m_sName + '\'' + ", phone: " + m_sPhone + '}';
    }

    private String m_sName;
    private String m_sPhone;
}
The cache configuration that we’ll use to store Person objects:
<cache-mapping>
    <cache-name>person</cache-name>
    <scheme-name>person-cache-scheme</scheme-name>
</cache-mapping>
...
<distributed-scheme>
    <scheme-name>person-cache-scheme</scheme-name>
    <service-name>PersonDistributedCache</service-name>
    <backing-map-scheme>
        <local-scheme />
    </backing-map-scheme>
</distributed-scheme>
Pretty straightforward so far. Objects of this type will be stored in a distributed cache. Now, in our application we will often be querying by area code, such as:
Filter filter = new LikeFilter("getPhone", "917%");
This will give us all Persons with a New York area code.
Now, the configuration for the query cache:
<cache-mapping>
    <cache-name>query-*</cache-name>
    <scheme-name>query-cache-scheme</scheme-name>
</cache-mapping>
...
<distributed-scheme>
    <scheme-name>query-cache-scheme</scheme-name>
    <service-name>QueryCacheDistributedCache</service-name>
    <backing-map-scheme>
        <read-write-backing-map-scheme>
            <internal-cache-scheme>
                <local-scheme>
                    <expiry-delay>1m</expiry-delay>
                </local-scheme>
            </internal-cache-scheme>
            <cachestore-scheme>
                <class-scheme>
                    <class-name>com.tangosol.examples.QueryCacheLoader</class-name>
                    <init-params>
                        <init-param>
                            <param-type>string</param-type>
                            <param-value>{cache-name}</param-value>
                        </init-param>
                    </init-params>
                </class-scheme>
            </cachestore-scheme>
            <read-only>true</read-only>
            <refresh-ahead-factor>.5</refresh-ahead-factor>
        </read-write-backing-map-scheme>
    </backing-map-scheme>
    <backup-count>0</backup-count>
    <thread-count>10</thread-count>
    <autostart>true</autostart>
</distributed-scheme>
There’s a bit of configuration, so I’ll explain the various bits:
- We’re specifying a 1-minute expiry for this cache, which means the results for a given query will never be more than 1 minute old.
- The CacheLoader class name is specified, and we have a parameter defined that will pass along the name of the cache being used with this CacheLoader.
- The refresh-ahead-factor is configured to .5, meaning that any request for a key whose entry is more than 30 seconds old will result in that entry being refreshed by re-executing the query.
- The backup count is set to 0. This is transient data so if we lose a storage node in the cluster, we’re OK with losing this data since it is easy to recreate. By setting backup count to 0, this eliminates the overhead of maintaining a backup and reduces the storage requirements in the grid.
- The thread count is set to 10. I just picked an arbitrary number, but in practice this should be set high enough to ensure that there are enough threads at any given time to execute client queries simultaneously. I will point out that the asynchronous load described above happens on a separate thread.
- Also note that the service name is different from that of the cache storing the items we are querying: this is very important. A CacheLoader/CacheStore cannot call CacheFactory.getCache() for a cache running under the same service, as this may lead to a deadlock.
OK, on to the CacheLoader implementation!
// import statements omitted
public class QueryCacheLoader extends AbstractCacheLoader {

    public QueryCacheLoader(String sCacheName) {
        azzert(sCacheName != null && sCacheName.startsWith(QUERY_PREFIX),
               "Cache name must start with '" + QUERY_PREFIX + "'");
        m_sCacheName = sCacheName.substring(QUERY_PREFIX.length());
        CacheFactory.log("Starting QueryCacheLoader for cache " + m_sCacheName,
                         CacheFactory.LOG_DEBUG);
    }

    public Object load(Object object) {
        azzert(object instanceof Filter);
        NamedCache cache  = CacheFactory.getCache(m_sCacheName);
        Filter     filter = (Filter) object;
        CacheFactory.log("Executing filter " + filter + " on " + m_sCacheName,
                         CacheFactory.LOG_DEBUG);
        Set keySet = cache.keySet(filter);
        // the set that is returned is lazily deserialized only upon iteration,
        // so force deserialization and place the keys into a hash set
        return new HashSet(keySet);
    }

    private final String m_sCacheName;

    public static final String QUERY_PREFIX = "query-";
}
As we can see, the implementation is fairly straightforward. By using the naming convention “query-[cache name]”, the loader can determine which cache the query should be executed against.
From the client’s point of view, running the query is very simple:
NamedCache personQueryCache = CacheFactory.getCache("query-person");
NamedCache personCache      = CacheFactory.getCache("person");

Filter filter = new LikeFilter("getPhone", "917%");

// get the set of keys that match the filter
Set keySet = (Set) personQueryCache.get(filter);

// retrieve values for the keys; this may result in a near cache hit
Map queryResults = personCache.getAll(keySet);

for (Iterator i = queryResults.values().iterator(); i.hasNext();) {
    Person p = (Person) i.next();
    System.out.println(p.toString());
}
This technique will, of course, only work under the following preconditions:
- The queries that are executed are consistent in terms of criteria (not ad-hoc)
- It is acceptable to have slightly out-of-date results. If this is not acceptable, a MapListener could be registered to clear out the query cache when the source cache is updated (a sketch follows this list).
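Here is a minimal sketch of that invalidation approach (the QueryCacheInvalidator class is mine, not part of the original post): a MultiplexingMapListener on the source cache clears the corresponding query cache whenever anything changes.

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.util.MapEvent;
import com.tangosol.util.MultiplexingMapListener;

// hypothetical listener: drops all cached query results whenever the
// source cache changes, so stale result sets are never served
public class QueryCacheInvalidator extends MultiplexingMapListener {

    public QueryCacheInvalidator(String sCacheName) {
        m_queryCache = CacheFactory.getCache(QueryCacheLoader.QUERY_PREFIX + sCacheName);
    }

    protected void onMapEvent(MapEvent evt) {
        // inserts, updates, and deletes can all change query results
        m_queryCache.clear();
    }

    private final NamedCache m_queryCache;
}

// registration on the source cache:
CacheFactory.getCache("person").addMapListener(new QueryCacheInvalidator("person"));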
Coherence at Oracle Open World
For anyone who is interested in technical sessions on using Coherence, I suggest checking out Oracle OpenWorld later this month. There are more in-depth technical presentations on Coherence at this conference than at any other, including JavaOne! In addition to talks by Tangosol founders and Coherence experts, there will be hands-on training available for those who really want to get their feet wet. More training for customers == making my job easier.

