Coherence query cache technique using refresh-ahead
A few days ago I read an excellent post by Martin Elwin on creating a query cache using Coherence. To summarize, he uses a Coherence Filter (an object that contains query criteria) as the key and the results of the query (in this case, a set of keys for entries that match the criteria) as the value of a cache entry. When using Coherence (as with any data grid product), key-based access is orders of magnitude faster (and, more importantly, far more scalable) than data retrieval with a query, so I think this is a great technique.
Reading through his code, I thought about ways in which Coherence can do even more of the work. This can be accomplished through the use of a CacheLoader. A CacheLoader in Coherence is used to load data for a cache from an external resource (normally a database) in the case of a cache miss. Therefore subsequent requests for that key result in a cache hit.
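As a point of reference, here is a minimal sketch of the CacheLoader contract, assuming a hypothetical loadFromDatabase helper (this is illustrative only, not part of the technique below):

import com.tangosol.net.cache.AbstractCacheLoader;

public class DatabaseCacheLoader extends AbstractCacheLoader {
    public Object load(Object oKey) {
        // invoked by Coherence on a cache miss; the value returned here is
        // placed into the cache, so subsequent gets for this key are hits
        return loadFromDatabase(oKey);
    }

    private Object loadFromDatabase(Object oKey) {
        // hypothetical placeholder for a JDBC or ORM lookup
        return null;
    }
}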
Many users of Coherence CacheLoaders and CacheStores take advantage of the asynchronous write-behind capabilities to scale writes to a database or a file system. However, one of the most underused (and one of the best, IMHO) features of Coherence is the ability to refresh the cache from the data source asynchronously. For example, if a cache entry is read from the database and the cache is configured to expire entries after 5 minutes, Coherence can be configured to fetch the data from the database asynchronously before the 5-minute expiry. This is controlled by the refresh-ahead factor. If I configure the factor to .5, any client request for this key that occurs after the entry is 2 minutes and 30 seconds old will (a) return the current value for the key and (b) request that the entry be refreshed in a background thread. If no client requests happen between the 2.5 and 5 minute marks, the entry expires as normal and the next request results in a synchronous load.
So if we put these concepts together, we can create a generic CacheLoader that executes queries for any cache, expires the results after a configurable amount of time, and (after the initial request) refreshes the query results ahead of time so that no client thread has to wait for the results. Best of all, this all happens in the grid, so clients don’t have to worry about the mechanics behind it all.
Here is an example: consider an application with a Person object that looks like this:
public class Person implements Serializable {
    public Person(String sName, String sPhone) {
        m_sName  = sName;
        m_sPhone = sPhone;
    }

    public String getName() {
        return m_sName;
    }

    public String getPhone() {
        return m_sPhone;
    }

    public String toString() {
        return "Person{" + "'" + m_sName + '\'' + ", phone: " + m_sPhone + '}';
    }

    private String m_sName;
    private String m_sPhone;
}
The cache configuration that we’ll use to store Person objects:
<cache-mapping>
    <cache-name>person</cache-name>
    <scheme-name>person-cache-scheme</scheme-name>
</cache-mapping>
...
<distributed-scheme>
    <scheme-name>person-cache-scheme</scheme-name>
    <service-name>PersonDistributedCache</service-name>
    <backing-map-scheme>
        <local-scheme />
    </backing-map-scheme>
</distributed-scheme>
Pretty straightforward so far. Objects of this type will be stored in a distributed cache. Now, in our application we will often be running queries by area code, such as:
Filter filter = new LikeFilter("getPhone", "917%");
This will give us all Persons with a New York area code.
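To make the walkthrough concrete, here is a hypothetical setup snippet (the keys, names, and phone numbers are made up) that populates the person cache and runs the filter directly:

NamedCache personCache = CacheFactory.getCache("person");
personCache.put("alice", new Person("Alice", "917-555-0100"));
personCache.put("bob",   new Person("Bob",   "212-555-0199"));

// running the filter directly against the data cache; every call here
// executes a full query, which is what the query cache will let us avoid
Set keySet = personCache.keySet(new LikeFilter("getPhone", "917%")); // contains "alice"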
Now, the configuration for the query cache:
<cache-mapping>
    <cache-name>query-*</cache-name>
    <scheme-name>query-cache-scheme</scheme-name>
</cache-mapping>
...
<distributed-scheme>
    <scheme-name>query-cache-scheme</scheme-name>
    <service-name>QueryCacheDistributedCache</service-name>
    <backing-map-scheme>
        <read-write-backing-map-scheme>
            <internal-cache-scheme>
                <local-scheme>
                    <expiry-delay>1m</expiry-delay>
                </local-scheme>
            </internal-cache-scheme>
            <cachestore-scheme>
                <class-scheme>
                    <class-name>com.tangosol.examples.QueryCacheLoader</class-name>
                    <init-params>
                        <init-param>
                            <param-type>string</param-type>
                            <param-value>{cache-name}</param-value>
                        </init-param>
                    </init-params>
                </class-scheme>
            </cachestore-scheme>
            <read-only>true</read-only>
            <refresh-ahead-factor>.5</refresh-ahead-factor>
        </read-write-backing-map-scheme>
    </backing-map-scheme>
    <backup-count>0</backup-count>
    <thread-count>10</thread-count>
    <autostart>true</autostart>
</distributed-scheme>
There’s a bit of configuration, so I’ll explain the various bits:
- We’re specifying a 1 minute expiry delay for this cache. This means that the results for a given query will never be more than 1 minute old.
- The CacheLoader class name is specified, and we have a parameter defined that will pass along the name of the cache being used with this CacheLoader.
- The refresh-ahead-factor is configured to .5, meaning that any request for an entry that is more than 30 seconds old will return the cached value and trigger a background refresh of that entry by re-executing the query.
- The backup count is set to 0. This is transient data, so if we lose a storage node in the cluster we’re OK with losing it since it is easy to recreate. Setting the backup count to 0 eliminates the overhead of maintaining a backup and reduces the storage requirements in the grid.
- The thread count is set to 10. I just picked an arbitrary number, but in practice this should be set high enough to ensure that there are enough threads at any given time to execute client queries simultaneously. I will point out that the asynchronous load described above happens on a separate thread.
- Also note that the service name is different from that of the cache storing the items we are querying: this is very important. A CacheLoader/CacheStore cannot call CacheFactory.getCache() for a cache that is running under the same service, as this may lead to a deadlock.
OK, on to the CacheLoader implementation!
// import statements omitted

public class QueryCacheLoader extends AbstractCacheLoader {
    public QueryCacheLoader(String sCacheName) {
        azzert(sCacheName != null && sCacheName.startsWith(QUERY_PREFIX),
               "Cache name must start with '" + QUERY_PREFIX + "'");
        m_sCacheName = sCacheName.substring(QUERY_PREFIX.length());
        CacheFactory.log("Starting QueryCacheLoader for cache " + m_sCacheName,
                CacheFactory.LOG_DEBUG);
    }

    public Object load(Object object) {
        azzert(object instanceof Filter);
        NamedCache cache  = CacheFactory.getCache(m_sCacheName);
        Filter     filter = (Filter) object;
        CacheFactory.log("Executing filter " + filter + " on " + m_sCacheName,
                CacheFactory.LOG_DEBUG);
        Set keySet = cache.keySet(filter);
        // the set that is returned is lazily deserialized only upon iteration,
        // so force deserialization and place into a hash set
        return new HashSet(keySet);
    }

    private final String m_sCacheName;

    public static final String QUERY_PREFIX = "query-";
}
As we can see, the implementation is fairly straightforward. By using the naming convention “query-[cache name]”, the loader can determine which cache the query should be executed on.
From the client’s point of view, running the query is very simple:
NamedCache personQueryCache = CacheFactory.getCache("query-person");
Filter filter = new LikeFilter("getPhone", "917%");

// get set of keys that match the filter
Set keySet = (Set) personQueryCache.get(filter);

// retrieve values for keys; this may result in a near cache hit
NamedCache personCache = CacheFactory.getCache("person");
Map queryResults = personCache.getAll(keySet);

for (Iterator i = queryResults.values().iterator(); i.hasNext();) {
    Person p = (Person) i.next();
    System.out.println(p.toString());
}
This technique will of course only work with the following preconditions:
- The queries that are executed are consistent in terms of criteria (not ad-hoc)
- It is acceptable to have slightly out-of-date results. If this is not acceptable, a MapListener could be registered to clear out the query cache when the source cache is updated, as in the sketch below.
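As an illustration of that invalidation idea, here is a rough sketch (the class name and wiring are mine, not from the original post) that clears the query cache whenever the source cache changes:

import com.tangosol.net.CacheFactory;
import com.tangosol.util.MapEvent;
import com.tangosol.util.MultiplexingMapListener;

public class QueryCacheInvalidator extends MultiplexingMapListener {
    protected void onMapEvent(MapEvent evt) {
        // any insert, update, or delete in the source cache may change
        // query results, so drop all cached result sets
        CacheFactory.getCache("query-person").clear();
    }
}

The listener would be registered once, for example via CacheFactory.getCache("person").addMapListener(new QueryCacheInvalidator()); the trade-off is that queries are re-executed more often in exchange for fresher results.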
Coherence at Oracle OpenWorld
For anyone who is interested in technical sessions on using Coherence, I suggest checking out Oracle OpenWorld later this month. There are more in-depth technical presentations on Coherence at this conference than at any other, including JavaOne! In addition to talks by Tangosol founders and Coherence experts, there will be hands-on training available for those who really want to get their feet wet. More training for customers == making my job easier.