Archive for the ‘Development’ Category
My Favorite New Features in Coherence 3.5
Coherence 3.5 is the third major release of Coherence since we joined Oracle back in mid-2007. We’re quite excited about this release as it includes many new features. We especially gave Coherence*Web lots of love by adding:
- Native integration with WebLogic Server/WebLogic Portal 9.2 and 10.2+.
- Fine grained concurrency management for session access
- Improved distributed session expiry algorithm
- Support for Coherence*Extend
It turns out we’re not the only ones giving Coherence lots of love. Check out this report by Gartner covering Data Grid solutions from The Big Three (Oracle, IBM, & Microsoft.)
Those interested in finding out more can watch Cam’s Live Webcast on July 29th.
The next three posts will cover my favorite new Coherence 3.5 features:
- Improved Partition Distribution Algorithm
- Service Guardian (Deadlock Detection)
- POF Extractor/Updater
Stay tuned for more!
Next Coherence SIG on June 24th
The Summer edition of the Oracle Coherence NY SIG is next week. (It feels strange saying “summer edition” as we’ve had highs in the 60s since the start of June.) It promises to be a real treat for users of Coherence*Extend. Timur Fanshteyn has experience using Extend with .NET clients and will share things that he’s done (including a LINQ interface to Coherence.) Jason Howes will cover the internals of Extend, including the internal message API, the TCP/IP infrastructure, the ProxyService, and how it all fits in with the rest of Coherence (distributed caching, remote invocation, etc.) Finally, Noah Arliss will give us an update on the Coherence Incubator. He has been working alongside Brian Oliver for the past few months on new functionality, so he’ll be able to provide a good perspective on the progress being made.
If you’re on Twitter, watch for the #nysig tag. And of course check out (SIG organizer) Craig Blitz’s blog for more detail.
An Introduction to Data Grids for Database Developers
A little over a month ago I attended Collaborate 09 in Orlando. While having lunch one day I was lucky enough to run into The Oracle Nerd. He provided a good description of our encounter (hint: I’m the “engineer on the Coherence team” he mentions.) I first encountered his blog via this thread, which turned out to be his first exposure to data grids.
Expanding on that, we both agreed that a writeup on data grids for DBAs (or, as he prefers to be called, a [lower case] dba) would be useful. Little did he know what an awful procrastinator I am.
I’m going to limit the scope of this introduction to applications that use relational databases, as opposed to addressing applications that don’t use any kind of relational database or grid-oriented apps (that will have to be a topic for another time.)
Smarter Caching
An obvious (or maybe not so obvious depending on who you ask) first step in scaling a database application is to cache as much as you can. This is fairly easy to do if you have a single app server hitting a database. It becomes more interesting however as you add more app servers to the mix. For instance:
- Is it OK if the caches on your app servers are out of sync?
- What happens if one of the app servers wants to update an item in the cache?
- How do you minimize the number of database hits to refresh the cache?
- What if you don’t have enough memory on the app server to cache everything?
This is where a data grid can come in handy. Each of these items is easily addressed by Coherence:
The view of a Coherence cache will always be consistent across all nodes. If an app server updates the cache, all nodes will have instant access to that data. In fact, servers can register to receive notifications when data does change.
Most caching in app servers uses a pattern we call “cache aside,” meaning that it is up to the application to
- check to see if the data is cached
- if not, then load it from the database, place it into the cache, and return the item to the caller
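The two cache-aside steps above can be sketched in plain Java. This is only an illustration, not Coherence code: `CacheAsideExample` and its `database` function are hypothetical names standing in for your own data access layer.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Cache aside: the caller checks the cache and loads from the database on a miss. */
final class CacheAsideExample {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // Hypothetical stand-in for a DAO or JDBC lookup (an assumption for this sketch).
    private final Function<String, String> database;

    CacheAsideExample(Function<String, String> database) {
        this.database = database;
    }

    String find(String key) {
        String value = cache.get(key);       // check to see if the data is cached
        if (value == null) {
            value = database.apply(key);     // miss: load it from the database
            if (value != null) {
                cache.put(key, value);       // place it into the cache
            }
        }
        return value;                        // return the item to the caller
    }
}
```

Note that in this sketch two threads that miss on the same key at the same moment will both query the database, and every caller has to repeat this check-then-load logic.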
A better approach is to use the “read through” pattern, meaning that it is up to the cache to load the data from the database upon a cache miss. The benefits to this approach are
- application code is much simpler; it assumes that all data can be read through the cache API
- if multiple threads (in a single JVM or across multiple JVMs) access the same item that is not in the cache, a single thread will read through to the database to load that item
This is a big win for the database; instead of answering the same question repeatedly, the database can answer the question once and all app servers benefit.
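To make the read-through idea concrete, here is a minimal single-JVM sketch. It is not how Coherence implements the feature (in Coherence you would plug a loader into the cache configuration, and the guarantee extends across JVMs); `ReadThroughCache` and its `loader` are hypothetical names, and the "one load per missing key" behavior here comes from `ConcurrentHashMap.computeIfAbsent`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Read through: callers only talk to the cache; the cache loads on a miss. */
final class ReadThroughCache<K, V> {
    private final ConcurrentMap<K, V> entries = new ConcurrentHashMap<>();
    // Hypothetical database lookup supplied by the application (an assumption).
    private final Function<K, V> loader;

    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // computeIfAbsent applies the loader at most once per missing key;
        // other threads asking for the same key wait for that single load.
        return entries.computeIfAbsent(key, loader);
    }
}
```

The application code shrinks to `cache.get(key)`, and concurrent requests for the same missing item result in one database query instead of one per thread.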
If expiry is desired for a cache, Coherence can be configured to perform “refresh ahead” on a cache. For example, if a cache is configured to expire after 5 minutes, you can configure a refresh ahead value (say 3 minutes) to determine when the value will be reloaded from the database. If an item is >3 minutes old (but not yet expired) and a thread requests that value, the currently cached value will be returned, and the value will be asynchronously refreshed in the background.
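The refresh-ahead decision can be sketched as follows. This is a simplified illustration of the behavior described above, not Coherence's implementation or configuration syntax; the class name, the millisecond thresholds, the injected `clock`, and the `refresher` executor are all assumptions made so the example is self-contained.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Refresh ahead: serve a soon-to-expire value while reloading it in the background. */
final class RefreshAheadCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long loadedAt;
        Entry(V value, long loadedAt) { this.value = value; this.loadedAt = loadedAt; }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Set<K> refreshing = ConcurrentHashMap.newKeySet();
    private final Function<K, V> loader;   // hypothetical database lookup
    private final long expiryMillis;       // e.g. 5 minutes
    private final long refreshAheadMillis; // e.g. 3 minutes
    private final LongSupplier clock;      // injected so the sketch is testable
    private final Executor refresher;      // runs background reloads

    RefreshAheadCache(Function<K, V> loader, long expiryMillis, long refreshAheadMillis,
                      LongSupplier clock, Executor refresher) {
        this.loader = loader;
        this.expiryMillis = expiryMillis;
        this.refreshAheadMillis = refreshAheadMillis;
        this.clock = clock;
        this.refresher = refresher;
    }

    V get(K key) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.get(key);
        if (entry == null || now - entry.loadedAt >= expiryMillis) {
            return load(key).value; // missing or expired: the caller waits for the load
        }
        // Old but not expired: schedule one background reload per key at a time.
        if (now - entry.loadedAt > refreshAheadMillis && refreshing.add(key)) {
            refresher.execute(() -> {
                try { load(key); } finally { refreshing.remove(key); }
            });
        }
        return entry.value; // the currently cached value is returned immediately
    }

    private Entry<V> load(K key) {
        Entry<V> fresh = new Entry<>(loader.apply(key), clock.getAsLong());
        entries.put(key, fresh);
        return fresh;
    }
}
```

With a 5 minute expiry and a 3 minute refresh-ahead value, a request at the 4 minute mark gets the cached value right away and triggers a reload, so the next request sees fresh data without ever waiting on the database.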
If more storage capacity is required, all you have to do is add servers. Generally there is no extra configuration required.
All in all, it provides a very sophisticated cache that can drastically reduce the number of SELECTs issued against a database.
Scaling Writes
This next data grid feature may be a bit more controversial for database developers. In the previous example, we can assume that the database is still the system of record (a.k.a. the source of Truth.) For situations where we always want the database to hold the Truth, data grids can have caches configured to use the “write through” topology. This means that updates made to the cache will be synchronously written to the database, just like any other database app. However, if you have dozens of app servers, each with dozens of threads writing to the database simultaneously, scalability will definitely be a concern. In this case, the cache can be configured to write the updates to the database asynchronously; this is known as the “write behind” topology. Here are some of the objections that I’ve heard (and my responses):
- What happens if I lose a server? Coherence maintains a backup of each entry in the cache, and it keeps track of items that have not been flushed to the database. If a JVM or a machine is lost, the data (and the fact that it still needs to be flushed) will not be lost.
- Are there any ordering guarantees? What about referential integrity? Let’s say you had a cache for Person and a cache for Address. If Address has a foreign key dependency on Person, then write behind is not a good fit. There is no guarantee that Person will make it out to the database before Address. In this case you’d have to combine the two into an object, and the write behind routine would know to insert Person before Address.
This may change someday, but the reality of write behind today is that the database write should never fail (short of the database itself failing, in which case the item can simply be requeued until the database comes back up.) Read-only caches can generally be retrofitted into an existing application easily; the same is not always true for write behind.
However, it is a very powerful tool in the data grid toolbox to help scale database applications.
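For readers who prefer code, here is a rough sketch of the write-behind idea from this section: updates land in the cache immediately, are queued, and are flushed to the database later, with failed writes requeued. It is a single-JVM illustration only, with no backups or clustering; `WriteBehindCache`, `flush()`, and the `store` callback are hypothetical names, not Coherence APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

/** Write behind: cache updates are queued and written to the database later. */
final class WriteBehindCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Map<K, V> pending = new LinkedHashMap<>(); // guarded by "this"
    // Hypothetical database writer (an assumption); throws if the database is down.
    private final BiConsumer<K, V> store;

    WriteBehindCache(BiConsumer<K, V> store) {
        this.store = store;
    }

    synchronized void put(K key, V value) {
        entries.put(key, value);
        pending.put(key, value); // a later update to the same key replaces the queued one
    }

    V get(K key) {
        return entries.get(key);
    }

    /** Meant to be called periodically from a background thread; returns entries written. */
    int flush() {
        Map<K, V> batch;
        synchronized (this) {
            batch = new LinkedHashMap<>(pending);
            pending.clear();
        }
        int written = 0;
        for (Map.Entry<K, V> e : batch.entrySet()) {
            try {
                store.accept(e.getKey(), e.getValue());
                written++;
            } catch (RuntimeException databaseUnavailable) {
                synchronized (this) {
                    // Requeue for the next flush unless a newer value is already queued.
                    pending.putIfAbsent(e.getKey(), e.getValue());
                }
            }
        }
        return written;
    }
}
```

Because several updates to one key collapse into a single queued entry, the database sees one write per key per flush, which is where much of the scaling benefit comes from. For the Person and Address case, the combined object would be the cached value and `store` would insert the Person row before the Address row.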
Transient Data That You Don’t Want To Lose
Sometimes an application has transient data that does not need to be stored in a database, but it ends up being stored in the database anyway (because you don’t want to lose it.) A good example of this is HTTP sessions. You don’t want to lose a session if a web server goes down, but the database just seems like overkill for this. The scalability of the application will be limited, not to mention the scripts that will have to be written to clear out expired sessions.
For the specific case of HTTP sessions, Coherence*Web provides a solution to store sessions in Coherence caches. This is an out-of-the-box solution; it will work with just about any J2EE-compliant web application. It also works across many popular web containers, both open source and proprietary (Oracle and non-Oracle).
Hopefully this is a good broad introduction to data grids for database developers. Comments or follow up questions are welcome.
Two New Coherence Blogs
I’m pleased to announce two new Coherence related blogs:
First is the blog of Mark Falco, who is responsible for (among many things) our new C++ client and a bunch of the plumbing behind Coherence (including TCMP.)
Next is Aleksandar Seovic, whose claim to fame includes contributions to Spring .NET and the implementation of POF in .NET.
Be sure to add these to your RSS feed; I’m looking forward to even more great Coherence related content!
Code Review
Yesterday I had the chief Coherence architect review some of my code before submitting it to Perforce. (No, it wasn’t done by Cameron.) I actually look forward to these reviews because I always learn something new. This time what stuck with me was the refactoring of a method and the before/after difference.
Here’s the before:
    protected void configureAffinitySuffix()
    {
        String sJvmRoute = m_sConfiguredJvmRoute;

        if (sJvmRoute != null)
        {
            CacheFactory.log("jvmRoute set to '" + sJvmRoute + "' via configuration", 6);
        }
        else
        {
            sJvmRoute = getJvmRouteViaJmx();
            if (sJvmRoute == null)
            {
                m_sAffinitySuffix = "";
                CacheFactory.log("jvmRoute is not configured", 6);
            }
        }

        if (sJvmRoute != null)
        {
            CacheFactory.log("Configured affinity suffix: " + sJvmRoute, 4);
            m_sAffinitySuffix = SUFFIX_SEPARATOR + sJvmRoute;
        }
    }
Including blank lines, this is a 24 line method; and it isn’t very complex at all. Here is the newly refactored version:
    protected void configureAffinitySuffix()
    {
        String sJvmRoute = m_sConfiguredJvmRoute;

        if (sJvmRoute == null)
        {
            sJvmRoute = getJvmRouteViaJmx();
        }
        if (sJvmRoute == null)
        {
            m_sAffinitySuffix = "";
            CacheFactory.log("The jvmRoute setting is not configured", 3);
        }
        else
        {
            m_sAffinitySuffix = SUFFIX_SEPARATOR + sJvmRoute;
            CacheFactory.log("Configured affinity suffix: " + sJvmRoute, 3);
        }
    }
The main differences are:
- The new method is 19 lines, reduced from 24
- All of the if statements are consistent; they all test for a positive comparison (== null rather than != null)
- There are no nested blocks
- The logging is more consistent, both in the ordering (the logging appears after the relevant line of code) and in the log level. There is also less logging so as to prevent the logs from filling up with noise.
All in all, apart from the logging, we didn’t change the functionality of this method at all; however, it is much cleaner and easier to follow. This is the attention to detail that IMHO is a significant factor in the quality of Coherence. (BTW, all submissions are peer reviewed before being checked into source control.)