Archive for June, 2009
Remake of “Pelham 123”
My wife and I went to see “The Taking of Pelham 123” today. Since I own the original on DVD I was quite interested to check out this new version, especially after having read this review by Randy Kennedy.
Like most of Mr. Scott’s movies, “Pelham 1 2 3,” which is set to open June 12, is a finely calibrated box-office machine with plenty of fireworks and star power. But for anyone who has spent time past a turnstile, one of its chief attractions will also undoubtedly be how devoted it is to the look and feel of the subway and how well it pulls it off. In early reviews since the trailer was released on the Internet, even the most bloodthirsty critics — New York City subway buffs — have had a few nice things to say.
Like most fellow geeks who enjoy laughing at and pointing out blatant inaccuracies in film and TV, I was interested to see for myself how well they would portray the system, especially given how well it was done in the 1974 original. I was actually quite pleased with it; it seems that the MTA provided a lot of background to make many parts of the story quite plausible. The thing that really piqued my interest was the escape scene that ends in the Waldorf-Astoria. It turns out that there indeed was a platform at Grand Central for that hotel; this sort of thing was not all that uncommon back then. However, the station where they filmed the escape scene was labeled “Roosevelt,” which is nowhere near Manhattan. My guess is that it was the abandoned section of the Roosevelt Ave station in Queens.
Another interesting plot point is sending the runaway train to Coney Island instead of South Ferry. Since Coney Island is outdoors, that makes for more interesting filming; but I’m not sure how a 6 could get switched onto BMT tracks to take it to Coney Island. At a minimum it would have to be switched to the 4/5 express tracks just to get to Brooklyn in the first place. I suspect the train would have derailed long before making it there in any event.
The ’70s edition had more interaction between the hijackers and the passengers; there is a lot more character development in that story line. This edition is clearly all about the stars, Travolta and Denzel, which also works well.
The intro of the 2009 version features the high-pitched whine that anyone who’s ever ridden the 6 will recognize; I thought that was a nice touch. However, the original has the best opening-credits score of any movie! I’ve embedded it below for your listening pleasure.
Next Coherence SIG on June 24th
The Summer edition of the Oracle Coherence NY SIG is next week. (It feels strange saying “summer edition” as we’ve had highs in the 60s since the start of June.) It promises to be a real treat for users of Coherence*Extend. Timur Fanshteyn has experience using Extend with .NET clients and will share things that he’s done (including a LINQ interface to Coherence.) Jason Howes will cover the internals of Extend, including the internal message API, the TCP/IP infrastructure, the ProxyService, and how it all fits in with the rest of Coherence (distributed caching, remote invocation, etc.) Finally Noah Arliss will give us an update on the Coherence Incubator. He has been working alongside Brian Oliver for the past few months on new functionality so he’ll be able to provide a good perspective on the progress being made.
If you’re on Twitter, watch for the #nysig tag. And of course check out (SIG organizer) Craig Blitz’s blog for more detail.
An Introduction to Data Grids for Database Developers
A little over a month ago I attended Collaborate 09 in Orlando. While having lunch one day I was lucky enough to run into The Oracle Nerd. He provided a good description of our encounter (hint: I’m the “engineer on the Coherence team” he mentions.) I first encountered his blog via this thread, which turned out to be his first exposure to data grids.
Expanding on that, we both agreed that a writeup on data grids for DBAs (or, as he prefers to be called, a [lower-case] dba) would be useful. Little did he know what an awful procrastinator I am. I’m going to limit the scope of this introduction to applications that use relational databases, leaving aside applications that don’t use any kind of relational database, as well as grid-oriented apps (that will have to be a topic for another time.)
Smarter Caching
An obvious (or maybe not so obvious, depending on who you ask) first step in scaling a database application is to cache as much as you can. This is fairly easy to do if you have a single app server hitting a database. It becomes more interesting, however, as you add more app servers to the mix. For instance:
- Is it OK if the caches on your app servers are out of sync?
- What happens if one of the app servers wants to update an item in the cache?
- How do you minimize the number of database hits to refresh the cache?
- What if you don’t have enough memory on the app server to cache everything?
This is where a data grid can come in handy. Each of these items is easily addressed by Coherence:
The view of a Coherence cache will always be consistent across all nodes. If an app server updates the cache, all nodes will have instant access to that data. In fact, servers can register to receive notifications when data changes.
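The notification idea can be sketched in plain Java. This is a generic observer sketch, not the actual Coherence API (Coherence exposes this through its own listener interfaces); all names below are illustrative:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of an observable cache: interested parties register a
// listener and are called back whenever an entry changes. A data grid
// does the same thing, but delivers the events across the cluster.
public class ObservableCache<K, V> {
    public interface Listener<K, V> {
        void entryUpdated(K key, V oldValue, V newValue);
    }

    private final Map<K, V> data = new HashMap<K, V>();
    private final List<Listener<K, V>> listeners = new ArrayList<Listener<K, V>>();

    public void addListener(Listener<K, V> l) { listeners.add(l); }

    public V get(K key) { return data.get(key); }

    public void put(K key, V value) {
        V old = data.put(key, value);
        for (Listener<K, V> l : listeners) {
            l.entryUpdated(key, old, value); // notify every registered listener
        }
    }
}
```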
Most caching in app servers uses a pattern we call “cache aside,” meaning that it is up to the application to
- check to see if the data is cached
- if not, then load it from the database, place it into the cache, and return the item to the caller
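The cache-aside steps above look something like this in application code. This is a hypothetical single-JVM sketch: a plain map stands in for the cache, and a string-building method stands in for a real JDBC query.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Cache-aside: the *application* owns the check/load/populate logic.
public class CacheAsideDao {
    private final Map<Long, String> cache = new ConcurrentHashMap<Long, String>();
    int databaseHits = 0; // exposed so the effect of caching is observable

    public String findCustomerName(long id) {
        String name = cache.get(id);      // 1. check whether the data is cached
        if (name == null) {
            name = loadFromDatabase(id);  // 2. miss: load it from the database
            cache.put(id, name);          //    and place it into the cache
        }
        return name;                      // 3. return the item to the caller
    }

    private String loadFromDatabase(long id) {
        databaseHits++;
        return "customer-" + id; // a real implementation would run a SELECT here
    }
}
```

Note that every piece of application code that touches this data has to repeat (or share) the check/load/populate dance, which is exactly the burden read-through removes.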
A better approach is to use the “read through” pattern, meaning that it is up to the cache to load the data from the database upon a cache miss. The benefits of this approach are
- application code is much simpler; it assumes that all data can be read through the cache API
- if multiple threads (in a single JVM or across multiple JVMs) access the same item that is not in the cache, a single thread will read through to the database to load that item
This is a big win for the database; instead of answering the same question repeatedly, the database can answer the question once and all app servers benefit.
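For contrast with the cache-aside pattern, here is a read-through sketch. Within a single JVM, `ConcurrentHashMap.computeIfAbsent` happens to give the same one-loader-per-miss guarantee described above; Coherence extends that guarantee across all JVMs in the cluster. The class and method names are illustrative, not the Coherence API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Read-through: the *cache* owns the load-on-miss logic, so callers
// simply call get() and never think about the database.
public class ReadThroughCache {
    private final ConcurrentHashMap<Long, String> cache =
            new ConcurrentHashMap<Long, String>();
    final AtomicInteger databaseHits = new AtomicInteger();

    public String get(long id) {
        // computeIfAbsent runs the loader at most once per missing key,
        // even if many threads request the same key concurrently
        return cache.computeIfAbsent(id, key -> loadFromDatabase(key));
    }

    private String loadFromDatabase(long id) {
        databaseHits.incrementAndGet();
        return "customer-" + id; // a real implementation would run a SELECT here
    }
}
```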
If expiry is desired for a cache, Coherence can be configured to perform “refresh ahead” on a cache. For example, if a cache is configured to expire after 5 minutes, you can configure a refresh ahead value (say 3 minutes) to determine when the value will be reloaded from the database. If an item is >3 minutes old (but not yet expired) and a thread requests that value, the currently cached value will be returned, and the value will be asynchronously refreshed in the background.
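As a rough illustration, the 5-minute/3-minute example above corresponds to a refresh-ahead factor of 0.6 (soft expiry = factor × expiry delay) in a Coherence cache configuration. The element names below follow the Coherence cache-config schema as I understand it, and the `CustomerCacheStore` class is hypothetical; check the details against the documentation for the version you are running.

```xml
<distributed-scheme>
  <scheme-name>refresh-ahead-example</scheme-name>
  <backing-map-scheme>
    <read-write-backing-map-scheme>
      <internal-cache-scheme>
        <local-scheme>
          <!-- entries expire 5 minutes after being loaded -->
          <expiry-delay>5m</expiry-delay>
        </local-scheme>
      </internal-cache-scheme>
      <cachestore-scheme>
        <class-scheme>
          <!-- hypothetical CacheStore that loads entries from the database -->
          <class-name>com.example.CustomerCacheStore</class-name>
        </class-scheme>
      </cachestore-scheme>
      <!-- 0.6 * 5m = 3m: entries accessed after the 3-minute mark are
           returned as-is and reloaded asynchronously in the background -->
      <refresh-ahead-factor>0.6</refresh-ahead-factor>
    </read-write-backing-map-scheme>
  </backing-map-scheme>
  <autostart>true</autostart>
</distributed-scheme>
```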
If more storage capacity is required, all you have to do is add servers. Generally there is no extra configuration required.
All in all, it provides a very sophisticated cache that can drastically reduce the number of SELECTs issued against a database.
Scaling Writes
This next data grid feature may be a bit more controversial for database developers. In the previous example, we can assume that the database is still the system of record (a.k.a. the source of truth). For situations where we always want the database to hold the truth, data grids can have caches configured to use the “write through” topology. This means that updates made to the cache will be synchronously written to the database, just like any other database app. However, if you have dozens of app servers with dozens of threads each writing to the database simultaneously, scalability will definitely be a concern. In this case, the cache can be configured to write the updates to the database asynchronously; this is known as the “write behind” topology. Here are some of the objections that I’ve heard (and my responses):
- What happens if I lose a server? Coherence maintains a backup of each entry in the cache, and it keeps track of items that have not been flushed to the database. If a JVM or a machine is lost, the data (and the fact that it still needs to be flushed) will not be lost.
- Are there any ordering guarantees? What about referential integrity? Let’s say you had a cache for Person and a cache for Address. If Address has a foreign key dependency on Person, then write behind is not a good fit. There is no guarantee that Person will make it out to the database before Address. In this case you’d have to combine the two into an object, and the write behind routine would know to insert Person before Address.
This may change someday, but the reality of write behind today is that the database write should never fail (short of the database itself failing, in which case the item can simply be requeued until the database comes back up.) Read-only caches can generally be retrofitted easily into an existing application; the same is not always true for write behind.
However, it is a very powerful tool in the data grid toolbox for scaling database applications.
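The mechanics of write behind can be sketched generically: a put updates the cache immediately and queues the key for a deferred database write. This is a hypothetical single-JVM sketch, not Coherence itself; a real grid drains the queue from a background thread on an interval (requeueing on database failure, as discussed above), whereas here `flush()` is called explicitly so the behavior is easy to follow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Write-behind: put() updates the cache immediately and defers the
// database write by queueing the key as "dirty".
public class WriteBehindCache {
    private final Map<Long, String> cache = new ConcurrentHashMap<Long, String>();
    private final ConcurrentLinkedQueue<Long> dirty = new ConcurrentLinkedQueue<Long>();
    final List<String> databaseWrites = new ArrayList<String>(); // stand-in for JDBC

    public void put(long id, String value) {
        cache.put(id, value); // callers see the new value immediately
        dirty.add(id);        // the database write happens later
    }

    public String get(long id) { return cache.get(id); }

    // In a real grid this runs periodically on a background thread, and
    // repeated updates to the same key between flushes can be coalesced
    // into a single database write.
    public void flush() {
        Long id;
        while ((id = dirty.poll()) != null) {
            databaseWrites.add(id + "=" + cache.get(id)); // stand-in for an UPDATE
        }
    }
}
```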
Transient Data That You Don’t Want To Lose
Sometimes an application has transient data that does not need to be stored in a database, but it ends up being stored in the database anyway (because you don’t want to lose it.) A good example of this is HTTP sessions. You don’t want to lose a session if a web server goes down, but the database just seems like overkill for this. The scalability of the application will be limited, not to mention the scripts that will have to be written to clear out expired sessions.
For the specific case of HTTP sessions, Coherence*Web provides a solution to store sessions in Coherence caches. This is an out-of-the-box solution; it will work with just about any J2EE-compliant web application. It also works across many popular web containers, both open source and proprietary (Oracle and non-Oracle).
Hopefully this is a good broad introduction to data grids for database developers. Comments or follow up questions are welcome.