This week London hosted the largest NoSQL conference so far. The aim was to explore non-relational data stores, which have grown in prominence recently, particularly with Twitter joining Facebook and Digg as a Cassandra adopter.
Pragmatism was a common theme across all the presentations, and the principle of “using the right tool for the job” came up again and again. There seemed to be general agreement that what we are really talking about is “not only SQL”: in other words, using these technologies as a complement to relational databases, with which many years of experience have been accumulated.
I recommend reading the presentations for day 1 and day 2, both of which are on myNoSQL (which Makoto Inoue called “The Hello of NoSQL” in his very entertaining talk on Tokyo Cabinet). The two days I was there were enjoyable for many reasons, including plenty of general pearls of architectural wisdom, but I want to focus on practical examples of where NoSQL is in use.
So how are people using NoSQL?
Matt Wall and Simon Willison described how they do things at the Guardian. They have an Enterprise Java platform that provides feeds on which front-end developers can build useful features. Around the edges, the team have used various tools for rapid development, including Redis (read Simon’s post) for the BNP heat map and a more performant version of the MP expenses review page, Google App Engine for Zeitgeist, and Google Docs (specifically spreadsheets) for sharing data.
Jonathan Ellis gave a technical presentation on Cassandra, which is designed from the ground up for replicating data. Replication across nodes is easily achieved by streaming pre-sorted blocks sequentially.
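Why does pre-sorting help? If every block is already ordered by key, a receiving node never has to seek around or re-sort: it can read each incoming block front to back and combine them in a single sequential k-way merge. The sketch below illustrates that general idea in Python using the standard library's heapq.merge; it is not Cassandra's code, and the block contents and row format are made up for illustration.

```python
import heapq
from typing import Iterator, List, Tuple

Row = Tuple[str, str]  # (row key, value); an illustrative format, not Cassandra's

def stream_blocks(blocks: List[List[Row]]) -> Iterator[Row]:
    """Combine pre-sorted blocks in one sequential pass (a k-way merge).

    Each block must already be sorted by key, so the merge only ever looks at
    the next unread row of each block: no random access, no re-sorting.
    """
    return heapq.merge(*blocks, key=lambda row: row[0])

# Two hypothetical blocks streamed from different nodes, each sorted by key.
block_a = [("alice", "1"), ("carol", "3")]
block_b = [("bob", "2"), ("dave", "4")]

merged = list(stream_blocks([block_a, block_b]))
print(merged)
```

Because the output of the merge is itself sorted, it can be written straight to disk sequentially, which is the access pattern disks handle best.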
Kevin Weil told us about the challenges they face at Twitter, starting with the 7TB of data they collect each day: writing that amount of data to a single disk at a typical speed of 80MB/s would take 24.3 hours, longer than the day in which it was collected. In addition to adopting Cassandra, Twitter has developed FlockDB, which is a social graph store.
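The 24.3-hour figure is easy to check. A quick calculation, assuming decimal (SI) units where 1 TB is 10^12 bytes and 1 MB is 10^6 bytes:

```python
# Assumption: decimal (SI) units, 1 TB = 10**12 bytes and 1 MB = 10**6 bytes.
data_bytes = 7 * 10**12          # 7 TB collected per day
disk_bytes_per_s = 80 * 10**6    # 80 MB/s sequential write speed

seconds = data_bytes / disk_bytes_per_s
hours = seconds / 3600
print(f"{seconds:.0f} s = {hours:.1f} h")

# For comparison, binary units (1 TB = 2**40 bytes, 1 MB = 2**20 bytes).
binary_hours = (7 * 2**40) / (80 * 2**20) / 3600
print(f"binary units: {binary_hours:.1f} h")
```

With decimal units this gives 87,500 seconds, or about 24.3 hours; with binary units it is slightly worse, at about 25.5 hours. Either way, one disk cannot keep up, so the writes have to be spread across many machines.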
The BBC uses CouchDB as a key-value store for iPlayer, the home page layout, and the film network. Enda Farrell explained that they control access to CouchDB through an API, which allows them to support authorisation, sharding and JMX instrumentation.
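Putting an API in front of the store means callers never talk to a shard directly, so cross-cutting concerns can live in one place. The following is a hypothetical Python sketch of that pattern, not the BBC's implementation: the class name, the in-memory dicts standing in for CouchDB databases, and the simple request counter standing in for JMX metrics are all assumptions made for illustration.

```python
import zlib
from typing import Dict, List, Optional, Set

class ShardedStore:
    """Hypothetical API layer hiding several key-value shards.

    Each shard is modelled as an in-memory dict; a real deployment would
    instead make HTTP calls to separate CouchDB databases.
    """

    def __init__(self, shard_count: int, allowed_writers: Set[str]) -> None:
        self.shards: List[Dict[str, dict]] = [{} for _ in range(shard_count)]
        self.allowed_writers = allowed_writers
        self.request_count = 0  # stand-in for a metric you might expose via JMX

    def _shard_for(self, key: str) -> Dict[str, dict]:
        # A stable checksum (unlike Python's salted hash()) so a key always
        # maps to the same shard across processes and restarts.
        return self.shards[zlib.crc32(key.encode("utf-8")) % len(self.shards)]

    def put(self, user: str, key: str, doc: dict) -> None:
        self.request_count += 1
        if user not in self.allowed_writers:  # authorisation check in one place
            raise PermissionError(f"{user} may not write")
        self._shard_for(key)[key] = doc

    def get(self, key: str) -> Optional[dict]:
        self.request_count += 1
        return self._shard_for(key).get(key)

store = ShardedStore(shard_count=4, allowed_writers={"editor"})
store.put("editor", "homepage:layout", {"modules": ["news", "sport"]})
print(store.get("homepage:layout"))
```

The design choice here is that clients only ever see put and get, so the number of shards or the authorisation rules can change without touching any calling code.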
Matthew Ford has been involved in a number of NoSQL projects and covered the pros and cons of document-oriented data stores (CouchDB and MongoDB) and key-value stores.
Tobias Ivarsson’s slide sums up where NoSQL data stores fit into the architect’s toolbox: relational databases are suitable for the majority of cases, and we understand well how to manage them.
However, there are specific cases, namely those suited to key-value stores and graph databases, where alternative solutions work better.
Deciding whether to use a document data store instead of a relational database (RDBMS) is more difficult. Cassandra has been proven with large data sets, with indexing done by pre-defining “supercolumns” that provide the mapping between indexes and their corresponding data values. CouchDB queries are done through views, while MongoDB allows ad-hoc queries. Both relational and non-relational databases deal with structured data, but a non-relational store keeps all the data for a record in one place, whereas an RDBMS requires you to join rows from normalised tables.
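To make that contrast concrete, here is a minimal Python sketch using plain lists and dicts rather than any real database. It assumes a hypothetical article with tags: in the normalised version the tags sit in a separate "table" and must be joined back on article_id, while in the document version the same data is a single nested record fetched with one lookup.

```python
# Normalised (relational-style) data: articles and tags live in separate "tables".
articles = [{"id": 1, "title": "Example article"}]
tags = [
    {"article_id": 1, "tag": "cassandra"},
    {"article_id": 1, "tag": "couchdb"},
]

def load_article_relational(article_id: int) -> dict:
    # Roughly: SELECT ... FROM articles JOIN tags ON tags.article_id = articles.id
    article = next(a for a in articles if a["id"] == article_id)
    article_tags = [t["tag"] for t in tags if t["article_id"] == article_id]
    return {**article, "tags": article_tags}

# Document-style data: the same article stored as one nested record.
documents = {
    1: {"id": 1, "title": "Example article", "tags": ["cassandra", "couchdb"]},
}

def load_article_document(article_id: int) -> dict:
    return documents[article_id]  # a single lookup, no join needed

# Both approaches produce the same object for the application to use.
assert load_article_relational(1) == load_article_document(1)
```

The document version mirrors the nested object the application works with, which is why it avoids much of the object-relational mapping effort; the trade-off is that querying across documents (for example, all articles with a given tag) needs views or indexes instead of a join.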
Storing each record as a single structured document feels like a more natural fit to the data model typically used in an application, and avoids the object-relational mapping problems associated with mapping a hierarchical structure to a set of flat tables. However, adopting one of the “newer” data stores is intrinsically riskier than adopting an RDBMS, because they have not been around as long.
Although Cassandra is apparently suitable for single-server installations, I expect it will remain the option for larger sites for some time, given the additional complexity associated with the supercolumn model. For smaller sites you may find CouchDB’s and MongoDB’s features appealing, such as CouchDB’s replication (also being adopted by MongoDB) and easier interfacing through JSON. However, a relational database is still likely to be the right choice for the majority of cases.