Over the past few weeks, we’ve been exploring various NoSQL options with the intent of raising awareness of the vast number of NoSQL databases designed with particular shortcomings of relational databases in mind. Armed with this information, we can more easily clear hurdles in future projects by choosing the appropriate NoSQL database. We’ve touched on a document storage database with MongoDB, a graph database in Neo4j, and a combination of the two in OrientDB. This time, we’re going to take a look at the forefathers of NoSQL: key-value stores. One of the most popular key-value stores is Cassandra.
Cassandra was developed at Facebook and was based on an earlier key-value store, Amazon’s Dynamo. It is still used by Facebook today, as well as by some other big sites like Digg, Twitter, Rackspace, and Reddit. Key-value stores, in general, are simply a way to store and look up large sets of key-value pairs. Cassandra goes a step beyond this and allows for column families. A column family lets several separate, named values (columns) be stored under a given key. Column families are not equivalent to a relational database schema, as one key may have one column and another key may have two or more unrelated columns. Being able to store multiple values under a given key makes Cassandra that much more powerful.
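To make the column family idea a little more concrete, here is a minimal in-memory sketch in Python. It is only an illustration of the shape of the data, not Cassandra’s API or storage format; the `ColumnFamily` class, the `users` family, and the column names are all made up for this example.

```python
from collections import defaultdict


class ColumnFamily:
    """Hypothetical toy model: row key -> {column name: value}.

    Illustration only; this is not how Cassandra is accessed or implemented.
    """

    def __init__(self, name):
        self.name = name
        # Each row key maps to its own independent set of columns.
        self.rows = defaultdict(dict)

    def insert(self, key, column, value):
        # Add or overwrite a single column for the given row key.
        self.rows[key][column] = value

    def get(self, key, column=None):
        # Use .get so that reading a missing key does not create an empty row.
        row = self.rows.get(key, {})
        if column is None:
            return dict(row)  # copy of every column stored for this key
        return row.get(column)  # None if this key has no such column


users = ColumnFamily("users")

# "alice" has a single column...
users.insert("alice", "email", "alice@example.com")

# ...while "bob" has three, two of which "alice" does not have at all.
users.insert("bob", "email", "bob@example.com")
users.insert("bob", "last_login", "2011-05-01")
users.insert("bob", "favorite_color", "green")

print(users.get("alice"))
print(users.get("bob"))
```

Notice that nothing forces the two keys to share the same columns, which is the difference from a relational table, where every row carries every column of the schema.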
Cassandra isn’t a magical database that defeats the CAP Theorem, however. What it does offer is fine-grained tuning, so you can find the balance between performance and consistency that is best for your production environment. Although few of us will likely need to push something like Cassandra to its limits, we have all certainly seen cases where a constant stream of queries to a relational database bottlenecks our application. Key-value stores like Cassandra can help alleviate that issue.
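One way to picture that performance-versus-consistency balance is the replica-overlap rule commonly used to reason about Dynamo-style systems: if the number of replicas that must acknowledge a write plus the number consulted on a read exceeds the total number of replicas, every read overlaps at least one replica that saw the latest write. The sketch below is plain Python arithmetic, not Cassandra client code; the function names and the replication factor of 3 are assumptions chosen for illustration.

```python
def quorum(replication_factor):
    # A quorum is a strict majority of the replicas.
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor, write_replicas, read_replicas):
    # Overlap rule: the read set and the write set must share at least
    # one replica, which holds when W + R > N.
    return write_replicas + read_replicas > replication_factor


N = 3  # assumed replication factor for this example
q = quorum(N)

# Hypothetical (write, read) settings, from fastest to most conservative.
settings = {
    "write ONE, read ONE": (1, 1),
    "write QUORUM, read QUORUM": (q, q),
    "write ALL, read ONE": (N, 1),
}

for label, (w, r) in settings.items():
    print(label, "->", reads_see_latest_write(N, w, r))
```

Touching fewer replicas per operation is faster but can return stale data; touching more gives stronger guarantees at the cost of latency and availability. Where you land on that scale is the tuning decision described above.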