Exploration of NoSQL: FlockDB


In our last article, we touched on Cassandra, the NoSQL database born at Facebook. For this article, we’ll jump over to the other major player in social media: Twitter. Twitter recently celebrated its fifth birthday and posted some stats boasting one billion tweets per week. This, coupled with an average of 420,000 new Twitter accounts per day, makes for some interesting database scenarios. How do you handle this massive volume of tweets, account creation, and connections between accounts (following, un-following, blocking, etc.)? FlockDB was built to handle the last of these: the connections between accounts.

The biggest difference between FlockDB and other graph databases like Neo4j and OrientDB is graph traversal. Twitter’s model has no need for traversing the social graph. Instead, Twitter is concerned only with the direct edges (relationships) on a given node (account). For example, Twitter doesn’t need to know who follows a person you follow; it is only interested in the people you follow. By trimming off graph traversal functions, FlockDB is able to allocate resources elsewhere.
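To make the "direct edges only" model concrete, here is a minimal in-memory sketch in Python. It is not FlockDB's API or implementation; the `EdgeStore` class and its method names are hypothetical, chosen only to illustrate a store that answers one-hop questions (who does this account follow, and who follows it) and offers no traversal at all.

```python
from collections import defaultdict


class EdgeStore:
    """Toy in-memory store of directed edges, queried one hop at a time.

    Hypothetical illustration only; this is not FlockDB's API.
    """

    def __init__(self):
        # Forward index: source -> destinations (accounts the source follows).
        self._forward = defaultdict(set)
        # Backward index: destination -> sources (the destination's followers).
        self._backward = defaultdict(set)

    def add(self, source, destination):
        """Record that `source` follows `destination`."""
        self._forward[source].add(destination)
        self._backward[destination].add(source)

    def remove(self, source, destination):
        """Record that `source` no longer follows `destination`."""
        self._forward[source].discard(destination)
        self._backward[destination].discard(source)

    def following(self, account):
        """Accounts that `account` follows (direct edges only)."""
        return sorted(self._forward.get(account, ()))

    def followers(self, account):
        """Accounts that follow `account` (direct edges only)."""
        return sorted(self._backward.get(account, ()))


store = EdgeStore()
store.add("alice", "bob")
store.add("alice", "carol")
store.add("bob", "carol")
store.remove("alice", "bob")  # alice un-follows bob

print(store.following("alice"))  # ['carol']
print(store.followers("carol"))  # ['alice', 'bob']
```

Keeping both a forward and a backward index means either question is a single lookup. A question such as "who follows the people I follow" would require chaining lookups, which is exactly the kind of traversal this model leaves out.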

What FlockDB lacks in graph traversal, it makes up for in scalability and a high rate of operations. According to FlockDB’s documentation, the FlockDB cluster powering Twitter stored over 13 billion edges and sustained peak traffic of 20,000 writes per second and 100,000 reads per second as of April 2010. Considering Twitter’s growth over the past year, those numbers have likely at least doubled, which further supports FlockDB’s ability to handle a massive volume of read and write operations on the graph. These numbers are impressive despite the public’s familiarity with the fail whale.

Twitter continues to maintain FlockDB with regular commits on GitHub. Although there is a recommended PHP interface for FlockDB, it is not regularly maintained, and even though FlockDB is built on Thrift, a PHP library was not easy to build by hand. Thus, Ruby is the preferred language for integrating with FlockDB. The need for such a high volume of read/write operations is unusual, so there may not be many projects in the near future requiring such a powerful option. Until there are, it is difficult to see any serious language support emerging beyond Ruby, which is apparently what powers Twitter’s use of FlockDB.

Does FlockDB sound like something you would use? Let us know in the comments.

About the Author:
Michael Marr is a staff writer for WebProNews.