Database architecture is perhaps the most debated and misunderstood topic in scalability today. The noSQL movement is enchanting because it is new and is designed for web-scale databases. The SQL movement is convincing because it has been the established standard for decades. Let’s look at some of the various architectures in the two movements and their scalability.
Before looking at SQL and noSQL, it is important to note that popular debate has badly skewed the terminology, though perhaps unintentionally. In an attempt to merge the popular and scholarly debates, I will group the concepts of ACID compliance and relational databases under the umbrella of SQL, and the concepts of BASE and non-relational databases under the umbrella of noSQL. The reason for doing this is conceptual. Databases are a galaxy of their own in the technology universe, so combining terms will aid understanding in the generic debate, but the terms will need to be kept distinct in the technical debate. Here we will look at the generic debate.
Let’s start with SQL, and again, note that I’m using that as an umbrella term. SQL databases, like MySQL, have been the standard go-to database for decades. Saying that the internet runs on MySQL is not too far from the truth. Yet, organizations have lately come to debate the use of SQL on three grounds: availability, distributability, and agility.
Why organizations have come to debate SQL’s availability can be explained by Brewer’s CAP Theorem. The theorem states that a distributed system cannot guarantee Consistency, Availability, and Partition tolerance (CAP) all at once; at least one of the three has to be compromised. Ok, so how does the theorem apply? As previously stated, distributing resources is the bottom line to scalability, because no single system, even a supercomputer, can handle the volume of data being transferred and used on the internet today. When enormous websites, such as Facebook, looked to accomplish this, they had to deal with the issue that SQL’s ACID compliance (specifically the Consistency aspect) reduced the Availability of the website. This is not acceptable on the internet.
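To make the trade-off concrete, here is a minimal sketch of two replicas during a network partition. The `Replica` class, the `write` function, and the "CP"/"AP" mode names are all made up for illustration; no real database exposes this exact interface. A consistency-first policy refuses the write when a replica is unreachable, while an availability-first policy accepts it and lets the replicas diverge.

```python
class Replica:
    """One copy of the data; `reachable` models whether the network can reach it."""

    def __init__(self):
        self.value = None
        self.reachable = True


def write(replicas, value, mode):
    """Apply a write under a hypothetical 'CP' or 'AP' policy; return True if accepted."""
    up = [r for r in replicas if r.reachable]
    if mode == "CP" and len(up) < len(replicas):
        # Consistency first: reject the write rather than let the copies diverge.
        return False
    # Availability first (or no partition): update every replica we can reach.
    for r in up:
        r.value = value
    return bool(up)


replicas = [Replica(), Replica()]
write(replicas, "v1", "CP")          # no partition yet, both replicas store "v1"

replicas[1].reachable = False        # simulate a network partition
cp_ok = write(replicas, "v2", "CP")  # rejected: the site is unavailable for writes
ap_ok = write(replicas, "v2", "AP")  # accepted: replica 1 is now out of date
```

After the last call the first replica holds "v2" and the second still holds "v1", which is exactly the inconsistency an ACID system is trying to prevent and an always-available website is willing to tolerate.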
The second argument against SQL is its distributability. SQL database systems were developed at a time when resource distribution was not a common need. That has changed entirely with the internet. Now, there are many websites that need to distribute their resources to achieve scalability, but SQL databases were not designed with built-in tools to assist with that. The result is that organizations have to look outside of the database itself, to products such as dbShards, to manage distribution.
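As a rough illustration of the work that lands outside the database, the sketch below routes each user to one of several database servers by hashing the user's id. This is a generic hash-based approach, not a description of how dbShards works, and the host names and the `shard_for` function are invented for the example.

```python
import hashlib

# Hypothetical database hosts; a real deployment would load these from configuration.
SHARDS = [
    "db0.example.internal",
    "db1.example.internal",
    "db2.example.internal",
]


def shard_for(user_id: str) -> str:
    """Pick the shard that stores a given user's rows.

    A stable hash is used so the same user always maps to the same shard,
    no matter which application server asks.
    """
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]


# The application has to consult this function before every query it issues.
host = shard_for("alice")
```

Note that a simple modulo scheme like this reshuffles most users when a shard is added or removed, which is one reason organizations reach for dedicated tooling instead of writing it themselves.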
Lastly, SQL databases lack agility. That is, the relational model has limitations and disadvantages. For example, when a field is added to a table, a value must be populated for every row, which uses additional space. Not only this, but to adhere to ACID principles, the table is often unavailable while the change is made. Also, adding or removing indices can take a large amount of time, and syncing the change across replicated databases even more so.
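The schema change described above can be tried out with the SQLite module that ships with Python. SQLite is used here only because it needs no setup; the `users` table and its columns are invented for the example, and how a particular engine stores the new column or locks the table during the change varies from product to product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("ann",), ("bob",)])

# Adding a field changes the schema for the whole table at once:
# every existing row now reports a value for the new column.
conn.execute("ALTER TABLE users ADD COLUMN country TEXT DEFAULT 'unknown'")

rows = conn.execute("SELECT id, name, country FROM users ORDER BY id").fetchall()
conn.close()
```

Both existing rows come back with `country` set to 'unknown', even though neither was written with that field in mind.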
Along came the noSQL movement in response. To deal with the Availability issue, noSQL adheres to BASE (Basically Available, Soft state, Eventual consistency) instead of ACID, which means it drops the strict Consistency requirement in favor of Eventual consistency. The result is that the database stays available at the cost of potentially serving stale data to users. To deal with distributability, noSQL databases, like the very popular MongoDB, come with tools for sharding and replication built in. As a result, noSQL databases are arguably easier to scale. Finally, because noSQL databases do not use the relational model, they can easily adjust to a changing schema.
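Eventual consistency and the stale reads that come with it can be sketched in a few lines. The `Store` class below is a toy model written for this article, not the API of MongoDB or any other product: writes are acknowledged as soon as the primary copy has them, and the replica catches up later.

```python
from collections import deque


class Store:
    """Toy primary/replica pair with asynchronous replication."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = deque()  # writes not yet copied to the replica

    def write(self, key, value):
        # Acknowledge immediately; replication happens later.
        self.primary[key] = value
        self.pending.append((key, value))

    def read_replica(self, key):
        # Reads are served from the replica, which may lag behind the primary.
        return self.replica.get(key)

    def replicate(self):
        # Drain the backlog so the replica converges on the primary's state.
        while self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value


store = Store()
store.write("status", "online")
stale = store.read_replica("status")  # replica has not seen the write yet
store.replicate()
fresh = store.read_replica("status")  # replica has caught up
```

The first read returns nothing because the replica is behind; after replication runs, the same read returns the new value. The system never refused a request, but for a moment it served out-of-date data.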
NewSQL (The Synthesis)
Any major database architecture can be scaled. It is not that SQL databases cannot be scaled; it’s that they haven’t needed to respond until now. MySQL has responded to the issues above and is producing solutions like MySQL Cluster. Though noSQL has answered many of the issues of SQL, the databases in the movement are very young, have not been proven reliable, and produce many issues of their own. SQL databases can be adjusted for eventual consistency, can be expanded to include distribution tools, and can be changed to allow for greater agility. Major websites like Facebook are still using MySQL, and if they can do it, anyone can.
In conclusion, the general debate over database architecture scalability comes down to this: any major database architecture can be scaled. That said, when it comes to more technical debates, one needs to look at whether a relational or non-relational model is the better fit. It may be that consistency is a bottleneck that can be eliminated by implementing the eventual consistency aspect of BASE. In the next part, we will look at some general guidelines and basic principles for the process of scaling databases.