Database Scalability, Part 1: Introduction

The dawn of the internet age knew little of database scalability; one size fit all. The boom in data driven by mega web applications like Facebook and Google, however, has transformed the way we look at databases today. With good marketing and the right people on board, even the most absurd ideas can become popular, and when it comes to scalability there is a lot of hype but little substantive discussion. This article gives a basic overview of the theory behind database scalability and, hopefully, a broader perspective on what is involved.

Scalability

Scalability is just that: the ability to scale. Julian Browne wrote the best article on defining scalability. He refers to a post by Werner Vogels, Amazon's CTO, who says,

A service is said to be scalable if, when we increase the resources in a system, it results in increased performance in a manner proportional to resources added.

Browne notes that “[a]ny system can scale given enough time and money.” The question he then raises is: what is the easiest route to scalability? There are three basic areas that can be improved to achieve it:

  • Hardware
  • Resource distribution
  • Database architecture

The first is the most obvious route and the simplest to implement, though beyond basic configurations like caching it takes capital to improve and eventually hits a ceiling. The second requires a combination of hardware and database architecture, and the skill involved in the third requires significant human capital.

Hardware

As mentioned, hardware improvements are the simplest to implement. Buying more memory, more hard drive space, faster processors, and so on can go a long way. Combined with database caching, or an external in-memory cache such as memcached or Redis, those resources can be put to even greater use. For databases whose usage is exploding, however, hardware alone must be complemented by the other areas.
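
To make the caching idea concrete, here is a minimal sketch of the cache-aside pattern using the redis-py client. It assumes a Redis server running locally on the default port, and query_database is a hypothetical stand-in for a real query:

    import json
    import redis

    # Assumes a Redis server on localhost:6379 (a default redis-server install).
    cache = redis.Redis(host="localhost", port=6379)

    def get_user(user_id):
        """Cache-aside lookup: check Redis first, fall back to the database."""
        key = f"user:{user_id}"
        cached = cache.get(key)
        if cached is not None:
            return json.loads(cached)           # cache hit: the database is never touched
        row = query_database(user_id)           # hypothetical database call
        cache.setex(key, 300, json.dumps(row))  # keep the result for 5 minutes
        return row

    def query_database(user_id):
        # Placeholder for a real query, e.g. SELECT id, name FROM users WHERE id = ?
        return {"id": user_id, "name": "example"}

The point is simply that repeated reads are served from memory instead of hitting the database every time, which stretches the same hardware further; the five-minute expiry is an arbitrary choice for the sketch.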

Resource Distribution

Resource distribution is perhaps the most effective way to scale databases, but it is also the most difficult. Not only does it require the knowledge and expertise to set up and maintain the hardware, but it also requires advanced techniques in database architecture to make use of it. It is here that Brewer’s CAP Theorem comes into play: when a distributed system is split by a network partition, it must trade off consistency against availability.

Database Architecture

Architecture is the most debated and misunderstood topic in the database community today. The NoSQL movement has enchanted everyone through youth, charm, and a whole lot of misinformed publicity. While there is much to be said for schemaless and non-relational database models, the SQL movement, thirty years and running, has its place. It is worth stating that SQL and NoSQL are really just two sides of the same coin. As MongoDB puts it, “[d]atabases are specializing – the ‘one size fits all’ approach no longer applies.”
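
To illustrate the two sides of that coin, here is a small sketch of the same data modeled relationally and as a schemaless document. The table, field, and collection names are purely illustrative:

    import json
    import sqlite3

    # Relational model: a fixed schema with relationships enforced by the database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, "
                 "user_id INTEGER REFERENCES users(id), title TEXT)")
    conn.execute("INSERT INTO users (id, name) VALUES (1, 'Ada')")
    conn.execute("INSERT INTO posts (user_id, title) VALUES (1, 'On Scalability')")

    # Document model: the same data denormalized into a single schemaless record,
    # roughly as it might be stored in a document store such as MongoDB.
    user_document = {
        "_id": 1,
        "name": "Ada",
        "posts": [{"title": "On Scalability"}],
    }
    print(json.dumps(user_document, indent=2))

Neither form is better in the abstract: the relational version keeps data normalized and makes ad hoc joins cheap, while the document version keeps related data together, which tends to be easier to partition across machines. That is the sense in which databases are specializing.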

By Joe Purcell