SQL Databases Don't Scale


Mon Jul 06 12:51:22 -0700 2009

A question I’m often asked about Heroku is: “How do you scale the SQL database?” There’s a lot of things I can say about using caching, sharding, and other techniques to take load off the database. But the actual answer is: we don’t. SQL databases are fundamentally non-scalable, and there is no magical pixie dust that we, or anyone, can sprinkle on them to suddenly make them scale.

What Is Scaling?

To qualify as true scaling, I believe a technique must fit the following criteria:

  1. Horizontal scale: more servers creates more capacity.
  2. Transparent to the application: the business logic of the app should be separated from concerns of scaling server resources.
  3. No single point of failure: there should be no one server which, if lost, causes downtime of the application.

As an example from the hardware world, a RAID5 disk array offers true scaling:

  1. Horizontal scale: you can run a RAID5 with 4 disks, or 12, or 20; more disks gives you more drive space and (generally) better performance.
  2. Transparent to the application: applications using the RAID as a single device. My text editor doesn’t care that the file it is saving or loading is split across many disks.
  3. No single point of failure: you can pop out a drive and the array will continue to function (albeit with a performance hit, known as “degraded mode”). Replace the drive and it will rebuild itself. All of this happens without the applications using the RAID being aware any interruption of the functioning of the disk storage.

So let’s take a look at some of the techniques used to “scale” SQL databases, and why they all fail to achieve the criteria above.

Vertical Scaling

One way to scale a SQL database is to buy a bigger box. This is usually called “vertical scaling.” I sometimes call it: Moore’s law scaling. What are the problems with this?

  • It’s a complicated transaction that usually requires manual labor from ops people and substantial downtime.
  • The old machine is useless afterward. This wastes resources and encourages you to overprovision, buying a big server before you know you’re really going to need it.
  • There’s a cap on how big you can go.

I know plenty of folks who have bumped their head on the last point (usually somewhere around a 256-core Sun server) and now they are painted into a corner. They find themselves crossing their fingers and hoping that they’ll stop growing - not cool at all.

So yes, you can put a SQL database on a bigger box to buy yourself more headroom. But this fails point #1 on my checklist for true scaling.

Partitioning, aka Sharding

Sharding divides your data along some kind of application-specific boundary. For example, you might store users whose names start with A-M on one database, and N-Z on another. Or use a modulo of the user id by the number of databases.

This requires deep integration into the application and careful planning of the partitioning scheme relative to the database schema and the kinds of queries you want to do. Summary: big pain in the ass.

So while sharding is a form of horizontal scaling, it fails point #2: it is not transparent to the business logic of the application.

The deeper problem with sharding is that SQL databases are relational databases, and most of the value in a relational database is that it stores relationships. Once you split records across multiple servers, you’re servering many of those relations; they now have to be reconstructed on the client side. Sharding kills most of the value of a relational database.

Read Slaves

MySQL’s killer feature is easy configuration of master-slave replication, where you have a read-only slave database that replicates everything coming to the master database in realtime. You can then create a routing proxy between the clients and your database (or build smart routing into the client library) which sends any reads (SELECT) to one of the read slaves, while only sending writes (INSERT, UPDATE, DELETE) to the master.

Postgres has replication via Slony, though it’s much more cumbersome to set up than MySQL’s replication. Years ago I even did basic master-slave replication with a 50-line Perl script that read from the query log and mirrored all write queries over to the slave. So replication is possible just about anywhere, with differing degrees of setup and maintenance headaches.

The read slave technique is the best option for scaling SQL databases. It qualifies as horizontal scaling on the read side, and is transparent to the application. This is the technique that many of the largest MySQL installs use.

And yet, this is still ultimately a limited technique. The master server is a bottleneck, particularly on write-heavy applications. And it fails point #3 on the true scaling definition: there is a single point of failure. This is a problem not only when the database fails, but when you want to perform maintenance on the server. Promoting one of the read slaves to master allows you to recover relatively quickly, but this switcharoo requires hands-on attention from the sysadmins.

And while it qualifies as horizontal scaling for read performance, it does not qualify for writes and capacity. Horizontal scaling should spread out both the load of executing queries and the data storage itself. This is how you can grow the total storage space of a RAID: add enough disks, and you can get a RAID which has a much greater capacity than any single hard drive on the market. Read slaves require complete copies of the database, so you still have a ceiling on how much data you can store.

A Long-Term Solution

So where do we go from here? Some might reply “keep trying to make SQL databases scale.” I disagree with that notion. When hundreds of companies and thousands of the brightest programmers and sysadmins have been trying to solve a problem for twenty years and still haven’t managed to come up with an obvious solution that everyone adopts, that says to me the problem is unsolvable. Like Kirk facing the Kobayashi Maru, we can only solve this problem by redefining the question.