Taps for Easy Database Transfers

taps databases opensource

Wed Feb 11 16:19:28 -0800 2009

Migrating databases from one server to another is a pain: mysqldump on old server -> gzip -> scp big dump file -> gunzip -> mysql. It takes a long time, and is very manual and (and thus error-prone), and generally has the stink of “lame” hanging about it.

Ricardo Chimal, Blake Mizerany and I cooked up our attempt at a solution to this problem: Taps.

Taps is a temporary web service you run on a server that has access to the database you want to export. You can then run the client to connect to that service and pull data out of it in chunks. It works through firewalls, doesn’t require a direct ssh connection, and - best of all - it’s database independent. So you can export from a MySQL database and import to PostgreSQL, or vice versa.

For example, let’s say you want to export a MySQL database running on your Slicehost slice to your freshly provisioned extra-large EC2 instance.

First, install the taps gem on both machines:

$ gem install taps

On the slicehost machine, run the server:

$ taps server mysql://root@localhost/mydb tmpuser tmppass
== Sinatra/0.9.0.4 has taken the stage on 5000 for production with backup from Thin

(You’ll need to either open port 5000 in your firewall, or use an Apache or Nginx to create a vhost for the temporary taps server.)

Now, on the ec2 instance, assuming you’re using the raw port:

$ mysqladmin create mydb
$ taps pull mysql://root@localhost/mydb http://tmpuser:tmppass@slicehost-box:5000 
Receiving schema from remote taps server
Receiving data from remote taps server
4 tables, 1,064 records
widgets:   20% |====              | Time: 00:00:00

After it finishes, you can Ctrl-C the taps server on the original machine.

The web service uses Sinatra and exports schema and data as marshalled arrays. It’s not as fast as a raw dump, but it’s not bad. Ricardo benchmarked a transfer of a 500MB Postgres database with about 250k records between two small EC2 instances at about 10 minutes.

Here’s an example working locally of exporting a SQLite database (in the file my.db) to Postgres:

$ taps server sqlite://my.db
== Sinatra/0.9.0.4 has taken the stage on 5000 for production with backup from Thin
$ createdb mydb
CREATE DATABASE
$ taps pull postgres://postgres@localhost/mydb http://localhost:5000
Receiving schema from remote taps server
Receiving data from remote taps server
1 tables, 2 records
widgets:       100% |====================| Time: 00:00:00

This is a challenging problem and Taps is still a very young solution. But I think it points the way to an automated, elegant approach to a relatively common problem. If you’ve had this itch yourself, grab a copy and help us test. Send bug reports and pull requests to Ricardo.