Amazon’s SimpleDB is brining non-relational databases (such as CouchDB and RDDB) into the spotlight. I’ve been eagerly looking forward to the paradigm in databases for some time now, having never been convinced that object databases were the next step. So this is exciting stuff.
One of the big questions I have when moving away from a SQL database is “what about joins?” Because ultimately SQL is about relations, and the JOIN clause is the heart of the power of a SQL database.
Assaf Arkin’s fascinating post on dumb databases argues that JOIN is actually a hack to fix an underlying design problem. In order to understand a world without them, you need to think in terms of weak and strong relations:
In modeling the entities, I realized that orders and products are distinct with weak ties between them. [...] But in modeling the order entity, the decision I made was to store line items inside the order itself. I realized I have no compelling use case to keep those separate. When I add or remove a line item, I’m changing the order, I expect the order to have a new version and updated timestamp. When I delete the order, I assume all the line items will go away. And when I query the order, I intend to find all the line items there, without resorting to Cartesian join and result-set gymnastics.
In a SQL database, there’s no explicit indication of the strength of a relationship. Product has_many :orders, but rarely or ever do you want to fetch every order associated with a product. On the other hand, Order has_many :items, and yes - any time you fetch or update an order you also want to fetch and update the items that go with it. The former is a weak link, the latter is a strong one. (You could probably an app’s weak and strong links by looking at which tables are typically :include’d from that model’s find.)
Other interesting stuff in Assaf’s post:
Maintaining indexes is rightfully business logic that should be handled by the application - since the application knows how it wants to query the data later. This makes more sense when you realize that CouchDB calls these views instead of indexes. Also note that computed values (that is, values that are calculated and then cached in a column of the table when the record is updated) become one and the same with indexes.
Schemas are useless constraints, putting business logic into the database, and requiring migrations and other headaches to maintain. (This might be a good time to review DataMapper, since that pattern fits well with a schemaless database.)
The old example of using a database transaction to wrap the transfer of money from one bank account to another is total bull. The correct solution is to store a list of ledger events (transfers between accounts) and show the current balance as a sum of the ledger. If you’re programming in a functional language (or thinking that way), this is obvious.
Further - and this is my own conclusion here - the result of these flat document databases is that web protocols and database protocols may be converging. CouchDB already uses JSON and REST. One implication may be that ActiveRecord and ActiveResource may become the same thing in the future.
So, like - woah, dude. This is some heady stuff, yet it rings true to me. SQL, as well as it has served us for so many years, may be on its way out. And not a moment too soon.