Don't Build the Super Nifty Node System

methodology

Thu May 08 15:26:00 -0700 2008

Programmers tend to overdesign. Rather than building a quick solution that works right now for the specific case, we want to build one that will solve all problems of that type, both now and for all time. Dreaming In Code shows a particularly egregious case of this: the developers spend years building a framework for the app, rather than the app itself. This type of story is quite common. So frequent is this pitfall that the agile methodology mantra for avoiding it is often referred to by its abbreviation: YAGNI (You Aren’t Gonna Need It).

On the opposite extreme, there’s the quick-and-dirty hack. But this has its own problems: it’s fragile and inflexible. (It’s telling that the operating system Microsoft bought and turned into MS-DOS was originally named QDOS, where the QD stood for Quick-and-Dirty.) Quick hacks don’t lend themselves to being built upon. Contrived example: a method called sum_two_and_two that returns the constant four would not be nearly as useful as a method called sum that takes two integer arguments and returns their sum. Generalized solutions are important - hell, they’re what software is about.
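The contrived example is small enough to write out. A minimal sketch in Ruby, using the two method names from the paragraph above:

```ruby
# Quick hack: answers exactly one question and can't be built upon.
def sum_two_and_two
  4
end

# Generalized just enough: works for any two integers.
def sum(a, b)
  a + b
end

puts sum_two_and_two  # prints 4
puts sum(2, 2)        # prints 4
puts sum(40, 2)       # prints 42
```

The second method costs almost nothing extra to write, which is what makes this a case where generalizing is clearly worth it.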

So where is the right balance? My partner Orion puts it well: “You want to build an architecture that will be somewhat flexible, but not infinitely so.”

Here’s an anti-example: at our first venture together, we needed some customer relationship management software which the whole company could access. (This was eight years ago, and there weren’t any suitable web-based commercial or open source choices that we knew of.) We ended up writing something called the Super Nifty Node System. It didn’t track companies, people, and leads, like you might expect. Instead it had a single object type - a Node - and the users could define node types and relationships between them. The idea was that we were building a system which was so flexible that we wouldn’t need to do any programming when someone wanted to track something new. We thought we were solving both the original problem and many related problems, and would never need to touch the software again.

In practice, this worked out poorly. Our non-technical users didn’t understand how to create new node types, and didn’t really want to anyway. Things like getting the fields to go in a certain order were really difficult, leading to a lot of user frustration when entering addresses. Worse, simple programmatic tasks we wanted to perform - like pulling out a list of email addresses for all our partner companies, or a list of all our customers who had been with us for a year or more - required SQL statements with fifteen complex joins that wrapped around the screen six times. In retrospect, we should have just made tables for companies, people, and leads - even though there would have been a fair bit of duplicated code, this being in the pre-Rails (and generally pre-framework) era.
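To make the contrast concrete, here is a rough sketch in plain Ruby (no database) of the "partner company emails" query under both designs. Everything in it is hypothetical: the Node, Company, and Person structs, the field names, and the sample data are invented for illustration and are not the original system's schema.

```ruby
# Hypothetical sketch: all names and data below are illustrative assumptions.

# Generic design: a single Node type with free-form fields and links.
Node = Struct.new(:type, :fields, :links)

acme = Node.new("company", { "name" => "Acme", "kind" => "partner" }, [])
bob  = Node.new("person", { "name" => "Bob", "email" => "bob@acme.example" }, [])
acme.links << ["employs", bob]
nodes = [acme, bob]

# Every query has to rediscover the schema from strings.
generic_emails = nodes
  .select { |node| node.type == "company" && node.fields["kind"] == "partner" }
  .flat_map do |company|
    company.links
           .select { |relation, _target| relation == "employs" }
           .map { |_relation, person| person.fields["email"] }
  end

# Specific design: the schema is spelled out in the types themselves.
Company = Struct.new(:name, :kind, :people)
Person  = Struct.new(:name, :email)

companies = [
  Company.new("Acme", "partner", [Person.new("Bob", "bob@acme.example")])
]

concrete_emails = companies
  .select { |company| company.kind == "partner" }
  .flat_map { |company| company.people.map(&:email) }

puts generic_emails.inspect
puts concrete_emails.inspect
```

Both queries return the same list, but the generic one encodes type names, field names, and relationship names as strings inside the query. In SQL, each of those lookups tends to become another join.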

Many people have asked us why we didn’t build Heroku to support multiple frameworks and languages. Why not Heroku for Python/Django, my second favorite language/framework combo after Ruby/Rails? Why not for PHP and some of the excellent MVC frameworks that exist for it? Why limit ourselves, and our potential audience, by building to the more specific case?

Were I building Heroku earlier in my career, I might have designed it with this in mind. I might have created an abstract base class App, and from that inherited RailsApp. Or perhaps App has_one :framework and has_one :language, and then Framework and Language are abstract classes that can be inherited by Framework::Rails, Framework::Django, Language::Ruby, and Language::Python. The Django and Python classes would have sat empty for months or years as we put our energy into developing for Ruby and Rails.
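For illustration, the speculative design might have looked something like the sketch below. This is a guess at the shape of the hierarchy, not Heroku's actual code: the supported? and deployable? methods and the RailsOnlyApp class are hypothetical names added to make the example runnable, and plain attributes stand in for the has_one associations.

```ruby
# Hypothetical sketch of the over-generalized design described above.
class Framework
  def supported?
    false # nothing is supported until a subclass says so
  end
end

class Framework::Rails < Framework
  def supported?
    true
  end
end

# Sits empty "for months or years", but still has to be read and maintained.
class Framework::Django < Framework; end

class Language
  def supported?
    false
  end
end

class Language::Ruby < Language
  def supported?
    true
  end
end

class Language::Python < Language; end

class App
  attr_reader :name, :framework, :language

  def initialize(name, framework:, language:)
    @name = name
    @framework = framework
    @language = language
  end

  def deployable?
    framework.supported? && language.supported?
  end
end

# The specific design: one class, no empty placeholders.
class RailsOnlyApp
  attr_reader :name

  def initialize(name)
    @name = name
  end

  def deployable?
    true
  end
end

blog = App.new("blog", framework: Framework::Rails.new, language: Language::Ruby.new)
puts blog.deployable?                      # prints true
puts RailsOnlyApp.new("blog").deployable?  # prints true
```

The first version needs six classes to do what the second does with one, and two of those six exist only to return false.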

And there’s a cost to leaving that infrastructure lying around. Just because you’re not actively developing on a particular part of the codebase doesn’t mean it’s maintenance-free: you’ve got more abstractions to keep in your head, more training time for new members on the team, more specs to keep running. Even just the extra files hanging around in the code tree add a tiny bit of overhead for your brain each time you do a directory listing or otherwise manage the code.

So don’t start generalizing until you have a strong need for it. Generalizing too early is the death of many a project - almost as often as generalizing too late.