Read-Only Source Trees

methodology cloud

Wed Jul 02 15:06:00 -0700 2008

Cloud computing is on everyone’s minds, because it offers the promise of infinite horizontal scalability. But to achieve this, we have to change how we build applications.

One such change is how we use the filesystem. The filesystem is Unix’s database. “Everything is a file” has served us well for decades, and that concept will continue to be critical at the systems layer. But at the application layer, it’s time to stop treating the filesystem as a catch-all dumping ground, and start treating the data we store there in a more structured way.

An app’s main use of the filesystem is sourcefiles. What qualifies as a sourcefile? Your code, sure - Ruby, ERB, HTML, JavaScript, CSS, specs/tests, rake tasks. But also small static assets that are part of the application’s interface, like public/images/top_left_gradient.png and public/robots.txt. If you check it into revision control, then it is probably a sourcefile.

Other than sourcefiles, what do we stick on the filesystem? PIDs and logfiles come to mind. Anything that is in tmp or log. This stuff is not source, which is probably why it’s in your .gitignore. In my opinion it should not be in your application’s directory structure at all.

How about user-uploaded assets, like profile pictures? attachment_fu offers a filesystem backend, which shoves files into your public/ dir. But these files are not source - they are application data. They have more in common with the contents of the database: data specific to a particular installation of the app. Putting this data into your source tree is confusing.

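More significantly, it greatly complicates the problem of scaling. Once the app runs on more than one machine, a file written into one machine’s copy of the tree is invisible to all the others.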

The correct solution, in my opinion, is to forbid write access to the source tree by the web app. Temporary files can be offered through Ruby’s Tempfile interface, with the understanding that files thus created are not accessible beyond the lifetime of the request being served.
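As a rough illustration of request-scoped temporary files, here is a minimal sketch using Ruby's standard Tempfile library. The helper name with_scratch_file and the "request-scratch" prefix are made up for this example; the only assumption is that the scratch data is needed for no longer than the block runs.

```ruby
require "tempfile"

# Hypothetical helper (not from any framework): hands the block the path of a
# scratch file that lives in the system temp directory, outside the source
# tree. Tempfile.create removes the file as soon as the block returns.
def with_scratch_file(data)
  Tempfile.create("request-scratch") do |file|
    file.binmode
    file.write(data)
    file.flush            # push the bytes out before anyone opens the path
    yield file.path
  end
end

# Usage: do some work on the file while the request is being served.
byte_count = with_scratch_file("uploaded bytes") { |path| File.size(path) }
```

Nothing here touches the application's directory, and nothing survives the request, which is the point.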

Logs are a whole other challenge. I’m not a big fan of logfiles; there are better solutions to the logging problem, which I’ll write about some other time. In the meantime, logs should go outside the code tree, in some sort of /var-style location which can be cycled or thrown away as needed. This location could be write-only for the app; it pushes things in, but it can’t read them back or otherwise access them once written. A one-way channel, à la syslog.
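One way to picture the one-way channel is a thin wrapper that exposes only write-side methods. This is a sketch, not a recommendation of any particular logging setup: the class name OneWayLog is hypothetical, and the StringIO sink is just a stand-in for whatever the real destination would be (a syslog socket, a file under a /var-style directory).

```ruby
require "logger"
require "stringio"

# Hypothetical wrapper: the app can push messages in, but the object offers
# no method for reading them back.
class OneWayLog
  def initialize(sink)
    @logger = Logger.new(sink)
    # Keep the format trivial for the example: "SEVERITY: message".
    @logger.formatter = proc { |severity, _time, _progname, msg| "#{severity}: #{msg}\n" }
  end

  def info(message)
    @logger.info(message)
    nil
  end

  def error(message)
    @logger.error(message)
    nil
  end
end

sink = StringIO.new          # stand-in for a destination outside the code tree
log = OneWayLog.new(sink)
log.info("request served")
```

In a real deployment the write-only guarantee would come from the destination itself (permissions, or a socket), not from the wrapper; the wrapper only keeps application code from growing a habit of reading its own logs.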

As for attachments, asset stores are the correct solution. attachment_fu’s :storage => :s3 backend, for example. Storing in the database is reasonable, though I’ve always found a lot of frustration in trying to store large binary data in the database.
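To make the shape of an asset store concrete, here is a small sketch with an in-memory backend. Everything in it is hypothetical (the MemoryAssetStore class, its put and get methods, the content-hash keys); it is not attachment_fu's API or S3's. It only shows the division of labor: the store holds the bytes, and the app keeps nothing but a key, which fits comfortably in a database column.

```ruby
require "digest"

# Hypothetical in-memory stand-in for a remote asset store. A real backend
# would make a network call in put and get instead of touching a Hash.
class MemoryAssetStore
  def initialize
    @blobs = {}
  end

  # Stores the bytes and returns a key derived from their content.
  def put(data)
    key = Digest::SHA256.hexdigest(data)
    @blobs[key] = data.dup
    key
  end

  # Returns the bytes for a key, raising KeyError if the key is unknown.
  def get(key)
    @blobs.fetch(key)
  end
end

store = MemoryAssetStore.new
avatar_key = store.put("fake image bytes")   # the app saves only this key
```

Because the app never learns where the bytes physically live, any number of app servers can share the same store without sharing a filesystem.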

As we continue to explore the next generation of application deployment, I think we’re going to bump into a number of ways to structure apps differently in order to make them scalable. There will be some transitional pain with these changes, because structure implies restrictions. Many PHP developers coming to Rails have complained about not being able to access sessions from models, or write SQL in their views. MVC creates restrictions, yes, but those very restrictions are what provide the structure. Coming from an unstructured environment, those restrictions may seem cumbersome or arbitrary; but once you’re in the habit, you come to appreciate the structure they create.