Building a Queue-Backed Feed Reader, Part 1

Tue Apr 14 13:06:09 -0700 2009

Queueing is a critical tool for building truly scalable web apps. Don’t do any heavy lifting in the web processes (mongrels); instead, put jobs on a queue, let background workers do the work, and then display the results to the user in another request.

This concept is often met with a big “huh?” from Rubyists, so here I’m going to give a tangible example, including a small but completely functional Rails app, using Delayed Job (aka DJ) for the queueing solution. Follow along by cloning the example app, QFeedReader, on your local system:

git clone git://github.com/adamwiggins/qfeedreader.git
cd qfeedreader

Step 1: Non-Queueing Feed Reader

Our first pass on QFeedReader will be without queueing - the way your apps are probably written today. It works like this: a user submits the URL of a feed, the app creates a Feed model, and the after_create callback causes the feed to be fetched immediately. Then the user is redirected back to the front page, where they can see the results. After submitting a few feed URLs, clicking “refresh all” will visit all the feeds, looking for new posts, and updating the database appropriately.
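
The flow described above can be sketched in plain Ruby, without Rails. This is a rough illustration rather than the app's actual code: SyncFeed, its create method, and the fetcher lambda (a stand-in for the real HTTP call) are all hypothetical names, and the sleep only simulates network latency.

```ruby
# Plain-Ruby sketch of the step 1 flow (not the actual QFeedReader code).
# SyncFeed and the fetcher lambda are hypothetical stand-ins.
class SyncFeed
  attr_reader :url, :posts

  def initialize(url, fetcher)
    @url = url
    @fetcher = fetcher   # stands in for the real HTTP fetch
    @posts = []
  end

  # Mimics a model whose after_create callback fetches right away.
  def self.create(url, fetcher)
    feed = new(url, fetcher)
    feed.fetch           # the caller blocks here until the fetch finishes
    feed
  end

  def fetch
    @posts = @fetcher.call(@url)
  end
end

# A slow fetcher: sleeps to simulate network latency.
slow_fetcher = lambda do |url|
  sleep 0.2
  ["first post from #{url}", "second post from #{url}"]
end

started = Time.now
feed = SyncFeed.create("http://example.com/feed.xml", slow_fetcher)
elapsed = Time.now - started

puts "fetched #{feed.posts.size} posts, request blocked for #{elapsed.round(2)}s"
```

The point to notice is that create cannot return until the fetch has finished, so whatever time the network takes is time the web process spends unavailable.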

Take a look at the code for this step by switching your local checkout to the step1 branch:

git checkout -b step1 origin/step1

Fire it up with rake db:migrate and script/server.

When submitting a new feed or refreshing an existing single feed, requests generally take around half a second - the upper limit of what’s acceptable for a web request. But if you submit a URL with a bad host (for example: http://1.2.3.4/notafeed), an entire web process is stalled for 30 seconds or more as Net::HTTP blocks trying to open the connection. Running locally with a single process, this means that you can’t access the site at all during this time! (You can demonstrate this by opening another tab pointing at the local app.) Obviously, this doesn’t scale.

Another place where you can see the need for doing the feed-fetching in the background is the “refresh all” link. Once you’ve added a handful of valid feeds, the refresh-all link can take five seconds or more to run. The more feeds you add, the longer the refresh will take. The user gets no feedback in their browser during this time - for all they know, your app is offline.

What’s the solution? Move the heavy lifting - for this app, fetching feeds - into a background worker, which will be fed jobs from a queue.

Step 2: Queueing with DJ

There are many Rails queueing systems available: DJ, BJ, Workling, Beanstalk, and many more. For this example we’ll use DJ (short for Delayed Job), written by Tobias Lütke of Shopify.

Delayed Job is probably the most popular Rails solution for queueing, and it’s simple to set up since it doesn’t depend on a separate message bus (like XMPP or AMQP), but rather queues jobs through a table in the database. Note, though, that the app we’re building here could be ported to any of the above-mentioned queueing systems without much effort. I hope what you get out of reading this is an understanding of the concepts behind queueing and why it is valuable, rather than details of a particular queueing system’s implementation.

Take a look at the queueing version of QFeedReader on your local checkout:

git checkout -b step2 origin/step2

You’ll need to restart script/server to load the DJ plugin, and run db:migrate to add the delayed_jobs table (which is where DJ queues jobs).

There’s only one change to the business logic of the app: Feed#fetch will now put a job on the queue; the background work will be done by Feed#perform. Here's the diff.

Now, when feed.fetch is called (either on after_create, or from the refresh action in the feeds controller), the method will use DJ to enqueue a job (creating an entry in the delayed_jobs table) and return immediately. The user will never experience a delay on any pageview. QFeedReader now scales!
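
To make the enqueue-then-return idea concrete, here is a minimal in-memory sketch in plain Ruby. It is not DJ's implementation and not the app's diff: JOBS (an array standing in for the delayed_jobs table), QueuedFeed, and the fetcher lambda are assumptions made for illustration. Only the split between fetch and perform mirrors the description above.

```ruby
# In-memory sketch of "fetch enqueues, perform does the work".
# JOBS stands in for the delayed_jobs table; all names are hypothetical.
JOBS = []

class QueuedFeed
  attr_reader :url, :posts

  def initialize(url, fetcher)
    @url = url
    @fetcher = fetcher
    @posts = []
  end

  # Called from the web request: record a job and return immediately.
  def fetch
    JOBS << self
    nil
  end

  # Called later by a worker: this is where the slow work happens.
  def perform
    @posts = @fetcher.call(@url)
  end
end

# A slow fetcher: sleeps to simulate network latency.
slow_fetcher = lambda do |url|
  sleep 0.2
  ["post from #{url}"]
end

feed = QueuedFeed.new("http://example.com/feed.xml", slow_fetcher)

started = Time.now
feed.fetch
enqueue_time = Time.now - started
posts_before = feed.posts.dup   # still empty: nothing has been fetched yet

# Later, in a separate worker process, the queued job is performed.
JOBS.shift.perform
posts_after = feed.posts

puts "enqueue took #{(enqueue_time * 1000).round(1)}ms"
puts "posts before worker: #{posts_before.size}, after: #{posts_after.size}"
```

In this sketch the slow sleep happens only inside perform, so the code path a web request would take (fetch) stays fast no matter how slow the remote host is.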

How is the queued job executed in the background? For this, you need to run a separate worker process. A worker process is very similar to a web process (i.e., script/server), but instead of serving web requests coming in on a port, it performs jobs coming in on a queue. Start the DJ worker process like this:

$ rake jobs:work
*** Starting job worker host:crescent pid:15036
1 jobs processed at 0.8934 j/s, 0 failed ...

This output shows that I already had a job in the queue when I ran jobs:work. The worker processes that job right away, and then sits and waits for more jobs to come in. You’ll need to leave this process open in a terminal (again, reminiscent of script/server) while you use the app locally in order for your jobs to be processed.
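
Conceptually, a worker is a loop that pulls jobs off the queue and calls perform on each one, keeping count of successes and failures much like the summary line in the output above. The following is a simplified sketch, not Delayed Job's real worker: the array-backed queue, the drain_queue helper, and the two job classes are hypothetical.

```ruby
# Simplified worker loop (a sketch, not Delayed Job's real worker).
# The queue is a plain array; drain_queue and the job classes are hypothetical.
class GoodJob
  def perform
    :ok
  end
end

class BadJob
  def perform
    raise "could not connect"
  end
end

# Runs every job currently queued; returns counts of successes and failures.
def drain_queue(queue)
  processed = 0
  failed = 0
  until queue.empty?
    job = queue.shift
    begin
      job.perform
      processed += 1
    rescue StandardError => e
      # A failing job must not take the worker down with it.
      failed += 1
      warn "job failed: #{e.message}"
    end
  end
  [processed, failed]
end

queue = [GoodJob.new, BadJob.new, GoodJob.new]
processed, failed = drain_queue(queue)
puts "#{processed} jobs processed, #{failed} failed"

# A long-running worker would wrap this in a polling loop, for example:
#   loop { drain_queue(queue); sleep 5 }
```

Rescuing errors per job is the design choice worth noting here: one unreachable feed host should count as a failed job, not stop the jobs queued behind it.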

This is a key concept, so let me hammer on it for a moment. Your app now runs two different types of processes: the web server (Mongrel or Thin), and background workers (for DJ, rake jobs:work).

The purpose of the web server process is to serve requests, built from information in the database or a cache, returning as quickly as possible to the user.
The purpose of background workers is to handle computation, network access, image resizing, or any work that could possibly take more than a few hundred milliseconds. This work is handled asynchronously and the results are put into a location (usually the database) where future web requests can read them, providing updated information to users.

To Be Continued...

Queueing is a key piece of any scalable web app. This example shows that even a small app can benefit from using queues, and Delayed::Job makes it easy to do in Rails.

In Part 2, coming soon, we’ll look at how we can improve the user experience with Ajax and polling.
