Battling Wedged Mongrels with a Request Timeout

rails unix

Tue Jun 17 00:06:00 -0700 2008

The dreaded “wedged Mongrel” - your app server stuck on one request, with others piling up behind it, waiting indefinitely for it to come free - is a problem all production Rails apps face sooner or later. The most common fix is to restart the app servers frequently, via something like Monit, or just with a cron job.

But such solutions are just band-aids that hide the real problem: your code is getting stuck in an infinite loop, or waiting on an IO request which never returns. A better solution is to wrap all your actions in a timeout:

require 'timeout'

class ApplicationController < ActionController::Base
  around_filter :timeout

  # Abort any action that runs for longer than 30 seconds.
  def timeout
    Timeout.timeout(30) do
      yield
    end
  end
end

This prevents the wedged app server. And combined with an exception notifier, you’ll be able to see which requests are getting wedged, so that you can fix your code. (Periodic app server restarts are still needed to combat memory leakage - another problem entirely.)
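If you don't have an exception notifier wired up yet, even a log line is enough to see which requests are hitting the limit. Here's a minimal sketch in plain Ruby. The with_logged_timeout helper and its label argument are names made up for illustration; they aren't part of Rails or of Timeout:

```ruby
require 'timeout'
require 'logger'

# Hypothetical helper: run the block under a timeout and, if it expires,
# log which request was responsible before re-raising the error.
def with_logged_timeout(label, seconds, logger)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  logger.error("timed out after #{seconds}s: #{label}")
  raise
end
```

In a controller, the label would presumably be something like the request method and path, so the log tells you exactly which action to go fix.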

I’m surprised that request timeouts aren’t a standard part of web frameworks like Rails, application servers like Mongrel, or both. (If you’ve seen the “timeout” parameter for Thin or Mongrel, don’t be fooled - it’s not that kind of timeout.) After all, web requests aren’t supposed to be long-lasting. Nginx or Apache will time out the request after 90 seconds or so anyway, but this doesn’t stop your app server from grinding away indefinitely on the request.

But there’s a catch with Timeout. It is implemented with Ruby’s green threads, so it only works as long as it’s Ruby code that’s getting stuck or taking too long. The other failure mode mentioned earlier - an IO request or system call that never returns - is often the real problem, and Timeout can’t interrupt it. So this will time out:

Timeout.timeout(3) do
  sleep 4
  puts 'done'
end

…but this will not:

Timeout.timeout(3) do
  system 'sleep 4'
  puts 'done'
end

Good unix jockeys know that SIGALRM is the correct solution here. Back in my MUD days I encountered this technique in the CircleMUD server: it would detect infinite loops and abort with a log message, allowing the game to continue running. “Wow,” I said the first time I saw it in action. “How does it know?” That’s the magic of SIGALRM.
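To make the mechanism concrete, here's a rough sketch of the idea in plain Ruby: install a handler for SIGALRM that raises an exception, then arrange for the signal to arrive after the deadline. As far as I know Ruby's core library doesn't expose alarm(2), so this version forks a small helper process to deliver the signal instead. AlarmTimeout and alarm_timeout are hypothetical names, this is Unix only, and it is an illustration rather than how any particular library does it:

```ruby
# Hypothetical error class raised when the alarm goes off.
class AlarmTimeout < StandardError; end

# Run the block, raising AlarmTimeout if SIGALRM arrives first.
def alarm_timeout(seconds)
  parent_pid = Process.pid
  # The handler runs in the main thread and interrupts whatever it is doing.
  old_handler = Signal.trap('ALRM') do
    raise AlarmTimeout, "timed out after #{seconds}s"
  end
  # Helper process: wait out the deadline, then signal the parent.
  helper_pid = fork do
    Signal.trap('ALRM', 'DEFAULT')
    sleep seconds
    Process.kill('ALRM', parent_pid)
    exit!(0) # skip any at_exit hooks inherited from the parent
  end
  yield
ensure
  if helper_pid
    begin
      Process.kill('KILL', helper_pid) # cancel the pending alarm
    rescue Errno::ESRCH
      # helper is already gone
    end
    Process.wait(helper_pid)
  end
  Signal.trap('ALRM', old_handler) if old_handler
end
```

Because the signal comes from the operating system rather than from a Ruby thread, it doesn't depend on the interpreter getting a chance to switch threads. Note that if the block was waiting on a child process, that child is not killed by this sketch and keeps running.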

Philippe Hanrigou and David Vollbracht have implemented a SIGALRM solution for Ruby in the form of SystemTimer. (They also give a great description of green threads and why they don’t play well with the underlying OS.) This is a nearly drop-in replacement for Timeout. Try it:

SystemTimer.timeout(3) do
  system 'sleep 4'
  puts 'done'
end

Woot! So now, your final solution for preventing wedged app servers in production:

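require 'system_timer'

class ApplicationController < ActionController::Base
  around_filter :timeout

  # Abort any action that runs for longer than 30 seconds,
  # even if it is blocked in a system call.
  def timeout
    SystemTimer.timeout(30) do
      yield
    end
  end
end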
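
SystemTimer is a C extension built around Unix signals, so it may not be installed on every machine that runs your app. One option, sketched here under that assumption, is to fall back to the standard Timeout when the gem is missing. RequestTimer and with_request_timeout are made-up names; the only library calls assumed are SystemTimer.timeout and Timeout.timeout, both used above:

```ruby
# Prefer SystemTimer when the gem is available, otherwise use Timeout.
begin
  require 'system_timer'
  RequestTimer = SystemTimer
rescue LoadError
  require 'timeout'
  RequestTimer = Timeout
end

# Hypothetical wrapper so callers don't care which implementation is loaded.
def with_request_timeout(seconds = 30, &block)
  RequestTimer.timeout(seconds, &block)
end
```

The around filter can then call with_request_timeout { yield } instead of naming either library directly. Keep in mind that the fallback has the limitation described earlier, so it's best treated as a convenience for development rather than a substitute in production.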