Rethinking Cron

Tue Apr 13 15:42:56 -0700 2010

Cron is a trusty tool in the unix toolbox for scheduling work to run at periodic intervals. In addition to system tasks, it’s common for app developers to use an app-specific crontab to run application tasks. For example, if your app is a feed reader, you might use a cronjob to fetch new feeds every three hours, and another cronjob to clean out old unread articles every night.

Cron Weaknesses

While application crontabs have served us well enough, this technique has a number of weaknesses.

One problem is that cron is per-machine, so once you scale to multiple app servers you’ll need locks stored in a shared location (database or memcache) to avoid scheduling the same job twice. Those locks need maintenance of their own: stale locks left behind by cronjobs that exited abnormally or got stuck in an infinite loop have to be cleaned up. What was a one-line cronjob can quickly balloon into a whole mess of pidfiles, locks, and cleanup code.
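To give a feel for how that ballooning happens, here is a rough sketch of the kind of lock helper an app tends to grow. Everything in it is hypothetical: the JobLock class, the ten-minute staleness window, and the plain Hash standing in for the shared database or memcache store.

```ruby
# Hypothetical lock helper. A plain Hash stands in for the shared store
# (database or memcache) so the sketch runs on its own. A real store would
# also need an atomic check-and-set, which this sketch does not attempt.
class JobLock
  STALE_AFTER = 600 # seconds before a lock counts as abandoned (assumed value)

  def initialize(store)
    @store = store
  end

  # Runs the block only if nobody else holds a fresh lock for this job.
  # Returns true if the block ran, false if the job was skipped.
  def with_lock(job_name, now = Time.now)
    held_at = @store[job_name]
    return false if held_at && now - held_at < STALE_AFTER

    @store[job_name] = now # take the lock, or take over a stale one
    begin
      yield
    ensure
      @store.delete(job_name) # release the lock even if the job raised
    end
    true
  end
end
```

Every cronjob on every server would then have to wrap its work in something like JobLock#with_lock, and that is before dealing with pidfiles or jobs that hang without ever releasing their lock.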

Cron problems are difficult to debug. The arcane syntax of crontab is terse to the point of near inscrutability, making it easy to accidentally schedule jobs at the wrong time. And the subtle differences between a cronjob’s shell environment and your command prompt’s shell environment can be maddening. Lack of feedback makes these, or any other problems with your cronjobs, difficult to diagnose.

Lastly, cronjobs have a tendency to turn into a kind of poor-man’s background job solution. Check the crontab for any reasonably complex application and there’s a good chance you’ll see a one-minute or five-minute cronjob which looks in the database for work to be done. This can almost always be done better with a job queue and a pool of workers. Cron is for scheduling things, not doing them.

While cron will remain the ideal solution for system tasks like log rotation for some time to come, the above problems with application use of cron suggest that it might be time for a new scheduling solution for apps.

Cron Replacement Wishlist

My wishlist for a new app scheduling solution is:

Powerful and human-friendly syntax
Easy to test
Visibility
No difference between scheduler environment and one-off / test environment
Encourage use of a queueing system rather than doing the work directly in the scheduler
Scales without use of locks

Recently, the Flightcaster guys introduced me to resque-scheduler. With resque-scheduler, you make a yaml file of jobs to be scheduled. When each specified time is reached, the job is queued via the Resque job queueing system.
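As a rough idea of what such a schedule might look like for the feed reader example, here is a sketch that embeds the yaml in Ruby and parses it. The job class names are made up, and the key names (cron, class, description) are my assumption of the format, so check the resque-scheduler docs before copying them.

```ruby
require 'yaml'

# Hypothetical schedule for the feed reader example. The job class names are
# invented, and the key names are an assumption about resque-scheduler's format.
SCHEDULE_YAML = <<~YAML
  refresh_feeds:
    cron: "0 */3 * * *"
    class: RefreshFeeds
    description: Fetch new feeds every three hours
  clean_old_articles:
    cron: "0 0 * * *"
    class: CleanOldArticles
    description: Remove old unread articles every night
YAML

# The schedule is pure data: each entry names a time and a job, nothing more.
SCHEDULE = YAML.load(SCHEDULE_YAML)

SCHEDULE.each do |name, entry|
  puts "#{name}: #{entry['cron']} -> #{entry['class']}"
end
```

Note that there is nowhere in the file to put code, only the name of a job class and when to queue it.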

What’s most interesting to me is that resque-scheduler runs in a standalone, long-running daemon process. Launch it like this:

$ rake resque:scheduler

The standalone process is a fascinating solution to the locks problem. Because there’s only one process, you don’t need any locks - an approach that sounds strikingly similar to the reasons for using async. A data format (yaml) rather than code prevents you from doing any work in the scheduler, since you can only specify the name of a job to queue. This enforces that the work will be done in the background workers, where it belongs. Since the scheduler process does no heavy lifting, there are no scalability issues.
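The idea is small enough to sketch in plain Ruby. This is not how resque-scheduler is implemented, just a toy illustration under my own assumptions: the TinyScheduler class is hypothetical, intervals are in seconds, and the queue is anything that responds to <<.

```ruby
# A toy single-process scheduler (hypothetical, for illustration only).
# It owns the only copy of the schedule, so there is nothing to lock, and
# all it can do is push job names onto a queue.
class TinyScheduler
  def initialize(queue)
    @queue = queue # anything that responds to <<, e.g. an Array
    @entries = []
  end

  # Register a job name to be enqueued every `interval` seconds.
  def every(interval, job_name, start = 0)
    @entries << { interval: interval, job: job_name, next_at: start + interval }
  end

  # Enqueue each job that is due at time `now` (in seconds), then reschedule it.
  def tick(now)
    @entries.each do |entry|
      next if now < entry[:next_at]

      @queue << entry[:job]
      entry[:next_at] = now + entry[:interval]
    end
  end
end
```

A real process would call tick with the current time in a loop that sleeps between iterations, and the queue would be a client for your job queueing system rather than an Array. Passing the time in as an argument also makes the schedule easy to test without waiting for the clock.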

For diagnostic/debug visibility, set up logging and exception handling (e.g. Exceptional, Hoptoad) exactly like you would for your web or worker processes. resque-scheduler also provides some extensions to the Resque web UI (screenshots at the bottom of the resque-scheduler project page) for additional visibility and control.

Generalizing the Single-Process Scheduler

Resque-scheduler still uses a cron-style syntax for specifying when jobs will run, and Resque is not my favorite queueing system anyway (I prefer dedicated MQ backends like RabbitMQ, Kestrel, and Beanstalk). But the single-process scheduler idea implemented by resque-scheduler can easily be applied to other queueing systems. For example, you could use rufus-scheduler in combination with Minion+RabbitMQ to write a scheduler process for your app. In a file called scheduler.rb:

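require 'rufus/scheduler'
require 'minion'

# One long-running process owns the whole schedule.
scheduler = Rufus::Scheduler.start_new

# Each block only enqueues a job name; the workers do the actual work.
# The parentheses are required: every '5m' { ... } is a syntax error.
scheduler.every('5m') { Minion.enqueue('twitter.refresh') }
scheduler.every('3h') { Minion.enqueue('feeds.refresh') }

# Block forever so the process keeps running.
scheduler.join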

You’ve probably already defined or documented somewhere a list of processes needed to run your app. This may be one or more web processes (mongrel_cluster start, thin start, or unicorn start) and one or more worker processes (rake jobs:work, rake resque:work, or ruby minion.rb). Add to this list your new scheduler process:

ruby scheduler.rb >> log/scheduler.log 2>&1

Conclusion

While the single-process scheduler approach is still in its infancy, I believe it bears strong potential for the future of application cron.

a tornado of razorblades
