Applying the Unix Process Model to Web Apps

Mon May 09 09:22:53 -0700 2011

The unix process model is a simple and powerful abstraction for running server-side programs. Applied to web apps, the process model gives us a unique way to think about dividing our workloads and scaling up over time.

Process model basics

Let’s begin with a simple illustration of the basics of the process model, using a well-known unix daemon: memcached.

Download and compile it:

$ wget http://memcached.googlecode.com/files/memcached-1.4.5.tar.gz
$ tar xzf memcached-1.4.5.tar.gz 
$ cd memcached-1.4.5
$ ./configure
$ make

Run the program:

$ ./memcached -vv
...
<17 server listening (auto-negotiate)
<18 send buffer was 9216, now 3728270

This running program is called a process.

Running manually in a terminal is fine for local development, but in a production deployment we want memcached to be a managed process. A managed process should run automatically when the operating system starts up, and should be restarted if the process crashes or dies for any reason.

We can use a process manager to put processes under management. There are many process managers, but operating systems usually have defaults. On OS X, launchd is the built-in process manager; on Ubuntu, Upstart the built-in process manager.

Let’s set up memcached to run as a managed process on Ubuntu. Write an Upstart config:

/etc/init/memcached.conf

description "Memcached"
exec /usr/bin/memcached >> /var/log/memcached.log
start on runlevel [345]
respawn

We can now tell Upstart to start our process for the first time:

$ start memcached
memcached start/running, process 1212

The memcached process is now running in the background, managed by the process manager, with its output stream going to /var/log/memcached.log.

Now that we’ve established a baseline for the process model, we can put its principles to work in more novel way: running a web app.

Mapping the unix process model to web apps

A server daemon like memcached has a single entry point, meaning there’s only one command you run to invoke it. Web apps, on the other hand, typically have two or more entry points. Each of these entry points can be called a process type.

A basic Rails app will typically have two process types: a Rack-compatible web process (such as Webrick, Mongrel, or Thin), and a worker process using a queueing library (such as Delayed Job or Resque). For example:

Process type	Command
web	bundle exec rails server
worker	bundle exec rake jobs:work

A basic Django app looks strikingly similar: a web process can be run with the manage.py admin tool; and background jobs via Celery.

Process type	Command
web	python manage.py runserver
worker	celeryd --loglevel=INFO

Process types differ for each app. For example, some Rails apps use Resque instead of Delayed Job, or have multiple types of workers. Every app needs to declare its own process types.

Declaration of process types is conceptually similar to declaration of dependencies. In the Ruby world, Gem Bundler and the Gemfile give us a declarative, canonical way to specify the gem dependencies for an app. We need the equivalent of Gemfile and Bundler, but for process types.

Procfile, a format to declare your process types

Procfile is an extremely simple file format which allows you to declare the process types your app uses. Its format is one process type per line, with each line formatted as:

<process type>: <command>

A Rails app might have a Procfile like this:

web:    bundle exec rails server -p $PORT
worker: bundle exec rake jobs:work

One purpose for this is structured documentation - a developer can view the Procfile to see the app’s process architecture, just as they can view the Gemfile to see its dependencies. But the greater utility of Procfile lies in our ability to parse the file and run the app’s processes automatically.

Foreman, a process manager for local development

Foreman is a handy command-line tool written by David Dollar. It reads your Procfile and runs one process for each process type declared by your app.

Install it:

$ gem install foreman

If you’ve written a Procfile (such as the one shown in the previous section) and put it in the root of your app, you can now run it like this:

Foreman runs one process for each process type that we’ve declared. Once running, the output streams for each running process are conveniently interleaved in the foreground on our terminal. Each line is prefixed with a timestamp and the name of the running process, and color-coded by which process emitted which line.

Foreman is a process manager in the same sense as launchd or Upstart, but tailored to the needs of app development. It runs only a single app at a time, with all processes in the foreground, and terminates if any process crashes or if you press Ctrl-C.

Using Procfile for deployment

Bundler has a --deployment command-line option, allowing you to use your app’s Gemfile to set up gems on your production server. Procfile and Foreman can be used in a similar fashion, using a feature of Foreman to export to a process manager format of your choice.

For example, let’s deploy a Procfile-backed Rails app to an Ubuntu server, selecting Upstart as the export format. As root, run the following from wherever your Procfile is located:

$ foreman export upstart /etc/init
[foreman export] writing: /etc/init/myapp.conf
[foreman export] writing: /etc/init/myapp-web.conf
[foreman export] writing: /etc/init/myapp-web-1.conf
[foreman export] writing: /etc/init/myapp-worker.conf
[foreman export] writing: /etc/init/myapp-worker-1.conf
$ start
myapp start/running, process 28572

Your app is now running as two managed processes. You can use all of Upstart’s control capabilities, such as restarting the app when deploying a new release of your code:

$ restart myapp
myapp start/running, process 28591

Process types vs processes

To scale up, we’ll want full grasp of the relationship between process types and processes.

A process type is the prototype from which one or more processes are instantiated. This is similar to the way a class is the prototype from which one or more objects are instantiated in object-oriented programming.

Here’s a visual aid showing the relationship between processes (on the vertical axis) and process types (on the horizontal axis):

Processes, on the vertical axis, are scale. You increase this direction when you need to scale up your concurrency for the type of work handled by that process type. Foreman lets you specify concurrency for each process type when you export with the -c option. To get a process formation matching the diagram, you’d use this command:

$ foreman export upstart /etc/init -c web=2 -c worker=4 -c clock=1

Process types, on the horizontal axis, are workload diversity. Each process type specializes in a certain type of work.

For example, some apps have two types of workers, one for urgent jobs and another for long-running jobs. By subdividing into more specialized workers, you can get better responsiveness on your urgent jobs and more granular control over how to spend your compute resources.

Scheduling work at a certain time of day (e.g., the equivalent of cron) can be achieved with a specialized process type: a library like resque-scheduler or Clockwork can be run as a singleton process for a very flexible cron replacement. Consuming the Twitter streaming API is another type of specialized work best served by a singleton process.

Pulling all of these potential use cases together, here’s an example of a Procfile for an app with five process types: a Sinatra web app, two types of Resque workers, a singleton clock with Clockwork, and a singleton ruby script consuming the Twitter streaming API:

web:          bundle exec ruby web.rb -p $PORT
fastworker:   QUEUE=urgent bundle exec rake resque:work
slowworker:   QUEUE=*      bundle exec rake resque:work
clock:        bundle exec clockwork clock.rb
tweetscan:    bundle exec ruby tweetscan.rb

When we run this Procfile with Foreman, we’ll give five processes - one for each process type. In production, we can use Foreman’s concurrency argument to fan out to dozens or even hundreds of running processes, potentially spread out across multiple machines.

Conclusion

The unix process model is a powerful way to approach running your web app. Procfile gives us a way to declare process types, and Foreman gives us an easy way to run the app’s processes in both development and deployment environments.

a tornado of razorblades