When we say that an HTTP request takes 300ms on the server side, that number actually encompasses two phases. One phase is the time the request spends in the backlog, waiting for an available backend process (mongrel/thin/dyno). The second phase is how long it takes to process the request once it reaches the backend process. Conflating these two is a mistake, because it masks important information about which steps will actually help your app handle more traffic.
You can see the difference between wait time and processing time through a simple experiment: create an app with a known processing time, run it on a single-threaded server, and then launch two simultaneous requests against it.
We’ll try this by making a Sinatra app in a rackup file named wait.ru:
require 'sinatra'

get '/' do
  sleep 1              # simulate one second of processing time
  "request complete\n"
end

run Sinatra::Application
Start it using Thin:
$ thin start -R wait.ru
>> Thin web server (v1.0.0 codename That's What She Said)
...
In another terminal, execute two requests:
$ for i in 1 2; do (time curl http://localhost:3000 &); done
request complete
real 0m1.006s
user 0m0.001s
sys 0m0.003s
request complete
real 0m2.005s
user 0m0.001s
sys 0m0.002s
Here we can see that the first request took a smidge over one second, which is what we’d expect from the “sleep 1” in our server-side code. But the second request took two seconds. That request didn’t take two seconds to process: it sat in the backlog for one second, waiting for the first request to finish, and then took one second to process.
You can try this with three, four, or five requests to see the pattern repeat: each additional request spends one second longer in the backlog, waiting in line for the requests ahead of it to finish.
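For example, firing five concurrent requests at the same single-threaded server (assuming wait.ru is still running from above) shows the backlog growing one second at a time:

$ for i in 1 2 3 4 5; do (time curl http://localhost:3000 &); done

The first response should report roughly one second of real time, the second roughly two, and so on up to about five seconds for the last request in line.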
By tracking time spent in the backlog (wait time) separately from time spent processing (request time), you’ll have much better information about what steps to take in order to scale the app.
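Here’s one way to get at those two numbers: a minimal Rack middleware sketch, assuming your front-end proxy stamps each incoming request with an X-Request-Start header carrying a t=<seconds-since-epoch> timestamp (a common proxy convention, but the header name and format here are assumptions about your setup, not a standard):

# Logs wait time (proxy stamp to now) and request time (our own processing).
class QueueTimer
  def initialize(app)
    @app = app
  end

  def call(env)
    # Wait time: how long the request sat before reaching this process.
    if (stamp = env['HTTP_X_REQUEST_START'])
      wait = Time.now.to_f - stamp.sub('t=', '').to_f
      puts "wait time: #{(wait * 1000).round}ms"
    end

    # Request time: how long we spend actually handling the request.
    start = Time.now
    status, headers, body = @app.call(env)
    puts "request time: #{((Time.now - start) * 1000).round}ms"

    [status, headers, body]
  end
end

Add “use QueueTimer” above “run Sinatra::Application” in wait.ru and the two numbers show up separately in your server output.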
If the wait time is long (really, anything much above zero), then you’re lacking concurrency. Increase the number of web processes you’re running by editing mongrel_cluster.yml or thin.yml, or by turning up your dyno slider. In some cases you may need to add a new machine: more than 3 or 4 web processes per CPU core probably won’t help much.
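With Thin, for instance, that’s a one-line change in the cluster config. A minimal thin.yml sketch (the port and server count here are illustrative):

# thin.yml - run three single-threaded Thin processes
rackup: wait.ru
port: 3000       # processes bind to 3000, 3001, 3002
servers: 3

Start the cluster with “thin start -C thin.yml” and let your front-end proxy balance across the three ports.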
But if wait time is zero and the request time is long (anything over 200ms), then adding mongrels, thins, or dynos to the mix won’t make a lick of difference. Perhaps you’re making an API call to an external service, sending an email, or resizing an image. In that case, move the work into a work queue so that your web request can return quickly. Or reach for some good old-fashioned optimization: tune your database queries, or cache results with memcached.
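As a sketch of the queue approach, here’s a Sinatra app that enqueues the slow work instead of doing it inline. The in-process Queue and worker thread are stand-ins for a real system like Delayed::Job or a dedicated worker process, and the /signup route with its one-second “email send” is made up for illustration:

require 'sinatra'

# Stand-in for a real work queue (Delayed::Job, a worker dyno, etc.).
QUEUE = Queue.new

# A background worker drains the queue outside the request/response cycle.
Thread.new do
  loop do
    job = QUEUE.pop              # blocks until a job arrives
    sleep 1                      # pretend to send the welcome email
    puts "sent welcome email to #{job[:to]}"
  end
end

post '/signup' do
  # Hand the slow work to the queue and return immediately: the request
  # now takes milliseconds instead of a second or more.
  QUEUE << { to: params[:email] }
  "thanks\n"
end

The request returns in milliseconds while the worker chews through the email in the background, so wait time stays near zero even under load.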
These two values aren’t totally independent: speeding up your request time might eliminate your backlog simply because it improves throughput. So faster requests are always a good thing. But it’s generally easier to increase concurrency, and that option only helps if you know you’re actually running a backlog.