Consuming the Twitter Streaming API

eventedio eventmachine twitter

Fri Mar 19 11:01:54 -0700 2010

If you’ve been using polling to track Twitter search terms (totally random example), you may have wondered if there is a more efficient and reliable method. The Twitter streaming API is a potential solution.

Try out the sample stream with curl:

$ curl http://stream.twitter.com/1/statuses/sample.json -uYOUR_TWITTER_USERNAME:YOUR_PASSWORD

Track a term in realtime, like “ruby”:

$ curl 'http://stream.twitter.com/1/statuses/filter.json?track=ruby' -uYOUR_TWITTER_USERNAME:YOUR_PASSWORD

How do you integrate this into a Ruby app? Standard HTTP clients such as RestClient and HTTParty aren’t appropriate, since they’re designed for atomic HTTP requests, not streaming. With this API, you want to keep the socket open indefinitely, decoding JSON one line at a time.
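To make "one line at a time" concrete: data read from a long-lived socket arrives in chunks that can end anywhere, including in the middle of a JSON object. Here is a minimal, network-free sketch of the buffering that requires. LineBuffer is a hypothetical helper name used only for illustration; it is not part of any library.

```ruby
require 'json'

# Hypothetical helper: collects raw chunks and hands back complete lines.
class LineBuffer
  def initialize
    @buffer = String.new
  end

  # Append a chunk and return every complete, non-blank line it finishes.
  # Anything after the last newline stays buffered for the next call.
  def feed(chunk)
    @buffer << chunk
    lines = []
    while (line = @buffer.slice!(/\A[^\n]*\n/))
      line = line.strip
      lines << line unless line.empty?
    end
    lines
  end
end

# Usage: the first chunk ends mid-object, so nothing is parsed until
# the second chunk supplies the rest of the line.
lines = LineBuffer.new
['{"text":"he', "llo\"}\r\n"].each do |chunk|
  lines.feed(chunk).each { |line| puts JSON.parse(line)['text'] }
end
```

The point of the design is that parsing only ever sees whole lines, no matter how the network happens to slice the stream.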

Async I/O is the right tool for this job. Here’s an example script using Ilya Grigorik’s evented HTTP client. Install the em-http-request gem, then:

require 'eventmachine'
require 'em-http'
require 'json'

usage = "#{$0} <user> <password>"
abort usage unless user = ARGV.shift
abort usage unless password = ARGV.shift

url = 'http://stream.twitter.com/1/statuses/sample.json'

def handle_tweet(tweet)
  return unless tweet['text']
  puts "#{tweet['user']['screen_name']}: #{tweet['text']}"
end

EventMachine.run do
  http = EventMachine::HttpRequest.new(url).get :head => { 'Authorization' => [ user, password ] }

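  buffer = ""

  http.stream do |chunk|
    # Chunks can end mid-line, so accumulate them and peel off complete lines.
    buffer << chunk
    while line = buffer.slice!(/\A.*\r?\n/)
      next if line.strip.empty?   # the stream sends blank keep-alive lines
      begin
        handle_tweet JSON.parse(line)
      rescue JSON::ParserError => e
        warn "Skipping unparseable line: #{e.message}"
      end
    end
  end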
end

Run this at the command line with your Twitter username and password as arguments, and it will start printing out results. In a real app, you’d replace the body of handle_tweet with code to do something like inserting the result into your database.
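As a rough sketch of what that replacement might look like, the version below reduces each tweet to the fields worth keeping and appends it to a store. TWEETS and tweet_to_row are hypothetical names, and the array merely stands in for a database table; a real app would call its ORM or database driver at that point.

```ruby
require 'json'

# Hypothetical stand-in for a database table.
TWEETS = []

# Reduce a parsed tweet to the columns we want to store.
# Returns nil for messages that carry no text (the stream also
# delivers non-status messages, such as deletion notices).
def tweet_to_row(tweet)
  return nil unless tweet['text']
  {
    :screen_name => tweet['user']['screen_name'],
    :text        => tweet['text']
  }
end

def handle_tweet(tweet)
  row = tweet_to_row(tweet)
  TWEETS << row if row   # a real app would do its INSERT here
end
```

Keeping the field extraction separate from the write makes it easy to test the mapping without a database or a live stream.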

Note that, even in a production app, you should never run more than one of these processes. It’s a background worker of sorts; you can think of the open socket as a queue that’s delivering jobs. But this queue can’t split the work among multiple workers: every connection receives the same stream, so a second process would only handle each tweet twice. You’re limited to just one.