If you’ve been using polling to track Twitter search terms (totally random example), you may have wondered if there is a more efficient and reliable method. The Twitter streaming API is a potential solution.
Try out the sample stream with curl:
$ curl http://stream.twitter.com/1/statuses/sample.json -uYOUR_TWITTER_USERNAME:YOUR_PASSWORD
Track a term in realtime, like “ruby”:
$ curl 'http://stream.twitter.com/1/statuses/filter.json?track=ruby' -uYOUR_TWITTER_USERNAME:YOUR_PASSWORD
How do you integrate this into a Ruby app? Standard HTTP clients such as RestClient and HTTParty aren’t appropriate, since they’re designed for atomic HTTP requests, not streaming. With this API, you want to keep the socket open indefinitely, decoding JSON one line at a time.
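The core difficulty is reassembling complete lines from arbitrary socket chunks, since chunk boundaries don't line up with tweet boundaries. Here's a minimal sketch of that buffering, independent of any HTTP client (extract_lines is a hypothetical helper, not part of any library):

```ruby
# Accumulate raw chunks in a shared buffer and return every complete,
# newline-terminated line that has arrived so far. Partial lines stay
# in the buffer until the next chunk completes them.
def extract_lines(buffer, chunk)
  buffer << chunk
  lines = []
  while line = buffer.slice!(/.+\r?\n/)
    lines << line.chomp
  end
  lines
end

buffer = ""
extract_lines(buffer, "{\"id\":1}\r\n{\"id\"")  # => ["{\"id\":1}"]
```

The second chunk above is incomplete, so it waits in the buffer; once the rest of that line arrives, a later call returns it whole.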
Async I/O is the right tool for this job. Here’s an example script using Ilya Grigorik’s evented HTTP client. Install the em-http-request gem, then:
require 'eventmachine'
require 'em-http'
require 'json'

usage = "#{$0} <user> <password>"
abort usage unless user = ARGV.shift
abort usage unless password = ARGV.shift

url = 'http://stream.twitter.com/1/statuses/sample.json'

# Delete notices and other non-status messages have no 'text' key, so skip them.
def handle_tweet(tweet)
  return unless tweet['text']
  puts "#{tweet['user']['screen_name']}: #{tweet['text']}"
end

EventMachine.run do
  http = EventMachine::HttpRequest.new(url).get :head => { 'Authorization' => [user, password] }

  buffer = ""

  # The stream callback fires with each chunk off the socket. Chunks don't
  # line up with tweet boundaries, so buffer them and parse only complete,
  # newline-terminated lines.
  http.stream do |chunk|
    buffer << chunk
    while line = buffer.slice!(/.+\r?\n/)
      handle_tweet JSON.parse(line)
    end
  end
end
Run this at the command line with your Twitter username and password as arguments, and it will start printing results. In a real app, you'd replace the body of handle_tweet with something useful, like inserting the tweet into your database.
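For instance, a beefier handle_tweet might filter and collect tweets rather than print them. This is just a sketch: TERM and MATCHES are illustrative names, and a real app would write to a database instead of an in-memory array.

```ruby
TERM    = /ruby/i   # illustrative: term we care about
MATCHES = []        # illustrative: stand-in for a database table

def handle_tweet(tweet)
  return unless tweet['text']           # skip deletes and other non-status messages
  return unless tweet['text'] =~ TERM   # keep only tweets mentioning the term
  MATCHES << { :user => tweet['user']['screen_name'], :text => tweet['text'] }
end
```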
Note that, even in a production app, you should never run more than one of these processes. It’s a background worker of sorts; you can think of the open socket as a queue that’s delivering jobs. But since this queue can’t split the work among multiple workers, you’re limited to just one.
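You can still parallelize the handling inside that one process, though: the reader pushes each decoded tweet onto an in-process, thread-safe Queue, and worker threads drain it. A sketch under those assumptions (jobs, handled, and workers are illustrative names):

```ruby
require 'thread'

jobs    = Queue.new   # tweets waiting to be handled
handled = Queue.new   # results, just to show the flow

workers = 2.times.map do
  Thread.new do
    while (tweet = jobs.pop) != :done
      handled << tweet['text']   # in a real app: write to the database
    end
  end
end

# In the streaming callback you'd do: jobs << JSON.parse(line)
jobs << { 'text' => 'hello from the stream' }

workers.size.times { jobs << :done }   # one sentinel per worker to shut down cleanly
workers.each(&:join)
```

The single socket stays owned by one process, but the slow part of the work (parsing, storage) is spread across threads.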