Comparing performance of Twitter Streaming API libraries

Streaming data from Twitter at high volume is a big part of what I do on a lot of my projects, most notably emojitrack-feeder, so I wanted to compare the overhead of various Twitter Streaming API clients.

My hope is that I will not only learn something about good options for future platform migrations of emojitrack-feeder, but also shed some light on areas of potential improvement in all the libraries, since I would love to help the overall ecosystem improve as much as possible.

Compared:

In general what all these libraries do is connect to the Twitter Streaming API via HTTPS/OAuth, stream a buffer of JSON messages, and deserialize those messages into tweet objects and status messages. They have varying degrees of event handling for the myriad of conditions the Twitter Streaming API can throw at you.
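To make that concrete, here is a minimal Python sketch of the deserialization step. It is not taken from any of the libraries compared here. It assumes newline-delimited JSON with blank keep-alive lines, and that non-tweet messages are objects with a single top-level key such as `delete`, `limit`, `warning`, or `disconnect`. The function name `classify` is hypothetical.

```python
import json

# Top-level keys that mark non-tweet status messages (assumed set, not exhaustive).
STATUS_KEYS = ("delete", "limit", "warning", "disconnect")


def classify(line: str):
    """Return a (kind, payload) pair for one raw line read from the stream."""
    line = line.strip()
    if not line:
        # Blank lines are keep-alives, not messages.
        return ("keepalive", None)
    msg = json.loads(line)
    for kind in STATUS_KEYS:
        if kind in msg:
            return (kind, msg[kind])
    if "text" in msg and "id" in msg:
        return ("tweet", msg)
    return ("unknown", msg)


if __name__ == "__main__":
    sample = [
        '{"id": 1, "text": "hello"}',
        "",
        '{"limit": {"track": 42}}',
        '{"warning": {"code": "FALLING_BEHIND", "percent_full": 60}}',
    ]
    for raw in sample:
        print(classify(raw)[0])
```

How much of this a library handles for you (and how cheaply) is a large part of what differs between the clients.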

NOTE: This is still in progress! I'm still working out some kinks, so please don't consider this "final" just yet.

Feature Comparison

LibraryGZipDelimitedRateWarnStallWarnEmojiData
rb-tweetstreamxx
rb-twitterx (1)x (2)x (3)
js-twitxx (4)
js-nodetwitterxxxx
go-twitterstreamxxxx
ex-twitterxx
java-hbcx

Definitions:

Categories with an asterisk are required for a library to be a real candidate for future versions of emojitrack-feeder. (The others are nice-to-haves that could contribute to performance.)

Notes:

  1. Now being tracked for v6.0, see sferik/twitter#650.
  2. Inquired about in sferik/twitter#631.
  3. Requested in sferik/twitter#649, now being tracked for v6.0.
  4. PR adding this in ttezel/twit#150, which I also compare in my benchmarks.
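For readers unfamiliar with the "Delimited" column, here is a rough Python sketch of what reading a length-delimited stream looks like. It assumes the `delimited=length` mode of the Streaming API, where each message is preceded by a line giving its length in bytes and blank keep-alive lines can still appear between messages. The function name `read_delimited` is hypothetical, and the stream is assumed to be a buffered file-like object whose `read(n)` returns exactly `n` bytes unless the stream ends.

```python
import io
import json


def read_delimited(stream):
    """Yield decoded JSON messages from a length-delimited byte stream."""
    while True:
        header = stream.readline()
        if not header:
            return  # end of stream
        header = header.strip()
        if not header:
            continue  # blank keep-alive line
        length = int(header)
        body = stream.read(length)
        if len(body) < length:
            return  # stream ended in the middle of a message
        # Trailing \r\n inside the body is whitespace, so json.loads accepts it.
        yield json.loads(body)


if __name__ == "__main__":
    body = b'{"id": 1, "text": "hi"}\r\n'
    data = str(len(body)).encode() + b"\r\n" + body
    for message in read_delimited(io.BytesIO(data)):
        print(message)
```

The appeal of this mode is that the client knows up front how many bytes to read, instead of scanning the buffer for newlines.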

Benchmarks

My test programs connect to the Twitter Streaming API, run a filter track on the ten most common words in the English language (which should be more than enough to saturate the results!), and then disconnect and quit once they have processed 250,000 tweets.

I try to emulate periodic stats output of emojitrack-feeder while running.
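The overall shape of each test program is roughly the loop below. This is a hedged Python sketch, not the code of any actual test program. The name `run_benchmark`, the five-second report interval, and the "has a `text` key" check for tweets are all assumptions made for illustration.

```python
import time


def run_benchmark(messages, target=250_000, report_every=5.0,
                  clock=time.monotonic, report=print):
    """Count tweets from an iterable of decoded messages until `target` is hit.

    Prints a stats line roughly every `report_every` seconds and returns the
    number of tweets processed.
    """
    tweets = 0
    start = last = clock()
    for msg in messages:
        if "text" not in msg:
            continue  # skip non-tweet messages (assumed check)
        tweets += 1
        now = clock()
        if now - last >= report_every:
            report(f"{tweets} tweets, {tweets / (now - start):.0f}/sec")
            last = now
        if tweets >= target:
            break  # a real client would disconnect from the stream here
    return tweets


if __name__ == "__main__":
    fake_stream = ({"id": i, "text": "the"} for i in range(1000))
    print(run_benchmark(fake_stream, target=100))
```

Passing the clock and the report function in as parameters keeps the loop easy to exercise without a live connection.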

It's worth noting my Twitter dev account has elevated partner access, so if you are running these benchmarks on your own key you will probably get different results.

Summary

Results from a quad-core 4.0GHz Core i7 machine with 32GB of RAM on a 100Mbps fiber internet connection.

LibraryUserSysCPUMemoryMsgReceivedBWApprox
rb-tweetstream43.463.5529%44144465857~0.94 GB
rb-twitter (MRI)40.266.6728%564002730766~0.94 GB
rb-twitter (JRuby)60.914.8739%349080463234~0.95 GB
js-twit72.633.9647%93688410594~0.94 GB
js-mroth/twit#perf32.774.6123%93972515569~0.95 GB
js-nodetwitter32.084.6422%93224534422~0.95 GB
go-twitterstream57.868.6440%10468439641~0.35 GB
ex-twitter119.2316.4283%49840832032~0.96 GB
scala-hbc87.697.5456%5388762654263~0.35 GB

Decoding the output:

Full benchmark output

Full output of the test runs (with more stats) can be found in the RESULTS.txt file.

Conclusions/Thoughts (In Progress)

Shameless self-promotion

If you enjoy these sorts of benchmarks / programs and find them useful, please consider following me on GitHub or Twitter to see when I post new projects. :v::man::v:

And of course, the reason I care about these benchmarks personally is :dizzy:Emojitracker, which you should totally check out too.