Attack of the Twitter Bots

I recently wrote my own URL-redirection service at kbhr.co, and the first surprise in using it was how many “Twitter bots” are out there. When I used goo.gl for my URL redirects, it didn’t show data for any of them, but once I made my first post using kbhr.co, I found that at least 80% of the click-thrus were from bots, and many of them happened immediately.

I use MongoDB for my kbhr.co database, and this is a little peek at the “user-agent” field from the headers of the bots that followed one link I posted:

> db.url_visitors.find( {uri: "5"}, {userAgent: 1, _id: 0} )

{ "userAgent" : "InAGist URL Resolver (http://inagist.com)" }
{ "userAgent" : "Twitterbot/1.0" }
{ "userAgent" : "Twitterbot/1.0" }
{ "userAgent" : "Mozilla/5.0 (compatible; Butterfly/1.0; +http://labs.topsy.com/butterfly/) Gecko/2009032608 Firefox/3.0.8" }
{ "userAgent" : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:19.0) Gecko/20100101 Firefox/19.0" }
{ "userAgent" : "Mozilla/5.0 (compatible; Butterfly/1.0; +http://labs.topsy.com/butterfly/) Gecko/2009032608 Firefox/3.0.8" }
{ "userAgent" : "NING/1.0" }
{ "userAgent" : "NING/1.0" }
{ "userAgent" : "peerindex" }
{ "userAgent" : "MetaURI API/2.0 +metauri.com" }
{ "userAgent" : "NING/1.0" }
{ "userAgent" : "Java/1.6.0_26" }
{ "userAgent" : "NING/1.0" }
{ "userAgent" : "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)" }
{ "userAgent" : "JS-Kit URL Resolver, http://js-kit.com/" }
{ "userAgent" : "ShowyouBot (http://showyou.com/crawler)" }
{ "userAgent" : "Mozilla/5.0 (compatible; TweetmemeBot/3.0; +http://tweetmeme.com/)" }
{ "userAgent" : "Java/1.7.0_25" }
{ "userAgent" : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_6) AppleWebKit/534.24 (KHTML, like Gecko) Contact: feedback@getprismatic.com" }
{ "userAgent" : "Ruby" }
{ "userAgent" : "Mozilla/5.0 (compatible; Kraken/0.1; http://linkfluence.net/; bot@linkfluence.net)" }
{ "userAgent" : "Mozilla/5.0 (Digg/1.0; support@digg.com)" }
{ "userAgent" : "Ruby" }
{ "userAgent" : "Mozilla/5.0 (Windows NT 5.1; rv:23.0) Gecko/20100101 Firefox/23.0" }
{ "userAgent" : "newsme/1.0; feedback@news.me" }
{ "userAgent" : "Mozilla/5.0 (Windows NT 5.1; rv:23.0) Gecko/20100101 Firefox/23.0" }
{ "userAgent" : "Java/1.6.0_20" }
{ "userAgent" : "PulseCrawler/1.1 (+http://www.pulse.me/) AppEngine-Google; (+http://code.google.com/appengine; appid: s~hr-pulsesubscriber)" }
{ "userAgent" : "Mozilla/5.0 (compatible; TweetedTimes Bot/1.0; +http://tweetedtimes.com)" }
{ "userAgent" : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_6) AppleWebKit/534.24 (KHTML, like Gecko) Contact: feedback@getprismatic.com" }
{ "userAgent" : "Mozilla/5.0 (compatible; TweetedTimes Bot/1.0; +http://tweetedtimes.com)" }
{ "userAgent" : "Mozilla/5.0 (compatible; TweetedTimes Bot/1.0; +http://tweetedtimes.com)" }
{ "userAgent" : "peerindex" }
{ "userAgent" : "InAGist URL Resolver (http://inagist.com)" }
{ "userAgent" : "Java/1.6.0_26" }
{ "userAgent" : "JS-Kit URL Resolver, http://js-kit.com/" }
{ "userAgent" : "ShowyouBot (http://showyou.com/crawler)" }
{ "userAgent" : "NING/1.0" }
{ "userAgent" : "NING/1.0" }
{ "userAgent" : "NING/1.0" }
{ "userAgent" : "NING/1.0" }
{ "userAgent" : "NING/1.0" }
{ "userAgent" : "NING/1.0" }
{ "userAgent" : "LongURL API" }
{ "userAgent" : "NING/1.0" }
{ "userAgent" : "Java/1.6.0_35" }
{ "userAgent" : "jack" }
{ "userAgent" : "LongURL API" }
{ "userAgent" : "Java/1.6.0_35" }
{ "userAgent" : "LongURL API" }
{ "userAgent" : "newsme/1.0; feedback@news.me" }
{ "userAgent" : "Java/1.6.0_20" }
{ "userAgent" : "Java/1.6.0_20" }

(There were other hits from services like Prismatic and Flipboard, but I removed those because I couldn’t tell whether they were coming from bots, or actual users.)
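Telling bots from people by user agent alone is guesswork, but a rough first pass is possible. The following is only a sketch in plain JavaScript, not what kbhr.co actually does: the function name `classifyUserAgent` and both pattern lists are my own assumptions, and any client can send whatever user-agent string it likes.

```javascript
// Hypothetical heuristic, not part of kbhr.co.
// Words and URL markers that self-identified crawlers tend to include.
const BOT_HINTS = /bot|crawler|resolver|spider|https?:\/\//i;
// Tokens that ordinary desktop browsers include in their user agent.
const BROWSER_HINTS = /Firefox\/|Chrome\/|Safari\/|Trident\//;

function classifyUserAgent(ua) {
  if (!ua) return "unknown";
  // Strings like "NING/1.0", "Ruby", or "Java/1.6.0_26" do not even
  // pretend to be a browser, so treat them as automated clients.
  if (!ua.startsWith("Mozilla/")) return "bot";
  // A Mozilla-style string that advertises a bot name or a contact URL.
  if (BOT_HINTS.test(ua)) return "bot";
  if (BROWSER_HINTS.test(ua)) return "browser";
  // Anything else (for example a WebKit string with no browser token)
  // stays ambiguous.
  return "unknown";
}

const examples = [
  "Twitterbot/1.0",
  "Mozilla/5.0 (compatible; Butterfly/1.0; +http://labs.topsy.com/butterfly/) Gecko/2009032608 Firefox/3.0.8",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:19.0) Gecko/20100101 Firefox/19.0",
];
for (const ua of examples) {
  console.log(classifyUserAgent(ua) + "  " + ua);
}
```

The "unknown" bucket is deliberate: some strings look like a browser but carry no clear signal either way, and it seems better to leave those unlabeled than to guess.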

Discussion

All of these bot hits are from one URL I posted on Twitter. There were 65 total click-thrus, including these 46 bot clicks. Without looking at more fields in the data, two things stand out: all the “NING” bots, and the fact that services like Topsy, Prismatic, and TweetedTimes hit the same URL several times for some reason.
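To see the repeat visitors at a glance, the hits can be tallied per user agent. Here is a minimal sketch in plain JavaScript, assuming the query results have been loaded into an array of documents shaped like the output above. The helper name `countByUserAgent` and the small `sample` array are made up for illustration.

```javascript
// Hypothetical helper: count how many times each user agent appears.
// `visitors` is assumed to be an array like [{ userAgent: "NING/1.0" }, ...].
function countByUserAgent(visitors) {
  const counts = new Map();
  for (const v of visitors) {
    const ua = v.userAgent || "(none)";
    counts.set(ua, (counts.get(ua) || 0) + 1);
  }
  // Return [userAgent, count] pairs, most frequent first.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Illustrative sample only, not the full data set.
const sample = [
  { userAgent: "NING/1.0" },
  { userAgent: "Twitterbot/1.0" },
  { userAgent: "NING/1.0" },
  { userAgent: "peerindex" },
  { userAgent: "Twitterbot/1.0" },
  { userAgent: "NING/1.0" },
];

const tally = countByUserAgent(sample);
for (const [ua, n] of tally) {
  console.log(n + "  " + ua);
}
```

The same grouping could probably be done inside MongoDB with the aggregation framework's `$group` stage, but doing it client-side keeps the sketch independent of the server version.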

As I’ve written before, I usually get 15 clicks from bots before I can even switch from my Twitter browser window over to my kbhr.co window to see the click-thrus, and they’re all from Twitter bots like these.

More data this weekend

I’m going to run some experiments this weekend to see how the bots vary when I use different tags, like #java, #zen, etc. Come back here on Monday to see the results of those experiments.