Followers Scraping

For what it’s worth, I thought I’d post the Ruby script that lets me scrape my followers from Tumblr. Of course, you can get the tumblrs you are following considerably more easily, since there are formatting tags for that. But for some reason unknown to man I wanted to list all the people that are following me, rather than the other way around. (Perhaps because I’m astonished and grateful that anyone would want to do so; perhaps because I’m just awkward.)

So, here it is. You should be able to save your followers page to a file, then run this script, passing it the name of that file, and it should spew out HTML code to paste into your tumblr.

It uses _Why’s wonderful Hpricot library to parse the HTML. My understanding is that if you have RubyGems, the require 'rubygems' line should let Ruby find Hpricot automagically, but only once the gem itself is installed (gem install hpricot). I wonder how well that works out in practice. Sorry about that.
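If you want to find out whether Hpricot is actually there before running the real thing, something like the following might do. It is only a sketch: library_available? is a name I have made up for illustration, not part of RubyGems or Hpricot, and all it does is attempt the require and catch the LoadError.

```ruby
require 'rubygems'

# Hypothetical helper (not part of any library): returns true if `name`
# can be required, false if Ruby cannot find it.
def library_available?(name)
  require name
  true
rescue LoadError
  false
end

unless library_available?('hpricot')
  $stderr.puts "Hpricot doesn't seem to be installed; try: gem install hpricot"
end
```

If it prints nothing, the require worked and the script below should at least get as far as parsing.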

#! /usr/bin/env ruby

require 'rubygems'
require 'hpricot'

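def main(filename)
   # Read and parse the saved followers page; File.read closes the file for us.
   begin
      doc = Hpricot(File.read(filename))
   rescue => eurgh
      $stderr.puts "file read failed: #{eurgh}"
      return
   end

   # Print a line of HTML for every link that wraps an avatar image.
   doc.search("//a").each do |a|
      puts modlink(a) if testlink(a)
   end
end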

# True if the link contains an avatar image, which marks it as a follower link.
def testlink(a)
   a.search("img[@class='avatar']").any?
end

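# Build the HTML for one follower: a link to their tumblr around a small avatar.
def modlink(a)
   href = a.attributes["href"]
   src = a.search("img").first.attributes["src"]

   # The saved page refers to avatars in its own followers_files folder;
   # point at Tumblr's server and the 24-pixel version instead.
   # (Non-bang gsub, so the parsed document itself is left untouched.)
   src = src.gsub(/followers_files/, "http://data.tumblr.com")
   src = src.gsub(/128\..*/, "24.gif")
   src = "images/default_avatar_24.gif" if src == "http://data.tumblr.com/default_avatar_24.gif"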

   return "<a class='avatar' href='#{href}'><img src='#{src}' /></a>"
end

if ARGV[0].nil?
   $stderr.puts("Usage: followparse.rb <saved followers page>")
else
   main(ARGV[0])
end
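
To make the URL rewriting in modlink a bit more concrete, here is the same set of substitutions applied to plain strings, with no Hpricot involved. This is a sketch only: small_avatar_url is a name invented for this example, and the two sample filenames are made up rather than taken from a real followers page.

```ruby
# Hypothetical helper mirroring the substitutions in modlink, on a plain string.
def small_avatar_url(src)
  # Swap the saved page's local folder for Tumblr's server.
  url = src.gsub(/followers_files/, "http://data.tumblr.com")
  # Swap the 128-pixel file (whatever its extension) for the 24-pixel gif.
  url = url.gsub(/128\..*/, "24.gif")
  # The default avatar is served from a different path.
  if url == "http://data.tumblr.com/default_avatar_24.gif"
    "images/default_avatar_24.gif"
  else
    url
  end
end

# Made-up sample inputs, for illustration only.
puts small_avatar_url("followers_files/avatar_abc123_128.png")
puts small_avatar_url("followers_files/default_avatar_128.gif")
```

The first call should give http://data.tumblr.com/avatar_abc123_24.gif, and the second images/default_avatar_24.gif.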

~ by shadowfirebird on May 21, 2008.
