At the beginning of the week, I officially launched a little project named MicroBlogBuzz. The concept is simple: find URLs on micro-blogging platforms, and present the top blogged ones.
For those not familiar with micro-blogging, it’s pretty much like blogging, but smaller, hence the micro prefix. Micro-blogging posts are very short, typically limited to 140-160 characters. The most popular platform is Twitter, but new platforms are appearing, such as Pownce, Jaiku and Identica. Also, this year’s TechCrunch50 winner is the project-oriented micro-blogging platform Yammer (which I haven’t tried yet).
Back to MicroBlogBuzz: I started this little project part time while testing various APIs, when I stumbled on TwitterBuzz, which presents the most popular links on Twitter. However, TwitterBuzz only presents the domain name, which isn’t really meaningful since most micro-bloggers use TinyURL or similar services to shorten their URLs. As a result, TinyURL was presented as the top Twitter link, and I thought that was kind of stupid. So I decided to do the same thing, but to follow the HTTP redirections to get to the final URL.
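To illustrate what following the redirections can look like, here is a minimal Python sketch. It is not the actual MicroBlogBuzz code: the function names (head_location, resolve_final_url), the hop limit and the use of HEAD requests are all assumptions made for the example. The idea is simply to ask the server where a short URL points, and to repeat until there is no further redirect.

```python
import http.client
from urllib.parse import urljoin, urlsplit

# HTTP status codes that announce a redirect via the Location header.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def head_location(url, timeout=5.0):
    """Send one HEAD request; return the redirect target, or None if there is none."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        location = resp.getheader("Location")
        if resp.status in REDIRECT_CODES and location:
            # Location may be relative, so resolve it against the current URL.
            return urljoin(url, location)
        return None
    finally:
        conn.close()


def resolve_final_url(url, fetch_location=head_location, max_hops=10):
    """Follow redirects hop by hop and return the last URL reached.

    fetch_location is injectable (hypothetical design choice) so the logic
    can be exercised without touching the network.
    """
    seen = {url}
    for _ in range(max_hops):
        target = fetch_location(url)
        if target is None or target in seen:
            # No more redirects, or a redirect loop: stop here.
            return url
        seen.add(target)
        url = target
    # Hop limit reached: give up and return where we ended.
    return url
```

Calling resolve_final_url on a shortened link would return the address it finally lands on, so that links are counted by their real destination rather than by the shortener's domain. The hop limit and the loop check are there because shortened links sometimes chain through several services.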
I was surprised by the amount of data I ended up collecting: around 1200 links every 15 minutes, and over 330,000 links and 400,000 comments in five days. I quickly ran into problems: I had built my database on InnoDB with foreign keys to keep things clean, but with this amount of data and the small server it runs on, well, it can’t be clean and fast at the same time.
So I switched to MyISAM. But still, it wasn’t enough. So I added caching and smart HTTP headers. But still, it wasn’t enough. So I added preprocessing. And it was OK. At least for now, with my small 500 visitors per day.
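As a rough illustration of what preprocessing can mean here (an assumption, since the post does not describe the real implementation): instead of counting links on every page view, the ranking is computed once in a batch and the page only displays the stored result. The function name top_links and the (post_id, final_url) input shape are hypothetical.

```python
from collections import Counter


def top_links(mentions, limit=10):
    """Rank final URLs by how many distinct posts mention them.

    mentions is an iterable of (post_id, final_url) pairs; a post that
    repeats the same URL is counted only once.
    """
    seen = set()
    counts = Counter()
    for post_id, url in mentions:
        if (post_id, url) in seen:
            continue
        seen.add((post_id, url))
        counts[url] += 1
    # Most-mentioned URLs first, truncated to the requested size.
    return counts.most_common(limit)
```

Run periodically (for example after each collection pass), the result can be saved and served as-is, so a page view costs one cheap read instead of an aggregation over hundreds of thousands of rows.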
Feel free to send me your comments and suggestions, by email, or even better, on Twitter.
If you like this article, leaving a comment, tweeting or liking it is always appreciated.