I’ve seen people asking on the Internet if there’s a way to get a list of the most bookmarked pages from a specific site. Some other social bookmarking services (like Digg) have this feature but del.icio.us doesn’t. This looked like an interesting problem so I decided to write a script that would crawl a given website and determine which pages have been bookmarked the most. That turned out to be more complex than I had thought.
Currently I have something that “kind-of” works, but it’s slooooow. This is mainly because I have to enforce a “grace period” between queries sent to del.icio.us to retrieve the bookmark count for a URL. If someone overuses the del.icio.us/url/[hash] feature by invoking it, say, 10 times per second, del.icio.us will temporarily block their IP. This means I have to wait a few seconds between each request for a bookmark count, which works out to maybe a dozen pages checked per minute. Since many sites have hundreds of pages, getting a complete list of the most bookmarked URLs can take ages.
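To illustrate the grace-period idea, here’s a minimal sketch in Python (my actual script is PHP; this is just for illustration). It assumes the hash in del.icio.us/url/[hash] is the MD5 of the URL string, and it takes the fetching function as a parameter so the throttling logic stands on its own:

```python
import hashlib
import time

def url_hash(url):
    # Assumption: del.icio.us identifies a URL by the MD5 hash of the URL string.
    return hashlib.md5(url.encode("utf-8")).hexdigest()

class GracePeriod:
    """Enforce a minimum delay between successive requests."""
    def __init__(self, seconds):
        self.seconds = seconds
        self.last = 0.0  # timestamp of the previous request

    def wait(self):
        remaining = self.seconds - (time.time() - self.last)
        if remaining > 0:
            time.sleep(remaining)  # still inside the grace period: hold off
        self.last = time.time()

def fetch_bookmark_count(url, grace, fetch):
    """'fetch' is any callable that retrieves a page; passing it in
    keeps this sketch testable without hitting del.icio.us."""
    grace.wait()
    return fetch("http://del.icio.us/url/" + url_hash(url))
```

With a few seconds of grace per request, the dozen-pages-per-minute figure above falls out directly: a 5-second grace period caps you at 12 lookups per minute.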
The only solution I have come up with so far is to have more than one server counting the bookmarks. I wrote a “count relay” script that fetches the bookmark count on request. This PHP script can be placed on any site that supports PHP and has cURL installed. The script’s URL would then be added to my database, and my server would use it to get bookmark counts (only once in a while, since there’s a grace period for relay scripts too; besides, the more relays there are, the less work any individual relay has to do).
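The dispatching side of this can be sketched roughly like so (Python for illustration; the relay URLs and names are made up, not my real setup). The server keeps a per-relay timestamp and always asks the relay that has been idle the longest:

```python
import time

class RelayPool:
    """Rotate bookmark-count requests across relay scripts so that no
    single relay exceeds its own grace period."""
    def __init__(self, relay_urls, grace_seconds):
        self.grace = grace_seconds
        self.last_used = {url: 0.0 for url in relay_urls}

    def pick(self):
        # Choose the relay that has been idle the longest.
        relay = min(self.last_used, key=self.last_used.get)
        idle = time.time() - self.last_used[relay]
        if idle < self.grace:
            # Even the least-recently-used relay is still cooling down.
            time.sleep(self.grace - idle)
        self.last_used[relay] = time.time()
        return relay
```

The point of the design: with N relays each allowing one request per grace period, overall throughput scales to roughly N lookups per grace period instead of one.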
Hmm, so how do I get people to set up those scripts on their sites? Or maybe there’s another solution?
If you want to see the bookmark counter you can visit http://w-shadow.com/shadowcounter/. Note that it’s a “work in progress”, “alpha”, “no warranties” and all the other things that basically mean “you’ve been warned”. As I said above, it’s also really slow (however, all queries are saved and processed in the background, meaning you could enter some domain name and come back a day later to see what the results are. Well, maybe 2 days. Ehh, maybe a week? Shouldn’t come to that.).
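The save-now, process-later design behind that is simple enough to sketch (again in Python, with made-up names, not my real code): submitted URLs go straight into a queue, and a background worker drains it at whatever pace the grace period allows, storing results for whenever the visitor comes back.

```python
from collections import deque

class CrawlQueue:
    """Queue requests immediately; let a background worker fill in
    results one at a time."""
    def __init__(self):
        self.pending = deque()
        self.results = {}  # url -> bookmark count, filled in over time

    def submit(self, url):
        if url not in self.results:
            self.pending.append(url)

    def process_one(self, count_bookmarks):
        # Called by the background worker once per grace period.
        if self.pending:
            url = self.pending.popleft()
            self.results[url] = count_bookmarks(url)

    def status(self, url):
        return self.results.get(url)  # None means "come back later"
```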
By the way, there’s currently a limit of 1000 pages per domain to prevent the server from getting overwhelmed in case some wiseguy gets the idea to check how many bookmarks en.wikipedia.org has 😉