Digg : Some Useless Comment Stats

As part of a little side project, I’ve written a script that determines the most “dugg” and most “buried” Digg.com comments for the day. It can also find the discussion thread that contains the most diggs. Only popular submissions are checked for comments.
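
To make that concrete, here’s a rough Python sketch of the selection step, assuming the comment data has already been fetched. The field names (“diggs”, “buries”, “story”) are just placeholders, not the actual structure the script or the Digg API uses.

```python
# Rough sketch: picking the day's extremes from already-fetched comment data.
# Field names are placeholders, not the real scraper's data structure.
from collections import defaultdict

def daily_extremes(comments):
    """Return the most dugg comment, the most buried comment,
    and the story whose comment thread collected the most diggs."""
    most_dugg = max(comments, key=lambda c: c["diggs"])
    most_buried = max(comments, key=lambda c: c["buries"])

    diggs_per_story = defaultdict(int)
    for c in comments:
        diggs_per_story[c["story"]] += c["diggs"]
    top_thread = max(diggs_per_story, key=diggs_per_story.get)

    return most_dugg, most_buried, top_thread
```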

I’ve put it up at TheBestOfDigg.com. The idea is to post the best/worst comments on the blog every day. The process is mostly automated, but there’s still some fine-tuning to do.

I’m also going to collect comment data from the last year or so and calculate some mildly interesting stats, like Kris did last year with his list(s) of top duplicate Digg comments. There’s also a list of hand-picked memes on another site.

Aside from the slim chance of attracting readers/visitors, this would also have practical uses. For example, I might find –

  • Failsafe comments – comments that always get dugg. These would help craft digg-able content… and write spambots?
  • Controversial issues – phrases and topics that get both a lot of diggs and a lot of buries (see the quick sketch after this list).
  • Comments that are worthy of their own article(s).
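
As a quick illustration of the second item, one crude way to score “controversial” comments – purely a hypothetical heuristic, not necessarily the metric I’ll actually use – is the smaller of the two counts:

```python
# Hypothetical controversy heuristic: a comment is controversial when it has
# *both* many diggs and many buries, so score it by the smaller of the two.
def controversy_score(comment):
    return min(comment["diggs"], comment["buries"])

# e.g. the ten most controversial comments of the day:
# top_ten = sorted(comments, key=controversy_score, reverse=True)[:10]
```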

P.S.
When I’m done with that, there are some WordPress plugin ideas I’d like to implement. And maybe I’ll get a VPS for my DeviantArt recommendation engine.

2 Responses to “Digg : Some Useless Comment Stats”

  1. Lars says:

    Interesting idea. Isn’t that script taking up a lot of bandwidth? I love useless stats like that. Sometimes I feel like I spend more time writing the code that shows me stats on the projects I work on than on the actual script…

    Is that just me?! 🙂

    Looking forward to hearing more about the side project, or to when you get that DA recommendation engine up and running 🙂

  2. White Shadow says:

    Yes, it takes a lot of bandwidth. In fact, I still haven’t finished downloading the data. Might take a few more days.

    That particular side project will likely remain secret 😛 As for the recommendation engine, I need to find a way to significantly optimize it before I can put it online. With only 156 200 favorites stored, it took around 18 minutes to pre-compute the needed correlations (AKA “links” between deviations, of which there are now 5 441 350). That might not seem too long, but there are currently only 2-3 active users (44 000 being analyzed in total), and I’m afraid the time would grow exponentially as more are added. That is, until I index all of DA’s users, if that ever happens 🙂
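
    For the curious, the pre-computation boils down to something like the rough sketch below. It assumes favorites are available as (user, deviation) pairs, and plain co-occurrence counting stands in for whatever correlation measure the engine really uses.

```python
# Rough sketch of the "links" pre-computation, assuming favorites are stored
# as (user, deviation_id) pairs. Co-occurrence counting is only a stand-in
# for the engine's actual correlation measure.
from collections import defaultdict
from itertools import combinations

def precompute_links(favorites):
    """Count how often each pair of deviations is favorited by the same user."""
    faves_by_user = defaultdict(set)
    for user, deviation in favorites:
        faves_by_user[user].add(deviation)

    links = defaultdict(int)  # (deviation_a, deviation_b) -> co-favorite count
    for deviations in faves_by_user.values():
        for a, b in combinations(sorted(deviations), 2):
            links[(a, b)] += 1
    return links
```

    Only pairs that are actually co-favorited end up in the table, so the number of stored links stays far below the count of all possible deviation pairs.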

    I hear ServInt is good for VPS, hmm….
