Extracting The Main Content From a Webpage

January 25th, 2008

I’ve created a PHP class that can extract the main content from an HTML page, stripping away superfluous components like JavaScript blocks, menus, advertisements and so on. The script isn’t 100% effective, but it is good enough for many practical purposes. It can also serve as a starting point for more complex systems. […] Continue Reading…


(Im)Practical Voice Commands

January 17th, 2008

A few days ago I saw an IRC quote that went basically like this:
When voice control interfaces finally go mainstream, the very first thing they need to do is make “Oh fuck!!!” immediately abort and undo the latest command or task.

Sounds fun. Below is my lame attempt […] Continue Reading…


Be Unique. Or Else.

January 14th, 2008

Randall Munroe, of XKCD fame, just posted a blag entry (sic) about an interesting way to ensure that IRC discussions remain unique and thoughtful. The idea itself is very simple, and I sure did get the slightly annoying feeling of “damn-this-is-obvious” when reading the post. The trick […] Continue Reading…


How To Highlight Nofollow With Opera & More SEO Tools

January 9th, 2008

A review of a free Opera plugin that can highlight nofollow links, display Alexa rank and PageRank, show the number of backlinks that a page or a domain has, and more. Basically it is a full-featured SEO toolbar for the Opera web browser. […] Continue Reading…


Why Nofollow Didn’t Work – A Different Perspective

January 7th, 2008

It is no secret that the introduction of rel=nofollow completely failed to stop, or even decrease, link spam on blogs, forums and similar sites. The possible reasons, and the negative effects of nofollow on legitimate users, have been discussed to death, but I still feel there is one aspect of the problem that has been overlooked. […] Continue Reading…