Scrape Google Blog Search With PHP

March 28th, 2008

I’m currently lacking real “bloggable” ideas, so here’s something simple and hopefully useful – a PHP script to get the blog search results from Google. The script is provided strictly for educational purposes, blah blah blah. And by the way, if you only need the top X results, it would be simpler to use the […]

Free Tools for Natural Language Processing

March 13th, 2008

I’ve compiled a list of various Python modules and functions that I found most useful in certain Natural Language Processing tasks. For easier skimming, the list is grouped by NLP task, such as tokenization and tagging.

WordPress Version Survey

March 10th, 2008

A while ago I saw the blog version survey at BlogSecurity.net and got the idea to do my own. The previous survey is more than 8 months old and several new WordPress versions have been released since then, so I think a new study is in order 🙂 I collected a large list of WordPress […]

Get Google Image Search Results With PHP

February 28th, 2008

Google Image Search doesn’t get as much time in the spotlight as the “normal” Web Search, but it’s still useful for things like finding suitable illustrations for an article (Flickr also comes to mind). Whatever you use it for, you can often get results faster with a bit of automation. Here’s a simple PHP script that can parse and return the results of any Image Search query. For educational purposes only.

Extracting The Main Content From a Webpage

January 25th, 2008

I’ve created a PHP class that can extract the main content from an HTML page, stripping away superfluous components like JavaScript blocks, menus, advertisements and so on. The script isn’t 100% effective, but it’s good enough for many practical purposes. It can also serve as a starting point for more complex systems.
