Scrape Google Blog Search With PHP

I’m currently lacking real “bloggable” ideas, so here’s something simple and hopefully useful – a PHP script to get the blog search results from Google. The script is provided strictly for educational purposes, blah blah blah.

And by the way, if you only need the top X results, it would be simpler to use Blog Search’s RSS feed and parse it with something like MagpieRSS, or just a regexp.
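For the RSS route, here is a minimal sketch that uses PHP's bundled SimpleXML extension rather than MagpieRSS, simply because it needs no extra library. The function name parseRssItems() and the sample feed are made up for illustration, and the code assumes a plain RSS 2.0 layout (channel, then item elements with title and link), so check the real feed before relying on it.

```php
<?php
// Hypothetical helper: pull title/link pairs out of an RSS 2.0 document.
// Returns at most $limit items, or an empty array if the XML can't be parsed.
function parseRssItems($xml, $limit = 10) {
    $feed = @simplexml_load_string($xml);
    if ($feed === false || !isset($feed->channel->item)) {
        return array();
    }

    $items = array();
    foreach ($feed->channel->item as $item) {
        if (count($items) >= $limit) break;
        $items[] = array(
            'title' => trim((string) $item->title),
            'url'   => trim((string) $item->link),
        );
    }
    return $items;
}

// Made-up sample feed, standing in for whatever the real feed returns.
$sample = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item><title>First post</title><link>http://example.com/1</link></item>
    <item><title>Second post</title><link>http://example.com/2</link></item>
  </channel>
</rss>
XML;

$items = parseRssItems($sample, 1);
```

With a limit of 1, $items holds only the first entry of the sample feed. In real use you would fetch the feed body first and pass it in as the $xml string.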

Here’s the script:

function googleBlogsearch( $keyword, $page=1, $per_page=10, $sort_by_date=false ){
	$url = 'http://www.google.com/blogsearch?hl=en&ie=UTF-8&sa=N&q='.urlencode($keyword);
	if ($per_page!=10) $url.="&num=$per_page";
	if ($page!=1) $url.="&start=".(($page-1)*$per_page);
	if ($sort_by_date) $url.='&scoring=d'; 
	
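	// eHttpClient is not a PHP built-in; it is assumed to be an HTTP helper
	// class defined elsewhere, whose get() returns the response body.
	$hc = new eHttpClient;
	$html = $hc->get($url);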
	if (empty($html)) return false;
	if (function_exists('normalizeHtml')){
		$html = normalizeHtml($html);
	}
		
	$results = array();
	if (!preg_match_all('/<a href="([^"]+)" id="p-(\d+)">(.+?)<\/a>.+?<br>(.+?)<br>/si', 
		$html, $matches, PREG_SET_ORDER)
	) return $results;
	
	foreach($matches as $match){
		$results[] = array(
			'rank' => $match[2],
			'url' => $match[1],
			'title' => trim($match[3]), 
			'description' => trim($match[4])
		);
	}
	
	return $results;	
}
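To see what the pattern captures without hitting the network, here is a small sketch that runs the same regexp (with the character class written as [^"]) against a hand-written snippet. The sample markup is made up to match what the pattern expects; it is not real Google output, and the live page may look different.

```php
<?php
// Made-up markup shaped the way the pattern expects: a link with id="p-N",
// some filler, then the description between two <br> tags.
$html = '<a href="http://example.com/post" id="p-1">Example <b>title</b></a>'
      . '<font size="-1">1 hour ago</font><br>Some snippet text<br>';

$pattern = '/<a href="([^"]+)" id="p-(\d+)">(.+?)<\/a>.+?<br>(.+?)<br>/si';
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$results = array();
foreach ($matches as $match) {
    $results[] = array(
        'rank'        => $match[2],               // digits from id="p-N"
        'url'         => $match[1],               // href value
        'title'       => trim($match[3]),         // may still contain HTML tags
        'description' => trim($match[4]),
    );
}

// The title is captured as raw HTML; strip_tags() gives plain text if needed.
$plainTitle = strip_tags($results[0]['title']);
```

Note that the rank comes back as a string, and the title keeps any inline tags such as <b>, so run it through strip_tags() if you want plain text.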

Have fun 🙂

3 Responses to “Scrape Google Blog Search With PHP”

  1. Anterse says:

    hmm. funny..

  2. Interested Party says:

    Google has updated their blogsearch URL – this will work if you update it to:
    blogsearch.google.com

    Hope this saves someone else a few mins!
