Scrape Google Blog Search With PHP
I’m currently lacking real “bloggable” ideas, so here’s something simple and hopefully useful – a PHP script to get the blog search results from Google. The script is provided strictly for educational purposes, blah blah blah.
And by the way, if you only need the top X results, it would be simpler to use Blog Search’s RSS feed and parse it with something like MagpieRSS. Or just a regexp.
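If the RSS route is enough, PHP’s built-in SimpleXML extension can stand in for MagpieRSS. This is a swapped-in technique, not part of the script below. It is a minimal sketch that assumes a standard RSS 2.0 document (`<rss><channel><item>`). The helper name `parseRssItems` is hypothetical, and fetching the feed is left out, so pass in the XML string however you retrieve it.

```php
<?php
/**
 * Hypothetical helper: extract title/link/description from an RSS 2.0 string.
 * Assumes the standard <rss><channel><item> layout; returns [] on malformed XML.
 */
function parseRssItems(string $xml, int $limit = 10): array
{
    // Collect libxml errors internally instead of emitting warnings.
    libxml_use_internal_errors(true);
    $feed = simplexml_load_string($xml);
    libxml_clear_errors();
    if ($feed === false || !isset($feed->channel->item)) {
        return [];
    }

    $items = [];
    foreach ($feed->channel->item as $item) {
        if (count($items) >= $limit) {
            break; // keep only the top $limit entries
        }
        $items[] = [
            'title'       => trim((string) $item->title),
            'url'         => trim((string) $item->link),
            'description' => trim((string) $item->description),
        ];
    }
    return $items;
}

// Usage with an inline sample feed (made-up data, not real Google output):
$sampleFeed = <<<XML
<?xml version="1.0"?>
<rss version="2.0"><channel><title>Sample</title>
<item><title>First post</title><link>http://example.com/1</link><description>One</description></item>
<item><title>Second post</title><link>http://example.com/2</link><description>Two</description></item>
</channel></rss>
XML;

print_r(parseRssItems($sampleFeed, 1));
```

Going through an XML parser rather than a regexp means attribute order, whitespace and CDATA sections in the feed don’t break the extraction.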
Here’s the script:
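```php
// Requires an HTTP client class, eHttpClient (not included in this post), whose
// get($url) method returns the page HTML as a string.
// Returns false if the page could not be fetched, or an array of results
// (empty if nothing matched).
function googleBlogsearch($keyword, $page = 1, $per_page = 10, $sort_by_date = false)
{
    // Build the query URL
    $url = 'http://www.google.com/blogsearch?hl=en&ie=UTF-8&sa=N&q=' . urlencode($keyword);
    if ($per_page != 10) $url .= "&num=$per_page";                  // results per page
    if ($page != 1) $url .= "&start=" . (($page - 1) * $per_page);  // zero-based offset
    if ($sort_by_date) $url .= '&scoring=d';                       // sort by date

    // Fetch the results page
    $hc = new eHttpClient;
    $html = $hc->get($url);
    if (empty($html)) return false;

    // Optional clean-up step, used only if a normalizeHtml() function is defined
    if (function_exists('normalizeHtml')) {
        $html = normalizeHtml($html);
    }

    // Each result looks like: <a href="URL" id="p-RANK">TITLE</a> ... <br>DESCRIPTION<br>
    $results = array();
    if (!preg_match_all('/<a href="([^"]+)" id="p-(\d+)">(.+?)<\/a>.+?<br>(.+?)<br>/si',
            $html, $matches, PREG_SET_ORDER)) {
        return $results;
    }
    foreach ($matches as $match) {
        $results[] = array(
            'rank'        => $match[2],
            'url'         => $match[1],
            'title'       => trim($match[3]),
            'description' => trim($match[4]),
        );
    }
    return $results;
}
```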
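The function does three things: it builds the query URL, fetches the page, and pulls the results out of the HTML with a regular expression. Splitting the first and last steps into their own functions makes them testable without network access or `eHttpClient`. The sketch below reuses the script’s query parameters and regex unchanged. The function names `buildBlogsearchUrl` and `parseBlogsearchResults` are hypothetical, and the sample HTML is invented to match the pattern, not captured from Google.

```php
<?php
/**
 * Hypothetical helper: build the query URL with the same parameters as
 * googleBlogsearch(). $base defaults to the script's original endpoint.
 */
function buildBlogsearchUrl(
    string $keyword,
    int $page = 1,
    int $perPage = 10,
    bool $sortByDate = false,
    string $base = 'http://www.google.com/blogsearch'
): string {
    $params = ['hl' => 'en', 'ie' => 'UTF-8', 'sa' => 'N', 'q' => $keyword];
    if ($perPage != 10) {
        $params['num'] = $perPage;
    }
    if ($page != 1) {
        // Zero-based result offset: page 3 at 10 per page starts at result 20.
        $params['start'] = ($page - 1) * $perPage;
    }
    if ($sortByDate) {
        $params['scoring'] = 'd';
    }
    // http_build_query encodes values like urlencode() (spaces become '+').
    return $base . '?' . http_build_query($params, '', '&');
}

/**
 * Hypothetical helper: apply the script's regex to an HTML string.
 * Unlike the original, 'rank' is cast to int here.
 */
function parseBlogsearchResults(string $html): array
{
    $pattern = '/<a href="([^"]+)" id="p-(\d+)">(.+?)<\/a>.+?<br>(.+?)<br>/si';
    $results = [];
    if (!preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        return $results;
    }
    foreach ($matches as $m) {
        $results[] = [
            'rank'        => (int) $m[2],
            'url'         => $m[1],
            'title'       => trim($m[3]),
            'description' => trim($m[4]),
        ];
    }
    return $results;
}

// Invented markup shaped to fit the regex (not real Google output):
$sampleHtml = '<a href="http://example.com/a" id="p-1">First post</a> - Blog A<br>Summary one<br>'
    . '<a href="http://example.com/b" id="p-2">Second post</a> - Blog B<br>Summary two<br>';

echo buildBlogsearchUrl('php scraper', 3), "\n";
// http://www.google.com/blogsearch?hl=en&ie=UTF-8&sa=N&q=php+scraper&start=20
print_r(parseBlogsearchResults($sampleHtml));
```

Because the endpoint is a parameter, pointing the builder at a different host only means passing another `$base`. The parser stays the same as long as the result markup still fits the regex.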
Have fun 🙂
hmm. funny..
Google has updated their Blog Search URL; the script will work if you change it to:
blogsearch.google.com
Hope this saves someone else a few mins!
API no longer available.