Get Google Image Search Results With PHP

Google Image Search doesn’t get as much time in the spotlight as the “normal” Web Search, but it’s still useful for things like finding suitable illustrations for an article (Flickr also comes to mind). Whatever you use it for, you can often get results faster with a bit of automation. So here’s a simple PHP script that can parse and return the results of any Image Search query. It’s strictly for educational purposes though, as actually using it would probably violate Google’s ToS ;)

PHP Script for Google Image Search

Note that this script requires the eHttpClient cURL class by 5ubliminal.

function googleImageResults($query, $page=1, $safe='off', $dc="images.google.com"){
    $page--;
    $perpage = 21;
    $url=sprintf("http://%s/images?q=%s&gbv=2&start=%d&hl=en&ie=UTF-8&safe=%s&sa=N",
        $dc,urlencode($query),$page*$perpage,$safe);

    $hc=new eHttpClient();
    $hc->setReferer("http://".$dc."/");
    $html=$hc->get($url);
    $code = $hc->getInfo(CURLINFO_HTTP_CODE);
    if ($code != '200') return false;

    if(!preg_match_all('/dyn\.Img\((.+)\);/Uis', $html, $matches, PREG_SET_ORDER))
        return array();
    $results=array();
    foreach($matches as $match){
        if(!preg_match_all( '/"([^"]*)",/i', $match[1], $parts)) continue;

        if(!preg_match('/(.+?)&h=(\d+)&w=(\d+)&sz=(\d+)&hl=[^&]*&start=(\d+)(?:.*)/',
            $parts[1][0], $url_parts)
        ) continue;
        $refUrl = urldecode($url_parts[1]);
        $height = intval($url_parts[2]);
        $width = intval($url_parts[3]);
        $rank = intval($url_parts[5]);
        //check if we've already passed the last page of results
        if($rank < ($page * $perpage + 1)) break;
        $imgUrl = urldecode($parts[1][3]);
        $refDomain = $parts[1][11];
        $imgText = $parts[1][6];
        $imgText = preg_replace('/\\\x(\w\w)/', '&#x\1;', $imgText);
        $imgText = strip_tags(html_entity_decode($imgText));
        $thumbUrl = $parts[1][14].'?q=tbn:'.$parts[1][2].$imgUrl;

        $one_result=array(
            'Rank' => $rank,
            'RefUrl' => $refUrl,
            'ImgText' => $imgText,
            'ImgUrl' => $imgUrl,
            'Height' => $height,
            'Width' => $width,
            'Host' => $refDomain,
            'ThumbUrl' => $thumbUrl,
        );
        array_push($results,$one_result);
    }
    return $results;
}
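
The title cleanup inside the loop is probably the least obvious part, so here is a minimal standalone sketch of just that step. The helper name cleanImageTitle and the sample string are made up for illustration; the two transformations are the same ones the function applies to $imgText.

```php
// Hypothetical helper, not part of the original script: it isolates the
// two lines that clean up $imgText in googleImageResults().
function cleanImageTitle($raw) {
    // Turn JavaScript-style escapes such as \x3c into HTML entities (&#x3c;).
    $text = preg_replace('/\\\x(\w\w)/', '&#x\1;', $raw);
    // Decode the entities, then drop any tags that appear as a result.
    return strip_tags(html_entity_decode($text));
}

// Made-up sample input; real titles come from the page source.
echo cleanImageTitle('\x3cb\x3eHeadcrab\x3c/b\x3e plush'), "\n"; // prints "Headcrab plush"
```

Note the single quotes around the sample: they keep \x3c as a literal backslash sequence, which is what the regular expression expects to find.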

How To Use It

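The parameters are mostly self-explanatory: $query is the search phrase, $page is the 1-based results page, $safe is passed straight into the safe parameter of the request URL, and $dc is the Google host to query. The function returns an array of results if it’s successful, or an empty array if there are no results for the query. It returns a boolean false if the HTTP response code is anything other than 200 (e.g. a “403 Forbidden” result).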
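
To see how the parameters end up in the request, the sketch below repeats the URL-building lines of the function in a separate helper. The name buildImageSearchUrl is hypothetical and exists only for this illustration; it makes no request, it just shows which URL would be fetched.

```php
// Hypothetical helper that repeats the URL-building lines of
// googleImageResults() so the parameter mapping can be inspected offline.
function buildImageSearchUrl($query, $page = 1, $safe = 'off', $dc = 'images.google.com') {
    $perpage = 21;                    // results per page, fixed by the script
    $start = ($page - 1) * $perpage;  // page 1 -> 0, page 2 -> 21, ...
    return sprintf(
        'http://%s/images?q=%s&gbv=2&start=%d&hl=en&ie=UTF-8&safe=%s&sa=N',
        $dc, urlencode($query), $start, $safe
    );
}

echo buildImageSearchUrl('half life', 2), "\n";
// prints http://images.google.com/images?q=half+life&gbv=2&start=21&hl=en&ie=UTF-8&safe=off&sa=N
```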

Here’s an example –

$results = googleImageResults('headcrab', 1);
print_r($results);

The output looks something like this –

Array
(
    [0] => Array
        (
            [Rank] => 1
            [RefUrl] => http://bjoern.amherd.net/2006/12/15/headcrab-chappe/
            [ImgText] => Headcrab-Chappe
            [ImgUrl] => http://bjoern.amherd.net/wp-content/uploads/2006/12/headcrab.jpg
            [Height] => 297
            [Width] => 450
            [Host] => bjoern.amherd.net
            [ThumbUrl] => http://tbn0.google.com/images?q=tbn:2drTKLkzK4KZQM:http://bjoern.amherd.net/wp-content/uploads/2006/12/headcrab.jpg
        )

    [1] => Array
        (
            [Rank] => 2
            [RefUrl] => http://www.penny-arcade.com/comic/2004/11/15
            [ImgText] => The Common Headcrab
            [ImgUrl] => http://www.penny-arcade.com/images/2004/20041115h.jpg
            [Height] => 423
            [Width] => 750
            [Host] => www.penny-arcade.com
            [ThumbUrl] => http://tbn0.google.com/images?q=tbn:A2d3zKEpYJe0FM:http://www.penny-arcade.com/images/2004/20041115h.jpg
        )
   ......
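
Since the function can hand back three different things (a filled array, an empty array, or false), calling code should tell them apart explicitly. Here is one possible way to do that, as a rough sketch. The helper describeImageResults is hypothetical, and the sample data is copied from the output above so that nothing is actually requested.

```php
// Hypothetical helper: summarises whatever googleImageResults() returned.
function describeImageResults($results) {
    // Strict comparison matters: an empty array is also "falsy" in PHP.
    if ($results === false) {
        return 'Request failed (non-200 response).';
    }
    if (count($results) === 0) {
        return 'No results for this query.';
    }
    $lines = array();
    foreach ($results as $r) {
        $lines[] = sprintf('#%d %s (%dx%d)',
            $r['Rank'], $r['ImgUrl'], $r['Width'], $r['Height']);
    }
    return implode("\n", $lines);
}

// Stand-in data copied from the sample output, so no request is made here.
$sample = array(array(
    'Rank' => 1,
    'ImgUrl' => 'http://bjoern.amherd.net/wp-content/uploads/2006/12/headcrab.jpg',
    'Width' => 450,
    'Height' => 297,
));
echo describeImageResults($sample), "\n";
echo describeImageResults(array()), "\n";
echo describeImageResults(false), "\n";
```

A plain if(!$results) check would lump "no results" and "request failed" together, which is why the sketch compares against false with ===.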

Notes And Caveats

  • The original eHttpClient class tends to throw some warnings due to omitted function parameters, so I use a slightly modified version in my own projects. Fixing the class is left as an exercise to the reader :P
  • As far as I know, you can’t specify the number of results per page for Image Search. It always returns 21 images, even though it only displays 18 of those to human visitors. Weird.
  • If you send a lot of queries in a short time you will get a temporary ban. Theoretically you could overcome this by using proxies and/or appropriate timeouts between searches. That is, if you could bring yourself to commit such an insidious breach of the Terms of Service, which I’m not advocating in any way.
  • Whatever you do, don’t do this.

Disclaimer

Image search code provided AS-IS, with no warranty of anything. And so on. Good luck.

44 Responses to “Get Google Image Search Results With PHP”

  1. flowmymo says:

    forgot to mention that the “fixed” version returns the links to the google thumbs not the original image.

  2. ahmet says:

    The code returns an empty array without an error. Is that code still valid?

  3. Ebbot says:

    The fixed version obviously has some coding errors.
    -while($i1), when $i1 is not defined.
    -the curly brackets are mismatched.

    Can you please post a version that is working?

  4. luis says:

    actually not working!!!! please fix it!!!
