Get Google Image Search Results With PHP

Google Image Search doesn’t get as much time in the spotlight as the “normal” Web Search, but it’s still useful for things like finding suitable illustrations for an article (Flickr also comes to mind). Whatever you use it for, you can often get results faster with a bit of automation. So here’s a simple PHP script that can parse and return the results of any Image Search query. It’s strictly for educational purposes though, as actually using it would probably constitute a violation of Google’s ToS 😉

PHP Script for Google Image Search

Note that this script requires the eHttpClient cURL class by 5ubliminal.

function googleImageResults($query, $page=1, $safe='off', $dc="images.google.com"){
    $page--;
    $perpage = 21;
    $url=sprintf("http://%s/images?q=%s&gbv=2&start=%d&hl=en&ie=UTF-8&safe=%s&sa=N",
        $dc,urlencode($query),$page*$perpage,$safe);

    $hc=new eHttpClient();
    $hc->setReferer("http://".$dc."/");
    $html=$hc->get($url);
    $code = $hc->getInfo(CURLINFO_HTTP_CODE);
    if ($code != '200') return false;

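    // Each result is embedded in the page as a dyn.Img(...) call; the dot is escaped so it only matches a literal "."
    if(!preg_match_all('/dyn\.Img\((.+)\);/Uis', $html, $matches, PREG_SET_ORDER))
        return array();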
    $results=array();
    foreach($matches as $match){
        if(!preg_match_all( '/"([^"]*)",/i', $match[1], $parts)) continue;

        if(!preg_match('/(.+?)&h=(\d+)&w=(\d+)&sz=(\d+)&hl=[^&]*&start=(\d+)(?:.*)/',
            $parts[1][0], $url_parts)) continue;
        $refUrl = urldecode($url_parts[1]);
        $height = intval($url_parts[2]);
        $width = intval($url_parts[3]);
        $rank = intval($url_parts[5]);
        //check if we've already passed the last page of results
        if($rank < ($page * $perpage + 1)) break;
        $imgUrl = urldecode($parts[1][3]);
        $refDomain = $parts[1][11];
        $imgText = $parts[1][6];
        $imgText = preg_replace('/\\\x(\w\w)/', '&#x\1;', $imgText);
        $imgText = strip_tags(html_entity_decode($imgText));
        $thumbUrl = $parts[1][14].'?q=tbn:'.$parts[1][2].$imgUrl;

        $one_result=array(
            'Rank' => $rank,
            'RefUrl' => $refUrl,
            'ImgText' => $imgText,
            'ImgUrl' => $imgUrl,
            'Height' => $height,
            'Width' => $width,
            'Host' => $refDomain,
            'ThumbUrl' => $thumbUrl,
        );
        array_push($results,$one_result);
    }
    return $results;
}
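
The two ImgText clean-up lines in the function are easy to misread, so here is a minimal sketch of what they appear to do: convert \xNN escapes into HTML entities, decode them, and strip any tags that result. The helper name decodeImageText and the sample string are made up for illustration, and the pattern is narrowed to hex digits, which is slightly stricter than the \w\w used above.

```php
<?php
// Hypothetical helper mirroring the two ImgText clean-up lines in googleImageResults().
function decodeImageText($raw) {
    // Turn JavaScript-style hex escapes such as \x3c into HTML numeric entities (&#x3c;).
    $withEntities = preg_replace('/\\\\x([0-9a-fA-F]{2})/', '&#x$1;', $raw);
    // Decode the entities into real characters, then drop any markup they formed.
    return strip_tags(html_entity_decode($withEntities, ENT_QUOTES, 'UTF-8'));
}

// Made-up sample input; single quotes keep the backslashes literal.
$sample = '\x3cb\x3eHeadcrab\x3c/b\x3e-Chappe';
echo decodeImageText($sample), "\n"; // prints "Headcrab-Chappe"
```

Strings without any escapes pass through unchanged, apart from having HTML tags and entities removed.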

How To Use It

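The parameters are mostly self-explanatory: $query is the search phrase, $page is the 1-based page of results, $safe is passed through as the safe parameter of the search URL (the default is 'off'), and $dc is the Google host to query. The function returns an array of results if it’s successful, or an empty array if there are no results for the query. It returns a boolean false if the server responds with any HTTP status other than 200 (e.g. a “403 Forbidden” result).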

Here’s an example –

$results = googleImageResults('headcrab', 1);
print_r($results);

The output looks something like this –

Array
(
    [0] => Array
        (
            [Rank] => 1
            [RefUrl] => http://bjoern.amherd.net/2006/12/15/headcrab-chappe/
            [ImgText] => Headcrab-Chappe
            [ImgUrl] => http://bjoern.amherd.net/wp-content/uploads/2006/12/headcrab.jpg
            [Height] => 297
            [Width] => 450
            [Host] => bjoern.amherd.net
            [ThumbUrl] => http://tbn0.google.com/images?q=tbn:2drTKLkzK4KZQM:http://bjoern.amherd.net/wp-content/uploads/2006/12/headcrab.jpg
        )

    [1] => Array
        (
            [Rank] => 2
            [RefUrl] => http://www.penny-arcade.com/comic/2004/11/15
            [ImgText] => The Common Headcrab
            [ImgUrl] => http://www.penny-arcade.com/images/2004/20041115h.jpg
            [Height] => 423
            [Width] => 750
            [Host] => www.penny-arcade.com
            [ThumbUrl] => http://tbn0.google.com/images?q=tbn:A2d3zKEpYJe0FM:http://www.penny-arcade.com/images/2004/20041115h.jpg
        )
   ......
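
Since the function can hand back false, an empty array, or a list of results, callers need a strict comparison to tell the first two apart. The following is only a sketch: describeImageSearchOutcome is a hypothetical helper, and it is fed stand-in values rather than live search results.

```php
<?php
// Hypothetical helper: summarise whatever googleImageResults() returned.
function describeImageSearchOutcome($results) {
    // Strict comparison matters here: an empty array is also "falsy" in PHP.
    if ($results === false) {
        return 'request failed';
    }
    if (count($results) === 0) {
        return 'no results';
    }
    return sprintf('%d result(s), first image: %s', count($results), $results[0]['ImgUrl']);
}

// Stand-in values instead of live requests.
echo describeImageSearchOutcome(false), "\n";
echo describeImageSearchOutcome(array()), "\n";
echo describeImageSearchOutcome(array(array('ImgUrl' => 'http://example.com/a.jpg'))), "\n";
```

A loose check such as if (!$results) would lump a failed request together with a query that simply has no matches.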

Notes And Caveats

  • The original eHttpClient class tends to throw some warnings due to omitted function parameters, so I use a slightly modified version in my own projects. Fixing the class is left as an exercise for the reader 😛
  • As far as I know, you can’t specify the number of results per page for Image Search. It always returns 21 images, even though it only displays 18 of those to human visitors. Weird.
  • If you send a lot of queries in a short time you will get a temporary ban. Theoretically you could overcome this by using proxies and/or appropriate timeouts between searches – that is, if you could bring yourself to commit such an insidious breach of the Terms of Service, which I’m not advocating in any way.
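
Because every page holds a fixed 21 images, the start offset and the range of Rank values for a given page follow directly from the page number. A small sketch of that arithmetic, using a hypothetical helper named imageSearchPageRange that is not part of the script above:

```php
<?php
// Hypothetical helper: the range of 'Rank' values expected on a 1-based page.
function imageSearchPageRange($page, $perPage = 21) {
    $start = ($page - 1) * $perPage; // value sent as the "start" URL parameter
    return array(
        'start'     => $start,
        'firstRank' => $start + 1,
        'lastRank'  => $start + $perPage,
    );
}

$range = imageSearchPageRange(3);
printf("start=%d, ranks %d-%d\n", $range['start'], $range['firstRank'], $range['lastRank']);
// prints "start=42, ranks 43-63"
```

This is the same calculation the script relies on when it stops parsing as soon as a result's rank is lower than the first rank expected for the requested page.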

Disclaimer

Image search code provided AS-IS, with no warranty of anything. And so on. Good luck.


45 Responses to “Get Google Image Search Results With PHP”

  1. flowmymo says:

    forgot to mention that the “fixed” version returns the links to the google thumbs not the original image.

  2. ahmet says:

    The code returns an empty array without an error. Is that code still valid?

  3. Ebbot says:

    The fixed version obviously has some coding errors.
    -while($i1), when $i1 is not defined.
    -the curly brackets are mismatched.

    Can you please post a version that is working?

  4. luis says:

    actually not working!!!! please fix it!!!

  5. d3bug says:

    Your code above, posted on April 18, 2011, has an extra curly bracket, and you are clearly returning the $matches outside of the function, which is fine, but this code implies that there is yet another function that encapsulates this. There is a variable in the for loop which doesn’t even exist. Lastly, I am sure you did this to stump the newbs, but the query variable is also completely gone.

    Below is the fixed version but still not sure why the results are returned OUTSIDE of the googleImageResults function. Can you repost the actually working code so I can verify what I have found?
