Get Google Image Search Results With PHP

Google Image Search doesn’t get as much time in the spotlight as the “normal” Web Search, but it’s still useful for things like finding suitable illustrations for an article (Flickr also comes to mind). Whatever you use it for, you can often get results faster with a bit of automation. So here’s a simple PHP script that can parse and return the results of any Image Search query. It’s strictly for educational purposes though, as actually using it would probably constitute a violation of Google’s ToS 😉

PHP Script for Google Image Search

Note that this script requires the eHttpClient cURL class by 5ubliminal.

function googleImageResults($query, $page=1, $safe='off', $dc="images.google.com"){
    $page--;
    $perpage = 21;
    $url=sprintf("http://%s/images?q=%s&gbv=2&start=%d&hl=en&ie=UTF-8&safe=%s&sa=N",
        $dc,urlencode($query),$page*$perpage,$safe);

    $hc=new eHttpClient();
    $hc->setReferer("http://".$dc."/");
    $html=$hc->get($url);
    $code = $hc->getInfo(CURLINFO_HTTP_CODE);
    if ($code != '200') return false;

    // Each result is embedded in the page as a dyn.Img(...) JavaScript call.
    if(!preg_match_all('/dyn\.Img\((.+)\);/Uis', $html, $matches, PREG_SET_ORDER))
        return array();
    $results=array();
    foreach($matches as $match){
        if(!preg_match_all( '/"([^"]*)",/i', $match[1], $parts)) continue;

        if(!preg_match('/(.+?)&h=(\d+)&w=(\d+)&sz=(\d+)&hl=[^&]*&start=(\d+)(?:.*)/',
            $parts[1][0], $url_parts)
        ) continue;
        $refUrl = urldecode($url_parts[1]);
        $height = intval($url_parts[2]);
        $width = intval($url_parts[3]);
        $rank = intval($url_parts[5]);
        //check if we've already passed the last page of results
        if($rank < ($page * $perpage + 1)) break;
        $imgUrl = urldecode($parts[1][3]);
        $refDomain = $parts[1][11];
        $imgText = $parts[1][6];
        $imgText = preg_replace('/\\\x(\w\w)/', '&#x\1;', $imgText);
        $imgText = strip_tags(html_entity_decode($imgText));
        $thumbUrl = $parts[1][14].'?q=tbn:'.$parts[1][2].$imgUrl;

        $one_result=array(
            'Rank' => $rank,
            'RefUrl' => $refUrl,
            'ImgText' => $imgText,
            'ImgUrl' => $imgUrl,
            'Height' => $height,
            'Width' => $width,
            'Host' => $refDomain,
            'ThumbUrl' => $thumbUrl,
        );
        array_push($results,$one_result);
    }
    return $results;
}
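
The caption clean-up near the end of the loop is probably the least obvious part of the function, so here it is in isolation. This is only a sketch: the helper name decodeImageCaption() and the sample string are made up for illustration, and it assumes captions arrive with JavaScript-style \xNN escapes, which is what the regexp in the script implies.

```php
<?php
// Hypothetical helper (not part of the original script): repeats the three
// caption clean-up steps that googleImageResults() applies to $imgText.
function decodeImageCaption($raw){
    // 1. Turn JavaScript-style \xNN escapes into numeric HTML entities,
    //    e.g. \x3c becomes &#x3c;
    $text = preg_replace('/\\\x(\w\w)/', '&#x\1;', $raw);
    // 2. Decode the entities into real characters (&#x3c; becomes <).
    $text = html_entity_decode($text);
    // 3. Drop any HTML tags that the decoding step uncovered.
    return strip_tags($text);
}

// Made-up sample input; single quotes keep the backslashes literal.
echo decodeImageCaption('\x3cb\x3eHeadcrab\x3c/b\x3e plush'), "\n";
```

For that sample the escaped bold tags are decoded and then stripped, leaving just "Headcrab plush".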

How To Use It

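The parameters are mostly self-explanatory: $query is the search phrase, $page is the results page (counting from 1), $safe is the SafeSearch setting that gets passed straight into the URL ('off' by default), and $dc is the Google host to query. The function will return an array of results if it’s successful, or an empty array if there are no results for the query. It will also return a boolean false if the server responds with anything other than HTTP status 200 (e.g. a “403 Forbidden” response).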

Here’s an example –

$results = googleImageResults('headcrab', 1);
print_r($results);

The output looks something like this –

Array
(
    [0] => Array
        (
            [Rank] => 1
            [RefUrl] => http://bjoern.amherd.net/2006/12/15/headcrab-chappe/
            [ImgText] => Headcrab-Chappe
            [ImgUrl] => http://bjoern.amherd.net/wp-content/uploads/2006/12/headcrab.jpg
            [Height] => 297
            [Width] => 450
            [Host] => bjoern.amherd.net
            [ThumbUrl] => http://tbn0.google.com/images?q=tbn:2drTKLkzK4KZQM:http://bjoern.amherd.net/wp-content/uploads/2006/12/headcrab.jpg
        )

    [1] => Array
        (
            [Rank] => 2
            [RefUrl] => http://www.penny-arcade.com/comic/2004/11/15
            [ImgText] => The Common Headcrab
            [ImgUrl] => http://www.penny-arcade.com/images/2004/20041115h.jpg
            [Height] => 423
            [Width] => 750
            [Host] => www.penny-arcade.com
            [ThumbUrl] => http://tbn0.google.com/images?q=tbn:A2d3zKEpYJe0FM:http://www.penny-arcade.com/images/2004/20041115h.jpg
        )
   ......
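
Since the function has three possible outcomes, the calling code should tell them apart before looping over anything. Here is a rough sketch of how that might look. describeImageResults() is a made-up helper name, and the sample data is just the first result from the output above, so the snippet runs without touching Google at all.

```php
<?php
// Hypothetical helper: turns the return value of googleImageResults()
// into a short human-readable report.
function describeImageResults($results){
    // Strict comparison matters here: an empty array is also "falsy".
    if ($results === false) {
        return "Request failed (non-200 HTTP status).";
    }
    if (count($results) == 0) {
        return "No results for this query.";
    }
    $lines = array();
    foreach ($results as $r) {
        $lines[] = sprintf("#%d %s (%dx%d) %s",
            $r['Rank'], $r['ImgText'], $r['Width'], $r['Height'], $r['ImgUrl']);
    }
    return implode("\n", $lines);
}

// Sample data copied from the first result shown above.
$sample = array(
    array(
        'Rank' => 1,
        'ImgText' => 'Headcrab-Chappe',
        'ImgUrl' => 'http://bjoern.amherd.net/wp-content/uploads/2006/12/headcrab.jpg',
        'Height' => 297,
        'Width' => 450,
    ),
);
echo describeImageResults($sample), "\n";
```

In real use you would pass in the return value of googleImageResults() instead of $sample.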

Notes And Caveats

  • The original eHttpClient class tends to throw some warnings due to omitted function parameters, so I use a slightly modified version in my own projects. Fixing the class is left as an exercise to the reader 😛
  • As far as I know, you can’t specify the number of results per page for Image Search. It always returns 21 images, even though it only displays 18 of those to human visitors. Weird.
  • If you send a lot of queries in a short time you will get a temporary ban. Theoretically you could overcome this by using proxies and/or appropriate timeouts between searches – that is, if you could bring yourself to commit such an insidious breach of the Terms of Service, which I’m not advocating in any way.
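
To make the fixed page size from the second note concrete: page N simply starts at offset (N - 1) * 21. The sketch below rebuilds the request URL the same way the script does. imageSearchUrl() is a hypothetical name, and the URL format is the one used in the script above, which Google may well have changed since.

```php
<?php
// Hypothetical helper: builds the same request URL as googleImageResults(),
// without sending anything.
function imageSearchUrl($query, $page=1, $safe='off', $dc='images.google.com'){
    $perpage = 21;                    // fixed by Google, not configurable
    $start = ($page - 1) * $perpage;  // page 1 -> 0, page 2 -> 21, page 3 -> 42
    return sprintf("http://%s/images?q=%s&gbv=2&start=%d&hl=en&ie=UTF-8&safe=%s&sa=N",
        $dc, urlencode($query), $start, $safe);
}

// Third page of results, so the URL contains start=42.
echo imageSearchUrl('headcrab', 3), "\n";
```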

Disclaimer

Image search code provided AS-IS, with no warranty of anything. And so on. Good luck.

45 Responses to “Get Google Image Search Results With PHP”

  1. abedi98 says:

    hi , i can not use google api , JUST PHP + CURL 😀

    New Warning :

    Warning: Supplied argument is not a valid resource handle in C:\xampp\webdav\xampp\google image\httpclient.php on line 207

    Warning: Supplied argument is not a valid resource handle in C:\xampp\webdav\xampp\google image\httpclient.php on line 210

    Warning: Supplied argument is not a valid resource handle in C:\xampp\webdav\xampp\google image\httpclient.php on line 118
    Array ( )

  2. abedi98 says:

    this is my source ( function + class )

    link : http://www.doctorsina.com/google_image.zip

  3. abedi98 says:

    hi , plz check this file : http://www.doctorsina.com/google_image.zip

    (your function + httpclient Class)

  4. abedi98 says:

    alooooooooooooooooooooooooooooooooooooo plz check this file

  5. cadoad says:

    humm did this script stopped working ?
    it was working till a couple weeks ago i believe 🙂

  6. White Shadow says:

    Eh, Google probably changed their results HTML again and that made the script break. You could probably fix it by tweaking the regexps.

  7. Pix says:

    Hi all!
    sorry for my english and for my stupid question…

    where is the variable where i put the words to find?

    Pix

  8. madi says:

    excuse me. i cant download the eHttpClient cURL class file, please give a proper link i realy need this code.
    wating for a +ve relpy.

  9. White Shadow says:

    Looks like the site that hosted it is down. Here’s a backup.

  10. Almas says:

    Any have working script now? Google again changed html pattern?!
    Plz reply 🙂

  11. Shane Rutter says:

    Can some people update this script, google has changed there site again and it has stopped working.

  12. abdel says:

    Please i want this script but he does’t work, i have juste

    Array ( )

  13. thanks for the great info

  14. abdel says:

    Are you going to solve the problem?

  15. Ankit Shah says:

    $matches is not defined i got error & blank array
    It returns same
    Array ( )

  16. SZA says:

    this script is not working?

  17. flowmymo says:

    fixed version, but less options:

    function googleImageResults($query, $page=1, $safe='off', $dc="images.google.com"){
    $page--;
    $perpage = 21;
    $url=sprintf("http://%s/images?q=%s&gbv=2&start=%d&hl=en&ie=UTF-8&safe=%s&sa=N",
    $dc,urlencode($query),$page*$perpage,$safe);

    $hc=new eHttpClient();
    $hc->setReferer("http://".$dc."/");
    $html=$hc->get($url);
    $code = $hc->getInfo(CURLINFO_HTTP_CODE);
    if ($code != '200') return false;
    $i=0;
    while($i1){
    $image_url_1=explode(““,$split[1]);
    $image_url_2=explode('"',$image_url_1[0]);
    $height=$image_url_2[2];
    $width=$image_url_2[4];
    $matches[]=array("http://t$i.gstatic".$image_url_2[0],$height,$width);
    }
    $i+=1;
    }
    $results=$matches;
    return $results;
    }

  18. Gokul says:

    It displays error Fatal error: Class ‘eHttpClient’ not found

  19. Gokul says:

    what can i do change in this script

  20. flowmymo says:

    you need the eHttpClient cURL class by 5ubliminal.
    Link:
    http://w-shadow.com/files/ecurl.class.phps

    What do you want to change?
