Get Google Image Search Results With PHP
Google Image Search doesn’t get as much time in the spotlight as the “normal” Web Search, but it’s still useful for things like finding suitable illustrations for an article (Flickr also comes to mind). Whatever you use it for, you can often get results faster with a bit of automation. So here’s a simple PHP script that can parse and return the results of any Image Search query. It’s strictly for educational purposes though, as actually using it would probably constitute a violation of Google’s ToS.
PHP Script for Google Image Search
Note that this script requires the eHttpClient cURL class by 5ubliminal.
function googleImageResults($query, $page=1, $safe='off', $dc="images.google.com"){
    $page--;
    $perpage = 21;
    $url=sprintf("http://%s/images?q=%s&gbv=2&start=%d&hl=en&ie=UTF-8&safe=%s&sa=N",
        $dc,urlencode($query),$page*$perpage,$safe);

    $hc=new eHttpClient();
    $hc->setReferer("http://".$dc."/");
    $html=$hc->get($url);
    $code = $hc->getInfo(CURLINFO_HTTP_CODE);
    if ($code != '200') return false;

    if(!preg_match_all('/dyn.Img\((.+)\);/Uis', $html, $matches, PREG_SET_ORDER))
        return array();

    $results=array();
    foreach($matches as $match){
        if(!preg_match_all( '/"([^"]*)",/i', $match[1], $parts)) continue;
        if(!preg_match('/(.+?)&h=(\d+)&w=(\d+)&sz=(\d+)&hl=[^&]*&start=(\d+)(?:.*)/',
            $parts[1][0], $url_parts) ) continue;

        $refUrl = urldecode($url_parts[1]);
        $height = intval($url_parts[2]);
        $width = intval($url_parts[3]);
        $rank = intval($url_parts[5]);
        //check if we've already passed the last page of results
        if($rank < ($page * $perpage + 1)) break;

        $imgUrl = urldecode($parts[1][3]);
        $refDomain = $parts[1][11];
        $imgText = $parts[1][6];
        $imgText = preg_replace('/\\\x(\w\w)/', '&#x\1;', $imgText);
        $imgText = strip_tags(html_entity_decode($imgText));
        $thumbUrl = $parts[1][14].'?q=tbn:'.$parts[1][2].$imgUrl;

        $one_result=array(
            'Rank' => $rank,
            'RefUrl' => $refUrl,
            'ImgText' => $imgText,
            'ImgUrl' => $imgUrl,
            'Height' => $height,
            'Width' => $width,
            'Host' => $refDomain,
            'ThumbUrl' => $thumbUrl,
        );
        array_push($results,$one_result);
    }
    return $results;
}
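One step in the function that may not be obvious is the cleanup of the image text. Judging by the regex, the text arrives with characters escaped as \xNN sequences; the script turns those into HTML entities, decodes them, and strips any tags. Here is a minimal standalone sketch of just that step. The helper name decodeImageText and the sample string are made up for illustration, and the pattern is narrowed to hex digits (the original uses \w\w).

```php
<?php
// Hypothetical helper, not part of the original script.
function decodeImageText($raw) {
    // Turn JavaScript-style \xNN escapes into HTML numeric entities,
    // e.g. \x3c becomes &#x3c;
    $text = preg_replace('/\\\\x([0-9a-fA-F]{2})/', '&#x$1;', $raw);
    // Decode the entities into real characters, then drop any HTML tags.
    return strip_tags(html_entity_decode($text, ENT_QUOTES, 'UTF-8'));
}

// Invented sample input: a bolded keyword with escaped angle brackets.
$raw = '\x3cb\x3eHeadcrab\x3c/b\x3e-Chappe';
echo decodeImageText($raw), "\n"; // prints "Headcrab-Chappe"
```

Passing the charset explicitly to html_entity_decode avoids depending on the PHP version's default.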
How To Use It
I think all the parameters are self-explanatory. The function will return an array of results if it’s successful, or an empty array if there are no results for the query. It can also return a boolean false in case of a really bad error (e.g. a “403 Forbidden” result).
Here’s an example –
$results = googleImageResults('headcrab', 1);
print_r($results);
The output looks something like this –
Array
(
    [0] => Array
        (
            [Rank] => 1
            [RefUrl] => http://bjoern.amherd.net/2006/12/15/headcrab-chappe/
            [ImgText] => Headcrab-Chappe
            [ImgUrl] => http://bjoern.amherd.net/wp-content/uploads/2006/12/headcrab.jpg
            [Height] => 297
            [Width] => 450
            [Host] => bjoern.amherd.net
            [ThumbUrl] => http://tbn0.google.com/images?q=tbn:2drTKLkzK4KZQM:http://bjoern.amherd.net/wp-content/uploads/2006/12/headcrab.jpg
        )

    [1] => Array
        (
            [Rank] => 2
            [RefUrl] => http://www.penny-arcade.com/comic/2004/11/15
            [ImgText] => The Common Headcrab
            [ImgUrl] => http://www.penny-arcade.com/images/2004/20041115h.jpg
            [Height] => 423
            [Width] => 750
            [Host] => www.penny-arcade.com
            [ThumbUrl] => http://tbn0.google.com/images?q=tbn:A2d3zKEpYJe0FM:http://www.penny-arcade.com/images/2004/20041115h.jpg
        )
    ......
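Because the function can return either false or an empty array, and both are "falsy" in PHP, calling code probably wants a strict comparison to tell a failed request apart from a query with no hits. A minimal sketch of that check follows; describeSearchOutcome is a hypothetical helper, and the sample row simply reuses values from the output above.

```php
<?php
// Hypothetical helper: summarizes the three possible return values
// of googleImageResults() (false, empty array, array of results).
function describeSearchOutcome($results) {
    // Strict comparison: an empty array is also "falsy", so == would not do.
    if ($results === false) {
        return 'request failed';
    }
    if (count($results) === 0) {
        return 'no results';
    }
    return count($results) . ' result(s), first image: ' . $results[0]['ImgUrl'];
}

// Stand-in for a real return value, trimmed to the fields used here.
$sample = array(
    array(
        'Rank'   => 1,
        'ImgUrl' => 'http://bjoern.amherd.net/wp-content/uploads/2006/12/headcrab.jpg',
    ),
);

echo describeSearchOutcome(false), "\n";   // prints "request failed"
echo describeSearchOutcome(array()), "\n"; // prints "no results"
echo describeSearchOutcome($sample), "\n";
```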
Notes And Caveats
- The original eHttpClient class tends to throw some warnings due to omitted function parameters, so I use a slightly modified version in my own projects. Fixing the class is left as an exercise to the reader.
- As far as I know, you can’t specify the number of results per page for Image Search. It always returns 21 images, even though it only displays 18 of those to human visitors. Weird.
- If you send a lot of queries in a short time, you will get a temporary ban. Theoretically you could overcome this by using proxies and/or appropriate timeouts between searches – that is, if you could bring yourself to commit such an insidious breach of the Terms of Service, which I’m not advocating in any way.
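Since the page size is fixed at 21, the start parameter and the expected rank of the first result follow directly from the page number. Here is a small sketch of that arithmetic. The helper name imageSearchStartOffset is hypothetical and not part of the original script, and the clamping of invalid page numbers is an addition for illustration.

```php
<?php
// Hypothetical helper, not part of the original script.
// Converts a 1-based page number into the zero-based "start" offset,
// assuming the fixed page size of 21 described above.
function imageSearchStartOffset($page, $perPage = 21) {
    // Clamp to page 1 so a zero or negative page never yields a negative
    // offset (the original function does not do this).
    $page = max(1, (int)$page);
    return ($page - 1) * $perPage;
}

foreach (array(1, 2, 3) as $page) {
    $start = imageSearchStartOffset($page);
    // The first result on a page should have rank $start + 1; the original
    // function stops parsing when it sees a lower rank than that.
    printf("page %d: start=%d, first rank=%d\n", $page, $start, $start + 1);
}
```

This is also why the script can detect that it has run past the last page: a rank lower than start + 1 means the response contains results from an earlier page.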
Disclaimer
Image search code provided AS-IS, with no warranty of anything. And so on. Good luck.
Hi, I cannot use the Google API, just PHP + cURL.
New Warning :
Warning: Supplied argument is not a valid resource handle in C:\xampp\webdav\xampp\google image\httpclient.php on line 207
Warning: Supplied argument is not a valid resource handle in C:\xampp\webdav\xampp\google image\httpclient.php on line 210
Warning: Supplied argument is not a valid resource handle in C:\xampp\webdav\xampp\google image\httpclient.php on line 118
Array ( )
this is my source ( function + class )
link : http://www.doctorsina.com/google_image.zip
hi , plz check this file : http://www.doctorsina.com/google_image.zip
(your function + httpclient Class)
alooooooooooooooooooooooooooooooooooooo plz check this file
Hmm, did this script stop working?
It was working until a couple of weeks ago, I believe.
Eh, Google probably changed their results HTML again and that made the script break. You could probably fix it by tweaking the regexps.
Hi all!
Sorry for my English and for my stupid question...
Where is the variable where I put the words to search for?
Pix
Excuse me, I can’t download the eHttpClient cURL class file. Please give a working link; I really need this code.
Waiting for a positive reply.
Looks like the site that hosted it is down. Here’s a backup.
Does anyone have a working script now? Did Google change the HTML pattern again?
Please reply.
Can someone update this script? Google has changed their site again and it has stopped working.
Please, I want this script but it doesn’t work. I just get
Array ( )
thanks for the great info
Are you going to solve the problem?
$matches is not defined; I got an error and a blank array.
It returns the same:
Array ( )
this script is not working?
fixed version, but less options:
function googleImageResults($query, $page=1, $safe='off', $dc="images.google.com"){
$page--;
$perpage = 21;
$url=sprintf("http://%s/images?q=%s&gbv=2&start=%d&hl=en&ie=UTF-8&safe=%s&sa=N",
$dc,urlencode($query),$page*$perpage,$safe);
$hc=new eHttpClient();
$hc->setReferer("http://".$dc."/");
$html=$hc->get($url);
$code = $hc->getInfo(CURLINFO_HTTP_CODE);
if ($code != '200') return false;
$i=0;
while($i1){
$image_url_1=explode(““,$split[1]);
$image_url_2=explode('"',$image_url_1[0]);
$height=$image_url_2[2];
$width=$image_url_2[4];
$matches[]=array("http://t$i.gstatic".$image_url_2[0],$height,$width);
}
$i+=1;
}
$results=$matches;
return $results;
}
It displays the error “Fatal error: Class ‘eHttpClient’ not found”.
What can I change in this script?
you need the eHttpClient cURL class by 5ubliminal.
Link:
http://w-shadow.com/files/ecurl.class.phps
What do you want to change?