Get Google Image Search Results With PHP
Google Image Search doesn’t get as much time in the spotlight as the “normal” Web Search, but it’s still useful for things like finding suitable illustrations for an article (Flickr also comes to mind). Whatever you use it for, you can often get results faster with a bit of automation. So here’s a simple PHP script that can parse and return the results of any Image Search query. It’s strictly for educational purposes, though, as actually using it would probably violate Google’s Terms of Service.
PHP Script for Google Image Search
Note that this script requires the eHttpClient cURL class by 5ubliminal.
function googleImageResults($query, $page = 1, $safe = 'off', $dc = "images.google.com")
{
    $page--;        // convert the 1-based page number to a 0-based page index
    $perpage = 21;  // Image Search always returns 21 results per page
    $url = sprintf("http://%s/images?q=%s&gbv=2&start=%d&hl=en&ie=UTF-8&safe=%s&sa=N",
        $dc, urlencode($query), $page * $perpage, $safe);

    // Fetch the results page (requires the eHttpClient class by 5ubliminal)
    $hc = new eHttpClient();
    $hc->setReferer("http://" . $dc . "/");
    $html = $hc->get($url);
    $code = $hc->getInfo(CURLINFO_HTTP_CODE);
    if ($code != 200) return false;

    // Each result is embedded in the page as a dyn.Img(...) JavaScript call
    if (!preg_match_all('/dyn.Img\((.+)\);/Uis', $html, $matches, PREG_SET_ORDER)) return array();

    $results = array();
    foreach ($matches as $match) {
        // Collect the double-quoted arguments of the call
        if (!preg_match_all('/"([^"]*)",/i', $match[1], $parts)) continue;
        // The first argument encodes the referring URL, height, width, size and rank
        if (!preg_match('/(.+?)&h=(\d+)&w=(\d+)&sz=(\d+)&hl=[^&]*&start=(\d+)(?:.*)/', $parts[1][0], $url_parts)) continue;

        $refUrl = urldecode($url_parts[1]);
        $height = intval($url_parts[2]);
        $width  = intval($url_parts[3]);
        $rank   = intval($url_parts[5]);

        // Check if we've already passed the last page of results
        if ($rank < ($page * $perpage + 1)) break;

        $imgUrl    = urldecode($parts[1][3]);
        $refDomain = $parts[1][11];
        // Turn \xNN escapes into HTML entities, decode them and strip any markup
        $imgText   = $parts[1][6];
        $imgText   = preg_replace('/\\\x(\w\w)/', '&#x\1;', $imgText);
        $imgText   = strip_tags(html_entity_decode($imgText));
        // Thumbnail URL = thumbnail server + image ID + full image URL
        $thumbUrl  = $parts[1][14] . '?q=tbn:' . $parts[1][2] . $imgUrl;

        $results[] = array(
            'Rank'     => $rank,
            'RefUrl'   => $refUrl,
            'ImgText'  => $imgText,
            'ImgUrl'   => $imgUrl,
            'Height'   => $height,
            'Width'    => $width,
            'Host'     => $refDomain,
            'ThumbUrl' => $thumbUrl,
        );
    }
    return $results;
}
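The least obvious part of the parser is the ImgText cleanup. Here’s a minimal sketch that isolates those three lines in a standalone helper (decodeImgText() is a hypothetical name, not part of the script), assuming the caption arrives with its HTML escaped as JavaScript-style \xNN sequences, which is what the regex appears to target. The sample input is made up for illustration.

```php
<?php
// Hypothetical helper isolating the ImgText cleanup from googleImageResults().
// Assumption: captions contain markup escaped as \xNN (e.g. \x3cb\x3e for <b>).
function decodeImgText(string $raw): string
{
    // Rewrite each literal \xNN escape as a hexadecimal HTML entity (&#xNN;)
    $text = preg_replace('/\\\x(\w\w)/', '&#x\1;', $raw);
    // Decode the entities back into characters, then strip the resulting tags
    return strip_tags(html_entity_decode($text));
}

// Single quotes keep \x3c as a literal backslash sequence, like in the page source
echo decodeImgText('The Common \x3cb\x3eHeadcrab\x3c/b\x3e'), "\n"; // prints "The Common Headcrab"
```

Doing the escape-to-entity step first means html_entity_decode() handles both the \xNN escapes and any entities that were already in the caption.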
How To Use It
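The parameters are mostly self-explanatory: $query is the search string, $page is the 1-based results page, $safe is passed straight through as the SafeSearch (safe) URL parameter and defaults to 'off', and $dc is the Google host to send the request to. The function returns an array of results if it’s successful, or an empty array if there are no results for the query. It returns boolean false if the server responds with anything other than HTTP 200 (e.g. “403 Forbidden”).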
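Since both false and an empty array count as false in a plain if, it’s worth comparing strictly when checking the return value. A small sketch, using a hypothetical describeResults() helper that isn’t part of the original script:

```php
<?php
// Hypothetical helper: distinguishes the three possible outcomes of
// googleImageResults() without calling it (no network access needed).
function describeResults($results): string
{
    // false and array() are both falsy, so test for false with ===
    if ($results === false) {
        return 'request failed (non-200 response)';
    }
    if (count($results) === 0) {
        return 'no results for this query';
    }
    return count($results) . ' result(s)';
}

// In real use, $results would come from googleImageResults('headcrab', 1)
echo describeResults(false), "\n";
echo describeResults(array()), "\n";
echo describeResults(array(array('Rank' => 1))), "\n";
```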
Here’s an example -
$results = googleImageResults('headcrab', 1);
print_r($results);
The output looks something like this -
Array
(
    [0] => Array
        (
            [Rank] => 1
            [RefUrl] => http://bjoern.amherd.net/2006/12/15/headcrab-chappe/
            [ImgText] => Headcrab-Chappe
            [ImgUrl] => http://bjoern.amherd.net/wp-content/uploads/2006/12/headcrab.jpg
            [Height] => 297
            [Width] => 450
            [Host] => bjoern.amherd.net
            [ThumbUrl] => http://tbn0.google.com/images?q=tbn:2drTKLkzK4KZQM:http://bjoern.amherd.net/wp-content/uploads/2006/12/headcrab.jpg
        )
    [1] => Array
        (
            [Rank] => 2
            [RefUrl] => http://www.penny-arcade.com/comic/2004/11/15
            [ImgText] => The Common Headcrab
            [ImgUrl] => http://www.penny-arcade.com/images/2004/20041115h.jpg
            [Height] => 423
            [Width] => 750
            [Host] => www.penny-arcade.com
            [ThumbUrl] => http://tbn0.google.com/images?q=tbn:A2d3zKEpYJe0FM:http://www.penny-arcade.com/images/2004/20041115h.jpg
        )
......
Notes And Caveats
- The original eHttpClient class tends to throw some warnings due to omitted function parameters, so I use a slightly modified version in my own projects. Fixing the class is left as an exercise to the reader.
- As far as I know, you can’t specify the number of results per page for Image Search. It always returns 21 images, even though it only displays 18 of those to human visitors. Weird.
- If you send a lot of queries in a short time you will get a temporary ban. Theoretically you could overcome this by using proxies and/or appropriate delays between searches. That is, if you could bring yourself to commit such an insidious breach of the Terms of Service, which I’m not advocating in any way.
- Whatever you do, don’t do this.
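The fixed page size of 21 is also what drives the paging logic inside googleImageResults(). The sketch below pulls that arithmetic into two hypothetical helpers (startOffset() and firstRankOnPage() are illustrative names only), assuming the page size really stays at 21:

```php
<?php
// Assumption: Image Search returns exactly 21 results per page.
const IMAGES_PER_PAGE = 21;

// Value sent as the "start" URL parameter for a 1-based page number
function startOffset(int $page): int
{
    return ($page - 1) * IMAGES_PER_PAGE;
}

// Lowest rank a result on this page can have. googleImageResults() stops
// parsing when it sees a smaller rank, treating that as a sign that the
// requested page lies past the last page of results.
function firstRankOnPage(int $page): int
{
    return startOffset($page) + 1;
}

foreach (array(1, 2, 3) as $page) {
    printf("page %d: start=%d, first rank=%d\n", $page, startOffset($page), firstRankOnPage($page));
}
```

For pages 1, 2 and 3 this prints start offsets 0, 21 and 42, and first ranks 1, 22 and 43.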
Disclaimer
Image search code provided AS-IS, with no warranty of anything. And so on. Good luck.
heh and why not do that?
Aside from slightly obnoxious comments coming through of course.
That’s just reverse psychology or something
Personally, I got me a nice database with the help of your tutorial and this script.
Haha alright
Apparently my sarcasm detection is a bit off today. Glad you enjoyed, and nice writeup yourself!
Yeah … my class throws warnings because I enjoy using PHP functions with undefined parameters, which do throw warnings but let me play with them as I need to. The best fix is to hide them (warnings and notices) using:
error_reporting(E_ALL^(E_WARNING|E_NOTICE));
Not every warning is an error but they do look bad on sites
Regards.
PS: I’m glad you figured out how to use my class well. Many struggle too much with it
@Shady: I did notice the sarcasm but a
at the end of the line would have made it more obvious