Get Google Image Search Results With PHP
Google Image Search doesn’t get as much time in the spotlight as the “normal” Web Search, but it’s still useful for things like finding suitable illustrations for an article (Flickr also comes to mind). Whatever you use it for, you can often get results faster with a bit of automation. So here’s a simple PHP script that can parse and return the results of any Image Search query. It’s strictly for educational purposes though, as actually using it would probably constitute a violation of Google’s ToS.
PHP Script for Google Image Search
Note that this script requires the eHttpClient cURL class by 5ubliminal.
function googleImageResults($query, $page=1, $safe='off', $dc="images.google.com"){
    $page--;
    $perpage = 21;
    $url=sprintf("http://%s/images?q=%s&gbv=2&start=%d&hl=en&ie=UTF-8&safe=%s&sa=N",
        $dc,urlencode($query),$page*$perpage,$safe);

    $hc=new eHttpClient();
    $hc->setReferer("http://".$dc."/");
    $html=$hc->get($url);

    $code = $hc->getInfo(CURLINFO_HTTP_CODE);
    if ($code != '200') return false;

    if(!preg_match_all('/dyn.Img\((.+)\);/Uis', $html, $matches, PREG_SET_ORDER))
        return array();

    $results=array();
    foreach($matches as $match){
        if(!preg_match_all( '/"([^"]*)",/i', $match[1], $parts)) continue;

        if(!preg_match('/(.+?)&h=(\d+)&w=(\d+)&sz=(\d+)&hl=[^&]*&start=(\d+)(?:.*)/',
            $parts[1][0], $url_parts) ) continue;

        $refUrl = urldecode($url_parts[1]);
        $height = intval($url_parts[2]);
        $width = intval($url_parts[3]);
        $rank = intval($url_parts[5]);

        //check if we've already passed the last page of results
        if($rank < ($page * $perpage + 1)) break;

        $imgUrl = urldecode($parts[1][3]);
        $refDomain = $parts[1][11];
        $imgText = $parts[1][6];
        $imgText = preg_replace('/\\\x(\w\w)/', '&#x\1;', $imgText);
        $imgText = strip_tags(html_entity_decode($imgText));
        $thumbUrl = $parts[1][14].'?q=tbn:'.$parts[1][2].$imgUrl;

        $one_result=array(
            'Rank' => $rank,
            'RefUrl' => $refUrl,
            'ImgText' => $imgText,
            'ImgUrl' => $imgUrl,
            'Height' => $height,
            'Width' => $width,
            'Host' => $refDomain,
            'ThumbUrl' => $thumbUrl,
        );
        array_push($results,$one_result);
    }
    return $results;
}
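One step in the function that is easy to miss is the title clean-up. The titles appear to arrive with JavaScript-style \xNN escapes, which the script rewrites as HTML numeric entities before decoding them and stripping any tags. Here is a standalone sketch of just that step; decodeImageText() is a hypothetical wrapper name of mine and the sample string is made up:

```php
<?php
// Hypothetical wrapper around the three clean-up calls used in the script.
function decodeImageText($raw) {
    // Turn \xNN escapes into HTML numeric entities, e.g. \x3c becomes &#x3c;
    $text = preg_replace('/\\\\x(\w\w)/', '&#x\1;', $raw);
    // Decode the entities, then drop any markup that results.
    return strip_tags(html_entity_decode($text));
}

// Made-up sample: a title with <b> tags encoded as \x3c and \x3e.
echo decodeImageText('The Common \x3cb\x3eHeadcrab\x3c/b\x3e'), "\n";
// prints: The Common Headcrab
```

Keeping this in its own function makes it easy to test the decoding without sending any request.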
How To Use It
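I think all the parameters are self-explanatory: $query is the search phrase, $page is the results page (starting at 1), $safe is passed straight through as the safe parameter of the request, and $dc is the Google host to query. The function will return an array of results if it’s successful, or an empty array if there are no results for the query. It can also return a boolean false in case of a really bad error (e.g. a “403 Forbidden” response).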
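Since there are three possible outcomes, it is safest to compare against false with the strict === operator. A loose check like if (!$results) would treat an empty array the same as a failed request. A minimal sketch, where describeImageResults() is a hypothetical helper that is not part of the script:

```php
<?php
// Hypothetical helper (not part of the original script): turns the return
// value of googleImageResults() into a short status message.
function describeImageResults($results) {
    if ($results === false) {
        // The request itself failed, e.g. a non-200 HTTP status code.
        return 'request failed';
    }
    if (count($results) === 0) {
        // The request worked, but no results were found or parsed.
        return 'no results';
    }
    return count($results) . ' result(s), first image: ' . $results[0]['ImgUrl'];
}

// Hand-made values standing in for real return values:
echo describeImageResults(false), "\n";
echo describeImageResults(array()), "\n";
echo describeImageResults(array(array('ImgUrl' => 'http://example.com/a.jpg'))), "\n";
```

In real use you would pass in whatever googleImageResults() returned instead of the hand-made values.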
Here’s an example -
$results = googleImageResults('headcrab', 1);
print_r($results);
The output looks something like this -
Array
(
    [0] => Array
        (
            [Rank] => 1
            [RefUrl] => http://bjoern.amherd.net/2006/12/15/headcrab-chappe/
            [ImgText] => Headcrab-Chappe
            [ImgUrl] => http://bjoern.amherd.net/wp-content/uploads/2006/12/headcrab.jpg
            [Height] => 297
            [Width] => 450
            [Host] => bjoern.amherd.net
            [ThumbUrl] => http://tbn0.google.com/images?q=tbn:2drTKLkzK4KZQM:http://bjoern.amherd.net/wp-content/uploads/2006/12/headcrab.jpg
        )
    [1] => Array
        (
            [Rank] => 2
            [RefUrl] => http://www.penny-arcade.com/comic/2004/11/15
            [ImgText] => The Common Headcrab
            [ImgUrl] => http://www.penny-arcade.com/images/2004/20041115h.jpg
            [Height] => 423
            [Width] => 750
            [Host] => www.penny-arcade.com
            [ThumbUrl] => http://tbn0.google.com/images?q=tbn:A2d3zKEpYJe0FM:http://www.penny-arcade.com/images/2004/20041115h.jpg
        )
......
Notes And Caveats
- The original eHttpClient class tends to throw some warnings due to omitted function parameters, so I use a slightly modified version in my own projects. Fixing the class is left as an exercise for the reader.
- As far as I know, you can’t specify the number of results per page for Image Search. It always returns 21 images, even though it only displays 18 of those to human visitors. Weird.
- If you send a lot of queries in a short time you will get a temporary ban. Theoretically you could overcome this by using proxies and/or appropriate timeouts between searches - that is, if you could bring yourself to commit such an insidious breach of the Terms of Service, which I’m not advocating in any way.
- Whatever you do, don’t do this.
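To make the paging arithmetic from the notes concrete: with a fixed 21 results per page, page N starts at offset (N - 1) * 21, and that is the value the script puts in the start parameter. A small sketch follows. The names imageSearchOffset() and imageSearchUrl() are hypothetical, and the URL format is simply copied from the function above, so it may well have changed since.

```php
<?php
// Hypothetical helpers for illustration only; the original script does
// this inline inside googleImageResults().

// Convert a 1-based page number into the zero-based "start" offset.
// 21 is the fixed page size mentioned in the notes.
function imageSearchOffset($page, $perPage = 21) {
    return (max(1, (int)$page) - 1) * $perPage;
}

// Build the query URL the same way the script does.
function imageSearchUrl($query, $page = 1, $safe = 'off', $dc = 'images.google.com') {
    return sprintf(
        'http://%s/images?q=%s&gbv=2&start=%d&hl=en&ie=UTF-8&safe=%s&sa=N',
        $dc, urlencode($query), imageSearchOffset($page), $safe
    );
}

echo imageSearchUrl('headcrab', 2), "\n";
// prints http://images.google.com/images?q=headcrab&gbv=2&start=21&hl=en&ie=UTF-8&safe=off&sa=N
```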
Disclaimer
Image search code provided AS-IS, with no warranty of anything. And so on. Good luck.
February 28th, 2008 at 12:27 am
heh and why not do that?
Aside from slightly obnoxious comments coming through of course.
February 28th, 2008 at 1:20 am
That’s just reverse psychology or something
Personally, I got me a nice database with the help of your tutorial and this script.
February 28th, 2008 at 2:40 am
Haha alright
Apparently my sarcasm detection is a bit off today. Glad you enjoyed, and nice writeup yourself!
February 28th, 2008 at 3:32 am
Yeah … my class throws warnings as I enjoy using php functions with undefined parameters which do throw warnings but allow me to play with them as I need to. Best is to dump them (warnings and notices) using:
error_reporting(E_ALL^(E_WARNING|E_NOTICE));
Not every warning is an error but they do look bad on sites
Regards.
PS: I’m glad you figured out how to use my class well. Many struggle too much with it
@Shady: I did notice the sarcasm but a smiley at the end of the line would have made it more obvious
February 28th, 2008 at 3:34 am
The photos on DA are awesome. Actually … your pussycats make’em look so nice but I’ll give you a bit of credit too.
February 28th, 2008 at 2:03 pm
I prefer to set error_reporting to E_ALL when developing and get rid of all warnings & notices by changing the code. I think this leads to a more stable implementation, but that’s just IMHO. So I set up default values for the function parameters that your class treats as optional.
BTW… I have more than 13 cats (seriously).
February 29th, 2008 at 12:24 am
I write my C++ code warning free but, as PHP has no real rules regarding data types and so on, notices can seem silly sometimes so I just ignore them and make sure it all works.
13 cats … wow … my folks have 2 of them
April 16th, 2008 at 3:04 pm
[...] why my post about the headcrab in particular gets visited so often: my website is being misused as an example of PHP script output. Things [...]
April 23rd, 2008 at 8:40 am
Hi, firstly thanks for (what I’m sure is) a great script. I’ve been having a look through 5ubliminal’s code as well as yours and I just can’t work out where I’m going wrong. I have 5ubliminal’s class directly above your code and have tested his class which works.
I have literally copy and pasted your code but all I get returned is Array() when I call :
$results = googleImageResults('headcrab', 1);
print_r($results);
as you recommend.
Can you perhaps point me in the right direction? http://rafb.net/p/BM6bPe79.html is the sourcecode of what I’m using. Thanks very much in advance
April 23rd, 2008 at 12:16 pm
Google probably changed their code, so one of the regexps in the function no longer works. I’ve modified it and it works again, at least for me. The post has also been fixed, so you can just copy the new version.
In particular, it was the first preg_match_all() that needed to be changed.
April 24th, 2008 at 2:01 am
Hi, just to let you know that the alteration works here also. Thank you very much for the really quick reply and wonderful class. When I get time to actually implement it on my own website and do with it what I want, I shall let you know.
Thanks again
Stickytape
June 23rd, 2008 at 12:48 pm
I think it’s nice work if it works. I will check it when I get back home.
thanks
June 24th, 2008 at 11:46 am
It works fine.
thanks