Get Google Image Search Results With PHP
Google Image Search doesn’t get as much time in the spotlight as the “normal” Web Search, but it’s still useful for things like finding suitable illustrations for an article (Flickr also comes to mind). Whatever you use it for, you can often get results faster with a bit of automation. So here’s a simple PHP script that can parse and return the results of any Image Search query. It’s strictly for educational purposes though, as actually using it would probably constitute a violation of Google’s ToS 😉
PHP Script for Google Image Search
Note that this script requires the eHttpClient cURL class by 5ubliminal.
function googleImageResults($query, $page=1, $safe='off', $dc="images.google.com"){
    $page--;        // pages are 1-based for the caller, 0-based internally
    $perpage = 21;  // Image Search always returns 21 results per page
    $url=sprintf("http://%s/images?q=%s&gbv=2&start=%d&hl=en&ie=UTF-8&safe=%s&sa=N",
        $dc,urlencode($query),$page*$perpage,$safe);

    $hc=new eHttpClient();
    $hc->setReferer("http://".$dc."/");
    $html=$hc->get($url);
    $code = $hc->getInfo(CURLINFO_HTTP_CODE);
    if ($code != '200') return false;

    // each result is embedded in the page as a dyn.Img(...) JavaScript call
    if(!preg_match_all('/dyn.Img\((.+)\);/Uis', $html, $matches, PREG_SET_ORDER))
        return array();

    $results=array();
    foreach($matches as $match){
        // split the call's arguments into the quoted strings
        if(!preg_match_all( '/"([^"]*)",/i', $match[1], $parts)) continue;
        if(!preg_match('/(.+?)&h=(\d+)&w=(\d+)&sz=(\d+)&hl=[^&]*&start=(\d+)(?:.*)/',
            $parts[1][0], $url_parts) ) continue;

        $refUrl = urldecode($url_parts[1]);
        $height = intval($url_parts[2]);
        $width = intval($url_parts[3]);
        $rank = intval($url_parts[5]);
        //check if we've already passed the last page of results
        if($rank < ($page * $perpage + 1)) break;

        $imgUrl = urldecode($parts[1][3]);
        $refDomain = $parts[1][11];
        $imgText = $parts[1][6];
        // turn \xNN escapes into HTML entities, then decode and strip markup
        $imgText = preg_replace('/\\\x(\w\w)/', '&#x\1;', $imgText);
        $imgText = strip_tags(html_entity_decode($imgText));
        $thumbUrl = $parts[1][14].'?q=tbn:'.$parts[1][2].$imgUrl;

        $one_result=array(
            'Rank' => $rank,
            'RefUrl' => $refUrl,
            'ImgText' => $imgText,
            'ImgUrl' => $imgUrl,
            'Height' => $height,
            'Width' => $width,
            'Host' => $refDomain,
            'ThumbUrl' => $thumbUrl,
        );
        array_push($results,$one_result);
    }
    return $results;
}
How To Use It
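The parameters are mostly self-explanatory: $query is the search phrase, $page is the page of results to fetch (starting at 1), $safe is passed straight into the “safe” URL parameter, and $dc is the Google host to query. The function will return an array of results if it’s successful, or an empty array if there are no results for the query. It can also return a boolean false if the HTTP status is anything other than 200 (e.g. a “403 Forbidden” response).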
Here’s an example –
$results = googleImageResults('headcrab', 1);
print_r($results);
The output looks something like this –
Array
(
    [0] => Array
        (
            [Rank] => 1
            [RefUrl] => http://bjoern.amherd.net/2006/12/15/headcrab-chappe/
            [ImgText] => Headcrab-Chappe
            [ImgUrl] => http://bjoern.amherd.net/wp-content/uploads/2006/12/headcrab.jpg
            [Height] => 297
            [Width] => 450
            [Host] => bjoern.amherd.net
            [ThumbUrl] => http://tbn0.google.com/images?q=tbn:2drTKLkzK4KZQM:http://bjoern.amherd.net/wp-content/uploads/2006/12/headcrab.jpg
        )

    [1] => Array
        (
            [Rank] => 2
            [RefUrl] => http://www.penny-arcade.com/comic/2004/11/15
            [ImgText] => The Common Headcrab
            [ImgUrl] => http://www.penny-arcade.com/images/2004/20041115h.jpg
            [Height] => 423
            [Width] => 750
            [Host] => www.penny-arcade.com
            [ThumbUrl] => http://tbn0.google.com/images?q=tbn:A2d3zKEpYJe0FM:http://www.penny-arcade.com/images/2004/20041115h.jpg
        )
......
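Since the function can hand back three different things (a populated array, an empty array, or false), a caller will probably want to tell them apart with a strict comparison, because an empty array is also "falsy" in PHP. Here's a rough sketch of that. Note that describeImageResults() is a made-up helper, not part of the script above, and the sample array just mimics the output shown earlier so that nothing has to hit Google.

```php
<?php
// Hypothetical helper (not part of the original script): turns the return
// value of googleImageResults() into a short human-readable summary.
function describeImageResults($results) {
    // Use === here: array() == false is true in PHP, so a loose check
    // would confuse "no results" with "request failed".
    if ($results === false) {
        return "Request failed (non-200 HTTP response)";
    }
    if (count($results) === 0) {
        return "No results";
    }
    $lines = array();
    foreach ($results as $r) {
        $lines[] = sprintf("#%d %s (%dx%d) from %s",
            $r['Rank'], $r['ImgUrl'], $r['Width'], $r['Height'], $r['Host']);
    }
    return implode("\n", $lines);
}

// Sample data shaped like the print_r() output above (assumed, for illustration).
$sample = array(
    array(
        'Rank'     => 1,
        'RefUrl'   => 'http://bjoern.amherd.net/2006/12/15/headcrab-chappe/',
        'ImgText'  => 'Headcrab-Chappe',
        'ImgUrl'   => 'http://bjoern.amherd.net/wp-content/uploads/2006/12/headcrab.jpg',
        'Height'   => 297,
        'Width'    => 450,
        'Host'     => 'bjoern.amherd.net',
        'ThumbUrl' => 'http://tbn0.google.com/images?q=tbn:2drTKLkzK4KZQM:http://bjoern.amherd.net/wp-content/uploads/2006/12/headcrab.jpg',
    ),
);

echo describeImageResults($sample), "\n";   // one line per result
echo describeImageResults(array()), "\n";   // the "no results" case
echo describeImageResults(false), "\n";     // the "really bad error" case
```

In real use you'd pass the return value of googleImageResults() instead of $sample.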
Notes And Caveats
- The original eHttpClient class tends to throw some warnings due to omitted function parameters, so I use a slightly modified version in my own projects. Fixing the class is left as an exercise to the reader 😛
- As far as I know, you can’t specify the number of results per page for Image Search. It always returns 21 images, even though it only displays 18 of those to human visitors. Weird.
- If you send a lot of queries in a short time you will get a temporary ban. Theoretically you could overcome this by using proxies and/or appropriate timeouts between searches – that is, if you could bring yourself to commit such an insidious breach of the Terms of Service, which I’m not advocating in any way.
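The fixed page size of 21 is what drives the paging arithmetic in the script: the "start" URL parameter is (page - 1) * 21, and the first rank expected on a page is that value plus one. Below is a small sketch of just that arithmetic. The two function names are made up for illustration, the URL format is simply copied from the script above, and none of this makes any request.

```php
<?php
// Hypothetical helper: mirrors the URL-building step of googleImageResults().
function imageSearchUrl($query, $page = 1, $perpage = 21, $safe = 'off', $dc = 'images.google.com') {
    // Page 1 starts at offset 0, page 2 at offset 21, and so on.
    $start = ($page - 1) * $perpage;
    return sprintf("http://%s/images?q=%s&gbv=2&start=%d&hl=en&ie=UTF-8&safe=%s&sa=N",
        $dc, urlencode($query), $start, $safe);
}

// Hypothetical helper: the lowest rank that belongs to a given page.
// The script stops parsing as soon as it sees a rank below this value,
// which is how it detects that it has run past the last page of results.
function firstRankOnPage($page, $perpage = 21) {
    return ($page - 1) * $perpage + 1;
}

echo imageSearchUrl('headcrab', 2), "\n";
echo firstRankOnPage(1), " ", firstRankOnPage(2), " ", firstRankOnPage(3), "\n";
```

So asking for page 3 means start=42, and any result ranked below 43 on that page is treated as a sign that there are no more results.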
Disclaimer
Image search code provided AS-IS, with no warranty of anything. And so on. Good luck.
heh and why not do that?
Aside from slightly obnoxious comments coming through of course.
That’s just reverse psychology or something 😉 Personally, I got me a nice database with the help of your tutorial and this script.
Haha alright 🙂
Apparently my sarcasm detection is a bit off today. Glad you enjoyed, and nice writeup yourself!
Yeah … my class throws warnings as I enjoy using php functions with undefined parameters which do throw warnings but allow me to play with them as I need to. Best is to dump them (warnings and notices) using:
error_reporting(E_ALL^(E_WARNING|E_NOTICE));
Not every warning is an error but they do look bad on sites 🙂
Regards.
PS: I’m glad you figured out how to use my class well. Many struggle too much with it 🙂
@Shady: I did notice the sarcasm but a 😉 at the end of the line would have made it more obvious 🙂
The photos on DA are awesome. Actually … your pussycats make’em look so nice but I’ll give you a bit of credit too.
I prefer to set error_reporting to E_ALL when developing and get rid of all warnings & notices by changing the code. I think this leads to a more stable implementation, but that’s just IMHO. So I set up default values for the function parameters that your class treats as optional.
BTW… I have more than 13 cats (seriously).
I write my C++ code warning free but, as PHP has no real rules regarding data types and so on, notices can seem silly sometimes so I just ignore them and make sure it all works.
13 cats … wow … my folks have 2 of them 🙂
[…] why my post about the headcrab in particular gets visited so often: my website is being misused as an example of PHP script output. Things […]
Hi, firstly thanks for (what I’m sure is) a great script. I’ve been having a look through 5ubliminal’s code as well as yours and I just can’t work out where I’m going wrong. I have 5ubliminal’s class directly above your code and have tested his class which works.
I have literally copy and pasted your code but all I get returned is Array() when I call :
$results = googleImageResults('headcrab', 1);
print_r($results);
as you recommend.
Can you perhaps point me in the right direction? http://rafb.net/p/BM6bPe79.html is the sourcecode of what I’m using. Thanks very much in advance
Google probably changed their code, so one of the regexps in the function no longer works. I’ve modified it and it works again, at least for me. The post has also been fixed, so you can just copy the new version.
In particular, it was the first preg_match_all() that needed to be changed.
Hi, just to let you know that the alteration works here also. Thank you very much for the really quick reply and wonderful class. When I get time to actually implement it on my own website and do with it what I want, I shall let you know.
Thanks again
Stickytape
I think it’s nice work if it works. I’ll check it when I get back home.
thanks
It works fine.
thanks
[…] Results With PHP – Google AJAX API And The SEO Perspective If you’ve ever tried to write a program that fetches search results from Google, you’ll no doubt be familiar with the excruciating annoyances of parsing the […]
Thanks. Exactly what i needed. 🙂
hi , plz eHttpClient Class !??????
PLZ check it and repaire this code !
Warning: Missing argument 2 for eHttpClient::get(), called in C:\apache2triad\htdocs\Learn\google image\test.php on line 11 and defined in C:\apache2triad\htdocs\Learn\google image\ehttpclient.php on line 128
Warning: Missing argument 3 for eHttpClient::get(), called in C:\apache2triad\htdocs\Learn\google image\test.php on line 11 and defined in C:\apache2triad\htdocs\Learn\google image\ehttpclient.php on line 128
Array ( )
Dogged persistence and horrible grammar are sure ways to win friends and influence people.
Give it up, the function wouldn’t work anyway because Google search results have a different HTML structure now. The regexps won’t work. Use the proper API.
hi , i can not use google api becuse i from iran
plz check it your function
i dont underestand where is problem
Why not? I don’t see any country-based restrictions for the API. You don’t even need to sign up; the API-key is optional.
The problem is that even if I “fix” the function so that it doesn’t generate an error message, it still won’t give you any search results. This is because the function was written to parse and interpret the search results page as it was then – in February 2008. However, the search page is now different and the function won’t “understand” it. You could modify the function to parse the current page structure, but that would be a lot of work (relatively).