Get Google Search Results With PHP – Google AJAX API And The SEO Perspective
If you’ve ever tried to write a program that fetches search results from Google, you’ll no doubt be familiar with the excruciating annoyances of parsing the results and getting blocked periodically. Run a couple hundred queries in a row and bam! – your script is banned until proven innocent by entering a CAPTCHA. Even that would provide only a short reprieve, as you’d soon get blocked again.
Luckily there’s an official Google search API that will let you avoid that hassle. In this post you’ll find an example PHP script and a (mainly) SEO-oriented review of the API.
Using the AJAX API in PHP
I must confess that until yesterday I didn’t know you could use the Google AJAX search API in languages other than JavaScript. The documentation didn’t even mention the possibility when the API was first released. Well, it does now, and PHP is among the supported languages. Oh, the joy.
The API is already pretty well documented, so I won’t waste your time with another lengthy tutorial. Instead, here’s a simple example of how you could use it in PHP:
<?php
/**
 * google_search_api()
 * Query Google AJAX Search API
 *
 * @param array $args URL arguments. For most endpoints only "q" (query) is required.
 * @param string $referer Referer to use in the HTTP header (must be valid).
 * @param string $endpoint API endpoint. Defaults to 'web' (web search).
 * @return object or NULL on failure
 */
function google_search_api($args, $referer = 'http://localhost/test/', $endpoint = 'web'){
    $url = "http://ajax.googleapis.com/ajax/services/search/".$endpoint;

    // the API version argument is required; default to 1.0
    if ( !array_key_exists('v', $args) )
        $args['v'] = '1.0';

    // http_build_query() URL-encodes every argument
    $url .= '?'.http_build_query($args, '', '&');

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    // note that the referer *must* be set
    curl_setopt($ch, CURLOPT_REFERER, $referer);
    $body = curl_exec($ch);
    curl_close($ch);

    // the HTTP request itself failed
    if ($body === false)
        return NULL;

    // decode and return the response (json_decode() returns NULL on invalid JSON)
    return json_decode($body);
}

$rez = google_search_api(array(
    'q' => 'antique shoes',
));

print_r($rez);
That’s it for the programming part.
So should we really throw away our lovingly crafted SERP scrapers and embrace the “official” API? Perhaps not. There are some peculiar things I’ve noticed after trying out the new API.
The Good
Let's start with the positive aspects. First, it looks like you can indeed safely use the API without getting blocked – I successfully ran about 1800 API queries in ~2 hours. Due to my crappy connection I was unable to test how it would behave if you turn it up to eleven and send hundreds of requests per second, but the rate limiter is definitely more lenient on API users than on plain SERP scrapers. This is a major plus for people who don’t like throttling their software to one request per minute or hunting for working proxies to get around bans.
The API also makes it easy to parse the results. All queries return JSON-encoded data, so you just json_decode() it and go. No need to invent complicated regexps that must be rewritten every time Google changes the HTML structure of the search results page.
The Bad
Of course, with a clichéd megacorporation like Google it’s never all fun and games. You can only get 8 search results at a time, and no more than 64 results in total for any particular keyword. Whether this is a problem depends on what you intend to do with the API, but it’s certainly an unpleasant limitation.
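Working within those limits means one request per page of results. Below is a minimal sketch of the arguments you would send for each page, assuming the API's `rsz=large` argument (8 results per call, mentioned in the comments) and its `start` argument (zero-based offset of the first result). `google_page_args()` is a hypothetical helper, not part of the API.

```php
<?php
/**
 * Hypothetical helper: build one argument array per page of results,
 * covering the 64 results the API allows per query (8 per call).
 * Assumes the 'rsz' and 'start' URL arguments of the AJAX Search API.
 */
function google_page_args($query) {
    $pages = array();
    // 8 results per call, 64 in total => offsets 0, 8, ..., 56
    for ($start = 0; $start < 64; $start += 8) {
        $pages[] = array(
            'q'     => $query,
            'rsz'   => 'large', // 8 results per call instead of the default 4
            'start' => $start,  // zero-based offset of the first result
        );
    }
    return $pages;
}

// Each array could be passed to google_search_api() in turn.
foreach (google_page_args('antique shoes') as $args) {
    echo http_build_query($args, '', '&'), "\n";
}
```

Sending the eight requests one after another, rather than in parallel, is probably the gentler choice given the rate limiting discussed above.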
The really peculiar – nay, insidious – thing is how the search results returned by the API differ from normal SERPs. A site that is #10 in a normal Google search may suddenly turn up as #1 in the API results. The typical #5 result may be moved to the second page. Basically, the API results look like they’ve been shuffled around a bit – the same URLs are returned but in slightly different order. Also, the “estimated result count” provided by the API is consistently much lower than what a normal search shows. All this makes the API useless for rank checking and similar SEO applications.
According to my tests, you can’t just write off these discrepancies as a side effect of geo-targeting.
It Depends
Overall, the API is either great or it kind of sucks, depending on what you want to do with it.
At the risk of sounding like a conspiracy theorist, I must say the API seems to be cleverly engineered to be useful for “normal” purposes and somewhat useless for SEO. After all, only SEO workers really need accurate ranking data and more than 64 results per keyword phrase. Typical search engine users rarely move beyond the first page of results, so the limitations don’t hurt them. The various mashup makers that cater to the common user are also unaffected. It’s only the SEOs (and the rare academic researcher) that would be dissatisfied with the imposed constraints.
Of course, I’m sure you can still imagine a few interesting uses for the API.
This was a very informative article. I will read your blog often.
Yeah, the AJAX API is a pain in the butt because of the huge variation in search result counts. For my software (SEnuke), what I did was buy a huge proxy list of around 200 proxies (not too expensive) and just switch proxies every time Google throws back the “automated search” message. Make sure to wipe out the cookie that you send to Google with the search request every time you flip the proxy on them. Works a charm!
Of course, there is a little bit of monetary investment involved, but it can work really well if you are desperate for this info.
Guess you could achieve the same thing for free if you set up Tor to anonymise your IP… until all the endpoints get blocked.
Cam.
I’ve actually tried using it, but in my experience I get blocked even faster with Tor. Switching identities helps for a while, but the new identity is also soon blocked. I’m guessing this is because lots of other people also had the same bright idea and try to use Tor for Google scraping.
This is what I’ve been searching for for the past 2 weeks. Thanks.
Why does this only give 4 results??
Please help
Read the documentation. The API only returns 4 or 8 results per call.
Thank you for the quick reply!
How can we get 8 instead of 4? What is that parameter which does this?
I surmise you still didn’t read the documentation.
Anyway, append “rsz=large” to the request URL to get 8 results.
ah,
My fault!
Thanks for the quick help.
Does anyone know whether the results returned from API would generate revenue from pay per click?
No, I don’t think they generate PPC revenue.
Pity… thanks to all for the great contributions – got the code up and running in minutes.
If the estimated result count is lower than it is on the web, then the API is useless for me. So there is still no good solution for people like me. The API is inaccurate, and with scrapers I’m constantly getting banned or having to wait patiently…
True, but there isn’t much else one can do without resorting to using massive amounts of anonymous proxies.
Thank you so much!!! You saved me tons of time!
Very nice article, man! This information has helped me a bunch. My brother and I are just beginning some work on a mashup website and I can’t wait to apply this. Thanks again!
How do I get the estimated result count and keep it in a variable?
I’m afraid I don’t understand the question.
This may be more of a PHP question, but is there a way to get just the URLs? Right now, I get output beginning with
“stdClass Object ( [responseData] => stdClass Object ( [results] => Array ( [0] => stdClass Object ( [GsearchResultClass] => GwebSearch [unescapedUrl] =”
and it goes on to display a visible URL, a cached URL, etc. I guess I need the list of visible URLs. Hope you can help.
The function returns an object that contains the search results and all kinds of additional information. If you want just the URLs, you could get them by iterating over the $rez->responseData->results array and grabbing the unescapedUrl field from each result. Like this:
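A minimal sketch of that loop, assuming $rez is the object returned by google_search_api(). The helper name `google_result_urls()` is hypothetical, and the hard-coded $rez below only mimics the shape of a real response (it is not real data).

```php
<?php
/**
 * Hypothetical helper: collect the unescapedUrl field of every result.
 * Returns an empty array if the response is NULL or has no results.
 */
function google_result_urls($rez) {
    $urls = array();
    // empty() is safe even if $rez is NULL or the properties are missing
    if (empty($rez->responseData->results)) {
        return $urls;
    }
    foreach ($rez->responseData->results as $result) {
        $urls[] = $result->unescapedUrl;
    }
    return $urls;
}

// Illustrative stand-in for a real response (structure only).
$rez = json_decode('{"responseData":{"results":[
    {"unescapedUrl":"http://example.com/a"},
    {"unescapedUrl":"http://example.org/b"}]},"responseStatus":200}');

print_r(google_result_urls($rez));
```

If you want the visible URLs mentioned in the question instead, the same loop should work with the `visibleUrl` field.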
That was fast… and accurate! Thanks a lot.
It seems that the results are always from a country-specific domain, the default being ‘us’. Is it possible to get results from google.com instead of google.country?
It doesn’t appear to be possible, at least not when querying the API from PHP. See the gl argument in the API docs.
The example given for the gl argument states “google.loader.GoogleLocale = ‘www.google.com’;”. Does this mean that there is a way to override the domain, and can it be used in your script? I tried adding this verbatim but it doesn’t help. I appreciate your time.
As I said, I don’t see any way to get non-locale-specific results via this particular API. It seems you can set the locale to other countries (e.g. gl=uk for United Kingdom), but not turn it off – it would default to “us”.
Nice article. I will definitely try this on a website where I want to display the website rankings.
Hello. Two last things:
– What would be ideal to use as the ‘referer’? It seems that anything works, even the ‘localhost’ URL used in your script. I wanted to know whether it has any effect.
– The object ‘rez’ prints info about the URL like the title and meta description, but not the meta keywords. Is there any way to fetch the meta keywords as well?
Thanks
I don’t think “referer” is very significant, it just needs to be set to something valid. And no, the API doesn’t return meta keywords. If I remember correctly, the returned description is also not guaranteed to be the meta description – it could be an automatically generated page excerpt instead.
Hey!
Thank you for this explanation. The JS API is quite powerful, but the arguments that can be used in queries from other languages are a bit limited.
After reading the documentation from Google, I cannot see how I could make a site restriction from PHP.
Do you know how I could make a query with a site restriction from PHP? I want to find term A in z.com, y.com and x.com…
Thanks in advance!
For a single site, you could simply add ” site:example.com ” to your search query. I don’t know about restricting the query to multiple sites; AFAIK that’s only possible with custom search engines.
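A minimal sketch of the single-site case: append the site: operator to the search terms and pass the resulting array to google_search_api(). `site_restricted_args()` is a hypothetical helper; http_build_query() inside google_search_api() takes care of URL-encoding the colon.

```php
<?php
/**
 * Hypothetical helper: build API arguments for a query restricted to
 * one site by appending Google's "site:" operator to the search terms.
 */
function site_restricted_args($terms, $site) {
    return array(
        'q' => trim($terms) . ' site:' . $site,
    );
}

$args = site_restricted_args('term A', 'z.com');
echo $args['q'], "\n"; // term A site:z.com
```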
Hi
Thanks for your script. I’m currently using it, but the output is just a bunch of assorted text. Please help.
For more info, see the page on my website: http://saeed-x.co.cc/qq.php
?
That is the expected result. That “bunch of text” actually represents the PHP object that contains the search results. To format the output more readably, add “echo ‘<pre>’;” somewhere above the “print_r($rez)” line.
Long story short, the $rez->responseData->results array is probably what you’re after – it contains the actual results. Each array item is an object, with several fields like “title”, “url”, “content” (and so on) that describe the search result.
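For output on a web page, a minimal sketch that renders each result as a clickable link. It assumes the `titleNoFormatting` (title without highlighting markup) and `unescapedUrl` fields of web search results; `google_results_html()` is a hypothetical helper, and the hard-coded $rez only mimics the response structure.

```php
<?php
/**
 * Hypothetical helper: render API results as an HTML ordered list of links.
 * Both the URL and the title are escaped with htmlspecialchars().
 */
function google_results_html($rez) {
    if (empty($rez->responseData->results)) {
        return "<p>No results.</p>\n";
    }
    $html = "<ol>\n";
    foreach ($rez->responseData->results as $result) {
        $html .= sprintf(
            "  <li><a href=\"%s\">%s</a></li>\n",
            htmlspecialchars($result->unescapedUrl, ENT_QUOTES),
            htmlspecialchars($result->titleNoFormatting, ENT_QUOTES)
        );
    }
    return $html . "</ol>\n";
}

// Illustrative stand-in for a real response (structure only).
$rez = json_decode('{"responseData":{"results":[
    {"titleNoFormatting":"Shoes & Boots","unescapedUrl":"http://example.com/?a=1&b=2"}]}}');

echo google_results_html($rez);
```

Escaping matters here because titles and URLs come from third-party pages and may contain characters like & or quotes.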
Well done, very useful article. Thank you!
hello,
I have been using your script for a few months now and it’s been really helpful, but a few days ago it stopped working: no results are returned, just a white page. See below:
http://www.saeedx.danagig.ir/nn.php
any help is greatly appreciated.
Still works fine here.
Why don’t you try adding some “echo” statements in strategic places to see where the script fails? That would probably be more helpful for debugging than a blank page.
thank you very much for your answer,
I checked again and eventually found out it was Google banning my IP address from its service. I run a website with an average of 20,000 visitors per day, and I think the resulting large number of requests is causing the ban. I also added the UserIP and Referrer arguments, and now the bans are only temporary. Google’s documentation on this is not really helpful. I would really like to hear your opinion on this.
thank you very much.
As far as I know, the only semi-reliable way to avoid the bans and still be able to send a large number of requests is to use lots and lots of proxies.
hello,
Can anyone tell me how I can get more rankings with this API? I want to get up to 500 rankings for a keyword. Please help me with this.
thanks in advance.
You can’t. This API is very limited. If you want to get 500 rankings you’ll have to write your own parser that extracts the rankings from normal search results pages.
[...] sales. Repeat after me: diversifying your income is important! Then I played around for a while with a script that fetches results from the Google search engine. What a joy it is when a few lines of code let you display matching images from Google search [...]
hi,
Are you sure the only differences between the Google AJAX API and the web browser are the order and the URL format? I tested some keywords for my own site, and it seems the difference is not only the order. Some URLs cannot be retrieved through the Google AJAX API, but I can find them through the web browser.
Could you tell me what keywords you used for testing?
Well, I didn’t say those were the only differences – they’re just the ones that are most immediately obvious. There are others – for example, the normal search results use something called Universal Search, while the API is purely web search (AFAIK).
I applied these tips… it’s great and safe… Thanks and peace.
But I’d like it to be clearer how to display the results on the web page, and I want the URLs to be links. Peace
Hey… it’s only returning 4 results, would you know why?
Yes, it does that. Check the API docs, they’ll tell you how to increase the number of results per query.
As the author said, “64 results in total for any particular keyword”.
How can I get more than 64 results?
The API won’t give you more results no matter how nicely you ask. You need to write a scraper instead.
Hey… I am trying to do a site:URL search in the Google AJAX API with PHP using your script. I tried URL-encoding the search string, but still nothing.
Any tips?
URLencode shouldn’t be necessary as http_build_query() does that automatically.
Try echo’ing the generated $url to see if it’s encoding the arguments correctly.
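To see what that URL should look like, a minimal sketch that builds it the same way google_search_api() does, minus the HTTP request. `google_search_url()` is a hypothetical helper; with the arguments below it prints a URL in which the colon and slashes of the site: query are percent-encoded (%3A, %2F).

```php
<?php
/**
 * Hypothetical helper: build the request URL exactly as google_search_api()
 * does, without sending it, so the encoded query can be inspected.
 */
function google_search_url($args, $endpoint = 'web') {
    $url = "http://ajax.googleapis.com/ajax/services/search/" . $endpoint;
    if (!array_key_exists('v', $args)) {
        $args['v'] = '1.0';
    }
    return $url . '?' . http_build_query($args, '', '&');
}

echo google_search_url(array(
    'q'   => 'site:lifehacker.com/5599508/use-a-camera-to-help-see-your-clutter-trouble-spots',
    'v'   => '1.0',
    'rsz' => 'large',
)), "\n";
```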
http://ajax.googleapis.com/ajax/services/search/web?q=site%3Alifehacker.com%2F5599508%2Fuse-a-camera-to-help-see-your-clutter-trouble-spots&v=1.0&rsz=large
is the output with:
site:lifehacker.com/5599508/use-a-camera-to-help-see-your-clutter-trouble-spots
as the keyword to search.
I get no results, even though when I do the site: search on Google, the results show up.
I am wanting a way to check many URLs to see if they are indexed or not.
Did the URL actually contain the &amp; entities, or is that a side effect of my comment form?
The “&” in “&v=1.0&rsz=large” shows as the actual character on the 7 key, not the HTML code for it… and those show up even if the keyword isn’t a site:www.url.com query.
Well, in that case I have no idea what’s wrong.
Echo $body before the “return json_decode($body);” line and see if there are any error messages in the returned JSON?
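A minimal sketch of that check, assuming the API's JSON envelope carries `responseStatus` (200 on success) and `responseDetails` (an error message otherwise), as the API documentation described at the time. `google_response_error()` is a hypothetical helper, and the error body in the example is made up for illustration.

```php
<?php
/**
 * Hypothetical helper: inspect a raw API response body ($body from
 * curl_exec()) and describe what went wrong. Returns NULL if the
 * response looks like a normal, successful reply.
 */
function google_response_error($body) {
    if ($body === false || $body === '') {
        return 'Empty response (check the cURL request itself).';
    }
    $data = json_decode($body);
    if ($data === null) {
        return 'Response is not valid JSON: ' . json_last_error_msg();
    }
    if (!isset($data->responseStatus) || $data->responseStatus != 200) {
        $status  = isset($data->responseStatus) ? $data->responseStatus : 'unknown';
        $details = isset($data->responseDetails) ? $data->responseDetails : 'no details';
        return "API error {$status}: {$details}";
    }
    return null;
}

// Made-up error body, for illustration only.
echo google_response_error('{"responseData":null,"responseDetails":"invalid argument","responseStatus":400}'), "\n";
```

A successful reply with zero results still returns NULL here, which matches the case discussed next: a valid response that simply contains no results.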
Thanks in advance for all of your help, I definitely appreciate it…
Odd development… that specific URL started working with no change at all in the code. Another verified indexed URL is showing nothing, much like the previous one.
The echo $body is here: http://pastebin.com/rY33WYxr
Perhaps the servers that handle AJAX queries don’t use the same index as “normal” Google servers, so it takes a while until results start showing up.
Also, I don’t see anything wrong with that JSON response – except that there are no results, of course.
Alright, just wanted to make sure I was not screwing something up. I will move forward with the script and simply keep that in mind… thanks!
I already tried your script and placed it in the WordPress search, replacing “shoes” with $s, but I’ve been banned by Google for a violation. Do you know what the reason might be?
I was wondering if you could tell me how to use this to find the rankings for a site.
For Example, when I run the script I would like the result to be something like:
Search Term: ‘mgpwr’ with site: ‘mgpwr.co.uk’ is ranked: 1
Search Term: ‘designer’ with site: ‘mgpwr.co.uk’ is ranked: 13
Search Term: ‘freelance’ with site: ‘mgpwr.co.uk’ is ranked: 100
See my point? I know the SOAP API has now been cancelled. I need something similar to this: http://www.further.co.uk/tools/search-position-check/ but just for Google UK.
I appreciate any help.
You could iterate over the returned results until you find one where the URL contains the domain name you’re looking for. To determine its position, just keep count of how many results you’ve already examined.
The format and structure of the results are described on this page. Since the API only returns up to 8 results at a time, you’ll probably need to put the whole thing in a loop – check the first 8 results first, then the next 8, and so on.
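A minimal sketch of that counting loop, operating on pages that have already been fetched in order (for example with google_search_api() and the rsz/start arguments). `google_find_rank()` is a hypothetical helper. It compares the host name of each result's unescapedUrl rather than doing a plain substring search, so a result on notexample.org would not count as a match for example.org.

```php
<?php
/**
 * Hypothetical helper: return the 1-based position of the first result
 * whose host is $domain or a subdomain of it, or NULL if not found.
 * $pages is a list of decoded API responses, in page order.
 */
function google_find_rank($domain, array $pages) {
    $domain = strtolower($domain);
    $position = 0;
    foreach ($pages as $rez) {
        if (empty($rez->responseData->results)) {
            break; // no more results for this query
        }
        foreach ($rez->responseData->results as $result) {
            $position++;
            $host = strtolower((string) parse_url($result->unescapedUrl, PHP_URL_HOST));
            // exact host, or a subdomain such as www.example.org
            if ($host === $domain || substr($host, -strlen($domain) - 1) === '.' . $domain) {
                return $position;
            }
        }
    }
    return null;
}

// Illustrative stand-in for one fetched page (structure only).
$page1 = json_decode('{"responseData":{"results":[
    {"unescapedUrl":"http://example.com/"},
    {"unescapedUrl":"http://www.example.org/about"}]}}');

echo google_find_rank('example.org', array($page1)), "\n"; // 2
```

Because the API stops at 64 results, a NULL return only means the site is not in the top 64 API results, which, as noted in the post, may not match its position in normal search results.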
Hello everyone,
I really need to use this API in my system. The idea would be a site that automatically gets words, sends them to the API and prints the results in a .doc document.
Can someone please tell me how I could do that?
Regards
[...] Get Google Search Results With PHP – Google AJAX API And The SEO Perspective. This article also discusses this issue: The really peculiar – nay, insidious – thing is how the search results returned by the API differ from normal SERPs. [...]