How To Get Redirect URL In PHP
HTTP redirects usually have the response status 301 or 302 and provide the redirection URL in the “Location” header. I’ve written three complementary PHP functions that you can use to find out where a URL redirects to (based on a helpful thread at WebmasterWorld). You don’t even need cURL for this; fsockopen() will do just fine.
The PHP script
/**
 * get_redirect_url()
 * Gets the address that the provided URL redirects to,
 * or FALSE if there's no redirect.
 *
 * @param string $url
 * @return string|false
 */
function get_redirect_url($url){
    $redirect_url = null;

    $url_parts = @parse_url($url);
    if (!$url_parts) return false;
    if (!isset($url_parts['host'])) return false; //can't process relative URLs
    if (!isset($url_parts['path'])) $url_parts['path'] = '/';

    $sock = fsockopen($url_parts['host'], (isset($url_parts['port']) ? (int)$url_parts['port'] : 80), $errno, $errstr, 30);
    if (!$sock) return false;

    //Send a HEAD request: only the response headers are needed.
    $request = "HEAD " . $url_parts['path'] . (isset($url_parts['query']) ? '?'.$url_parts['query'] : '') . " HTTP/1.1\r\n";
    $request .= 'Host: ' . $url_parts['host'] . "\r\n";
    $request .= "Connection: Close\r\n\r\n";
    fwrite($sock, $request);

    $response = '';
    while(!feof($sock)) $response .= fread($sock, 8192);
    fclose($sock);

    //Header names are case-insensitive, hence the "i" modifier.
    if (preg_match('/^Location: (.+?)$/mi', $response, $matches)){
        if ( substr($matches[1], 0, 1) == "/" )
            //Host-relative redirect: prepend the scheme and host of the original URL.
            return $url_parts['scheme'] . "://" . $url_parts['host'] . trim($matches[1]);
        else
            return trim($matches[1]);
    } else {
        return false;
    }
}

/**
 * get_all_redirects()
 * Follows and collects all redirects, in order, for the given URL.
 *
 * @param string $url
 * @return array
 */
function get_all_redirects($url){
    $redirects = array();
    while ($newurl = get_redirect_url($url)){
        //Stop as soon as a URL repeats (redirection loop).
        if (in_array($newurl, $redirects)){
            break;
        }
        $redirects[] = $newurl;
        $url = $newurl;
    }
    return $redirects;
}

/**
 * get_final_url()
 * Gets the address that the URL ultimately leads to.
 * Returns $url itself if it isn't a redirect.
 *
 * @param string $url
 * @return string
 */
function get_final_url($url){
    $redirects = get_all_redirects($url);
    if (count($redirects)>0){
        return array_pop($redirects);
    } else {
        return $url;
    }
}
Here’s an example that lists all URLs that a given address redirects to (in order):
$rez = get_all_redirects('http://daerils.gtrends.hop.clickbank.net/');
print_r($rez);
Known Issues
Most likely you won’t ever run into one of these, but here they are anyway:
- The script doesn’t recognize infinite redirects that don’t form a loop. However, it can handle normal redirection loops: get_all_redirects() exits as soon as it encounters a URL that it has already seen.
- Multiple relative redirects (e.g. “Location: go.php?asdf”) won’t be fully followed by get_all_redirects().
- Not an issue per se, yet something to note: these functions won’t tell you if a URL is valid, just what it redirects to (if anything).
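The first two issues above could be worked around roughly as follows. This is only a sketch and not part of the original script: resolve_location() and get_all_redirects_limited() are hypothetical names, $fetch stands in for any function that returns the raw Location value for a URL (or false when there is none), and the default limit of 10 hops is an arbitrary assumption. Normalising "../" segments is deliberately left out.

```php
<?php
/**
 * Hypothetical helper: turns a Location header value into an absolute URL,
 * using the URL that produced the redirect as the base.
 * Simplified: "../" and "./" segments are not normalised.
 */
function resolve_location($base_url, $location) {
    $location = trim($location);
    // Already absolute (has a scheme such as http://): use as-is.
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $location)) {
        return $location;
    }
    $base = parse_url($base_url);
    if (!$base || !isset($base['scheme'], $base['host'])) {
        return false; // cannot resolve without an absolute base URL
    }
    $root = $base['scheme'] . '://' . $base['host']
          . (isset($base['port']) ? ':' . $base['port'] : '');
    // Scheme-relative: //host/path
    if (substr($location, 0, 2) === '//') {
        return $base['scheme'] . ':' . $location;
    }
    // Host-relative: /path
    if (substr($location, 0, 1) === '/') {
        return $root . $location;
    }
    $path = isset($base['path']) ? $base['path'] : '/';
    // Query-only: ?a=b keeps the base path.
    if (substr($location, 0, 1) === '?') {
        return $root . $path . $location;
    }
    // Document-relative: go.php?asdf replaces the last path segment.
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $root . $dir . $location;
}

/**
 * Hypothetical variant of get_all_redirects() with a hop limit, so that an
 * endless chain of distinct URLs cannot keep the loop running forever.
 */
function get_all_redirects_limited($url, $fetch, $max_hops = 10) {
    $redirects = array();
    while (count($redirects) < $max_hops) {
        $location = call_user_func($fetch, $url);
        if ($location === false) break;          // no redirect: done
        $next = resolve_location($url, $location);
        if ($next === false || in_array($next, $redirects)) break; // unresolvable or loop
        $redirects[] = $next;
        $url = $next;
    }
    return $redirects;
}
```

To plug this into the script, $fetch would need to return the untouched Location value rather than the partly resolved URL that get_redirect_url() currently returns.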
On a related note, check out the Firefox extension Redirect Remover.
Nice Script dude – thanks 😀
hi, this looks good, but can you tell me why it doesn’t work with links on this newspapers site (try clicking one): http://www.onlinenewspapers.com/denmark.htm
the link destination is something like http://lt.webwombat.com/lt.php?15660 but you immediately get forwarded to the actual news site… but i can’t seem to extract the redirected url.
thanks
I think that site purposefully detects automated scripts and handles them differently than normal users, causing this problem. You might be able to fool them by setting the User-Agent header to some common web browser’s UA.
thanks for the reply. i tried lots of methods, but don’t really know what i’m doing, and i didn’t get it working. i don’t suppose there’s a simple edit to your get_redirect_url() that you could share? i can follow this site’s links from my browser, and my $HTTP_SERVER_VARS['HTTP_USER_AGENT'] returns: “Mozilla/5.0 (X11; U; Linux i686; en-GB; rv:1.9.0.1) Gecko/2008072820 Firefox/3.0.1”
so that must be a good user-agent to use…
don’t bother if it’s hassle…
cheers
How about adding something like this after the “…Host:…” line in get_redirect_url() :
Just an idea, I haven’t tested it.
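A minimal sketch of that idea, assuming the request is assembled the same way as in get_redirect_url(). build_head_request() is a hypothetical helper that does not exist in the original script, and the User-Agent string is simply the one quoted in the comment above. Whether a given site then responds differently is not guaranteed.

```php
<?php
/**
 * Hypothetical helper: builds the same HEAD request as get_redirect_url(),
 * with an optional User-Agent header placed after the Host line.
 */
function build_head_request(array $url_parts, $user_agent = null) {
    $path  = isset($url_parts['path']) ? $url_parts['path'] : '/';
    $query = isset($url_parts['query']) ? '?' . $url_parts['query'] : '';

    $request  = "HEAD " . $path . $query . " HTTP/1.1\r\n";
    $request .= "Host: " . $url_parts['host'] . "\r\n";
    if ($user_agent !== null) {
        // Pretend to be a regular browser.
        $request .= "User-Agent: " . $user_agent . "\r\n";
    }
    $request .= "Connection: Close\r\n\r\n";
    return $request;
}

$ua = 'Mozilla/5.0 (X11; U; Linux i686; en-GB; rv:1.9.0.1) Gecko/2008072820 Firefox/3.0.1';
$request = build_head_request(parse_url('http://lt.webwombat.com/lt.php?15660'), $ua);
// $request can now be passed to fwrite($sock, $request) as in the original function.
```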
hey, yeah i tried that (have actually edited my php.ini to set the user agent to that), but it still doesn’t work. i’ve given up on it tbh. i had already moved on to wikipedia, which originally gave me
Warning: file_get_contents(http://ar.wikipedia.org/wiki/1955) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.0 403 Forbidden in…
but that user agent fix worked there, so i’m ok!
thanks for your time.
Awesome script! Saved me some time from writing this myself. Keep up the great coding! Thanks!
It was a nice script.
I was looking for a function like this.
Thank you so much…
Thank You so much for sharing this. I have learned a lot by trying to get this to work on my own with libxml, xmllint, xpath, curl, wget and the like, with only segments working properly. Now your script shows a direct way to get this done, all within PHP as it should be, without filesystem hacks or workarounds. Thank you for teaching me something new. I’ll be studying how and why this works for a while so I’m not using someone’s work without understanding it. I dub thee Yoda, lol.
Hehe, thanks 🙂
Very nice script man… Worked here like a charm! Thanks!
But now i’m trying something else and not getting it right… I’m trying to capture the redirect url for some IMDB queries, like:
http://www.imdb.com/find?s=tt&q=deus+%E9+brasileiro&x=0&y=0
I know that only registered users can disable redirect on IMDB, so, i need to know whats the new URL… any chances?
I don’t have time to investigate this right now, but here’s something: add echo $response; after the fclose($sock) line. This will output the server’s response and that should provide some clues why the function doesn’t work.
I’m guessing IMDB has some kind of protection against bots.
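Instead of reading the raw dump by eye, the status line and headers can also be pulled apart in code. A rough sketch, assuming $response holds the raw text read from the socket; parse_response_head() is a hypothetical helper and the sample response below is made up for illustration.

```php
<?php
/**
 * Hypothetical helper: splits a raw HTTP response into its status code
 * and a map of lower-cased header names to values.
 */
function parse_response_head($response) {
    // Headers end at the first blank line.
    $head  = preg_split("/\r?\n\r?\n/", $response, 2);
    $lines = preg_split("/\r?\n/", trim($head[0]));

    $status = null;
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', array_shift($lines), $m)) {
        $status = (int)$m[1];
    }

    $headers = array();
    foreach ($lines as $line) {
        $pos = strpos($line, ':');
        if ($pos === false) continue; // not a "Name: value" line
        $name = strtolower(trim(substr($line, 0, $pos)));
        $headers[$name] = trim(substr($line, $pos + 1));
    }
    return array('status' => $status, 'headers' => $headers);
}

// Made-up sample response, for illustration only.
$sample = "HTTP/1.1 302 Found\r\nLocation: http://example.com/new\r\nConnection: close\r\n\r\n";
$info = parse_response_head($sample);
// $info['status'] is 302 and $info['headers']['location'] is the redirect target.
```

A status such as 403 with no Location header would suggest the server is refusing the request rather than redirecting it.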
exactly what i’ve been trying to find, thanks a lot
Hi,
First of all, thank you very much, this code works like a charm for me.
But I have a small question: how do you make it work if the query has something like “query=val1+val2”? Somehow it does not pick up val2. The same thing happens if there is a space, which comes out as a +.
Here’s what I did:
I have a form that a user enters a query into, and it is sent to the PHP file like this:
http://www.myserver.com/test.php?query=val1+val2 (the URL in the browser).
I see that if I have only one value it works fine, but if the variable is something like a+b, only “a” gets captured and not “b”.
How can I make this work?
thanks again for this life saving script..
Hmm, I’m not sure I completely understand what you’re trying to do, but I don’t see why the second part of the variable wouldn’t be captured. Have you tried doing something like
to verify that it’s really discarded by the server, instead of a bug somewhere in your script?
hi,
i did an echo $response; after the fclose($sock) and saw that the first url was shown as query=val1 val2; the “+” was gone. i tried replacing the space with %20, but it’s still the same: only val1 was picked up. any comments?
I’ve no idea why this would happen. I tried a few tests myself using a query that contains a space, and it was processed correctly. Maybe you can give me a real example that I could test?
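For reference, this is how PHP itself encodes and decodes a query value containing a space. It is only an illustration of the standard functions, not a diagnosis of the problem described above; the URL is the placeholder from the question.

```php
<?php
$value = 'val1 val2';

// Encoding: urlencode() turns a space into "+", rawurlencode() into "%20".
$encoded_form = urlencode($value);     // "val1+val2"
$encoded_raw  = rawurlencode($value);  // "val1%20val2"

// Decoding: parse_str() works like PHP's handling of $_GET,
// so both "+" and "%20" come back as a space.
parse_str('query=val1+val2', $from_plus);    // $from_plus['query'] is "val1 val2"
parse_str('query=val1%20val2', $from_pct);   // $from_pct['query'] is "val1 val2"

// parse_url() does not decode anything, so the "+" is still present in the
// query part that get_redirect_url() puts into the request line.
$parts = parse_url('http://www.myserver.com/test.php?query=val1+val2');
// $parts['query'] is "query=val1+val2"

// A literal plus sign has to be sent as %2B to survive decoding.
parse_str('query=' . urlencode('a+b'), $literal_plus); // $literal_plus['query'] is "a+b"
```

One possible explanation for the behaviour in the question is that the value gets decoded somewhere before the URL is passed on: a raw space inside the request line ("HEAD /test.php?query=val1 val2 HTTP/1.1") would cut the request target short. Re-encoding the value with urlencode() before building the URL would avoid that.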
Bug fix:
if (preg_match('/^Location: (.+?)$/m', $response, $matches)){
    if ( substr($matches[1], 0, 1) == "/" )
        return $url_parts['scheme'] . "://" . $url_parts['host'] . trim($matches[1]);
    else
        return trim($matches[1]);
} else {
    return false;
}
Thanks for the fix. I hadn’t considered relative URLs at all.
Tried and tested function (follows javascript redirects as well):