How To Get Redirect URL In PHP
HTTP redirects usually have the response status 301 or 302 and provide the redirection URL in the “Location” header. I’ve written three complementary PHP functions that you can use to find out where a URL redirects to (based on a helpful thread at WebmasterWorld). You don’t even need cURL for this - fsockopen() will do just fine.
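To make the mechanism concrete, here is a minimal sketch of what a redirect response looks like and how the status code and “Location” header can be pulled out of it. The helper name parse_redirect_headers() and the sample response are made up for illustration; they are not part of the script itself.

```php
<?php
// Hypothetical helper (not part of the original script): extracts the status
// code and the Location header from a raw HTTP response string.
function parse_redirect_headers($raw_response){
    // The header block ends at the first blank line
    $parts = preg_split("/\r?\n\r?\n/", $raw_response, 2);
    $head = $parts[0];

    // Status line, e.g. "HTTP/1.1 301 Moved Permanently"
    if (!preg_match('#^HTTP/\d(?:\.\d)?\s+(\d{3})#', $head, $m)) return false;
    $status = (int)$m[1];

    // Header names are case-insensitive, hence the "i" flag
    $location = null;
    if (preg_match('/^Location:[ \t]*(.+?)$/mi', $head, $m)){
        $location = trim($m[1]);
    }
    return array('status' => $status, 'location' => $location);
}

// A made-up response of the kind a redirecting server might send
$sample = "HTTP/1.1 301 Moved Permanently\r\n"
        . "Location: http://example.com/new\r\n"
        . "Content-Length: 0\r\n\r\n";
$info = parse_redirect_headers($sample);
```

Here $info holds the numeric status and the redirect target; for a non-redirect response the 'location' entry is simply null.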
The PHP script
/**
 * get_redirect_url()
 * Gets the address that the provided URL redirects to,
 * or FALSE if there's no redirect.
 *
 * @param string $url
 * @return string|false
 */
function get_redirect_url($url){
    $url_parts = @parse_url($url);
    if (!$url_parts) return false;
    if (!isset($url_parts['host'])) return false; //can't process relative URLs
    if (!isset($url_parts['path'])) $url_parts['path'] = '/';

    //plain HTTP only; port 80 unless the URL names another port
    $sock = fsockopen($url_parts['host'], (isset($url_parts['port']) ? (int)$url_parts['port'] : 80), $errno, $errstr, 30);
    if (!$sock) return false;

    //a HEAD request is enough, we only need the headers
    $request = "HEAD " . $url_parts['path'] . (isset($url_parts['query']) ? '?'.$url_parts['query'] : '') . " HTTP/1.1\r\n";
    $request .= 'Host: ' . $url_parts['host'] . "\r\n";
    $request .= "Connection: Close\r\n\r\n";
    fwrite($sock, $request);

    //read the whole response
    $response = '';
    while(!feof($sock)) $response .= fread($sock, 8192);
    fclose($sock);

    //header names are case-insensitive, hence the "i" flag
    if (preg_match('/^Location:[ \t]*(.+?)$/mi', $response, $matches)){
        return trim($matches[1]);
    } else {
        return false;
    }
}

/**
 * get_all_redirects()
 * Follows and collects all redirects, in order, for the given URL.
 *
 * @param string $url
 * @return array
 */
function get_all_redirects($url){
    $redirects = array();
    while ($newurl = get_redirect_url($url)){
        //stop as soon as a URL repeats (redirection loop)
        if (in_array($newurl, $redirects)){
            break;
        }
        $redirects[] = $newurl;
        $url = $newurl;
    }
    return $redirects;
}

/**
 * get_final_url()
 * Gets the address that the URL ultimately leads to.
 * Returns $url itself if it isn't a redirect.
 *
 * @param string $url
 * @return string
 */
function get_final_url($url){
    $redirects = get_all_redirects($url);
    if (count($redirects)>0){
        return array_pop($redirects);
    } else {
        return $url;
    }
}
Here’s an example that lists all URLs that a given address redirects to (in order):
$rez = get_all_redirects('http://daerils.gtrends.hop.clickbank.net/');
print_r($rez);
Known Issues
Most likely you won’t ever run into one of these, but here they are anyway:
- The script doesn’t recognize infinite redirects that don’t form a loop (for example, a server that redirects to a new, unique URL every time). However, it can handle normal redirection loops - get_all_redirects() exits as soon as it encounters a URL that it has already seen.
- Relative redirects (e.g. “Location: go.php?asdf”) won’t be fully followed by get_all_redirects(). The relative address is recorded, but get_redirect_url() can’t process a URL without a host, so the chain stops there.
- Not an issue per se, yet something to note: these functions won’t tell you if a URL is valid, just what it redirects to (if anything).
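The first two issues could be worked around roughly as sketched here. This is an untested idea rather than a drop-in fix: resolve_location() and follow_redirects_limited() are hypothetical names, the $fetch callback stands in for get_redirect_url(), and the default limit of 10 hops is an arbitrary choice. The resolver is deliberately simple and ignores “../” segments and query-only targets.

```php
<?php
// Hypothetical helper: turns a Location value into an absolute URL,
// using the URL that produced it as the base.
function resolve_location($base_url, $location){
    if ($location === '') return false;
    //already absolute (scheme://...)
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $location)) return $location;

    $base = parse_url($base_url);
    if (!$base || !isset($base['host'])) return false;
    $scheme = isset($base['scheme']) ? $base['scheme'] : 'http';

    //protocol-relative, e.g. //example.com/page
    if (substr($location, 0, 2) === '//') return $scheme . ':' . $location;

    $origin = $scheme . '://' . $base['host'] . (isset($base['port']) ? ':' . $base['port'] : '');
    //host-relative, e.g. /page
    if ($location[0] === '/') return $origin . $location;

    //relative to the directory of the base path, e.g. go.php?asdf
    $path = isset($base['path']) ? $base['path'] : '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $location;
}

// Hypothetical variant of get_all_redirects() that resolves relative
// targets and gives up after $max_hops redirects.
// $fetch is any callable that returns the Location value for a URL, or false.
function follow_redirects_limited($url, $fetch, $max_hops = 10){
    $redirects = array();
    for ($i = 0; $i < $max_hops; $i++){
        $location = call_user_func($fetch, $url);
        if (!$location) break;
        $next = resolve_location($url, $location);
        //stop on unresolvable targets and on loops
        if (!$next || in_array($next, $redirects)) break;
        $redirects[] = $next;
        $url = $next;
    }
    return $redirects;
}
```

With the script from this post loaded, follow_redirects_limited($url, 'get_redirect_url') would use the real network lookup as the callback.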
On a related note, check out the Firefox extension Redirect Remover.
July 6th, 2008 at 11:39 am
Nice Script dude - thanks
August 18th, 2008 at 6:44 pm
hi, this looks good, but can you tell me why it doesnt work with links on this newspapers site (try clicking one): http://www.onlinenewspapers.com/denmark.htm
the link destination is something like http://lt.webwombat.com/lt.php?15660 but you immediately get forwarded to the actual news site… but i cant seem to extract the redirected url.
thanks
August 18th, 2008 at 7:02 pm
I think that site purposefully detects automated scripts and handles them differently than normal users, causing this problem. You might be able to fool them by setting the User-Agent header to some common web browser’s UA.
August 18th, 2008 at 7:31 pm
thanks for the reply. i tried lots of methods, but don’t really know what im doing, and i didnt get it working. i don’t suppose there’s a simple edit to your get_redirect_url() that you could share? i can follow this site’s links from my browser, and my $HTTP_SERVER_VARS['HTTP_USER_AGENT'] returns: “Mozilla/5.0 (X11; U; Linux i686; en-GB; rv:1.9.0.1) Gecko/2008072820 Firefox/3.0.1”
so that must be a good user-agent to use…
don’t bother if it’s hassle…
cheers
August 18th, 2008 at 8:54 pm
How about adding something like this after the “…Host:…” line in get_redirect_url():
Just an idea, I haven’t tested it.
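The snippet from this reply isn’t preserved here. As a rough sketch of the idea, assuming the Firefox user-agent string quoted earlier in the thread: the request could be built with one extra header line. build_head_request() is a hypothetical helper that mirrors the request-building part of get_redirect_url(); it is not the original suggestion, and there is no guarantee that the site in question responds differently to it.

```php
<?php
// Hypothetical helper: builds the same HEAD request as get_redirect_url(),
// plus a User-Agent header.
function build_head_request($url_parts, $user_agent){
    $path = isset($url_parts['path']) ? $url_parts['path'] : '/';
    $query = isset($url_parts['query']) ? '?' . $url_parts['query'] : '';

    $request = "HEAD " . $path . $query . " HTTP/1.1\r\n";
    $request .= 'Host: ' . $url_parts['host'] . "\r\n";
    //the added line: pretend to be a regular browser
    $request .= 'User-Agent: ' . $user_agent . "\r\n";
    $request .= "Connection: Close\r\n\r\n";
    return $request;
}

// The UA string quoted in the thread
$ua = 'Mozilla/5.0 (X11; U; Linux i686; en-GB; rv:1.9.0.1) Gecko/2008072820 Firefox/3.0.1';
$request = build_head_request(parse_url('http://lt.webwombat.com/lt.php?15660'), $ua);
```

In get_redirect_url() itself, the equivalent change would be a single extra `$request .= 'User-Agent: ...' . "\r\n";` line right after the Host line.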
August 18th, 2008 at 9:15 pm
hey, yer i tried that (have actually edited my php.ini to set the user agent to that), but it still doesnt work. i’ve given up on it tbh. i had already moved on to wikipedia, which originally gave me
Warning: file_get_contents(http://ar.wikipedia.org/wiki/1955) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.0 403 Forbidden in…
but that user agent fix worked there, so im ok!
thanks for your time.