How To Get Redirect URL In PHP

HTTP redirects usually have the response status 301 or 302 and provide the redirection URL in the “Location” header. I’ve written three complementary PHP functions that you can use to find out where a URL redirects to (based on a helpful thread at WebmasterWorld). You don’t even need cURL for this – fsockopen() will do just fine.
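
To make the header format concrete, here is a minimal sketch of what a raw redirect response looks like and how the status code and “Location” value can be pulled out of it. The helper name parse_redirect_headers() and the example.com address are made up for illustration; this is not one of the three functions from this post.

```php
<?php
// Hypothetical helper (not part of the script in this post): extracts the
// status code and the Location header from a raw HTTP response string.
function parse_redirect_headers($raw_response){
	$result = array('status' => null, 'location' => null);

	// Status line, e.g. "HTTP/1.1 301 Moved Permanently"
	if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $raw_response, $m)){
		$result['status'] = (int)$m[1];
	}

	// Header names are case-insensitive, hence the "i" flag.
	if (preg_match('/^Location:[ \t]*(.+?)$/mi', $raw_response, $m)){
		$result['location'] = trim($m[1]); // trim() drops the trailing "\r"
	}

	return $result;
}

// A hand-written sample response, so no network access is needed.
$raw = "HTTP/1.1 301 Moved Permanently\r\n"
     . "Location: http://example.com/new-page\r\n"
     . "Connection: close\r\n\r\n";
print_r(parse_redirect_headers($raw));
```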

The PHP script

/**
 * get_redirect_url()
 * Gets the address that the provided URL redirects to,
 * or FALSE if there's no redirect. 
 *
 * @param string $url
 * @return string|false
 */
function get_redirect_url($url){

	$url_parts = @parse_url($url);
	if (!$url_parts) return false;
	if (!isset($url_parts['host'])) return false; //can't process relative URLs
	if (!isset($url_parts['path'])) $url_parts['path'] = '/';
	 
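	// Use TLS and port 443 for https:// URLs (needs the OpenSSL extension); plain HTTP on port 80 otherwise.
	$is_https = isset($url_parts['scheme']) && strtolower($url_parts['scheme']) == 'https';
	$default_port = $is_https ? 443 : 80;
	$sock = fsockopen(($is_https ? 'ssl://' : '') . $url_parts['host'], (isset($url_parts['port']) ? (int)$url_parts['port'] : $default_port), $errno, $errstr, 30);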
	if (!$sock) return false;
	 
	$request = "HEAD " . $url_parts['path'] . (isset($url_parts['query']) ? '?'.$url_parts['query'] : '') . " HTTP/1.1\r\n"; 
	$request .= 'Host: ' . $url_parts['host'] . "\r\n"; 
	$request .= "Connection: Close\r\n\r\n"; 
	fwrite($sock, $request);
	$response = '';
	while(!feof($sock)) $response .= fread($sock, 8192);
	fclose($sock);

	// Header names are case-insensitive, so "location:" must match too.
	if (preg_match('/^Location:[ \t]*(.+?)$/mi', $response, $matches)){
		$location = trim($matches[1]);
		if ( substr($location, 0, 1) == "/" )
			return $url_parts['scheme'] . "://" . $url_parts['host'] . $location;
		else
			return $location;

	} else {
		return false;
	}
	
}

/**
 * get_all_redirects()
 * Follows and collects all redirects, in order, for the given URL. 
 *
 * @param string $url
 * @return array
 */
function get_all_redirects($url){
	$redirects = array();
	while ($newurl = get_redirect_url($url)){
		if (in_array($newurl, $redirects)){
			break;
		}
		$redirects[] = $newurl;
		$url = $newurl;
	}
	return $redirects;
}

/**
 * get_final_url()
 * Gets the address that the URL ultimately leads to. 
 * Returns $url itself if it isn't a redirect.
 *
 * @param string $url
 * @return string
 */
function get_final_url($url){
	$redirects = get_all_redirects($url);
	if (count($redirects)>0){
		return array_pop($redirects);
	} else {
		return $url;
	}
}

Here’s an example that lists all URLs that a given address redirects to, in order:

$rez = get_all_redirects('http://daerils.gtrends.hop.clickbank.net/');
print_r($rez);

Known Issues

Most likely you won’t ever run into one of these, but here they are anyway:

  • The script doesn’t recognize infinite redirects that don’t form a loop. However, it can handle normal redirection loops – get_all_redirects() exits as soon as it encounters a URL that it has already seen.
  • Relative redirects (e.g. “Location: go.php?asdf”) won’t be fully followed by get_all_redirects(). The relative address is returned as-is, and the chain stops there because it has no host to connect to.
  • Not an issue per se, yet something to note: these functions won’t tell you if a URL is valid, just what it redirects to (if anything).
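
If the first two issues matter to you, one possible workaround is sketched below. It is a rough, untuned sketch: resolve_location() and get_all_redirects_limited() are hypothetical names, the limit of 10 hops is an arbitrary choice, and the resolver is deliberately simplified (it doesn’t collapse “../” segments or handle query-only references). The $fetch parameter defaults to get_redirect_url() from this post, but any callable that returns a Location value or FALSE will do.

```php
<?php
/**
 * Hypothetical helper: resolves a Location value against the URL that returned it.
 * Simplified: no "../" collapsing, no query-only ("?x=1") references.
 */
function resolve_location($base_url, $location){
	$location = trim($location);
	// Already absolute (has a scheme)?
	if (preg_match('#^[a-z][a-z0-9+.\-]*://#i', $location)) return $location;

	$base = parse_url($base_url);
	if (!$base || !isset($base['scheme'], $base['host'])) return false;

	// Protocol-relative: "//host/path"
	if (substr($location, 0, 2) == '//') return $base['scheme'] . ':' . $location;

	$root = $base['scheme'] . '://' . $base['host'] . (isset($base['port']) ? ':' . $base['port'] : '');

	// Root-relative: "/path"
	if (substr($location, 0, 1) == '/') return $root . $location;

	// Document-relative: keep the base path up to and including its last "/".
	$path = isset($base['path']) ? $base['path'] : '/';
	$dir = substr($path, 0, strrpos($path, '/') + 1);
	return $root . $dir . $location;
}

/**
 * Hypothetical variant of get_all_redirects() that gives up after $max_hops,
 * so an endless chain that never repeats a URL still terminates.
 */
function get_all_redirects_limited($url, $max_hops = 10, $fetch = 'get_redirect_url'){
	$redirects = array();
	while (count($redirects) < $max_hops && ($location = call_user_func($fetch, $url))){
		$newurl = resolve_location($url, $location);
		if (!$newurl || in_array($newurl, $redirects)) break;
		$redirects[] = $newurl;
		$url = $newurl;
	}
	return $redirects;
}

// Demo with a stubbed fetcher, so it runs without touching the network.
$fake_site = array(
	'http://example.com/a/start'       => 'go.php?asdf', // relative redirect
	'http://example.com/a/go.php?asdf' => '/end',
);
$stub = function($u) use ($fake_site){
	return isset($fake_site[$u]) ? $fake_site[$u] : false;
};
print_r(get_all_redirects_limited('http://example.com/a/start', 10, $stub));
```

Passing the fetcher in as a parameter is only there to make the sketch easy to try out; in real use you would leave the default and let it call get_redirect_url().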

On a related note, check out the Firefox extension Redirect Remover.


66 Responses to “How To Get Redirect URL In PHP”

  1. underworld says:

    Nice Script dude – thanks 😀

  2. jack says:

    hi, this looks good, but can you tell me why it doesnt work with links on this newspapers site (try clicking one): http://www.onlinenewspapers.com/denmark.htm

    the link destination is something like http://lt.webwombat.com/lt.php?15660 but you immediately get forwarded to the actual news site… but i cant seem to extract the redirected url.

    thanks

  3. White Shadow says:

    I think that site purposefully detects automated scripts and handles them differently than normal users, causing this problem. You might be able to fool them by setting the User-Agent header to some common web browser’s UA.

  4. jack says:

    thanks for the reply. i tried lots of methods, but don’t really know what im doing, and i didnt get it working. i don’t suppose there’s a simple edit to your get_redirect_url() that you could share? i can follow this site’s links from my browser, and my $HTTP_SERVER_VARS['HTTP_USER_AGENT'] returns: “Mozilla/5.0 (X11; U; Linux i686; en-GB; rv:1.9.0.1) Gecko/2008072820 Firefox/3.0.1”

    so that must be a good user-agent to use…

    don’t bother if it’s hassle…
    cheers

  5. White Shadow says:

    How about adding something like this after the “…Host:…” line in get_redirect_url() :

    $request .= "User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-GB; rv:1.9.0.1) Gecko/2008072820 Firefox/3.0.1\r\n";

    Just an idea, I haven’t tested it.

  6. jack says:

    hey, yer i tried that (have actually edited my php.ini to set the user agent to that), but it still doesnt work. i’ve given up on it tbh. i had already moved on to wikipedia, which originally gave me

    Warning: file_get_contents(http://ar.wikipedia.org/wiki/1955) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.0 403 Forbidden in…

    but that user agent fix worked there, so im ok!

    thanks for your time.

  7. tyzy says:

    Awesome script! Saved me some time from writing this myself. Keep up the great coding! Thanks!

  8. Salsan Jose says:

    It was a nice script.
    I was looking a function like this.
    Thank you so much…

  9. Kris Rosario says:

    Thank You so much for sharing this. I have learned a lot by trying to get this to work on my own with libxml, xmllint, xpath, curl, wget and the like, with only segments working properly. Now your script shows a direct way to get this done, all within PHP as it should be, without filesystem hacks or workarounds. Thank you for teaching me something new. I’ll be studying how and why this works for a while so I’m not using someone’s work without understanding it. I dub thee Yoda, lol.

  10. White Shadow says:

    Hehe, thanks 🙂

  11. febox says:

    Very nice script man… Worked here like a charm! Thanks!

    But now i’m trying something else and not getting it right… I’m trying to capture the redirect url for some IMDB queries, like:

    http://www.imdb.com/find?s=tt&q=deus+%E9+brasileiro&x=0&y=0

    I know that only registered users can disable redirect on IMDB, so, i need to know whats the new URL… any chances?

  12. White Shadow says:

    I don’t have time to investigate this right now, but here’s something : add

    echo $response;

    after the fclose($sock) line. This will output the server’s response and that should provide some clues why the function doesn’t work.

    I’m guessing IMDB has some kind of protection against bots.

  13. max says:

    exactly what i’ve been trying to find, thanks a lot

  14. Rohit says:

    Hi,
    First of all, thank you very much, this code works like a charm for me.
    But I have a small question: how do you make it work if the query has something like this: “query=val1+val2”? Somehow it does not pick up val2. Another thing: if there is a space, it comes out as a +.
    Here’s what I did:
    I have a form that a user inputs a query into, and it is sent to the PHP file like this:
    http://www.myserver.com/test.php?query=val1+val2 (the URL in the browser).

    I see that if I have one variable it works fine, but if the variable is something like a+b, only “a” gets captured and not “b”.
    How can I make this work?
    Thanks again for this life-saving script.

  15. White Shadow says:

    Hmm, I’m not sure I completely understand what you’re trying to do, but I don’t see why the second part of the variable wouldn’t be captured. Have you tried doing something like

    print_r($_GET);

    to verify that it’s really discarded by the server, instead of a bug somewhere in your script?

  16. Rohit says:

    hi,
    i did an echo $response; after the fclose($sock) and i saw that the first url was shown as query=val1 val2. i noticed that the “+” was gone. i tried to replace the space with %20, but it’s still the same, only val1 was picked up. any comments?

  17. White Shadow says:

    I’ve no idea why this would happen. I tried a few tests myself using a query that contains a space, and it was processed correctly. Maybe you can give me a real example that I could test?

  18. N says:

    Bug fix:

    if (preg_match('/^Location: (.+?)$/m', $response, $matches)){
    	if ( substr($matches[1], 0, 1) == "/" )
    		return $url_parts['scheme'] . "://" . $url_parts['host'] . trim($matches[1]);
    	else
    		return trim($matches[1]);

    } else {
    	return false;
    }

  19. White Shadow says:

    Thanks for the fix. I hadn’t considered relative URLs at all.

  20. Emprivo says:

    Tried and tested function (follows javascript redirects as well):

    function get_final_url( $url, $timeout = 5 )
    {
    	$url = str_replace( "&amp;", "&", urldecode(trim($url)) );
    
    	$cookie = tempnam ("/tmp", "CURLCOOKIE");
        $ch = curl_init();
        curl_setopt( $ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1" );
    	curl_setopt( $ch, CURLOPT_URL, $url );
    	curl_setopt( $ch, CURLOPT_COOKIEJAR, $cookie );
    	curl_setopt( $ch, CURLOPT_FOLLOWLOCATION, true );
    	curl_setopt( $ch, CURLOPT_ENCODING, "" );
    	curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
    	curl_setopt( $ch, CURLOPT_AUTOREFERER, true );
    	curl_setopt( $ch, CURLOPT_CONNECTTIMEOUT, $timeout );
    	curl_setopt( $ch, CURLOPT_TIMEOUT, $timeout );
    	curl_setopt( $ch, CURLOPT_MAXREDIRS, 10 );
    	$content = curl_exec( $ch );
    	$response = curl_getinfo( $ch );
    	curl_close ( $ch );
    
    	if ($response['http_code'] == 301 || $response['http_code'] == 302)
    	{
    		ini_set("user_agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1");
    		$headers = get_headers($response['url']);
    
    		$location = "";
    		foreach( $headers as $value )
    		{
    			if ( substr( strtolower($value), 0, 9 ) == "location:" )
    				return get_final_url( trim( substr( $value, 9, strlen($value) ) ) );
    		}
    	}
    
    
    	if (	preg_match("/window\.location\.replace\('(.*)'\)/i", $content, $value) ||
    			preg_match("/window\.location\=\"(.*)\"/i", $content, $value)
    	)
    	{
    		return get_final_url ( $value[1] );
    	}
    	else
    	{
    		return $response['url'];
    	}
    }
