<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: How To Extract All URLs From A Page Using PHP</title>
	<atom:link href="http://w-shadow.com/blog/2007/07/16/how-to-extract-all-urls-from-a-page-using-php/feed/" rel="self" type="application/rss+xml" />
	<link>http://w-shadow.com/blog/2007/07/16/how-to-extract-all-urls-from-a-page-using-php/</link>
	<description>Slightly Advanced Computer Stuff (and some magic)</description>
	<lastBuildDate>Thu, 18 Mar 2010 17:05:30 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: White Shadow</title>
		<link>http://w-shadow.com/blog/2007/07/16/how-to-extract-all-urls-from-a-page-using-php/comment-page-2/#comment-33318</link>
		<dc:creator>White Shadow</dc:creator>
		<pubDate>Sun, 10 Jan 2010 17:53:25 +0000</pubDate>
		<guid isPermaLink="false">http://w-shadow.com/blog/2007/07/16/how-to-extract-all-urls-from-a-page-using-php/#comment-33318</guid>
		<description>To make the script use a proxy, add &lt;pre lang=&quot;php&quot;&gt;curl_setopt($ch, CURLOPT_PROXY, proxyip);&lt;/pre&gt; and &lt;pre lang=&quot;php&quot;&gt;curl_setopt($ch, CURLOPT_PROXYPORT, portnumber);&lt;/pre&gt; after the curl_init() call in the crawl_page() function. More information can be found in the &lt;a href=&quot;http://php.net/manual/en/function.curl-setopt.php&quot; rel=&quot;nofollow&quot;&gt;curl_setopt documentation&lt;/a&gt;.</description>
		<content:encoded><![CDATA[<p>To make the script use a proxy, add</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #990000;">curl_setopt</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$ch</span><span style="color: #339933;">,</span> CURLOPT_PROXY<span style="color: #339933;">,</span> proxyip<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p> and</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #990000;">curl_setopt</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$ch</span><span style="color: #339933;">,</span> CURLOPT_PROXYPORT<span style="color: #339933;">,</span> portnumber<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p> after the curl_init() call in the crawl_page() function. More information can be found in the <a href="http://php.net/manual/en/function.curl-setopt.php" rel="nofollow">curl_setopt documentation</a>.</p>
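<p>As a rough sketch of the two calls above, the proxy settings can be collected in one place and applied with curl_setopt_array(). The helper name proxy_curl_options() and the example.com target are assumptions made for illustration; the address and port are the ones quoted in the question, and crawl_page() itself is not reproduced here.</p>

```php
<?php
// Hypothetical helper (not part of the original script):
// returns the proxy-related cURL options as an array.
function proxy_curl_options($proxyHost, $proxyPort) {
    return array(
        CURLOPT_PROXY     => $proxyHost,        // proxy address as a string
        CURLOPT_PROXYPORT => (int) $proxyPort,  // proxy port as an integer
    );
}

// Placeholder target URL; in the script this handle is created in crawl_page().
$ch = curl_init('http://example.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

// Apply both proxy options right after curl_init().
curl_setopt_array($ch, proxy_curl_options('172.16.0.9', 8080));

// A later curl_exec($ch) would send the request through the proxy.
```

<p>Note that the address is passed as a quoted string and the port as a number; a bare word such as proxyip is only a placeholder.</p>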
]]></content:encoded>
	</item>
	<item>
		<title>By: Your fan</title>
		<link>http://w-shadow.com/blog/2007/07/16/how-to-extract-all-urls-from-a-page-using-php/comment-page-2/#comment-33316</link>
		<dc:creator>Your fan</dc:creator>
		<pubDate>Sun, 10 Jan 2010 17:25:07 +0000</pubDate>
		<guid isPermaLink="false">http://w-shadow.com/blog/2007/07/16/how-to-extract-all-urls-from-a-page-using-php/#comment-33316</guid>
		<description>Hi,
You did great work,
but could you please help me with a problem?
My internet traffic goes through a proxy server:

HTTP proxy: 172.16.0.9
Port: 8080

The script above runs fine when I use a direct connection with no proxy, but it doesn&#039;t work at my college, where a proxy is required.
Please help me tunnel your code&#039;s traffic through this proxy.
I would be very grateful.
You are really great. I am using WAMP5.</description>
		<content:encoded><![CDATA[<p>Hi,<br />
You did great work,<br />
but could you please help me with a problem?<br />
My internet traffic goes through a proxy server:</p>
<p>HTTP proxy: 172.16.0.9<br />
Port: 8080</p>
<p>The script above runs fine when I use a direct connection with no proxy, but it doesn&#8217;t work at my college, where a proxy is required.<br />
Please help me tunnel your code&#8217;s traffic through this proxy.<br />
I would be very grateful.<br />
You are really great. I am using WAMP5.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: White Shadow</title>
		<link>http://w-shadow.com/blog/2007/07/16/how-to-extract-all-urls-from-a-page-using-php/comment-page-2/#comment-33081</link>
		<dc:creator>White Shadow</dc:creator>
		<pubDate>Tue, 22 Dec 2009 17:01:25 +0000</pubDate>
		<guid isPermaLink="false">http://w-shadow.com/blog/2007/07/16/how-to-extract-all-urls-from-a-page-using-php/#comment-33081</guid>
		<description>You can do that by declaring the $url variable as global inside the function, like this:
&lt;pre lang=&quot;php&quot;&gt;
function real_links($matches){
    global $url;
    return &#039;src=&quot;&#039;.get_src($url, $matches[1]).&#039;&quot;&#039;;
}
&lt;/pre&gt;</description>
		<content:encoded><![CDATA[<p>You can do that by declaring the $url variable as global inside the function, like this:</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">function</span> real_links<span style="color: #009900;">&#40;</span><span style="color: #000088;">$matches</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
    <span style="color: #000000; font-weight: bold;">global</span> <span style="color: #000088;">$url</span><span style="color: #339933;">;</span>
    <span style="color: #b1b100;">return</span> <span style="color: #0000ff;">'src=&quot;'</span><span style="color: #339933;">.</span>get_src<span style="color: #009900;">&#40;</span><span style="color: #000088;">$url</span><span style="color: #339933;">,</span> <span style="color: #000088;">$matches</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">1</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">.</span><span style="color: #0000ff;">'&quot;'</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>
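<p>As an alternative sketch that avoids the global, the base URL can be handed to the callback through a closure. This assumes PHP 5.3 or later (anonymous functions with use). The get_src() below is only a simplified stand-in that joins the two parts, and absolutize_src() is a hypothetical wrapper name; neither is the code posted in this thread.</p>

```php
<?php
// Simplified stand-in for get_src(): joins a base URL and a path.
// It does not handle '../' segments.
function get_src($base, $path) {
    return rtrim($base, '/') . '/' . ltrim($path, '/');
}

// Hypothetical wrapper: rewrites every src="..." attribute in $page
// relative to $url, without relying on a global variable.
function absolutize_src($page, $url) {
    return preg_replace_callback(
        '~src="(.*?)"~',
        // The closure captures $url by value via "use".
        function ($matches) use ($url) {
            return 'src="' . get_src($url, $matches[1]) . '"';
        },
        $page
    );
}

$page = '<img src="images/logo.gif">';
echo absolutize_src($page, 'http://www.google.com/one/two/');
// prints <img src="http://www.google.com/one/two/images/logo.gif">
```

<p>The trade-off is mostly about scope: the closure keeps the base URL local to one call, so the same callback logic can be reused for several pages with different base URLs.</p>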

]]></content:encoded>
	</item>
	<item>
		<title>By: Neil Trigger</title>
		<link>http://w-shadow.com/blog/2007/07/16/how-to-extract-all-urls-from-a-page-using-php/comment-page-2/#comment-33051</link>
		<dc:creator>Neil Trigger</dc:creator>
		<pubDate>Sat, 19 Dec 2009 22:48:13 +0000</pubDate>
		<guid isPermaLink="false">http://w-shadow.com/blog/2007/07/16/how-to-extract-all-urls-from-a-page-using-php/#comment-33051</guid>
		<description>function real_links($matches){
	return &#039;src=&quot;&#039;.get_src(&#039;http://www.google.com/one/two/&#039;,$matches[1]).&#039;&quot;&#039;;
}
$page=preg_replace_callback(&#039;~src=&quot;(.*?)&quot;~&#039;,&#039;real_links&#039;,$page);
echo $page;</description>
		<content:encoded><![CDATA[<p>function real_links($matches){<br />
	return &#039;src=&quot;&#039;.get_src(&#039;http://www.google.com/one/two/&#039;,$matches[1]).&#039;&quot;&#039;;<br />
}<br />
$page=preg_replace_callback(&#039;~src=&quot;(.*?)&quot;~&#039;,&#039;real_links&#039;,$page);<br />
echo $page;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Neil Trigger</title>
		<link>http://w-shadow.com/blog/2007/07/16/how-to-extract-all-urls-from-a-page-using-php/comment-page-2/#comment-33050</link>
		<dc:creator>Neil Trigger</dc:creator>
		<pubDate>Sat, 19 Dec 2009 22:47:11 +0000</pubDate>
		<guid isPermaLink="false">http://w-shadow.com/blog/2007/07/16/how-to-extract-all-urls-from-a-page-using-php/#comment-33050</guid>
		<description>This shout box seems to cut off my code. Here&#039;s the last part:
</description>
		<content:encoded><![CDATA[<p>This shout box seems to cut off my code. Here&#8217;s the last part:</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Neil Trigger</title>
		<link>http://w-shadow.com/blog/2007/07/16/how-to-extract-all-urls-from-a-page-using-php/comment-page-2/#comment-33049</link>
		<dc:creator>Neil Trigger</dc:creator>
		<pubDate>Sat, 19 Dec 2009 22:45:27 +0000</pubDate>
		<guid isPermaLink="false">http://w-shadow.com/blog/2007/07/16/how-to-extract-all-urls-from-a-page-using-php/#comment-33049</guid>
		<description>Sorry, yes... I&#039;m making an application which can take a website&#039;s code and render it in the browser, with some changes made to it (kinda like a spell-check).

I managed to get most of it working, but have an issue with putting a variable into the get_src function at the bottom. I want to replace the hard-coded URL for google with the pre-defined one at the top of the code.
So far I have this:

&lt;?php
$url=&#039;http://www.google.com/one_level_up/2_levels_up/3levels_up/&#039;;
$path = &#039;../../intl/en/images/logo.gif&#039;; #this should display
$path2 = &#039;../../../intl/en/images/logo.gif&#039;; # this should not
$page=&#039; Don\&#039;t Display This: &#039;;
#------------------------- Function Below --------------------------------
function get_src($tmp_url,$path){
$tmp_url = rtrim($tmp_url, &#039;/&#039;);
$path = ltrim($path, &#039;\\&#039;);
if(($num_of_them = substr_count($path, &#039;../&#039;)) &gt; 0) {
    $tmp_url = preg_replace(&quot;#(/[a-z0-9-]+){{$num_of_them}}$#iD&quot;, &#039;&#039;, $tmp_url);
    $path = $tmp_url . &#039;/&#039;. str_replace(&#039;../&#039;, &#039;&#039;, $path);
}
else{
	$path=str_replace($path,($tmp_url.&#039;/&#039;.$path),$path);
}
    return $path;
	}
?&gt;Display This:


I&#039;ve been searching for something like this for ages, so hopefully someone may find it useful.</description>
		<content:encoded><![CDATA[<p>Sorry, yes&#8230; I&#8217;m making an application which can take a website&#8217;s code and render it in the browser, with some changes made to it (kinda like a spell-check).</p>
<p>I managed to get most of it working, but have an issue with putting a variable into the get_src function at the bottom. I want to replace the hard-coded URL for google with the pre-defined one at the top of the code.<br />
So far I have this:</p>
<p>&lt;?php<br />
$url=&#039;http://www.google.com/one_level_up/2_levels_up/3levels_up/&#039;;<br />
$path = &#039;../../intl/en/images/logo.gif&#039;; #this should display<br />
$path2 = &#039;../../../intl/en/images/logo.gif&#039;; # this should not<br />
$page=&#039; Don\&#039;t Display This: &#039;;<br />
#------------------------- Function Below --------------------------------<br />
function get_src($tmp_url,$path){<br />
$tmp_url = rtrim($tmp_url, &#039;/&#039;);<br />
$path = ltrim($path, &#039;\\&#039;);<br />
if(($num_of_them = substr_count($path, &#039;../&#039;)) &gt; 0) {<br />
    $tmp_url = preg_replace(&quot;#(/[a-z0-9-]+){{$num_of_them}}$#iD&quot;, &#039;&#039;, $tmp_url);<br />
    $path = $tmp_url . &#039;/&#039;. str_replace(&#039;../&#039;, &#039;&#039;, $path);<br />
}<br />
else{<br />
	$path=str_replace($path,($tmp_url.&#039;/&#039;.$path),$path);<br />
}<br />
    return $path;<br />
	}<br />
?&gt;Display This:</p>
<p>I&#8217;ve been searching for something like this for ages, so hopefully someone may find it useful.</p>
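<p>One caveat with the pattern above: its character class matches only letters, digits, and hyphens, so path segments containing underscores (as in the sample $url) are not stripped. A possible alternative, sketched here under assumptions, walks the path segment by segment instead of using a regular expression. The name resolve_relative() is hypothetical, and the sketch ignores ports, query strings, and fragments.</p>

```php
<?php
// Hypothetical helper: resolves a relative path such as '../../a/b.gif'
// against a base URL by processing one path segment at a time.
function resolve_relative($baseUrl, $relativePath) {
    $parts = parse_url($baseUrl);
    $root = $parts['scheme'] . '://' . $parts['host'];
    $basePath = isset($parts['path']) ? $parts['path'] : '/';

    // Keep only the directory part of the base path (drop a trailing file name).
    $dir = substr($basePath, 0, strrpos($basePath, '/') + 1);

    $segments = array();
    foreach (explode('/', $dir . $relativePath) as $segment) {
        if ($segment === '' || $segment === '.') {
            continue;               // skip empty and "current directory" segments
        }
        if ($segment === '..') {
            array_pop($segments);   // go up one level; a no-op at the site root
            continue;
        }
        $segments[] = $segment;
    }
    return $root . '/' . implode('/', $segments);
}

$url = 'http://www.google.com/one_level_up/2_levels_up/3levels_up/';
echo resolve_relative($url, '../../intl/en/images/logo.gif');
// prints http://www.google.com/one_level_up/intl/en/images/logo.gif
```

<p>Extra '../' segments beyond the site root are simply ignored in this sketch, which may or may not be the behaviour you want.</p>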
]]></content:encoded>
	</item>
	<item>
		<title>By: White Shadow</title>
		<link>http://w-shadow.com/blog/2007/07/16/how-to-extract-all-urls-from-a-page-using-php/comment-page-2/#comment-33046</link>
		<dc:creator>White Shadow</dc:creator>
		<pubDate>Sat, 19 Dec 2009 21:04:24 +0000</pubDate>
		<guid isPermaLink="false">http://w-shadow.com/blog/2007/07/16/how-to-extract-all-urls-from-a-page-using-php/#comment-33046</guid>
		<description>Well, I still don&#039;t get what you&#039;re trying to do. However, look into preg_replace_callback; it lets you use a function for replacements.</description>
		<content:encoded><![CDATA[<p>Well, I still don&#8217;t get what you&#8217;re trying to do. However, look into preg_replace_callback; it lets you use a function for replacements.</p>
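<p>A minimal sketch of how preg_replace_callback works, in case it helps: the callback receives the match array and returns the replacement text. The function name shout_src() and the sample markup are made up for illustration, and the upper-casing stands in for whatever transformation you actually need.</p>

```php
<?php
// Hypothetical callback: $matches[0] is the whole match,
// $matches[1] is the first capture group (the src value here).
function shout_src($matches) {
    return 'src="' . strtoupper($matches[1]) . '"';
}

$page = '<img src="logo.gif"> <img src="photo.jpg">';

// Each src="..." attribute is passed to shout_src() and replaced by its return value.
echo preg_replace_callback('~src="(.*?)"~', 'shout_src', $page);
// prints <img src="LOGO.GIF"> <img src="PHOTO.JPG">
```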
]]></content:encoded>
	</item>
	<item>
		<title>By: Neil Trigger</title>
		<link>http://w-shadow.com/blog/2007/07/16/how-to-extract-all-urls-from-a-page-using-php/comment-page-2/#comment-33045</link>
		<dc:creator>Neil Trigger</dc:creator>
		<pubDate>Sat, 19 Dec 2009 20:59:29 +0000</pubDate>
		<guid isPermaLink="false">http://w-shadow.com/blog/2007/07/16/how-to-extract-all-urls-from-a-page-using-php/#comment-33045</guid>
		<description>I managed to work this much out:


 0) {
    $url = preg_replace(&quot;#(/[a-z0-9-]+){{$num_of_them}}$#iD&quot;, &#039;&#039;, $url);
    $path = $url . &#039;/&#039;. str_replace(&#039;../&#039;, &#039;&#039;, $path);
}
else{
	$path=str_replace($path,($url.&#039;/&#039;.$path),$path);
}
    return $path;
	}
	
	echo &#039;&#039;;
?&gt;

Now I just need to work out how to make this pseudo code work:
$pattern=&#039;~src=&quot;(.?*)&quot;~&#039;;
	$new_url=&#039;~src=&quot;get_src($1,$path)&quot;~&#039;;
	$page = preg_replace($pattern, $new_url, $page);</description>
		<content:encoded><![CDATA[<p>I managed to work this much out:</p>
<p> 0) {<br />
    $url = preg_replace(&quot;#(/[a-z0-9-]+){{$num_of_them}}$#iD&quot;, &#039;&#039;, $url);<br />
    $path = $url . &#039;/&#039;. str_replace(&#039;../&#039;, &#039;&#039;, $path);<br />
}<br />
else{<br />
	$path=str_replace($path,($url.&#039;/&#039;.$path),$path);<br />
}<br />
    return $path;<br />
	}</p>
<p>	echo &#039;&#039;;<br />
?&gt;</p>
<p>Now I just need to work out how to make this pseudo code work:<br />
$pattern=&#039;~src=&quot;(.?*)&quot;~&#039;;<br />
	$new_url=&#039;~src=&quot;get_src($1,$path)&quot;~&#039;;<br />
	$page = preg_replace($pattern, $new_url, $page);</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: White Shadow</title>
		<link>http://w-shadow.com/blog/2007/07/16/how-to-extract-all-urls-from-a-page-using-php/comment-page-2/#comment-33044</link>
		<dc:creator>White Shadow</dc:creator>
		<pubDate>Sat, 19 Dec 2009 20:55:55 +0000</pubDate>
		<guid isPermaLink="false">http://w-shadow.com/blog/2007/07/16/how-to-extract-all-urls-from-a-page-using-php/#comment-33044</guid>
		<description>I don&#039;t think there is an easy way to do that. The function isn&#039;t really suited for such use; you&#039;d probably have to figure out how it works and write a custom function for your specific situation.

Are you, by any chance, trying to download an entire site and rewrite the link paths so that it displays properly, etc? If so, I&#039;m pretty sure there are already existing applications that can do that. Maybe it would be easier to use one of those instead of writing your own.</description>
		<content:encoded><![CDATA[<p>I don&#8217;t think there is an easy way to do that. The function isn&#8217;t really suited for such use; you&#8217;d probably have to figure out how it works and write a custom function for your specific situation.</p>
<p>Are you, by any chance, trying to download an entire site and rewrite the link paths so that it displays properly, etc? If so, I&#8217;m pretty sure there are already existing applications that can do that. Maybe it would be easier to use one of those instead of writing your own.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Neil Trigger</title>
		<link>http://w-shadow.com/blog/2007/07/16/how-to-extract-all-urls-from-a-page-using-php/comment-page-2/#comment-33040</link>
		<dc:creator>Neil Trigger</dc:creator>
		<pubDate>Sat, 19 Dec 2009 14:54:09 +0000</pubDate>
		<guid isPermaLink="false">http://w-shadow.com/blog/2007/07/16/how-to-extract-all-urls-from-a-page-using-php/#comment-33040</guid>
		<description>I&#039;m trying to work out how to do this for the whole site... Currently I have the script working with 2 simple variables, but I need this function to work within a preg_replace. Most of what I want is working, but finding the absolute path is vital to make sure the output of CSS layouts works properly.

I&#039;m currently doing this:
$page = preg_replace($pattern, $replacement, $page);

How do I add your function in there?</description>
		<content:encoded><![CDATA[<p>I&#8217;m trying to work out how to do this for the whole site&#8230; Currently I have the script working with 2 simple variables, but I need this function to work within a preg_replace. Most of what I want is working, but finding the absolute path is vital to make sure the output of CSS layouts works properly.</p>
<p>I&#8217;m currently doing this:<br />
$page = preg_replace($pattern, $replacement, $page);</p>
<p>How do I add your function in there?</p>
]]></content:encoded>
	</item>
</channel>
</rss>
