<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: How To Check If Page Exists With CURL</title>
	<atom:link href="http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/feed/" rel="self" type="application/rss+xml" />
	<link>http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/</link>
	<description>Slightly Advanced Computer Stuff (and some magic)</description>
	<lastBuildDate>Sat, 21 Nov 2009 04:22:17 +0200</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: sagoral</title>
		<link>http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/comment-page-1/#comment-30939</link>
		<dc:creator>sagoral</dc:creator>
		<pubDate>Sat, 25 Jul 2009 16:16:17 +0000</pubDate>
		<guid isPermaLink="false">http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/#comment-30939</guid>
		<description>These functions always return OK, so I couldn&#039;t validate the URLs. Then I tried fopen, but that didn&#039;t work either...</description>
		<content:encoded><![CDATA[<p>These functions always return OK, so I couldn&#8217;t validate the URLs. Then I tried fopen, but that didn&#8217;t work either&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: White Shadow</title>
		<link>http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/comment-page-1/#comment-12203</link>
		<dc:creator>White Shadow</dc:creator>
		<pubDate>Tue, 08 Jul 2008 18:05:16 +0000</pubDate>
		<guid isPermaLink="false">http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/#comment-12203</guid>
		<description>Okay, done.</description>
		<content:encoded><![CDATA[<p>Okay, done.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kyle Renfrow</title>
		<link>http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/comment-page-1/#comment-12202</link>
		<dc:creator>Kyle Renfrow</dc:creator>
		<pubDate>Tue, 08 Jul 2008 17:22:16 +0000</pubDate>
		<guid isPermaLink="false">http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/#comment-12202</guid>
		<description>Hey, I hate to keep posting, but please replace this line for me; it was wrong:

this is correct:
    $code=intval($matches[1][(count($matches[1])-1)]);</description>
		<content:encoded><![CDATA[<p>Hey, I hate to keep posting, but please replace this line for me; it was wrong:</p>
<p>this is correct:<br />
    $code=intval($matches[1][(count($matches[1])-1)]);</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: White Shadow</title>
		<link>http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/comment-page-1/#comment-12192</link>
		<dc:creator>White Shadow</dc:creator>
		<pubDate>Mon, 07 Jul 2008 20:42:05 +0000</pubDate>
		<guid isPermaLink="false">http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/#comment-12192</guid>
		<description>I see. BTW, I edited your comment to add some syntax highlighting.</description>
		<content:encoded><![CDATA[<p>I see. BTW, I edited your comment to add some syntax highlighting.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kyle Renfrow</title>
		<link>http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/comment-page-1/#comment-12191</link>
		<dc:creator>Kyle Renfrow</dc:creator>
		<pubDate>Mon, 07 Jul 2008 20:23:22 +0000</pubDate>
		<guid isPermaLink="false">http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/#comment-12191</guid>
		<description>Hey, sorry, scratch that. You were actually correct: there is no point in adding CURLOPT_HTTPGET, because it basically cancels out CURLOPT_NOBODY and returns the whole response, heh.

The only reason I&#039;m posting again is that I found another bug in the function. I hope you don&#039;t mind, WShadow, but I&#039;m posting a complete fixed version of your function to help others out.

I modified it a bit for my own use, because a timeout doesn&#039;t necessarily mean the page is invalid; it could just be a temporary server problem.

Note: this version does not fetch only the headers. To me, the efficiency gain isn&#039;t worth running a script that sometimes fails, heh.
&lt;pre lang=&#039;php&#039;&gt;
function url_exists($url){
  /* this script will return:
    1 for a valid page
    2 for a timed out page
    3 for an invalid page
  */

  $parts=parse_url($url);
  if(!$parts) {
    return 3; /* the URL was seriously wrong */
  }

  $ch = curl_init();
  curl_setopt($ch, CURLOPT_URL, $url);

  /* set the user agent - might help, doesn&#039;t hurt */
  curl_setopt($ch, CURLOPT_USERAGENT, &#039;Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)&#039;);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);

  /* try to follow redirects */
  curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

  /* timeout after the specified number of seconds. assuming that this script runs
    on a server, 20 seconds should be plenty of time to verify a valid URL.  */
  curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
  curl_setopt($ch, CURLOPT_TIMEOUT, 20);

  /* include the response headers in the output; the body is still downloaded because CURLOPT_NOBODY is not set */
  curl_setopt($ch, CURLOPT_HEADER, true);

  /* handle HTTPS links */
  if($parts[&#039;scheme&#039;]==&#039;https&#039;){
        curl_setopt($ch, CURLOPT_SSL_VERIFYHOST,  2);
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
  }

  $response = curl_exec($ch);
  $error = curl_error($ch);
  curl_close($ch);

  /*  get the LAST status code from HTTP headers */
  if(preg_match_all(&#039;/HTTP\/\d+(?:\.\d+)?\s+(\d+)/&#039;, $response, $matches)){
    $code=intval($matches[1][(count($matches[1])-1)]);
  } else {
    if(stripos($error, &#039;operation timed out&#039;) !== false){
      //timed out
      return 2;
    }else {
      //not found
      return 3;
    }
  };

  /* see if code indicates success */
  if(($code&gt;=200) &amp;&amp; ($code&lt;400)){
    //success
    return 1;
  }else {
    //not found
    return 3;
  }
}&lt;/pre&gt;</description>
		<content:encoded><![CDATA[<p>Hey, sorry, scratch that. You were actually correct: there is no point in adding CURLOPT_HTTPGET, because it basically cancels out CURLOPT_NOBODY and returns the whole response, heh.</p>
<p>The only reason I&#8217;m posting again is that I found another bug in the function. I hope you don&#8217;t mind, WShadow, but I&#8217;m posting a complete fixed version of your function to help others out.</p>
<p>I modified it a bit for my own use, because a timeout doesn&#8217;t necessarily mean the page is invalid; it could just be a temporary server problem.</p>
<p>Note: this version does not fetch only the headers. To me, the efficiency gain isn&#8217;t worth running a script that sometimes fails, heh.</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">function</span> url_exists<span style="color: #009900;">&#40;</span><span style="color: #000088;">$url</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
  <span style="color: #666666; font-style: italic;">/* this script will return:
    1 for a valid page
    2 for a timed out page
    3 for an invalid page
  */</span>
&nbsp;
  <span style="color: #000088;">$parts</span><span style="color: #339933;">=</span><span style="color: #990000;">parse_url</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$url</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
  <span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span><span style="color: #339933;">!</span><span style="color: #000088;">$parts</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #b1b100;">return</span> <span style="color: #cc66cc;">3</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">/* the URL was seriously wrong */</span>
  <span style="color: #009900;">&#125;</span>
&nbsp;
  <span style="color: #000088;">$ch</span> <span style="color: #339933;">=</span> <span style="color: #990000;">curl_init</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
  <span style="color: #990000;">curl_setopt</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$ch</span><span style="color: #339933;">,</span> CURLOPT_URL<span style="color: #339933;">,</span> <span style="color: #000088;">$url</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
  <span style="color: #666666; font-style: italic;">/* set the user agent - might help, doesn't hurt */</span>
  <span style="color: #990000;">curl_setopt</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$ch</span><span style="color: #339933;">,</span> CURLOPT_USERAGENT<span style="color: #339933;">,</span> <span style="color: #0000ff;">'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
  <span style="color: #990000;">curl_setopt</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$ch</span><span style="color: #339933;">,</span> CURLOPT_RETURNTRANSFER<span style="color: #339933;">,</span><span style="color: #cc66cc;">1</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
  <span style="color: #666666; font-style: italic;">/* try to follow redirects */</span>
  <span style="color: #990000;">curl_setopt</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$ch</span><span style="color: #339933;">,</span> CURLOPT_FOLLOWLOCATION<span style="color: #339933;">,</span> <span style="color: #009900; font-weight: bold;">true</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
  <span style="color: #666666; font-style: italic;">/* timeout after the specified number of seconds. assuming that this script runs
    on a server, 20 seconds should be plenty of time to verify a valid URL.  */</span>
  <span style="color: #990000;">curl_setopt</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$ch</span><span style="color: #339933;">,</span> CURLOPT_CONNECTTIMEOUT<span style="color: #339933;">,</span> <span style="color: #cc66cc;">15</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
  <span style="color: #990000;">curl_setopt</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$ch</span><span style="color: #339933;">,</span> CURLOPT_TIMEOUT<span style="color: #339933;">,</span> <span style="color: #cc66cc;">20</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
  <span style="color: #666666; font-style: italic;">/* include the response headers in the output; the body is still downloaded because CURLOPT_NOBODY is not set */</span>
  <span style="color: #990000;">curl_setopt</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$ch</span><span style="color: #339933;">,</span> CURLOPT_HEADER<span style="color: #339933;">,</span> <span style="color: #009900; font-weight: bold;">true</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
  <span style="color: #666666; font-style: italic;">/* handle HTTPS links */</span>
  <span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$parts</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'scheme'</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">==</span><span style="color: #0000ff;">'https'</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
        <span style="color: #990000;">curl_setopt</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$ch</span><span style="color: #339933;">,</span> CURLOPT_SSL_VERIFYHOST<span style="color: #339933;">,</span>  <span style="color: #cc66cc;">2</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        <span style="color: #990000;">curl_setopt</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$ch</span><span style="color: #339933;">,</span> CURLOPT_SSL_VERIFYPEER<span style="color: #339933;">,</span> <span style="color: #009900; font-weight: bold;">false</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
  <span style="color: #009900;">&#125;</span>
&nbsp;
  <span style="color: #000088;">$response</span> <span style="color: #339933;">=</span> <span style="color: #990000;">curl_exec</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$ch</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
  <span style="color: #000088;">$error</span> <span style="color: #339933;">=</span> <span style="color: #990000;">curl_error</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$ch</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
  <span style="color: #990000;">curl_close</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$ch</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
  <span style="color: #666666; font-style: italic;">/*  get the LAST status code from HTTP headers */</span>
  <span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span><span style="color: #990000;">preg_match_all</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'/HTTP\/\d+(?:\.\d+)?\s+(\d+)/'</span><span style="color: #339933;">,</span> <span style="color: #000088;">$response</span><span style="color: #339933;">,</span> <span style="color: #000088;">$matches</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
    <span style="color: #000088;">$code</span><span style="color: #339933;">=</span><span style="color: #990000;">intval</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$matches</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">1</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#40;</span><span style="color: #990000;">count</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$matches</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">1</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">-</span><span style="color: #cc66cc;">1</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
  <span style="color: #009900;">&#125;</span> <span style="color: #b1b100;">else</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span><span style="color: #990000;">stripos</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$error</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'operation timed out'</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">!==</span> <span style="color: #009900; font-weight: bold;">false</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
      <span style="color: #666666; font-style: italic;">//timed out</span>
      <span style="color: #b1b100;">return</span> <span style="color: #cc66cc;">2</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span><span style="color: #b1b100;">else</span> <span style="color: #009900;">&#123;</span>
      <span style="color: #666666; font-style: italic;">//not found</span>
      <span style="color: #b1b100;">return</span> <span style="color: #cc66cc;">3</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
  <span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
&nbsp;
  <span style="color: #666666; font-style: italic;">/* see if code indicates success */</span>
  <span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$code</span><span style="color: #339933;">&gt;=</span><span style="color: #cc66cc;">200</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">&amp;&amp;</span> <span style="color: #009900;">&#40;</span><span style="color: #000088;">$code</span><span style="color: #339933;">&lt;</span><span style="color: #cc66cc;">400</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
    <span style="color: #666666; font-style: italic;">//success</span>
    <span style="color: #b1b100;">return</span> <span style="color: #cc66cc;">1</span><span style="color: #339933;">;</span>
  <span style="color: #009900;">&#125;</span><span style="color: #b1b100;">else</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #666666; font-style: italic;">//not found</span>
    <span style="color: #b1b100;">return</span> <span style="color: #cc66cc;">3</span><span style="color: #339933;">;</span>
  <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>
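<p>The status-code and timeout handling above can also be sketched as two small pure functions, which makes the logic easy to check without touching the network. This is only a sketch under stated assumptions: the names last_status_code, classify_result and CURL_ERRNO_TIMEOUT are hypothetical and not part of the posted function, the error number is assumed to come from curl_errno() called before the handle is closed, and 28 is libcurl's CURLE_OPERATION_TIMEDOUT.</p>

```php
<?php
// Hypothetical helper names; not part of the function posted above.

const CURL_ERRNO_TIMEOUT = 28; // libcurl's CURLE_OPERATION_TIMEDOUT

/** Return the last HTTP status code found in a raw header dump, or null if none. */
function last_status_code(string $headers): ?int
{
    // Match status lines such as "HTTP/1.1 301" and "HTTP/2 200".
    if (preg_match_all('/^HTTP\/\d+(?:\.\d+)?\s+(\d{3})/m', $headers, $matches)) {
        // After redirects there are several status lines; the last one counts.
        return (int) end($matches[1]);
    }
    return null;
}

/** Map a cURL error number and a status code to 1 (valid), 2 (timed out) or 3 (invalid). */
function classify_result(int $errno, ?int $code): int
{
    if ($errno === CURL_ERRNO_TIMEOUT) {
        return 2;
    }
    if ($code !== null && $code >= 200 && $code < 400) {
        return 1;
    }
    return 3;
}
```

<p>Comparing the numeric error code avoids depending on the exact wording of the cURL error message.</p>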

]]></content:encoded>
	</item>
	<item>
		<title>By: Kyle Renfrow</title>
		<link>http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/comment-page-1/#comment-12188</link>
		<dc:creator>Kyle Renfrow</dc:creator>
		<pubDate>Mon, 07 Jul 2008 18:49:27 +0000</pubDate>
		<guid isPermaLink="false">http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/#comment-12188</guid>
		<description>Well, no, CURLOPT_NOBODY is a good idea. If you&#039;re using this in a script that verifies a database of thousands of URLs, CURLOPT_NOBODY can significantly speed up the process.

The fact is, I used the original function and got invalid results: URLs that were perfectly fine were rejected as not valid. Leave NOBODY intact and add HTTPGET, and you have yourself a flawless script. Great job.</description>
		<content:encoded><![CDATA[<p>Well, no, CURLOPT_NOBODY is a good idea. If you&#8217;re using this in a script that verifies a database of thousands of URLs, CURLOPT_NOBODY can significantly speed up the process.</p>
<p>The fact is, I used the original function and got invalid results: URLs that were perfectly fine were rejected as not valid. Leave NOBODY intact and add HTTPGET, and you have yourself a flawless script. Great job.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: White Shadow</title>
		<link>http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/comment-page-1/#comment-12187</link>
		<dc:creator>White Shadow</dc:creator>
		<pubDate>Mon, 07 Jul 2008 18:17:19 +0000</pubDate>
		<guid isPermaLink="false">http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/#comment-12187</guid>
		<description>Or you could just comment out the CURLOPT_NOBODY line. 

I know some servers handle HEAD requests incorrectly, but I didn&#039;t think it was common enough to leave out this option.</description>
		<content:encoded><![CDATA[<p>Or you could just comment out the CURLOPT_NOBODY line. </p>
<p>I know some servers handle HEAD requests incorrectly, but I didn&#8217;t think it was common enough to leave out this option.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kyle Renfrow</title>
		<link>http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/comment-page-1/#comment-12186</link>
		<dc:creator>Kyle Renfrow</dc:creator>
		<pubDate>Mon, 07 Jul 2008 17:59:20 +0000</pubDate>
		<guid isPermaLink="false">http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/#comment-12186</guid>
		<description>I wanted to post to help others who may come across the same issues I did with this function.

Setting CURLOPT_NOBODY to true makes cURL use HEAD for the request. Some servers don&#039;t like that (for example, forbes), and the request fails with &quot;Empty reply from server&quot;. To fix this, you also need to set CURLOPT_HTTPGET to switch back to a GET request.

  /* don&#039;t download the page, just the header (much faster in this case) */
  curl_setopt($ch, CURLOPT_NOBODY, true);
  curl_setopt($ch, CURLOPT_HEADER, true);
  curl_setopt($ch, CURLOPT_HTTPGET, true);    //this is needed to fix the issue

Hope this helps!</description>
		<content:encoded><![CDATA[<p>I wanted to post to help others who may come across the same issues I did with this function.</p>
<p>Setting CURLOPT_NOBODY to true makes cURL use HEAD for the request. Some servers don&#8217;t like that (for example, forbes), and the request fails with &#8220;Empty reply from server&#8221;. To fix this, you also need to set CURLOPT_HTTPGET to switch back to a GET request.</p>
<p>  /* don&#8217;t download the page, just the header (much faster in this case) */<br />
  curl_setopt($ch, CURLOPT_NOBODY, true);<br />
  curl_setopt($ch, CURLOPT_HEADER, true);<br />
  curl_setopt($ch, CURLOPT_HTTPGET, true);    //this is needed to fix the issue</p>
<p>Hope this helps!</p>
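<p>Another way to cope with servers that reject HEAD requests is to try HEAD first and retry with a plain GET only when that fails. The following is a minimal sketch, not the original function: status_with_get_fallback and curl_status are hypothetical names, and treating 405 and 501 as "HEAD not supported" is an assumption that may not fit every server.</p>

```php
<?php
// Hypothetical helpers for illustration; the names are not from the original post.

/**
 * Ask for the status with HEAD first, then retry with GET if HEAD fails.
 * $request is any callable(string $url, bool $headOnly): ?int that returns
 * the HTTP status code, or null when no usable response was received.
 */
function status_with_get_fallback(string $url, callable $request): ?int
{
    $code = $request($url, true);
    // null: no reply at all; 405/501: the server refuses or does not implement HEAD.
    if ($code === null || $code === 405 || $code === 501) {
        $code = $request($url, false);
    }
    return $code;
}

/** One possible $request implementation using the cURL extension. */
function curl_status(string $url, bool $headOnly): ?int
{
    $ch = curl_init($url);
    if ($headOnly) {
        // Switches the request method to HEAD; a fresh handle defaults to GET.
        curl_setopt($ch, CURLOPT_NOBODY, true);
    }
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);
    $result = curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    // A failed transfer or a status of 0 means there was no usable response.
    return ($result === false || $code === 0) ? null : $code;
}
```

<p>With the cURL extension loaded, a call could look like status_with_get_fallback($url, 'curl_status'). The extra GET is sent only for URLs whose HEAD request failed, so most checks keep the speed benefit of CURLOPT_NOBODY.</p>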
]]></content:encoded>
	</item>
	<item>
		<title>By: james</title>
		<link>http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/comment-page-1/#comment-11568</link>
		<dc:creator>james</dc:creator>
		<pubDate>Thu, 20 Mar 2008 07:50:12 +0000</pubDate>
		<guid isPermaLink="false">http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/#comment-11568</guid>
		<description>Great function, and really helpful suggestions about the other methods to do it (both of which have drawbacks I&#039;ve found - fopen hates a lot of pages, and I couldn&#039;t get https:// to work with fsockopen). Much appreciated.</description>
		<content:encoded><![CDATA[<p>Great function, and really helpful suggestions about the other methods to do it (both of which have drawbacks I&#8217;ve found &#8211; fopen hates a lot of pages, and I couldn&#8217;t get https:// to work with fsockopen). Much appreciated.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Usmeamr</title>
		<link>http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/comment-page-1/#comment-11430</link>
		<dc:creator>Usmeamr</dc:creator>
		<pubDate>Wed, 27 Feb 2008 23:07:03 +0000</pubDate>
		<guid isPermaLink="false">http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/#comment-11430</guid>
		<description>Good</description>
		<content:encoded><![CDATA[<p>Good</p>
]]></content:encoded>
	</item>
</channel>
</rss>
