<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: How To Check If Page Exists With CURL</title>
	<atom:link href="http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/feed/" rel="self" type="application/rss+xml" />
	<link>http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/</link>
	<description>A blog about web development, software business, and WordPress</description>
	<lastBuildDate>Wed, 08 Feb 2012 21:10:53 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<item>
		<title>By: Randell</title>
		<link>http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/comment-page-1/#comment-168736</link>
		<dc:creator>Randell</dc:creator>
		<pubDate>Thu, 14 Apr 2011 08:17:07 +0000</pubDate>
		<guid isPermaLink="false">http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/#comment-168736</guid>
		<description>The condition &quot;if($parts[&#039;scheme&#039;]==&#039;https&#039;){&quot; needs to be removed since the block inside it will not take effect if an HTTP URL is then redirected to an HTTPS URL.</description>
		<content:encoded><![CDATA[<p>The condition &#8220;if($parts['scheme']=='https'){&#8221; needs to be removed since the block inside it will not take effect if an HTTP URL is then redirected to an HTTPS URL.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Lösning när bloggtoppen är nere - Mind Game Media blogg</title>
		<link>http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/comment-page-1/#comment-159290</link>
		<dc:creator>Lösning när bloggtoppen är nere - Mind Game Media blogg</dc:creator>
		<pubDate>Tue, 04 Jan 2011 15:36:59 +0000</pubDate>
		<guid isPermaLink="false">http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/#comment-159290</guid>
		<description>[...] down with PHP without having to download the whole website. Well, after a simple Google search I found this script that uses CURL Lite [...]</description>
		<content:encoded><![CDATA[<p>[...] down with PHP without having to download the whole website. Well, after a simple Google search I found this script that uses CURL Lite [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Brian</title>
		<link>http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/comment-page-1/#comment-34245</link>
		<dc:creator>Brian</dc:creator>
		<pubDate>Thu, 08 Apr 2010 15:38:25 +0000</pubDate>
		<guid isPermaLink="false">http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/#comment-34245</guid>
		<description>Thanks for sharing this, White Shadow.

It seems that CURLOPT_FAILONERROR obeys CURLOPT_FOLLOWLOCATION so if you set them both to true and just check if $response equals FALSE, a 302 code will be considered valid.</description>
		<content:encoded><![CDATA[<p>Thanks for sharing this, White Shadow.</p>
<p>It seems that CURLOPT_FAILONERROR obeys CURLOPT_FOLLOWLOCATION so if you set them both to true and just check if $response equals FALSE, a 302 code will be considered valid.</p>
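<p>A minimal sketch of one way to tell a followed redirect apart from a direct hit, assuming the redirect_count and url entries returned by curl_getinfo() are enough for that. The names describe_transfer() and check_url() are made up for illustration and are not part of the original function.</p>
<pre lang='php'>
/* Summarise a finished transfer from the curl_exec() result and the curl_getinfo() array. */
function describe_transfer($response, array $info) {
  if ($response === false) {
    return 'failed'; /* CURLOPT_FAILONERROR rejected the final response, or the transfer broke */
  }
  if ($info['redirect_count'] > 0) {
    return 'redirected to ' . $info['url']; /* at least one redirect was followed */
  }
  return 'ok';
}

/* Run the request with both options enabled, as described in the comment above. */
function check_url($url) {
  $ch = curl_init($url);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
  curl_setopt($ch, CURLOPT_FAILONERROR, true);
  curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
  $response = curl_exec($ch);
  $info = curl_getinfo($ch);
  curl_close($ch);
  return describe_transfer($response, $info);
}
</pre>
<p>Keeping describe_transfer() separate from the network call means the decision logic can be checked with hand-made input.</p>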
]]></content:encoded>
	</item>
	<item>
		<title>By: sagoral</title>
		<link>http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/comment-page-1/#comment-30939</link>
		<dc:creator>sagoral</dc:creator>
		<pubDate>Sat, 25 Jul 2009 16:16:17 +0000</pubDate>
		<guid isPermaLink="false">http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/#comment-30939</guid>
		<description>These functions always return OK, so I couldn&#039;t validate the URLs. Then I tried fopen, but that didn&#039;t work either...</description>
		<content:encoded><![CDATA[<p>These functions always return OK, so I couldn&#8217;t validate the URLs. Then I tried fopen, but that didn&#8217;t work either&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: White Shadow</title>
		<link>http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/comment-page-1/#comment-12203</link>
		<dc:creator>White Shadow</dc:creator>
		<pubDate>Tue, 08 Jul 2008 18:05:16 +0000</pubDate>
		<guid isPermaLink="false">http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/#comment-12203</guid>
		<description>Okay, done.</description>
		<content:encoded><![CDATA[<p>Okay, done.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kyle Renfrow</title>
		<link>http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/comment-page-1/#comment-12202</link>
		<dc:creator>Kyle Renfrow</dc:creator>
		<pubDate>Tue, 08 Jul 2008 17:22:16 +0000</pubDate>
		<guid isPermaLink="false">http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/#comment-12202</guid>
		<description>Hey, I hate to keep posting, but please replace this line for me. It was wrong:

This is correct:
    $code=intval($matches[1][(count($matches[1])-1)]);</description>
		<content:encoded><![CDATA[<p>Hey, I hate to keep posting, but please replace this line for me. It was wrong:</p>
<p>This is correct:<br />
    $code=intval($matches[1][(count($matches[1])-1)]);</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: White Shadow</title>
		<link>http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/comment-page-1/#comment-12192</link>
		<dc:creator>White Shadow</dc:creator>
		<pubDate>Mon, 07 Jul 2008 20:42:05 +0000</pubDate>
		<guid isPermaLink="false">http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/#comment-12192</guid>
		<description>I see. BTW, I edited your comment to add some syntax highlighting.</description>
		<content:encoded><![CDATA[<p>I see. BTW, I edited your comment to add some syntax highlighting.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kyle Renfrow</title>
		<link>http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/comment-page-1/#comment-12191</link>
		<dc:creator>Kyle Renfrow</dc:creator>
		<pubDate>Mon, 07 Jul 2008 20:23:22 +0000</pubDate>
		<guid isPermaLink="false">http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/#comment-12191</guid>
		<description>Hey, sorry, scratch that. You were actually correct: there is no point in adding CURLOPT_HTTPGET, because that basically cancels out NOBODY and returns the whole response, heh.

The only reason I&#039;m posting again is that I found another bug in the function. I hope you don&#039;t mind, WShadow, but I&#039;m posting a complete fixed version of your function to help others out.

I modified it a bit for my personal usage, because a timeout doesn&#039;t necessarily mean the page is completely invalid; it could just be a temporary server problem.

Note: this function does not grab only the headers. To me, the efficiency gain is not worth running a script that sometimes fails, heh.
&lt;pre lang=&#039;php&#039;&gt;
function url_exists($url){
  /* this script will return:
    1 for a valid page
    2 for a timed out page
    3 for an invalid page
  */

  $parts=parse_url($url);
  if(!$parts) {
    return 3; /* the URL was seriously wrong */
  }

  $ch = curl_init();
  curl_setopt($ch, CURLOPT_URL, $url);

  /* set the user agent - might help, doesn&#039;t hurt */
  curl_setopt($ch, CURLOPT_USERAGENT, &#039;Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)&#039;);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);

  /* try to follow redirects */
  curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

  /* timeout after the specified number of seconds. assuming that this script runs
    on a server, 20 seconds should be plenty of time to verify a valid URL.  */
  curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
  curl_setopt($ch, CURLOPT_TIMEOUT, 20);

  /* include the response headers in the output so the status line can be parsed;
    the body is still downloaded because CURLOPT_NOBODY is not set */
  curl_setopt($ch, CURLOPT_HEADER, true);

  /* skip certificate checks for HTTPS links, including HTTP URLs that redirect
    to HTTPS; the old CURLOPT_SSL_VERIFYHOST value 1 is deprecated in libcurl */
  curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
  curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);

  $response = curl_exec($ch);
  $error = curl_error($ch);
  curl_close($ch);

  /*  get the LAST status code from HTTP headers (HTTP/1.x and HTTP/2 status lines) */
  if(preg_match_all(&#039;/HTTP\/\d+(?:\.\d+)?\s+(\d+)/&#039;, $response, $matches)){
    $code=intval($matches[1][(count($matches[1])-1)]);
  } else {
    /* eregi() was removed in PHP 7; stripos() does the same case-insensitive check */
    if(stripos($error, &#039;operation timed out&#039;) !== false){
      //timed out
      return 2;
    }else {
      //not found
      return 3;
    }
  };

  /* see if code indicates success */
  if(($code&gt;=200) &amp;&amp; ($code&lt;400)){
    //success
    return 1;
  }else {
    //not found
    return 3;
  }
}&lt;/pre&gt;</description>
		<content:encoded><![CDATA[<p>Hey, sorry, scratch that. You were actually correct: there is no point in adding CURLOPT_HTTPGET, because that basically cancels out NOBODY and returns the whole response, heh.</p>
<p>The only reason I&#8217;m posting again is that I found another bug in the function. I hope you don&#8217;t mind, WShadow, but I&#8217;m posting a complete fixed version of your function to help others out.</p>
<p>I modified it a bit for my personal usage, because a timeout doesn&#8217;t necessarily mean the page is completely invalid; it could just be a temporary server problem.</p>
<p>Note: this function does not grab only the headers. To me, the efficiency gain is not worth running a script that sometimes fails, heh.</p>
<pre lang='php'>
function url_exists($url){
  /* this script will return:
    1 for a valid page
    2 for a timed out page
    3 for an invalid page
  */

  $parts=parse_url($url);
  if(!$parts) {
    return 3; /* the URL was seriously wrong */
  }

  $ch = curl_init();
  curl_setopt($ch, CURLOPT_URL, $url);

  /* set the user agent - might help, doesn't hurt */
  curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)');
  curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);

  /* try to follow redirects */
  curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

  /* timeout after the specified number of seconds. assuming that this script runs
    on a server, 20 seconds should be plenty of time to verify a valid URL.  */
  curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
  curl_setopt($ch, CURLOPT_TIMEOUT, 20);

  /* include the response headers in the output so the status line can be parsed;
    the body is still downloaded because CURLOPT_NOBODY is not set */
  curl_setopt($ch, CURLOPT_HEADER, true);

  /* skip certificate checks for HTTPS links, including HTTP URLs that redirect
    to HTTPS; the old CURLOPT_SSL_VERIFYHOST value 1 is deprecated in libcurl */
  curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
  curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);

  $response = curl_exec($ch);
  $error = curl_error($ch);
  curl_close($ch);

  /*  get the LAST status code from HTTP headers (HTTP/1.x and HTTP/2 status lines) */
  if(preg_match_all('/HTTP\/\d+(?:\.\d+)?\s+(\d+)/', $response, $matches)){
    $code=intval($matches[1][(count($matches[1])-1)]);
  } else {
    /* eregi() was removed in PHP 7; stripos() does the same case-insensitive check */
    if(stripos($error, 'operation timed out') !== false){
      //timed out
      return 2;
    }else {
      //not found
      return 3;
    }
  };

  /* see if code indicates success */
  if(($code>=200) &#038;&#038; ($code&lt;400)){
    //success
    return 1;
  }else {
    //not found
    return 3;
  }
}</pre>
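<p>To illustrate what taking the LAST status code means once redirects are followed, here is a small standalone sketch. The helper name last_status_code() and the sample header text are made up for illustration; the pattern is assumed to be anchored to line starts, which is a little stricter than the one in the function above.</p>
<pre lang='php'>
/* Return the last HTTP status code found in raw header text, or null if there is none. */
function last_status_code($headers) {
  /* accept "HTTP/1.0", "HTTP/1.1" and "HTTP/2" style status lines */
  if (!preg_match_all('/^HTTP\/\d+(?:\.\d+)?\s+(\d{3})/m', $headers, $matches)) {
    return null;
  }
  return intval(end($matches[1]));
}

/* hypothetical header text: one redirect, then the final response */
$sample = "HTTP/1.1 301 Moved Permanently\r\nLocation: http://example.com/new\r\n\r\n"
        . "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n";

echo last_status_code($sample); /* prints 200 */
</pre>
<p>With CURLOPT_FOLLOWLOCATION and CURLOPT_HEADER both enabled, the output contains one header block per hop, so the first status line would report the redirect rather than the final page.</p>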
]]></content:encoded>
	</item>
	<item>
		<title>By: Kyle Renfrow</title>
		<link>http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/comment-page-1/#comment-12188</link>
		<dc:creator>Kyle Renfrow</dc:creator>
		<pubDate>Mon, 07 Jul 2008 18:49:27 +0000</pubDate>
		<guid isPermaLink="false">http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/#comment-12188</guid>
		<description>Well no, CURLOPT_NOBODY is a good idea. If you&#039;re using this in a script that verifies a database of thousands of URLs, CURLOPT_NOBODY can significantly speed up the process.

The fact is, I used the original function and got invalid results: URLs that were perfectly fine were rejected as not valid. Leave NOBODY intact and add HTTPGET and you have yourself a flawless script. Great job.</description>
		<content:encoded><![CDATA[<p>Well no, CURLOPT_NOBODY is a good idea. If you&#8217;re using this in a script that verifies a database of thousands of URLs, CURLOPT_NOBODY can significantly speed up the process.</p>
<p>The fact is, I used the original function and got invalid results: URLs that were perfectly fine were rejected as not valid. Leave NOBODY intact and add HTTPGET and you have yourself a flawless script. Great job.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: White Shadow</title>
		<link>http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/comment-page-1/#comment-12187</link>
		<dc:creator>White Shadow</dc:creator>
		<pubDate>Mon, 07 Jul 2008 18:17:19 +0000</pubDate>
		<guid isPermaLink="false">http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/#comment-12187</guid>
		<description>Or you could just comment out the CURLOPT_NOBODY line. 

I know some servers handle HEAD requests incorrectly, but I didn&#039;t think it was common enough to leave out this option.</description>
		<content:encoded><![CDATA[<p>Or you could just comment out the CURLOPT_NOBODY line. </p>
<p>I know some servers handle HEAD requests incorrectly, but I didn&#8217;t think it was common enough to leave out this option.</p>
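<p>One possible middle ground, sketched under the assumption that a second request is acceptable for the rare misbehaving server: try a HEAD request first and repeat with GET only if HEAD does not report success. The names curl_status(), is_success() and url_exists_with_fallback() are hypothetical and are not part of the function from the post.</p>
<pre lang='php'>
/* Ask for the status code of $url; a HEAD request is sent when $headOnly is true. */
function curl_status($url, $headOnly) {
  $ch = curl_init($url);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
  curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
  curl_setopt($ch, CURLOPT_TIMEOUT, 20);
  curl_setopt($ch, CURLOPT_NOBODY, $headOnly);
  curl_exec($ch);
  $code = curl_getinfo($ch, CURLINFO_HTTP_CODE); /* 0 if no response arrived */
  curl_close($ch);
  return $code;
}

/* True for 2xx and 3xx codes. */
function is_success($code) {
  return in_array(intdiv($code, 100), array(2, 3), true);
}

/* Try HEAD first; repeat with GET only when HEAD did not succeed.
   $fetch is any callable with the same signature as curl_status(). */
function url_exists_with_fallback($url, $fetch = 'curl_status') {
  if (is_success($fetch($url, true))) {
    return true;
  }
  return is_success($fetch($url, false));
}
</pre>
<p>Most URLs cost a single cheap HEAD request this way, and only the ones that fail are fetched in full. The $fetch parameter exists so the fallback logic can be exercised without touching the network.</p>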
]]></content:encoded>
	</item>
</channel>
</rss>

