<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: How To Extract HTML Tags And Their Attributes With PHP</title>
	<atom:link href="http://w-shadow.com/blog/2009/10/20/how-to-extract-html-tags-and-their-attributes-with-php/feed/" rel="self" type="application/rss+xml" />
	<link>http://w-shadow.com/blog/2009/10/20/how-to-extract-html-tags-and-their-attributes-with-php/</link>
	<description>A blog about web development, software business, and WordPress</description>
	<lastBuildDate>Wed, 08 Feb 2012 21:10:53 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<item>
		<title>By: تست 2 &#124; برنامه نویسی</title>
		<link>http://w-shadow.com/blog/2009/10/20/how-to-extract-html-tags-and-their-attributes-with-php/comment-page-1/#comment-179232</link>
		<dc:creator>تست 2 &#124; برنامه نویسی</dc:creator>
		<pubDate>Sat, 17 Sep 2011 20:56:43 +0000</pubDate>
		<guid isPermaLink="false">http://w-shadow.com/?p=1375#comment-179232</guid>
		<description>[...] gerard agber says:  July 13, 2010 at 19:21 [...]</description>
		<content:encoded><![CDATA[<p>[...] gerard agber says:  July 13, 2010 at 19:21 [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Raj</title>
		<link>http://w-shadow.com/blog/2009/10/20/how-to-extract-html-tags-and-their-attributes-with-php/comment-page-1/#comment-170383</link>
		<dc:creator>Raj</dc:creator>
		<pubDate>Mon, 13 Jun 2011 11:22:04 +0000</pubDate>
		<guid isPermaLink="false">http://w-shadow.com/?p=1375#comment-170383</guid>
		<description>Thanks... Really helpful</description>
		<content:encoded><![CDATA[<p>Thanks&#8230; Really helpful</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: urimm</title>
		<link>http://w-shadow.com/blog/2009/10/20/how-to-extract-html-tags-and-their-attributes-with-php/comment-page-1/#comment-169785</link>
		<dc:creator>urimm</dc:creator>
		<pubDate>Thu, 26 May 2011 07:58:18 +0000</pubDate>
		<guid isPermaLink="false">http://w-shadow.com/?p=1375#comment-169785</guid>
		<description>Those snippets are AWESOME. 
Thank you very much, that helped me a lot!</description>
		<content:encoded><![CDATA[<p>Those snippets are AWESOME.<br />
Thank you very much, that helped me a lot!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: White Shadow</title>
		<link>http://w-shadow.com/blog/2009/10/20/how-to-extract-html-tags-and-their-attributes-with-php/comment-page-1/#comment-168728</link>
		<dc:creator>White Shadow</dc:creator>
		<pubDate>Wed, 13 Apr 2011 15:01:59 +0000</pubDate>
		<guid isPermaLink="false">http://w-shadow.com/?p=1375#comment-168728</guid>
		<description>It appears you are unfamiliar with PHP (to say the least). You need to quote your strings. Use &#039;script&#039; and &#039;src&#039;, not just script and src without quotes.</description>
		<content:encoded><![CDATA[<p>It appears you are unfamiliar with PHP (to say the least). You need to quote your strings. Use &#8216;script&#8217; and &#8216;src&#8217;, not just script and src without quotes.</p>
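<p>A minimal sketch of the quoted version, assuming a small inline $html string made up for illustration. It also swaps the @ operator for libxml_use_internal_errors(), which is one common way to keep parse warnings quiet; that substitution is an assumption, not part of the original snippet.</p>

```php
<?php
// Hypothetical sample markup, used only for this illustration.
$html = '<html><head><script src="a.js"></script><script src="b.js"></script></head><body></body></html>';

$dom = new DOMDocument;

// Collect parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

// Both the tag name and the attribute name are passed as quoted strings.
$sources = array();
foreach ($dom->getElementsByTagName('script') as $script) {
    $sources[] = $script->getAttribute('src');
}

echo implode("\n", $sources), "\n";
```

<p>Without the quotes, PHP treats script and src as undefined constants rather than strings, which is why the unquoted code fails or raises notices depending on the PHP version.</p>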
]]></content:encoded>
	</item>
	<item>
		<title>By: osama</title>
		<link>http://w-shadow.com/blog/2009/10/20/how-to-extract-html-tags-and-their-attributes-with-php/comment-page-1/#comment-168727</link>
		<dc:creator>osama</dc:creator>
		<pubDate>Wed, 13 Apr 2011 14:49:24 +0000</pubDate>
		<guid isPermaLink="false">http://w-shadow.com/?p=1375#comment-168727</guid>
		<description>I can&#039;t extract the script tag and its attributes using this code. Why?
 
$dom = new DOMDocument;
 
//Parse the HTML. The @ is used to suppress any parsing errors
 //that will be thrown if the $html string isn’t valid XHTML.
 @$dom-&gt;loadHTML($html);
 
//Get all links. You could also use any other tag name here,
 //like ‘img’ or ‘table’, to extract other tags.
 $links = $dom-&gt;getElementsByTagName(script);
 
//Iterate over the extracted links and display their URLs
 foreach ($links as $link){
 //Extract and show the “href” attribute.
 echo $link-&gt;getAttribute(src), ”;
 }
 Can anyone please help me extract the script tag?
 Thanks</description>
		<content:encoded><![CDATA[<p>I can&#8217;t extract the script tag and its attributes using this code. Why?</p>
<p>$dom = new DOMDocument;</p>
<p>//Parse the HTML. The @ is used to suppress any parsing errors<br />
 //that will be thrown if the $html string isn’t valid XHTML.<br />
 @$dom-&gt;loadHTML($html);</p>
<p>//Get all links. You could also use any other tag name here,<br />
 //like ‘img’ or ‘table’, to extract other tags.<br />
 $links = $dom-&gt;getElementsByTagName(script);</p>
<p>//Iterate over the extracted links and display their URLs<br />
 foreach ($links as $link){<br />
 //Extract and show the “href” attribute.<br />
 echo $link-&gt;getAttribute(src), ”;<br />
 }<br />
 Can anyone please help me extract the script tag?<br />
 Thanks</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: andrewp</title>
		<link>http://w-shadow.com/blog/2009/10/20/how-to-extract-html-tags-and-their-attributes-with-php/comment-page-1/#comment-65768</link>
		<dc:creator>andrewp</dc:creator>
		<pubDate>Sat, 17 Jul 2010 22:19:27 +0000</pubDate>
		<guid isPermaLink="false">http://w-shadow.com/?p=1375#comment-65768</guid>
		<description>Nice Code. Thanks! Here is my JavaScript version of your code with some improvements.
1. Ignore commented-out tags (optional)
2. Convert attribute names to lower case (optional)
3. Extract attributes that have no value

&lt;pre lang=&quot;javascript&quot;&gt;////////////////////////////////////////////////////////////////////////////
// flags = &quot;i&#124;[s&#124;S]&#124;r&#124;o&quot;;
// i - ignore commented-out tags
// s - self closing - yes
// S - self closing - no
// r - return the entire tag
// o - do not convert attr names to lower case
////////////////////////////////////////////////////////////////////////////
function html_tags(html, tag, flags)
{
    flags = AP.is_set(flags, &quot;&quot;);
    
    // Remove all comments
    if ( flags.match(/i/) )
    {
        html = html.replace(/&lt;!--[\s\S]*?--&gt;/g, &quot;&quot;);
    }
    
    if ( AP.is_set(tag.join) )
    {
        tag = tag.join(&quot;&#124;&quot;);
    }
    
    //If the user didn&#039;t specify if $tag is a self-closing tag we try to auto-detect it
    //by checking against a list of known self-closing tags.
    var selfclosing = null;
    if ( flags.match(/s/) )
    {
        selfclosing = true;
    }
    else
    if ( flags.match(/S/) )
    {
        selfclosing = false;
    }
    else
    {
        var selfclosing_tags = &quot;&quot;;
        selfclosing = (selfclosing_tags.indexOf(&quot;&quot;) != -1 ? true : false);
    }
	
    //The regexp is different for normal and self-closing tags because I can&#039;t figure out 
    //how to make a sufficiently robust unified one.
    var tag_pattern = &quot;&quot;;
    tag_pattern += &quot;&lt;(&quot; + tag + &quot;)&quot;; // &lt;tag
    tag_pattern += &quot;(\\s[^&gt;]+)?&quot;; // attributes, if any
    if ( selfclosing )
    {
        tag_pattern += &quot;\\s*/?&gt;&quot;; // /&gt; or just &gt;, being lenient here 
    }
    else
    {
        tag_pattern += &quot;\\s*&gt;&quot;; // &gt;
        tag_pattern += &quot;((.&#124;\r&#124;\n)*?)&quot;; // tag contents
        tag_pattern += &quot;&lt;/\\1\\s*&gt;&quot;; // the closing &lt;/tag&gt;
    }
    tag_pattern = new RegExp(tag_pattern, &quot;ig&quot;);
    
    var attribute_pattern = &quot;&quot;;
    attribute_pattern += &quot;(\\w+)&quot;; // attribute name
    attribute_pattern += &quot;(&quot;;
    attribute_pattern += &quot;\\s*=\\s*&quot;;
    attribute_pattern += &quot;(&quot;;
    attribute_pattern +=   &quot;([\\\&quot;&#039;])((.&#124;\r&#124;\n)*?)\\4&quot;; // a quoted value
    attribute_pattern +=   &quot;&#124;&quot;; // or
    attribute_pattern +=   &quot;([^\\s\&quot;&#039;]+?)(?:\\s+&#124;$)&quot;; // an unquoted value (terminated by whitespace or EOF) 
    attribute_pattern += &quot;)&quot;;
    attribute_pattern += &quot;)?&quot;;
    attribute_pattern = new RegExp(attribute_pattern, &quot;ig&quot;);
    
	// Find all tags
    var matches = html.match(tag_pattern);
    if ( !matches )
    {
        //Return an empty array if we didn&#039;t find anything
        return [];
    }
    
    var tags = [];
    for ( var loop = 0; loop value array
                for ( var loopA = 0; loopA &lt; attributes.length; loopA++ )
                {
                    var name = attributes[loopA].replace(attribute_pattern, &quot;$1&quot;);
                    if ( !flags.match(/o/) )
                    {
                        name = name.toLowerCase();
                    }
                    if ( AP.empty(attributes[loopA].replace(attribute_pattern, &quot;$2&quot;)) )
                    {   // if the attribute has no value
                        tag.attr[name] = null;
                    }
                    else
                    {
                        var value = attributes[loopA].replace(attribute_pattern, &quot;$5&quot;); // a quoted value
                        if ( AP.empty(value) )
                        {
                            value = attributes[loopA].replace(attribute_pattern, &quot;$7&quot;); // an unquoted value
                        }
                        tag.attr[name] = html_entity_decode(value, &quot;ENT_QUOTES&quot;);
                    }
                }

            }
        }
        
        tags.push(tag);
    }
    
    return tags;
}&lt;/pre&gt;

Usage example:
html_tags(html, &quot;input&quot;, &quot;ir&quot;);</description>
		<content:encoded><![CDATA[<p>Nice Code. Thanks! Here is my JavaScript version of your code with some improvements.<br />
1. Ignore commented-out tags (optional)<br />
2. Convert attribute names to lower case (optional)<br />
3. Extract attributes that have no value</p>
<pre lang="javascript">////////////////////////////////////////////////////////////////////////////
// flags = "i|[s|S]|r|o";
// i - ignore commented-out tags
// s - self closing - yes
// S - self closing - no
// r - return the entire tag
// o - do not convert attr names to lower case
////////////////////////////////////////////////////////////////////////////
function html_tags(html, tag, flags)
{
    flags = AP.is_set(flags, "");

    // Remove all comments
    if ( flags.match(/i/) )
    {
        html = html.replace(/&lt;!--[\s\S]*?--&gt;/g, "");
    }

    if ( AP.is_set(tag.join) )
    {
        tag = tag.join("|");
    }

    //If the user didn't specify if $tag is a self-closing tag we try to auto-detect it
    //by checking against a list of known self-closing tags.
    var selfclosing = null;
    if ( flags.match(/s/) )
    {
        selfclosing = true;
    }
    else
    if ( flags.match(/S/) )
    {
        selfclosing = false;
    }
    else
    {
        var selfclosing_tags = "";
        selfclosing = (selfclosing_tags.indexOf("") != -1 ? true : false);
    }

    //The regexp is different for normal and self-closing tags because I can't figure out
    //how to make a sufficiently robust unified one.
    var tag_pattern = "";
    tag_pattern += "&lt;(&quot; + tag + &quot;)&quot;; // ]+)?"; // attributes, if any
    if ( selfclosing )
    {
        tag_pattern += "\\s*/?&gt;"; // /&gt; or just &gt;, being lenient here
    }
    else
    {
        tag_pattern += "\\s*&gt;"; // &gt;
        tag_pattern += "((.|\r|\n)*?)"; // tag contents
        tag_pattern += "&lt;/\\1\\s*&gt;"; // the closing &lt;/tag&gt;
    }
    tag_pattern = new RegExp(tag_pattern, "ig");

    var attribute_pattern = "";
    attribute_pattern += "(\\w+)"; // attribute name
    attribute_pattern += "(";
    attribute_pattern += "\\s*=\\s*";
    attribute_pattern += "(";
    attribute_pattern +=   "([\\\"'])((.|\r|\n)*?)\\4"; // a quoted value
    attribute_pattern +=   "|"; // or
    attribute_pattern +=   "([^\\s\"']+?)(?:\\s+|$)"; // an unquoted value (terminated by whitespace or EOF)
    attribute_pattern += ")";
    attribute_pattern += ")?";
    attribute_pattern = new RegExp(attribute_pattern, "ig");

	// Find all tags
    var matches = html.match(tag_pattern);
    if ( !matches )
    {
        //Return an empty array if we didn't find anything
        return [];
    }

    var tags = [];
    for ( var loop = 0; loop value array
                for ( var loopA = 0; loopA &lt; attributes.length; loopA++ )
                {
                    var name = attributes[loopA].replace(attribute_pattern, &quot;$1&quot;);
                    if ( !flags.match(/o/) )
                    {
                        name = name.toLowerCase();
                    }
                    if ( AP.empty(attributes[loopA].replace(attribute_pattern, &quot;$2&quot;)) )
                    {   // if the attribute has no value
                        tag.attr[name] = null;
                    }
                    else
                    {
                        var value = attributes[loopA].replace(attribute_pattern, "$5"); // a quoted value
                        if ( AP.empty(value) )
                        {
                            value = attributes[loopA].replace(attribute_pattern, "$7"); // an unquoted value
                        }
                        tag.attr[name] = html_entity_decode(value, "ENT_QUOTES");
                    }
                }

            }
        }

        tags.push(tag);
    }

    return tags;
}</pre>
<p>Usage example:<br />
html_tags(html, &#8220;input&#8221;, &#8220;ir&#8221;);</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: gerard agber</title>
		<link>http://w-shadow.com/blog/2009/10/20/how-to-extract-html-tags-and-their-attributes-with-php/comment-page-1/#comment-64054</link>
		<dc:creator>gerard agber</dc:creator>
		<pubDate>Tue, 13 Jul 2010 16:21:52 +0000</pubDate>
		<guid isPermaLink="false">http://w-shadow.com/?p=1375#comment-64054</guid>
		<description>Your code is the best way to find any tag! So helpful. With print_r you can see the whole result:

echo &quot;&lt;pre&gt;&quot;;
print_r($nodes);
echo &quot;&lt;/pre&gt;&quot;;

Thank you very much, friend!</description>
		<content:encoded><![CDATA[<p>Your code is the best way to find any tag! So helpful. With print_r you can see the whole result:</p>
<pre lang="php">echo "&lt;pre&gt;";
print_r($nodes);
echo "&lt;/pre&gt;";</pre>
<p>Thank you very much, friend!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: White Shadow</title>
		<link>http://w-shadow.com/blog/2009/10/20/how-to-extract-html-tags-and-their-attributes-with-php/comment-page-1/#comment-32263</link>
		<dc:creator>White Shadow</dc:creator>
		<pubDate>Wed, 04 Nov 2009 21:46:16 +0000</pubDate>
		<guid isPermaLink="false">http://w-shadow.com/?p=1375#comment-32263</guid>
		<description>When using DOM, you can get the link text like this:
&lt;pre lang=&quot;php&quot;&gt;$text = $link-&gt;textContent;&lt;/pre&gt;
With the extract_tags() function, it&#039;s also very simple:
&lt;pre lang=&quot;php&quot;&gt;$text = $link[&#039;contents&#039;];&lt;/pre&gt;</description>
		<content:encoded><![CDATA[<p>When using DOM, you can get the link text like this:</p>
<pre lang="php">$text = $link->textContent;</pre>
<p>With the extract_tags() function, it&#8217;s also very simple:</p>
<pre lang="php">$text = $link['contents'];</pre>
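<p>As a rough sketch of the DOM route, assuming a made-up $html fragment for illustration, the link text can be collected next to each href like this. The $results array is a hypothetical name, not something from the article.</p>

```php
<?php
// Hypothetical sample markup, used only for this illustration.
$html = '<p><a href="http://example.com/">Example <b>site</b></a></p>';

$dom = new DOMDocument;

// Collect parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$results = array();
foreach ($dom->getElementsByTagName('a') as $link) {
    // textContent holds the text of the node and all of its descendants,
    // with the markup removed.
    $results[$link->getAttribute('href')] = trim($link->textContent);
}

print_r($results);
```

<p>Because textContent flattens nested elements, the bold tag inside the link disappears and only its text remains.</p>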
]]></content:encoded>
	</item>
	<item>
		<title>By: alex</title>
		<link>http://w-shadow.com/blog/2009/10/20/how-to-extract-html-tags-and-their-attributes-with-php/comment-page-1/#comment-32262</link>
		<dc:creator>alex</dc:creator>
		<pubDate>Wed, 04 Nov 2009 21:09:03 +0000</pubDate>
		<guid isPermaLink="false">http://w-shadow.com/?p=1375#comment-32262</guid>
		<description>Hi, nice script, it works for me! But I&#039;m also interested in the title of an a href,
for example &lt;span class=&quot;removed_link&quot;&gt; title &lt;/span&gt;.
How do I get the title too? Thanks a lot.</description>
		<content:encoded><![CDATA[<p>Hi, nice script, it works for me! But I&#8217;m also interested in the title of an a href,<br />
for example <span class="removed_link"> title </span>.<br />
How do I get the title too? Thanks a lot.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: White Shadow</title>
		<link>http://w-shadow.com/blog/2009/10/20/how-to-extract-html-tags-and-their-attributes-with-php/comment-page-1/#comment-32222</link>
		<dc:creator>White Shadow</dc:creator>
		<pubDate>Tue, 03 Nov 2009 13:17:55 +0000</pubDate>
		<guid isPermaLink="false">http://w-shadow.com/?p=1375#comment-32222</guid>
		<description>Fixed the script.

It was a bug in my regexp syntax. The &quot;(?&lt;name&gt;...)&quot; syntax that I used for marking named subgroups only works in newer versions of PHP (5.2.2 and later). I&#039;ve rewritten the regexps to use the more widely supported syntax - &quot;(?&lt;strong&gt;P&lt;/strong&gt;&lt;name&gt;...)&quot;.</description>
		<content:encoded><![CDATA[<p>Fixed the script.</p>
<p>It was a bug in my regexp syntax. The &#8220;(?&lt;name&gt;&#8230;)&#8221; syntax that I used for marking named subgroups only works in newer versions of PHP (5.2.2 and later). I&#8217;ve rewritten the regexps to use the more widely supported syntax &#8211; &#8220;(?<strong>P</strong>&lt;name&gt;&#8230;)&#8221;.</p>
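<p>For illustration, here is a small sketch of the (?P&lt;name&gt;...) form with preg_match. The pattern and the sample tag are simplified assumptions made up for this example, not the actual regexps used in extract_tags().</p>

```php
<?php
// Hypothetical sample tag, used only for this illustration.
$html = '<a href="http://example.com/">Example</a>';

// (?P<name>...) defines a named group; (?P=name) refers back to it.
$pattern = '@<(?P<tag>a)(?P<attributes>\s[^>]+)?\s*>(?P<contents>.*?)</(?P=tag)>@si';

if (preg_match($pattern, $html, $match)) {
    echo $match['tag'], "\n";              // the tag name
    echo trim($match['attributes']), "\n"; // the raw attribute string
    echo $match['contents'], "\n";         // the text between the tags
}
```

<p>Named groups show up in $match under their names as well as their numeric positions, so the code that reads them does not depend on group order.</p>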
]]></content:encoded>
	</item>
</channel>
</rss>

