Comment Spam Down By 96% – It’s Simple

For months I’ve been getting dozens of spam comments, sometimes over a hundred per day. Most of them are caught by Akismet, which is great. The problem is that with high numbers of spam comments it’s virtually impossible to look through the spam filter logs and de-spam false positives. The bandwidth and processing power wasted on identifying spammy comments were also bothering me slightly.

That’s why two days ago I finally got around to implementing a simple WordPress hack that brought the amount of blog spam down by 96 percent almost immediately. Yesterday I got no comment spam and only two instances of trackback spam – that’s because this hack has no effect on trackback spam.

[Chart: spam comments per day (including trackback spam)]

The Theory of Spam

Most of the spam is generated by automated scripts and bots. Some people do post spammy messages manually, but they’re relatively rare and probably account for less than 1% of the total amount of spam. So finding a way to filter out the bots also means getting rid of the majority of spam.

This is usually accomplished by some form of CAPTCHA or a simple Turing test (“add 2+2”, etc.) – both fairly effective techniques, except they tend to annoy human visitors. There’s also at least one interesting plugin that uses client-side scripting (JavaScript) to verify that comments are posted by real humans. This relies on the fact that spam bots don’t usually execute JavaScript. On the other hand, some users also have JS disabled or restricted.

The technique I used relies on the assumption that most spam bots just send pre-defined values to the comment script without bothering to check what the comment form is like. A highly advanced bot might actually parse the form and fill in all fields to the best of its ability, guessing the content of each field from its name and so on. Still, even sophisticated bots are unlikely to run JavaScript or… apply CSS.

Implementing a Comment Hack

So this is what I did – I edited my blog template to make the original “email” field invisible using CSS (but I didn’t delete it!). I then added a new email field that looks just like the old field and gave it an arbitrary name (“flycatcher”). This name is specified in the HTML source and not visible to the user. I also modified the script that handles comments to get the email from the new field and abort execution if the original email field isn’t empty (see below for source code).

This works well because:

  • Human visitors will use the new email field and leave the old one empty (because it’s invisible).
  • Stupid bots will always use the old field without checking for any new fields.
  • Advanced bots will fill in all fields to the best of their ability, including the invisible “email” field (because they don’t process CSS and can’t tell it’s invisible).
  • Therefore, anyone who fills in the old “email” field must be an evil bot.
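The reasoning above boils down to a single check on the submitted form data. Here is a minimal sketch of it as a standalone function. The name looks_like_bot and the sample submissions are made up for illustration and are not part of WordPress; the field names “email” (hidden decoy) and “flycatcher” (real field) are the ones used in this post.

```php
<?php
// Hypothetical helper for illustration only; not a WordPress API.
// 'email' is the hidden decoy field, 'flycatcher' is the visible, real one.
function looks_like_bot(array $post)
{
    if (!isset($post['email'])) {
        return false; // decoy field not submitted at all
    }
    if (!is_string($post['email'])) {
        return true;  // unexpected shape (e.g. an array): no browser form sends this
    }
    // A human never sees the decoy, so any text in it points to a bot.
    return trim($post['email']) !== '';
}

// Three simulated submissions matching the cases in the list above (made-up data).
$human        = array('author' => 'Ann', 'email' => '', 'flycatcher' => 'ann@example.com');
$naive_bot    = array('author' => 'x', 'email' => 'spam@example.com');
$advanced_bot = array('author' => 'x', 'email' => 'spam@example.com', 'flycatcher' => 'spam@example.com');

$cases = array('human' => $human, 'naive bot' => $naive_bot, 'advanced bot' => $advanced_bot);
foreach ($cases as $label => $post) {
    echo $label . ': ' . (looks_like_bot($post) ? 'rejected' : 'accepted') . "\n";
}
```

Only the human submission is accepted here. Note that the check never looks at the “flycatcher” value, so a bot that fills in only the new field and leaves the decoy empty would still get through.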

To implement this on my WordPress blog, I first opened the “comments.php” file from my theme, found the source for the email field (it begins with something like <input type='text' name='email'…) and replaced it with the code below:

<div style='display:none'><p>
<!--the old email field (invisible to humans)-->
<input type="text" name="email" id="email" 
	value="" size="22" tabindex="2" />
<label for="email">
<small>Mail (will not be published) 
	<?php if ($req) echo "(required)"; ?>
</small></label></p></div>

<p><!--the new, real email field-->
<input type="text" name="flycatcher" id="flycatcher" 
 value="<?php echo $comment_author_email; ?>" size="22" tabindex="2" />
<label for="flycatcher"><small><!--encoded to further confuse bots-->
&#77;&#97;&#105;&#108; &#40;&#119;&#105;&#108;&#108; 
&#110;&#111;&#116; &#98;&#101; 
&#112;&#117;&#98;&#108;&#105;&#115;&#104;&#101;&#100;&#41; 
<?php if ($req) echo "(&#114;&#101;&#113;&#117;&#105;&#114;&#101;&#100;)"; ?>
</small></label></p>

I’ve wrapped the original field in <div style='display:none'>…</div> tags to make it invisible, and I’ve made a copy of it, changing name and id to “flycatcher”. I also encoded the “Mail (…)” text as numeric HTML entities, which might be a bit of overkill… eh, probably not ;)
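If you’d rather not look up character codes by hand when encoding a different label, a few lines of PHP can generate the entities. This is only a sketch: to_numeric_entities is a made-up name (not a WordPress function), and it assumes the label is plain ASCII text.

```php
<?php
// Hypothetical helper, not part of WordPress. Assumes plain ASCII input.
function to_numeric_entities($text)
{
    $out = '';
    for ($i = 0, $n = strlen($text); $i < $n; $i++) {
        $char = $text[$i];
        // Leave whitespace as-is (as in the label markup); encode every other character.
        $out .= ctype_space($char) ? $char : '&#' . ord($char) . ';';
    }
    return $out;
}

// Prints the entity-encoded form of the label text.
echo to_numeric_entities('Mail (will not be published)'), "\n";
```

Run it once, then paste the output into the template in place of the readable text. Browsers render the entities as ordinary characters, so visitors see no difference.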

As the second and final step, I modified my wp-comments-post.php file. After the line $comment_author_email = trim($_POST['email']); I added this:

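// The hidden "email" field is a decoy: if it contains anything, stop here.
if (strlen($comment_author_email) > 0) {
	wp_die( __('Error: please enter a valid email address.') );
}
// Read the real address from the visible "flycatcher" field instead.
$comment_author_email = isset($_POST['flycatcher']) ? trim($_POST['flycatcher']) : '';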

The above code will terminate the script if the old email field contains any text.

Final Notes

If you want to implement this antispam technique, remember that you’re doing it at your own risk! Make a backup first, et cetera.

By the way, I didn’t invent this hack – I first saw it discussed somewhere on BlueHatSEO a while ago, but I can’t find the relevant URL now.

4 Responses to “Comment Spam Down By 96% – It’s Simple”

  1. [...] recently blogged about my anti-spam experiment (which has been going great!). The one problem with the method I explained in that post is that it [...]

  2. [...] that the introduction of the rel='nofollow' completely failed to stop, or at least decrease, link spam on blogs, forums and similar sites. The possible reasons, and the negative effects of Nofollow on [...]

  3. How do I train the spam filter? Can anyone send me its simple source code in PHP?

  4. Finally, I built the spam filter based on a naive Bayesian approach…