tech.agilitynerd.com

scratching that itch... 
Refererblock Version 0.2

I came up with two improvements to the first release of my refererblock plugin:

  • If the referer string matches the site's URL, it is passed immediately and isn't checked against the blacklist.
  • The blacklist.txt file was being read even if the referer string was empty. Now it is only read if the referer string is not empty and doesn't match the site's URL.

These optimizations do improve the performance of the plugin. My testing on a PIII 800MHz running Fedora Core 3 Linux with Apache 2.0 showed the following average latencies:

  • 1.5 ms - Empty referer string or current domain.
  • 2.0 ms - Referer string matching the first regex of the example blacklist file.
  • 3.0 ms - Referer string matching the final regex of the example blacklist file.

I was kind of surprised at how little additional time was required to load the blacklist file and process the regular expressions. This is probably due to the file remaining in the disk cache for subsequent requests. Of course your mileage may vary.
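
The two checks described above can be sketched roughly as follows. This is a Python illustration, not the plugin's Perl code; the function names, the one-regex-per-line file format with '#' comments, and the simple prefix comparison against the site URL are all assumptions made for the example.

```python
import re
from pathlib import Path


def load_blacklist(path):
    """Read one regex per line, skipping blanks and '#' comments (assumed format)."""
    patterns = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(re.compile(line))
    return patterns


def is_blocked(referer, site_url, blacklist_path):
    """Return True if the referer matches any blacklist regex."""
    # Cheap checks first: an empty referer, or one from our own site,
    # passes without the blacklist file ever being opened.
    if not referer or referer.startswith(site_url):
        return False
    # Only now pay for reading the file and running the regexes.
    return any(p.search(referer) for p in load_blacklist(blacklist_path))
```

Because any() stops at the first match, a referer that matches an early regex is rejected sooner than one that matches the last regex, which is consistent with the ordering of the latencies measured above.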

Download version 0.2 of the plugin here.

See my original plugin description for installation, configuration, and testing information. Please let me know if you use this plugin or if you have comments or suggestions for improving it.

Filed under  //   blosxom   plugin   referrer   spam  

Comments [0]

Blog Spammers Using URL Encoding

Over the last week I was getting hit by comment spammers who were URL-encoding their addresses to get around the comment blacklisting filter I use. By replacing regular characters with their multi-character encoded representations within the URL, the spammers were able to post comment spam with links to casino and porn websites to my blog.

This type of spam is a reversal of the same method for hiding your own email address in a web page so it won't be harvested by email spammers (see for example Chip Rosenthal's Blog entry).

I just added a regular expression to my blacklist file to block the use of URL-encoded characters in all links. If your blog comment software supports this approach, you might want to do the same thing.
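
As a rough illustration of that kind of rule, here is a Python sketch. The pattern is an assumption written for this example, not the expression from my blacklist file, and the function name is hypothetical.

```python
import re

# Matches an http(s) URL containing at least one percent-encoded byte,
# e.g. "%63" standing in for "c". Assumed pattern, for illustration only.
ENCODED_URL = re.compile(r"https?://[^\s\"'<>]*%[0-9A-Fa-f]{2}", re.IGNORECASE)


def has_encoded_link(comment_text):
    """Return True if any link in the comment contains percent-encoding."""
    return ENCODED_URL.search(comment_text) is not None
```

A rule this blunt also rejects legitimate links that happen to contain an encoded space (%20) or similar, so it trades some false positives for simplicity. An alternative would be to decode the URL first and then run the normal word blacklist against the decoded text.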

Filed under  //   spam  

Comments [0]

Those Darn Spammers: Blosxom - Hit Counter and Writeback Changes

Over the past year I'd noticed that comment and trackback spammers had been hitting the same dozen or so pages of my blog multiple times daily. (It is probably the same person/group who took a snapshot of the articles on my front page at that time and just reuses those URLs for all of their different domains.) Last I calculated, about 50% of my overall website traffic was due to spammers. This constant barrage of hits skews my AWStats statistics and, more importantly, skews the results on my Favorites page. So I took a little time to work on this problem.

FWIW I'm also starting to see a lot of spam coming from "blogs" being set up on BlogSpot that are a single page whose links all point to the real site. Some of these bogus blogs use the BlogSpot template, which contains a flag used to alert BlogSpot admins to content in violation of their Terms of Service. This allows anyone to report the bogus blog by just clicking on the flag. Other "blogs" just use their own HTML, and BlogSpot support would have to be sent an email with the offending site's URL.

So anyway, I've taken the following steps:

  • Disabled trackback comments entirely using the configuration variable in the writeback plugin. I've never received a legitimate trackback ping.
  • Changed my modified version of the writeback plugin to set a variable $rejected to 1 if the comment was rejected or if a trackback was attempted.
  • Modified my HitCounter plugin to read the $writeback::rejected variable and then not increase the counter for the spammed page.
  • I had to change the ordering of the hitcounter plugin to run after writeback so the variable would be set correctly when hitcounter ran.
  • Set the $hitcounter::reset_count variable and reset the counts of the spammed pages back to "reasonable" counts.
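
The interaction between the two plugins in the steps above can be sketched like this. It is a Python stand-in for the Perl plugins; the class names, the request dictionaries, and the word-matching check are hypothetical and only mirror the flag-and-ordering idea.

```python
class Writeback:
    """Stand-in for the writeback plugin: records whether a post was rejected."""

    def __init__(self, blacklist_words):
        self.blacklist_words = blacklist_words
        self.rejected = 0  # plays the role of $writeback::rejected

    def handle(self, request):
        is_trackback = request.get("type") == "trackback"
        text = request.get("comment", "").lower()
        is_spam = any(word in text for word in self.blacklist_words)
        self.rejected = 1 if (is_trackback or is_spam) else 0


class HitCounter:
    """Stand-in for the hit counter: skips pages hit by a rejected post."""

    def __init__(self):
        self.counts = {}

    def handle(self, request, writeback):
        if writeback.rejected:
            return  # spam attempt: do not count this hit
        page = request["page"]
        self.counts[page] = self.counts.get(page, 0) + 1


def process(request, writeback, hitcounter):
    # Order matters: writeback must run first so the flag is already set
    # by the time the hit counter looks at it.
    writeback.handle(request)
    hitcounter.handle(request, writeback)
```

If the two calls in process() were swapped, the counter would read the flag left over from the previous request, which is why the plugin ordering had to change.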

Another couple hours wasted messing around against spammers.

I've previously written about how I've been using comment content blacklisting to reject comment, trackback, and referer (sic) spam. My current blacklist file has over 40 regular expressions containing over 250 words and patterns. I update it whenever a spam comment slips through, so that is an ongoing, almost daily effort. I might just have to go to the trouble of getting a CAPTCHA plugin to work.

Filed under  //   blosxom   plugin   spam  

Comments [0]

Blosxom Plugin to Block Referer Spam

Comment and Referer Spam

Like so many other bloggers who allow comments on their websites and blog articles, I was facing increasing comment spam as my blog got noticed by more spammers. The size of this problem is illustrated by this Google query for "comment spam" that returned 1.5 million hits. For the uninitiated, comment spam is like email spam for blogs: the spammer inserts fake comments in a blog, where the comment text contains anywhere from one to dozens of links to the spammer's websites. When web search sites "spider" the blog, the links to the spammer's sites are treated as "endorsements" of those sites, and the spammer's sites are raised toward the top of the search sites' result lists.

There is another growing type of blog spam called referer spam (yes, it is officially misspelled). When a web surfer clicks on a link in a web page that sends them to another web page, most web browsers include the URL of the referring page in the request, in a header that CGI scripts see as HTTP_REFERER. Some websites and blogs capture that page link information when they are on the receiving end of a web request. These sites might have a section on each page listing the sites that link to that page. These links are referer links.

Referer spam uses the same mechanism as comment spam to raise the search sites' ranking of the spammer's websites. But referer spammers don't post comments; they post fake referrals to a website. They are hoping that the website or blog displays links to the sites that refer to it. So when the website is spidered, the search ranking is raised.
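
To make the receiving side concrete, here is a small Python sketch of how a CGI script might pull the referring host out of the request. The function name is hypothetical; HTTP_REFERER is the standard CGI environment variable for the Referer header.

```python
import os
from urllib.parse import urlsplit


def referring_host(environ=None):
    """Return the host named in HTTP_REFERER, or None if it is absent."""
    environ = os.environ if environ is None else environ
    referer = environ.get("HTTP_REFERER", "")
    if not referer:
        return None
    # The value is supplied entirely by the client, so it proves nothing
    # about where the visitor actually came from.
    return urlsplit(referer).hostname
```

A site that lists "pages linking here" from this value is publishing whatever the client claims, which is exactly the opening referer spammers rely on.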

Blosxom Plugins Addressing Spam

Like so many bloggers, once I started getting comment spam I was able to manually delete the comments as they occurred. But that got old fast. After some Googling I discovered Doug Alcorn's Blosxom writeback blacklist plugin. I had been using the original writeback plugin. Doug's improvements provided enough protection (so far), with fewer than a dozen regular expressions removing all my comment spam.

Referer spam started hitting me three weeks ago. What was most infuriating was that I don't display any links of referring sites on my site at all. So all these spammers were succeeding in doing was skewing my site statistics and using my bandwidth with their fake referer attacks several times a day.

Of course, all the referer spam sites' addresses contained one or more of the same dozen blacklisted words I had already configured for comment spam. About the same time I saw Jason Clark post his deferer plugin, which returns a 301 permanent redirect for the IP address of one particular referer spammer who was attacking his site. I thought that by combining Doug's blacklisting plugin with Jason's immediate redirect plugin I could reduce the referer spam in my log files.

This plugin hasn't removed the entries entirely from my logs, since the initial request is still logged with a 301 status. But it has stopped the subsequent downloading of images for the pages whose content is now not served. Now that the log contains 301 status messages for these requests they are ignored by my host's statistics program (Advanced Web Statistics - AWStats).
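
For illustration, a CGI-style 301 response with no page body might look like the Python sketch below. The function name is hypothetical, and the redirect target is left to the caller as an assumption, since the choice of target is a separate decision from the status code.

```python
def redirect_response(location):
    """Build a minimal CGI response that sends a 301 and no page body."""
    # Refuse CR/LF so a hostile value cannot inject extra headers.
    if "\r" in location or "\n" in location:
        raise ValueError("location must not contain line breaks")
    return (
        "Status: 301 Moved Permanently\r\n"
        f"Location: {location}\r\n"
        "\r\n"
    )
```

Since no HTML is returned, the client has no image or stylesheet links to fetch, which is why the follow-up requests disappear from the logs.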

On re-reading Jason's blog entry for deferer, I see he also mentions the idea of white and black lists for referer filtering. So I might have subliminally remembered his idea and based my refererblock plugin on it. In any event, I fully credit Jason and Doug for giving me the ideas and code with which to put together my plugin.

Referer Block Plugin

The refererblock plugin's tar file can be downloaded here. It contains the refererblock plugin already named 000refererblock so that it runs before all other plugins (you want to discard the blacklisted requests before any other work is done on them). A sample blacklist.txt file is provided and contains some example regexes. It uses the same blacklist file format and file name as Doug's writeback modification (I took the code from his plugin with only cosmetic changes). See Doug's website for links to the Movable Type and other blacklists.

The only configuration variable you can set is $log_blacklisted. If it is set to a full path and file name, the script logs the UTC date/time, the referer string, and the page to which it was referring. You could use the frequency of words in the rejected referer strings to fine-tune the content and ordering of your blacklist.txt file to match the spammers hitting your site. Be aware that this file isn't trimmed, so you might want to keep an eye on its size.
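
One way to do that word-frequency tuning is sketched below in Python. The tab-separated log layout and both function names are assumptions for the example; the plugin's actual log format may differ, so adjust the parsing to match your file.

```python
import re
from collections import Counter
from datetime import datetime, timezone


def log_line(referer, page, now=None):
    """Format one entry: UTC timestamp, referer, page (tab-separated, assumed)."""
    now = now or datetime.now(timezone.utc)
    return f"{now:%Y-%m-%d %H:%M:%S}\t{referer}\t{page}\n"


def word_frequencies(log_text):
    """Count the words appearing in the referer column of the log."""
    counts = Counter()
    for line in log_text.splitlines():
        fields = line.split("\t")
        if len(fields) == 3:
            counts.update(re.findall(r"[a-z]+", fields[1].lower()))
    return counts
```

Calling word_frequencies(text).most_common(20) on the log contents gives a candidate ordering; words such as "http", "www", and "com" will dominate and can simply be ignored.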

Lastly, the download also contains a simple Perl script you can use to test the plugin. Execute it as:
referer_test.pl http://example.com http://referer-spam.com
where the first URL is your website and the second URL is the referer to be sent in the request. This script uses HTTP::Request to send the request. The script returns the status "200 OK" for the requested page if it isn't blacklisted and "301 Moved Permanently" if the referer is blacklisted.
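
If you would rather not use the Perl script, a rough Python equivalent is sketched below. It is not the bundled referer_test.pl; the function and class names are hypothetical. Redirects are deliberately not followed so that the 301 itself is reported.

```python
import sys
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so the 301 itself is visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # falls through to the default handler, which raises HTTPError


def build_request(url, referer):
    """Create a GET request carrying the given Referer header."""
    return urllib.request.Request(url, headers={"Referer": referer})


def fetch_status(url, referer):
    """Return (status code, reason) without following redirects."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(build_request(url, referer), timeout=10) as resp:
            return resp.status, resp.reason
    except urllib.error.HTTPError as err:
        return err.code, err.reason


def main(argv):
    if len(argv) != 3:
        print("usage: referer_test.py SITE_URL REFERER_URL")
        return 2
    code, reason = fetch_status(argv[1], argv[2])
    print(code, reason)
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Run it with the same two arguments as the Perl script: your site's URL first, then the referer to send.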

Please let me know if you use this plugin or if you have comments or suggestions for improving it.

Filed under  //   blosxom   plugin   spam  

Comments [0]