Sorting SPAM
28 Feb 2007
I have been using SpamAssassin for a while to help identify SPAM. About a week ago, messages that SpamAssassin had flagged as SPAM started showing up in my Inbox instead of in my SPAM folder.
Well, it irritated me enough a moment ago to actually take a look at the full headers of just such a message. Here are the headers added by SpamAssassin:
X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 3.1.8 (2007-02-13) on
    dark-templar.lamontpeterson.net
X-Spam-Level: ***********************
X-Spam-Status: Yes, score=23.0 required=4.0 tests=BAYES_80,DRUGS_ERECTILE,
    DRUGS_ERECTILE_OBFU,HTML_MESSAGE,RCVD_IN_BL_SPAMCOP_NET,URIBL_AB_SURBL,
    URIBL_JP_SURBL,URIBL_SBL,URIBL_SC_SURBL,VIA_GAP_GRA autolearn=no version=3.1.8
X-Spam-Report:
    * 2.5 VIA_GAP_GRA BODY: Attempts to disguise the word 'viagra'
    * 2.0 BAYES_80 BODY: Bayesian spam probability is 80 to 95%
    *     [score: 0.8180]
    * 0.0 HTML_MESSAGE BODY: HTML included in message
    * 1.6 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
    *     [Blocked - see <http://www.spamcop.net/bl.shtml?201.83.176.249>]
    * 1.6 URIBL_SBL Contains an URL listed in the SBL blocklist
    *     [URIs: tersho.com]
    * 3.8 URIBL_AB_SURBL Contains an URL listed in the AB SURBL blocklist
    *     [URIs: tersho.com]
    * 4.1 URIBL_JP_SURBL Contains an URL listed in the JP SURBL blocklist
    *     [URIs: tersho.com]
    * 4.5 URIBL_SC_SURBL Contains an URL listed in the SC SURBL blocklist
    *     [URIs: tersho.com]
    * 2.4 DRUGS_ERECTILE_OBFU Obfuscated reference to an erectile drug
    * 0.5 DRUGS_ERECTILE Refers to an erectile drug
(Now that’s one spammy piece of SPAM!)
OK, so I took a look at my ~/.mailfilter file on the server:
### SPAM
if ( /^X-Spam-Flag: *(yes|YES) / )
{
    to "$HOME/mail/.SPAM/"
}
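Many of my readers may be eagle-eyed enough to spot the problem right away. If you said, “Hey, you’ve got a superfluous space after the closing parenthesis in your regular expression there,” then you got it. The X-Spam-Flag header shown above ends right after “YES”, so a pattern that demands one more space finds nothing to match.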
That regex would match either “yes” or “YES” (they are case sensitive). I did this because at some point long ago, I had a rule on a system that used “yes”, but SpamAssassin today produces “YES” and I just didn’t want to have it missing stuff because of something like that.
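For anyone who wants to poke at the trailing-space problem outside of maildrop, here is a minimal sketch in Python. It assumes Python's `re` module treats a pattern this simple the same way maildrop's matcher does, which is fine for illustration but not guaranteed. Python matches case-sensitively by default, and maildrop's own rules may differ. The sample headers and the helper name `flag_matches` are made up for this example.

```python
import re

# Sample headers: nothing follows "YES" on the X-Spam-Flag line.
HEADERS = "X-Spam-Flag: YES\nX-Spam-Level: ***\n"

# Pattern from the old ~/.mailfilter rule, including the stray trailing space.
OLD_PATTERN = r"^X-Spam-Flag: *(yes|YES) "

# The same pattern with the trailing space removed.
TRIMMED_PATTERN = r"^X-Spam-Flag: *(yes|YES)"


def flag_matches(pattern: str, headers: str) -> bool:
    """Return True if the pattern matches at the start of any header line."""
    return re.search(pattern, headers, flags=re.MULTILINE) is not None


if __name__ == "__main__":
    print("old pattern matches:", flag_matches(OLD_PATTERN, HEADERS))
    print("trimmed pattern matches:", flag_matches(TRIMMED_PATTERN, HEADERS))
```

The old pattern only succeeds when a space actually follows the flag value, which is not the case in the header SpamAssassin wrote here.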
I decided to further improve this regex so that it might be less likely I’ll have to “fix” it again:
### SPAM
# Accept "yes" in any letter case; nothing is required after the flag value.
if ( /^X-Spam-Flag: *[yY][eE][sS]/ )
{
    to "$HOME/mail/.SPAM/"
}
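As a quick sanity check of the character-class version, the sketch below runs the same pattern through Python's `re` module against a few spellings of the header. This is an approximation of what maildrop does, not maildrop itself, and the function name `is_flagged` and the sample header strings are invented for the example.

```python
import re

# Character-class pattern from the revised rule: each letter may be either case,
# and nothing is required after the final "s".
FLAG_PATTERN = re.compile(r"^X-Spam-Flag: *[yY][eE][sS]", re.MULTILINE)


def is_flagged(headers: str) -> bool:
    """Return True if any header line starts with an X-Spam-Flag of 'yes'."""
    return FLAG_PATTERN.search(headers) is not None


if __name__ == "__main__":
    samples = [
        "X-Spam-Flag: YES\n",
        "X-Spam-Flag: yes\n",
        "X-Spam-Flag: Yes\n",
        "X-Spam-Flag: NO\n",
        "X-Spam-Status: Yes, score=23.0\n",
    ]
    for sample in samples:
        print(repr(sample), "->", is_flagged(sample))
```

Only the first three samples should be reported as flagged; the X-Spam-Status line is ignored because the pattern is anchored to the X-Spam-Flag header name.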
Problem solved.
BTW: the term SPAM originally came to be used in the computer world because of the Monty Python Spam sketch.





