SpamAssassin vs. Spastic

SpamAssassin has emerged as the most popular antispam tool in the Open Source world. It has gained such momentum that it has even crossed over into the commercial world as SpamKiller by Network Associates, and other commercial products are also based on it. This article is a short comparison of real-world results from two antispam tools, SpamAssassin and Spastic.

Disclaimer: I am the current project leader and main developer for Spastic.

Types of antispam programs

Without getting into all the intricacies of email RFCs, I should mention that spam can be fought in many places throughout the system. Most mail servers, or Mail Transfer Agents (MTAs), have some antispam capabilities, but most users don't have the ability or desire to run their own mail servers. The Mail Delivery Agents (MDAs) are programs that take mail from an MTA and deliver it to local mailboxes. procmail is a very popular MDA and is the means by which both SpamAssassin and Spastic are usually invoked. Finally, many mail clients, or Mail User Agents (MUAs), have some antispam capabilities. One promising new trend is Bayesian filtering, which is built into the latest version of the Mozilla mail client (among others). However, this article is focused on two tools which filter at the MDA level using procmail.

Overview of SpamAssassin

SpamAssassin is a collection of Perl modules which test elements of an email message and assign a numeric ranking to it. The higher the ranking, the more likely that the message is spam. The default settings define a spam message as anything with a score of 5.0 or higher. SpamAssassin also checks Realtime Blackhole Lists and has many other advanced features. It is usually called through procmail, although newer versions come with a powerful spamd/spamc client-server interface as well.
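
To make the scoring idea concrete, here is a minimal sketch in Python of how weighted rules can add up to a spam decision. It is not SpamAssassin's code (SpamAssassin is written in Perl), and the rule names, weights, and message layout are hypothetical, chosen only for illustration. The only value taken from the text above is the default threshold of 5.0.

```python
import re

# Hypothetical rules and weights, for illustration only.
# They are not taken from SpamAssassin's real rule set.
RULES = [
    ("SUBJECT_ALL_CAPS", 1.5, lambda msg: msg["subject"].isupper()),
    ("MENTIONS_FREE_MONEY", 2.5,
     lambda msg: re.search(r"free money", msg["body"], re.IGNORECASE) is not None),
    ("MANY_EXCLAMATIONS", 1.2, lambda msg: msg["body"].count("!") >= 3),
]

SPAM_THRESHOLD = 5.0  # the default threshold described in the article


def score_message(msg):
    """Return (total score, names of the rules that fired) for a message dict."""
    fired = [(name, weight) for name, weight, test in RULES if test(msg)]
    total = sum(weight for _, weight in fired)
    return total, [name for name, _ in fired]


def is_spam(msg, threshold=SPAM_THRESHOLD):
    """A message is spam only when its combined score reaches the threshold."""
    total, _ = score_message(msg)
    return total >= threshold
```

In this sketch a single rule firing is not enough to flag a message. Several rules have to fire together before the total crosses the threshold, which is the property that distinguishes a scoring filter from a rule-at-a-time filter.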

Overview of Spastic

About two years ago, the level of spam I began to receive crossed my pain threshold, and I was motivated to take control of the problem. I tried several Open Source spam solutions, including SpamAssassin. At the time, SpamAssassin's numeric ranking method of determining spam seemed counterintuitive. How do you know how to weigh each setting effectively? In time, I stumbled across SPAST, a relatively simple procmail script that used word lists to match against elements of an incoming message. It was simple to set up, understand, and customize. The problem was that SPAST was no longer supported by its author, Chrissie LeMaire. I tracked Chrissie down and asked her permission to take over the SPAST project and develop it. Thus, Spastic was born.

Spastic uses procmail and common system utilities like formail, dig, and egrep to scan elements of an email message for patterns, check for valid domains and address formats, etc. One big difference between Spastic and SpamAssassin is that Spastic rules are binary. When a Spastic rule fires, the message is flagged as spam. If a message passes all the tests, it is not flagged. There is no ranking system. The Spastic distribution also includes bash scripts for reporting statistics and rotating spam archives.
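
The binary approach can be sketched the same way. The following Python example is only an illustration of the "first rule that fires flags the message" idea. Spastic itself is built from procmail recipes and system utilities, and the word list, address check, and message layout below are hypothetical.

```python
import re

# Hypothetical word list and address check, for illustration only.
# Spastic's real recipes are procmail rules and are not reproduced here.
SUBJECT_PATTERNS = [r"viagra", r"free money", r"lose weight"]
ADDRESS_FORMAT = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def first_rule_fired(msg):
    """Return a description of the first rule that fires, or None if all tests pass."""
    for pattern in SUBJECT_PATTERNS:
        if re.search(pattern, msg["subject"], re.IGNORECASE):
            return "subject matches " + pattern
    if not ADDRESS_FORMAT.match(msg["from"]):
        return "malformed sender address"
    return None


def is_spam(msg):
    """Binary decision: one firing rule is enough, and there is no score."""
    return first_rule_fired(msg) is not None
```

Because the result carries the reason a message was flagged, a design like this makes it straightforward to summarize why messages were caught, at the cost of having no way to express that a rule is only weak evidence.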

Testing Method

I tested each program by setting it up to filter all incoming email for a seven-day period and logging its success rate. I made no configuration changes or tweaks to either program during the test. The main configuration I did for SpamAssassin was setting up my whitelist and a couple of cosmetic settings. Since I am on several mailing lists, I receive about 300 messages a day. In this mix is usually a small number of spam messages which come from a variety of sources. I usually receive about 10-20 spam messages a week, which I consider low by most standards today.

I tested SpamAssassin from April 14-20, 2003 and Spastic from April 21-27, 2003.

While my test results are accurate for the email I typically receive, I can't generalize my results to other email users. Please keep in mind that your results may vary.

Test Results

| | SpamAssassin | Spastic |
|---|---|---|
| Spam messages correctly stopped | 16 | 10 |
| False positives | 1 | 0 |
| Missed spam messages | 1 | 2 |
| Total messages processed outside of whitelists | 51 | 49 |
| Incorrect | 2 out of 69 | 2 out of 65 |
| Correct | 67/69 = 97.10% | 63/65 = 96.92% |
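
The percentages can be rechecked from the reported fractions (67/69 and 63/65, with 2 errors in each run). This short Python snippet is only a verification aid. The function name is mine, and it takes the totals of 69 and 65 as reported without trying to derive them.

```python
def accuracy(errors, total):
    """Percentage of messages handled correctly out of the reported total."""
    return 100.0 * (total - errors) / total


# Error counts are false positives plus missed spam, as reported in the results.
spamassassin = accuracy(errors=1 + 1, total=69)
spastic = accuracy(errors=0 + 2, total=65)

print(f"SpamAssassin: {spamassassin:.2f}% correct")
print(f"Spastic:      {spastic:.2f}% correct")
```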

Unfortunately, I realized too late that I should have saved the messages on which each program made an error and cross-tested them against the other program to see if it would have done better. I made a note of it for the next time I run a comparison test.

The results were very close, with SpamAssassin ending with a slightly higher percentage of correctly processed messages. If you are more concerned with false positives, Spastic came out slightly ahead, with none against SpamAssassin's one. That matters because many people would rather see a spam message slip through their filter than take a chance on losing an important message. Keep in mind that these sample sets are very small, so drawing firm conclusions is difficult.

Strengths and Weaknesses

After using the programs back-to-back, I have some observations about the strengths and weaknesses of each.

SpamAssassin strengths

A nice spamd/spamc interface, efficient, easy to use.
This feature is intended to make the program easier to use and improve performance. It is one of my favorite features.
It's easy to customize the whitelist and add other rules in the ~/.spamassassin/user_prefs file.
By adding or modifying rules in your personal user_prefs, you can customize the behavior and weightings if you don't like the defaults.
More tests, more generalized, more accurate.
It is much more sophisticated in testing elements for spam qualities than Spastic, and a better generalized solution for filtering an entire site.
Very easy to implement under Red Hat 9 by selecting it during installation.
This makes installation on Red Hat 9 drop-dead easy.
A large community supporting and testing it.
A more detailed report of spam triggers.
SpamAssassin provides a detailed report of each spam rule that adds up to the final ranking for the message.
Support for other antispam tools like Vipul's Razor and RBLs.

SpamAssassin weaknesses

Depends on Perl.
Since SpamAssassin is written in Perl, it requires a recent version of Perl to be installed on the local machine. It depends on many modules and Perl packages, and may be affected if Perl is upgraded on the machine.
You may not be able to use it if you do not have rights to install Perl.
If you don't have rights to install Perl on the target machine, you can't use SpamAssassin. In most cases, this is not an issue, since Perl is installed on the majority of *nix machines.
The default setting mangles messages flagged as spam (by changing MIME types).
I hesitate to mention this as a drawback because the default exists to protect users from Web bugs and malicious HTML content like JavaScript. When SpamAssassin flags a message as spam, it changes the MIME type of all attachments to text so they are no longer executable. However, if the mail was a false positive, it may be difficult to recover the original message format if it was base64 encoded or was a multi-part MIME message. This default can be changed by setting the "defang_mime" option to 0.

Spastic strengths

Very easy to implement on any Linux distribution, easy on most *nixes.
In most cases, you can download the 60k tar.gz file, unzip it, run the setup script, and be ready to filter spam in about 5-10 minutes.
Depends on common system utilities (procmail, grep, and dig).
Since Spastic uses procmail and common system utilities, it is unlikely that additional software installation or configuration will be required to run it. Unless it is used as a site filter, root access is not required. It may be the best choice to use on a hosted server if Perl/SpamAssassin is not available.
It's easy to customize the whitelist and change rules and filter lists.
Customizing the whitelists and rules is a simple matter of editing a few text files.
A rotate-spam script to archive spam folders and produce statistical reports.
Spastic includes an optional bash script which can be run from cron to rotate the spam mailbox and keep up to nine archives. It also summarizes the reasons that messages were flagged and provides totals so you can see who is sending you the most spam. Note: with a few small tweaks, I was able to use the rotate-spam script with SpamAssassin to provide similar functions.
Basic antivirus recipes.
Spastic can flag any message carrying executable content to prevent it from reaching a vulnerable Windows box and causing damage.

Spastic weaknesses

Not as accurate as SpamAssassin.
SpamAssassin does more tests and is more thorough. The default weights (determined by a genetic algorithm, no less) in SpamAssassin are very good and proved to be slightly more accurate in my testing. For a sitewide antispam solution, I have no doubt that SpamAssassin is more accurate than Spastic. For individuals who tune their filter files to the email they receive, Spastic and SpamAssassin are about equally effective.
Small community supporting and testing it.
Since SpamAssassin has a much larger community, it is better tested and supported.

Conclusion

SpamAssassin is the king of spam filtering for a reason. It is very sophisticated, well designed, and effective. For a sitewide filtering solution, I would strongly recommend SpamAssassin over Spastic. If you can't use SpamAssassin on a particular box (like a hosted box), or if you want a simpler solution for a small number of users, Spastic will also serve you well.

If you want to explore further, here are two other interesting antispam tools:

Editor's Note

This is just the tip of the growing iceberg of antispam tools in circulation today. I've been very happy with SpamAssassin for the last year or so. What are you using? What's your experience been with it? What's still slipping through? Where do you think the spam war is headed?

Recent comments

28 Aug 2003 16:58 kewjhoe

Re: worthless?

> I understand the need for scalability
> and faster code would be great. I would
> however challenge the notion that you
> can't afford 10 more MX servers. I
> think that it's more like "management
> does not want to pay for 10 more MX
> servers."

I suppose we're getting off topic, but here goes anyway.

Yes it's management, but even I agree with them (for a change). We're talking about serving 50k customers, not 50k employees. The cost of providing the service outweighs the benefits gained from the cost. As we've all seen in the telecom bust, providing services at a lower cost than what you pay is a bad idea :). funny that the engineers realized it and marketing didn't... oh well. Anyway, it's basically an ROI decision.

25 Aug 2003 21:48 sergek

17 spam in 6 days?
The sample size is so minimal, the tests are pretty much meaningless. But more importantly, you're getting 3 spam a day and you care about spam email?

24 Aug 2003 23:30 era

SpamAssassin vs. SPASTIC vs. Bayesian in another posting
You'll notice that a more proper test is now at
http://freshmeat.net/articles/view/964/

11 Aug 2003 14:23 macdaddy

Re: A four letter word

> TMDA, actually, the best Spam reducer
> tool. Clean, professional, accurate.


I too am thinking of a 4-letter word that describes TMDA. Unfortunately it's not "TMDA."

10 Aug 2003 10:12 rmemmons

Re: worthless?

> My problem then
> becomes scalability. In a large
> production environment, say upwards of
> 50k users, i can't afford for a spam
> filter solution to drop my CPU resources
> to zero. I can't (literally) afford to
> add 10 more MX servers because my spam
> solution hogs all of the resources.


I understand the need for scalability, and faster code would be great. I would, however, challenge the notion that you can't afford 10 more MX servers. I think that it's more like "management does not want to pay for 10 more MX servers."

I don't know your costs, but if you just take some simple numbers regarding the true man-hour costs of spam on your end users, you'll get into the millions, or tens of millions, of dollars a year in lost man-hours. For example, 10 emails a day at 10 seconds an email comes to 10.1 hours a year per employee, which for 50,000 users translates to 75 million. This is just an example--and it is huge.

If this is true, you have the money, your organization just lacks the will.

I bring this up because I work for a company of similar size, and I constantly see IT saying "too expensive"... but at the same time being happy to push much larger costs onto the budgets of others due to inaction or stupid policies. I don't know if your org is like that, but mine certainly is.

Rob
