Comments for Apache SpamAssassin

21 Jan 2004 03:07 crippler

some of this discussion is outdated
SpamAssassin has come a long way since this discussion started. Whitelisting & blacklisting messages has gotten a whole lot easier.

Now that SA has Bayesian filters, training your SA with a large corpus of mail can be pretty easy (though necessarily tedious). I've set up two folders: one called "Ham" and the other called "Spam". Go through and move messages in your mailbox to one of those two folders. Good mail goes to Ham, junkmail goes to Spam. The larger the corpus of mail you pull from, the better.

Next I set up a couple of cron jobs that look like:

20 2 * * * /usr/bin/sa-learn --ham --mbox ~/Ham
20 4 * * * /usr/bin/sa-learn --spam --mbox ~/Spam

Once a day, SpamAssassin goes through my Ham & Spam folders and learns what good & bad mail tend to look like. The more I feed it, the better it gets at catching spam.
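
For anyone wondering what "learning what good & bad mail look like" means in practice, here is a toy naive-Bayes scorer in Python. It is a simplified illustration only, not SpamAssassin's actual Bayes code (which combines token probabilities differently); `ToyBayes`, `learn` and `spam_probability` are made-up names for this sketch.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercased word tokens, each counted once per message."""
    return set(re.findall(r"[a-z0-9$']+", text.lower()))


class ToyBayes:
    """Toy naive-Bayes spam scorer (illustration only, not SpamAssassin's code)."""

    def __init__(self):
        self.spam_counts = Counter()  # token -> number of spam messages containing it
        self.ham_counts = Counter()   # token -> number of ham messages containing it
        self.n_spam = 0
        self.n_ham = 0

    def learn(self, text, is_spam):
        # Loosely analogous to feeding a message to sa-learn --spam / --ham.
        tokens = tokenize(text)
        if is_spam:
            self.spam_counts.update(tokens)
            self.n_spam += 1
        else:
            self.ham_counts.update(tokens)
            self.n_ham += 1

    def spam_probability(self, text):
        # Smoothed class prior plus per-token log-likelihood ratios.
        log_odds = math.log((self.n_spam + 1) / (self.n_ham + 1))
        for tok in tokenize(text):
            p_spam = (self.spam_counts[tok] + 1) / (self.n_spam + 2)
            p_ham = (self.ham_counts[tok] + 1) / (self.n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        # Numerically stable logistic function (avoids overflow in exp).
        if log_odds >= 0:
            return 1 / (1 + math.exp(-log_odds))
        e = math.exp(log_odds)
        return e / (1 + e)
```

The per-token ratios only become meaningful once a lot of ham and spam has been learned, which is why feeding it a large corpus every night pays off.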

Some types of spam were still getting through despite this filtering. The scores were significant but below the minimum score I had set to mark a message as spam. Many of these were Nigerian scam mails. Here are some lines I added to my global SpamAssassin config to take a big chunk out of incoming Spam:

# New blacklist not included in
# default configuration
header RCVD_IN_BNBL eval:check_rbl('bl', 'bl.blueshore.net.')
describe RCVD_IN_BNBL Listed by BNBL
tflags RCVD_IN_BNBL net
score RCVD_IN_BNBL 2
# Higher scoring for Nigerian scams
score NIGERIAN_BODY1 3
score NIGERIAN_BODY2 2
# Known high-volume spammers that
# I have no interest in hearing from.
# Dots are escaped so each rule only matches the literal domain.
body PHARMAWHAREHOUSE /pharmawharehouse\.biz/
describe PHARMAWHAREHOUSE Link to pharmawharehouse.biz
body PHARMACOURT /pharmacourt\.biz/
describe PHARMACOURT Link to pharmacourt.biz
body VALUEPOINTMEDS /valuepointmeds\.biz/
describe VALUEPOINTMEDS Link to valuepointmeds.biz
score PHARMAWHAREHOUSE 10
score PHARMACOURT 10
score VALUEPOINTMEDS 10
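
To see why bumping scores helps with mail that was "significant but below the minimum", here is a rough Python model of the decision: SpamAssassin adds up the scores of every rule that fires and compares the total with the required score (5.0 by default). This is a sketch, not SpamAssassin's code; `score_message` and `other_hits` are hypothetical, with `other_hits` standing in for everything else SpamAssassin scored on the message.

```python
import re

# (rule name, pattern, score) mirroring the custom body rules above.
RULES = [
    ("PHARMAWHAREHOUSE", re.compile(r"pharmawharehouse\.biz"), 10.0),
    ("PHARMACOURT", re.compile(r"pharmacourt\.biz"), 10.0),
    ("VALUEPOINTMEDS", re.compile(r"valuepointmeds\.biz"), 10.0),
]


def score_message(body, other_hits=0.0, required=5.0):
    """Return (total score, is_spam, names of the custom rules that matched)."""
    matched = [name for name, pattern, _ in RULES if pattern.search(body)]
    total = other_hits + sum(score for name, _, score in RULES if name in matched)
    # A message is tagged as spam once the summed score reaches the threshold.
    return total, total >= required, matched
```

A message that only collected 4.0 points elsewhere stays under the threshold; one hit on a 10-point domain rule pushes it far over.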

08 May 2003 03:22 weissel

Re: Almost Amazing!

> > Won't work well --- if at all --- for
> >
> > * Mailing lists
> > * automated mailings (freshmeat's new version mailings, most
> > buying over the internet stuff, Bounces, etc.)
%
> Legitimate mailing lists and automated mailings are usually
> easy to differentiate from spam;

I got a 'please go to this website' mail (where you have to enter a
20-character string to let the message pass through) ... which
looked so much like the spam I usually get that the spam filter
treated it as spam.

In the end I had to rewrite the message before it got through,
because the first one timed out before I had looked through
the spam heap. I would not have done this if the email had not
been important _for me_ to arrive. Helping others is _not_ that
important, as I do this in my free time.

If I had countered with a confirmation request instead of
throwing it on the spam heap, I'd never have known that my mail
never made it. Instead I would have grumbled over the recipient's
silence.

Easy to differentiate, indeed.


> also, if you know ahead of time that you are subscribing to
> something, you can add it to a whitelist.

So I just got a mail from a guy 'noreply@freshmeat.net' which
notified me of your answer. I had never received mail from that
address before. So how
can I whitelist that in advance? How is noreply@freshmeat.net
gonna read, much less respond to a confirmation request?

How is _that_ low maintenance?

(The same goes, as I said, for many online shopping cases.)


> > * people who don't like jumping through hoops to get mail
> > through (unfortunately these are usually the people who
> > give answers).
%
> First, you can safely whitelist everybody you send to, so as
> not to inconvenience them.

i.e. even more work for me to integrate that into my mail client.

And if they answer me from a different (e.g. preferred or new)
address, they'll be inconvenienced again, when all they are trying
to do is make it easier for me to reach them.

This can be real fun if you use sneakemail.com (I do).
If you send me a mail to my sneakemail address (say
xxxxx@sneakemail.com), I get a temporary yyyyy@sneakemail.com
(which will expire in a few days).

You send me another mail in a week ... and I'll get a
yzyzyz@sneakemail.com. A new confirmation is clearly necessary,
right? So you'll have to parse the X-Sneakemail-From: header
instead of just the From header, where it applies.
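
A minimal Python sketch of picking the address to whitelist or confirm, preferring X-Sneakemail-From when present. It assumes (not confirmed in this thread) that X-Sneakemail-From carries the real sender in ordinary address syntax; `sender_key` is a hypothetical helper.

```python
from email import message_from_string
from email.utils import parseaddr


def sender_key(raw_message):
    """Address to whitelist/confirm: X-Sneakemail-From if present, else From."""
    msg = message_from_string(raw_message)
    header = msg.get("X-Sneakemail-From") or msg.get("From", "")
    # parseaddr drops the display name: 'Jo <jo@example.org>' -> 'jo@example.org'
    return parseaddr(header)[1].lower()
```

This keeps a correspondent's whitelist entry stable even when their temporary sneakemail address changes every few days.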


> Also, if you apply this, say, only to messages tagged by
> spamassassin as 'probable spam', only your friends trying to
> sell you penis enlargements will be asked to confirm :-)

So we are still stuck on the case --- which I, personally,
experienced --- where a confirm mail will be asked to confirm
itself. At best, you'll never ever see that mail. Really a good
thing if the mail was somewhat important.


> > * Senders where the anti-spam system fires such a message
> > right back to you --- you can get a nice mail flood if that
> > goes over a mailing list. For 3 parties you'll get a very
> > very impressive snowball effect! (Can you say 'complete
> > meltdown'?)
%
> Oh, come on now. Sending one message per address is a simple
> thing to do.

You are implying a world where nobody's 'out of office' mails
will be sent in answer to their own 'out of office' mails.

Welcome to reality.

I have seen that at 100 mails/hour on a mailing list. More than
once. So much that the mailing list finally stopped Reply-To
munging. It won't help, either, if the sender address keeps
changing, as it does for some people who regularly change their
mail addresses to avoid spam.


> To see two systems that are successful with the confirmation
> technique, read up on these: TMDA and Active Spam Killer.
> Remember that you can combine this with a spam identifier like
> spamassassin to only request confirmation from messages that
> look like spam.

So you'll be part of a DDoS on some poor schmuck whose address
was faked into the mail.

If just 0.5% of the recipients of a modest 5-million-recipient spam
run use such a thing, you'll have 25k mails on you on the day your address
appears in the From of a spam. And often enough it is somebody's
spam. Ask the owners of test.com. With luck, you'll fire off
another 25k mails if the confirmation request includes the
original spam "for your convenience".

And now imagine 1% and 20 million recipients. 200k mails is fun
and a half.
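
The numbers above are simply recipients times the fraction of them running a challenge system, with one confirmation request each going to the forged From address. A tiny Python check (the function name is made up):

```python
def backscatter(recipients, challenge_rate):
    """Confirmation requests sent to the forged From address of one spam run."""
    return round(recipients * challenge_rate)


print(backscatter(5_000_000, 0.005))   # 25000
print(backscatter(20_000_000, 0.01))   # 200000
```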

Again, it's your choice. I believe that these things can
harm others, badly, and thus should not be used without deep
understanding. But go right ahead, time will show if DDoSsing
innocent bystanders will help the fight against spam.

07 May 2003 12:15 markthomas

Re: Almost Amazing!

> % How about this: set up an autoresponder
> % that says, "I'm sorry, your message has
> % been trapped by my spam filter. If this
> % is a legitimate email message, please
> % put the word PASSWORD in the subject.
> % [...]
> %
> % I guarantee that spammers are not going
> % to bother putting your password in the
> % subject.
>
>
> Won't work well --- if at all --- for
> * Mailing lists
> * automated mailings (freshmeat's new
> version mailings, most buying over the
> internet stuff, Bounces, etc.)


Legitimate mailing lists and automated mailings are usually easy to differentiate from spam; also, if you know ahead of time that you are subscribing to something, you can add it to a whitelist.


> * people who don't like jumping through
> hoops to get mail through (unfortunately
> these are usually the people who give
> answers).


First, you can safely whitelist everybody you send to, so as not to inconvenience them.
Also, if you apply this, say, only to messages tagged by spamassassin as 'probable spam', only your friends trying to sell you penis enlargements will be asked to confirm :-)
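
The policy described above (whitelist everyone you have written to, and only challenge mail SpamAssassin has tagged) can be sketched in a few lines of Python. `route` and the `sent_to` set are hypothetical; the one real detail relied on is that SpamAssassin marks mail it considers spam with an `X-Spam-Flag: YES` header.

```python
from email import message_from_string
from email.utils import parseaddr


def route(raw_message, sent_to):
    """Return 'deliver' or 'challenge' for an incoming message."""
    msg = message_from_string(raw_message)
    sender = parseaddr(msg.get("From", ""))[1].lower()
    if sender in sent_to:
        return "deliver"  # people you have written to are never challenged
    if msg.get("X-Spam-Flag", "").strip().upper() == "YES":
        return "challenge"  # only SpamAssassin-tagged mail gets a confirmation request
    return "deliver"
```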


> * Senders where the anti-spam system
> fires such a message right back to you
> --- you can get a nice mail flood if
> that goes over a mailing list. For 3
> parties you'll get a very very
> impressive snowball effect! (Can you
> say 'complete meltdown'?)


Oh, come on now. Sending one message per address is a simple thing to do.
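
"One message per address" amounts to remembering who has already been challenged. A minimal Python sketch (the class name is hypothetical):

```python
class ChallengeSender:
    """Sends at most one confirmation request per sender address."""

    def __init__(self):
        self.challenged = set()

    def should_challenge(self, address):
        addr = address.strip().lower()
        if addr in self.challenged:
            return False  # already asked once; stay quiet
        self.challenged.add(addr)
        return True
```

Note that this only caps repeats for an address it has seen; a sender whose address keeps changing still gets a fresh challenge every time.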


> * If a mailing list rewrites the header
> enough (reply-to munging comes to mind)
> you could even start answering your own
> "put PASSWORD in subject line" for all
> the mailing list to see. Fun (and that
> has happened with vacation mails before,
> at 100 mails/h)!
>
> You will have to decide yourself if
> these restrictions and dangers are
> acceptable to you, your mailing list
> reputation and your environment; you
> also have to think about how to avoid
> vicious circles as outlined above.
> Dropping mails you'll always risk
> dropping information, if that risk is
> acceptable to you, go ahead.


To see two systems that are successful with the confirmation technique, read up on these: TMDA (http://www.tmda.net/) and Active Spam Killer (http://sourceforge.net/projects/a-s-k). Remember that you can combine this with a spam identifier like spamassassin to only request confirmation from messages that look like spam.

20 Mar 2003 06:49 weissel

Re: Almost Amazing!

%
> % % How about this: set up an autoresponder
> % % that says, "I'm sorry, your message has
> % % been trapped by my spam filter. If this
> % % is a legitimate email message, please
> % % put the word PASSWORD in the subject.
> % % [...]
%
> % % I guarantee that spammers are not going
> % % to bother putting your password in the
> % % subject.
%
%
[shortened]
> % Won't work well --- if at all --- for
> % * Mailing lists
> % * automated mailings
> % * people who don't like jumping through
> % hoops
> % * Senders where the anti-spam system
> % fires such a message right back to you
> % * If a mailing list rewrites the header
> % enough
[leading to endless mail loops and other fun things]
%
> % You will have to decide yourself if
> % these restrictions and dangers are
> % acceptable to you, your mailing list
> % reputation and your environment;
[...]
%
> Most of what you are asking for can be resolved
> using the user_prefs file. You can find
> a free Windows utility for creating and
> editing user_prefs files here:
%
> http://www.CleanMyMailbox.com/sa


As a non-Windows user, I cannot use that program (not that I'd need it).

Also, there is no way the user_prefs file can prevent the problems outlined above if you use an autoresponder telling people to put something specific into the subject.

19 Mar 2003 22:21 jhalbrook

Re: Almost Amazing!

>
> % How about this: set up an autoresponder
> % that says, "I'm sorry, your message has
> % been trapped by my spam filter. If this
> % is a legitimate email message, please
> % put the word PASSWORD in the subject.
> % [...]
> %
> % I guarantee that spammers are not going
> % to bother putting your password in the
> % subject.
>
>
> Won't work well --- if at all --- for
> * Mailing lists
> * automated mailings (freshmeat's new
> version mailings, most buying over the
> internet stuff, Bounces, etc.)
> * people who don't like jumping through
> hoops to get mail through (unfortunately
> these are usually the people who give
> answers).
> * Senders where the anti-spam system
> fires such a message right back to you
> --- you can get a nice mail flood if
> that goes over a mailing list. For 3
> parties you'll get a very very
> impressive snowball effect! (Can you
> say 'complete meltdown'?)
> * If a mailing list rewrites the header
> enough (reply-to munging comes to mind)
> you could even start answering your own
> "put PASSWORD in subject line" for all
> the mailing list to see. Fun (and that
> has happened with vacation mails before,
> at 100 mails/h)!
>
> You will have to decide yourself if
> these restrictions and dangers are
> acceptable to you, your mailing list
> reputation and your environment; you
> also have to think about how to avoid
> vicious circles as outlined above.
> Dropping mails you'll always risk
> dropping information, if that risk is
> acceptable to you, go ahead.


Most of what you are asking for can be resolved using the user_prefs file. You can find a free Windows utility for creating and editing user_prefs files here:

http://www.CleanMyMailbox.com/sa

23 Feb 2003 17:56 weissel

Re: Almost Amazing!

> How about this: set up an autoresponder
> that says, "I'm sorry, your message has
> been trapped by my spam filter. If this
> is a legitimate email message, please
> put the word PASSWORD in the subject.
%[...]
>
> I guarantee that spammers are not going
> to bother putting your password in the
> subject.


Won't work well --- if at all --- for
* Mailing lists
* automated mailings (freshmeat's new version mailings, most buying over the internet stuff, Bounces, etc.)
* people who don't like jumping through hoops to get mail through (unfortunately these are usually the people who give answers).
* Senders where the anti-spam system fires such a message right back to you --- you can get a nice mail flood if that goes over a mailing list. For 3 parties you'll get a very very impressive snowball effect! (Can you say 'complete meltdown'?)
* If a mailing list rewrites the header enough (reply-to munging comes to mind) you could even start answering your own "put PASSWORD in subject line" for all the mailing list to see. Fun (and that has happened with vacation mails before, at 100 mails/h)!

You will have to decide yourself if these restrictions and dangers are acceptable to you, your mailing list reputation and your environment; you also have to think about how to avoid vicious circles as outlined above. Whenever you drop mail, you risk dropping information; if that risk is acceptable to you, go ahead.
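
One common way to avoid those vicious circles (not something proposed in this thread) is for the autoresponder to refuse to answer mail that is itself automated or list traffic. A minimal Python sketch, assuming the usual header conventions: `Auto-Submitted` from RFC 3834, `List-Id` from RFC 2919, the informal `Precedence: bulk/list/junk`, and an empty `Return-Path` on bounces. `may_autorespond` is a hypothetical name.

```python
from email import message_from_string


def may_autorespond(raw_message):
    """False for messages an autoresponder should never answer."""
    msg = message_from_string(raw_message)
    auto = msg.get("Auto-Submitted", "no").split(";")[0].strip().lower()
    if auto != "no":
        return False  # auto-generated or auto-replied mail (RFC 3834)
    if msg.get("Precedence", "").strip().lower() in {"bulk", "list", "junk"}:
        return False  # conventional marker for list or bulk mail
    if msg.get("List-Id") is not None:
        return False  # mailing list traffic (RFC 2919)
    if msg.get("Return-Path", "").strip() == "<>":
        return False  # bounces have an empty envelope sender
    return True
```

The responder should also mark its own replies with `Auto-Submitted: auto-replied`, so that other well-behaved responders do not answer it in turn. This does not help against responders that ignore these headers.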

25 Jul 2002 10:57 marktranchant

Re: Almost Amazing!

> How do you whitelist a subject line..?!? I read the docs and do not see that feature..


This must be the slowest discussion I've ever participated in.


To whitelist a subject line, set up your procmailrc to not send mail with that subject line through spamassassin.
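
In procmail this is a recipe matching the Subject header before the one that pipes mail through spamassassin. The same decision, written as a hypothetical Python pre-filter (the pattern list is an invented example):

```python
import re
from email import message_from_string

# Hypothetical subjects that should bypass SpamAssassin entirely.
WHITELISTED_SUBJECTS = [re.compile(r"^\[myproject-dev\]", re.IGNORECASE)]


def needs_spam_check(raw_message):
    """True unless the Subject matches one of the whitelisted patterns."""
    subject = message_from_string(raw_message).get("Subject", "")
    return not any(p.search(subject) for p in WHITELISTED_SUBJECTS)
```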

13 Jul 2002 03:31 csyntax

Re: Almost Amazing!
How do you whitelist a subject line.. ?!?
I read the docs and do not see that feature..

> % it also manages
> % to occasionally tag genuine e-mail as
> % spam. Not very often, but often enough
> % that I can't automatically bin anything
> % tagged as spam. The best I can do is
> % syphon it off into another folder
>
> How about this: set up an autoresponder
> that says, "I'm sorry, your message has
> been trapped by my spam filter. If this
> is a legitimate email message, please
> put the word PASSWORD in the subject.
> Thank you." Then you allow messages with
> your chosen password in the subject. You
> can change the password as often as you
> like, and/or make them expire after a
> certain number of days.
>
> I guarantee that spammers are not going
> to bother putting your password in the
> subject.


13 May 2002 14:59 markthomas

Re: Almost Amazing!
> it also manages
> to occasionally tag genuine e-mail as
> spam. Not very often, but often enough
> that I can't automatically bin anything
> tagged as spam. The best I can do is
> syphon it off into another folder


How about this: set up an autoresponder that says, "I'm sorry, your message has been trapped by my spam filter. If this is a legitimate email message, please put the word PASSWORD in the subject. Thank you." Then you allow messages with your chosen password in the subject. You can change the password as often as you like, and/or make them expire after a certain number of days.

I guarantee that spammers are not going to bother putting your password in the subject.
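
The accept side of this scheme (a password in the subject, with passwords expiring after some days) could look like the following Python sketch. The function name, the dictionary of issue dates and the 30-day default are all assumptions for illustration.

```python
from datetime import date, timedelta


def subject_has_valid_password(subject, passwords, today, max_age_days=30):
    """True if the subject contains a password issued within max_age_days.

    passwords maps each password to the date it was issued.
    """
    lowered = subject.lower()
    for pw, issued in passwords.items():
        if pw.lower() in lowered and today - issued <= timedelta(days=max_age_days):
            return True
    return False
```

Rotating simply means adding a new entry; old ones stop working once they are older than `max_age_days`.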

21 Feb 2002 04:26 jynxzero

Re: Almost Amazing!

> You could always turn the sensitivity
> down - read the documentation...


I have read the documentation. (Why would I post such a comment if I had not?)

I did experiment with the settings, and to some extent this made things better. But the problem still remains. If I turn down the sensitivity, then spamassassin does not tag as much legitimate mail as spam, but it still does occasionally. And it also starts letting through a lot more spam.
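
The trade-off described here can be shown with a few made-up scores, assuming "turning the sensitivity down" means raising the required score. The scores below are invented for illustration, not measurements.

```python
def count_errors(scored, threshold):
    """scored: (score, is_spam) pairs. Returns (false positives, false negatives)."""
    fp = sum(1 for s, spam in scored if s >= threshold and not spam)
    fn = sum(1 for s, spam in scored if s < threshold and spam)
    return fp, fn


# Hypothetical scores, invented for illustration only.
sample = [(1.2, False), (3.1, False), (5.6, False), (7.5, False),
          (4.4, True), (6.0, True), (9.8, True)]

for t in (5.0, 7.0):
    print(t, count_errors(sample, t))
# 5.0 (2, 1)
# 7.0 (1, 2)
```

Raising the threshold removes one false positive but still leaves one, while letting an extra spam through.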
