Bayesian filtering in SpamAssassin

I have some queries about Bayesian filtering in SpamAssassin in the latest ASL.

Tom wrote this a while ago:
"The next Up2Date has SpamAssassin 2.55, that has a self-learning bayesian component. It does not require user interaction, so it is easy to implement. We do not currently plan to put in a bayesian component that requires user feedback."

1) Is this currently implemented?
2) Tom says "self-learning"; how does this work with no user feedback?
3) Would Astaro reconsider the "no user feedback" statement? It seems to me that a few extra options in the SMTP proxy content manager, such as "Delete as Spam", would be easy to implement.

I have implemented SpamBayes (see SourceForge) in my MS Outlook client and it works FANTASTICALLY. At a guess, I'd say it catches 99% of all spam, and I think something like this would be well worth the effort.

(I trained SpamBayes on around 600 spam emails and 800 or so "ham" messages).

drees over 22 years ago

If Astaro is using SpamAssassin as is, with its default auto-learning capabilities, here's how it works:

If the SA rules give an email enough positive points that SA is confident it's spam, it will parse the message and insert it into the Bayes database as spam.

Conversely, if the SA rules give an email a low enough score (enough negative points), it will automatically train its Bayes database with the message as ham.
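A minimal Python sketch of the auto-learn decision described above, under stated assumptions: the two threshold values are illustrative stand-ins (in SpamAssassin 3.x the corresponding settings are `bayes_auto_learn_threshold_spam` and `bayes_auto_learn_threshold_nonspam`; names and defaults in 2.55 may differ), and `auto_learn_decision` is a hypothetical helper. The real implementation applies further conditions that are not modelled here.

```python
from typing import Optional

# ASSUMPTION: illustrative thresholds, not necessarily the defaults
# shipped with any particular SpamAssassin version.
SPAM_THRESHOLD = 12.0  # learn as spam at or above this rule score
HAM_THRESHOLD = 0.1    # learn as ham below this rule score


def auto_learn_decision(rule_score: float,
                        spam_threshold: float = SPAM_THRESHOLD,
                        ham_threshold: float = HAM_THRESHOLD) -> Optional[str]:
    """Return "spam" or "ham" if the message should be auto-learned, else None.

    Simplified: real SpamAssassin checks extra conditions before learning.
    """
    if rule_score >= spam_threshold:
        return "spam"  # rules are confident it is spam
    if rule_score < ham_threshold:
        return "ham"   # rules are confident it is ham
    return None        # uncertain middle ground: do not learn


for score in (15.0, 5.0, -2.0):
    print(score, auto_learn_decision(score))
```

Messages that score between the two thresholds are never learned, so auto-learning only adds examples that the rules already classify with high confidence.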

In theory, this reinforces the rules that SA ships with while adding new words that may be detected as spam or ham.

In practice, I haven't found that this makes SA much more effective than stock. To really prevent SA from generating false positives and negatives, you need to train most messages into the Bayes database.
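To illustrate why training volume matters, here is a toy naive-Bayes-style classifier in Python. It is an assumption-laden sketch (the class `TinyBayes`, its tokenizer and its add-one smoothing are all hypothetical), not the algorithm SpamAssassin or SpamBayes actually uses. Tokens the filter has never been trained on contribute nothing to the score, so an under-trained database has little evidence to work with.

```python
import math
import re
from collections import Counter


class TinyBayes:
    """Toy naive-Bayes-style spam filter (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.spam_counts: Counter = Counter()  # token -> spam messages containing it
        self.ham_counts: Counter = Counter()   # token -> ham messages containing it
        self.spam_msgs = 0
        self.ham_msgs = 0

    @staticmethod
    def tokenize(text: str) -> set:
        # Lower-case word-like tokens, each counted once per message.
        return set(re.findall(r"[a-z0-9$']+", text.lower()))

    def train(self, text: str, is_spam: bool) -> None:
        tokens = self.tokenize(text)
        if is_spam:
            self.spam_counts.update(tokens)
            self.spam_msgs += 1
        else:
            self.ham_counts.update(tokens)
            self.ham_msgs += 1

    def spam_probability(self, text: str) -> float:
        # Start from the (add-one smoothed) prior odds of spam vs ham.
        log_odds = math.log((self.spam_msgs + 1) / (self.ham_msgs + 1))
        for tok in self.tokenize(text):
            s, h = self.spam_counts[tok], self.ham_counts[tok]
            if s == 0 and h == 0:
                continue  # never trained on this token: no evidence either way
            p_spam = (s + 1) / (self.spam_msgs + 2)
            p_ham = (h + 1) / (self.ham_msgs + 2)
            log_odds += math.log(p_spam / p_ham)
        # Numerically stable logistic function of the log-odds.
        if log_odds >= 0:
            return 1.0 / (1.0 + math.exp(-log_odds))
        z = math.exp(log_odds)
        return z / (1.0 + z)


bayes = TinyBayes()
bayes.train("cheap viagra buy now", is_spam=True)
bayes.train("buy cheap watches now", is_spam=True)
bayes.train("meeting notes for monday", is_spam=False)
bayes.train("lunch on monday with the team", is_spam=False)
print(round(bayes.spam_probability("cheap viagra now"), 3))
print(round(bayes.spam_probability("team meeting monday"), 3))
```

A message made entirely of unseen words scores exactly the prior, which is the toy version of an untrained filter "lacking punch".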

FWIW, I've also found that SA 2.55 lets a lot more spam through than SA 2.60.

IMO, auto-learning Bayes filters are pretty useless. If you want to try an adaptive Bayes filter, give POPFile a shot. If you really want to use SpamAssassin, run it on your internal mail server, where you can train it more easily.

The SpamAssassin website has tons of online documentation describing how things work in detail; check it out.

Simon Shaw over 22 years ago in reply to drees

Thanks for the post. You confirmed what I suspected: without actual training, the Bayesian filter lacks a lot of punch.

A few extra options in the SMTP queue page to delete mail as spam or dequeue as ham would probably make a world of difference.

SpamBayes works brilliantly after training. I have had maybe 2-3 false positives out of thousands of emails and 99% hits on spam.

jpelzer over 22 years ago in reply to Simon Shaw

I agree, auto-learning Bayes is not so great. It'd be nice if the proxy had a facility to submit folders of ham and spam to train the filter; it's a simple command line in SpamAssassin. Once I see the installation, I guess I could write up the commands for you guys, so you just log in and run them once. Generally, about 1000 messages of each will give good results.
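The "simple command line" here is `sa-learn`. A hedged Python sketch that builds the training commands for one spam folder and one ham folder, assuming one message per file and an `sa-learn` that accepts a directory path after `--spam` / `--ham` (the behaviour of recent releases; older ones such as 2.55 may need different options, so check `sa-learn --help` first). The helper names `sa_learn_commands` and `run` and the folder paths are hypothetical, and `run` defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess
from typing import List


def sa_learn_commands(spam_dir: str, ham_dir: str) -> List[List[str]]:
    """Build one sa-learn call per training folder (hypothetical helper).

    ASSUMPTION: sa-learn accepts a directory after --spam / --ham and the
    folders contain one message per file.
    """
    return [
        ["sa-learn", "--spam", spam_dir],
        ["sa-learn", "--ham", ham_dir],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises if sa-learn fails


# Hypothetical folder paths; prints the two commands without running them.
run(sa_learn_commands("/var/spool/train/spam", "/var/spool/train/ham"))
```

Keeping the dry run as the default lets you review the exact commands on the box before passing `dry_run=False`.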

Another thing about SpamAssassin: is the Up2Date version going to have Razor2 support? I find that Razor2 is a GREAT addition to the base SA rules. On my mail server I have its score bumped to 5.0 points, so any incoming message that Razor2 lists as spam with a confidence > 50% immediately passes the spam threshold and gets that much closer to the Bayes auto-learning threshold.
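The arithmetic behind bumping Razor2 to 5.0 points can be sketched in Python: a message is tagged as spam once its summed rule scores reach the required score (5.0 by default). The rule names and base scores below are hypothetical placeholders, not SpamAssassin's real rule set; in SpamAssassin itself such an override is a `score` line in the site configuration, and the exact Razor2 rule name depends on the version.

```python
from typing import Iterable, Mapping, Optional

REQUIRED_SCORE = 5.0  # SpamAssassin's default spam threshold

# Hypothetical rule names and scores, for illustration only.
BASE_SCORES = {
    "RAZOR2_OVER_50_PERCENT": 1.5,
    "HTML_MESSAGE": 0.1,
    "SUBJECT_ALL_CAPS": 0.6,
}


def message_score(hits: Iterable[str],
                  overrides: Optional[Mapping[str, float]] = None) -> float:
    """Sum the scores of the rules a message hit, applying local overrides."""
    scores = {**BASE_SCORES, **(overrides or {})}
    return sum(scores[rule] for rule in hits)


hits = ["RAZOR2_OVER_50_PERCENT", "HTML_MESSAGE"]
print(message_score(hits) >= REQUIRED_SCORE)  # False with stock scores
print(message_score(hits, {"RAZOR2_OVER_50_PERCENT": 5.0}) >= REQUIRED_SCORE)  # True once bumped
```

With the override, the Razor2 hit alone is enough to reach the threshold, and any other positive hits push the total further toward the auto-learn range.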

Of course, I'd like to see 2.60 support instead of 2.55. There are so many great additions that it seems worth skipping 2.55.

gnujuba over 22 years ago in reply to jpelzer

I would prefer DCC (http://www.rhyolite.com/anti-spam/dcc/) as the online spam database. We had more hits with DCC than with Razor.

gnjb

jpelzer over 22 years ago in reply to gnujuba

gnujuba
Oooh, DCC... You know, I didn't know SA supported DCC. I'm gonna go install that on my box right away. Thanks!

William Warren over 22 years ago in reply to jpelzer

Auto-learning Bayes not that great? I use it here in Mozilla 1.5 and it works great. [:)]

Owner: Emmanuel Technology Consulting

http://etc-md.com

Former Sophos SG(Astaro) advocate/researcher/Silver Partner

PfSense w/Suricata, ntopng,

Other addons to follow

Frank_H over 22 years ago in reply to William Warren

I think it would improve the spam detection system in ASL a great deal.

If the nice people at Astaro would include AWL, Bayesian filtering and DCC in ASL, it would be super!! [:)]