Training SpamAssassin

I use SquirrelMail. I file the spam that slipped through into a "Train/Spam" folder, and the legitimate mail that was wrongly tagged as spam into a "Train/Ham" folder. A daily cron job then trains SpamAssassin from those two folders.

Put the "SAtrain.sh" script in the "/etc/cron.daily" directory and make it executable.
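One gotcha worth checking: on Debian-based systems, run-parts (which cron uses to execute /etc/cron.daily) by default only runs files whose names consist of ASCII letters, digits, underscores and hyphens. A file named "SAtrain.sh" is silently skipped there and should be saved as "SAtrain" instead. The helper below is hypothetical (runparts_name_ok is not a real command); it is a minimal sketch that mimics that naming rule so you can check a name before installing it.

```shell
#!/bin/bash
# runparts_name_ok: hypothetical helper that mimics the default file-name
# rule of Debian's run-parts: only letters, digits, "_" and "-" are allowed.
# Returns 0 if a file with this name would be run, 1 otherwise.
runparts_name_ok() {
    [[ "$1" =~ ^[A-Za-z0-9_-]+$ ]]
}
```

For example, `runparts_name_ok SAtrain.sh` fails while `runparts_name_ok SAtrain` succeeds. On Debian and Ubuntu, `run-parts --test /etc/cron.daily` lists the scripts that would actually be executed; other distributions may ship a different run-parts with different rules.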

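  #!/bin/bash
  # SAtrain.sh
  #
  # Trains SpamAssassin with the Spam and Ham mbox feeds, then empties them.
  # Put this file in the "/etc/cron.daily" directory and make it executable:
  #   chmod 700 /etc/cron.daily/SAtrain.sh
  #
  # Note: cron.daily jobs run as root, so by default sa-learn updates root's
  # Bayes database. If your Bayes data lives in the user's ~/.spamassassin,
  # run sa-learn as that user instead (e.g. via su) or set bayes_path.

  SA_LEARN="/usr/bin/sa-learn"
  SA_CONF="/etc/mail/spamassassin"
  SPAM_FEED="/home/user/mail/Train/Spam"
  HAM_FEED="/home/user/mail/Train/Ham"

  echo "Training SpamAssassin Begin:"

  # Empty each feed only when sa-learn exits with status 0, so mail is not
  # thrown away untrained if a run fails.
  echo "Learning from Spam..."
  if "$SA_LEARN" --spam --configpath="$SA_CONF" --showdots --mbox "$SPAM_FEED"; then
      echo "Clearing Spam Feed..."
      : > "$SPAM_FEED"
  fi

  echo "Learning from Ham..."
  if "$SA_LEARN" --ham --configpath="$SA_CONF" --showdots --mbox "$HAM_FEED"; then
      echo "Clearing Ham Feed..."
      : > "$HAM_FEED"
  fi

  echo "Training SpamAssassin Completed!"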
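
Before relying on the cron job, it can help to know how many messages each feed holds. The sketch below is a hypothetical helper (count_mbox is not part of SpamAssassin) that counts messages in an mbox file by counting lines that begin with "From ", assuming the usual mbox convention that such lines inside message bodies are escaped as ">From ".

```shell
#!/bin/bash
# count_mbox: hypothetical helper, not part of SpamAssassin.
# Prints the number of messages in an mbox file by counting the
# "From " separator lines; prints 0 for a missing or empty file.
count_mbox() {
    local mbox="$1"
    if [ ! -s "$mbox" ]; then
        echo 0
        return 0
    fi
    # grep -c prints 0 and exits 1 when nothing matches; ignore that status.
    grep -c '^From ' "$mbox" || true
}
```

For example, `echo "Spam feed: $(count_mbox /home/user/mail/Train/Spam) message(s)"` could be added to the script before the sa-learn calls. After a run, `sa-learn --dump magic` shows the nspam and nham counters; by default the Bayes classifier is only used once at least 200 spam and 200 ham messages have been learned.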
  

With SpamAssassin trained daily and "required_hits 1.9" set in my "~/.spamassassin/user_prefs" file, I have been getting about 99.9% accuracy in tagging spam... Woohooo!!
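For reference, the threshold is a one-line entry in user_prefs. In SpamAssassin 3.x the option is named required_score, with required_hits still accepted as the older name, so a minimal sketch of the file might look like this:

```
# ~/.spamassassin/user_prefs
# Tag anything scoring 1.9 or more as spam (the default threshold is 5.0).
required_score 1.9
```

A threshold this far below the default catches more spam but also raises the risk of false positives, which is where feeding wrongly tagged mail back through the Train/Ham folder helps.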

