Apache SpamAssassin is an extensible email filter used to identify spam. Once identified, mail can optionally be tagged as spam for later filtering. It provides a command line tool to perform filtering, a client-server system to filter large volumes of mail, and Mail::SpamAssassin, a set of Perl modules allowing Apache SpamAssassin to be used in a wide variety of email systems.
FuzzyOcr is a plugin for SpamAssassin that targets image spam. It supports optical character recognition using different engines and settings; a fuzzy word matching algorithm applied to OCR results; an image hashing system to learn the unique properties of known spam images; dimension, size, and integrity checking of images; and content-type verification for the containing email message.
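To illustrate the idea of fuzzy word matching on OCR output, here is a minimal Python sketch. It is not FuzzyOcr's own algorithm or code: the function name fuzzy_hits, the word list, and the 0.8 threshold are assumptions made for the example, and the similarity measure is the one provided by Python's standard difflib module.

```python
from difflib import SequenceMatcher


def fuzzy_hits(ocr_text, wordlist, threshold=0.8):
    """Return the wordlist entries that approximately occur in ocr_text.

    Hypothetical helper for illustration only; the threshold is an
    assumed value, not a FuzzyOcr default.
    """
    tokens = ocr_text.lower().split()
    hits = []
    for word in wordlist:
        target = word.lower()
        for token in tokens:
            # ratio() is 1.0 for identical strings and falls toward 0.0
            # as the two strings share fewer matching characters.
            if SequenceMatcher(None, target, token).ratio() >= threshold:
                hits.append(word)
                break  # one approximate occurrence is enough for this word
    return hits


# OCR output often contains misread or deliberately obfuscated characters.
print(fuzzy_hits("Buy v1agra n0w cheap", ["viagra", "mortgage"]))
```

An approximate comparison like this tolerates single-character OCR errors such as "v1agra", which an exact string match would miss.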
PDFassassin is a module for SpamAssassin that scans PDF files attached to email messages. Email bodies are scanned upon connection and checked for PDF attachments. Text is extracted from the PDF via pdftotext and scanned by SpamAssassin. Should the PDF contain images, the gocr program is called to extract their text content. If the total spam score of the PDF exceeds the global required_score setting, the score specified in pdf.cf is added to the overall score of the email message.
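The scoring rule described above can be sketched in a few lines of Python. This is an illustration of the decision only, not PDFassassin's code: the function and parameter names are hypothetical, and pdf_penalty stands in for whatever score is configured in pdf.cf.

```python
def combined_score(message_score, pdf_score, required_score, pdf_penalty):
    """Return the message score after applying the PDF rule.

    Hypothetical sketch: pdf_penalty represents the score configured in
    pdf.cf, and required_score the global SpamAssassin threshold.
    """
    # The penalty applies only when the PDF's own score is strictly
    # higher than the global threshold.
    if pdf_score > required_score:
        return message_score + pdf_penalty
    return message_score


# A PDF scoring 6.5 against a threshold of 5.0 adds the configured penalty.
print(combined_score(2.0, 6.5, 5.0, 3.0))
# A PDF scoring 4.0 stays under the threshold and changes nothing.
print(combined_score(2.0, 4.0, 5.0, 3.0))
```

Note that the PDF's own score is not added to the message directly; it only decides whether the fixed configured score is applied.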