Apache SpamAssassin is an extensible email filter that is used to identify spam. Once identified, the mail can then be optionally tagged as spam for later filtering. It provides a command line tool to perform filtering, a client-server system to filter large volumes of mail, and Mail::SpamAssassin, a set of Perl modules allowing Apache SpamAssassin to be used in a wide variety of email systems.
PDFassassin is a module for SpamAssassin that allows for the scanning of PDF files in email message attachments. Email bodies are scanned upon connection and checked for PDF attachments. Text is extracted from the PDF via pdftotext and scanned by SpamAssassin. Should the PDF contain images, the gocr program is called to extract the text content. The total spam score of the PDF is compared against the global required_score setting; if it's higher, a score equal to the one specified in pdf.cf is appended to the overall score of the email message.