Release Notes: This is a hodge-podge of fixes and improvements. A new hypex command, the TREC 2005 options files, and an essay on chess are now in the tarball. Several improvements to the parsing engine were made, including a new -e char option and bugfixes. Compilation problems on various architectures were fixed, and libslang2 support was added.
Release Notes: This release includes various bugfixes and small usability improvements in the documentation and default switch handling. The major addition is support for the TREC spamjig and improved memory mapping for faster online learning.
Release Notes: This release adds a new MAP confidence score (-U, to complement the -X switch), some new scoring types in mailinspect, and a new parsing switch for trace headers in email (-T email:theaders). Category learning now accepts directory names as well as file names, and preliminary work has begun on a new header mining tool (hmine). Category files are now written in 'portable' format by default.
Release Notes: This release is focused on the classification engine. A new, slightly faster default tokenizer and a reference measure for spam filtering were added (the old ones are still available). Support for token classes was implemented. Some miscellaneous bugs were also fixed.
Release Notes: This release adds reliability and speed improvements, bugfixes, and new features. Categories are now saved atomically and can be learned incrementally. Learning is faster on average because weights are preloaded. Signal handling allows graceful interruptions. A basic plot command now exists for the mailfoot/mailtoe utilities. More documentation has been added. Several bugs have been fixed, including an HTML parsing bug, discovered by spammers, that confused dbacl.
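The notes above do not say how categories are saved atomically. A common way to get that property is to write to a temporary file in the same directory and then rename it over the target. The Python sketch below illustrates this general technique only. It is not dbacl's implementation, and the function name `save_atomically` is hypothetical.

```python
import os
import tempfile


def save_atomically(path, data):
    """Write bytes to path so readers see either the old or the new file.

    Hypothetical helper illustrating the write-then-rename technique;
    it is an assumption, not code taken from dbacl.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same filesystem as the target,
    # otherwise the final rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the contents to disk before renaming
        os.replace(tmp_path, path)  # atomic replacement of the target
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If the process is interrupted before the rename, the previous category file is left untouched, which is the behaviour one would want alongside graceful signal handling.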
Release Notes: This release has a new, streamlined testsuite facility. It is now possible to compare the classification accuracy of 13 different filters, including ifile, crm114, spambayes, popfile, and spamassassin. Comparisons can be done on your own email corpora via cross validation, train-on-error, or full online learning. An experimental form of attachment scanning is also now available.
Release Notes: The email parser was improved and now decodes base64 and quoted-printable attachments. Several new switches were added for controlling HTML removal and the RFC822 headers used. The code structure was reorganized into several directories, and the widespread duplication of ASCII/wide character handling functions in previous versions was eliminated through a macro system. A new mailcross.testsuite command is now available for directly comparing the program with other mail classifiers such as ifile and bogofilter.
Release Notes: This release adds a command, mailinspect, which permits browsing email folders in order of closest to furthest from a given category, or vice-versa. The sorted emails can be piped to shell commands for further processing, very similar to formail (but, unlike formail, taking into account the similarity ordering).
Release Notes: This release adds a tutorial, and a new tool which computes the optimal Bayesian classification decision based on user-defined prior distributions and a misclassification cost matrix.
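As an illustration of what an optimal Bayesian decision with priors and a cost matrix involves, the Python sketch below picks the category with the lowest expected cost. It is a minimal sketch of the standard minimum-risk rule, not the tool's actual code. The function name `bayes_decision`, the convention `cost[true][decided]`, and the numbers in the example are all assumptions made for illustration.

```python
def bayes_decision(likelihoods, priors, cost):
    """Return (best_index, risks) under the minimum expected cost rule.

    likelihoods[c]: probability of the document under category c
    priors[c]:      user-defined prior probability of category c
    cost[c][d]:     cost of deciding d when the true category is c
                    (this index convention is an assumption of the sketch)
    """
    # Posterior is proportional to likelihood times prior.
    joint = [l * p for l, p in zip(likelihoods, priors)]
    total = sum(joint)
    posteriors = [j / total for j in joint]

    n = len(posteriors)
    # Expected cost (risk) of each possible decision d.
    risks = [
        sum(cost[c][d] * posteriors[c] for c in range(n))
        for d in range(n)
    ]
    best = min(range(n), key=lambda d: risks[d])
    return best, risks


# Hypothetical two-category example: index 0 is "ham", index 1 is "spam".
# Marking a legitimate message as spam is made ten times as costly as
# letting a spam message through.
example_cost = [
    [0.0, 10.0],  # true ham:  decide ham, decide spam
    [1.0, 0.0],   # true spam: decide ham, decide spam
]
decision, risks = bayes_decision([0.3, 0.7], [0.5, 0.5], example_cost)
```

With these made-up numbers the spam category is the more probable one, yet the asymmetric costs make "ham" the lower-risk decision, which is the kind of trade-off a cost matrix is meant to express.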
Release Notes: Comprehensive documentation for the algorithms and statistical models is now included. The algorithms were extended to handle ngram models better through large deviation estimates. A switch was added for default ngrams, which are much faster than regular expression ngrams. The hash table macros were sped up, code portability is now controlled with typedefs, and the regular expression syntax was extended for more convenient model specifications. A switch to view the maximum entropy weights in human-readable form was added.