dbacl is a digramic Bayesian text classifier. Given some text, it calculates the posterior probabilities that the input resembles one of any number of previously learned document collections. It can be used to sort incoming email into arbitrary categories such as spam, work, and play, or simply to distinguish an English text from a French text. It fully supports international character sets, and uses sophisticated statistical models based on the Maximum Entropy Principle.
Scriptid is a program and library for determining whether a given text file contains code in a specified programming language. The current release can tell whether or not a file contains VBScript. It should be possible to extend this to any number of other languages. Be sure to also download the latest neural network weights update file.
Naiban is a complete standalone Java application and an Avalon/Keel framework classification service. It features Naive Bayes-based algorithms, JDBC-backed persistence, and support for text and numerics. Naive Bayes learning classifiers have recently gained popularity in their application to the spam vs. ham problem. Naiban provides a learning classifier service to the Avalon/Keel framework, and comes complete with two text classifiers and a simple numeric classifier. It is easily extensible, and provides two persistence mechanisms for storing trained data.
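
To illustrate the general idea behind Naive Bayes learning classifiers applied to the spam vs. ham problem, here is a minimal sketch in Python. It is not Naiban's code or API: the function names `train` and `posteriors`, the whitespace tokenizer, and the add-one smoothing are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def train(labeled_docs):
    """Count words per label from (label, text) pairs (hypothetical helper)."""
    word_counts = {}
    doc_counts = Counter()
    for label, text in labeled_docs:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, doc_counts


def posteriors(text, word_counts, doc_counts):
    """Return P(label | text) for every learned label."""
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    total_docs = sum(doc_counts.values())

    log_scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        # Log prior: fraction of training documents carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        log_scores[label] = score

    # Normalize in log space to avoid underflow on long inputs.
    peak = max(log_scores.values())
    scaled = {label: math.exp(s - peak) for label, s in log_scores.items()}
    norm = sum(scaled.values())
    return {label: value / norm for label, value in scaled.items()}


if __name__ == "__main__":
    # Toy training data, invented purely for illustration.
    model = train([
        ("spam", "buy cheap pills now"),
        ("spam", "buy cheap watches now"),
        ("ham", "meeting agenda for monday"),
        ("ham", "lunch on monday with the team"),
    ])
    print(posteriors("buy cheap pills", *model))
```

The classifier treats each word as independent given the label, which is the "naive" assumption. A persistent service would store the two count tables returned by `train` rather than retraining on every run.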