dbacl is a digramic Bayesian text classifier. Given some text, it calculates the posterior probabilities that the input resembles one of any number of previously learned document collections. It can be used to sort incoming email into arbitrary categories such as spam, work, and play, or simply to distinguish an English text from a French text. It fully supports international character sets, and uses sophisticated statistical models based on the Maximum Entropy Principle.
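To illustrate the idea of posterior probabilities over learned collections, here is a minimal sketch in Python. It is not dbacl's code or interface: it uses plain word-pair (digram) counts with add-one smoothing and a uniform prior in place of dbacl's Maximum Entropy models, and the names `train`, `log_likelihood`, and `posteriors` are hypothetical.

```python
import math
from collections import Counter

def train(docs):
    """Count adjacent word pairs (digrams) over a list of training texts."""
    counts = Counter()
    for text in docs:
        words = text.lower().split()
        counts.update(zip(words, words[1:]))
    return counts

def log_likelihood(text, counts, vocab_size):
    """Log-probability of the text's digrams under one category,
    using add-one smoothing (an assumption, not dbacl's model)."""
    words = text.lower().split()
    total = sum(counts.values())
    return sum(
        math.log((counts[pair] + 1) / (total + vocab_size))
        for pair in zip(words, words[1:])
    )

def posteriors(text, models):
    """Posterior probability of each category, assuming a uniform prior."""
    vocab = set().union(*models.values())  # all digrams seen in any category
    scores = {
        name: log_likelihood(text, counts, len(vocab))
        for name, counts in models.items()
    }
    top = max(scores.values())  # subtract the maximum for numerical stability
    weights = {name: math.exp(s - top) for name, s in scores.items()}
    norm = sum(weights.values())
    return {name: w / norm for name, w in weights.items()}

models = {
    "english": train(["the cat sat on the mat", "the dog sat on the rug"]),
    "french": train(["le chat est sur le tapis", "le chien est sur le tapis"]),
}
result = posteriors("the cat sat on the rug", models)
```

In this toy setup `result` maps each category name to a probability, the probabilities sum to one, and the English category receives the larger share.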
The OMCSNet-WordNet project aims to improve the quality of the OMCSNet dataset by using automated processes to map WordNet synonym sets to OMCSNet concepts and import additional semantic linkage data from WordNet. It is based on OMCSNet 1.2, a semantic network and inference toolkit written in Python/Java. OMCSNet currently contains over 280,000 separate pieces of common sense information extracted from the raw OMCS dataset. This project is also based on WordNet, an online lexical reference system that in recent years has become a popular tool for AI researchers.
OntoWiki is a semantic collaboration platform for the development of Semantic Web knowledge bases. OntoWiki enables large distributed communities to collaborate on a semantic level by allowing users to contribute small structured pieces of knowledge which, stitched together, may result in comprehensive knowledge bases. Powl is a Web-based ontology authoring and management solution for the Semantic Web. Both expose an extensive API for PHP programmers.
Red-Piranha is a search system that can actually learn what you are looking for. It can be used as a Web page, command line, or XML-WebService, so it will work with most languages, including Java, Perl, C#/.NET, and PHP. It includes learning abilities for the Desktop/Internet search functionality. All feedback from the user is stored in (editable) XML and RDF, and is used by the system to improve the quality of searches.
IkeWiki is a new kind of Wiki (a so-called "Semantic Wiki") developed by Salzburg Research that allows users to collaboratively annotate pages and links between pages with semantic annotations. Such annotations are useful because they give machines a certain amount of "understanding" of the content that goes beyond merely displaying the page. This information can then, for example, be used for context-specific presentation of pages, advanced querying, consistency verification, or drawing conclusions.
Jigsaw is an embedded data store designed for the development of data warehouse, analytical, and machine learning applications. Jigsaw can perform over one million operations per second and scale to store terabytes of data. The object library contains classes for representing ordered and unordered mappings and highly compressed bit vectors with a range of set-theoretic operators, and it directly integrates a high-performance sort system.
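As a rough illustration of what set-theoretic operators on bit vectors mean, the sketch below uses Python integers as uncompressed bit sets. It does not reflect Jigsaw's actual classes or its compression scheme; the helper names `to_bits` and `from_bits` are hypothetical.

```python
def to_bits(members):
    """Encode a collection of non-negative integers as a bit vector."""
    bits = 0
    for i in members:
        bits |= 1 << i  # set bit i to mark membership
    return bits

def from_bits(bits):
    """Decode a bit vector back into a sorted list of members."""
    return [i for i in range(bits.bit_length()) if bits >> i & 1]

a = to_bits([1, 3, 5])
b = to_bits([3, 4, 5])

union = from_bits(a | b)          # members of either set
intersection = from_bits(a & b)   # members of both sets
difference = from_bits(a & ~b)    # members of a that are not in b
```

Each set operation is a single bitwise instruction per machine word, which is why bit vectors are a common choice for analytical workloads.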
Wandora is a general purpose data extraction, management, and publishing application based on Topic Maps and Java. Wandora has a graphical user interface, layered presentation of knowledge, several data storage options, rich data extraction, import and export capabilities, and an embedded HTTP server that enables dynamic publication of Topic Maps. Wandora is well suited for rapid ontology construction and knowledge mashups.