HALoGEN is a powerful, easy-to-use, general-purpose natural language generation system. It consists of a symbolic generator, a forest ranker, and a set of sample inputs. The symbolic generator includes the Sensus Ontology dictionary, which is based on WordNet. The forest ranker includes a 250-million-word n-gram language model (unigram, bigram, and trigram) trained on Wall Street Journal newspaper text. The symbolic generator is written in Lisp and requires a Lisp interpreter.
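To illustrate what the forest ranker's language model does, here is a minimal Python sketch that scores candidate sentences with a trigram model and sorts them best-first. It is not HALoGEN's code. It ranks a flat list of strings rather than a packed forest, and the add-one smoothing, the `<s>`/`</s>` padding, and the names `train_trigram`, `log_prob`, and `rank` are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_trigram(sentences):
    """Count unigrams, bigrams and trigrams over sentences padded with <s> <s> ... </s>."""
    uni, bi, tri = Counter(), Counter(), Counter()
    for sent in sentences:
        toks = ["<s>", "<s>"] + sent.split() + ["</s>"]
        uni.update(toks)
        bi.update(zip(toks, toks[1:]))
        tri.update(zip(toks, toks[1:], toks[2:]))
    return uni, bi, tri


def log_prob(sentence, model, vocab_size):
    """Log probability of a sentence under an add-one smoothed trigram model (smoothing is an assumption)."""
    _, bi, tri = model
    toks = ["<s>", "<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for w1, w2, w3 in zip(toks, toks[1:], toks[2:]):
        # P(w3 | w1 w2) = (count(w1 w2 w3) + 1) / (count(w1 w2) + V)
        total += math.log((tri[(w1, w2, w3)] + 1) / (bi[(w1, w2)] + vocab_size))
    return total


def rank(candidates, model):
    """Sort candidate realizations from most to least probable."""
    vocab_size = len(model[0])
    return sorted(candidates, key=lambda s: log_prob(s, model, vocab_size), reverse=True)


model = train_trigram(["the dog barks", "the dog runs", "a cat sleeps"])
print(rank(["barks dog the", "the dog barks"], model))
```

Real rankers use far larger corpora and better smoothing (for example, backoff); add-one smoothing is used here only because it keeps the sketch short.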
Linguaphile is a simple command-line language translator. It is open source, platform-independent, and written in Perl. Linguaphile currently supports the following languages: Afrikaans, Alawa, Albanian, Arrernte, Basque, Belarusian, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, Galician, German, Greek, Hawaiian, Hungarian, Icelandic, Indonesian, Interlingua, Irish, Italian, Kala Lagaw Ya, Korean, Kriol, Latvian, Lithuanian, Malay, Maltese, Maori, Norwegian, Pitjantjatjara, Polish, Portuguese, Romanian, Russian, Samoan, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Thai, Tok Pisin, Turkish, Ukrainian, Warlpiri, and Welsh. At this stage, Spanish-to-English translation is the most useful.
Libtextcat is a library with functions that implement the classification technique described in Cavnar & Trenkle, "N-Gram-Based Text Categorization". It was developed primarily for language guessing, a task on which it is known to perform with near-perfect accuracy. Considerable effort went into making this implementation fast and efficient: the language guesser processes over 100 documents per second on an ordinary PC, which makes it practical for many uses.
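The core of the Cavnar & Trenkle technique can be sketched in a few lines of Python: build a ranked profile of the most frequent character n-grams for each language, then pick the language whose profile has the smallest "out-of-place" distance from the document's profile. This is not libtextcat's C API. The n-gram range (1 to 5), the profile size of 300, the `_` word padding, the missing-n-gram penalty, and the names `profile`, `out_of_place`, and `guess_language` are assumptions chosen for illustration.

```python
from collections import Counter


def profile(text, max_n=5, top=300):
    """Map each of the `top` most frequent character n-grams (n = 1..max_n) to its rank."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # mark word boundaries (padding character is an assumption)
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    ranked = [gram for gram, _ in counts.most_common(top)]
    return {gram: rank for rank, gram in enumerate(ranked)}


def out_of_place(doc_profile, lang_profile, max_penalty=300):
    """Sum of rank differences; n-grams absent from the language profile cost max_penalty."""
    return sum(
        abs(rank - lang_profile[gram]) if gram in lang_profile else max_penalty
        for gram, rank in doc_profile.items()
    )


def guess_language(text, lang_profiles):
    """Return the language whose profile is closest to the text's profile."""
    doc = profile(text)
    return min(lang_profiles, key=lambda lang: out_of_place(doc, lang_profiles[lang]))


profiles = {
    "english": profile("the cat and the dog are in the house with the man and the woman"),
    "spanish": profile("el gato y el perro están en la casa con el hombre y la mujer"),
}
print(guess_language("the man and the dog", profiles))
```

In practice each language profile would be built from a sizable training text, and only the document profile from the text being classified.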
JavaBot is a chat bot for the AOL Instant Messenger, MSN Messenger, and Yahoo! Messenger systems. It is designed to engage people in conversation over IM. Features include full conversation logging and remote administration, since all administration is handled over IM. There is also a mechanism for eavesdropping on conversations other people are having with the bot. The bot's conversational responses can be customized using a simple scripting language.
Biaroza is a multi-dictionary system for human languages that aims to set a standard for this type of software. It uses UTF-8 internally (and, optionally, externally). Among other features, it supports querying by particles, customizable input/output filtering, and an interface mode for use with other software.
MegaLettering is the PHP engine created to manage the Italian translation of www.megatokyo.com, but it was written with general use in mind and can support any number of languages. Text in balloons is translated using a MySQL database that defines the balloon shapes, the translated text, and the fonts used to add the new text.
Marko is a simple toolset for building Markov chain databases from a corpus (or two) of text and then comparing unknown texts against these databases. Given any two Marko databases, you can calculate the probability that an unknown text is related to one rather than the other. Possible applications include intelligent mail filtering, plagiarism detection, and historical research.
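Marko's internals are not described here, so the following Python sketch only shows one plausible way to do this kind of comparison. Each database is assumed to be a first-order, word-level Markov chain, and the probability that an unknown text is related to A rather than B is derived from the two log-likelihoods, assuming equal prior probability for A and B. The add-one smoothing and the names `MarkovDB` and `prob_related_to_a` are hypothetical.

```python
import math
from collections import Counter, defaultdict


class MarkovDB:
    """Hypothetical first-order, word-level Markov chain database (Marko's real model may differ)."""

    def __init__(self, corpus):
        words = corpus.lower().split()
        self.vocab = set(words)
        self.transitions = defaultdict(Counter)  # previous word -> counts of next words
        for prev, nxt in zip(words, words[1:]):
            self.transitions[prev][nxt] += 1

    def log_likelihood(self, text, vocab_size):
        """Add-one smoothed log probability of the word-to-word transitions in text."""
        words = text.lower().split()
        total = 0.0
        for prev, nxt in zip(words, words[1:]):
            row = self.transitions.get(prev, Counter())  # .get avoids inserting empty rows
            total += math.log((row[nxt] + 1) / (sum(row.values()) + vocab_size))
        return total


def prob_related_to_a(text, db_a, db_b):
    """P(text is related to A rather than B), assuming equal priors for A and B."""
    vocab_size = len(db_a.vocab | db_b.vocab | set(text.lower().split()))
    diff = db_b.log_likelihood(text, vocab_size) - db_a.log_likelihood(text, vocab_size)
    # Numerically stable form of 1 / (1 + exp(diff)).
    if diff > 0:
        e = math.exp(-diff)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(diff))


a = MarkovDB("the cat sat on the mat the cat ate")
b = MarkovDB("stocks fell sharply as markets opened stocks rose")
print(prob_related_to_a("the cat sat", a, b))
```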
BabelKit is an interface to a universal multilingual database code table. It takes all of the programming work out of maintaining multiple sets of database code definitions in multiple languages. The code administration and translation page lets developers define new virtual code tables and new languages, enter codes and their descriptions, and then translate them into all languages of interest. Perl and PHP classes retrieve the code descriptions and automatically generate HTML code selection elements in the user's language. This makes internationalization and localization of Web sites and database interfaces much easier.
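As a rough illustration of the idea, the following Python sketch models a code table keyed by (code set, code, language) and renders an HTML `<select>` element in a requested language. It is not BabelKit's Perl or PHP API. The class `CodeTable`, its methods, and the policy of falling back to a default language when a translation is missing are hypothetical.

```python
from html import escape


class CodeTable:
    """Hypothetical multilingual code table: (code_set, code, lang) -> description."""

    def __init__(self, default_lang="en"):
        self.default_lang = default_lang
        self.rows = {}

    def put(self, code_set, code, lang, desc):
        self.rows[(code_set, code, lang)] = desc

    def desc(self, code_set, code, lang):
        """Description in lang; falls back to the default language, then to the raw code (assumed policy)."""
        return self.rows.get((code_set, code, lang)) or self.rows.get(
            (code_set, code, self.default_lang), code
        )

    def codes(self, code_set):
        """Codes defined in the default language, in insertion order."""
        return [c for (s, c, lang) in self.rows if s == code_set and lang == self.default_lang]

    def select(self, code_set, lang, name, selected=None):
        """Render an HTML <select> for code_set with option labels in lang."""
        options = []
        for code in self.codes(code_set):
            sel = " selected" if code == selected else ""
            label = escape(self.desc(code_set, code, lang))
            options.append(f'<option value="{escape(code)}"{sel}>{label}</option>')
        return f'<select name="{escape(name)}">\n' + "\n".join(options) + "\n</select>"


table = CodeTable()
table.put("country", "de", "en", "Germany")
table.put("country", "de", "fr", "Allemagne")
table.put("country", "fr", "en", "France")  # no French translation: falls back to English
print(table.select("country", "fr", "country", selected="de"))
```

Keeping descriptions in one table keyed by language means adding a new language is a matter of inserting rows, not changing code, which is the property the description above emphasizes.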