dbacl is a digramic Bayesian text classifier. Given some text, it calculates the posterior probabilities that the input resembles one of any number of previously learned document collections. It can be used to sort incoming email into arbitrary categories such as spam, work, and play, or simply to distinguish an English text from a French text. It fully supports international character sets, and uses sophisticated statistical models based on the Maximum Entropy Principle.
Emdros is a corpus query system for storing and searching linguistically annotated text. It is very generic, supporting almost any kind of annotation from almost any linguistic theory. All linguistic levels of analysis are supported, including phonology, morphology, the lexical level, syntax, and discourse. The core libraries act as a middleware layer between a client and an underlying SQL database. MySQL, PostgreSQL, and SQLite are supported.
Glossword is a system to publish dictionaries, glossaries, and encyclopedias. It features an installation wizard, support for multiple languages, visual themes, multi-domain installation, an administrative interface with multi-user support, built-in search and cache engines, the ability to export/import dictionaries in XML format, and W3C-validated code. Glossword is useful for any sort of dictionary-like content, including sites with game cheat codes, online translators, references, and various kinds of CMS solutions.
queXC is a Web-based data cleaning and coding/classification system that takes a data file (such as data collected from a questionnaire) and cleans the text input fields by spacing them and spell checking them. It allows operators to code text fields to existing coding schemes, or to create a coding scheme on the fly. Multiple operators can code and clean simultaneously, with the ability to assign operators to do particular codes. The queXC system includes some coding schemes created from ABS (Australian Bureau of Statistics) data. It can be used as an open source replacement for Nvivo in some situations.
Ciao is a complete Prolog system subsuming ISO-Prolog with a novel modular design which allows both restricting and extending the language. Ciao extensions currently include feature terms (records), higher-order, functions, constraints, objects, persistent predicates, a good base for distributed execution (agents), and concurrency. Libraries also support WWW programming, sockets, and external interfaces (C, Java, TCL/Tk, relational databases, etc.). An Emacs-based environment, a stand-alone compiler, and a toplevel shell are also provided.
The GNU Talk Filters are filter programs that convert ordinary English text into text that mimics a stereotyped or otherwise humorous dialect. Some of these filters have been in the public domain for many years, but here they are provided as a single integrated package. The filters include austro, b1ff, brooklyn, chef, cockney, drawl, dubya, fudd, funetak, jethro, jive, kraut, pansy, pirate, postmodern, redneck, valspeak, and warez. This package provides the filters both as individual executables and collectively as a C library, so they can be easily embedded in other programs.
Diogenes is a tool for searching and browsing the Latin and ancient Greek texts published on CD-ROM by the Packard Humanities Institute and the Thesaurus Linguae Graecae. It comes as an easy-to-install stand-alone application for GNU/Linux, Mac OS X, and Windows, based on the Firefox browser (i.e. Xulrunner). Alternatively, it can be installed by a network administrator as a server on a local network, and users then access it via an ordinary Web browser. There is also a command-line tool which can optionally format output as LaTeX instead of HTML.
This is a comprehensive "word game" word list for UNIX/Linux. It is a superset of the author's ENABLE list, the "OSW", and various lists researched by the author's colleague, Alan Beale. At 264,093 words, it is the largest list of its kind, suitable for use in all manners of crossword-type board games and word construction games, as well as for a spell checker dictionary. The YAWL package now includes two anagramming utilities (supplied as source code, handled by the included Makefile). There is also a shell script that extends the UNIX "strings" system command. This is the word list package recommended for the author's Quackey word game.