Arabic Wordlist is a project to deliver an English to Arabic translated word list to be used in translations and/or dictionaries. The word list contains in excess of 83,500 words (and growing), and spans a variety of categories (i.e. it is general in nature). This word list is encoded in UTF-8, and is expected to be used in many online free dictionaries.
The purpose of Mind AI is to build an artificial mind based on some advanced concepts: machine learning, representation and meta representation of concepts, concept reflection, reification (concept to meta concept), and denotation (meta concept to concept), and to explore some new concepts. Interaction with the AI is done via IRC.
The Computational Linguistics Toolset is a set of tools for computational linguistics. It contains re-usable code for cleaning, splitting, refining, and taking samples from corpora (ICE, Penn, and a native one), for tagging them using the TnT-tagger, for doing permutation statistics on N-grams (useful for finding statistically significant syntactical differences between any two sets of tagged texts), and various examination-tools. The tools themselves are well documented.
FramerD is a semi-structured object database integrated with a Scheme-based scripting language which supports multi-lingual programming (with pervasive Unicode), a stable module system for programming in the large, distributed applications (via an extensible RPC protocol), non-deterministic (PROLOG-like) evaluation for search and set operations, multi-threaded program execution, extensive tools for text and language analysis, built-in HTML/XML/MIME parsers, and intuitive (CGI- and FastCGI-based) Web scripting. The built-in object database robustly supports millions of objects and indexed access to those objects, both through disk files and networked servers.
SILGraphite (formerly OpenGraphite) is a project within SIL's Non-Roman Script Initiative and Language Software Development groups to provide extensible cross-platform rendering capabilities for complex non-Roman writing systems. It consists of a rule-based programming language, Graphite Description Language (GDL), that can be used to describe the behavior of a writing system, a compiler for that language, and a rendering engine that can serve as the backend of a text processing application. SILGraphite renders TrueType fonts that have been extended by means of compiling a GDL program. It is currently being integrated into Gecko/Mozilla through the SILA project, a GNU/Linux port is also underway, and there are plans for OpenOffice.org and Abiword integration.
OpenEphyra is a question answering (QA) system. It retrieves answers to natural language questions from the Web and other sources. OpenEphyra comes with implementations of algorithms that proved effective in Carnegie Mellon's Ephyra system, which participated in the TREC evaluations. It is platform independent and can be set up in just a few minutes. The goal of this project is to give researchers the opportunity to develop new QA techniques without worrying about the end-to-end system.
Hodie prints the current date and time to stdout in Roman numerals, with grammatically correct Latin. Complete with Id., Kal., Non., pridie, postridie, bis, and all the other nice annoyances. As an option, it even provides you with current date according to Roman calendar -- that is 'ab urbe condita'; after Rome was built.
The Universal Text Recognizer and Converter (Utrac) is a commandline tool and a C library that recognizes the encoding of an input file (UTF-8, ISO-8859-1, CP437, etc.) and its end-of-line type (CR, LF, or CRLF). It features automatic recognition (depending on the file and on the system's locale, reliable in most cases), assistance for verification or manual recognition, and conversion to another charset and/or end-of-line type.