Pure PHP Spell Check performs spell-checking of text using only base PHP functions, without using specific spell check PHP extensions such as aspell or pspell. The class uses a dictionary that is implemented as an array-based binary search table. The binary search table declaration is saved to a file for speed and can be updated easily by the developer.
Konjugator helps with learning or interpreting verb forms in Welsh. It produces a list of around 200,000 inflected verb forms for almost 4,000 Welsh verbs, along with English glosses and parsing information. It attempts to conjugate Welsh verbs that are unknown to it, and will give parsing details for random Welsh verb forms if these are known to it.
Isobel is a framework to build complex information retrieval and analysis systems. Isobel can be functionally divided in two subsytems, Isobel Gatherer (the crawling and filtering subsystem) and Isobel Analyzer (the analysis subsystem). The two subsytems can also be used separately. Isobel Gatherer offers ready-to-use services like content fetching, scheduling, document format conversion, Hyperlink graph storage and analysis, content storage and indexing. A programmer may easily add new services. Isobel Analyzer uses the IBM UIMA architecture to reuse the analysis components developed for this architecture.
Gentium is a typeface family designed to enable the diverse ethnic groups around the world who use the Latin script to produce readable, high-quality publications. It supports a wide range of Latin-based alphabets, and includes glyphs that correspond to all the Latin ranges of Unicode.
XChestival is an improved version of xchat_speak designed for Italian. It lets xchat and irssi "speak" through festival. It comes with a script for xchat and irssi and the Italian phonemes. The scripts have some useful features like channels and query filtering and string substitution.
SenseClusters is a natural language processing package that allows you to cluster similar contexts or to identify clusters of related words. It supports its own native methods based on first and second order representations of context, and also supports Latent Semantic Analysis. It is fully unsupervised, and can automatically discover the optimal number of clusters in your text. SenseClusters is a complete system that takes users from preprocessing of raw text to providing clustered output.