Tspell is a library and a set of applications for solving computational problems in Turkish Natural Language Processing (NLP). Turkish has a morphological and grammatical structure very different from that of Indo-European languages such as English. Because it is an agglutinative language like Finnish, even a simple spell checker is challenging to build. Target problems include a spell checker, a word analyzer that determines roots and suffixes, and a word constructor based on suffixes.
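To illustrate what a root-and-suffix analyzer has to do for an agglutinative language, here is a minimal Python sketch. It is not Tspell's API or algorithm: the function names and the tiny `ROOTS` and `SUFFIXES` tables are hypothetical, and the sketch ignores vowel harmony and every other real constraint of Turkish morphology.

```python
# Toy tables, assumed for illustration only (not Tspell's lexicon).
ROOTS = {"ev", "kitap"}                      # "house", "book"
SUFFIXES = {"ler", "lar", "de", "da", "im"}  # a few common suffix forms


def split_suffixes(rest):
    """Yield every way to cut `rest` into a sequence of known suffixes."""
    if not rest:
        yield []
        return
    for j in range(1, len(rest) + 1):
        head = rest[:j]
        if head in SUFFIXES:
            for tail in split_suffixes(rest[j:]):
                yield [head] + tail


def analyze(word):
    """Return all (root, suffix_chain) readings of `word` under the toy tables."""
    readings = []
    for i in range(1, len(word) + 1):
        root, rest = word[:i], word[i:]
        if root in ROOTS:
            for chain in split_suffixes(rest):
                readings.append((root, chain))
    return readings


print(analyze("evlerde"))   # "in the houses"
print(analyze("kitaplar"))  # "books"
```

A word with no valid reading returns an empty list, which is the simplest possible signal a spell checker could build on.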
IPA Zounds models language sound changes by applying a given set of sound change rules to a given lexicon. It has a built-in model of the International Phonetic Alphabet, allowing users to write input words in IPA characters and rules using those characters or the distinctive features of the model.
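The general idea of applying an ordered set of sound change rules to a lexicon can be sketched in a few lines of Python. This is not IPA Zounds' rule syntax or its feature model: the rules below are hypothetical and are written as plain regular expressions over IPA characters.

```python
import re

VOWELS = "aeiou"

# Hypothetical ordered rules as (regex pattern, replacement) pairs.
RULES = [
    (rf"(?<=[{VOWELS}])p(?=[{VOWELS}])", "b"),  # p becomes b between vowels
    (r"s(?=i)", "ʃ"),                           # s becomes ʃ before i
    (r"e$", ""),                                # word-final e is deleted
]


def apply_rules(lexicon, rules=RULES):
    """Apply each rule in order to every word; return {original: result}."""
    results = {}
    for word in lexicon:
        form = word
        for pattern, replacement in rules:
            form = re.sub(pattern, replacement, form)
        results[word] = form
    return results


for before, after in apply_rules(["apa", "pate", "tope", "sita"]).items():
    print(before, ">", after)
```

Rule order matters in this sketch: each rule sees the output of the rules before it, so reordering the list can change the result.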
Keyano is a graphical front end for popular Unix applications such as play, aplay, festival, and fortune. It can turn your PC into an audio/visual sampler that works similarly to the samplers used by DJs. It also includes vocal dictionary and text reader capabilities, as well as a spelling tutorial and an early version of a chatter bot. In alphabet mode, for example, you can type "A B C" and it says the letters out loud while showing them on screen.
Redet is a tool for developing and executing regular expressions using any of more than 50 search programs, editors, and programming languages, intended both for developing regular expressions for use elsewhere and as a search tool in its own right. For each program in each locale, a palette showing the available constructs is provided. The properties of each program are determined by runtime tests, which guarantees that they will be correct for the program version and locale. Additional features include persistent history, extensive help, a variety of character entry tools, and the ability to change locale while running. Redet is highly configurable and fully supports Unicode.
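The idea of determining an engine's properties by runtime tests, rather than from documentation, can be sketched as follows. This is not Redet's code: the `PROBES` table and the `probe` function are hypothetical, and the sketch tests only Python's own `re` module.

```python
import re

# Hypothetical probe table: construct name -> (pattern, text it should match).
PROBES = {
    "lookahead": (r"a(?=b)", "ab"),
    "named group": (r"(?P<x>a)", "a"),
    "possessive quantifier": (r"a++b", "aab"),
    "unicode property": (r"\p{L}", "a"),
}


def probe():
    """Try each construct against the running `re` module and record the result."""
    supported = {}
    for name, (pattern, sample) in PROBES.items():
        try:
            supported[name] = re.compile(pattern).search(sample) is not None
        except re.error:
            # The engine rejected the pattern, so the construct is unavailable.
            supported[name] = False
    return supported


for name, ok in probe().items():
    print(f"{name}: {'yes' if ok else 'no'}")
```

Because the answers come from the interpreter that is actually running, they can differ between Python versions, which is the point of testing at runtime.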
The Unicode Utilities are a set of programs for manipulating and analyzing Unicode text. uniname prints any combination of the character offset of each character, its byte offset, its hex code value, its encoding, the glyph itself, and its name. unidesc reports the character ranges to which different portions of the text belong. unihist generates a histogram of the characters in its input. ExplicateUTF8 determines and explains the validity of a sequence of bytes as a UTF-8 encoding. unirev reverses UTF-8 strings. unifuzz tests other programs' Unicode handling.
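The kind of per-character report and histogram described above can be approximated with Python's standard library. This sketch is not the utilities' implementation or output format: `describe` and `histogram` are hypothetical names, and UTF-8 is assumed as the encoding.

```python
import unicodedata
from collections import Counter


def describe(text):
    """Return one row per character: (char offset, UTF-8 byte offset,
    hex code point, UTF-8 bytes in hex, glyph, Unicode name)."""
    rows = []
    byte_offset = 0
    for char_offset, ch in enumerate(text):
        encoded = ch.encode("utf-8")
        rows.append((
            char_offset,
            byte_offset,
            f"{ord(ch):06X}",
            encoded.hex(" ").upper(),
            ch,
            unicodedata.name(ch, "<unnamed>"),
        ))
        byte_offset += len(encoded)
    return rows


def histogram(text):
    """Count how often each character occurs in the text."""
    return Counter(text)


for row in describe("a\u00e9\u20ac"):
    print(*row, sep="\t")
print(histogram("banana").most_common())
```

The character offset and the byte offset diverge as soon as a character needs more than one byte in UTF-8, which is why a report of this kind tracks both.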
ISCII Utilities is a pair of programs for analyzing text files encoded according to the Indian Script Code for Information Interchange (ISCII), the Indian national standard. IsciiName identifies each code, printing the byte offset, the code in hex, and an explanation of the meaning of the code. ATR codes for writing system transition and display mode are interpreted. CountIsciiChars counts the codes in an ISCII file and classifies them according to their type and function. The original purpose was computing accurate letter counts for reading studies, but this information is also useful when processing ISCII-encoded text.