The Unicode Utilities are a set of programs for manipulating and analyzing Unicode text. uniname prints any combination of the character offset of each character, its byte offset, its hex code value, its encoding, the glyph itself, and its name. unidesc reports the character ranges to which different portions of the text belong. unihist generates a histogram of the characters in its input. ExplicateUTF8 determines and explains the validity of a sequence of bytes as a UTF-8 encoding. unirev reverses UTF-8 strings. unifuzz tests other programs' Unicode handling.
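The three core operations described above (a character histogram, a per-character report of offsets, code point, encoding, and name, and a UTF-8 validity check) can be approximated in a few lines of Python using only the standard library. This is a minimal sketch of the ideas, assuming Python 3.8 or later; the function names are hypothetical and this is not how the Unicode Utilities themselves are implemented.

```python
from __future__ import annotations

import unicodedata
from collections import Counter


def char_histogram(text: str) -> list[tuple[str, int]]:
    """Count each code point, most frequent first (ties keep first-seen order)."""
    return Counter(text).most_common()


def describe(text: str) -> list[str]:
    """One line per character: char offset, byte offset, code point, UTF-8 bytes, name."""
    lines = []
    byte_offset = 0
    for i, ch in enumerate(text):
        encoded = ch.encode("utf-8")
        name = unicodedata.name(ch, "<unnamed>")
        lines.append(f"{i}\t{byte_offset}\tU+{ord(ch):04X}\t{encoded.hex(' ')}\t{name}")
        byte_offset += len(encoded)  # multi-byte characters advance the byte offset further
    return lines


def first_invalid_utf8(data: bytes) -> int | None:
    """Return the byte offset of the first invalid UTF-8 sequence, or None if all valid."""
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as err:
        return err.start
```

For example, `describe("aé")` reports the second character at character offset 1 but byte offset 1 with the two-byte encoding `c3 a9`, which is exactly the distinction between character and byte offsets that uniname exposes.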
The GCC XML Tree Node Introspector project consists of a patch to the GCC compiler that outputs the compiler's internal tree nodes in RDF/XML, together with programs to process that RDF/XML. The tree nodes are complex data structures that represent the source code inside the compiler. Through these tree nodes, users can extract information from their programs that would otherwise be very difficult to obtain. Modules exist to store these nodes in Redland RDF using a Berkeley DB database. The long-term goal of the project is to create a high-level API that will make the programmatic manipulation of programs easier than it is now.
libuninum is a library for converting Unicode strings to integers and integers to Unicode strings. Internal computation is done using arbitrary precision arithmetic, so there is no limit on the size of the integer that can be converted. Values are passed and returned as ASCII decimal strings, GNU MP mpz_t objects, or unsigned long integers. Auto-detection of the number system is provided. Many number systems are supported. Group delimitation for output strings is fully controllable. Command-line and graphical interfaces are also provided.
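To illustrate the general idea of converting between Unicode digit strings and arbitrary-precision integers, here is a minimal Python sketch. It is not libuninum's API (the function names are hypothetical) and it handles only positional decimal scripts such as Arabic-Indic, Devanagari, or Thai digits, not the other kinds of number systems the library supports. It relies on the Unicode guarantee that each script's decimal digits (category Nd) occupy ten consecutive code points, so subtracting a digit's value from its code point yields that script's zero; this doubles as a crude form of script auto-detection. Python's built-in `int` supplies the arbitrary precision.

```python
import unicodedata


def parse_digits(s: str) -> int:
    """Parse a string of decimal digits from any single Unicode script into an int."""
    if not s:
        raise ValueError("empty string")
    value = 0
    zero = None
    for ch in s:
        d = unicodedata.decimal(ch)  # raises ValueError if ch is not a decimal digit
        script_zero = ord(ch) - d    # code point of this script's digit zero
        if zero is None:
            zero = script_zero       # the first digit determines the script
        elif script_zero != zero:
            raise ValueError("digits from more than one script")
        value = value * 10 + d       # Python ints grow without bound
    return value


def format_digits(n: int, zero: str) -> str:
    """Write a non-negative int using the ten consecutive digits that start at `zero`."""
    if n < 0:
        raise ValueError("negative values are not supported in this sketch")
    base = ord(zero)
    digits = []
    while True:
        n, d = divmod(n, 10)
        digits.append(chr(base + d))
        if n == 0:
            break
    return "".join(reversed(digits))
```

For instance, `format_digits(2024, "\u0966")` writes the number with Devanagari digits, and `parse_digits` reads it back. Using `divmod` rather than `str(n)` avoids the default limit on int-to-string conversion length in recent Python versions.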
WordNet® is an on-line lexical reference system whose design is inspired by current psycholinguistic theories of human lexical memory. English nouns, verbs, adjectives and adverbs are organized into synonym sets, each representing one underlying lexical concept. Different relations link the synonym sets.
Xlit converts text from one writing system into another. It allows the user to define a transliteration simply by typing the input strings in one window and the strings to which they are to be mapped in another. Transliteration may be restricted to regions bounded by specified delimiters or their complements. Transliteration may also be performed by external commands or plugins. Xlit can also convert one type of delimiter to another, e.g. from HZ escapes to XML. Xlit can read and write transliteration definitions in its own format and as Yudit keymaps. It can be run in batch mode without the GUI.
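A transliteration defined as pairs of input strings and output strings is commonly applied by scanning the text left to right and, at each position, replacing the longest input string that matches. The Python sketch below shows that technique under the assumption that Xlit resolves overlapping rules by longest match; it is a hypothetical illustration, not Xlit's implementation, and it omits delimiter-bounded regions, external commands, and plugins.

```python
def transliterate(text: str, rules: dict) -> str:
    """Apply string-to-string rules left to right, preferring the longest match."""
    if "" in rules:
        raise ValueError("rules must not contain an empty input string")
    max_len = max(map(len, rules), default=0)
    out = []
    i = 0
    while i < len(text):
        # Try the longest possible input string first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in rules:
                out.append(rules[chunk])
                i += length
                break
        else:
            out.append(text[i])  # no rule matches here: copy the character unchanged
            i += 1
    return "".join(out)
```

With the rules `{"sh": "ш", "s": "с", "a": "а"}`, the input `sasha` becomes `саша`: the two-character rule `sh` wins over `s` because it is tried first.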
kdrill helps people learn Japanese kanji characters. It includes a multiple-choice kanji quiz program with different guess formats and history options. It also has a suite of dictionary lookup functions. Words can be found using a variety of methods, including Romaji, SKIP, four-corner, cut-and-paste, radical lookup, and English search.
Linguaphile is a simple command-line language translator. It is open source, platform independent, and written in Perl. Linguaphile currently supports the following languages: Afrikaans, Alawa, Albanian, Arrernte, Basque, Belarusian, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, Galician, German, Greek, Hawaiian, Hungarian, Icelandic, Indonesian, Interlingua, Irish, Italian, Kala Lagaw Ya, Korean, Kriol, Latvian, Lithuanian, Malay, Maltese, Maori, Norwegian, Pitjantjatjara, Polish, Portuguese, Romanian, Russian, Samoan, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Thai, Tok Pisin, Turkish, Ukrainian, Warlpiri, and Welsh. At this stage, Spanish-to-English translation is the most useful.