Tomabaem is a substitute for the system's Character Palette, at least for people focusing on the so-called CJKV languages (Chinese, Japanese, Korean, and Vietnamese). Tomabaem, like Unicode, is cross-language. Whatever you are looking for related to Chinese characters, there's a high chance that Tomabaem has a way of looking it up, whether by the Cantonese pronunciation, the UTF-16 codepoint, the radical, the meaning, or the character itself, which you can copy and paste or drag and drop from another document. It uses the UniHan.txt file from the Unicode Consortium as the basis of the data shown.
Lost in Translation is a steganographic encoder that hides information in the "noise" created by automatic translation of natural language documents. Because natural language translation inherently leaves plenty of room for variation, it is well suited to steganographic applications. Also, because legitimate automatic text translations frequently contain errors, additional errors inserted by an information hiding mechanism are plausibly undetectable and would appear to be part of the normal noise associated with translation.
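The general idea of hiding bits in translation variation can be illustrated with a toy sketch. This is not Lost in Translation's actual protocol: the function names `embed` and `extract` are hypothetical, and the sketch assumes that sender and receiver can both reproduce the same ordered list of distinct candidate translations for every sentence.

```python
def capacity(candidates):
    """Number of whole bits that a choice among these candidates can carry."""
    if not candidates:
        raise ValueError("each sentence needs at least one candidate translation")
    return len(candidates).bit_length() - 1   # floor(log2(n))


def embed(bits, candidate_sets):
    """Pick one candidate per sentence so that the choices spell out `bits`."""
    chosen, pos = [], 0
    for candidates in candidate_sets:
        n = capacity(candidates)
        # Take the next n message bits, padding with zeros once the message ends.
        chunk = bits[pos:pos + n].ljust(n, "0")
        index = int(chunk, 2) if n else 0
        chosen.append(candidates[index])
        pos += n
    if pos < len(bits):
        raise ValueError("not enough translation variants to hold the message")
    return chosen


def extract(chosen, candidate_sets):
    """Recover the bit string (including any zero padding) from the choices."""
    bits = []
    for sentence, candidates in zip(chosen, candidate_sets):
        n = capacity(candidates)
        if n:
            bits.append(format(candidates.index(sentence), f"0{n}b"))
    return "".join(bits)
```

A sentence with four candidate translations carries two bits, one with two candidates carries one bit, and a sentence with a single translation carries none. The receiver needs to know the message length separately in order to discard the padding.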
The Universal Text Recognizer and Converter (Utrac) is a command-line tool and a C library that recognizes the encoding of an input file (UTF-8, ISO-8859-1, CP437, etc.) and its end-of-line type (CR, LF, or CRLF). It features automatic recognition (which depends on the file and on the system's locale, and is reliable in most cases), assistance for verification or manual recognition, and conversion to another charset and/or end-of-line type.
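The two kinds of recognition described here can be sketched in a few lines of Python. This is only a naive illustration, not Utrac's algorithm or API: the function names are hypothetical, the encoding guess simply tries a short list of codecs in order, and the locale is not consulted at all.

```python
def detect_eol(data: bytes) -> str:
    """Return 'CRLF', 'LF', 'CR', 'mixed', or 'none' for a byte string."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # bare LF only
    cr = data.count(b"\r") - crlf   # bare CR only
    present = [name for name, n in (("CRLF", crlf), ("LF", lf), ("CR", cr)) if n > 0]
    if not present:
        return "none"
    if len(present) > 1:
        return "mixed"
    return present[0]


def guess_encoding(data: bytes, candidates=("ascii", "utf-8", "iso-8859-1")) -> str:
    """Return the first candidate codec that decodes the data without error."""
    for name in candidates:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return "unknown"
```

Because ISO-8859-1 assigns a character to every byte value, it always succeeds and so acts as a catch-all at the end of the list. Telling it apart from other single-byte charsets such as CP437 would need statistical or locale-based hints.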
Minpair consists of two programs, a C command-line program and a Tcl/Tk GUI, each of which can independently generate a complete list of minimal pairs (words differing in exactly one segment) for use in linguistic research. The GUI may also be used to control the faster CLI program. Both allow sequences of characters to be defined as single segments. Unicode is fully supported. It is also possible to obtain a list of pairs differing in exactly two positions for use in finding phonological rules.
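One way to find minimal pairs can be sketched as follows. This is an illustration under stated assumptions, not Minpair's implementation: the names `segment` and `minimal_pairs` are hypothetical, multi-character segments are matched greedily with the longest first, and only substitutions at the same position are counted (insertions and deletions are not).

```python
from collections import defaultdict
from itertools import combinations


def segment(word, multi=()):
    """Split a word into segments, greedily matching the longest defined sequence."""
    ordered = sorted((s for s in multi if s), key=len, reverse=True)
    segments, i = [], 0
    while i < len(word):
        for seq in ordered:
            if word.startswith(seq, i):
                segments.append(seq)
                i += len(seq)
                break
        else:
            # No multi-character segment starts here; take one character.
            segments.append(word[i])
            i += 1
    return tuple(segments)


def minimal_pairs(words, multi=()):
    """Return sorted pairs of words that differ in exactly one segment."""
    buckets = defaultdict(set)
    for word in set(words):
        segs = segment(word, multi)
        for i in range(len(segs)):
            # Mask one position; words sharing a masked key differ only there.
            key = segs[:i] + (None,) + segs[i + 1:]
            buckets[key].add(word)
    pairs = set()
    for group in buckets.values():
        pairs.update(combinations(sorted(group), 2))
    return sorted(pairs)
```

Masking one position at a time avoids comparing every word with every other word, which matters for large word lists. Defining "ch" as a single segment, for example, lets "chat" pair with "cat", whereas without that definition the two words have different lengths and do not pair.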
libtranslate is a library for translating text and Web pages between natural languages. Its modular infrastructure allows the user to implement new translation services separately from the core library. libtranslate is shipped with a generic module that supports Web-based translation services such as Babel Fish, Google Language Tools, and SYSTRAN. Moreover, the generic module allows new services to be added simply by adding a few lines to an XML file. The libtranslate distribution includes a powerful command-line interface.
ByteName is a tool that prints one line for each byte of its input, consisting of the byte offset, the byte value in hex, octal, binary, and decimal, and the byte's description in a selected single-byte encoding. A command-line flag suppresses the lines corresponding to ASCII characters, which is useful for locating stray non-ASCII codes. It can also generate a chart for a specified encoding or, for a specified codepoint, generate descriptions in all known encodings.
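The per-byte listing can be approximated with a short Python sketch. It is not ByteName's code or output format: the function name `describe_bytes` and the column layout are assumptions, the description is taken from Python's `unicodedata` names, and the codec passed in is assumed to be a single-byte one.

```python
import unicodedata


def describe_bytes(data: bytes, encoding: str = "iso-8859-1", skip_ascii: bool = False):
    """Yield one line per byte: offset, hex, octal, binary, decimal, name."""
    for offset, value in enumerate(data):
        if skip_ascii and value < 0x80:
            continue   # suppress ASCII bytes to make stray codes stand out
        try:
            char = bytes([value]).decode(encoding)
            # Control characters have no Unicode name, so supply a fallback.
            name = unicodedata.name(char, "<unnamed or control character>")
        except UnicodeDecodeError:
            name = "<undefined in this encoding>"
        yield f"{offset:6d}  0x{value:02X}  0o{value:03o}  0b{value:08b}  {value:3d}  {name}"


if __name__ == "__main__":
    for line in describe_bytes(b"caf\xe9\n", skip_ascii=True):
        print(line)
```

With `skip_ascii=True` the example prints only the line for byte 0xE9 at offset 3, which ISO-8859-1 maps to LATIN SMALL LETTER E WITH ACUTE.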