Msort sorts files in sophisticated ways. Records may be fixed size, newline-separated blocks, or terminated by any specified character. Key fields may be selected by position, tag, or character range. For each key, distinct exclusions, multigraphs, substitutions, and a sort order may be defined or locale collation rules used. Comparisons may be lexicographic, numeric, numeric string, hybrid, random, by string length, angle, domain name, date, time, month name, or ISO8601 timestamp. Keys may be reversed so as to generate reverse dictionaries. Optional keys are supported. Unicode is supported, including full case-folding. Msort itself has a somewhat complex command line interface, but may be driven by an optional GUI.
uni2ascii and ascii2uni provide conversion in both directions between UTF-8 Unicode and more than thirty 7-bit ASCII equivalents, including RFC 2396 URI format and RFC 2045 Quoted Printable format, the representations used in HTML, SGML, XML, OOXML, the Unicode standard, Rich Text Format, POSIX portable charmaps, POSIX locale specifications, and Apache log files. It can also convert between the escapes used for Unicode in languages such as Ada, C, Common Lisp, Java, Pascal, Perl, Postscript, Python, Scheme, and Tcl.
ISCII Utilities is two programs for analyzing text files encoded according to the Indian Script Code for Information Interchange (ISCII), the Indian national standard. IsciiName identifies each code, printing the byte offset, the code in hex, and an explanation of the meaning of the code. ATR codes for writing system transition and display mode are interpreted. CountIsciiChars counts the codes in an ISCII file and classifies them according to their type and function. The original purpose was computing accurate letter counts for reading studies, but this information is also useful when processing ISCII-encoded text.
The Unicode Utilities are a set of programs for manipulating and analyzing Unicode text. uniname prints any combination of the character offset of each character, its byte offset, its hex code value, its encoding, the glyph itself, and its name. unidesc reports the character ranges to which different portions of the text belong. unihist generates a histogram of the characters in its input. ExplicateUTF8 determines and explains the validity of a sequence of bytes as a UTF-8 encoding. unirev reverses UTF-8 strings. unifuzz tests other programs' unicode handling.
Redet is a tool for developing and executing regular expressions using any of more than 50 search programs, editors, and programming languages, intended both for developing regular expressions for use elsewhere and as a search tool in its own right. For each program in each locale, a palette showing the available constructs is provided. The properties of each program are determined by runtime tests, which guarantees that they will be correct for the program version and locale. Additional features include persistent history, extensive help, a variety of character entry tools, and the ability to change locale while running. Redet is highly configurable and fully supports Unicode.
Keyano is a graphical front end for popular Unix applications such as play, aplay, festival, and fortune. It has the ability to turn your PC into a audio/visual sampler that works similar to samplers now in use by DJs. It also includes vocal dictionary and text reader capabilities, as well as a spelling Tutorial and an early version of a chatter bot (in alphabet mode you can: type "A B C" and it says them out loud while it shows letters on screen).
IPA Zounds models language sound changes by applying a given set of sound change rules to a given lexicon. It has a built-in model of the International Phonetic Alphabet, allowing users to write input words in IPA characters and rules using those characters or the distinctive features of the model.