UnicodeDataBrowser is a browser for the UnicodeData.txt file, which contains much useful information but is not easily read by humans. It displays the file as a scrollable table in which each column represents a property, and the table may be sorted on any column. Abbreviations are expanded, and the characters cross-referenced in the decomposition and casing fields are shown by name. Regular expression search can be restricted to a selected column. The set of characters displayed may also be restricted to those matching a regular expression on a specified property.
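The table operations described above can be illustrated with a minimal sketch. It is not UnicodeDataBrowser's own code: the field labels, the function names, and the embedded sample lines are assumptions made for this example. Only the 15-field, semicolon-separated layout is taken from the UnicodeData.txt format itself.

```python
import re

# Field order of UnicodeData.txt (15 semicolon-separated fields per line).
# The short labels are this sketch's own names, not official identifiers.
FIELDS = [
    "code", "name", "general_category", "combining_class", "bidi_class",
    "decomposition", "decimal", "digit", "numeric", "mirrored",
    "unicode1_name", "iso_comment", "uppercase", "lowercase", "titlecase",
]

# A few lines in UnicodeData.txt format, embedded so the sketch runs standalone.
SAMPLE = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041
0065;LATIN SMALL LETTER E;Ll;0;L;;;;;N;;;0045;;0045
00E9;LATIN SMALL LETTER E WITH ACUTE;Ll;0;L;0065 0301;;;;N;LATIN SMALL LETTER E ACUTE;;00C9;;00C9
0301;COMBINING ACUTE ACCENT;Mn;230;NSM;;;;;N;NON-SPACING ACUTE;;;;
"""


def parse_records(text):
    """Turn each non-empty line into a dict keyed by the labels in FIELDS."""
    records = []
    for line in text.splitlines():
        if line.strip():
            records.append(dict(zip(FIELDS, line.split(";"))))
    return records


def restrict(records, prop, pattern):
    """Keep only the records whose property `prop` matches the regex."""
    rx = re.compile(pattern)
    return [r for r in records if rx.search(r[prop])]


def sort_by(records, column):
    """Sort on one column; codepoints are compared numerically."""
    if column == "code":
        return sorted(records, key=lambda r: int(r["code"], 16))
    return sorted(records, key=lambda r: r[column])


def name_decomposition(record, records):
    """Replace the codepoints in the decomposition field with character names."""
    names = {r["code"]: r["name"] for r in records}
    # Skip compatibility tags such as <compat>; keep only hex codepoints.
    parts = [p for p in record["decomposition"].split() if not p.startswith("<")]
    return [names.get(p, "U+" + p) for p in parts]
```

For instance, `restrict(parse_records(SAMPLE), "general_category", "^Ll$")` keeps only the lowercase letters, and `name_decomposition` applied to the U+00E9 record resolves its two component codepoints to names.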
CharEntry is a tool for inserting non-ASCII characters into text, with particular emphasis on linguistic notation. It provides charts of the consonants, vowels, and diacritics of the International Phonetic Alphabet as well as a chart of precomposed accented characters. Clicking on a character inserts it into a text region, the contents of which may be saved to a file or copied and pasted elsewhere. A widget for inserting characters by Unicode codepoint is also provided. Furthermore, it is possible to read the definition of a custom character chart from a file.
WordGenerator generates hypothetical words from specifications of their syllable structure. The user specifies the maximum length of the words in syllables, the abstract structure of syllables in the language (in terms of such units as consonants and vowels or onsets and rhymes), and the actual sounds that make up each abstract class (e.g. the list of vowels in the language). WordGenerator then generates all the words that conform to this specification. Such lists are useful to field linguists exploring the vocabulary of a language and to designers of artificial languages.
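The idea of generating words from a syllable specification can be sketched as follows. This is an illustration under assumptions, not WordGenerator's actual input format or algorithm: the function names, the use of one-letter class symbols such as C and V, and the representation of syllable structures as strings are all hypothetical.

```python
from itertools import product


def generate_syllables(patterns, classes):
    """Expand each abstract pattern (e.g. "CV") into concrete syllables."""
    syllables = []
    for pattern in patterns:
        # One pool of sounds per symbol in the pattern.
        pools = [classes[symbol] for symbol in pattern]
        for combo in product(*pools):
            syllables.append("".join(combo))
    return syllables


def generate_words(max_syllables, patterns, classes):
    """List every word of 1..max_syllables syllables, without duplicates."""
    syllables = generate_syllables(patterns, classes)
    words = []
    for count in range(1, max_syllables + 1):
        for combo in product(syllables, repeat=count):
            words.append("".join(combo))
    # Different syllable sequences can spell the same word; keep the first.
    return list(dict.fromkeys(words))


if __name__ == "__main__":
    # Hypothetical toy language: two consonants, two vowels, CV syllables.
    classes = {"C": ["p", "t"], "V": ["a", "i"]}
    for word in generate_words(2, ["CV"], classes):
        print(word)
```

The number of candidate words grows exponentially with the maximum length, so a practical tool would probably generate lazily or filter as it goes; the sketch builds the full list for clarity.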
Redet is a tool for developing and executing regular expressions using any of more than 50 search programs, editors, and programming languages. It is intended both for developing regular expressions for use elsewhere and as a search tool in its own right. For each program in each locale, a palette shows the available constructs. The properties of each program are determined by runtime tests, so that they are correct for the program version and locale actually in use. Additional features include persistent history, extensive help, a variety of character entry tools, and the ability to change locale while running. Redet is highly configurable and fully supports Unicode.
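Determining an engine's properties by runtime test, rather than from documentation, can be illustrated with a small sketch. It probes only Python's own `re` module, whereas Redet tests external programs; the probe table and function names are assumptions made for this example.

```python
import re

# Hypothetical probe table: construct name -> (pattern, subject it should match).
PROBES = {
    "lookahead": (r"a(?=b)", "ab"),
    "named group": (r"(?P<x>a)", "a"),
    "backreference": (r"(a)\1", "aa"),
    "unicode property \\p{L}": (r"\p{L}", "a"),
}


def supports(pattern, subject):
    """Report whether the engine compiles the pattern and matches the subject."""
    try:
        return re.search(pattern, subject) is not None
    except re.error:
        # The engine rejected the construct outright.
        return False


def probe_engine(probes=PROBES):
    """Run every probe and return a construct -> supported mapping."""
    return {name: supports(p, s) for name, (p, s) in probes.items()}


if __name__ == "__main__":
    for construct, ok in probe_engine().items():
        print(construct, "supported" if ok else "not supported")
```

Because the answers come from the running engine, they track whatever version is installed, which is the advantage the description above attributes to runtime testing.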
Ellogon is a multi-lingual, cross-platform, general-purpose language engineering environment, developed to aid both researchers in computational linguistics and companies that produce and deliver language engineering systems. As a language engineering platform, it offers an extensive set of facilities, including tools for processing and visualising textual/HTML/XML data and associated linguistic information, support for lexical resources (such as creating and embedding lexicons), and tools for creating annotated corpora, accessing databases, comparing annotated data, and transforming linguistic information into vectors for use with various machine learning algorithms.
rfc2mib is a short Tcl script which may be used to extract MIB (Management Information Base), PIB (Policy Information Base), and ASN.1 modules from an RFC document. Unlike most extractors, this script is smart enough to recognize ASN.1-style comments prior to or within the module header. It also recognizes the use of the "TagDefaults" part of the module header (not used by MIB modules), module headers that are broken across multiple lines, and macro definitions.
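The basic extraction step can be sketched as below. This is a simplified illustration in Python, not the rfc2mib script, and every name in it is hypothetical. It accepts a header that is broken across lines and an optional tag-default clause, but unlike rfc2mib it does not handle comments preceding or inside the header, and it would be misled by an `END` on a line of its own inside a macro definition. The page header and footer patterns are assumptions about the usual RFC text layout.

```python
import re

# Module header: NAME DEFINITIONS [tag default] ::= BEGIN, possibly spread
# over several lines (\s+ also matches newlines).
HEADER = re.compile(
    r"^[ \t]*([A-Z][A-Za-z0-9-]*)\s+DEFINITIONS\s+"
    r"(?:(?:EXPLICIT|IMPLICIT|AUTOMATIC)\s+TAGS\s+)?::=\s+BEGIN\b",
    re.MULTILINE,
)
# Module end: END alone on a line.
END = re.compile(r"^[ \t]*END[ \t]*$", re.MULTILINE)

# Assumed shapes of RFC page footers ("... [Page 3]") and page headers
# ("RFC 1234   Title   Month 2000").
PAGE_FOOTER = re.compile(r"\[Page \d+\][ \t]*$")
PAGE_HEADER = re.compile(r"^RFC \d+\s+.*\d{4}[ \t]*$")


def strip_page_breaks(text):
    """Drop page headers and footers so they do not end up inside a module."""
    kept = []
    # splitlines() also splits on form feeds, so they vanish here as well.
    for line in text.splitlines():
        if PAGE_FOOTER.search(line) or PAGE_HEADER.match(line):
            continue
        kept.append(line)
    return "\n".join(kept)


def extract_modules(text):
    """Return a dict mapping each module name to its text, header through END."""
    text = strip_page_breaks(text)
    modules = {}
    pos = 0
    while True:
        start = HEADER.search(text, pos)
        if start is None:
            break
        end = END.search(text, start.end())
        if end is None:
            break  # unterminated module: stop rather than guess
        modules[start.group(1)] = text[start.start():end.end()]
        pos = end.end()
    return modules
```

Each returned value could then be written to a file named after the module, one file per key of the dictionary.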