PolyGen is a program for generating random sentences according to a grammar definition, that is following custom syntactical and lexical rules. Formally, it is an interpreter of a language itself designed to define languages, where to interpret means executing a source program in real time and eventually outputting its result. Here, a source program is a grammar definition. The execution consists of the exploration of such grammar by selecting a random path, and the result is the sentence built on the way.
Apertium is a machine translation platform, initially aimed at related-language pairs, but recently expanded to deal with more divergent language pairs (such as English-Catalan). The platform provides a language-independent machine translation engine, tools to manage the linguistic data necessary to build a machine translation system for a given language pair, and linguistic data for a growing number of language pairs.
Emdros is a corpus query system for storing and searching linguistically annotated text. It is very generic, supporting almost any kind of annotation from almost any linguistic theory. All linguistic levels of analysis are supported, including phonology, morphology, the lexical level, syntax, and discourse. The core libraries act as a middleware layer between a client and an underlying SQL database. MySQL, PostgreSQL, and SQLite are supported.
uni2ascii and ascii2uni provide conversion in both directions between UTF-8 Unicode and more than thirty 7-bit ASCII equivalents, including RFC 2396 URI format and RFC 2045 Quoted Printable format, the representations used in HTML, SGML, XML, OOXML, the Unicode standard, Rich Text Format, POSIX portable charmaps, POSIX locale specifications, and Apache log files. It can also convert between the escapes used for Unicode in languages such as Ada, C, Common Lisp, Java, Pascal, Perl, Postscript, Python, Scheme, and Tcl.
Msort sorts files in sophisticated ways. Records may be fixed size, newline-separated blocks, or terminated by any specified character. Key fields may be selected by position, tag, or character range. For each key, distinct exclusions, multigraphs, substitutions, and a sort order may be defined or locale collation rules used. Comparisons may be lexicographic, numeric, numeric string, hybrid, random, by string length, angle, domain name, date, time, month name, or ISO8601 timestamp. Keys may be reversed so as to generate reverse dictionaries. Optional keys are supported. Unicode is supported, including full case-folding. Msort itself has a somewhat complex command line interface, but may be driven by an optional GUI.