libunibreak is an implementation of the line breaking and word breaking algorithms described in Unicode Standard Annex #14 and Unicode Standard Annex #29. It is a superset of, and supersedes, liblinebreak. It is designed to be used in a generic text renderer. FBReader is one real-world example of a program that uses it.
Aephea is a text-based authoring tool for HTML. It enforces well-formedness with a simpler and stricter TeX-like syntax and provides useful extensions and abstractions with facilities for adding new ones. It emphasizes a single unified approach that stays close to HTML itself and promotes and utilizes CSS extensively. Abstractions such as dictionary stacks, arithmetic, and iteration are part of Aephea.
GNU Source-highlight produces a document with syntax highlighting when given a source file. As input it handles many languages (e.g., Java, C/C++, Prolog, Perl, PHP3, Python, Flex, and HTML) and other formats (e.g., ChangeLog and log files). As output it can produce HTML, XHTML, DocBook, ANSI color escapes, LaTeX, and Texinfo. Input and output formats can be specified with a regular expression-oriented syntax.
PyBison is a sophisticated yet easy-to-use parser creation toolkit for Python that interfaces directly to Bison (yacc)-based parsers. It provides full LALR(1) grammar support, allowing for anything from simple parsing tasks through to writing compilers for high-level languages. Parser code is generated from rules within user-created Parser classes (written in Python), then compiled, yacc'ed, and linked into a shared library, which is loaded into the running process. All this happens automatically. When the parser runs, it connects directly with the yyparse() routine and receives event callbacks as parse targets are reached.
Zoem is a general-purpose macro/programming language that submits text to a two-stage transformation process: macro expansion and interpretation are followed by the application of customizable character filtering rules. Zoem supports inside-out evaluation, comprehensive IO, control operators, iteration, dictionary stacks, multidimensional data storage, arithmetic expressions, regular expressions, system commands, and more.
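The two-stage idea can be illustrated with a toy pipeline. This is only a sketch in Python and does not reproduce Zoem's actual syntax or semantics: the `\name` macro form, the function names, and the filter table are all hypothetical, chosen just to show expansion followed by character filtering.

```python
import re

# Hypothetical macro syntax for this sketch: a backslash followed by a name.
_MACRO = re.compile(r"\\(\w+)")

def expand(text, macros):
    """Stage 1: replace each known \\name with its definition (unknown ones are kept)."""
    return _MACRO.sub(lambda m: macros.get(m.group(1), m.group(0)), text)

def apply_filter(text, rules):
    """Stage 2: map individual characters through a customizable rule table."""
    return "".join(rules.get(ch, ch) for ch in text)

def transform(text, macros, rules):
    """Run expansion first, then filtering, on the expanded result."""
    return apply_filter(expand(text, macros), rules)

# Example rule table: escape characters that are special in HTML.
HTML_FILTER = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}

print(transform(r"Report from \dept <draft>", {"dept": "R&D"}, HTML_FILTER))
```

Because filtering runs after expansion in this sketch, characters introduced by a macro definition (the `&` in `R&D`) are filtered as well.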
ICU provides a Unicode implementation, with functions for formatting numbers, dates, times, and currencies according to locale conventions, for transliteration, and for parsing text in those formats. It provides flexible patterns for formatting messages, where the pattern determines the order of the variable parts of the message and the format for each of those variables. These patterns can be stored in resource files for translation to different languages. Included are more than 100 codepage converters for interaction with non-Unicode systems.
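The point about patterns controlling the order of the variable parts can be shown with a small sketch. It uses Python's built-in `str.format` as a stand-in and is not ICU's API; the `MESSAGES` table (standing in for translated resource files), the `format_message` helper, and the sample patterns are assumptions made for illustration.

```python
# Stand-in for per-locale resource files: one pattern per language.
# Each pattern decides where the named arguments appear in the sentence.
MESSAGES = {
    "en": "{user} deleted {count} files.",
    "de": "{count} Dateien wurden von {user} gelöscht.",
}

def format_message(locale, **args):
    """Look up the pattern for a locale and fill in the named arguments."""
    return MESSAGES[locale].format(**args)

print(format_message("en", user="Ana", count=3))
print(format_message("de", user="Ana", count=3))
```

The calling code passes the same arguments for every locale; only the stored pattern changes, which is what lets translators reorder the variable parts without touching the program.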
uni2ascii and ascii2uni provide conversion in both directions between UTF-8 Unicode and more than thirty 7-bit ASCII equivalents, including RFC 2396 URI format and RFC 2045 Quoted-Printable format, the representations used in HTML, SGML, XML, OOXML, the Unicode standard, Rich Text Format, POSIX portable charmaps, POSIX locale specifications, and Apache log files. They can also convert between the escapes used for Unicode in languages such as Ada, C, Common Lisp, Java, Pascal, Perl, PostScript, Python, Scheme, and Tcl.
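As an illustration of what one such round trip looks like, here is a minimal Python sketch for a single escape style (`\uXXXX` and `\UXXXXXXXX`). It is not the tools' own code and does not use their command-line options; the function names are hypothetical.

```python
import re

def to_ascii_escapes(text):
    """Replace each non-ASCII character with a \\uXXXX or \\UXXXXXXXX escape."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)              # plain 7-bit ASCII passes through
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")  # Basic Multilingual Plane
        else:
            out.append(f"\\U{cp:08X}")  # supplementary planes
    return "".join(out)

_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")

def from_ascii_escapes(text):
    """Turn \\uXXXX and \\UXXXXXXXX escapes back into the characters they name."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), text)

encoded = to_ascii_escapes("naïve café")
print(encoded)
print(from_ascii_escapes(encoded))
```

This sketch does not escape backslashes already present in the input, so text that contains a literal `\u0041` would not survive the round trip unchanged.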