doclifter helps with lifting documents with nroff markup to XML-DocBook. Lifting documents from presentation level to semantic level is hard, and a really good job requires human polishing. This tool aims to do everything that can be mechanized, and to preserve, in XML comments, any troff-level information that might have structural implications. TBL tables are translated into DocBook table markup, PIC into SVG, and EQN into MathML (relying on pic2svg and GNU eqn for the last two).
pyratemp is probably one of the smallest complete template engines for Python (about 500 LOC). Its templates use a very small set of special syntax. This reduces complexity and the probability of bugs, and leads to an easy-to-use and intuitive user interface. It uses embedded Python expressions (in a "sandbox"), is well documented, has full Unicode support, and produces very good error messages, which is very useful when creating new templates.
LyX is a document processor that encourages an approach to writing based on the structure of your documents, not their appearance. It is intended for people who write and want their writing to look great without tinkering with formatting details, font attributes, or page boundaries. On screen, it looks like any word processor, but it uses the TeX engine for printed output and for producing richly cross-referenced PDFs. It is stable and fully featured.
NetCrawler is the frontend to a Web crawling system. This command line application downloads all of the pages within a domain, then parses and processes all of the relevant content (images, text, audio, video), saving it in an XML document for later processing. It is definitely alpha quality, but has been used quite extensively.
Alphabet Soup is a project which attempts to determine a number of things about the shapes of letters in several different writing systems. First, it hypothesizes a set of basic building blocks from which all letters are built. Second, it hypothesizes a set of rules, a grammar or syntax, which defines how those pieces combine to make different letters. It can generate individual letters, randomize letters in an input string to create weird but readable text, or generate random strings of symbols.
EZ Reusable Objects (EZRO) is a Web application that can be used by non-technical staff to manage content as "objects." Content objects containing text, video, and audio can be shared, modified, and re-styled to appear via a traditional Web site, an on-line course, an innovative "Coach," or as a community of interest site. It is highly scalable and can be used for public Web sites, secure environments, and private intra/extranets.