AutoGen is a tool designed for generating program files that contain repetitive text with varied substitutions. Its goal is to simplify the maintenance of programs that contain large amounts of repetitious text. This is especially valuable if there are several blocks of such text that must be kept synchronized. Output is specified with a Scheme-enhanced output template. Input, if required by your template, may come from AutoGen definitions, CGI data, or XML files.
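To illustrate the synchronization problem described above, here is a minimal Python sketch. It is not AutoGen and does not use AutoGen's definition or template syntax. The item names, the `IDX_` prefix, and the `render` helper are all hypothetical, chosen only to show two repetitive blocks being expanded from a single list of definitions so that they cannot drift apart.

```python
from string import Template

# Hypothetical definitions: one entry per item.
# This is plain Python data, not AutoGen's definition file format.
DEFINITIONS = [
    {"name": "alpha", "info": "some alpha stuff"},
    {"name": "beta", "info": "more beta stuff"},
    {"name": "omega", "info": "final omega stuff"},
]

# One line template per repetitive block that must stay in sync.
ENUM_LINE = Template("    IDX_${upper},")
INFO_LINE = Template('    "${info}",')


def render(definitions):
    """Expand both repetitive blocks from the same definitions list."""
    rows = [dict(d, upper=d["name"].upper()) for d in definitions]
    enum_body = "\n".join(ENUM_LINE.substitute(r) for r in rows)
    info_body = "\n".join(INFO_LINE.substitute(r) for r in rows)
    return (
        "typedef enum {\n" + enum_body + "\n} list_enum;\n\n"
        "static const char *const list_info[] = {\n" + info_body + "\n};\n"
    )


if __name__ == "__main__":
    print(render(DEFINITIONS))
```

Adding a fourth entry to `DEFINITIONS` extends both the enumeration and the string table in one step, which is the kind of maintenance burden the tool is meant to remove.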
RefDB is a reference database and bibliography tool for SGML, XML, and LaTeX documents. Command-line tools allow interactive or scriptable access to the data, which are stored in a SQL database. RefDB can also be accessed through a Web interface, an SRU interface, or editor extensions (Emacs/Vim). Libraries for Perl and PHP are available for programmers. RefDB provides sophisticated character encoding handling and uses Unicode by default.
sgml2x is a script that helps apply a DSSSL stylesheet to an SGML or XML document. Its features include support for multiple stylesheets per document class, easy integration of new stylesheets by adding a simple definition file to a configuration directory (system-wide, per-user, or per-project), and automatic selection of a default stylesheet. It is already set up for DocBook SGML/XML.
Uplug is a collection of tools for linguistic corpus processing, word alignment, and term extraction from parallel corpora. Pre-processing tools include a sentence splitter, a tokenizer, and external part-of-speech taggers and shallow parsers. The following external tools are used: the Grok system for English (tagging and chunking) and the morphological analyzer ChaSen for Japanese. Other tools, such as the TreeTagger, can easily be added. Translated documents can be sentence-aligned using the length-based approach of Gale & Church. Words and phrases can be aligned using the clue alignment approach and GIZA++, a toolbox for training statistical alignment models.