HTMLDOC converts HTML files and Web pages into indexed HTML, PostScript, and PDF files suitable for online viewing and printing. It can be used as a standalone GUI application, in a batch document processing environment, as a Web-based report generation application, or in embedded environments to support printing of HTML content. It runs on all Unix platforms as well as Mac OS X and Windows 2000 and higher.
XMLDB uses an RDBMS to persist arbitrary XML documents. Due to its storage mechanism, searching for and recalling documents is extremely quick. You can also perform XSL transformations on documents with surprising speed. The library can be used in any program to store libxml2 documents. A PHP module is also included, making XMLDB a complete three-tier Web application development suite.
Marko is a simple toolset that lets you create Markov chain databases from a corpus (or two) of text and then compare unknown texts against these databases. For any two Marko databases, you can calculate the probability that the unknown text is related to one rather than the other. Possible applications include intelligent mail filtering, plagiarism detection, and historical research.
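The comparison idea can be illustrated with a minimal sketch. This is not Marko's actual implementation or file format: it assumes a first-order, word-level Markov chain with add-one smoothing and equal prior weight for both corpora, and the names `train`, `log_likelihood`, and `prob_first` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase, whitespace-separated words are the chain states.
    return text.lower().split()


def train(corpus):
    """Build a first-order model: word -> Counter of the words that follow it."""
    words = tokenize(corpus)
    transitions = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        transitions[prev][nxt] += 1
    return transitions, set(words)


def log_likelihood(model, text):
    """Log-probability of the text's word transitions under the model."""
    transitions, vocab = model
    v = len(vocab) + 1  # one extra slot so unseen words keep nonzero probability
    words = tokenize(text)
    total = 0.0
    for prev, nxt in zip(words, words[1:]):
        row = transitions.get(prev, Counter())
        # Add-one (Laplace) smoothing over the model's vocabulary.
        total += math.log((row[nxt] + 1) / (sum(row.values()) + v))
    return total


def prob_first(model_a, model_b, text):
    """Probability that the text matches model A rather than B (equal priors)."""
    diff = log_likelihood(model_b, text) - log_likelihood(model_a, text)
    if diff > 700:  # avoid overflow in math.exp for very one-sided cases
        return 0.0
    return 1.0 / (1.0 + math.exp(diff))


# Hypothetical toy corpora, for illustration only.
pets = train("the cat sat on the mat the cat ate the fish")
finance = train("stocks rose as the market rallied and bonds fell")
score = prob_first(pets, finance, "the cat sat on the fish")
print(f"probability the text matches the first corpus: {score:.3f}")
```

With corpora this small the result is only indicative; a mail filter or plagiarism detector would need far larger training texts and probably a more careful smoothing scheme.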
SearchAssist is a simple but practical search engine application that uses a ternary search tree. It uses Java's dynamic loading feature to make the search engine highly customizable, and takes Mozilla bookmarks as input. A Swing UI allows users to enter search words and view the results.
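For readers unfamiliar with the data structure, here is a minimal sketch of a ternary search tree with exact and prefix lookup. It is written in Python for brevity and does not reflect SearchAssist's Java code; the class names, method names, and bookmark entries are hypothetical.

```python
class Node:
    """One character of a key, with three children: lower, equal, higher."""
    __slots__ = ("char", "lo", "eq", "hi", "value")

    def __init__(self, char):
        self.char = char
        self.lo = self.eq = self.hi = None
        self.value = None  # set only on the node that ends a stored key


class TernarySearchTree:
    def __init__(self):
        self.root = None

    def insert(self, key, value):
        if not key:
            raise ValueError("key must be non-empty")
        self.root = self._insert(self.root, key, 0, value)

    def _insert(self, node, key, i, value):
        ch = key[i]
        if node is None:
            node = Node(ch)
        if ch < node.char:
            node.lo = self._insert(node.lo, key, i, value)
        elif ch > node.char:
            node.hi = self._insert(node.hi, key, i, value)
        elif i + 1 < len(key):
            node.eq = self._insert(node.eq, key, i + 1, value)
        else:
            node.value = value
        return node

    def _find(self, key):
        """Return the node matching the last character of key, or None."""
        node, i = self.root, 0
        while node is not None and key:
            ch = key[i]
            if ch < node.char:
                node = node.lo
            elif ch > node.char:
                node = node.hi
            elif i + 1 < len(key):
                node, i = node.eq, i + 1
            else:
                return node
        return None

    def get(self, key):
        node = self._find(key)
        return node.value if node is not None else None

    def with_prefix(self, prefix):
        """Return (key, value) pairs for stored keys starting with prefix, sorted."""
        node = self._find(prefix)
        if node is None:
            return []
        results = []
        if node.value is not None:
            results.append((prefix, node.value))
        self._collect(node.eq, prefix, results)
        return results

    def _collect(self, node, prefix, results):
        if node is None:
            return
        self._collect(node.lo, prefix, results)  # smaller characters first
        if node.value is not None:
            results.append((prefix + node.char, node.value))
        self._collect(node.eq, prefix + node.char, results)
        self._collect(node.hi, prefix, results)


# Hypothetical bookmark titles mapped to placeholder URLs.
tree = TernarySearchTree()
tree.insert("mozilla", "https://example.org/mozilla")
tree.insert("mozart", "https://example.org/mozart")
tree.insert("java", "https://example.org/java")
matches = tree.with_prefix("moz")
print(matches)
```

Each node stores a single character, so keys that share a prefix share nodes, and a prefix query only walks the subtree below the last matched character. In this sketch an empty prefix returns no results rather than every key.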
The Multivalent PDF Tools is a suite of tools for manipulating PDF documents. It includes tools for compressing, uncompressing (for hand editing), obtaining metadata, splitting and merging, encrypting and decrypting, validating, imposition (aka n-up), making page images, extracting text, and full-text indexing (with Lucene). The compress tool shrinks the PDF 1.5 Reference from 13.5MB to 8MB in PDF 1.5/Acrobat 6 format and down to 5.1MB in a new proposed "Compact" format.