pdf2html converts a PDF into a series of ready-to-use HTML pages with PNG images of the individual pages. It runs Ghostscript at high resolution and downsamples the output into low-resolution, 8-bit grayscale PNGs, using 17x15 subsampling (255 sub-pixels per output pixel) to achieve exactly 256 levels of gray. It is intended for converting PDFs for online Web browsing where processing time is not critical but quality is desired. A special utility for printing the content on an Epson-compatible 9-pin printer is also included (with 12 possible quality/speed tradeoffs).
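The 256-level arithmetic works because a 17x15 block holds 255 bilevel sub-pixels, so the number of black sub-pixels in a block ranges from 0 to 255. A minimal Python sketch of that downsampling step follows. It assumes the high-resolution Ghostscript output is a bilevel bitmap (1 = black, 0 = white) and that each block is 17 columns wide and 15 rows tall. The function name `downsample_to_gray` is hypothetical, and this is not pdf2html's actual code.

```python
# Assumed block orientation: 17 columns x 15 rows (the source only says "17x15").
BLOCK_W, BLOCK_H = 17, 15
SAMPLES = BLOCK_W * BLOCK_H  # 255 sub-pixels -> counts 0..255 -> 256 gray levels


def downsample_to_gray(bitmap):
    """Convert a bilevel bitmap (rows of 0 = white / 1 = black) into rows of
    8-bit gray values, where 255 is white and 0 is black."""
    height = len(bitmap)
    width = len(bitmap[0]) if height else 0
    if height % BLOCK_H or width % BLOCK_W:
        raise ValueError("bitmap dimensions must be multiples of the block size")
    gray = []
    for by in range(0, height, BLOCK_H):
        row = []
        for bx in range(0, width, BLOCK_W):
            # Count black sub-pixels in this 17x15 block.
            black = sum(
                bitmap[y][x]
                for y in range(by, by + BLOCK_H)
                for x in range(bx, bx + BLOCK_W)
            )
            row.append(SAMPLES - black)  # more black sub-pixels -> darker gray
        gray.append(row)
    return gray
```

Each output value is simply the count of white sub-pixels, so no scaling or rounding is needed to fill the full 0 to 255 range.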
mod_filter allows you to filter the output of other modules inside Apache. This lets you implement filters (think Swedish Chef, jive, etc.). You can also use it to tailor output to your locale. It works with HTML documents, mod_perl, PHP, JServ, CGIs, and, for that matter, just about any custom handler you might have.
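A content filter of this kind rewrites the response body while leaving the markup alone. The toy Python sketch below illustrates the idea with a hypothetical word-substitution table. It does not use mod_filter's or Apache's API, and the names `SUBSTITUTIONS` and `filter_html` are illustrative only. The tag-splitting regex is a simplification that will not handle every HTML edge case (for example, comments or scripts containing `>`).

```python
import re

# Hypothetical substitution table for a toy "jive"-style filter.
SUBSTITUTIONS = {"hello": "hey", "friend": "homeboy"}

# The capturing group makes re.split keep the tags, so they land at odd indexes.
TAG_RE = re.compile(r"(<[^>]*>)")
WORD_RE = re.compile(r"\w+")


def filter_html(body: str) -> str:
    """Apply SUBSTITUTIONS to text between tags, passing markup through unchanged."""
    parts = TAG_RE.split(body)
    for i in range(0, len(parts), 2):  # even indexes are text, odd indexes are tags
        parts[i] = WORD_RE.sub(
            lambda m: SUBSTITUTIONS.get(m.group(0).lower(), m.group(0)), parts[i]
        )
    return "".join(parts)
```

For example, `filter_html('<p class="hello">hello friend</p>')` rewrites only the text, leaving the `class` attribute untouched.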
Demoroniser is a Perl script that attempts to fix the gratuitously incompatible HTML generated by Microsoft applications. Many Microsoft programs use an 'enhanced' version of Latin-1 (Windows-1252) with extra characters such as curly quotation marks and dashes. When people paste these characters into web pages that are supposed to be ASCII or Latin-1, the pages don't display properly on non-MS platforms. Demoroniser replaces these MS characters with standard ASCII equivalents. It also fixes the wrongly nested tags generated by the HTML export in some MS applications.
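The core character substitution can be sketched in a few lines of Python. This is not Demoroniser's code (which is Perl). The replacement table is an assumption covering a handful of common Windows-1252 code points in the 0x80 to 0x9F range, and the tag-nesting repair is not shown.

```python
# Assumed ASCII replacements for some Windows-1252 code points in 0x80-0x9F.
CP1252_TO_ASCII = {
    0x85: b"...",   # horizontal ellipsis
    0x91: b"'",     # left single quotation mark
    0x92: b"'",     # right single quotation mark
    0x93: b'"',     # left double quotation mark
    0x94: b'"',     # right double quotation mark
    0x95: b"*",     # bullet
    0x96: b"-",     # en dash
    0x97: b"--",    # em dash
    0x99: b"(TM)",  # trade mark sign
}


def demoronise(data: bytes) -> bytes:
    """Replace known Windows-1252 bytes with ASCII; copy all other bytes as-is."""
    out = bytearray()
    for byte in data:  # iterating over bytes yields ints
        out += CP1252_TO_ASCII.get(byte, bytes([byte]))
    return bytes(out)
```

Working on raw bytes rather than decoded text mirrors the problem described above: the page claims to be ASCII or Latin-1, but the offending bytes are really Windows-1252.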
NetCrawler is the front end to a Web crawling system. This command-line application downloads all of the pages within a domain, then parses and processes all of the relevant content (images, text, audio, video), saving it in an XML document for later processing. It is definitely alpha quality, but has been used quite extensively.
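The per-page processing step might look like the following Python sketch, built on the standard library's `html.parser` and `xml.etree.ElementTree`. The XML element names and the same-domain test (an exact host match) are assumptions, not NetCrawler's actual format, and the fetch queue that drives the crawl across the whole domain is omitted.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import xml.etree.ElementTree as ET

# Map HTML tags to the (hypothetical) XML element used for each media kind.
MEDIA_TAGS = {"img": "image", "audio": "audio", "video": "video"}


class PageParser(HTMLParser):
    """Collects links, media URLs and text content from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.media = []  # list of (kind, absolute_url)
        self.text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag in MEDIA_TAGS and attrs.get("src"):
            self.media.append((MEDIA_TAGS[tag], urljoin(self.base_url, attrs["src"])))

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


def page_to_xml(url, html):
    """Return a <page> element describing one downloaded page."""
    parser = PageParser(url)
    parser.feed(html)
    parser.close()
    domain = urlparse(url).netloc
    page = ET.Element("page", url=url)
    for link in parser.links:
        # Keep only same-domain links; a full crawler would queue these for download.
        if urlparse(link).netloc == domain:
            ET.SubElement(page, "link", href=link)
    for kind, src in parser.media:
        ET.SubElement(page, kind, src=src)
    ET.SubElement(page, "text").text = " ".join(parser.text)
    return page
```

Serializing with `ET.tostring(page_to_xml(url, html), encoding="unicode")` gives the XML string that could be written out for later processing.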