HTMLDOC converts HTML files and Web pages into indexed HTML, PostScript, and PDF files suitable for online viewing and printing. It can be used as a standalone GUI application, in a batch document processing environment, as a Web-based report generation application, or in embedded environments to support printing of HTML content. It runs on all Unix platforms as well as Mac OS X and Windows 2000 and higher.
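For the batch-processing use mentioned above, a conversion could be scripted roughly as follows. This is a minimal sketch, not an official recipe: it assumes an `htmldoc` executable is on the PATH and relies on its `--webpage`, `-t`, and `-f` command-line options. The helper names `build_htmldoc_command` and `convert` are hypothetical and exist only for this illustration.

```python
import shutil
import subprocess
from typing import List, Optional


def build_htmldoc_command(inputs: List[str], output_pdf: str) -> List[str]:
    """Build an argv list for a non-interactive HTMLDOC run (hypothetical helper)."""
    if not inputs:
        raise ValueError("at least one input file or URL is required")
    # --webpage : treat the inputs as unstructured Web pages
    # -t pdf    : request PDF output
    # -f FILE   : write the result to FILE
    return ["htmldoc", "--webpage", "-t", "pdf", "-f", output_pdf, *inputs]


def convert(inputs: List[str], output_pdf: str) -> Optional[int]:
    """Run htmldoc if it is installed.

    Returns the process exit status, or None when no htmldoc
    executable can be found on the PATH (an assumption of this sketch).
    """
    if shutil.which("htmldoc") is None:
        return None
    return subprocess.run(build_htmldoc_command(inputs, output_pdf)).returncode
```

Calling `convert(["index.html"], "index.pdf")` would run the converter once; a batch job could loop over many input files and check each return value.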
Namazu is a full-text search system intended for easy use. It works both as a small- or medium-scale Web search engine and as a personal search system for email or other files. Supported document types: HTML, Mail/News, MHonArc, RFC, TeX (with detex), man (with groff), Word (with wvWare), PDF (with pdftotext), and plain text.
SWISH++ is a Unix-based file indexing and searching engine, typically used to index and search files on Web sites. It was originally based on SWISH-E but is a complete rewrite. Compared with SWISH-E, it is at least 10 times faster and can handle much larger numbers of files. Additionally, it has unique features such as selective non-indexing, on-the-fly filters, user-selectable stemming, and more.
WebGlimpse is a scalable, feature-rich search engine for indexing your Web site or any collection of local and remote sites you choose. Features include customizable output formats, custom ranking/ordering of hits, fuzzy matching, boolean queries, a Web administration interface for multiple archives, logging of queries, caching of results, and more. Localized search interfaces are provided in multiple languages including Spanish, German, French, Italian, Norwegian, Finnish, Russian, Hebrew, and others. It supports third-party filters for indexing PDF, Word, and Excel files. It is free for academic and most nonprofit users.
Doli (Documentation Libre Indexée) is a portable system for indexing and searching documentation. The system consists of an indexer and a Tcl-based Web server that provides the search interface. It was designed to provide a platform-independent method for searching HTML documentation. A PHP and MySQL interface is also included.
XMLDB uses an RDBMS to persist arbitrary XML documents. Because of this storage mechanism, searching for and retrieving documents is extremely quick. You can also perform XSL transformations on documents with surprising speed. The library can be used in any program to store libxml2 documents. A PHP module is also included, making XMLDB a complete three-tier Web application development suite.