HTMLDOC converts HTML files and Web pages into indexed HTML, PostScript, and PDF files suitable for online viewing and printing. It can be used as a standalone GUI application, in a batch document processing environment, as a Web-based report generation application, or in embedded environments to support printing of HTML content. It runs on all Unix platforms as well as Mac OS X and Windows 2000 and higher.
Namazu is a full-text search system intended for easy use. Not only does it work as a small or medium scale Web search engine, but also as a personal search system for email or other files. Supported document types: HTML, Mail/News, MHonArc, RFC, TeX (with detex), man (with groff), Word (with wvWare), PDF (with pdftotext) and plain text.
Xapian is a search engine library, scalable to collections containing hundreds of millions of documents. It's written in C++ with bindings for Perl, Python, PHP, Java, Tcl, C#, Ruby, and Lua. It is a highly adaptable toolkit that allows developers to easily add advanced indexing and search facilities to their own applications. It supports the Probabilistic Information Retrieval model and also a rich set of boolean query operators. Omega is a Web search application built upon the Xapian library. It can index a Web server's document tree (including HTML, PDF, OpenOffice, MS Word/Excel/Powerpoint/Works, WordPerfect, RTF, PS, etc.), or data exported from arbitrary sources (e.g. SQL databases).