Xapian is a search engine library, scalable to collections containing hundreds of millions of documents. It's written in C++ with bindings for Perl, Python, PHP, Java, Tcl, C#, Ruby, and Lua. It is a highly adaptable toolkit that allows developers to easily add advanced indexing and search facilities to their own applications. It supports the Probabilistic Information Retrieval model and also a rich set of boolean query operators. Omega is a Web search application built upon the Xapian library. It can index a Web server's document tree (including HTML, PDF, OpenOffice, MS Word/Excel/Powerpoint/Works, WordPerfect, RTF, PS, etc.), or data exported from arbitrary sources (e.g. SQL databases).
Berkeley DB XML is a native XML database engine for use within your product. Made available as a C++ library with language bindings for Java, Perl, Python, PHP, and Tcl, it integrates directly into your application (it is not a standalone database server). It provides XQuery access into a database of document containers. XML documents are stored and indexed in their native format using Berkeley DB as the transactional database engine.
SiSU (Structured information, Serialized Units) is a lightweight markup based, text structuring and publishing framework (that features granular search). With minimal markup of a plaintext file, it produces: plain-text, HTML, XHTML, XML, ODF, LaTeX, PDF, and populates an SQL database at an object/paragraph level for granular searches. Prepare documents using your text editor of choice, then use SiSU to generate the desired output formats. SiSU is controlled from the command line.