X-Hive/DB is a powerful native XML database designed for software developers who require advanced XML data processing and storage functionality within their applications. The comprehensive X-Hive/DB Java API contains methods for storing, querying, retrieving, transforming, and publishing XML data. X-Hive/DB supports all major W3C standards, such as XQuery, XPath, DOM, XPointer, XML Schema, and more.
XMLPublication is a set of tools to generate Web pages from (possibly large) desktop documents or other structured documents, such as books with paragraphs, or tabular data. It cuts documents into Web pages, and creates customizable multi-indices. All this is done through a repeatable process in which data is separated from presentation and user settings. It uses XML techniques, particularly XSLT and Ant.
Java Search Engine is a server-side search engine for Web sites, written completely in Java. It features HTML and PDF indexing, a built-in Web crawler, support for international encodings, word and phrase search, and results returned as quotations with highlighted words (like Google). It is available as an EJB, JSP, servlet, or Java API library. For non-Java environments, it is available as an XML server with XSLT support.
Docco is a personal document retrieval tool based on Apache's Lucene indexing engine. It allows you to create an index of files on your file system, which you can then search for keywords. Not only is this much faster than recursing through your file system on every search, it also offers extended query options such as wildcards and fuzzy search, as well as a visualization of result set intersections.
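As a rough illustration of what "fuzzy search" means here, the following sketch matches a query against a list of indexed terms by edit distance. It is neither Docco's nor Lucene's code: the class `FuzzyMatch`, its methods, and the sample terms are hypothetical, and plain Levenshtein distance is assumed as the similarity measure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative fuzzy term matching by Levenshtein edit distance (hypothetical helper). */
final class FuzzyMatch {

    /** Returns the minimum number of single-character insertions, deletions,
     *  and substitutions needed to turn a into b. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns every term within maxEdits of the query, in input order. */
    static List<String> fuzzyFind(String query, List<String> terms, int maxEdits) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (levenshtein(query, term) <= maxEdits) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Sample terms are made up for the demonstration.
        List<String> terms = Arrays.asList("lucent", "lucerne", "docco");
        System.out.println(fuzzyFind("lucene", terms, 1)); // prints [lucent, lucerne]
    }
}
```

A real index would not scan every term like this; the linear scan only keeps the sketch short.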
SearchAssist is a simple but practical search engine application that uses a ternary search tree. It uses Java's dynamic loading feature to make the search engine highly customizable, and takes Mozilla bookmarks as input. A Swing UI allows users to enter search words and view the results.
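For readers unfamiliar with the data structure, here is a minimal sketch of a ternary search tree in Java. It is not SearchAssist's implementation: the class `TernarySearchTree`, its methods, and the sample bookmark entries are all hypothetical. Each node holds one character and three children (smaller, equal, larger), which allows both exact lookup and prefix search.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal ternary search tree mapping non-empty string keys to values (illustrative only). */
final class TernarySearchTree<V> {

    private static final class Node<V> {
        final char ch;
        Node<V> lo, eq, hi; // smaller char, next char of the key, larger char
        V value;            // non-null only where a key ends
        Node(char ch) { this.ch = ch; }
    }

    private Node<V> root;

    /** Inserts the key, or overwrites its value if already present. */
    void put(String key, V value) {
        if (key == null || key.isEmpty() || value == null) {
            throw new IllegalArgumentException("key must be non-empty and value non-null");
        }
        root = put(root, key, 0, value);
    }

    private Node<V> put(Node<V> node, String key, int i, V value) {
        char c = key.charAt(i);
        if (node == null) node = new Node<>(c);
        if (c < node.ch) node.lo = put(node.lo, key, i, value);
        else if (c > node.ch) node.hi = put(node.hi, key, i, value);
        else if (i < key.length() - 1) node.eq = put(node.eq, key, i + 1, value);
        else node.value = value;
        return node;
    }

    /** Returns the value stored for the key, or null if absent. */
    V get(String key) {
        if (key == null || key.isEmpty()) return null;
        Node<V> node = find(key);
        return node == null ? null : node.value;
    }

    /** Returns the node matching the last character of s, or null. */
    private Node<V> find(String s) {
        Node<V> node = root;
        int i = 0;
        while (node != null) {
            char c = s.charAt(i);
            if (c < node.ch) node = node.lo;
            else if (c > node.ch) node = node.hi;
            else if (i < s.length() - 1) { node = node.eq; i++; }
            else return node;
        }
        return null;
    }

    /** Returns all keys starting with the given non-empty prefix, in sorted order. */
    List<String> keysWithPrefix(String prefix) {
        List<String> out = new ArrayList<>();
        if (prefix == null || prefix.isEmpty()) return out;
        Node<V> node = find(prefix);
        if (node == null) return out;
        if (node.value != null) out.add(prefix);
        collect(node.eq, new StringBuilder(prefix), out);
        return out;
    }

    private void collect(Node<V> node, StringBuilder path, List<String> out) {
        if (node == null) return;
        collect(node.lo, path, out);
        path.append(node.ch);
        if (node.value != null) out.add(path.toString());
        collect(node.eq, path, out);
        path.deleteCharAt(path.length() - 1);
        collect(node.hi, path, out);
    }

    public static void main(String[] args) {
        // Bookmark titles and URLs are made up for the demonstration.
        TernarySearchTree<String> bookmarks = new TernarySearchTree<>();
        bookmarks.put("mozilla", "https://example.org/mozilla");
        bookmarks.put("mozart", "https://example.org/mozart");
        bookmarks.put("lucene", "https://example.org/lucene");
        System.out.println(bookmarks.keysWithPrefix("moz")); // prints [mozart, mozilla]
    }
}
```

Compared with a hash map, this layout trades some lookup speed for the ability to enumerate keys by prefix, which suits search-as-you-type over a modest set of entries such as bookmarks.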
Xapian is a search engine library, scalable to collections containing hundreds of millions of documents. It is written in C++ with bindings for Perl, Python, PHP, Java, Tcl, C#, Ruby, and Lua. It is a highly adaptable toolkit that allows developers to easily add advanced indexing and search facilities to their own applications. It supports the Probabilistic Information Retrieval model as well as a rich set of boolean query operators. Omega is a Web search application built upon the Xapian library. It can index a Web server's document tree (including HTML, PDF, OpenOffice, MS Word/Excel/PowerPoint/Works, WordPerfect, RTF, PS, etc.), or data exported from arbitrary sources (e.g. SQL databases).
The Multivalent PDF Tools is a suite of tools for manipulating PDF documents. It includes tools for compressing, uncompressing (for hand editing), obtaining metadata, splitting and merging, encrypting and decrypting, validating, imposition (aka n-up), making page images, extracting text, and full-text indexing (with Lucene). The compress tool shrinks the PDF 1.5 Reference from 13.5MB to 8MB in PDF 1.5/Acrobat 6 format and down to 5.1MB in a new proposed "Compact" format.