Harvest is a system for collecting information and making it searchable through a Web interface. It can collect information over HTTP, FTP, and NNTP, and from local files. Supported formats include HTML, DVI, PS, full text, mail, man pages, news, troff, WordPerfect, C sources, and many more. Harvest's modular design makes it easy to add support for new formats.
Net::Z3950::SimpleServer is a Perl module that implements the server side of the Z39.50 (information retrieval) protocol. It hides the complexity of network exchanges, packet serialization, and session handling, so you only need to implement simple callbacks to support searching and record retrieval. It is the basis of the "Zoogle" project, a Z39.50 gateway to the Google web index.
WebGlimpse is a scalable, feature-rich search engine for indexing your Web site or any collection of local and remote sites you choose. Features include customizable output formats, custom ranking/ordering of hits, fuzzy matching, boolean queries, a Web administration interface for multiple archives, logging of queries, caching of results, and more. Localized search interfaces are provided in multiple languages, including Spanish, German, French, Italian, Norwegian, Finnish, Russian, Hebrew, and others. It supports third-party filters for indexing PDF, Word, and Excel files. It is free for academic and most nonprofit users.
XMLDB uses an RDBMS to persist arbitrary XML documents. Due to its storage mechanism, searching for and recalling documents is extremely quick. You can also perform XSL transformations on documents with surprising speed. The library can be used in any program to store libxml2 documents. A PHP module is also included, making XMLDB into a complete three-tier Web application development suite.
X-Hive/DB is a powerful native XML database designed for software developers who require advanced XML data processing and storage functionality within their applications. The comprehensive X-Hive/DB Java API contains methods for storing, querying, retrieving, transforming, and publishing XML data. X-Hive/DB supports all major W3C standards, such as XQuery, XPath, DOM, XPointer, XML Schemas, and more.
Java Search Engine is a server-side search engine for Web sites, written completely in Java. It features HTML and PDF indexing, a built-in Web crawler, support for international encodings, word and phrase search, and results returned as quotations with highlighted words (like Google). It is available as an EJB, a JSP, a servlet, or a Java API library. For non-Java environments, it is available as an XML server with XSLT support.