XapianFu is a Ruby library for working with Xapian databases. It builds on the GPL-licensed Xapian Ruby bindings, but provides an interface more in line with "The Ruby Way"(tm) and is considerably easier to use. For example, you can work almost entirely with Hash objects, and XapianFu will handle converting the Hash keys into Xapian term prefixes when indexing and when parsing queries. It also handles storing and retrieving Hash entries as Xapian::Document values. In effect, XapianFu gives you a persistent Hash with full-text indexing (and ACID transactions).
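To illustrate the idea of turning Hash keys into term prefixes, here is a minimal pure-Ruby sketch. It does not use XapianFu or the Xapian bindings; the helper name prefixed_terms and the exact prefix scheme ("X" followed by the upcased key, after Xapian's convention that user-defined prefixes start with "X") are assumptions made for illustration, and the library's real mapping may differ.

```ruby
# Hypothetical helper, not part of XapianFu: maps each Hash key to a
# term prefix and attaches that prefix to every word in the value.
# Assumed scheme: "X" + upcased key (e.g. :title -> "XTITLE").
def prefixed_terms(doc)
  doc.flat_map do |field, value|
    prefix = "X#{field.to_s.upcase}"
    # Lowercase the value and split it into word tokens.
    value.to_s.downcase.scan(/\w+/).map { |word| "#{prefix}#{word}" }
  end
end

terms = prefixed_terms({ title: "Cold Mountain", year: 2004 })
puts terms.inspect
```

With terms prefixed per field like this, a query restricted to one field only needs to look up terms carrying that field's prefix, which is roughly what the key-to-prefix conversion described above makes possible.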
Seeks is a Web search proxy, meta-search engine, and real-time P2P pattern matching network for social Web search. Its specific purpose is to group users whose queries are similar so they can share both the query results and their experience with these results. On this basis, Seeks allows true real-time, decentralized Web search to emerge. In the long term, this would remove the need for Web crawlers and third-party Web indexes. Most importantly, Seeks is intended to become a flagship for fair, transparent, user-controlled machinery for searching the Web over the Internet.
The Ex-Crawler Project is divided into three subprojects. The main part is the Ex-Crawler daemon server, a highly configurable and flexible Web crawler written in Java. It comes with its own socket server, with which you can manage the server, users, distributed grid/volunteer computing, and much more. Crawled information is stored in a database (currently MySQL, PostgreSQL, and MSSQL are supported). The second part is a graphical (Java Swing) distributed grid/volunteer computing client, including user computer state detection, based on the JADIF Project. The third part is the Web search engine, written in PHP. It comes with a Content Management System, user language detection and multi-language support, and templates using Smarty, including an application framework that is partly forked from Joomla 1.5, so that Joomla components can be adapted quickly.
Arch is an extension of Apache Nutch (a popular, highly scalable general-purpose search engine) for intranet search. It includes blind-test evaluation tools for comparing it with other search engines. Arch has many features critical for corporate environments, such as document-level security.