BlackRay is a relational database system designed to offer performance features commonly associated with search engines. It offers SQL support and sophisticated operational and management features. Load balancing and operational stability by means of N+1 redundancy are included. BlackRay is called a "Data Engine" since it combines traditional relational database features and SQL with the power and flexibility of search engines. It is a true hybrid, offering transaction support, data-versioned snapshots, and sophisticated function-based indices. Wildcard, phonetic, and fuzzy logic searches are also supported. BlackRay supports a subset of the SQL92 standard and provides JDBC/ODBC/native driver options via the PostgreSQL protocol, in addition to an API-based query option. The project is released under the GPLv2, with some drivers available under BSD-style licenses. Commercial support contracts are also available.
XapianFu is a Ruby library for working with Xapian databases. It builds on the GPL-licensed Xapian Ruby bindings, but provides an interface more in line with "The Ruby Way"(tm) and is considerably easier to use. For example, you can work almost entirely with Hash objects, and XapianFu will handle converting the Hash keys into Xapian term prefixes when indexing and when parsing queries. It also handles storing and retrieving hash entries as Xapian::Document values. In effect, XapianFu gives you a persistent Hash with full text indexing (and ACID transactions).
The Ex-Crawler Project is divided into three subprojects. The main part is the Ex-Crawler daemon server, a highly configurable and flexible Web crawler written in Java. It comes with its own socket server, with which you can manage the server, users, distributed grid/volunteer computing, and much more. Crawled information is stored in a database (currently MySQL, PostgreSQL, and MSSQL are supported). The second part is a graphical (Java Swing) distributed grid/volunteer computing client, including user computer state detection, based on the JADIF Project. The third part is the Web search engine, which is written in PHP. It comes with a Content Management System, user language detection and multi-language support, and templates using Smarty, including an application framework that is partly forked from Joomla 1.5, so that Joomla components can be adapted quickly.
Arch is an extension of Apache Nutch (a popular, highly scalable general-purpose search engine) for intranet search. It includes blind test evaluation tools for comparing its results to those of other search engines. Arch has many features critical for corporate environments, such as document-level security.
FM SiteSearch Pro is a quick and simple solution for adding professional search capability to a Web site. It comes with a relevance engine, control panel, large Web site support, optional MySQL support, search/keyword statistics, advanced searches, and specialized searches, and is fully customizable. It also comes with a setup interface.
Yioop! is a PHP search engine. It can be configured either as a general-purpose search engine for the whole Web or to provide search results for a set of URLs or domains. Yioop! can crawl pages or directly index archives such as ARC and WARC. It supports indexing several file formats, such as HTML, Atom, PDF, DOC, PPT, RTF, RSS, XML, SVG, PNG, JPG, BMP, GIF, and sitemaps. The Yioop! crawler can be deployed on one or many machines. It supports one or more crawl scheduler processes, as well as multiple fetchers and mirrors. Crawling respects robots.txt, including Crawl-delay. Yioop! crawls are stored in a Web archive format that is easy to move around, so crawling can be done on one machine and the results deployed elsewhere. Yioop! supports mixing of crawls. Yioop! comes with a search front end that can be localized as desired using a GUI. This GUI supports RTL languages. Management of crawls can also be done using this GUI. Yioop! can be configured in a straightforward manner to make use of file caching or memcache if available.
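What "respecting robots.txt including Crawl-delay" means for a crawler can be illustrated with a minimal sketch. It uses Python's standard urllib.robotparser module, not Yioop!'s own PHP code; the robots.txt content, the agent name "ExampleBot", and the example.com URLs are assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, as a site might publish it.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 5
Disallow: /private/
"""

AGENT = "ExampleBot"  # hypothetical crawler name


def load_rules(robots_txt):
    """Parse robots.txt text into a rule set that can be queried."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# A polite crawler checks every URL before fetching it...
allowed_public = rules.can_fetch(AGENT, "https://example.com/index.html")
allowed_private = rules.can_fetch(AGENT, "https://example.com/private/a.html")

# ...and waits this many seconds between requests to the same host.
delay_seconds = rules.crawl_delay(AGENT)

print(allowed_public, allowed_private, delay_seconds)
```

In a real crawler the delay would typically be enforced per host by the scheduler, so that fetchers working on other hosts are not held up.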
RestPose is a search engine. It is designed to take a set of documents and then, when given a query, to return ranked lists of documents which are a good match for that query. RestPose manages a set of internal indexes and provides an interface (over HTTP, in a fairly RESTful style, using JSON as the main transfer format) which allows documents to be submitted and removed from indexes, and which allows searches to be performed.
Libcolumbus is a small, error-tolerant search engine designed to deal with noisy data and typos. It will power the searches in the next generation of Ubuntu's HUD system as well as other searches. It has a fast implementation of the Levenshtein distance algorithm, which allows it to correct errors such as added and dropped letters (e.g. 'bar' -> 'bard'), changed letters ('ctr' -> 'car'), and transpositions ('acr' -> 'car'). It also allows the user to customize the error values. Libcolumbus is designed to be small, efficient, and easy to embed. It is programmed in C++ but also provides C and Python APIs.
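The kind of error-tolerant matching described for Libcolumbus can be sketched with a textbook edit distance. The following is not Libcolumbus's implementation or API: it is a plain dynamic-programming Levenshtein distance, extended with an adjacent-letter swap (the "optimal string alignment" variant) so that a transposition counts as one error. The function name and the cost parameters are hypothetical and stand in for the customizable error values.

```python
def edit_distance(a, b, insert_cost=1, delete_cost=1,
                  substitute_cost=1, transpose_cost=1):
    """Weighted edit distance from string a to string b.

    Illustrative only; all cost parameters are assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] = cheapest way to turn a[:i] into b[:j]
    d = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = i * delete_cost
    for j in range(1, cols):
        d[0][j] = j * insert_cost

    for i in range(1, rows):
        for j in range(1, cols):
            same = a[i - 1] == b[j - 1]
            best = min(
                d[i - 1][j] + delete_cost,       # drop a letter from a
                d[i][j - 1] + insert_cost,       # add a letter from b
                d[i - 1][j - 1] + (0 if same else substitute_cost),
            )
            # Two adjacent letters swapped, e.g. 'ac' vs 'ca'.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                best = min(best, d[i - 2][j - 2] + transpose_cost)
            d[i][j] = best
    return d[-1][-1]


print(edit_distance("bar", "bard"))  # added letter
print(edit_distance("ctr", "car"))   # changed letter
print(edit_distance("acr", "car"))   # transposed letters
```

Each of the three typos from the description comes out at distance 1 with the default costs. Raising one cost, for example substitute_cost=2, makes that kind of error count for more when ranking candidate matches.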