OpenSearchServer (OSS) is a stable, high-performance search engine and a suite of powerful full-text search algorithms. Documents can be indexed in sixteen languages: multilingual analyzers split sentences into words, then apply lemmatisation algorithms chosen according to the document's language. Numerous document formats are supported, including XML, HTML/XHTML, PDF, Word, PowerPoint, RTF, OpenOffice, plain text, MP3/MP4, Ogg, and FLAC. The Web interface, built on the Zkoss framework, provides an easy way to manage OSS. Integration is quick using either the PHP client or the API (XML over HTTP). OpenSearchServer's crawlers traverse Web sites, file systems, and databases to build your index rapidly and easily.
Associations Indexing Service (AIS) was originally built as an extension of human memory: it tags resources, URIs, bookmarks, and memos by storing them under personal keywords and associations, so that the information can later be retrieved quickly with the same keywords or queries, much as with popular search engines. It can also serve as a local search engine, acting as an automatic indexer of large file hierarchies (e.g., personal archives or file repositories). Because it is based on Lucene, the application is designed to stay fast as the index grows.
Basenji is an indexing and search tool designed for fast, easy indexing of media collections. Once indexed, removable media such as CDs and USB sticks can be browsed and searched for specific files very quickly, even when they are not connected to the computer. Besides file hierarchies and audio track listings, Basenji also presents extracted metadata (image dimensions, MP3 tags, etc.) and content previews of indexed media in a clean, straightforward user interface.
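Searching removable media while it is disconnected works because the index, not the medium, is what gets queried. A minimal Python sketch of that idea, assuming a catalog is simply a list of relative paths and sizes recorded while the volume is mounted, is shown here. The function names `catalog_volume` and `search_catalogs` are hypothetical and are not Basenji's API; Basenji also records metadata and previews, which this sketch omits.

```python
import os


def catalog_volume(mount_point, label):
    """Walk a mounted volume and record each file's relative path and size."""
    files = []
    for dirpath, _dirnames, filenames in os.walk(mount_point):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            files.append({
                "path": os.path.relpath(full_path, mount_point),
                "size": os.path.getsize(full_path),
            })
    return {"label": label, "files": files}


def search_catalogs(catalogs, needle):
    """Case-insensitive file-name search over saved catalogs; no media needed."""
    needle = needle.lower()
    return [
        (catalog["label"], entry["path"])
        for catalog in catalogs
        for entry in catalog["files"]
        if needle in os.path.basename(entry["path"]).lower()
    ]
```

Because each catalog is plain data, it could be saved (for example with `json.dump`) and searched long after the volume has been ejected.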
XapianFu is a Ruby library for working with Xapian databases. It builds on the GPL-licensed Xapian Ruby bindings but provides an interface more in line with "The Ruby Way"(tm) and is considerably easier to use. For example, you can work almost entirely with Hash objects, and XapianFu converts the Hash keys into Xapian term prefixes both when indexing and when parsing queries. It also handles storing and retrieving hash entries as Xapian::Document values. In effect, XapianFu gives you a persistent Hash with full-text indexing (and ACID transactions).
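The idea of turning Hash keys into term prefixes can be sketched in a few lines of Python. This is not XapianFu's code or the Xapian API: the helper names and the prefix scheme ("X" followed by the upper-cased key, loosely following Xapian's convention of upper-case prefixes for user-defined fields) are assumptions for illustration, and terms live in a plain Python set rather than a Xapian database.

```python
import re


def prefix_for(field):
    """Hypothetical prefix scheme: 'X' + upper-cased Hash key."""
    return "X" + field.upper()


def terms_for_document(doc):
    """Index each word both as a free-text term and as a key-prefixed term."""
    terms = set()
    for field, value in doc.items():
        for word in re.findall(r"\w+", str(value).lower()):
            terms.add(word)
            terms.add(prefix_for(field) + word)
    return terms


def terms_for_query(query):
    """Turn 'key:word' tokens into prefixed terms; bare words stay unprefixed."""
    terms = []
    for token in query.lower().split():
        if ":" in token:
            field, word = token.split(":", 1)
            terms.append(prefix_for(field) + word)
        else:
            terms.append(token)
    return terms


def matches(doc, query):
    """True if every query term appears among the document's terms (AND)."""
    return set(terms_for_query(query)) <= terms_for_document(doc)
```

Applying the same prefix function at index time and at query time is what lets a query such as `title:ruby` match only words that came from the `title` key.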
Multi-Dimensional Data Structure (mdds) is a C++ library that provides a collection of data structures designed to store and query multi-dimensional data efficiently under various filtering criteria. Different structures are optimized for different query needs. The library is header-only, so programs do not need to link against any additional shared library to use these data structures. All of the data structures are provided as C++ templates.
Stupa is an associative search engine. It lets you search for related documents with high performance and high precision. Because document data and inverted indexes are kept in memory, Stupa reflects document updates in search results in real time. Stupa can also be run as a server using Thrift.
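Keeping the inverted index in memory is what makes real-time updates cheap: adding or replacing a document only touches that document's postings. The Python sketch below assumes documents arrive already tokenized and ranks "related" documents by the number of terms they share, a deliberately simple stand-in for Stupa's associative-search scoring; the class and method names are hypothetical.

```python
from collections import Counter, defaultdict


class InMemoryIndex:
    """Toy in-memory inverted index (hypothetical; not Stupa's algorithm)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.docs = {}                    # doc id -> set of its terms

    def add(self, doc_id, terms):
        """Insert or replace a document; the change is visible immediately."""
        self.remove(doc_id)
        self.docs[doc_id] = set(terms)
        for term in self.docs[doc_id]:
            self.postings[term].add(doc_id)

    def remove(self, doc_id):
        """Drop a document and prune postings lists that become empty."""
        for term in self.docs.pop(doc_id, set()):
            self.postings[term].discard(doc_id)
            if not self.postings[term]:
                del self.postings[term]

    def related(self, doc_id, k=3):
        """Return up to k other documents, ranked by number of shared terms."""
        scores = Counter()
        for term in self.docs.get(doc_id, set()):
            for other in self.postings[term]:
                if other != doc_id:
                    scores[other] += 1
        return [other for other, _ in scores.most_common(k)]
```

Re-adding a document under an existing id replaces its old terms, so the next call to `related` already sees the new version without any batch rebuild.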
AudioScout is a distributed audio content indexing system. It can index a large collection of audio content so that unknown signals can later be recognized against it. Because it is robust to noise, different encodings, and other types of distortion, it can be used for a variety of applications, including detecting duplicate files and identifying music, as well as more sophisticated uses such as enforcing copyrights and ensuring lawful use of content.
Yioop! is a PHP search engine. It can be configured either as a general-purpose search engine for the whole Web or to provide search results for a set of URLs or domains. Yioop! can crawl pages or directly index archives such as ARC and WARC files. It can index file formats such as HTML, Atom, PDF, DOC, PPT, RTF, RSS, XML, SVG, PNG, JPG, BMP, GIF, and sitemaps. The Yioop! crawler can be deployed on one or many machines, and it supports one or more crawl scheduler processes as well as multiple fetchers and mirrors. Crawls respect robots.txt, including the Crawl-delay directive. Yioop! stores crawls in a Web archive format that is easy to move around, so crawling can be done on one machine and the results deployed elsewhere. Yioop! also supports mixing crawls. It comes with a search front end that can be localized through a GUI, which supports right-to-left (RTL) languages; crawls can also be managed from this GUI. Yioop! can be configured in a straightforward manner to use file caching, or memcache if available.
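What it means for a crawler to respect robots.txt and Crawl-delay can be sketched with Python's standard `urllib.robotparser` module. This is not Yioop!'s PHP implementation: the robots.txt content, the `ExampleBot` user agent, and the `polite_crawl` helper with its injectable `fetch` and `sleep` callables are hypothetical, chosen so the example runs without network access.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed from a string instead of fetched.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 2
Disallow: /private/
"""


def load_rules(text):
    """Build a RobotFileParser from robots.txt text already in memory."""
    rules = RobotFileParser()
    rules.parse(text.splitlines())
    return rules


def polite_crawl(rules, agent, urls, fetch, sleep=time.sleep):
    """Fetch allowed URLs in order, waiting Crawl-delay seconds between requests."""
    delay = rules.crawl_delay(agent) or 0
    results = {}
    for url in urls:
        if not rules.can_fetch(agent, url):
            continue  # path is disallowed for this user agent
        if results and delay:
            sleep(delay)  # pause before every request after the first
        results[url] = fetch(url)
    return results
```

Passing `sleep` as a parameter keeps the delay logic testable; a real fetcher would also cache the parsed rules per host rather than re-reading robots.txt for every URL.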
MightyString adds array functionality and other tools to Ruby strings, including matching, indexing, substitution, and deletion. MightyString::HTML.strip_html produces cleaner HTML-to-ASCII formatted output. It is an advanced block-"filtering" module that works very well; only extremely rare cases slip through its fingers.
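A rough idea of what HTML-to-text stripping involves can be given with Python's standard `html.parser` module. This is a hypothetical sketch, not MightyString's Ruby implementation: the set of tags treated as block boundaries, the decision to drop `script` and `style` content, and the whitespace-collapsing rule are all assumptions.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text, turning block-level tags into line breaks (assumed tag set)."""

    BLOCK = {"p", "div", "br", "li", "h1", "h2", "h3", "tr"}
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode entities like &amp;
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:  # ignore text inside script/style
            self.parts.append(data)


def strip_html(html):
    """Return plain text with collapsed spaces and no blank lines."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    lines = [" ".join(line.split()) for line in text.splitlines()]
    return "\n".join(line for line in lines if line)
```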