Xapian is a search engine library, scalable to collections containing hundreds of millions of documents. It is written in C++ with bindings for Perl, Python, PHP, Java, Tcl, C#, Ruby, and Lua. It is a highly adaptable toolkit that allows developers to easily add advanced indexing and search facilities to their own applications. It supports the Probabilistic Information Retrieval model as well as a rich set of boolean query operators. Omega is a Web search application built upon the Xapian library. It can index a Web server's document tree (including HTML, PDF, OpenOffice, MS Word/Excel/PowerPoint/Works, WordPerfect, RTF, PS, etc.), or data exported from arbitrary sources (e.g. SQL databases).
WACS is a tool for building adult Web sites. It is equally suitable for managing a private collection or building a commercial site. WACS has many best-of-breed features, including dynamic filtering, model catalogs, random sampling, galleries, automatic download, and a powerful search engine. WACS contains both a full ready-to-run Web environment and an extensive programming API implemented in both Perl and PHP.
WebGlimpse is a scalable, feature-rich search engine for indexing your Web site or any collection of local and remote sites you choose. Features include customizable output formats, custom ranking/ordering of hits, fuzzy matching, boolean queries, a Web administration interface for multiple archives, logging of queries, caching of results, and more. Localized search interfaces are provided in multiple languages including Spanish, German, French, Italian, Norwegian, Finnish, Russian, Hebrew, and others. It supports third-party filters for indexing PDF, Word, and Excel files. It is free for academic and most nonprofit users.
cthumb allows you to create Web albums of digital pictures with thumbnails, captions, and several different views of the collection, including (optionally) several languages and resolutions. An album is composed of a series of pages, each composed of a collection of pictures. You can have several annotations per picture, and can customize almost everything in the way the albums look on the screen.
A 'honeypot' is designed to detect server-side attacks. In contrast, a 'honeyclient' is designed to detect client-side attacks. Specifically, a honeyclient is a dedicated host that drives specially instrumented applications to access remote servers to see if those servers are behaving in a malicious manner (by compromising the client). Honeyclients can proactively detect exploits against client applications without known signatures. This framework uses a client-server model with SOAP messaging as the primary communication method, and uses the free version of VMware Server as a means of virtualizing the client environment.
Sherlock Holmes is a modular system for gathering, indexing, and searching textual and image data. Its most popular application is indexing Web pages, ranging from small Web sites to whole top-level domains, but other data sources, parsers, and user interfaces can be added easily.
PhiloLogic is a full-text database engine developed for humanities computing text analysis by the ARTFL Project and the Digital Library Development Center at the University of Chicago. It is optimized for fast searching across very large collections of documents. It currently supports TEI-Lite, TEI XML, and TEI SGML documents.
ALTSE is an alternative search engine technology. It can index up to a couple of million Web pages. ALTSE does not aim to replace existing search engine technologies such as Google, Yahoo, and so on. Instead, it aims to provide an affordable way to index Web pages for small businesses and organizations.