Sherlock Holmes

Sherlock Holmes is a modular system for gathering, indexing, and searching textual and image data. Its most popular application is, of course, indexing Web pages, from small Web sites to whole top-level domains, but other data sources, parsers, and user interfaces can be added easily.

Recent releases

  •  13 Apr 2009 12:55

    Release Notes: This release moves almost all features from the commercial version into the free version. The most prominent new features include the computation of dynamic page weights, a new gatherer that intelligently chooses which pages to crawl by their weights, site compression, and extended image and audio search. The indexer is much faster thanks to new sorting routines.

  •  16 May 2007 20:19

    Release Notes: This release can download JPEG, PNG, and GIF images, store their thumbnails, and search them by their reference texts. HTML documents can now be filtered by their content. This release significantly speeds up the indexer and the search server on multi-processor systems. It also runs on Darwin (Mac OS X).

  •  26 Jul 2006 04:33

    Release Notes: Sherlock now contains a new library for analyzing document contents. An existing index can now be quickly patched with new cards. The search server dumps the context of long cards better, and it can serve as a simple database by allowing browsing of all cards. A faster utility, "shcp", was added for copying the index to different machines. The configuration mechanism has been improved. Sherlock now supports the AMD64 architecture. Most modules have been substantially optimized, cleaned up, and corrected.

  •  20 Jun 2005 21:52

    Release Notes: The limit of indexing only the first 4096 words of a document has been removed. Two morphological stemmers and utilities to create tables for them have been added. The customization interface, the makefiles, and the configuration system have been greatly improved. A major cleanup of the code has been done, several bugs have been fixed, and many small features have been added.

  •  23 Feb 2005 18:50

    Release Notes: This release fixes a bug in the gatherer concerning compressed buckets. Upgrading is essential.
