Sherlock Holmes is a modular system for gathering, indexing, and searching textual and image data. Its most popular application is, of course, indexing Web pages, ranging from small Web sites to whole top-level domains, but other data sources, parsers, and user interfaces can be added easily.
| Tags | Internet, Web, Indexing/Search, Text Processing, Indexing |
|---|---|
| Licenses | GPL |
| Operating Systems | POSIX, Linux |
| Implementation | C, Perl |
Recent releases


Release Notes: This release moves almost all features from the commercial version into the free version. The most prominent new features include the computation of dynamic page weights, a new gatherer that intelligently chooses which pages to crawl by their weights, site compression, and extended image and audio search. The indexer is much faster thanks to new sorting routines.


Release Notes: This release can download JPEG, PNG, and GIF images, store their thumbnails, and search them through their reference texts. HTML documents can now be filtered by their content. This release significantly speeds up the indexer and the search server on multi-processor systems. It also runs on Darwin (Mac OS X).


Release Notes: Sherlock now contains a new library for analyzing the contents of documents. An existing index can now be quickly patched with new cards. The search server dumps the context of long cards better, and it can serve as a simple database by allowing browsing of all cards. A faster utility, "shcp", was added for copying the index to other machines. The configuration mechanism has been improved. Sherlock now supports the AMD64 architecture. Most modules have been substantially optimized, cleaned up, and corrected.


Release Notes: The limit that restricted indexing to the first 4096 words of a document has been removed. Two morphological stemmers and utilities to create tables for them have been added. The customization interface, the makefiles, and the configuration system have been greatly improved. The code has undergone a major cleanup, several bugs have been fixed, and many small features have been added.


Release Notes: This release fixes a bug in the gatherer concerning compressed buckets. Upgrading is essential.