Luke is a handy development and diagnostic tool for Apache Lucene. It opens existing Lucene indexes and lets you display and modify their contents in several ways. You can browse by document number or by term, view documents and copy them to the clipboard, retrieve a ranked list of the most frequent terms, execute a search and browse or analyze the results, and selectively delete documents from the index. You can also reconstruct the original document fields, edit them, and reinsert them into the index, as well as optimize indexes. Luke can be extended through plugins.
SCAN is a personal information retrieval framework that combines search, text analysis, tagging, and metadata functions for managing document collections. It is component-based software that uses plugins for specific features, and the basic SCAN platform can be extended with plugins for additional document formats and document location types.
TiTLi is a Google-like search tool for relational databases. It builds on Apache Lucene to provide an API and a GWT-based UI for searching multiple databases from various vendors simultaneously. Searches run against the index, which makes them fast; the database itself is queried only when a record is chosen.
TextSearch is a program to search through a set of text files in a directory structure. Each document is searched using a regular expression, and an overview of the results is shown as a tree structure. Clicking on a file opens it with the matches highlighted. Unlike many other programs, its focus is not on statistics, such as how often a word occurs in an entire corpus of files, but on occurrences within single files.
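The per-file regular-expression search described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not TextSearch's own code: the function name `search_tree` and the decision to skip files that are not valid UTF-8 are assumptions made for the example.

```python
import re
from pathlib import Path


def search_tree(root, pattern):
    """Return {relative file path: [(line number, line text), ...]}.

    Hypothetical helper: walks every file under `root` and records the
    lines that match `pattern`, grouped per file rather than per corpus.
    """
    regex = re.compile(pattern)
    results = {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # assumption: silently skip binary or unreadable files
        hits = [
            (number, line)
            for number, line in enumerate(text.splitlines(), start=1)
            if regex.search(line)
        ]
        if hits:
            results[path.relative_to(root).as_posix()] = hits
    return results
```

Calling `search_tree("docs", r"index\w*")` would return one entry per matching file, which maps naturally onto a tree view with files as nodes and matching lines as leaves.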
index.rb is a general indexing framework for Ruby. With it, you can create collections of documents, then index and search them. It supports the traditional inverted index as well as Latent Semantic Indexing (LSI). Input documents may be stemmed so that a query matches other forms of the same word. It also provides TextTiling to break input documents covering multiple topics into topic-specific sub-documents.
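To illustrate how an inverted index and stemming work together, here is a minimal sketch in Python. It does not reflect index.rb's actual Ruby API: the names `crude_stem`, `tokenize`, `build_index`, and `search` are hypothetical, and the suffix-stripping rule is a toy stand-in for a real stemmer such as Porter's.

```python
import re
from collections import defaultdict


def crude_stem(word):
    """Strip a few common English suffixes (toy rule, not a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase the text, split it into alphanumeric words, and stem each."""
    return [crude_stem(word) for word in re.findall(r"[a-z0-9]+", text.lower())]


def build_index(docs):
    """Map each stemmed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every stemmed query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

Because documents and queries pass through the same stemming step, a query for "indexes" also finds documents that only contain "indexing" or "index", which is the sense in which stemming makes queries more general.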
Isobel is a framework for building complex information retrieval and analysis systems. Isobel can be functionally divided into two subsystems: Isobel Gatherer (the crawling and filtering subsystem) and Isobel Analyzer (the analysis subsystem). The two subsystems can also be used separately. Isobel Gatherer offers ready-to-use services such as content fetching, scheduling, document format conversion, hyperlink graph storage and analysis, and content storage and indexing. A programmer can easily add new services. Isobel Analyzer uses the IBM UIMA architecture so that analysis components developed for that architecture can be reused.
Sikher is a desktop program designed to archive, search, and display the Sikh scriptures using advanced functions. It allows the common person to read and understand the messages contained in the Sikh scriptures through translations and transliterations in different languages, thereby breaking the language and geographical barriers between Gurbani (Sikh scriptures) and the world. Sikher is a robust, future-proof, cross-platform application that developers can use as a basis for similar internationalized and localized search applications.