Pyndex is a simple, fast full-text indexer implemented in Python. It also includes an easy-to-use Bayesian classifier. It uses Metakit as its storage back-end. It works well for quickly adding a search feature to an application and is also well suited to in-memory indexing and searching. It can handle phrase queries. It performs best in applications involving a few thousand documents, but its scaling is limited mainly by available memory.
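To illustrate what in-memory indexing with phrase queries involves, here is a minimal sketch of a positional inverted index. It is not Pyndex's API: the class name `PositionalIndex` and its methods `add`, `search`, and `phrase` are hypothetical, and the sketch keeps everything in plain dictionaries instead of a Metakit store.

```python
import re
from collections import defaultdict


class PositionalIndex:
    """Toy in-memory full-text index (hypothetical, not Pyndex's API)."""

    def __init__(self):
        # term -> {doc_id -> [positions of the term in that document]}
        self.postings = defaultdict(dict)

    @staticmethod
    def tokenize(text):
        # Lowercase the text and split it on non-word characters.
        return re.findall(r"\w+", text.lower())

    def add(self, doc_id, text):
        # Record the position of every token so phrases can be checked later.
        for position, term in enumerate(self.tokenize(text)):
            self.postings[term].setdefault(doc_id, []).append(position)

    def search(self, query):
        """Return ids of documents that contain every query term."""
        terms = self.tokenize(query)
        if not terms:
            return set()
        doc_sets = [set(self.postings.get(term, {})) for term in terms]
        return set.intersection(*doc_sets)

    def phrase(self, query):
        """Return ids of documents that contain the terms consecutively."""
        terms = self.tokenize(query)
        hits = set()
        for doc_id in self.search(query):
            # Try each occurrence of the first term as the phrase start.
            for start in self.postings[terms[0]][doc_id]:
                if all(start + offset in self.postings[term][doc_id]
                       for offset, term in enumerate(terms)):
                    hits.add(doc_id)
                    break
        return hits


index = PositionalIndex()
index.add(1, "Fast full-text search in Python")
index.add(2, "Python search, text and full index")

all_terms = index.search("full text")    # both documents contain both words
as_phrase = index.phrase("full text")    # only document 1 has them adjacent
```

Storing positions costs more memory than storing document ids alone, which is consistent with the note that this kind of index is bounded mostly by available memory.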
Xapian is a search engine library, scalable to collections containing hundreds of millions of documents. It's written in C++ with bindings for Perl, Python, PHP, Java, Tcl, C#, Ruby, and Lua. It is a highly adaptable toolkit that allows developers to easily add advanced indexing and search facilities to their own applications. It supports the probabilistic information retrieval model as well as a rich set of Boolean query operators. Omega is a Web search application built upon the Xapian library. It can index a Web server's document tree (including HTML, PDF, OpenOffice, MS Word/Excel/PowerPoint/Works, WordPerfect, RTF, PS, etc.), or data exported from arbitrary sources (e.g. SQL databases).
Dowser is a Web research and archiving tool that clusters results from search engines, associates words that appear in previous searches, and keeps a local cache of all the results you click on in a searchable database along with summaries and links to related information. It helps you to keep track of what you find, with no advertising.
Isobel is a framework for building complex information retrieval and analysis systems. Isobel can be functionally divided into two subsystems: Isobel Gatherer (the crawling and filtering subsystem) and Isobel Analyzer (the analysis subsystem). The two subsystems can also be used separately. Isobel Gatherer offers ready-to-use services such as content fetching, scheduling, document format conversion, hyperlink graph storage and analysis, and content storage and indexing. A programmer can easily add new services. Isobel Analyzer uses the IBM UIMA architecture, so it can reuse analysis components developed for that architecture.
tag-not-ed is a system that allows you to create and manage text documents by attaching tags to them. Later, documents can be retrieved by running queries on those tags (e.g., "show me all docs that deal with 'dogs' and 'cats'"). It is composed of a front-end (currently a mode for the jed text editor) and an indexer. The latter can be used to implement a rudimentary "tagging file system".
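The tag query described above ("all docs that deal with 'dogs' and 'cats'") amounts to intersecting the sets of documents attached to each tag. The following is a minimal sketch of that idea, not tag-not-ed's actual indexer: the `TagIndex` class, its methods, and the file names are hypothetical.

```python
from collections import defaultdict


class TagIndex:
    """Toy tag-to-document index (hypothetical, not tag-not-ed's indexer)."""

    def __init__(self):
        # tag -> set of document names carrying that tag
        self.docs_by_tag = defaultdict(set)

    def tag(self, doc, *tags):
        # Attach one or more tags to a document; tags are case-insensitive.
        for name in tags:
            self.docs_by_tag[name.lower()].add(doc)

    def query_all(self, *tags):
        """Return the documents that carry every one of the given tags."""
        if not tags:
            return set()
        doc_sets = [self.docs_by_tag.get(name.lower(), set()) for name in tags]
        return set.intersection(*doc_sets)


tags = TagIndex()
tags.tag("pets.txt", "dogs", "cats")
tags.tag("kennel.txt", "dogs")
tags.tag("litter.txt", "cats")

both = tags.query_all("dogs", "cats")   # only pets.txt has both tags
dogs = tags.query_all("dogs")           # pets.txt and kennel.txt
```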