Search::Xapian is a Perl XS frontend to the Xapian C++ search library. It is a fairly complete wrapper: most features of the Xapian library are made available for use from Perl. Xapian is a highly adaptable toolkit that allows developers to easily add advanced indexing and search facilities to their own applications. It supports the Probabilistic Information Retrieval model as well as a rich set of boolean query operators. It's fast and scalable to hundreds of millions of documents.
Twibright Twig is static HTML photo gallery software that organizes JPEG and PNG images into a directory structure and supports EXIF and JPEG comments. It is meant for experienced users rather than newbies. Three levels of downscaled images and three levels of thumbnails are generated. Each image is assigned a unique identifier to facilitate easy random linking from a master Web site. It handles reasonably large galleries (and is currently used for a 3GB one). Regeneration after images are added, changed, or deleted can be done with one script.
PDFTextStream is a PDF text and metadata extraction library available for Java and .NET. It supports all versions of the PDF document specification (including v1.7, used by Acrobat 8, 9, and X), extraction of text encoded using double-byte character sets (including Chinese, Japanese, and Korean), decryption of documents encrypted using 40-bit, 128-bit, 256-bit, and variable bit length ciphers, and extraction of all document metadata provided by PDF documents (including form data, bookmarks, and annotations). Easy integration with Jakarta Lucene is included, as well as interactive form update capability.
Amberfish is a general purpose text/XML retrieval utility. It features indexing of both free text and nested fields, built-in support for XML documents, structured queries allowing generalized field/tag paths, hierarchical result sets, automatic searching across multiple databases, efficient indexing, and relatively low memory requirements.
POPsearch is a desktop search engine that is designed to help you easily find information on your computer. With features that other search engines don't have, it lets you index your entire collection of email messages and files. As information is indexed, it is immediately available for analysis from any Web browser. When POPsearch is configured correctly, you can also access your data remotely with RSS feeds, email feeds, or from any computer that has a Web browser.
Tdbengine is an RDBMS with an integrated programming language. It is an enhancement of the well-known DOS-TDB, and is designed to handle databases on the Web. It connects to the Web server using the standard CGI interface, or runs on the command line. It is very small (about 400 KBytes), extremely quick, and easy to administer. Its features include full text indexing, an automatic data link system, and the script language EASY, which replaces the commonly used SQL with its modular code.
LinkGrammar-WN is a lexicon expansion for the Link Grammar Parser (LGP). The Link Grammar Parser is a syntactic parser of the English language that is capable of handling a wide variety of syntactic constructions and is considered quite robust. The LinkGrammar-WN project aims to import lexical information from WordNet in an effort to increase the size of the LGP lexicon. This project is relevant to anyone working on natural language parsing of English text.