Emdros is a corpus query system for storing and searching linguistically annotated text. It is very generic, supporting almost any kind of annotation from almost any linguistic theory. All linguistic levels of analysis are supported, including phonology, morphology, the lexical level, syntax, and discourse. The core libraries act as a middleware layer between a client and an underlying SQL database. MySQL, PostgreSQL, and SQLite are supported.
Comment is a command-line note taker that attaches notes to the directory they were written in. Notes are stored both in the local directory and in each user's home directory. It was developed as a low-impact tool for retaining fleeting information that is often needed at a later date. The dual storage system provides convenient access to prior notes, and all notes are stored in plain-text format.
SearchAssist is a simple but practical search engine application that uses a ternary search tree. It uses Java's dynamic loading feature to make the search engine highly customizable, and takes Mozilla bookmarks as input. A Swing UI allows users to enter search words and view the results.
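For readers unfamiliar with the ternary search tree named above, the following is a minimal sketch of the data structure in Python. It is not SearchAssist's code (which is written in Java); the class name `TernarySearchTree`, its methods, and the sample bookmark titles and URLs are all hypothetical and chosen only for illustration. Each node holds one character and three children: keys that sort lower, keys that continue with the next character, and keys that sort higher.

```python
class _Node:
    """One character of a key, with lower / equal / higher children."""
    __slots__ = ("ch", "lo", "eq", "hi", "value")

    def __init__(self, ch):
        self.ch = ch
        self.lo = self.eq = self.hi = None
        self.value = None  # set only on the node that ends a stored key


class TernarySearchTree:
    """Hypothetical minimal TST; stored values must not be None."""

    def __init__(self):
        self._root = None

    def insert(self, key, value):
        if not key:
            raise ValueError("key must be non-empty")
        self._root = self._insert(self._root, key, 0, value)

    def _insert(self, node, key, i, value):
        ch = key[i]
        if node is None:
            node = _Node(ch)
        if ch < node.ch:
            node.lo = self._insert(node.lo, key, i, value)
        elif ch > node.ch:
            node.hi = self._insert(node.hi, key, i, value)
        elif i + 1 < len(key):
            node.eq = self._insert(node.eq, key, i + 1, value)
        else:
            node.value = value
        return node

    def _find(self, key):
        """Return the node matching the last character of key, or None."""
        node, i = self._root, 0
        while node is not None and key:
            ch = key[i]
            if ch < node.ch:
                node = node.lo
            elif ch > node.ch:
                node = node.hi
            elif i + 1 < len(key):
                node, i = node.eq, i + 1
            else:
                return node
        return None

    def get(self, key):
        node = self._find(key)
        return node.value if node is not None else None

    def keys_with_prefix(self, prefix):
        """Return all stored keys starting with prefix, in sorted order."""
        out = []
        if not prefix:
            self._collect(self._root, "", out)
            return out
        node = self._find(prefix)
        if node is None:
            return out
        if node.value is not None:
            out.append(prefix)
        self._collect(node.eq, prefix, out)
        return out

    def _collect(self, node, prefix, out):
        if node is None:
            return
        self._collect(node.lo, prefix, out)
        if node.value is not None:
            out.append(prefix + node.ch)
        self._collect(node.eq, prefix + node.ch, out)
        self._collect(node.hi, prefix, out)


# Illustrative data only: bookmark titles mapped to URLs.
tst = TernarySearchTree()
tst.insert("mozilla", "https://www.mozilla.org/")
tst.insert("mozart", "https://example.org/mozart")
tst.insert("java", "https://example.org/java")
matches = tst.keys_with_prefix("moz")
```

A structure like this suits interactive search because a prefix lookup touches only the nodes along the typed characters, and the matching keys come back already sorted.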
Xapian is a search engine library, scalable to collections containing hundreds of millions of documents. It is written in C++ with bindings for Perl, Python, PHP, Java, Tcl, C#, Ruby, and Lua. It is a highly adaptable toolkit that allows developers to easily add advanced indexing and search facilities to their own applications. It supports the Probabilistic Information Retrieval model as well as a rich set of boolean query operators. Omega is a Web search application built upon the Xapian library. It can index a Web server's document tree (including HTML, PDF, OpenOffice, MS Word/Excel/PowerPoint/Works, WordPerfect, RTF, PS, etc.), or data exported from arbitrary sources (e.g. SQL databases).
isbnsearch provides a simple method for retrieving information about any book using only an ISBN or EAN barcode. It is intended to assist online libraries, user groups, and individual users, and is designed to serve as a distributed ISBN database query system. Users can choose to view the summary information (author, title, publisher, date, edition, subject, ISBN) as HTML, XML, or a pre-formatted SQL statement.
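Any tool that accepts an ISBN or EAN barcode as its lookup key will usually want to check the number before querying. The sketch below shows the standard check-digit rules for ISBN-10 and for ISBN-13/EAN-13, plus conversion from one to the other. It is not taken from isbnsearch; the function names are hypothetical, and whether isbnsearch validates input this way is an assumption.

```python
DIGITS = "0123456789"


def _clean(code):
    """Drop hyphens and spaces, and normalise a trailing 'x' to 'X'."""
    return code.replace("-", "").replace(" ", "").upper()


def _ean_weighted_sum(digits):
    """Sum the digits with alternating weights 1, 3, 1, 3, ..."""
    return sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(digits))


def is_valid_isbn10(code):
    """ISBN-10: weights 10..1, total divisible by 11; last digit may be X (10)."""
    s = _clean(code)
    if len(s) != 10:
        return False
    if any(c not in DIGITS for c in s[:9]) or s[9] not in DIGITS + "X":
        return False
    total = sum((10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(s))
    return total % 11 == 0


def is_valid_ean13(code):
    """ISBN-13 / EAN-13: alternating weights 1 and 3, total divisible by 10."""
    s = _clean(code)
    if len(s) != 13 or any(c not in DIGITS for c in s):
        return False
    return _ean_weighted_sum(s) % 10 == 0


def isbn10_to_isbn13(code):
    """Prefix 978, drop the old check digit, and compute the new one."""
    s = _clean(code)
    if not is_valid_isbn10(s):
        raise ValueError("not a valid ISBN-10: %r" % code)
    body = "978" + s[:9]
    check = (10 - _ean_weighted_sum(body) % 10) % 10
    return body + str(check)


converted = isbn10_to_isbn13("0-306-40615-2")
```

Normalising every key to the 13-digit form before lookup would let a single index serve both ISBN-10 and barcode input, which is one reasonable design for a query system of this kind.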
Ellogon is a multi-lingual, cross-platform, general-purpose language engineering environment, developed to aid both researchers in computational linguistics and companies that produce and deliver language engineering systems. As a language engineering platform, it offers an extensive set of facilities, including tools for processing and visualising textual/HTML/XML data and associated linguistic information, support for lexical resources (such as creating and embedding lexicons), and tools for creating annotated corpora, accessing databases, comparing annotated data, and transforming linguistic information into vectors for use with various machine learning algorithms.
The Revisionist is a tool for extracting and indexing hidden metadata (such as deleted or modified text) from large collections of MS Word files. It can operate on whole Web sites or on SMB or NFS directories. It is handy for pen-testing, or it can be used just to spot embarrassing secrets.