Docco is a personal document retrieval tool based on the Apache Lucene indexing engine. It allows you to create an index of files on your file system, which you can then search for keywords. Not only is it much faster than recursing through your file system on every search, it also offers extended query options such as wildcards and fuzzy search, as well as a visualization of result set intersections.
Dowser is a Web research and archiving tool that clusters results from search engines, associates words that appear in previous searches, and keeps a local cache of all the results you click on in a searchable database along with summaries and links to related information. It helps you to keep track of what you find, with no advertising.
Ellogon is a multi-lingual, cross-platform, general-purpose language engineering environment, developed to aid both researchers in computational linguistics and companies that produce and deliver language engineering systems. As a language engineering platform, it offers an extensive set of facilities, including tools for processing and visualising textual/HTML/XML data and associated linguistic information, support for lexical resources (such as creating and embedding lexicons), and tools for creating annotated corpora, accessing databases, comparing annotated data, and transforming linguistic information into vectors for use with various machine learning algorithms.
LaTeX Word Counter is a word counter for LaTeX files. It can also count words in simple text files. It has a simple, easy-to-use graphical interface. Logs can be shown for every file, including included LaTeX files. When a file is opened, the program determines whether it is LaTeX and then counts all the words. Afterwards, the user can recount the same file or open another.
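The detect-then-count flow described for LaTeX Word Counter can be illustrated with a short Python sketch. This is not the tool's own code: the function names `looks_like_latex` and `count_words` are hypothetical, and the stripping rules (drop the preamble, comments, inline math, and reference-style commands, keep the text inside other commands) are rough assumptions that a real LaTeX word counter would refine.

```python
import re

BEGIN_DOC = r"\begin{document}"
END_DOC = r"\end{document}"


def looks_like_latex(text):
    """Heuristic (assumption): treat the file as LaTeX if it has a document preamble."""
    return r"\documentclass" in text or BEGIN_DOC in text


def count_words(text):
    """Count words, stripping LaTeX markup first when the text looks like LaTeX."""
    if looks_like_latex(text):
        # Keep only the document body; the preamble is not prose.
        start = text.find(BEGIN_DOC)
        if start != -1:
            text = text[start + len(BEGIN_DOC):]
        end = text.find(END_DOC)
        if end != -1:
            text = text[:end]
        # Remove comments (an unescaped % up to the end of the line).
        text = re.sub(r"(?<!\\)%.*", "", text)
        # Remove inline math such as $x^2$.
        text = re.sub(r"\$[^$]*\$", " ", text)
        # Remove commands whose argument is a key or name, not prose.
        text = re.sub(
            r"\\(cite|ref|label|begin|end|input|include)\*?(\[[^\]]*\])?\{[^}]*\}",
            " ",
            text,
        )
        # Remove remaining command names but keep their braced text.
        text = re.sub(r"\\[a-zA-Z]+\*?(\[[^\]]*\])?", " ", text)
        text = re.sub(r"[{}]", " ", text)
    return len(re.findall(r"[A-Za-z0-9']+", text))
```

In this sketch, section titles and emphasized text count as words, while citation keys and environment names do not. Files pulled in with `\input` or `\include` are not followed here.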
Luke is a handy development and diagnostic tool for Apache Lucene. It accesses existing Lucene indexes and allows you to display and modify their contents in several ways. A user can browse by document number or by term; view documents and copy them to the clipboard; retrieve a ranked list of the most frequent terms; execute a search and browse the results; analyze search results; selectively delete documents from the index; reconstruct the original document fields, edit them, and reinsert them into the index; optimize indexes; and much more. Luke can also be extended through plugins.
Marko is a simple toolset that allows you to create Markov chain databases from a corpus (or two) of text and then compare unknown texts to these databases. For any two Marko databases, you can calculate the probability that the unknown text is related to one rather than the other. Possible applications include intelligent mail filtering, plagiarism detection, and historical research.
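The comparison Marko performs can be illustrated with a minimal Python sketch. It does not reproduce Marko's database format or its scoring method, which this description does not specify. As assumptions, it uses a first-order, word-level Markov chain, add-one smoothing, and equal prior weight for the two corpora; the function names `build_chain`, `log_likelihood`, and `prob_first_over_second` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_chain(text):
    """Count word-to-word transitions in a corpus (a first-order Markov chain)."""
    words = text.lower().split()
    chain = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        chain[prev][nxt] += 1
    return chain


def log_likelihood(chain, text, vocab_size):
    """Log-probability of the transitions in `text` under `chain`.

    Add-one smoothing (an assumption of this sketch) keeps unseen
    transitions from producing a probability of zero.
    """
    words = text.lower().split()
    total = 0.0
    for prev, nxt in zip(words, words[1:]):
        followers = chain.get(prev, Counter())
        seen = followers[nxt]
        total += math.log((seen + 1) / (sum(followers.values()) + vocab_size))
    return total


def prob_first_over_second(chain_a, chain_b, text):
    """Probability that `text` came from chain A rather than chain B,
    assuming both sources are equally likely beforehand."""
    vocab = set(text.lower().split())
    for chain in (chain_a, chain_b):
        for word, followers in chain.items():
            vocab.add(word)
            vocab.update(followers)
    diff = (log_likelihood(chain_b, text, len(vocab))
            - log_likelihood(chain_a, text, len(vocab)))
    # Clamp to avoid overflow in exp() for very long texts.
    diff = max(-700.0, min(700.0, diff))
    return 1.0 / (1.0 + math.exp(diff))
```

For mail filtering, for example, one chain could be built from known junk mail and the other from wanted mail; a result above 0.5 then means the unknown message looks more like the first corpus.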