UverseWiki is a modular open-source PHP framework for text processing. Unlike most existing solutions, it is not based on regular expressions; instead, it uses a recursive descent parser to build a document object model (DOM). Once parsing has finished and the DOM has been produced, the original source is discarded and all further operations are performed on the document tree: nodes can be altered, serialized, or rendered into a particular format (such as HTML or RTF). The wiki syntax is language-neutral, and processing is carried out in UTF-8.
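The parse-then-render pipeline described for UverseWiki can be illustrated with a minimal sketch. It is written in Python rather than PHP and does not use UverseWiki's actual syntax or API: the toy markup (`*bold*`, `_italic_`) and the names `Node`, `Parser`, `to_html`, and `to_plain` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Node:
    kind: str                      # "doc", "bold", "italic", or "text"
    text: str = ""                 # used only by "text" nodes
    children: list = field(default_factory=list)


# Hypothetical toy markup: *bold* and _italic_, which may nest.
MARKERS = {"*": "bold", "_": "italic"}


class Parser:
    """Recursive descent parser: one method per grammar rule."""

    def __init__(self, source):
        self.src = source
        self.pos = 0

    def parse(self):
        return Node("doc", children=self.parse_inline(closer=None))

    def parse_inline(self, closer):
        # inline := (text | span)*, stopping at the enclosing closer
        nodes = []
        while self.pos < len(self.src):
            ch = self.src[self.pos]
            if ch == closer:
                break              # the caller consumes the closing marker
            if ch in MARKERS:
                nodes.append(self.parse_span(ch))
            else:
                nodes.append(self.parse_text())
        return nodes

    def parse_span(self, marker):
        # span := marker inline marker (an unclosed span ends at end of input)
        self.pos += 1              # consume the opening marker
        children = self.parse_inline(closer=marker)
        if self.pos < len(self.src):
            self.pos += 1          # consume the closing marker
        return Node(MARKERS[marker], children=children)

    def parse_text(self):
        start = self.pos
        while self.pos < len(self.src) and self.src[self.pos] not in MARKERS:
            self.pos += 1
        return Node("text", text=self.src[start:self.pos])


def to_html(node):
    """Render the tree as HTML; the original source is not consulted."""
    if node.kind == "text":
        return escape(node.text)
    inner = "".join(to_html(child) for child in node.children)
    tag = {"bold": "b", "italic": "i"}.get(node.kind)
    return f"<{tag}>{inner}</{tag}>" if tag else inner


def to_plain(node):
    """Render the same tree as plain text, dropping all markup."""
    if node.kind == "text":
        return node.text
    return "".join(to_plain(child) for child in node.children)
```

Because both renderers walk the same tree, a node can be altered once (for example, by changing its `kind`) and every output format reflects the change.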
jbookshelf is an electronic book collection organizer and reader. It collects plain files (text, HTML, PDF, etc.) and offers basic collection search, internal viewers for plain text, HTML, RTF, and PDF, notes and citations, and book categories. Planned features include full-text collection search, FB2 support, and portability (removable drive support).
xMarkup is a command-line and GUI utility for multipurpose processing of a set of text files. It can be used to generate or edit the navigational cross-references within a set of HTML documents, analyze and convert the structure or content of SGML, XML, HTML, or text documents, split or merge text files according to specified rules, analyze and extract data, generate scripts, and more. xMarkup includes a built-in procedural language, a simple dialect of the Icon programming language, that can be used to describe processing rules.
JOrtho is a spell checker for Java. The library works with any JTextComponent from the Swing framework and checks spelling as you type. Its dictionaries are based on the free Wiktionary.org and are available for multiple languages. Its features include highlighting of potentially misspelled words, a context menu with suggested corrections, and a context menu option for changing the checking language. Nine languages are currently available for spell checking: English, German, French, Spanish, Italian, Russian, Polish, Dutch, and Arabic.
Invenio (formerly CDSware) is a suite of applications that provides the framework and tools for building and managing an autonomous digital library server. It complies with the Open Archives Initiative metadata harvesting protocol (OAI-PMH) and uses MARC 21 as its underlying bibliographic standard. Its flexibility and performance make it a comprehensive solution for the management of document repositories of moderate to large size.
OmegaT is a translation memory application intended for professional translators. It does not translate for you (software that does this is called "machine translation"). It features fuzzy matching, match propagation, simultaneous processing of multiple-file projects, simultaneous use of multiple translation memories, and external glossaries. Supported document file formats include plain text, HTML, and OpenOffice.org/StarOffice. It supports Unicode (UTF-8), so it can be used with non-Latin alphabets. It is compatible with other translation memory applications through TMX Level 1.
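As a rough illustration of what fuzzy matching against a translation memory means, the sketch below scores a new segment against stored source segments and returns the close ones with their translations. It is not OmegaT's algorithm, which the description above does not specify: the similarity measure (Python's `difflib.SequenceMatcher`), the function name `fuzzy_matches`, the 0.7 threshold, and the sample memory are all assumptions.

```python
from difflib import SequenceMatcher


def fuzzy_matches(segment, memory, threshold=0.7):
    """Return (score, source, target) tuples for stored segments similar to `segment`.

    `memory` maps source-language segments to their stored translations.
    Results are sorted with the highest similarity score first.
    """
    results = []
    for source, target in memory.items():
        # ratio() is 1.0 for identical strings and approaches 0.0 for unrelated ones.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold:
            results.append((score, source, target))
    return sorted(results, reverse=True)


# Hypothetical English-to-French memory with two stored segments.
memory = {
    "Open the file.": "Ouvrez le fichier.",
    "Close the window.": "Fermez la fenêtre.",
}

for score, source, target in fuzzy_matches("Open the files.", memory):
    print(f"{score:.0%}  {source!r} -> {target!r}")
```

A translator would then adapt the suggested translation rather than accept it blindly, since a fuzzy match by definition differs from the new segment.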
Pinot is a D-Bus service that crawls and indexes your documents and monitors them for changes. It also provides a GTK-based user interface that lets you query the index built by the service, or your favorite Web search engine, and display and analyze the results. It makes full use of the advanced indexing and search facilities offered by Xapian, and features language detection, dynamic document summaries, easy labelling of documents, and internal support for common file types. The D-Bus interface allows easy integration with other applications.