Babeldoc is a framework and set of applications to process documents for business-to-business and other Internet/integration applications. It is primarily intended for text documents, especially XML, but supports a wide range of operations and data types. It has a sophisticated journaling system that supports replaying and reprocessing. Babeldoc is pipeline based and supports numerous ways to combine the pipeline stages in a dynamically reconfigurable fashion. It has a GUI and a Web-based console for document processing and monitoring, and comes with tools for the transformation of flatfile data to XML, archival, and cryptography. It can also scan various data sources based on sophisticated constraints.
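To illustrate the idea of a journaled, reconfigurable pipeline, here is a minimal Python sketch. It is not Babeldoc's API: the `Pipeline` class, the `replay_from` method, and the stage names are all hypothetical, chosen only to show how a journal of intermediate documents lets a run be reprocessed from a later stage after the pipeline has been changed.

```python
from typing import Callable, List, Tuple

# A stage is a (name, function) pair; the function maps a document to a document.
Stage = Tuple[str, Callable[[str], str]]


class Pipeline:
    """Hypothetical pipeline that journals the document after every stage."""

    def __init__(self, stages: List[Stage]) -> None:
        self.stages = list(stages)
        # Journal entry i holds (stage name, document produced by stage i).
        self.journal: List[Tuple[str, str]] = []

    def run(self, document: str, start: int = 0) -> str:
        """Run stages from position `start` onward, journaling each result."""
        for name, stage in self.stages[start:]:
            document = stage(document)
            self.journal.append((name, document))
        return document

    def replay_from(self, index: int) -> str:
        """Reprocess from stage `index`, using the journaled output of the
        previous stage as input. Assumes a full run has been journaled."""
        if not 0 < index <= len(self.journal):
            raise ValueError("no journaled input for that stage")
        _, snapshot = self.journal[index - 1]
        del self.journal[index:]  # discard the results being recomputed
        return self.run(snapshot, start=index)


pipeline = Pipeline([
    ("strip", str.strip),
    ("upper", str.upper),
    ("wrap", lambda d: f"<doc>{d}</doc>"),
])
first = pipeline.run("  hello ")

# Reconfigure the last stage, then reprocess from it without redoing earlier work.
pipeline.stages[2] = ("wrap", lambda d: f"<msg>{d}</msg>")
second = pipeline.replay_from(2)
```

Keeping the intermediate documents, rather than only the final output, is what makes the replay possible without rerunning the earlier stages.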
Emdros is a corpus query system for storing and searching linguistically annotated text. It is very generic, supporting almost any kind of annotation from almost any linguistic theory. All linguistic levels of analysis are supported, including phonology, morphology, the lexical level, syntax, and discourse. The core libraries act as a middleware layer between a client and an underlying SQL database. MySQL, PostgreSQL, and SQLite are supported.
Nabu is a simple framework that extracts chunks of various types of information from documents written in simple text files (written with reStructuredText conventions, parsed with docutils) and that stores this information (including the document) in a remote database for later retrieval. The processing and extraction of the document is handled on a server, and there is a small and simple client that is used to push the files to the server for processing and storage. The client requires only Python to work. The presentation layer is left unspecified: you can use whichever Web application framework you like.
Outwit is a suite of tools based on the Unix tool design principles, allowing the processing of Windows application data with sophisticated data manipulation pipelines. The outwit tools offer access to the Windows clipboard, the registry, the event log, relational databases, document properties, and shell links.
Pagex is designed to be a barebones content management system. It was originally built as a time-saving precursor to new Web development projects. It is a completely functioning "bolt-on" solution aimed at SEO workers, Web developers, and the like. It works purely from the URL used to request a page, simplifying any number of dynamic Web address issues caused when using mod_rewrite and allowing you to set a range of variables depending on the URL.
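As a rough illustration of setting variables purely from the request URL, the following Python sketch derives a few page variables from the path and query string. It does not reflect Pagex's actual code or language: the function `variables_from_url` and the variable names `section`, `page`, `depth`, and `query` are assumptions made for the example.

```python
from urllib.parse import parse_qs, urlsplit


def variables_from_url(url: str) -> dict:
    """Derive page variables from the URL alone (hypothetical scheme)."""
    parts = urlsplit(url)
    # Non-empty path segments, e.g. "/products/widgets/blue" -> 3 segments.
    segments = [s for s in parts.path.split("/") if s]
    return {
        "section": segments[0] if segments else "home",
        "page": segments[-1] if len(segments) > 1 else "index",
        "depth": len(segments),
        # Keep only the first value of each query parameter.
        "query": {k: v[0] for k, v in parse_qs(parts.query).items()},
    }


deep = variables_from_url("http://example.com/products/widgets/blue?lang=en")
root = variables_from_url("http://example.com/")
```

Here `deep["section"]` is `"products"` and `deep["page"]` is `"blue"`, while the bare site root falls back to the `home` and `index` defaults.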
The SODA Native XML Database System is a native XML database that provides efficient management of large amounts of XML data. It is based on a multi-user, client-server architecture with a generic query processing layer that can easily support different query languages. In this lightweight version, user-defined indexes and query optimizations have been removed; however, full transaction support (commits and rollbacks) and crash recovery are available.
SiSU (Structured information, Serialized Units) is a lightweight, markup-based text structuring and publishing framework that features granular search. With minimal markup of a plaintext file, it produces plain text, HTML, XHTML, XML, ODF, LaTeX, and PDF, and populates an SQL database at an object/paragraph level for granular searches. Prepare documents using your text editor of choice, then use SiSU to generate the desired output formats. SiSU is controlled from the command line.
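The idea of populating a database at the object/paragraph level can be sketched in Python with the standard `sqlite3` module. This is not SiSU's schema or code: the `objects` table, its columns, and the functions `load_objects` and `search` are hypothetical, and paragraphs are simply assumed to be separated by blank lines.

```python
import sqlite3


def load_objects(conn: sqlite3.Connection, doc: str, text: str) -> int:
    """Store each paragraph of `text` as a numbered object; return the count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS objects (doc TEXT, num INTEGER, body TEXT)"
    )
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    conn.executemany(
        "INSERT INTO objects VALUES (?, ?, ?)",
        [(doc, n, p) for n, p in enumerate(paragraphs, start=1)],
    )
    return len(paragraphs)


def search(conn: sqlite3.Connection, term: str) -> list:
    """Return (document, object number) pairs whose text contains `term`."""
    rows = conn.execute(
        "SELECT doc, num FROM objects WHERE body LIKE ? ORDER BY doc, num",
        (f"%{term}%",),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
sample = (
    "First paragraph about markup.\n\n"
    "Second paragraph about search.\n\n"
    "Third, also about markup."
)
count = load_objects(conn, "demo", sample)
hits = search(conn, "markup")
```

Because each paragraph is its own row, a search returns the specific objects that match rather than just the containing document, which is the sense of "granular" used above.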
XMLDB uses an RDBMS to persist arbitrary XML documents. Due to its storage mechanism, searching for and recalling documents is extremely quick. You can also perform XSL transformations on documents with surprising speed. The library can be used in any program to store libxml2 documents. A PHP module is also included, making XMLDB into a complete three-tier Web application development suite.