XMLDB uses an RDBMS to persist arbitrary XML documents. Due to its storage mechanism, searching for and recalling documents is extremely quick. You can also perform XSL translation on documents with surprising speed. The library can be used in any program to store libxml2 documents. A PHP module is also included, making XMLDB into a complete three-tier Web application development suite.
Emdros is a corpus query system for storing and searching linguistically annotated text. It is very generic, supporting almost any kind of annotation from almost any linguistic theory. All linguistic levels of analysis are supported, including phonology, morphology, the lexical level, syntax, and discourse. The core libraries act as a middleware layer between a client and an underlying SQL database. MySQL, PostgreSQL, and SQLite are supported.
The libmba package is a collection of mostly independent C modules potentially useful to any project. There are the usual ADTs including a linkedlist, hashmap, pool, stack, and varray, a flexible memory allocator, CSV parser, path canonicalization routine, I18N text abstraction, configuration file module, portable semaphores, condition variables, and more. The code is designed so that individual modules can be integrated into existing codebases rather than requiring the user to commit to the entire library. The code has no typedefs, few comments, and extensive man pages and HTML documentation.
The SODA Native XML Database System is a native XML database that provides efficient management of large amounts of XML data. It is based on a multi-user, client-server architecture with a generic query processing layer that can easily support different query languages. In this lightweight version, user- defined indexes and query optimizations have been removed, however full transaction support (commits and rollbacks) and crash recovery are available.
Babeldoc is a framework and set of applications to process documents for business-to-business and other Internet/integration applications. It is primarily intended for text documents, especially XML, but supports a wide range of operations and data types. It has a sophisticated journaling system that supports replaying and reprocessing. Babeldoc is pipeline based and supports numerous ways to combine the pipeline stages in a dynamically reconfigurable fashion. It has a GUI and a Web-based console for document processing and monitoring, and comes with tools for the tranformation of flatfile data to XML, archival, and cryptography. Additionally it is able to scan various data sources based on sophisticated constraints.
Outwit is a suite of tools based on the Unix tool design principles allowing the processing of Windows application data with sophisticated data manipulation pipelines. The outwit tools offer access to the Windows clipboard, the registry, the event log, relational databases, document properties, shell links, and the event log.
SiSU (Structured information, Serialized Units) is a lightweight markup based, text structuring and publishing framework (that features granular search). With minimal markup of a plaintext file, it produces: plain-text, HTML, XHTML, XML, ODF, LaTeX, PDF, and populates an SQL database at an object/paragraph level for granular searches. Prepare documents using your text editor of choice, then use SiSU to generate the desired output formats. SiSU is controlled from the command line.
Nabu is a simple framework that extracts chunks of various types of information from documents written in simple text files (written with reStructuredText conventions, parsed with docutils) and that stores this information (including the document) in a remote database for later retrieval. The processing and extraction of the document is handled on a server, and there is a small and simple client that is used to push the files to the server for processing and storage. The client requires only Python to work. The presentation layer is left unspecified: you can use whichever Web application framework you like.