Emdros is a corpus query system for storing and searching linguistically annotated text. It is very generic, supporting almost any kind of annotation from almost any linguistic theory. All linguistic levels of analysis are supported, including phonology, morphology, the lexical level, syntax, and discourse. The core libraries act as a middleware layer between a client and an underlying SQL database. MySQL, PostgreSQL, and SQLite are supported.
X-Hive/DB is a powerful native XML database designed for software developers who require advanced XML data processing and storage functionality within their applications. The comprehensive X-Hive/DB Java API contains methods for storing, querying, retrieving, transforming, and publishing XML data. X-Hive/DB supports all major W3C standards, such as XQuery, XPath, DOM, XPointer, XML Schemas, and more.
Ellogon is a multi-lingual, cross-platform, general-purpose language engineering environment, developed in order to aid both researchers who are doing research in computational linguistics, as well as companies who produce and deliver language engineering systems. As a language engineering platform, it offers an extensive set of facilities, including tools for processing and visualising textual/HTML/XML data and associated linguistic information, support for lexical resources (like creating and embedding lexicons), tools for creating annotated corpora, accessing databases, comparing annotated data, or transforming linguistic information into vectors for use with various machine learning algorithms.
Net::Z3950::SimpleServer is a Perl module which implements the server side of the Z39.50 (information retrieval) protocol. It hides the complexity of network exchanges, packet serialization, and session handling. You are required only to implement simple callbacks to support searching and record retrieval. It is the basis of the "Zoogle" project, which is a Z39.50 gateway to the Google web index.
SILVERCODERS DocStorage is a utility to improve document management. You can have one database for all invoices, guarantees, protocols, and other documents. DocStorage can extract plain text from documents in doc, XLS, PPT, PDF, RTF, ODT, ODS, ODP, docx, XLSX, PPTX, and many other formats. It can use an OCR engine to extract plain text even from scanned documents. It can perform global fulltext search in all documents regardless of format. It supports document versioning, document duplicate detection, document notes, and document signing. It provides full integration with software suites like Microsoft Office and OpenOffice.
Itzam/Sharp is a C# class library for creating and manipulating keyed-access database files containing variable-length, random access records. Information is referenced by a user-defined key value; indexes may be combined with or separate from data. Itzam/Sharp implements the Itzam engine (see Itzam/Core and Itzam/Java) in 100% managed C#. At 32K, the Itzam/Sharp engine is both small and powerful, and it works with any .NET language, including F# and Visual Basic.
VectorSpace Database (VSDB) provides multi-dimensional similarity search capability in a robust server package. It is a server that allows any socket-capable programming language to post and search vectorspaces of multi-dimensional data. Data can be of any base datatype (e.g. text, objects, dating profiles, sessions, ecommerce orders, etc.). VSDB also offers a clustering capability that can display groupings of data based on common dimensions. A built-in thesaurus feature can help bridge multiple-similar-dimensions in search or clustering.