Apertium is a machine translation platform, initially aimed at related-language pairs, but recently expanded to deal with more divergent language pairs (such as English-Catalan). The platform provides a language-independent machine translation engine, tools to manage the linguistic data necessary to build a machine translation system for a given language pair, and linguistic data for a growing number of language pairs.
Algraeph is a tool for manual alignment of linguistic graphs, such as phrase structure trees or dependency structures, where each node corresponds to a subsequence of the analyzed input sentence. It allows you to express the similarity between two graphs by aligning their nodes and attaching relation labels to these alignments. Graphs are read from one or more graphbanks (or treebanks) in the GraphML or Alpino formats. Alignment relations are user-defined and are stored in a simple XML format, which can be used for further processing. The resulting parallel graph corpus is a useful data set for many tasks in computational linguistics and natural language processing.
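As a rough illustration of the idea described above (node-to-node alignments carrying a relation label, saved as simple XML), the following Python sketch round-trips a few labeled alignments. The element and attribute names (`alignments`, `link`, `from`, `to`, `relation`) and the relation labels are hypothetical and chosen only for this example; they are not Algraeph's actual file format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Alignment:
    """One labeled link between a node in each of two graphs."""
    source_node: str  # node id in the first graph (hypothetical id scheme)
    target_node: str  # node id in the second graph
    relation: str     # user-defined relation label


def alignments_to_xml(alignments: List[Alignment]) -> str:
    """Serialize alignments to a small XML document (illustrative schema)."""
    root = ET.Element("alignments")
    for a in alignments:
        ET.SubElement(
            root,
            "link",
            {"from": a.source_node, "to": a.target_node, "relation": a.relation},
        )
    return ET.tostring(root, encoding="unicode")


def alignments_from_xml(text: str) -> List[Alignment]:
    """Parse the XML produced by alignments_to_xml back into objects."""
    root = ET.fromstring(text)
    return [
        Alignment(link.get("from"), link.get("to"), link.get("relation"))
        for link in root.iter("link")
    ]


pairs = [Alignment("s1", "t3", "equals"), Alignment("s2", "t1", "restates")]
xml_text = alignments_to_xml(pairs)
restored = alignments_from_xml(xml_text)
```

Keeping the alignments separate from the graphs themselves means the same graphbank can be reused with different sets of relation labels.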
The XEVM is an XML processing engine. It is a multi-threaded, publish/subscribe environment for dynamic programming on an event-driven state machine, with TCP communications, tight, fault-free memory management, powerful set algebra, and a magical database. It is written entirely in C++ (25,000 LOC) with a thin porting layer; there are implementations for POSIX (Mac/Linux) and Win32. The XEVM processes XEPL (the Xepl Engine Programming Language).
The NCBI C++ Toolkit provides portable libraries and applications for assisting genetic science. These include libraries for networking, SQL and BerkeleyDB access, CGI and HTML handling, ASN.1 and XML handling, sequence alignment engines, sequence retrieval engines, BLAST database engines, FLTK and OpenGL graphics toolkits, and basic system utilities.
Isobel is a framework for building complex information retrieval and analysis systems. Isobel is functionally divided into two subsystems: Isobel Gatherer (the crawling and filtering subsystem) and Isobel Analyzer (the analysis subsystem). The two subsystems can also be used separately. Isobel Gatherer offers ready-to-use services such as content fetching, scheduling, document format conversion, hyperlink graph storage and analysis, and content storage and indexing. A programmer may easily add new services. Isobel Analyzer is built on the IBM UIMA architecture, so analysis components developed for UIMA can be reused.
SENTENSA Knowledge Miner is a platform-independent tool for full-text search. SENTENSA uses robust methods of indexing and searching text, leveraging more than 20 years of experience in information retrieval. SENTENSA products offer advanced text retrieval solutions for large databases, making searches for key information fast and effective. You can index on one platform and query on another.
ProteomeCommons.org IO Framework is a proper Java framework for handling spectra and peak lists. The framework can read and write a number of different spectra and peak list formats, and it provides a simple, intuitive Java object model for working with spectra or peak lists. All classes support two methods of handling peak list and spectrum data: in-memory or stream. The goal of this framework is to support all the popular MS and MS/MS data formats, and to eliminate any time or effort involved in figuring out how to read and write peak list or spectrum files.
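The difference between the in-memory and stream styles mentioned above can be sketched in a few lines of Python. This is not the framework's Java API: the function names and the two-column "m/z intensity" text format are assumptions made up for the example.

```python
import io
from typing import Iterator, List, TextIO, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def stream_peaks(handle: TextIO) -> Iterator[Peak]:
    """Yield peaks one at a time, so only one line is held in memory."""
    for line in handle:
        line = line.strip()
        # Skip blank lines and '#' comments (an assumed convention).
        if not line or line.startswith("#"):
            continue
        mz, intensity = line.split()[:2]
        yield float(mz), float(intensity)


def load_peaks(handle: TextIO) -> List[Peak]:
    """Read the whole peak list into memory for random access."""
    return list(stream_peaks(handle))


sample = "# comment\n100.5 20\n\n200.25 35.5\n"

# Stream style: process each peak as it is read.
total_intensity = sum(i for _, i in stream_peaks(io.StringIO(sample)))

# In-memory style: keep every peak available at once.
peaks = load_peaks(io.StringIO(sample))
```

The stream style suits files too large to hold in memory, while the in-memory style is more convenient when peaks need to be sorted or revisited.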