Algraeph is a tool for manual alignment of linguistic graphs, such as phrase structure trees or dependency structures, where each node corresponds to a subsequence of the analyzed input sentence. It allows you to express the similarity between two graphs by aligning their nodes and attaching relation labels to these alignments. Graphs are read from one or more graphbanks (or treebanks) in the GraphML or Alpino formats. Alignment relations are user-defined and are stored in a simple XML format, which can be used for further processing. The resulting parallel graph corpus is a useful data set for many tasks in computational linguistics and natural language processing.
Apertium is a machine translation platform, initially aimed at related-language pairs, but recently expanded to deal with more divergent language pairs (such as English-Catalan). The platform provides a language-independent machine translation engine, tools to manage the linguistic data necessary to build a machine translation system for a given language pair, and linguistic data for a growing number of language pairs.
Ciao is a complete Prolog system subsuming ISO-Prolog with a novel modular design which allows both restricting and extending the language. Ciao extensions currently include feature terms (records), higher-order, functions, constraints, objects, persistent predicates, a good base for distributed execution (agents), and concurrency. Libraries also support WWW programming, sockets, and external interfaces (C, Java, TCL/Tk, relational databases, etc.). An Emacs-based environment, a stand-alone compiler, and a toplevel shell are also provided.
Ellogon is a multi-lingual, cross-platform, general-purpose language engineering environment, developed in order to aid both researchers who are doing research in computational linguistics, as well as companies who produce and deliver language engineering systems. As a language engineering platform, it offers an extensive set of facilities, including tools for processing and visualising textual/HTML/XML data and associated linguistic information, support for lexical resources (like creating and embedding lexicons), tools for creating annotated corpora, accessing databases, comparing annotated data, or transforming linguistic information into vectors for use with various machine learning algorithms.
FramerD is a semi-structured object database integrated with a Scheme-based scripting language which supports multi-lingual programming (with pervasive Unicode), a stable module system for programming in the large, distributed applications (via an extensible RPC protocol), non-deterministic (PROLOG-like) evaluation for search and set operations, multi-threaded program execution, extensive tools for text and language analysis, built-in HTML/XML/MIME parsers, and intuitive (CGI- and FastCGI-based) Web scripting. The built-in object database robustly supports millions of objects and indexed access to those objects, both through disk files and networked servers.
SENTENSA Knowledge Miner is a platform independent tool for searching any text. SENTENSA uses robust methods of indexing and searching text, leveraging experience from more than 20 years of information retrieval. SENTENSA products offer advanced text retrieval solutions for large databases that will make your searches for key information fast and effective. You can index on one platform and query on another.
The XEVM is an XML processing engine. It's a multi-threaded, Pub/Sub environment for dynamic programming on an event-driven state machine with TCP communications, tight fault free memory management, powerful set algebra, and a magical database. It is 100% C++ (25,000 LOC), with a thin porting layer; there are implementations for POSIX (Mac/Linux) and Win32. The XEVM is for processing XEPL (the Xepl Engine Programming Language).
Isobel is a framework to build complex information retrieval and analysis systems. Isobel can be functionally divided in two subsytems, Isobel Gatherer (the crawling and filtering subsystem) and Isobel Analyzer (the analysis subsystem). The two subsytems can also be used separately. Isobel Gatherer offers ready-to-use services like content fetching, scheduling, document format conversion, Hyperlink graph storage and analysis, content storage and indexing. A programmer may easily add new services. Isobel Analyzer uses the IBM UIMA architecture to reuse the analysis components developed for this architecture.