Apache OpenNLP is a machine learning based toolkit for the processing of natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text processing services.
DKPro WSD provides UIMA components which encapsulate corpus readers, linguistic annotators, lexical semantic resources, WSD algorithms, and evaluation and reporting tools. You configure the components, or write new ones, and arrange them into a data processing pipeline. DKPro WSD is modular and flexible. Components which provide the same functionality can be freely swapped. You can easily run the same algorithm on different data sets, or test several different algorithms on the same data set.
DKPro Lab is a lightweight framework for parameter sweeping experiments. It allows the user to set up experiments consisting of multiple interdependent tasks in a declarative manner with minimal overhead. Parameters are injected into tasks using via annotated class fields. Data produced by a task for any particular parameter configuration is stored and re-used whenever possible to avoid needless recalculation of results. Reports can be attached to each task to post-process the experimental results and present them in a convenient manner (e.g., as tables or charts).
JWKTL (Java-based Wiktionary Library) is an application programming interface for the free multilingual online dictionary Wiktionary. Wiktionary is collaboratively constructed by volunteers and continually growing. JWKTL enables efficient and structured access to the information encoded in the English, German, and Russian Wiktionary language editions, including sense definitions, part of speech tags, etymology, example sentences, translations, semantic relations, and many other lexical information types.
DKPro Similarity is a framework for text similarity. Its goal is to provide a comprehensive repository of text similarity measures which are implemented using standardized interfaces. The framework is designed to complement DKPro Core, a collection of software components for natural language processing (NLP) based on the Apache UIMA framework. DKPro Similarity comprises a wide variety of measures ranging from ones based on simple n-grams and common subsequences to high-dimensional vector comparisons and structural, stylistic, and phonetic measures. In order to promote the reproducibility of experimental results and to provide reliable, permanent experimental conditions for future studies, DKPro Similarity also comes with a set of full-featured experimental setups which can be run out-of-the-box and used for future systems to built upon.
Apache UIMA DUCC (Distributed UIMA Cluster Computing) is a cluster management system providing tooling, management, and scheduling facilities that automate the scale-out of applications written using the UIMA framework. Core UIMA provides a generalized framework for applications that process unstructured information such as human language, but does not provide a scale-out mechanism. UIMA-AS extends UIMA and provides a scale-out mechanism for distributing UIMA pipelines over a cluster of computing resources, but does not provide job or cluster management of the resources. DUCC extends UIMA-AS by defining a formal job model that closely maps to a standard UIMA pipeline. Around this job model DUCC provides cluster management services to automate the scale-out of UIMA pipelines over computing clusters.