DKPro Core is a collection of software components for natural language processing (NLP) based on the Apache UIMA framework. Many powerful and state-of-the-art NLP components are already freely available in the NLP research community. New and improved components are being developed and released continuously. The components cover the whole range of NLP-related processing tasks. DKPro Core provides wrappers for such third-party tool as well as original NLP components. DKPro Core builds heavily on uimaFIT which allows for rapid and easy development of NLP processing pipelines.
DKPro Similarity is a framework for text similarity. Its goal is to provide a comprehensive repository of text similarity measures which are implemented using standardized interfaces. The framework is designed to complement DKPro Core, a collection of software components for natural language processing (NLP) based on the Apache UIMA framework. DKPro Similarity comprises a wide variety of measures ranging from ones based on simple n-grams and common subsequences to high-dimensional vector comparisons and structural, stylistic, and phonetic measures. In order to promote the reproducibility of experimental results and to provide reliable, permanent experimental conditions for future studies, DKPro Similarity also comes with a set of full-featured experimental setups which can be run out-of-the-box and used for future systems to built upon.