16 projects tagged "NLP"

Download Website Updated 15 Oct 2013 eoconv

Pop 57.52
Vit 11.60

eoconv is a tool that converts text files to and from various Esperanto transliteration schemes (e.g. h-notation, x-notation) and text encodings, including Unicode, ISO-8859-3, HTML, LaTeX, and ASCII.
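As a rough illustration of what converting between x-notation and Unicode involves, here is a minimal Python sketch. It is not eoconv's implementation or interface; the function names `x_to_unicode` and `unicode_to_x` are hypothetical, and the sketch assumes every `x` that follows c, g, h, j, s, or u marks an accented letter.

```python
import re

# Base letters that take a diacritic in Esperanto, mapped to their accented forms.
_BASE_TO_ACCENTED = {
    "c": "ĉ", "g": "ĝ", "h": "ĥ", "j": "ĵ", "s": "ŝ", "u": "ŭ",
    "C": "Ĉ", "G": "Ĝ", "H": "Ĥ", "J": "Ĵ", "S": "Ŝ", "U": "Ŭ",
}
_ACCENTED_TO_BASE = {accented: base for base, accented in _BASE_TO_ACCENTED.items()}


def x_to_unicode(text):
    """Replace x-notation digraphs (cx, gx, ...) with accented Unicode letters.

    Assumption: any 'x' or 'X' directly after one of the six base letters
    is a notation marker, not a literal letter.
    """
    return re.sub(
        r"([cghjsuCGHJSU])[xX]",
        lambda match: _BASE_TO_ACCENTED[match.group(1)],
        text,
    )


def unicode_to_x(text):
    """Replace accented Esperanto letters with their x-notation digraphs."""
    return "".join(
        _ACCENTED_TO_BASE[ch] + "x" if ch in _ACCENTED_TO_BASE else ch
        for ch in text
    )
```

For example, `x_to_unicode("ehxosxangxo cxiujxauxde")` yields `eĥoŝanĝo ĉiuĵaŭde`, and `unicode_to_x` reverses it.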

Download Website Updated 04 Mar 2010 Acopost

Pop 24.68
Vit 1.00

ACOPOST is a set of freely available POS taggers modeled after well-known techniques. The programs are written in C (aiming for extreme portability and code correctness/safety) and run under various Unix flavors (and probably even under Windows). ACOPOST currently consists of four taggers that are based on different frameworks: Maximum Entropy Tagger (MET), Trigram Tagger (T3, based on Hidden Markov Models), Error-driven Transformation-based Tagger (TBT or Brill Tagger), and Example-based tagger (ET).

No download No website Updated 15 Oct 2010 Language Detection Library for Java

Pop 55.75
Vit 35.76

The Language Detection Library for Java is a Java library to detect the natural languages in which texts are written. This task is also known as "language identification", "language guessing", and "language recognition". It has over 99% precision for more than 40 languages. The supported languages are Afrikaans, Arabic, Bulgarian, Bengali, Czech, German, Greek, English, Spanish, Persian, Finnish, French, Gujarati, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Kannada, Korean, Macedonian, Malayalam, Marathi, Nepali, Dutch, Punjabi, Polish, Portuguese, Romanian, Russian, Slovak, Somali, Albanian, Swedish, Swahili, Tamil, Telugu, Thai, Tagalog, Turkish, Ukrainian, Urdu, Vietnamese, and Simplified/Traditional Chinese.

No download No website Updated 11 Dec 2010 AlchemyAPI Android SDK

Pop 82.43
Vit 34.96

The AlchemyAPI Android SDK enables real-time semantic analysis of text, HTML, or Internet-hosted Web page content. The SDK provides mechanisms to extract Concepts, Named Entities, Keywords and Tags, and Categories, to clean HTML into text, and to detect languages. It can analyze text in eight different languages: English, French, German, Italian, Portuguese, Russian, Spanish, and Swedish. Example code and a demo application are included to help you get started.

No download Website Updated 15 Dec 2011 foma

Pop 54.41
Vit 1.00

foma is a compiler, programming language, and C library for constructing finite-state automata and transducers for various uses. It has specific support for many natural language processing applications such as producing morphological analyzers. Although NLP applications are probably the main use of foma, it is sufficiently generic to use for a large number of purposes. It comes with an xfst-compatible interface and regular expression language. The library contains efficient implementations of all classical automata/transducer algorithms: determinization, minimization, epsilon-removal, composition, and boolean operations. More advanced construction methods are also available: context restriction, quotients, first-order regular logic, transducers from replacement rules, etc.

No download No website Updated 14 Feb 2014 TreeTagger for Java

Pop 163.52
Vit 7.49

TreeTagger for Java (TT4J) is a Java wrapper around the popular TreeTagger package by Helmut Schmid, a language-independent part-of-speech tagger and lemmatizer. It was written with a focus on platform independence and easy integration into applications.

No download Website Updated 16 Feb 2012 jWeb1T

Pop 36.72
Vit 1.00

jWeb1T is a Java tool for efficiently searching n-gram data in the Web 1T 5-gram corpus format. It is based on a binary search algorithm that finds the n-grams and returns their frequency counts in logarithmic time. As the corpus is stored in many files, a simple index is used to retrieve the files containing the n-grams.
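The lookup scheme described here (a file index followed by a binary search within the selected file) can be sketched in Python as follows. This is an illustration only, not jWeb1T's code or API: the names `build_index`, `find_count`, and `lookup` are hypothetical, and the sketch assumes each file is already loaded in memory as a sorted list of `(ngram, count)` pairs, whereas the real tool works on corpus files on disk.

```python
from bisect import bisect_right


def build_index(files):
    """Build a simple index from the first n-gram of each file to its name.

    `files` maps a file name to a sorted list of (ngram, count) pairs.
    Returns two parallel lists: sorted first n-grams and matching file names.
    """
    index = sorted(
        (entries[0][0], name) for name, entries in files.items() if entries
    )
    first_keys = [key for key, _ in index]
    names = [name for _, name in index]
    return first_keys, names


def find_count(entries, ngram):
    """Binary search a sorted list of (ngram, count) pairs; 0 if absent."""
    lo, hi = 0, len(entries) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        key, count = entries[mid]
        if key == ngram:
            return count
        if key < ngram:
            lo = mid + 1
        else:
            hi = mid - 1
    return 0


def lookup(ngram, files, first_keys, names):
    """Pick the one file that could hold `ngram`, then search inside it."""
    # The candidate file is the last one whose first n-gram is <= ngram.
    pos = bisect_right(first_keys, ngram) - 1
    if pos < 0:
        return 0  # ngram sorts before everything in the corpus
    return find_count(files[names[pos]], ngram)
```

Both steps are logarithmic: one bisection over the file index and one over the entries of a single file. The sketch compares strings with Python's default ordering, which is an assumption about how the data is sorted.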

Download No website Updated 14 Oct 2013 UBY

Pop 85.90
Vit 4.04

UBY is a large-scale unified lexical-semantic resource for natural language processing (NLP) based on the ISO standard Lexical Markup Framework (LMF).

No download No website Updated 23 Dec 2013 DKPro Core

Pop 58.51
Vit 2.34

DKPro Core is a collection of software components for natural language processing (NLP) based on the Apache UIMA framework. Many powerful and state-of-the-art NLP components are already freely available in the NLP research community. New and improved components are being developed and released continuously. The components cover the whole range of NLP-related processing tasks. DKPro Core provides wrappers for such third-party tools as well as original NLP components. DKPro Core builds heavily on uimaFIT, which allows for rapid and easy development of NLP pipelines.

Download No website Updated 14 Apr 2014 Infovore

Pop 616.15
Vit 68.15

Infovore is a map/reduce framework for processing large RDF data sets such as Freebase and DBpedia. It is based on Hadoop.
