2 projects tagged "tokenizer"

ULS (updated 07 Jul 2012)

Popularity: 20.76
Vitality: 25.54

ULS is a library for general-purpose lexical analysis with support for UTF-8. It comes with C/C++ libraries and a couple of tools for Linux and Windows, and aims to be an intuitive, practical, flexible, and optimized tokenizer. ULS can instantiate multiple lexical-analysis objects, each of which can process multiple (nested) inputs in different languages. Each language is specified in a configuration file with the *.ulc suffix. ULS tokenizes UTF-8 encoded input files, which may use words from a localized language as identifiers. It can also stream the tokens from many input files to another output, such as a file. The stream can be stored in a *.uls file and replayed from it whenever necessary.

DKPro Core (updated 23 Dec 2013)

Popularity: 57.06
Vitality: 2.33

DKPro Core is a collection of software components for natural language processing (NLP) based on the Apache UIMA framework. Many powerful, state-of-the-art NLP components are already freely available in the NLP research community, and new and improved components are being developed and released continuously. Together they cover the whole range of NLP-related processing tasks. DKPro Core provides wrappers for such third-party tools as well as original NLP components. It builds heavily on uimaFIT, which allows for rapid and easy development of NLP pipelines.
