3 projects tagged "text mining"
Pymur provides Python bindings to the C++-based Lemur Toolkit. The Lemur Toolkit is an open-source toolkit designed to facilitate research in language modeling and information retrieval. Lemur supports a wide range of industrial and research language applications, such as ad-hoc retrieval, site search, and text mining.
xMarkup is a command-line and GUI utility for multipurpose processing of a set of text files. It can be used to generate or edit the navigational cross-references within a set of HTML documents, analyze and convert the structure or content of SGML, XML, HTML, or text documents, split or merge text files according to specified rules, analyze and extract data, generate scripts, and more. xMarkup includes a built-in procedural language for describing the processing rules. This language is a simple dialect of the Icon programming language.