1820 projects tagged "Text Processing"
Glimpse is a powerful indexing and query system that lets you search through all your files quickly. It can be used by individuals for their personal file systems as well as by organizations for large data collections. Glimpse is the default search engine in Harvest.
GNU m4 is an implementation of the traditional Unix macro processor. It is mostly SVR4 compatible, although it has some extensions (for example, handling more than 9 positional parameters to macros). GNU m4 also has built-in functions for including files, running shell commands, doing arithmetic, etc. Autoconf needs GNU m4 for generating `configure' scripts, but not for running them.
GNU TeXmacs is a free wysiwyw (what you see is what you want) editing platform with special features for scientists. The software aims to provide a unified and user-friendly framework for editing structured documents with different types of content: text, mathematics, graphics, and interactive content. TeXmacs can also be used as an interface to many external systems for computer algebra, numerical analysis, and statistics. Users can write new presentation styles and add new features to the editor using Scheme.
GPP is a general-purpose preprocessor with customizable syntax, suitable for a wide range of preprocessing tasks. Its independence from any programming language makes it much more versatile than cpp, while its syntax is lighter and more flexible than that of m4. Because the syntax is fully customizable, GPP can process text files, HTML, or source code in a variety of languages equally well.
The Groff package contains the traditional UN*X text formatting tools troff, nroff, tbl, eqn, and pic. These utilities, together with the man package, are essential for displaying the online manual pages. Output can be produced in a number of formats including plain ASCII and PostScript. All the standard macro packages are supported. A number of other utilities are also included together with several fonts.
Grok is a library of Java components for performing various natural language tasks. These include several preprocessing tasks, chart parsing, a large categorial grammar for English (induced from the Penn Treebank), and some knowledge representation components (basic coreference, salience tracking, etc.). The library also has a companion kit which provides a GUI for the components, several of which are implementations of interfaces in the Quipu OpenNLP API.
The Guava tools are a set of Perl scripts for HTML pre-processing. You can create multi-page documents with tables of contents, or use templates to give a consistent look to a set of pages. All output is passed through the C preprocessor, so you can use directives such as #include, #define, and #if. There are also built-in macros for producing dates, cross-references, etc.