958 projects tagged "Text Processing"
a2ps is an Any-to-PostScript filter. It processes plain text files and also pretty-prints 66 popular languages. It can also delegate the processing of some files to other filters (such as groff, texi2dvi, dvips, and gzip), which allows uniform treatment (n-up, page selection, etc.) of heterogeneous files.
ANTLR (ANother Tool for Language Recognition) is a language tool that provides a framework for constructing recognizers, compilers, and translators from grammatical descriptions containing C++, Java, or Sather actions. It is similar to the popular compiler generator YACC, but ANTLR is much more powerful and easier to use. ANTLR-produced parsers are not only highly efficient but also human-readable and human-debuggable (especially with the interactive ParseView debugging tool). ANTLR can generate parsers, lexers, and tree parsers in C++, Java, or Sather. ANTLR itself is currently written in Java.
GNU Aspell is a spell checker designed to eventually replace Ispell. It can be used either as a library or as an independent spell checker. Its main feature is that it suggests better replacements for a misspelled English word than just about any other spell checker. Unlike Ispell, Aspell can easily check documents in UTF-8 without a special dictionary, and it does its best to respect the current locale setting. Other advantages over Ispell include support for using multiple dictionaries at once and intelligent handling of personal dictionaries when more than one Aspell process is running at the same time.
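To make the idea of "suggesting replacements" concrete, here is a minimal sketch in Python. It is not Aspell's algorithm and does not use the Aspell library: it ranks words from a tiny, hypothetical word list by string similarity using the standard `difflib` module. The names `WORDS` and `suggest` are assumptions made for illustration only.

```python
import difflib

# Hypothetical mini word list; a real checker would load a full dictionary.
WORDS = ["spelling", "spewing", "spell", "selling", "checker", "dictionary"]


def suggest(word, dictionary=WORDS, limit=3):
    """Return up to `limit` dictionary words most similar to `word`."""
    if word in dictionary:
        return []  # correctly spelled, so there is nothing to suggest
    # get_close_matches ranks candidates by SequenceMatcher similarity ratio
    # and drops anything scoring below the cutoff.
    return difflib.get_close_matches(word, dictionary, n=limit, cutoff=0.6)


if __name__ == "__main__":
    print(suggest("speling"))
```

Plain string similarity is only a rough stand-in; a production spell checker would likely also weigh how a word sounds and which typing errors are common.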
AutoConvert is an intelligent Chinese encoding converter. It uses built-in functions to detect the Chinese encoding of the input file (such as GB/Big5/HZ), then converts the file to whichever Chinese encoding you choose. You can use autoconvert to convert incoming e-mail messages automatically. It can also optionally handle the UNI/UTF7/UTF8 encodings.
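The detect-then-convert flow described above can be sketched in Python as follows. This is an illustration under stated assumptions, not AutoConvert's own detection logic: it simply tries a fixed list of candidate encodings in order and accepts the first one that decodes without error. The names `CANDIDATES`, `guess_encoding`, and `convert` are hypothetical.

```python
# Candidate encodings, tried in order. Both the list and its order are
# assumptions of this sketch, not AutoConvert's behaviour.
CANDIDATES = ("utf-8", "gb2312", "big5")


def guess_encoding(data, candidates=CANDIDATES):
    """Return the first candidate that decodes `data` strictly, or None."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
        return name
    return None


def convert(data, target, source=None):
    """Re-encode `data` as `target`, guessing the source if not given."""
    source = source or guess_encoding(data)
    if source is None:
        raise ValueError("could not determine the source encoding")
    return data.decode(source).encode(target)


if __name__ == "__main__":
    gb_bytes = "中文".encode("gb2312")
    print(guess_encoding(gb_bytes))
    print(convert(gb_bytes, "big5"))
```

Because the GB and Big5 byte ranges overlap, a first-successful-decode test can misidentify Big5 input as GB; a dedicated converter needs stronger heuristics (for example, character frequency), which this sketch deliberately leaves out.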
bibelot.pl is a Perl script that formats and converts text documents into compressed PalmDoc .pdb files, suitable for reading on a Palm or Handspring device with any standard PalmDoc reader (AportisDoc, CSpotRun, RichReader, TealDoc, etc.). It was written primarily for formatting etexts from Project Gutenberg, but works well for most text files.