2137 projects tagged "Text Processing"
4Suite is a Python-based toolkit for XML and RDF application development. It features a library of integrated tools for XML processing, implementing open technologies such as DOM, RDF, XSLT, XInclude, XPointer, XLink, XPath, XUpdate, RELAX NG, and XML/SGML Catalogs. Layered upon this is an XML and RDF data repository and server, which supports multiple methods of data access, query, indexing, transformation, rich linking, and rule processing, and provides the data infrastructure of a full database system, including transactions, concurrency, access control, and management tools. It also supports HTTP, RPC, SOAP, and FTP, plus APIs in Python and XSLT.
The ALICE software implements AIML (Artificial Intelligence Markup Language), a non-standard, evolving markup language for creating chat robots. The primary design feature of AIML is minimalism. Compared with other chat robot languages, AIML is perhaps the simplest. The pattern matching language is very simple; for example, it permits only one wild-card ('*') match character per pattern. AIML is an XML language, implying that it obeys certain grammatical meta-rules. The choice of XML syntax permits integration with other tools such as XML editors. Another motivation for XML is its familiar look and feel, especially to people with HTML experience.
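To give a feel for how small such a pattern language can be, here is a minimal Python sketch of matching an input against a pattern with at most one '*'. It is not the ALICE implementation: the function name `match_pattern`, the upper-casing of input, and the rule that '*' stands for one or more words are all assumptions made for illustration.

```python
def match_pattern(pattern, text):
    """Return the words captured by '*' if text matches pattern, else None.

    Hypothetical helper, not part of ALICE. Assumes the pattern holds at
    most one '*' and that '*' stands for one or more whole words.
    An exact match without a wildcard returns the empty string.
    """
    p_words = pattern.upper().split()
    t_words = text.upper().split()
    if p_words.count("*") > 1:
        raise ValueError("this sketch allows at most one '*' per pattern")
    if "*" not in p_words:
        return "" if p_words == t_words else None

    i = p_words.index("*")
    prefix, suffix = p_words[:i], p_words[i + 1:]
    # The wildcard must consume at least one word.
    if len(t_words) < len(prefix) + len(suffix) + 1:
        return None
    if t_words[:len(prefix)] != prefix:
        return None
    if suffix and t_words[-len(suffix):] != suffix:
        return None
    end = len(t_words) - len(suffix)
    return " ".join(t_words[len(prefix):end])
```

Because there is at most one wildcard, matching reduces to comparing a fixed prefix and a fixed suffix, with no backtracking needed.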
a2ps is an Any to PostScript filter. It processes plain text files, of course, but it also pretty-prints quite a few popular languages (66). Moreover, it can delegate the processing of some files to other filters (such as groff, texi2dvi, dvips, gzip, etc.), which allows a uniform treatment (n-up, page selection, etc.) of heterogeneous files.
AFT (Almost Free Text) is a document preparation system. It is mostly free form, meaning that there is little intrusive markup; AFT source documents look a lot like plain old ASCII text. It has a few rules for structuring your document, which have more to do with how you format your text than with embedding lots of commands. It can produce many types of output (HTML, XHTML, LaTeX, roll-your-own XML, etc.); all that needs to be done is to edit a rule file. You can even customize your own rule files for specialized output.
ANTLR (ANother Tool for Language Recognition) is a language tool that provides a framework for constructing recognizers, compilers, and translators from grammatical descriptions containing C++, Java, or Sather actions. It is similar to the popular compiler generator YACC; however, ANTLR is much more powerful and easier to use. ANTLR-produced parsers are not only highly efficient but also human-readable and human-debuggable (especially with the interactive ParseView debugging tool). ANTLR can generate parsers, lexers, and tree-parsers in C++, Java, or Sather. ANTLR is currently written in Java.
A tool which splits a single WAV file into multiple WAV files based on silence.
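The listing does not say how the tool detects silence. One common approach, sketched below in Python purely as an assumption, is to treat a long enough run of low-amplitude samples as a gap and cut there. The function name `find_segments` and the `threshold` and `min_silence` parameters are hypothetical; the samples themselves could be read with Python's standard `wave` module.

```python
def find_segments(samples, threshold=500, min_silence=4):
    """Return (start, end) index pairs of non-silent runs in `samples`.

    Hypothetical sketch, not the listed tool's algorithm. A run of at
    least `min_silence` consecutive samples whose absolute value is
    below `threshold` is treated as a gap between segments. `end` is
    exclusive, so samples[start:end] is one segment.
    """
    segments = []
    start = None  # index where the current segment began, if any
    quiet = 0     # length of the current run of quiet samples
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_silence:
                # Close the segment at the first quiet sample of the gap.
                segments.append((start, i - quiet + 1))
                start = None
                quiet = 0
    if start is not None:
        # Input ended inside a segment; drop any trailing quiet samples.
        segments.append((start, len(samples) - quiet))
    return segments
```

Each returned pair could then be written out as its own WAV file; shorter pauses than `min_silence` stay inside a segment rather than splitting it.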