ClearParse is a flexible parsing engine that can be used for any parsing task, including interpreting or compiling programming languages, analyzing or converting data files, processing command-line parameters and user input, implementing markup languages and scripts, and natural language processing (NLP).
SiSU (Structured information, Serialized Units) is a lightweight, markup-based text structuring and publishing framework that features granular search. With minimal markup of a plain-text file, it produces plain text, HTML, XHTML, XML, ODF, LaTeX, and PDF output, and populates an SQL database at the object/paragraph level for granular searches. Prepare documents using your text editor of choice, then use SiSU to generate the desired output formats. SiSU is controlled from the command line.
Emdros is a corpus query system for storing and searching linguistically annotated text. It is very generic, supporting almost any kind of annotation from almost any linguistic theory. All linguistic levels of analysis are supported, including phonology, morphology, the lexical level, syntax, and discourse. The core libraries act as a middleware layer between a client and an underlying SQL database. MySQL, PostgreSQL, and SQLite are supported.
uni2ascii and ascii2uni provide conversion in both directions between UTF-8 Unicode and more than thirty 7-bit ASCII equivalents, including RFC 2396 URI format and RFC 2045 Quoted-Printable format, the representations used in HTML, SGML, XML, OOXML, the Unicode standard, Rich Text Format, POSIX portable charmaps, POSIX locale specifications, and Apache log files. They can also convert between the escapes used for Unicode in languages such as Ada, C, Common Lisp, Java, Pascal, Perl, PostScript, Python, Scheme, and Tcl.
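To give a feel for the kind of round trip described above, here is a minimal Python sketch of two such ASCII representations: HTML decimal character references and `\uXXXX`-style escapes. It is an illustration only and not how uni2ascii or ascii2uni are implemented or invoked; the function names are made up for this example.

```python
import re

def to_html_refs(text):
    """Replace every non-ASCII character with an HTML decimal reference."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)

def from_html_refs(text):
    """Replace decimal character references with the characters they denote."""
    return re.sub(r"&#(\d+);", lambda m: chr(int(m.group(1))), text)

def to_u_escapes(text):
    """Replace non-ASCII characters with \\uXXXX or \\UXXXXXXXX escapes."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                 # plain ASCII passes through
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")     # Basic Multilingual Plane
        else:
            out.append(f"\\U{cp:08X}")     # supplementary planes
    return "".join(out)

print(to_html_refs("café"))                   # caf&#233;
print(from_html_refs(to_html_refs("café")))   # café
print(to_u_escapes("café"))                   # caf\u00E9
```

One caveat with this naive approach: the reverse conversion also decodes references that were already present as literal ASCII text in the input, so the round trip is only exact for input that contains no such sequences.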
CodeWorker is a versatile parsing tool and a universal source code generator. It interprets a scripting language for producing reusable, tailor-made, evolving, and reliable IT systems with a high level of automation. The file formats to parse are described in an extended-BNF syntax. Template-based scripts drive the writing of patterns for generating code or text. The code generator preserves protected areas containing hand-written code and provides code expansion, source-to-source translation, and program transformation. CodeWorker scripts can also be translated into native C++.
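The idea of preserving protected areas across regeneration can be sketched in a few lines of Python. This is a generic illustration of the technique, not CodeWorker's implementation: the `//<protect:KEY>` marker syntax and the function names are hypothetical and chosen only for this example.

```python
import re

# Hypothetical marker syntax for this sketch (not CodeWorker's own):
#   //<protect:KEY>
#   ...hand-written lines...
#   //</protect:KEY>
AREA = re.compile(
    r"//<protect:(?P<key>[\w.-]+)>\n(?P<body>.*?)//</protect:(?P=key)>",
    re.DOTALL,
)

def extract_protected(existing):
    """Collect the hand-written body of each protected area, keyed by name."""
    return {m.group("key"): m.group("body") for m in AREA.finditer(existing)}

def regenerate(fresh, existing):
    """Copy protected bodies from the existing file into freshly generated text."""
    saved = extract_protected(existing)

    def restore(match):
        key = match.group("key")
        # Fall back to the generator's default body for areas that are new.
        body = saved.get(key, match.group("body"))
        return f"//<protect:{key}>\n{body}//</protect:{key}>"

    return AREA.sub(restore, fresh)

existing = (
    "void f() {\n"
    "//<protect:f.body>\n"
    "  hand_written();\n"
    "//</protect:f.body>\n"
    "}\n"
)
fresh = (
    "void f(int x) {\n"
    "//<protect:f.body>\n"
    "//</protect:f.body>\n"
    "}\n"
)
print(regenerate(fresh, existing))
```

In this sketch the regenerated signature `void f(int x)` replaces the old one, while the hand-written call inside the `f.body` area survives the regeneration.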