htmLawed is a PHP script that makes input text more secure, HTML standards-compliant, and suitable in general from the viewpoint of a Web-page administrator, for use in the body of HTML 4 or XHTML 1 or 1.1 documents. It is a customizable HTML/XHTML filter, processor, purifier, and sanitizer. It can ensure that HTML tags are balanced and properly nested tags, neutralize code that may be used for cross-site scripting (XSS) attacks, and limit the allowed HTML elements, tags, attributes, or URL protocols.
TXR is a new data munging language. TXR's special pattern language provides template-based matching of entire documents or large sections of documents. It also contains a language for functional and imperative programming. It is written in C and takes the form of a utility that is portable to Unix-like platforms and Windows.
JSX serializes Java objects to XML. You can persist objects, evolve them, and send them over the network and between applications. Your object data becomes human-readable and human-writable. You can test it, search it, profile it, audit it, and edit it with ordinary text and XML tools. JSX handles all POJOs and also all classes that require Java's own object serialization.
Sanzang is a compact and simple cross-platform machine translation system. It is especially useful for translating from the CJK languages (Chinese, Japanese, and Korean), and it is very suitable for working with ancient and otherwise difficult texts. Unlike most other machine translation systems, Sanzang is small and approachable. Any user can develop his or her own translation rules, and these rules are simply stored in a text file and applied at runtime.
GNU parallel is a shell tool for executing jobs in parallel locally or using remote computers. A job is typically a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. If you use xargs today you will find GNU parallel very easy to use, as GNU parallel is written to have the same options as xargs. If you write loops in shell, you will find GNU parallel may be able to replace most of the loops and make them run faster by running several jobs in parallel. GNU parallel makes sure output from the commands is the same output as you would get had you run the commands sequentially. This makes it possible to use output from GNU parallel as input for other programs.
ExactScan is a versatile document capture application for home offices and workgroups. It is designed from the ground up for high-speed document scanners and can easily handle hundreds of images per minute, including duplex scans. Included functionality reaches from managing, sorting, and editing singles pages to writing multi- as well as single-page PDF files including JPEG compression and TIFF, JPEG, JPEG2000, and PNG bitmap files. ExactScan allows performing state of the art image processing including automatic cropping, deskewing, dynamic thresholding for perfect black and white documents, and descreening print rasters.
AutoLaTeX is a tool for managing small to large LaTeX documents. It detects which files which are used to build the document (included TeX files, BibTeX, figures, etc.), and launches the various different tools (latex, bibtex, makeindex) when the sources files have been changed. It provides translation rules which automatically generate figures in EPS, PNG, or PDF formats from different types of sources (dia, xfig, svg, astah, source code, etc.) AutoLaTeX also provides graphical user interfaces, a plugin for the editors Gedit and Sublime Text, and a standalone Gtk application.