TXR is a new data munging language. TXR's special pattern language provides template-based matching of entire documents or large sections of documents. It also contains a language for functional and imperative programming. It is written in C and takes the form of a utility that is portable to Unix-like platforms and Windows.
Sanzang is a compact and simple cross-platform machine translation system. It is especially useful for translating from the CJK languages (Chinese, Japanese, and Korean), and it is very suitable for working with ancient and otherwise difficult texts. Unlike most other machine translation systems, Sanzang is small and approachable. Any user can develop his or her own translation rules, and these rules are simply stored in a text file and applied at runtime.
GNU parallel is a shell tool for executing jobs in parallel locally or using remote computers. A job is typically a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. If you use xargs today you will find GNU parallel very easy to use, as GNU parallel is written to have the same options as xargs. If you write loops in shell, you will find GNU parallel may be able to replace most of the loops and make them run faster by running several jobs in parallel. GNU parallel makes sure output from the commands is the same output as you would get had you run the commands sequentially. This makes it possible to use output from GNU parallel as input for other programs.
The SeaMonkey project is a community effort to develop an all-in-one Internet application suite. It contains an Internet browser, email and newsgroup client with an included Web feed reader, HTML editor, IRC chat, and Web development tools, and is sure to appeal to advanced users, Web developers, and corporate users. It uses much of the Mozilla source code powering such successful siblings as Firefox, Thunderbird, Camino, Sunbird, and Miro.
TEA is a powerful and easy-to-use Qt4-based editor with many useful features for HTML, Docbook, and LaTeX editing. It features a small footprint, a tabbed layout engine, support for multiple encodings, code snippets, templates, customizable hotkeys, an "open at cursor" function for HTML files and images, miscellaneous HTML tools, preview in external browser, string manipulation functions, Morse-code tools, bookmarks, syntax highlighting, and more.
Highlight is a universal converter from source code to HTML, XHTML, RTF, TeX, LaTeX, SVG, BBCode, and terminal escape sequences. (X)HTML and SVG output are formatted by Cascading Style Sheets. It supports more than 170 programming languages, and includes 80 highlighting color themes. The configuration files are Lua scripts with plug-in support. The converter includes some features to provide a consistent layout of the output code.
SILVERCODERS DocToText is a powerful utility which can convert documents in many formats to plain text. It includes a console application and C/C++ library, which allows embedding text extraction mechanisms into other applications. It supports MS Office binary formats (MS Word (DOC), MS Excel (XLS, XLSB), MS PowerPoint (PPT), and Rich Text Format (RTF)), OpenDocument formats (text documents (ODT), spreadsheets (ODS), presentations (ODP) and graphics (ODG)), Office Open XML formats (MS Word (DOCX), MS Excel (XLSX), and MS PowerPoint (PPTX)), iWork formats (PAGES, NUMBERS, KEYNOTE), OpenDocument Flat XML formats (FODP, FODS, FODT), Portable Document Format (PDF), Email files (EML), and HyperText Markup Language (HTML). DocToText can extract text not only from the document body but also from annotations (comments) embedded in odt, doc, docx, or rtf files and read metadata like author, last modification date, or number of pages. It can be used as a fast console viewer, and is able to convert corrupted OpenDocument and Office Open XML documents. It can be used to recover text even if other recovery methods failed.
libextractor is a library used to extract meta-data from files of arbitrary type. It is designed to use helper-libraries to perform the actual extraction, and to be trivially extendable by linking against external extractors for additional file types. The goal is to provide developers of file-sharing networks, file managers, and WWW-indexing bots with a universal library to obtain meta-data about files. It includes a shell-command and bindings for Java (JNI) and Python.