GNU parallel is a shell tool for executing jobs in parallel locally or using remote computers. A job is typically a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. If you use xargs today you will find GNU parallel very easy to use, as GNU parallel is written to have the same options as xargs. If you write loops in shell, GNU parallel may be able to replace most of them and make them run faster by running several jobs in parallel. GNU parallel makes sure the output from the commands is the same as you would get had you run the commands sequentially. This makes it possible to use output from GNU parallel as input for other programs.
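The idea of running one job per input line while keeping the output in input order can be sketched in Python. This is only an illustration of the concept, not how GNU parallel is implemented; the names `job` and `run_in_parallel` are made up for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def job(line):
    # Hypothetical job: longer inputs finish sooner, so the order in which
    # jobs complete differs from the order in which they were submitted.
    time.sleep(0.05 / (1 + len(line)))
    return line.upper()


def run_in_parallel(lines, workers=4):
    # Executor.map yields results in input order, whatever the completion
    # order, which mirrors the "same output as a sequential run" property.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(job, lines))


if __name__ == "__main__":
    for result in run_in_parallel(["a", "bbb", "cc"]):
        print(result)
```

Because the results come back in input order, the output can be fed to another program just as the output of a sequential loop could.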
s6-portable-utils is a set of tiny general Unix utilities, often performing well-known tasks such as cut and grep, but optimized for simplicity and small size. They were designed for embedded systems and other constrained environments, but work everywhere. Other sets of small utilities are usually system-specific; for instance, the (otherwise excellent) BusyBox project only works on Linux.
Xidel is a command line tool to download Web pages and extract data from them. It can download files over HTTP/S connections, follow redirections, links, or extracted values, and process local files. The data can be extracted using XPath 2.0, XQuery 1.0, and JSONiq expressions, CSS 3 selectors, and custom, pattern-matching templates that are like an annotated version of the processed page. The extracted values can then be exported as plain text/XML/HTML/JSON, or assigned to variables to be used in other extract expressions or be exported to the shell. There is also an online CGI service for testing.
Sanzang is a compact and simple cross-platform machine translation system. It is especially useful for translating from the CJK languages (Chinese, Japanese, and Korean), and it is very suitable for working with ancient and otherwise difficult texts. Unlike most other machine translation systems, Sanzang is small and approachable. Any user can develop his or her own translation rules, and these rules are simply stored in a text file and applied at runtime.
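A rule table kept in a text file and applied at runtime might look roughly like the following Python sketch. The `source|target` line format, the longest-match-first ordering, and the function names are assumptions made for illustration; they are not taken from Sanzang's documentation.

```python
def parse_rules(table_text, sep="|"):
    # Each non-empty line is assumed to hold "source|target" (hypothetical format).
    rules = []
    for line in table_text.splitlines():
        line = line.strip()
        if not line or sep not in line:
            continue
        source, target = line.split(sep, 1)
        rules.append((source, target))
    # Apply longer source terms first so they win over their own substrings.
    rules.sort(key=lambda rule: len(rule[0]), reverse=True)
    return rules


def translate(text, rules):
    # Simple sequential substitution; assumes targets never contain source terms.
    for source, target in rules:
        text = text.replace(source, " " + target + " ")
    # Collapse the padding spaces introduced above.
    return " ".join(text.split())


if __name__ == "__main__":
    table = "法|dharma\n法師|Dharma master\n三藏|Tripitaka\n"
    print(translate("三藏法師", parse_rules(table)))
```

Keeping the rules in plain text means a user can extend the table with any editor and rerun the translation without touching the program.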
The Version numbers read/write command line tool finds and updates version number tokens in source code or packaging files. It takes filename context and syntax into account when reading and writing files. It provides an easy, readable argument format, and can increment patch versions or bump build numbers. It is a minor simplification for manual package building.
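Incrementing a patch version inside a file's text can be sketched in Python as below. This is a minimal illustration that assumes a plain `major.minor.patch` token; unlike the tool described above, it does not look at the filename or the file's syntax, and `bump_patch` is a hypothetical name.

```python
import re

# Matches the first dotted three-part number, e.g. 1.4.9 (assumed token shape).
VERSION_RE = re.compile(r"\b(\d+)\.(\d+)\.(\d+)\b")


def bump_patch(text):
    def replace(match):
        major, minor, patch = match.groups()
        return f"{major}.{minor}.{int(patch) + 1}"

    # Only the first version token in the text is rewritten.
    return VERSION_RE.sub(replace, text, count=1)


if __name__ == "__main__":
    print(bump_patch('version = "1.4.9"'))
```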
tblutils is a collection of several utilities for working with tabular text files: data written in plain text, with one row per line and columns separated by a common character (usually TAB or semicolon). It complements the usual Unix tools like cut and paste by providing enhanced versions that support column labels throughout, so that you can extract columns by name (tblcut), filter data using a mathematical expression (tblfilter), re-order columns without caring about the column index (tblcsort), join multiple files on a common index without having to pre-sort them (tblmerge), and much more.
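Extracting columns by label rather than by index can be sketched in Python with the standard csv module. This only illustrates the idea; it is not tblcut's implementation, and the function name `cut_by_name` is hypothetical.

```python
import csv
import io


def cut_by_name(text, names, delimiter="\t"):
    # The first row is assumed to hold the column labels.
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(names)
    for row in reader:
        # Columns are emitted in the order requested, not the file's order.
        writer.writerow([row[name] for name in names])
    return out.getvalue()


if __name__ == "__main__":
    table = "id\tname\tage\n1\tann\t30\n2\tbob\t25\n"
    print(cut_by_name(table, ["age", "id"]), end="")
```

Selecting by label also covers re-ordering: the caller lists the names in the order wanted and never has to know the column positions.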
Shasplit takes a large data block, splits it into smaller parts, and puts those parts into an SHA-based content-addressed store. Reassembling those parts is a trivial "cat" invocation. Repeating parts (e.g., from previous split operations) are stored only once, which allows efficient incremental backups of whole LVM snapshots via Rsync. Shasplit shows its strengths on encrypted block devices, but might be useful for non-encrypted data, too.
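The split, store, and reassemble cycle can be sketched in Python as follows. The sketch makes several assumptions that are not taken from Shasplit itself: SHA-256 as the hash, fixed-size chunks, and an in-memory dict standing in for the on-disk store.

```python
import hashlib


def split_into_store(data, store, chunk_size=4):
    # Returns the ordered list of digests needed to rebuild the data.
    manifest = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        # Content addressing: an identical chunk maps to the same key,
        # so a repeated part is stored only once.
        store.setdefault(digest, chunk)
        manifest.append(digest)
    return manifest


def reassemble(manifest, store):
    # Equivalent to concatenating the parts in manifest order.
    return b"".join(store[digest] for digest in manifest)


if __name__ == "__main__":
    store = {}
    manifest = split_into_store(b"abcdabcdxyz", store)
    print(len(manifest), "parts,", len(store), "stored")
    print(reassemble(manifest, store))
```

Since unchanged parts keep their digests across runs, a later split of a mostly identical block adds only the new parts to the store, which is what makes incremental transfer of the store cheap.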