GNU parallel is a shell tool for executing jobs in parallel locally or using remote computers. A job is typically a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. If you use xargs today you will find GNU parallel very easy to use, as GNU parallel is written to have the same options as xargs. If you write loops in shell, you will find GNU parallel may be able to replace most of the loops and make them run faster by running several jobs in parallel. GNU parallel makes sure output from the commands is the same output as you would get had you run the commands sequentially. This makes it possible to use output from GNU parallel as input for other programs.
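The idea of running one job per input line while still getting sequential-looking output can be sketched in Java. This is only an illustration of the concept, not how GNU parallel is implemented or invoked: the class `OrderedJobs`, the method `runAll`, and the file names are hypothetical, and the sketch assumes that "the same output" means each job's result is returned whole and in input order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper for illustration; not part of GNU parallel.
final class OrderedJobs {
    // Runs `job` once per input line on `workers` threads and returns the
    // results in input order, regardless of which job finishes first.
    static <R> List<R> runAll(List<String> lines, int workers, Function<String, R> job) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Callable<R>> tasks = new ArrayList<>();
            for (String line : lines) {
                tasks.add(() -> job.apply(line));
            }
            // invokeAll returns its futures in the same order as the task list.
            List<R> results = new ArrayList<>();
            for (Future<R> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("job failed or was interrupted", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Assumed input: a short list of file names, one job per name.
        List<String> files = List.of("a.txt", "b.txt", "c.txt");
        for (String line : runAll(files, 3, name -> name.toUpperCase())) {
            System.out.println(line);
        }
    }
}
```

Because results are collected per job rather than written as they arrive, the output can be piped into another program without interleaving.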
RunJRun is a very simple system for doing parallel processing in Java, using Amazon Elastic Compute Cloud (EC2) instances as compute nodes. The basic compute unit is a Runnable, Serializable Java object, a "task" for short. A user submits a list of such tasks to RunJRun. Each task then has its run() method invoked on an EC2 instance. To use it, you'll need an Amazon Machine Image (AMI) that has the RunJRun server-side software installed; several such AMIs are available.
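A task of the kind described above might look like the following minimal sketch. It does not use RunJRun's actual API, which is not shown here: `SumTask` and `roundTrip` are hypothetical names, and the byte-array round trip merely stands in for shipping the task to a compute node before its run() method is invoked.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical task: sums the integers 1..n and keeps the result in a field.
final class SumTask implements Runnable, Serializable {
    private static final long serialVersionUID = 1L;
    private final int n;
    private long result;

    SumTask(int n) {
        this.n = n;
    }

    @Override
    public void run() {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += i;
        }
        result = sum;
    }

    long result() {
        return result;
    }

    // Serializes the task and reads it back, simulating transfer to another JVM.
    static SumTask roundTrip(SumTask task) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(task);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (SumTask) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("task could not be serialized", e);
        }
    }

    public static void main(String[] args) {
        SumTask received = roundTrip(new SumTask(100));
        received.run(); // in the real system this call would happen on the remote node
        System.out.println(received.result()); // prints 5050
    }
}
```

Making the task both Runnable and Serializable is what lets a generic server execute work it has never seen before: the object carries its own parameters and its own entry point.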
HPCC (High Performance Computing Cluster) stores and processes large quantities of data, handling billions of records per second with massively parallel processing technology. Large amounts of data from disparate sources can be accessed, analyzed, and manipulated in fractions of a second. HPCC serves as both a processing environment and a distributed data storage environment, and is capable of analyzing terabytes of information.
FastFlow is a pattern-based programming framework targeting streaming applications. It implements the pipeline, farm, and divide-and-conquer patterns and their compositions, as well as generic streaming networks. It is specifically designed to support the development of new applications and the seamless porting of existing ones to multi-core CPUs, GPGPUs, and clusters of them. The layered, template-based C++ design ensures flexibility and extensibility. Its lock-free/fence-free run-time support minimizes cache invalidation traffic and supports the development of high-performance (high-throughput, low-latency) scalable applications. It has been shown to be comparable to or faster than TBB, OpenMP, and Cilk on several micro-benchmarks and real-world applications, especially when dealing with fine-grained parallelism and high-throughput applications.
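The farm pattern named above (an emitter feeding a pool of workers whose results flow to a collector) can be illustrated with a small Java sketch. It is not FastFlow code and does not use FastFlow's C++ API; `FarmSketch`, `run`, and the `STOP` marker are hypothetical, and the sketch uses ordinary blocking queues, which take locks, where FastFlow's run-time is described as lock-free.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.IntUnaryOperator;

// Hypothetical illustration of an emitter -> farm -> collector stream.
final class FarmSketch {
    // End-of-stream marker; assumes this value never occurs as data or as a stage result.
    private static final int STOP = Integer.MIN_VALUE;

    static long run(int[] input, int workers, IntUnaryOperator stage) {
        BlockingQueue<Integer> toWorkers = new LinkedBlockingQueue<>();
        BlockingQueue<Integer> toCollector = new LinkedBlockingQueue<>();

        // Emitter: streams the input, then sends one STOP per worker.
        new Thread(() -> {
            for (int x : input) {
                toWorkers.add(x);
            }
            for (int i = 0; i < workers; i++) {
                toWorkers.add(STOP);
            }
        }).start();

        // Farm: each worker applies `stage` to whichever item it takes next.
        for (int w = 0; w < workers; w++) {
            new Thread(() -> {
                try {
                    for (int x = toWorkers.take(); x != STOP; x = toWorkers.take()) {
                        toCollector.add(stage.applyAsInt(x));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    toCollector.add(STOP); // always tell the collector this worker is done
                }
            }).start();
        }

        // Collector: runs on the calling thread and sums results until all workers stop.
        long sum = 0;
        int stopped = 0;
        try {
            while (stopped < workers) {
                int x = toCollector.take();
                if (x == STOP) {
                    stopped++;
                } else {
                    sum += x;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("collector interrupted", e);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Squares 1..4 on three workers and sums the results.
        System.out.println(run(new int[] {1, 2, 3, 4}, 3, x -> x * x)); // prints 30
    }
}
```

The collector here performs an order-independent reduction, so it does not matter which worker finishes first; a pipeline is the same structure with each stage feeding the next stage's queue.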