FastFlow is a pattern-based programming framework targeting streaming applications. It implements pipeline, farm, divide and conquer, and their composition, as well as generic streaming networks. It is specifically designed to support the development and the seamless porting of existing applications on multi-core, GPGPUs, and clusters of them. The layered template-based C++ design ensures flexibility and extendibility. Its lock-free/fence-free run-time support minimizes cache invalidation traffic and enforces the development of high-performance (high-throughput, low-latency) scalable applications. It has been proven comparable or faster than TBB, OpenMP, and Cilk on several micro-benchmarcks and real-world applications, especially when dealing with fine-grained parallelism and high-throughput applications.
Portable Computing Language (pocl) aims to become an efficient implementation of the OpenCL standard. In addition to producing an easily-portable Open Source implementation, another major goal of the project is improving performance portability of OpenCL programs with compiler optimizations, reducing the need for target-dependent manual optimizations. At the core of pocl is a set of LLVM passes used to statically parallelize multiple work items with the kernel compiler, even in the presence of work group barriers. This enables parallelization of the fine-grained static concurrency in the work groups in multiple ways (SIMD, VLIW, superscalar, etc.). The code base is modularized to allow easy adding of new "device drivers" in the host-device layer. A generic multithreaded "target driver" is included. It allows running OpenCL applications on a host which supports the pthread library with multithreading at the work group granularity.
dispy is a Python framework for parallel execution of computations by distributing them across multiple processors in a single machine (SMP), or among many machines in a cluster or grid. The computations can be standalone programs or Python functions. dispy is well suited for the data parallel (SIMD) paradigm where a computation is evaluated with different (large) datasets independently (similar to Hadoop, MapReduce, Parallel Python). dispy features include automatic distribution of dependencies (files, Python functions, classes, modules), client-side and server-side fault recovery, scheduling of computations to specific nodes, encryption for security, sharing of computation resources if desired, and more.
POP-C++ is a comprehensive object-oriented system for developing applications in large distributed computing infrastructures such as Grid, P2P or Clouds. It consists of a programming suite (language, compiler) and a run-time system for running POP-C++ applications. The POP-C++ language is a minimal extension of C++ that implements the parallel object model with the integration of resource requirements into distributed objects. This extension is as close as possible to standard C++ so that programmers can easily learn POP-C++ and so that existing C++ libraries can be parallelized using POP-C++ without too much effort. The POP-C++ run-time is an object-oriented open design that aims at integrating different distributed computing tool kits into an infrastructure for executing requirement-driven object-oriented applications. It uses objects to serve objects: the system provides services for executing remote objects.
ScalaBLAST is a high-performance multiprocessor implementation of the NCBI BLAST library. It supports all 5 primary program types (blastn, blastp, tblastn, tblastx, and blastx) and several output formats (pairwise, tabular, and XML). It will run on most multiprocessor systems which have MPI installed, and can run over a wide variety of interconnects, including infiniband, quadrics, and ethernet. It is designed to run a large number of queries against either large or small databases. It parallelizes the BLAST calculations by dynamically scheduling them across processors using a fault-resilient scheme.
BitDew is a programmable environment for the management and distribution of data for grid, desktop grid, and cloud systems. It can easily be integrated into large scale computational systems such as XtremWeb, BOINC, Hadoop, Condor, Glite, Unicore, OpenStack, and Eucalyptus. It provides key P2P, grid, and cloud technologies (DHT, BitTorrent, Amazon S3, DropBox) and high level programming interfaces with a simple API for creating, accessing, storing, and moving data with ease, even in highly dynamic and volatile environments.
Charm++ is a portable adaptive runtime system for parallel applications. Application developers create an object-based decomposition of the problem of interest, and the runtime system manages issues of communication, mapping, load balancing, fault tolerance, and more. Sequential code implementing the methods of these parallel objects is written in C++. Calls to libraries in C++, C, and Fortran are common and straightforward. Charm++ is portable across individual workstations, clusters, accelerators (Cell SPEs and GPUs), and supercomputers such as those sold by IBM (Blue Gene, POWER) and Cray (XT3/4/5/6). Applications based on Charm++ are used on at least 5 of the 20 most powerful computers in the world.