Libfairydust is a small wrapper library intended for use with GPU clusters that 'hijacks' CUDA and OpenCL calls. It can be used to 're-route' calls to a certain GPU, so a process requesting GPU#0 might end up running on GPU#4 without knowing (or caring) about it. This works completely transparently and does not need any sort of 'cooperation' from the application, changes to code, or relinking.
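The re-routing idea can be sketched in a few lines. This is only a model of the remapping step, not of how libfairydust intercepts CUDA or OpenCL calls; the class name DeviceRouter, the mapping table, and the stand-in "set device" function are all hypothetical and do not come from the library.

```python
class DeviceRouter:
    """Remaps the device index an application asks for to a physical device."""

    def __init__(self, real_set_device, mapping):
        self._real_set_device = real_set_device  # the call being wrapped (assumed)
        self._mapping = dict(mapping)            # requested index -> physical index

    def set_device(self, requested):
        # Indices without an entry in the table pass through unchanged.
        physical = self._mapping.get(requested, requested)
        return self._real_set_device(physical)


selected = []  # records what the hypothetical underlying runtime was asked for
router = DeviceRouter(selected.append, {0: 4})
router.set_device(0)  # the application believes it is using GPU#0
router.set_device(1)  # no mapping for 1, so it is passed through
```

The caller keeps using its own device numbers; only the wrapper knows which physical device is actually selected.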
yaSSL is a C++ based SSL library for embedded and RTOS environments, designed for individuals who prefer to use the C++ language. For a C-based solution, please see CyaSSL. yaSSL supports the industry standards up to TLS 1.2, and also includes an OpenSSL compatibility interface.
FastFlow is a pattern-based programming framework targeting streaming applications. It implements pipeline, farm, divide and conquer, and their composition, as well as generic streaming networks. It is specifically designed to support the development and seamless porting of existing applications to multi-core CPUs, GPGPUs, and clusters of them. The layered template-based C++ design ensures flexibility and extensibility. Its lock-free/fence-free run-time support minimizes cache invalidation traffic and supports the development of high-performance (high-throughput, low-latency) scalable applications. It has been shown to be comparable to or faster than TBB, OpenMP, and Cilk on several micro-benchmarks and real-world applications, especially when dealing with fine-grained parallelism and high-throughput applications.
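As a rough illustration of the pipeline pattern, the sketch below connects stages with queues and runs each stage in its own thread. It is written in Python with the standard library, not with FastFlow's C++ API, and it uses ordinary locking queues rather than lock-free ones; the names run_pipeline and _stage are made up for this example.

```python
import queue
import threading

_EOS = object()  # end-of-stream marker passed down the pipeline


def _stage(fn, inbox, outbox):
    """Apply fn to every item from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is _EOS:
            outbox.put(_EOS)  # propagate shutdown to the next stage
            return
        outbox.put(fn(item))


def run_pipeline(source, stages):
    """Stream items from source through the stage functions, in order."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=_stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for item in source:
        queues[0].put(item)
    queues[0].put(_EOS)

    results = []
    while True:
        item = queues[-1].get()
        if item is _EOS:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


doubled = run_pipeline(range(5), [lambda x: x + 1, lambda x: x * 2])
```

Because every stage is a single thread reading from a FIFO queue, output order matches input order. A farm would replace one stage with several workers sharing the same input queue, at the cost of that ordering guarantee.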
Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Theano features tight integration with numpy, transparent use of a GPU, efficient symbolic differentiation, speed and stability optimizations, dynamic C code generation, and extensive unit-testing and self-verification. Theano has been powering large-scale, computationally intensive scientific investigations since 2007, yet it is also approachable enough to be used in the classroom (IFT6266 at the University of Montreal).
Moscrack is a WPA cracker for use on clusters. It supports MOSIX, SSH, and RSH connectivity and works by reading a word list from STDIN or a file, breaking it into chunks, and passing those chunks off to separate processes that run in parallel. The parallel processes are then executed on different nodes in your cluster. All results are checked and recorded on your master node. Logging and error handling are taken care of. It is capable of running reliably for long periods of time, without the risk of losing data or having to restart. Moscrack uses aircrack-ng by default. Pyrit for WPA cracking and Dehasher for Unix password hashes are supported via plugins.
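The chunk-and-dispatch scheme described above can be sketched as follows. This is not Moscrack's code: it runs local threads instead of cluster nodes, and the worker check_chunk compares SHA-256 digests as a stand-in for calling a real cracker such as aircrack-ng. All function names here are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunked(words, size):
    """Yield successive lists of at most `size` words from any iterable."""
    it = iter(words)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def check_chunk(chunk, target_digest):
    """Stand-in worker: return the word whose digest matches, or None."""
    for word in chunk:
        if hashlib.sha256(word.encode("utf-8")).hexdigest() == target_digest:
            return word
    return None


def crack(words, target_digest, chunk_size=2, workers=4):
    """Split the word list into chunks, check them in parallel, report a hit."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(check_chunk, chunk, target_digest)
            for chunk in chunked(words, chunk_size)
        ]
        # Results are collected centrally, as on a master node.
        for future in futures:
            found = future.result()
            if found is not None:
                return found
    return None
```

Since chunked accepts any iterable, the same code works whether the word list comes from a file object or from standard input.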
Pyrit takes a step ahead in attacking WPA-PSK and WPA2-PSK, the protocols that protect today's public WiFi airspace. Pyrit's implementation allows you to create massive databases, pre-computing part of the WPA/WPA2-PSK authentication phase in a space-time tradeoff. The performance gain for real-world attacks is in the range of three orders of magnitude, which calls for reconsideration of the protocol's security. It exploits the computational power of multiple cores and other platforms through ATI-Stream, Nvidia CUDA, OpenCL, and VIA Padlock. It is a powerful attack against one of the world's most widely used security protocols.
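To make the space-time tradeoff concrete: the description does not name the pre-computed step, but in WPA/WPA2-PSK the expensive part is deriving the 256-bit Pairwise Master Key from the passphrase and SSID with PBKDF2-HMAC-SHA1 over 4096 iterations. The sketch below, which assumes that is the step in question, computes it once per candidate passphrase and stores it. It uses only Python's hashlib and is unrelated to Pyrit's own code or database format; the function names are made up.

```python
import hashlib


def pairwise_master_key(passphrase, ssid):
    """Derive the 32-byte WPA/WPA2-PSK master key for a passphrase and SSID."""
    return hashlib.pbkdf2_hmac(
        "sha1",
        passphrase.encode("utf-8"),
        ssid.encode("utf-8"),  # the SSID acts as the salt
        4096,                  # iteration count fixed by the standard
        32,                    # 256-bit key
    )


def precompute(ssid, passphrases):
    """Build a passphrase -> master key table for one SSID (the 'space' side)."""
    return {p: pairwise_master_key(p, ssid) for p in passphrases}


table = precompute("IEEE", ["password", "letmein"])
```

Because the SSID is the salt, a table is only valid for the network name it was built for; checking a captured handshake against stored keys then skips the 4096 iterations per candidate.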
The Jacket platform consists of a runtime and language processing system that automatically optimizes existing applications or new algorithms for GPU computing. Jacket currently supports the MATLAB language as a frontend to the platform. Jacket's language processing system automatically translates MATLAB code to the high-performance primitives required for best utilization of CUDA-capable Nvidia GPUs. Working in concert with the translation system, Jacket's runtime system optimizes memory transfers, compiles code on-the-fly for realtime tuned performance, and launches GPU kernels efficiently for maximal performance. All GPU-specific programming details are handled by Jacket, freeing the user to focus on science, engineering, and analytics.
Charm++ is a portable adaptive runtime system for parallel applications. Application developers create an object-based decomposition of the problem of interest, and the runtime system manages issues of communication, mapping, load balancing, fault tolerance, and more. Sequential code implementing the methods of these parallel objects is written in C++. Calls to libraries in C++, C, and Fortran are common and straightforward. Charm++ is portable across individual workstations, clusters, accelerators (Cell SPEs and GPUs), and supercomputers such as those sold by IBM (Blue Gene, POWER) and Cray (XT3/4/5/6). Applications based on Charm++ are used on at least 5 of the 20 most powerful computers in the world.