Apache UIMA DUCC (Distributed UIMA Cluster Computing) is a cluster management system providing tooling, management, and scheduling facilities that automate the scale-out of applications written using the UIMA framework. Core UIMA provides a generalized framework for applications that process unstructured information such as human language, but does not provide a scale-out mechanism. UIMA-AS extends UIMA and provides a scale-out mechanism for distributing UIMA pipelines over a cluster of computing resources, but does not provide job or cluster management of the resources. DUCC extends UIMA-AS by defining a formal job model that closely maps to a standard UIMA pipeline. Around this job model DUCC provides cluster management services to automate the scale-out of UIMA pipelines over computing clusters.
wsrep-enabled MySQL (previously MySQL/Galera cluster) can use wsrep replication providers, such as Galera, to form a cluster. The wsrep API is an abstract replication interface that supports global transaction IDs, true multi-master operation, conflict detection and resolution, and parallel applying. Because it replicates only final transaction results, it is transparent to triggers, stored procedures, and functions. This patch supports only the InnoDB storage engine and DDL commands.
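To illustrate what conflict detection means in this setting, here is a minimal Python sketch of certification-based conflict detection, the approach Galera is generally described as using. It is a simplified model and does not use the wsrep API. It assumes that write sets reach every node in the same total order, and that each write set carries the row keys it modified plus the global sequence number (seqno) of the last transaction it had seen. `WriteSet` and `Certifier` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class WriteSet:
    """Final result of a transaction: modified row keys plus the seqno of the
    last committed transaction visible when it started (hypothetical model)."""
    keys: frozenset
    last_seen: int

@dataclass
class Certifier:
    """Toy certification index mapping each row key to the seqno of the last
    committed write set that modified it."""
    next_seqno: int = 1
    index: dict = field(default_factory=dict)

    def certify(self, ws: WriteSet):
        # Conflict: a key was modified by a write set committed after this
        # transaction's snapshot, i.e. concurrently on another node.
        for key in ws.keys:
            if self.index.get(key, 0) > ws.last_seen:
                return None  # abort this transaction
        seqno = self.next_seqno  # assign the next global transaction ID
        self.next_seqno += 1
        for key in ws.keys:
            self.index[key] = seqno
        return seqno

cert = Certifier()
print(cert.certify(WriteSet(frozenset({"t1:42"}), last_seen=0)))  # 1
print(cert.certify(WriteSet(frozenset({"t1:42"}), last_seen=0)))  # None (conflict)
print(cert.certify(WriteSet(frozenset({"t1:7"}), last_seen=0)))   # 2
```

Every node runs the same deterministic check on the same ordered stream, so all nodes reach the same commit-or-abort decision without further coordination.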
Makeflow is a workflow engine for executing large complex applications on clusters, clouds, and grids. It can be used to drive several different distributed computing systems, including Condor, SGE, and the included Work Queue system. It does not require a distributed filesystem, so you can use it to harness whatever collection of machines you have available. It is typically used for scaling up data-intensive scientific applications to hundreds or thousands of cores.
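Makeflow workflows are written as rules in a Make-like syntax, where each rule lists its outputs, its inputs, and a command. The Python sketch below is a toy, in-process illustration of dependency-driven execution. It is not Makeflow itself. It assumes rules are Python callables that read from an in-memory `store`, where Makeflow would run shell commands on remote nodes. `Rule` and `execute` are hypothetical names. In each wave, every rule whose inputs already exist runs concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    outputs: list
    inputs: list
    run: Callable[[dict], dict]  # reads inputs from the store, returns {output: value}

def execute(rules, store, max_workers=4):
    """Run rules in dependency order. Rules whose inputs all exist form one
    wave and run concurrently (threads stand in for remote batch jobs)."""
    pending = list(rules)
    waves = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            ready = [r for r in pending if all(i in store for i in r.inputs)]
            if not ready:
                raise RuntimeError("unsatisfiable dependencies")
            # Wait for the whole wave before publishing its outputs.
            results = list(pool.map(lambda r: r.run(store), ready))
            for produced in results:
                store.update(produced)
            waves.append([out for r in ready for out in r.outputs])
            done = {id(r) for r in ready}
            pending = [r for r in pending if id(r) not in done]
    return waves

rules = [
    Rule(["a.out"], ["a.txt"], lambda s: {"a.out": s["a.txt"].upper()}),
    Rule(["b.out"], ["b.txt"], lambda s: {"b.out": s["b.txt"].upper()}),
    Rule(["merged"], ["a.out", "b.out"], lambda s: {"merged": s["a.out"] + s["b.out"]}),
]
store = {"a.txt": "x", "b.txt": "y"}
print(execute(rules, store))  # [['a.out', 'b.out'], ['merged']]
print(store["merged"])        # XY
```

Waiting for a whole wave to finish keeps the sketch simple. A production engine would instead start each rule as soon as its own inputs appear.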
Hados stores files in a cluster of servers. It provides high availability by storing copies of each file on several nodes, and it offers RESTful APIs to store, check, and retrieve files. Through the cluster APIs, a file can be retrieved from whichever node hosts it. There is no master node: a request can be sent to any node of the cluster, which avoids a single point of failure.
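The description does not say how Hados decides which nodes hold a file's copies. One masterless technique that fits the description is rendezvous (highest-random-weight) hashing. The Python sketch below shows it as an assumption, not as Hados's actual placement scheme. `replica_nodes` and its `copies=3` default are hypothetical.

```python
import hashlib

def replica_nodes(filename, nodes, copies=3):
    """Rendezvous (highest-random-weight) hashing: rank every node by
    hash(node, filename) and keep the top `copies`. Any node computes the
    same answer, so no master is needed to locate a file."""
    def weight(node):
        digest = hashlib.sha256(f"{node}/{filename}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return sorted(nodes, key=weight, reverse=True)[:copies]

nodes = ["n1", "n2", "n3", "n4", "n5"]
holders = replica_nodes("report.pdf", nodes)
print(holders)  # three distinct node names, identical on every node
```

If a node that does not hold the file leaves, the result does not change. If a holder leaves, the two remaining holders stay in the result, so only one copy has to move.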
Moscrack is a WPA cracker for use on clusters. It supports MOSIX, SSH, and RSH connectivity and works by reading a word list from STDIN or a file, breaking it into chunks, and passing those chunks off to separate processes that run in parallel. The parallel processes are then executed on different nodes in your cluster. All results are checked and recorded on your master node. Logging and error handling are built in. It can run reliably for long periods without losing data or needing a restart. Moscrack uses aircrack-ng by default. Pyrit for WPA cracking and Dehasher for Unix password hashes are supported via plugins.
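The chunk-and-dispatch pattern described above can be sketched in Python as follows. This is an illustration under assumptions, not Moscrack's code. Local threads stand in for processes on remote nodes reached over MOSIX, SSH, or RSH. The caller-supplied `check` predicate is a hypothetical stand-in for running aircrack-ng on a chunk. `lines` can be any iterable of lines, such as `sys.stdin` or an open file.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

def chunks(lines, size):
    """Yield successive lists of up to `size` non-empty, stripped words."""
    words = (line.strip() for line in lines)
    words = (w for w in words if w)
    while batch := list(islice(words, size)):
        yield batch

def search_chunk(batch, check):
    """Worker: return the first word in the chunk accepted by `check`, or None."""
    return next((w for w in batch if check(w)), None)

def parallel_search(lines, check, chunk_size=1000, workers=4):
    """Master: split the word list into chunks, hand each chunk to a worker,
    and collect every chunk's hit in one place, in submission order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(search_chunk, b, check) for b in chunks(lines, chunk_size)]
        return [hit for f in futures if (hit := f.result()) is not None]

words = ["alpha\n", "beta\n", "\n", "gamma\n"]
print(parallel_search(words, lambda w: w == "gamma", chunk_size=2))  # ['gamma']
```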
LavaFlow creates useful reports on the usage of high-performance computing clusters. It collects data from the batch scheduler, monitoring systems, and other tools, and creates reports that help administrators, managers, and end users better understand their cluster environment. The reports are modular, and new modules are easy to create using templates and Django's QuerySet API. LavaFlow uses human-readable RESTful URLs, making it easy to automate and share links to reports.
JASocket is a lock-free, scalable, and robust server framework with no single point of failure. Servers run on a cluster of nodes and interact with one another using mobile agents, which reduces the number of messages exchanged and thus the overall system latency. Administration is handled via SSH.
JAConfig implements an eventually consistent distributed key/value database for managing a JASocket cluster. Also included are Quorum, for tracking when a quorum of hosts is present; Ranker, for determining which nodes are least loaded; ClusterManager, for starting up other servers; and Kingmaker, which decides which node runs ClusterManager. JAConfig is lock-free, actor-based, and has no single point of failure.
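JAConfig's own interfaces are not described here. The following Python sketch only illustrates the ideas behind quorum tracking, load ranking, and choosing a manager host, under several assumptions: a quorum means a strict majority of the configured hosts, load is a single number per host, and the selection rule is invented for illustration. All function names are hypothetical.

```python
def has_quorum(present, cluster):
    """True when a strict majority of the configured hosts is present."""
    members = set(cluster)
    return len(set(present) & members) > len(members) // 2

def least_loaded(loads, k=1):
    """Ranking helper: the k host names with the lowest reported load."""
    return sorted(loads, key=loads.get)[:k]

def choose_manager_host(present, cluster, loads):
    """Hypothetical rule: act only when a quorum exists, then pick the
    least-loaded present host, breaking ties by name."""
    if not has_quorum(present, cluster):
        return None
    candidates = set(present) & set(cluster)
    return min(candidates, key=lambda h: (loads.get(h, float("inf")), h))

cluster = ["a", "b", "c", "d", "e"]
loads = {"a": 0.9, "b": 0.1, "c": 0.5}
print(has_quorum(["a", "b", "c"], cluster))                  # True  (3 of 5)
print(has_quorum(["a", "b"], cluster))                       # False (2 of 5)
print(least_loaded(loads, k=2))                              # ['b', 'c']
print(choose_manager_host(["a", "b", "c"], cluster, loads))  # b
```

Requiring a strict majority means two disjoint network partitions can never both believe they hold a quorum, so at most one side picks a manager.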
Strategico is an engine for running statistical analyses over groups of time series. It can manage one or more groups (projects) of time series: by default, it can load data from a database or CSV files, normalize it, and store it inside the engine. The first analysis implemented in Strategico is "Long Term Prediction", which automatically finds the model that best fits each time series. The implemented models include mean, trend, linear, exponential smoothing, and ARIMA. Strategico is scalable because the analysis of each time series in a project can be run separately and independently. Running it on an HPC (High Performance Computing) cluster and/or under a resource scheduler such as Slurm is therefore recommended. It is written in R, a widely used language for statistical computing.
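Automatic per-series model selection can be sketched as follows. Strategico itself is written in R. This Python sketch is an illustration under assumptions, not Strategico's method. It uses only three simplified models (mean, least-squares linear trend, and simple exponential smoothing with a fixed `alpha`), and it assumes "best fit" means the lowest mean absolute error on a holdout of the last few points. `best_model` and `MODELS` are hypothetical names.

```python
from statistics import fmean

def forecast_mean(train, h):
    return [fmean(train)] * h

def forecast_linear(train, h):
    # Ordinary least-squares line through (t, y), extrapolated h steps ahead.
    n = len(train)
    ts = range(n)
    t_bar, y_bar = fmean(ts), fmean(train)
    sxx = sum((t - t_bar) ** 2 for t in ts)
    slope = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, train)) / sxx if sxx else 0.0
    return [y_bar + slope * (n + i - t_bar) for i in range(h)]

def forecast_ses(train, h, alpha=0.5):
    # Simple exponential smoothing with a fixed (assumed) smoothing factor.
    level = train[0]
    for y in train[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * h

MODELS = {"mean": forecast_mean, "linear": forecast_linear, "ses": forecast_ses}

def best_model(series, holdout=3):
    """Pick the model with the lowest mean absolute error on the last
    `holdout` points, then refit it on the full series and forecast."""
    train, test = series[:-holdout], series[-holdout:]
    def mae(name):
        pred = MODELS[name](train, holdout)
        return fmean(abs(p - y) for p, y in zip(pred, test))
    name = min(MODELS, key=mae)
    return name, MODELS[name](series, holdout)

project = {"sales_it": [float(i) for i in range(1, 11)], "sales_fr": [5.0] * 8}
# Each series is independent, so this loop could be split into separate scheduler jobs.
for sid, series in project.items():
    print(sid, *best_model(series))
# sales_it linear [11.0, 12.0, 13.0]
# sales_fr mean [5.0, 5.0, 5.0]
```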