jmemcached is a fast, network-accessible cache daemon. It is protocol-compatible with memcached but written in Java, which makes it suitable for applications with portability concerns, for environments where Java is the preferred platform, and for using the memcached protocol in embedded applications with alternate storage engines. Existing memcached clients work unmodified. It can run as a standalone daemon or be embedded in an existing Java application.
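Because compatibility is at the protocol level, any client that speaks the memcached text protocol should be able to talk to the daemon. The following is a minimal sketch of that wire format in Python, not code from jmemcached itself. The helper names (`build_set`, `build_get`, `parse_get_response`, `roundtrip`) are hypothetical, and the host and port are whatever the daemon was started with.

```python
import socket


def build_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Encode a memcached text-protocol `set` command."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def build_get(key: str) -> bytes:
    """Encode a memcached text-protocol `get` command for one key."""
    return f"get {key}\r\n".encode("ascii")


def parse_get_response(data: bytes):
    """Return the value from a single-key `get` reply, or None on a miss."""
    if data == b"END\r\n":
        return None
    header, _, rest = data.partition(b"\r\n")
    parts = header.split()
    if len(parts) < 4 or parts[0] != b"VALUE":
        raise ValueError("unexpected reply: %r" % header)
    length = int(parts[3])  # byte count announced in the VALUE line
    return rest[:length]


def roundtrip(host: str, port: int, key: str, value: bytes):
    """Store a value and read it back (hypothetical helper, needs a running server).

    Simplification: assumes each reply arrives in a single recv() call.
    """
    with socket.create_connection((host, port)) as sock:
        sock.sendall(build_set(key, value))
        if sock.recv(1024) != b"STORED\r\n":
            raise RuntimeError("set was not acknowledged")
        sock.sendall(build_get(key))
        return parse_get_response(sock.recv(65536))
```

The encoding and parsing helpers work without a server, which makes them easy to check in isolation; only `roundtrip` needs a live daemon.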
K-tree provides a scalable approach to clustering by combining the B+-tree and k-means algorithms. Clustering can be used to solve problems in signal processing, machine learning, and other contexts. It has recently been used to solve document clustering problems on the Wikipedia collection.
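To illustrate the k-means half of that combination, here is a small sketch of a two-way k-means split, the kind of step a tree-structured clusterer could apply when a node holds too many vectors. It is not K-tree's implementation: the function names and the deterministic seeding (first vector plus the vector farthest from it) are assumptions chosen to keep the example reproducible.

```python
from math import dist


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(vectors)
    return tuple(sum(column) / n for column in zip(*vectors))


def two_means_split(vectors, iterations=20):
    """Split vectors into two groups with k-means (k = 2).

    Seeding is an assumption for this sketch: the first vector and the
    vector farthest from it serve as the initial centroids.
    """
    c0 = vectors[0]
    c1 = max(vectors, key=lambda v: dist(v, c0))
    centroids = [c0, c1]
    groups = ([], [])
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        groups = ([], [])
        for v in vectors:
            nearest = 0 if dist(v, centroids[0]) <= dist(v, centroids[1]) else 1
            groups[nearest].append(v)
        if not groups[0] or not groups[1]:
            break  # degenerate input, e.g. all vectors identical
        # Update step: move each centroid to the mean of its group.
        updated = [mean(groups[0]), mean(groups[1])]
        if updated == centroids:
            break  # converged
        centroids = updated
    return centroids, groups


centroids, groups = two_means_split([(0, 0), (0, 1), (10, 10), (10, 11)])
```

In a tree setting, the two resulting centroids would become the keys stored in the parent node, in the same way a B+-tree promotes a separator key after a split.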
Lucie is a cluster installation and configuration tool. It enables parallel network installation of large numbers of nodes from a single administration server. The Lucie installer partitions hard disks and installs the Linux kernel and required software packages. The Lucie configurator then generates system and software configurations. Lucie is designed to be scalable and efficient, so a complete Linux cluster can be built from scratch in a short amount of time. The whole installation process is also designed to be fully automated.
Crash Dummy is a Java Web application that helps IT professionals set up Java application server environments. Its features include failure simulation and diagnostics. Crash Dummy is particularly helpful for setting up complex clustered environments and monitoring infrastructure.
BorderFlow implements a general-purpose graph clustering algorithm. It maximizes the inner-to-outer flow ratio from the border of each cluster to the rest of the graph. The main advantage of the algorithm is that it needs no parametrization to produce highly accurate results.
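One way to read that ratio is as follows. The border of a cluster is the set of its nodes that have at least one edge leaving the cluster. The inner flow is the total weight of edges from border nodes to nodes inside the cluster, and the outer flow is the total weight of edges from border nodes to nodes outside it. The sketch below only evaluates this quantity for a given cluster. It does not reproduce the search BorderFlow uses to find clusters, and the graph representation and function names are assumptions.

```python
def border(graph, cluster):
    """Nodes of the cluster that have at least one neighbour outside it."""
    return {u for u in cluster if any(v not in cluster for v in graph[u])}


def flow(graph, sources, targets):
    """Total weight of edges leading from `sources` into `targets`."""
    return sum(w for u in sources for v, w in graph[u].items() if v in targets)


def border_flow_ratio(graph, cluster):
    """Inner flow divided by outer flow, both measured from the border."""
    cluster = set(cluster)
    b = border(graph, cluster)
    outside = set(graph) - cluster
    outer = flow(graph, b, outside)
    if outer == 0:
        return float("inf")  # the cluster is disconnected from the rest
    return flow(graph, b, cluster) / outer


# Undirected example stored as a symmetric adjacency map:
# a triangle a-b-c with a tail c-d-e, all weights 1.
graph = {
    "a": {"b": 1, "c": 1},
    "b": {"a": 1, "c": 1},
    "c": {"a": 1, "b": 1, "d": 1},
    "d": {"c": 1, "e": 1},
    "e": {"d": 1},
}

print(border_flow_ratio(graph, {"a", "b", "c"}))  # 2.0
print(border_flow_ratio(graph, {"a", "b"}))       # 1.0
```

The full triangle scores higher than the pair `{a, b}` because only one of its nodes touches the rest of the graph, which matches the intuition that a good cluster is tightly connected inside and loosely connected outside.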
Hados stores files in a cluster of servers. Its goal is to provide high availability by storing copies of the same file on several nodes. It provides RESTful APIs to store, check, or retrieve files. Using the cluster APIs, you can retrieve a file from whichever node hosts it. To avoid a single point of failure, any request can be sent to any node of the cluster; there is no master node.
MLPACK is a C++ machine learning library with an emphasis on scalability, speed, and ease of use. Its aim is to make machine learning possible for novice users by means of a simple, consistent API, while exploiting C++ language features to provide maximum performance and flexibility for expert users. It contains algorithms such as k-means, Gaussian mixture models, hidden Markov models, density estimation trees, kernel PCA, locality-sensitive hashing, sparse coding, linear regression, and least-angle regression.