Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and grids. It is based on a hierarchical design targeted at federations of clusters. Ganglia is currently in use on over 500 clusters around the world and has scaled to handle clusters with 2000 nodes.
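A hierarchical design lets each level forward a compact summary instead of every raw reading. The following Python sketch illustrates that idea under simplifying assumptions. Each node reports one value for one metric. A summary is just a host count plus a sum, and a mean can be derived from it at any level. The `Summary`, `summarize_cluster`, and `summarize_federation` names are hypothetical, and this is not Ganglia's daemon layout or wire format.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    """Roll-up of one metric over a set of hosts: count plus sum."""
    hosts: int = 0
    total: float = 0.0

    def merge(self, other: "Summary") -> None:
        # Counts and sums combine exactly, so no per-node data is needed upstream.
        self.hosts += other.hosts
        self.total += other.total

    @property
    def mean(self) -> float:
        return self.total / self.hosts if self.hosts else 0.0


def summarize_cluster(node_values: dict[str, float]) -> Summary:
    # Lower level: one reading per node in a single cluster.
    return Summary(hosts=len(node_values), total=sum(node_values.values()))


def summarize_federation(clusters: dict[str, Summary]) -> Summary:
    # Upper level: combine cluster summaries into one federation-wide summary.
    combined = Summary()
    for summary in clusters.values():
        combined.merge(summary)
    return combined
```

Take two clusters, `{"a1": 0.5, "a2": 1.5}` and `{"b1": 2.0}`. They combine into a federation summary with 3 hosts and a mean of 4.0 / 3. Keeping sums rather than means is what makes the roll-up exact.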
Heartbeat is a full-function high-availability system for Linux and other POSIX-like OSes. It monitors services and restarts them on errors. When managing a cluster (more than one machine), it also monitors the cluster members and begins recovering lost services in less than a second. It runs over serial ports and UDP broadcast/multicast, as well as OpenAIS multicast, and is easily adapted to other interconnect media and protocols. In a cluster, it can operate with shared disks, with data replication, or with no data sharing. Versions 2.0 and later are comparable to commercial HA packages, providing resource monitoring, support for larger clusters, and detailed dependency information.
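Detecting a lost member quickly depends on two settings: how often heartbeats are sent, and how long a peer may stay silent before it is presumed dead. The Python sketch below shows a generic timeout-based failure detector. It is not Heartbeat's implementation. The `HeartbeatMonitor` class and its methods are hypothetical, and message transport (serial, UDP) is left out. The clock is injectable so the logic can be exercised without real delays.

```python
import time


class HeartbeatMonitor:
    """Presumes a peer has failed if no heartbeat arrives within `timeout` seconds."""

    def __init__(self, timeout: float, clock=time.monotonic) -> None:
        self.timeout = timeout
        self.clock = clock  # any zero-argument callable returning seconds
        self.last_seen: dict[str, float] = {}

    def heartbeat(self, node: str) -> None:
        # Record the arrival time of a heartbeat message from `node`.
        self.last_seen[node] = self.clock()

    def failed_nodes(self) -> list[str]:
        # A node is presumed lost once its last heartbeat is older than the timeout.
        now = self.clock()
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)
```

Suppose the timeout is 0.5 s. A node whose last heartbeat arrived 0.6 s ago is reported as failed, but one heard from 0.3 s ago is not. A shorter timeout detects failures sooner but is more likely to misjudge a briefly delayed peer.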
openMosix is a set of extensions to the standard Linux kernel that lets you build a cluster out of off-the-shelf PC hardware. openMosix scales to thousands of nodes. Unlike PVM, MPI, Linda, and similar systems, it does not require you to modify your applications to benefit from the cluster. Processes migrate transparently between nodes, and the cluster balances its load automatically.
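openMosix performs migration inside the kernel, transparently to applications. The sketch below is a rough user-space illustration of the balancing goal only. It greedily moves processes from the busiest node to the idlest one until the difference in process count is within a tolerance. It assumes every process is one unit of load, which real schedulers do not. The `rebalance` function is hypothetical and is not openMosix's algorithm.

```python
def rebalance(loads: dict[str, list[int]], tolerance: int = 1) -> list[tuple[int, str, str]]:
    """Greedily migrate processes from the busiest node to the idlest node.

    `loads` maps node name -> list of process IDs; each process counts as one
    unit of load. The dict is modified in place. Returns (pid, src, dst) moves.
    """
    if tolerance < 1:
        # With tolerance 0, a one-process gap would just bounce back and forth.
        raise ValueError("tolerance must be at least 1")
    migrations: list[tuple[int, str, str]] = []
    while True:
        busiest = max(loads, key=lambda n: len(loads[n]))
        idlest = min(loads, key=lambda n: len(loads[n]))
        # Stop once the imbalance is within tolerance.
        if len(loads[busiest]) - len(loads[idlest]) <= tolerance:
            return migrations
        pid = loads[busiest].pop()
        loads[idlest].append(pid)
        migrations.append((pid, busiest, idlest))
```

Suppose the starting state is `{"n1": [1, 2, 3, 4, 5], "n2": [], "n3": [6]}`. Three migrations then leave every node with two processes. Each move shrinks a gap of at least two, so the loop always terminates when the tolerance is 1 or more.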
redWall is a bootable CD-ROM firewall that focuses on Web-based reporting of the firewall's status. It includes Snort, snortsam, and dansguardian, and provides support for fwbuilder, squidguard, reporting (BASE, sarg, ntop, webfwlog), VPNs (Openswan, PoPToP, OpenVPN), spam filtering (spamassassin, dcc, razor2, clamav, amavisd-new, dspam, and maia mailguard), and mail-based alerting. Configuration data are stored on a floppy or USB disk.
BixData is a cluster management tool with monitoring and system administration features. It monitors services (HTTP, ping, POP3, SMTP), performance, and processes. Its management console for VMware and Xen supports multiple virtual machine hosts and guests. It can raise critical notifications and send email alerts for any monitored system event (HTTP, ping, CPU, memory, SMART diagnostics, VM statistics). A graphical desktop displays dynamic graphs in real time. The runtime agents and server components are lightweight and easy to set up and run.
changedfiles is a framework for filesystem replication, security monitoring, and automatic file transformations. It suits essentially any application where you would poll files or directories and then act on them, send them somewhere else, or both. The difference is that the kernel tells you when files change, so you do not have to poll. It can serve as a simple real-time FTP push mirror to one or more sites. It is also a full-fledged MySQL client, so you can run real-time database operations such as batch imports. It consists of two parts. The first is a kernel module (for Linux kernel 2.4) that reports to a device whenever a file on the filesystem changes. The second is a user-space daemon that can be configured to perform almost any action when it is notified of a change to a file matching one of its patterns. The kernel module is SMP-safe and has been tested on Intel, PowerPC, and Alpha.
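The daemon half of the design can be pictured as a table of pattern/action rules consulted on every change notification. The Python sketch below shows that dispatch step only. The `ChangeDispatcher` class and its `on` and `report` methods are hypothetical. Paths are passed in directly instead of being read from the kernel module's device. The actions stand in for jobs such as an FTP push or a MySQL import. The sketch uses the standard `fnmatch` module, in which `*` also matches `/`.

```python
import fnmatch
from typing import Callable

Action = Callable[[str], None]


class ChangeDispatcher:
    """Runs every registered action whose glob pattern matches a changed path."""

    def __init__(self) -> None:
        self.rules: list[tuple[str, Action]] = []

    def on(self, pattern: str, action: Action) -> None:
        # Register an action for paths matching `pattern` (fnmatch syntax).
        self.rules.append((pattern, action))

    def report(self, path: str) -> int:
        # Called once per change notification; returns how many rules fired.
        # In changedfiles the notifications come from the kernel module;
        # here the caller passes the path in directly.
        matched = 0
        for pattern, action in self.rules:
            if fnmatch.fnmatch(path, pattern):
                action(path)
                matched += 1
        return matched
```

Suppose you register `"/var/www/*"` and `"*.csv"` rules. Reporting `/var/www/data.csv` then runs both actions, in registration order, while `/etc/passwd` matches neither.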
AutoNOC is a high-performance, production-integrated, peer-to-peer network operations management platform for Windows and Linux. It provides real-time and historical analysis, root-cause analysis, fault detection, reporting, alerts and alarms, and event correlation. It is an interoperable, vendor-independent solution with built-in support for Microsoft, Cisco, Linux, IBM, and other major technologies. It also offers end-user personalization, easy scalability, compressed historical databases, unlimited history retention, event archiving (it works as a syslog server), and multi-language support.
Jagger is a Java application monitoring tool that uses JMX technology to aggregate, archive, and visualize monitoring data for large computer clusters. It gives developers and administrators a view of their systems that is both succinct and comprehensive. Standard JMX consoles cannot do this at cluster scale because of information overload.
Performance Co-Pilot (PCP) is a framework and set of services for supporting system-level performance monitoring and performance management. It provides a unifying abstraction for all of the interesting performance data in a system, and allows client applications to easily retrieve and process any subset of that data using a single API. A client-server architecture allows multiple clients to monitor the same host, and a single client to monitor multiple hosts. Archive logging and replay are integrated so that a client application can use the same API to process real-time data from a host or historical data from an archive.
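The key idea is that client code depends on one interface, not on where the data comes from. The Python sketch below illustrates that pattern with a hypothetical `MetricSource` interface and two hypothetical implementations. One stands in for a live host and the other for an archive. The sketch does not use or mimic PCP's actual API, and the `count` parameter exists only to bound the live stream.

```python
from abc import ABC, abstractmethod
from typing import Callable, Iterator

Sample = tuple[float, dict[str, float]]  # (timestamp, metric name -> value)


class MetricSource(ABC):
    """Common interface: clients iterate samples without knowing their origin."""

    @abstractmethod
    def samples(self) -> Iterator[Sample]: ...


class LiveSource(MetricSource):
    def __init__(self, read_now: Callable[[], Sample], count: int) -> None:
        self.read_now = read_now  # stands in for querying a live host
        self.count = count        # hypothetical: number of samples to take

    def samples(self) -> Iterator[Sample]:
        for _ in range(self.count):
            yield self.read_now()


class ArchiveSource(MetricSource):
    def __init__(self, records: list[Sample]) -> None:
        self.records = records  # stands in for a recorded archive log

    def samples(self) -> Iterator[Sample]:
        yield from self.records


def peak(source: MetricSource, metric: str) -> float:
    # Client code written once; it runs unchanged on live or archived data.
    return max(values[metric] for _, values in source.samples())
```

`peak` never checks which kind of source it receives, so the same analysis works on both real-time and replayed data.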