Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and grids. It is based on a hierarchical design targeted at federations of clusters. Ganglia is currently in use on over 500 clusters around the world and has scaled to handle clusters with 2000 nodes.
The WebReboot Plugin for Nagios is a suite of commands that can be used within Nagios to monitor a server and, if necessary, take corrective action through the WebReboot line of products. For example, the plugin can alert you when a host is powered down, as opposed to simply not responding to network requests. Likewise, it can reboot a server when a host fails to respond to ping, or shut down a server when a critical temperature threshold is exceeded. The commands can be mixed and matched with all existing Nagios commands to extend monitoring coverage across the network.
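The corrective-action logic described above can be illustrated with a minimal sketch of a Nagios-style event handler. This is not the WebReboot plugin itself: the function name `choose_action`, the check labels `"ping"` and `"temperature"`, and the action names are hypothetical, chosen only for illustration. The sketch assumes Nagios passes the state (for example via `$HOSTSTATE$` or `$SERVICESTATE$`) and the state type (`$HOSTSTATETYPE$` or `$SERVICESTATETYPE$`) as arguments, and it only decides which action to take; actually issuing the command to a WebReboot device is left out.

```python
import sys
from typing import Optional

# Hypothetical action names; the real WebReboot plugin commands are not shown here.
REBOOT = "reboot"
SHUTDOWN = "shutdown"


def choose_action(check: str, state: str, state_type: str) -> Optional[str]:
    """Return the corrective action for a Nagios state, or None for no action."""
    # Act only on HARD states, so problems that Nagios is still retrying
    # (SOFT states) do not trigger a reboot or shutdown.
    if state_type != "HARD":
        return None
    # A host that fails its ping check is reported by Nagios as DOWN.
    if check == "ping" and state == "DOWN":
        return REBOOT
    # A temperature service check past its critical threshold is CRITICAL.
    if check == "temperature" and state == "CRITICAL":
        return SHUTDOWN
    return None


if __name__ == "__main__":
    # Assumed invocation: handler.py <check> <state> <state_type>
    if len(sys.argv) == 4:
        print(choose_action(sys.argv[1], sys.argv[2], sys.argv[3]) or "none")
```

Acting only on HARD states is a design choice in this sketch: it avoids rebooting a server over a single dropped ping while Nagios is still re-checking the host.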