Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and grids. It is based on a hierarchical design targeted at federations of clusters. Ganglia is currently in use on over 500 clusters around the world and has scaled to handle clusters with 2000 nodes.
ICPLD (Internet Connection Performance Logging Daemon) is a connection monitor that sends ICMP requests to IP addresses of your choice to check whether your machine has a working network connection. It logs failed attempts to reach the hosts and writes a log entry as soon as a reply is received again. It keeps track of when and for how long the connection was unavailable, recording both the total downtime and each individual interruption. It supports IPv6 and can execute a command whenever a connection goes up or down, which is useful for alerting users.
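The downtime bookkeeping described above can be sketched in a few lines of Python. This is only an illustration of the idea, not ICPLD's actual code: it sends no ICMP requests, and the `OutageTracker` class, its `record` method, and the sample probe results are hypothetical names and data made up for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class OutageTracker:
    """Hypothetical helper: derives outages from (timestamp, reachable) probe results."""

    down_since: Optional[float] = None  # start of the outage in progress, if any
    outages: List[Tuple[float, float]] = field(default_factory=list)  # (start, end) pairs

    def record(self, timestamp: float, reachable: bool) -> None:
        if not reachable and self.down_since is None:
            # First failed probe: the connection just went down.
            self.down_since = timestamp
        elif reachable and self.down_since is not None:
            # First reply after a failure: close the outage.
            self.outages.append((self.down_since, timestamp))
            self.down_since = None

    def total_downtime(self) -> float:
        # Sum of all completed outages; an outage still in progress is not counted.
        return sum(end - start for start, end in self.outages)


# Made-up probe results: (seconds since start, did the host reply?)
tracker = OutageTracker()
for ts, ok in [(0, True), (10, False), (20, False), (30, True), (40, False), (50, True)]:
    tracker.record(ts, ok)
```

After this run, `tracker.outages` holds one entry per interruption and `tracker.total_downtime()` gives their combined length. A real monitor would feed `record` from actual probes and could run a user-supplied command at each up or down transition.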
RRDutil is a tool that collects statistics (typically every 5 minutes) from multiple servers, stores the values in RRD databases (using RRDtool), and plots graphs for a Web server on demand. The available graph types include CPU, memory, disk (space and I/O), Apache, MySQL queries and query types, email, Web hits, and more.
Osiris is a host integrity management system that can be used to monitor changes to a network of hosts over time and report those changes back to the administrator(s). Osiris takes periodic snapshots of the filesystem, configurations, and logs, and stores them on a central management host. When changes are detected, Osiris logs these events and optionally sends email to an administrator. Osiris also has preliminary support for monitoring other system data, including user lists, filesystem details, kernel modules, and network interface configurations.
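The snapshot-and-compare idea behind this kind of integrity monitoring can be illustrated with a short Python sketch. It is not how Osiris itself is implemented: the `take_snapshot` and `compare_snapshots` functions are hypothetical, and the choice of SHA-256 content hashes is an assumption made for the example.

```python
import hashlib
import os
from typing import Dict, List


def take_snapshot(root: str) -> Dict[str, str]:
    """Map each file under root (by relative path) to the SHA-256 of its contents."""
    snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                digest = hashlib.sha256(fh.read()).hexdigest()
            snapshot[os.path.relpath(path, root)] = digest
    return snapshot


def compare_snapshots(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Report paths that were added, removed, or changed between two snapshots."""
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(p for p in set(old) & set(new) if old[p] != new[p]),
    }
```

A monitoring run would call `take_snapshot` on a schedule, keep the previous result on a separate trusted machine, and report any non-empty list returned by `compare_snapshots`.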