Heartbeat is a full-featured high-availability system for Linux and other POSIX-like operating systems. It monitors services and restarts them when they fail. When managing a cluster (more than one machine), it also monitors the cluster members and begins recovery of lost services in less than a second. It runs over serial ports and UDP broadcast/multicast, as well as OpenAIS multicast, and is easily adapted to other interconnect media and protocols. In a cluster, it can operate with shared disks, with data replication, or with no data sharing at all. Versions 2.0 and later are comparable to any commercial HA package, providing resource monitoring, larger clusters, and detailed dependency information.
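The member monitoring described above can be pictured as a timeout check on periodic "I am alive" messages. The following is a minimal sketch of that general technique, not Heartbeat's actual code: the class name HeartbeatMonitor, its methods, and the deadtime parameter are all hypothetical names chosen for this illustration, and the network transport (serial, UDP, and so on) is left out entirely.

```python
import time
from typing import Dict, List, Optional


class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each cluster member.

    Illustrative only: names and behavior are assumptions for this sketch,
    not the API of the Heartbeat project.
    """

    def __init__(self, deadtime: float = 1.0) -> None:
        # Seconds of silence after which a member is considered lost.
        self.deadtime = deadtime
        self.last_seen: Dict[str, float] = {}

    def record_heartbeat(self, node: str, now: Optional[float] = None) -> None:
        # A real system would call this whenever a heartbeat packet arrives.
        self.last_seen[node] = time.monotonic() if now is None else now

    def dead_nodes(self, now: Optional[float] = None) -> List[str]:
        # Members whose last heartbeat is older than deadtime, sorted by name.
        current = time.monotonic() if now is None else now
        return sorted(
            node
            for node, seen in self.last_seen.items()
            if current - seen > self.deadtime
        )


if __name__ == "__main__":
    monitor = HeartbeatMonitor(deadtime=1.0)
    monitor.record_heartbeat("node-a", now=0.0)
    monitor.record_heartbeat("node-b", now=0.9)
    # At t=1.5 only node-a has been silent for longer than deadtime.
    print(monitor.dead_nodes(now=1.5))
```

Passing now explicitly keeps the sketch deterministic; a recovery step (restarting the lost member's services elsewhere) would be triggered for each name that dead_nodes returns.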
The Assimilation Monitoring Project is a highly scalable, discovery-driven monitoring system. It integrates continuous discovery of servers, services, service dependencies, switch connections, and more into the monitoring process. The discovery is "stealthy" and will never set off any network security alarms. Adding servers doesn't measurably increase monitoring load, and the system is expected to scale easily into the 100,000-server range. The discovery work is distributed among all the nanoprobes (agents), which run scripts that emit JSON. The central system (CMA) stores these JSON strings and runs optional plugins to create graph nodes from them.
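The flow from agent script to graph node can be sketched as follows. This is an illustration under stated assumptions, not the project's real code or schema: the JSON field names ("kind", "data"), the functions discover_host and host_plugin, and the class CentralStore are all hypothetical, and graph nodes are modeled as plain dictionaries rather than entries in a graph database.

```python
import json
import socket
from typing import Callable, Dict, List

# A plugin turns the "data" part of one discovery record into graph nodes.
Plugin = Callable[[dict], List[dict]]


def discover_host() -> str:
    """Agent side (hypothetical script): describe this host as a JSON string."""
    record = {"kind": "host", "data": {"hostname": socket.gethostname()}}
    return json.dumps(record, sort_keys=True)


def host_plugin(data: dict) -> List[dict]:
    """Central side (hypothetical plugin): one graph node per discovered host."""
    return [{"label": "Server", "name": data["hostname"]}]


class CentralStore:
    """Stores raw JSON strings and runs optional plugins on them."""

    def __init__(self) -> None:
        self.raw: List[str] = []            # every JSON string, kept as received
        self.plugins: Dict[str, Plugin] = {}
        self.nodes: List[dict] = []         # graph nodes created by plugins

    def register_plugin(self, kind: str, plugin: Plugin) -> None:
        self.plugins[kind] = plugin

    def ingest(self, json_text: str) -> None:
        # Always store the string; create nodes only if a plugin is registered.
        self.raw.append(json_text)
        record = json.loads(json_text)
        plugin = self.plugins.get(record.get("kind"))
        if plugin is not None:
            self.nodes.extend(plugin(record.get("data", {})))


if __name__ == "__main__":
    store = CentralStore()
    store.register_plugin("host", host_plugin)
    store.ingest(discover_host())
    print(store.nodes)
```

Because plugins are optional in this sketch, a record of an unrecognized kind is still stored as a string and simply produces no graph nodes.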