GroundWork Monitor Community Edition gives you insight into your computing infrastructure. It shows the current and historical state of your computers (servers, desktops, and laptops), network devices, services (such as TCP/IP and Web services), and applications (such as mail servers and database applications). You can choose to be alerted by pager, SMS, email, or phone when something goes wrong, and you can even set up automatic restarts or failovers.
check_openmanage is a plugin for Nagios that checks the hardware health of Dell servers running OpenManage Server Administrator (OMSA). The plugin can be used remotely with SNMP, or locally with NRPE, check_by_ssh, or a similar mechanism. It checks the health of the storage subsystem, power supplies, memory modules, temperature probes, and other components, and raises an alert if any of them are faulty or operate outside normal parameters.
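Nagios plugins report their result through the process exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) together with a line of text on standard output. As a minimal sketch, assuming per-component states have already been collected, a hardware-health plugin of this kind could roll them into one overall state by picking the most severe one. The `State`, `worst_state`, and `summarize` names, the severity ranking, and the component names are hypothetical illustrations; this is not check_openmanage's actual code, and it does not query OMSA or SNMP.

```python
from enum import IntEnum


class State(IntEnum):
    """Standard Nagios plugin exit codes."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


# Assumed severity ranking for aggregation (an illustrative choice; plugins
# differ on where UNKNOWN sits): CRITICAL > WARNING > UNKNOWN > OK.
SEVERITY = {State.OK: 0, State.UNKNOWN: 1, State.WARNING: 2, State.CRITICAL: 3}


def worst_state(states):
    """Return the most severe state, or OK when there are no components."""
    return max(states, key=SEVERITY.__getitem__, default=State.OK)


def summarize(components):
    """Turn a {component_name: State} mapping into (exit_code, status_line)."""
    overall = worst_state(components.values())
    problems = [f"{name} is {state.name}"
                for name, state in components.items() if state != State.OK]
    detail = ", ".join(problems) if problems else "all components healthy"
    return int(overall), f"{overall.name} - {detail}"


# Hypothetical results; a real plugin would gather these from the hardware.
code, line = summarize({
    "power supply 1": State.OK,
    "temperature probe 2": State.WARNING,
    "memory module 3": State.OK,
})
print(line)  # a real plugin would then exit with status `code`
```

Taking the worst state means a single faulty component is enough to change the overall result, which matches the alerting behaviour described above.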
memtester is a user-space utility for testing the memory subsystem of a computer to determine whether it is faulty. It is good at finding intermittent and non-deterministic faults, and it includes many tests designed to catch borderline memory. memtester should compile and run on any 32- or 64-bit Unix or Unix-like system.
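memtester itself is written in C and exercises real memory; the Python sketch below is only a toy model of the general idea behind a pattern test: write a known bit pattern to every location, read it back, and flag any mismatch. The `pattern_test` and `walking_ones` functions and the simulated `StuckBitMemory` buffer are hypothetical, added to show how a stuck bit would be detected; they are not memtester's code.

```python
def pattern_test(mem, pattern):
    """Fill every byte of `mem` with `pattern`, read back, return bad offsets."""
    for i in range(len(mem)):
        mem[i] = pattern
    return [i for i in range(len(mem)) if mem[i] != pattern]


def walking_ones(mem):
    """Run a pattern test for each single-bit pattern 0x01, 0x02, ..., 0x80."""
    bad = set()
    for bit in range(8):
        bad.update(pattern_test(mem, 1 << bit))
    return sorted(bad)


class StuckBitMemory:
    """Hypothetical faulty buffer: one bit of one byte is stuck at 0."""

    def __init__(self, size, bad_offset, bit=3):
        self._data = bytearray(size)
        self.bad_offset = bad_offset
        self.bit = bit

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        return self._data[index]

    def __setitem__(self, index, value):
        if index == self.bad_offset:
            value &= ~(1 << self.bit) & 0xFF  # simulate the stuck-at-0 bit
        self._data[index] = value


print(walking_ones(StuckBitMemory(64, bad_offset=10)))  # prints [10]
```

Because the buffer is ordinary Python memory, this cannot find real hardware faults; it only illustrates the write/read-back/compare logic that pattern tests rely on.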
cpuburn is a set of programs that load x86 CPUs as heavily as possible for system testing. FPU and ALU instructions are coded in an endless loop to maximize heat production from the CPU, which stresses the CPU itself, the cooling system, the motherboard (especially the voltage regulators), and the power supply. The tests may damage undercooled, overclocked, or otherwise weak systems, causing data loss or permanent damage to electronic components.
mpt-status is a query tool for accessing the running configuration and status of LSI SCSI HBAs. It is a heavily modified version of the original mpt-status-1.0 tool written by Matt Braithwaite, and it lets you monitor the health and status of your RAID setup. The currently supported and tested HBAs are the LSI 1030 SCSI RAID storage controller and the LSI SAS1064 SCSI RAID storage controller. Because the tool uses the Message Passing Interface (MPI), basic RAID status information is likely to be available for all LSI-based controllers.
memtest86+ is a memory tester based on memtest86 v3.0. It provides an up-to-date version of that tool and aims to be as reliable as the original. It has been fixed to work on AMD64 systems and properly detects all current CPUs and motherboard chipsets. It supports ECC polling for AMD64, i875P, and E7205, and displays useful settings for the most popular chipsets.
check_hpasm is a plugin for Nagios that checks the hardware health of Hewlett-Packard ProLiant servers. It requires the hpasm package to be installed. The plugin checks the health of processors, power supplies, memory modules, fans, and CPU and board temperatures, and alerts you if any of these components is faulty or operates outside its normal parameters.