As more Information Technology (IT) departments centralize and consolidate to reduce cost, many remote sites are left with no on-site IT support. Remote administration of computers is increasingly common because of its significant cost benefits: many tasks can be automated, and the administrator does not have to physically visit each computer (CERT, 2000). In their whitepaper on Remote Systems Administration, Stephen Packard and Archie Andrews stated that remote systems administration is a reasonable, economical approach. In addition, as networks and servers become critical to nearly all business functions, more IT departments are staffing or otherwise providing for some type of round-the-clock monitoring and support. While 24x7 support is a valuable capability, it is limited by the ability to gain physical access to off-site network devices and servers when they lock up or cannot be reached in-band. Even when the problem occurs during business hours, the lack of on-site IT staff may require an unskilled user to work with the remote IT staff to correct the problem. This is a poor use of that user's time and may turn a small problem into a large one. The type of device discussed in this paper would give network and systems administrators access to functions and information that normally require physical access to the equipment.
The state of the art in remote systems and network device administration includes graphical tools that allow the simultaneous remote monitoring, configuration, and management of a large number of devices. Uninterruptible power supplies (UPSs) can be connected to the network (typically in-band), which allows extensive monitoring and a cold boot (i.e., power off/power on) of the attached equipment. At least one PC server vendor offers a board that allows remote power cycling and limited administration of a machine that will not boot. With these capabilities available, why is another class of remote management tool required?
Toolsets for remote configuration and diagnostics tend to be vendor-specific and to work with only a subset of a vendor's products. Unless your operating environment contains a limited set of equipment from a single vendor, remote administration will likely require multiple software products. Further, each of these tools tends to be expensive and requires training and experience to use properly. The problem with tools that rely on in-band communication is that a frozen device often cannot be reached in-band. These in-band solutions also create network overhead that competes with customers for available bandwidth. While smart UPS devices are useful, a simple cold-boot capability with no diagnostic capability is of limited value. Compaq offers the Remote Insight Board (and a Lights-Out Edition), which provides some of the capabilities that I will describe below. It is a capable tool, but it works with only a limited set of Compaq servers. My proposal is for a more limited feature set with greater out-of-band communications capability and the ability to work with any PCI- or PCMCIA-compliant device.
This paper recommends the creation of an open standard for a remote control and diagnostic device, including an open standard MIB (Management Information Base). A standard MIB would allow the device to report status to the large variety of SNMP (Simple Network Management Protocol) applications currently available. The standard should also define an API that would allow application programmers to access all of the features and capabilities of the device. The standard and MIB could be proposed as a Request for Comments (RFC) through the Internet Engineering Task Force (IETF). The creation of these standards would give hardware and software vendors equal access to the market. It could also significantly reduce the cost of network and systems management by eliminating the multiple applications now required for administration.
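To make the idea of a vendor-neutral API and a standard status report more concrete, the following is a minimal sketch in Python. Nothing in it comes from an existing standard: the `RemoteControlDevice` interface, its method names, and the MIB-style object names (`rcdHostName`, `rcdPowerState`, `rcdColdBootCount`) are all hypothetical, and `SimulatedDevice` is an in-memory stand-in for real hardware.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Protocol


class PowerState(Enum):
    ON = "on"
    OFF = "off"


class RemoteControlDevice(Protocol):
    """Hypothetical vendor-neutral API; every method name is an assumption."""

    def power_state(self) -> PowerState: ...
    def power_cycle(self) -> None: ...
    def status(self) -> Dict[str, str]: ...


@dataclass
class SimulatedDevice:
    """In-memory stand-in; a real card would sit behind the same interface."""

    host_name: str
    state: PowerState = PowerState.ON
    cold_boots: int = 0

    def power_state(self) -> PowerState:
        return self.state

    def power_cycle(self) -> None:
        # Cold boot: power off, then power back on.
        self.state = PowerState.OFF
        self.state = PowerState.ON
        self.cold_boots += 1

    def status(self) -> Dict[str, str]:
        # Keys imitate MIB object names; none of them is registered anywhere.
        return {
            "rcdHostName": self.host_name,
            "rcdPowerState": self.state.value,
            "rcdColdBootCount": str(self.cold_boots),
        }


def reboot_and_report(device: RemoteControlDevice) -> Dict[str, str]:
    """Management code written once against the interface, not a vendor."""
    device.power_cycle()
    return device.status()
```

Because `reboot_and_report` depends only on the interface, the same management application could in principle drive a card from any manufacturer that implemented the standard, which is the cost argument made above.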
The potential features of the device are nearly unlimited, but the minimum features required include:
These minimum features would address the most common problem for systems and network administrators: rebooting a frozen server or device. They would also allow diagnosis, to facilitate a decision about whether or not to dispatch repair service.
The standard should also provide for full capture and reporting of data from SMART (Self-Monitoring, Analysis, and Reporting Technology) hard drives and from devices and systems that use ACPI (Advanced Configuration and Power Interface) and other related standards. It should provide sufficient capacity for future growth and interface with other new devices as those standards are accepted.
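As an illustration of what reporting SMART data might involve, the sketch below models a drive attribute and flags those at or below their failure threshold, which is the conventional SMART rule for a failing attribute. The `SmartAttribute` class and `failing_attributes` function are hypothetical names, and the sample values are made up for the example rather than read from a real drive.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SmartAttribute:
    attr_id: int
    name: str
    value: int      # normalized current value reported by the drive
    threshold: int  # vendor-set failure threshold

    def is_failing(self) -> bool:
        # Conventional rule: an attribute fails when its normalized
        # value drops to or below the threshold.
        return self.value <= self.threshold


def failing_attributes(attrs: List[SmartAttribute]) -> List[str]:
    """Return the names of attributes the device should report as failing."""
    return [a.name for a in attrs if a.is_failing()]


# Made-up sample readings, for illustration only.
sample = [
    SmartAttribute(5, "Reallocated_Sector_Ct", value=30, threshold=36),
    SmartAttribute(194, "Temperature_Celsius", value=110, threshold=0),
]
```

A device built to the proposed standard could expose a summary like this through its MIB, so that an administrator can tell a failing disk from a software hang before deciding whether to dispatch repair service.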
One difference between this device and the devices currently available is the variety of connection methods it would support. Particularly useful is the ability to connect to the device over a wireless link. Because most large organizations use some type of PBX (Private Branch Exchange), POTS (plain old telephone service) lines suitable for a dial-up modem are in limited supply, and server and communication rooms are already crowded with cables. CDPD (Cellular Digital Packet Data) connectivity would provide a low-cost method for ready access to the device when in-band signaling is not available.
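The value of several connection methods is that a management application can fall back from one to the next. A minimal sketch of that fallback logic follows, assuming each channel is represented by a name and a probe function that returns `True` when the device answers. The function name `first_reachable` and the channel names are illustrative assumptions; real probes would open an actual LAN, CDPD, or modem connection.

```python
from typing import Callable, List, Optional, Tuple

# A channel is a (name, probe) pair; the probe returns True if the
# device answers on that channel. Both are placeholders here.
Channel = Tuple[str, Callable[[], bool]]


def first_reachable(channels: List[Channel]) -> Optional[str]:
    """Try channels in order of preference and return the first that works."""
    for name, probe in channels:
        try:
            if probe():
                return name
        except OSError:
            # A failed connection attempt counts as "channel unavailable".
            continue
    return None


def lan_down() -> bool:
    # Simulates a frozen in-band path.
    raise OSError("no route to host")


# Preference order: in-band first, then wireless, then a dial-up line.
preferred = [
    ("in-band LAN", lan_down),
    ("CDPD wireless", lambda: True),
    ("POTS modem", lambda: True),
]
```

With the simulated probes above, `first_reachable(preferred)` skips the failed in-band path and selects the wireless channel, which is the situation the out-of-band design is meant to handle.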
The device should be implemented in two versions that share a common feature set but differ in physical interface and form factor. The first would use the industry-standard PCI interface on a half-length card, for compatibility with the widest range of products. The second would use the industry-standard PCMCIA interface in a Type II form factor. Between them, these two versions could provide interfaces to networking devices, servers, and appliance devices.
In conclusion, this paper has presented a concept for a standard specification for a device to facilitate remote management of servers and network devices. While the communications capabilities of the device described would be significant, its most important feature would be that it is built around an open standard available to all manufacturers and application developers. This would lead to significant cost savings for IT managers by reducing software, software maintenance, and staff training costs. It would make management more efficient by significantly reducing the number of applications required to manage a network and its networked systems. It would also allow greater integration with intelligent network management systems for automated response to outages. While this paper has described add-on devices, the technology could be integrated into servers and network devices as a value-added feature. I anticipate that these devices would add no more than $500 to the cost of a system. This cost pales in comparison to the hard and soft costs of remote administration described throughout this paper.