
Lights-Out Administration

Current network and systems administration tools offer engineers a wide range of capabilities for remote administration. One capability that remains limited is the ability to remotely power cycle a server or network device and to run remote diagnostics on a machine that will not boot. This paper outlines the requirements for a set of industry-standard devices that perform these remote functions on servers and network devices, targeting a common situation faced by network and systems administrators.

As more Information Technology departments centralize and consolidate to reduce cost, many remote sites are left with no on-site IT support. Remote administration of computers is increasingly common because of the significant cost benefits; many tasks can be automated, and the administrator does not have to physically visit each computer (CERT, 2000). In their whitepaper on remote systems administration, Stephen Packard and Archie Andrews state that remote systems administration is a reasonable, economical approach (Packard and Andrews, 2000). In addition, as networks and servers become critical to nearly all business functions, more IT departments are staffing or otherwise providing for round-the-clock monitoring and support. While 24x7 support is a valuable capability, it is limited by the ability to gain physical access to off-site network devices and servers when they lock up or cannot be reached in-band. Even when the problem occurs during business hours, the lack of on-site IT staff may require an unskilled user to work with the remote IT staff to correct it. This is a poor use of that user's time and may turn a small problem into a large one. The type of device discussed in this paper would give network and systems administrators access to functions and information that would normally require physical access to the equipment.

The state of the art in remote systems and network device administration includes graphical tools that allow the simultaneous remote monitoring, configuration, and management of a large number of devices. Uninterruptible power supplies (UPSs) can be connected to the network (typically in-band), which allows significant monitoring and a cold boot (i.e., power off/power on). At least one PC server vendor offers a board that allows remote power cycling and limited administration of a machine that will not boot. With these capabilities, why is another class of remote management tool required?

Toolsets for remote configuration and diagnostics tend to be vendor-specific and to work with only a subset of a vendor's products. Unless your operating environment contains a limited set of equipment from a single vendor, remote administration will likely require multiple software products. Further, each of these tools tends to be expensive and to require training and experience to use properly. Tools that rely on in-band communication have a more basic problem: a frozen device often cannot be reached in-band. In-band solutions also create network overhead that competes with customers for available bandwidth. While smart UPS devices are useful, a simple cold-boot capability with no diagnostic capability is of limited value. Compaq offers the Remote Insight Board (and a Lights-Out Edition), which provides some of the capabilities that I describe below. This is a great tool that provides significant capability, but it works with only a limited set of Compaq servers. My proposal is for a more limited feature set with greater out-of-band communications capability and the ability to work with any PCI- or PCMCIA-compliant device.

This paper recommends the creation of an open standard for a remote control and diagnostic device that includes an open standard MIB (Management Information Base). A standard MIB would allow the device to report status to the large variety of SNMP (Simple Network Management Protocol) applications currently available. This standard should also define an API that would allow applications programmers to access all of the features and capabilities of the device. The standard and MIB could be created by a Request for Comments to the Internet Engineering Task Force. The creation of these standards would allow hardware and software vendors equal access to the market. It could significantly affect the cost of network and systems management by eliminating multiple applications required for administration.
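To make the role of a standard MIB more concrete, the following minimal Python sketch models a management agent as a table of object identifiers (OIDs) that a manager can read with a get operation and traverse with repeated get-next operations. It is an illustration only, not a proposed MIB: the enterprise number 99999, the object suffixes, and the values are all hypothetical. Only the 1.3.6.1.4.1 prefix, the standard private-enterprises subtree, is real.

```python
from typing import Dict, Iterator, Optional, Tuple

OID = Tuple[int, ...]

# 1.3.6.1.4.1 is the standard "private enterprises" subtree.
# The enterprise number 99999 and everything under it are hypothetical.
BASE: OID = (1, 3, 6, 1, 4, 1, 99999, 1)


class MibTable:
    """In-memory stand-in for the objects a lights-out device might expose."""

    def __init__(self) -> None:
        self._objects: Dict[OID, object] = {}

    def register(self, suffix: OID, value: object) -> None:
        self._objects[BASE + suffix] = value

    def get(self, oid: OID) -> Optional[object]:
        """Return the value stored at exactly this OID, or None."""
        return self._objects.get(oid)

    def get_next(self, oid: OID) -> Optional[Tuple[OID, object]]:
        """Return the first object whose OID sorts after `oid`."""
        # Python compares tuples lexicographically, which matches OID ordering.
        later = sorted(o for o in self._objects if o > oid)
        if not later:
            return None
        return later[0], self._objects[later[0]]


def walk(table: MibTable, start: OID) -> Iterator[Tuple[OID, object]]:
    """Yield every (oid, value) pair under `start` in lexicographic order."""
    current = start
    while True:
        nxt = table.get_next(current)
        if nxt is None or nxt[0][:len(start)] != start:
            return
        yield nxt
        current = nxt[0]


mib = MibTable()
mib.register((1, 0), "powered-on")      # hypothetical: host power state
mib.register((2, 0), "POST completed")  # hypothetical: last POST result
mib.register((3, 0), 27)                # hypothetical: battery minutes remaining

for oid, value in walk(mib, BASE):
    print(".".join(map(str, oid)), "=", value)
```

The point of the sketch is that any management application that understands the published object layout could read the same status values, regardless of which vendor built the card.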

The potential features of the device are nearly unlimited, but the minimum features required include:

  • Support for a variety of in- and out-of-band communications which would include (but not be limited to):
    • 100baseT (RJ45),
    • Plain Old Telephone Service (RJ11),
    • Cellular Digital Packet Data,
    • and 802.11b.
  • The ability to be managed by SNMP through a standard MIB.
  • The ability to access device features and data via a Web browser or directly via an open API with a standard data specification and access method.
  • Support for full DHCP or static entries.
  • Support for SSL.
  • An EEPROM socket for a user-customized ROM.
  • User- and roles-based security with appropriate logging.
  • System event logging.
  • Support for warm and cold device rebooting.
  • Support for POST (Power-On Self-Test) reporting via SNMP (MIB) and as events written to the system log.
  • On-board battery backup for sustained functions for at least 30 minutes.
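Several of the features listed above interact: a reboot request should be checked against the caller's role, and the attempt should be written to the event log whether or not it succeeds. The Python sketch below shows one way that interaction might look. It is a hypothetical model only; the class, role, and method names are invented for illustration and do not correspond to any existing device or API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import List


class Role(Enum):
    OPERATOR = "operator"   # assumed role: may view status only
    ADMIN = "admin"         # assumed role: may also reboot the host


class RebootKind(Enum):
    WARM = "warm"   # restart without removing power
    COLD = "cold"   # power off, then power on


@dataclass
class Event:
    timestamp: datetime
    user: str
    message: str


@dataclass
class LightsOutDevice:
    """Hypothetical model of the management card, not a real driver."""
    event_log: List[Event] = field(default_factory=list)

    def _log(self, user: str, message: str) -> None:
        self.event_log.append(Event(datetime.now(timezone.utc), user, message))

    def reboot(self, user: str, role: Role, kind: RebootKind) -> bool:
        """Reboot the host if the role permits it; log the attempt either way."""
        if role is not Role.ADMIN:
            self._log(user, f"DENIED {kind.value} reboot")
            return False
        self._log(user, f"{kind.value} reboot requested")
        # A real device would drive the reset or power line here.
        return True


device = LightsOutDevice()
device.reboot("alice", Role.ADMIN, RebootKind.COLD)
device.reboot("bob", Role.OPERATOR, RebootKind.WARM)
for event in device.event_log:
    print(event.user, "-", event.message)
```

Logging denied requests as well as granted ones is a design choice made here so that the event log can serve as an audit trail.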

These minimum features would address the most common problem for systems and network administrators: rebooting a frozen server or device. They would also allow diagnosis, to facilitate a decision about whether or not to dispatch repair service.

The standard should also provide for full capture and reporting of data from SMART (Self-Monitoring, Analysis, and Reporting Technology) hard drives and from devices and systems that use ACPI (Advanced Configuration and Power Interface) and other related standards. It should provide sufficient capacity for future growth and interface with other new devices as those standards are accepted.

One difference between this device and the devices currently available is the variety of connection methods. Particularly useful is the ability to connect to the device over a wireless link. Because most large organizations use some type of PBX (Private Branch Exchange), POTS lines are in limited supply, and server and communication rooms are already crowded with cables. CDPD connectivity would provide a low-cost method for ready access to the device when in-band signaling is not available.
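The value of multiple connection methods can be sketched as a simple fallback: a management client tries each channel in priority order and uses the first one on which the device answers. The Python example below is a minimal illustration under stated assumptions. The priority order is arbitrary, and the probe functions are stand-ins that simulate reachability rather than real network code.

```python
from typing import Callable, List, Optional, Tuple

# Each channel is a name plus a probe that returns True when the device
# answers on that channel. The probes here are stand-ins, not real network code.
Channel = Tuple[str, Callable[[], bool]]


def first_reachable(channels: List[Channel]) -> Optional[str]:
    """Try channels in priority order and return the first that responds."""
    for name, probe in channels:
        try:
            if probe():
                return name
        except OSError:
            # Treat a transport error as "unreachable" and try the next channel.
            continue
    return None


def ethernet_down() -> bool:
    raise OSError("no route to host")  # simulates a frozen in-band path


# Assumed priority order: in-band first, then the out-of-band options.
channels: List[Channel] = [
    ("100baseT (in-band)", ethernet_down),
    ("802.11b", lambda: False),
    ("CDPD", lambda: True),
    ("POTS modem", lambda: True),
]

print(first_reachable(channels))
```

In this simulated run the in-band path raises an error and the 802.11b probe fails, so the client falls through to CDPD without ever needing the scarce POTS line.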

The device should be implemented in two versions. Each device would have common features but would vary by physical interface and form factor. The first device would use the industry standard PCI interface on a half-length card, to facilitate compatibility with the widest range of products. The second device would use the industry standard PCMCIA interface and Type II form factor. These two devices could provide interfaces to networking devices, servers, and appliance devices.

In conclusion, this paper presented a concept for a standard specification for a device to facilitate remote management of servers and network devices. While the communications capabilities of the device described would be significant, its most important feature would be that it was built around an open standard available to all manufacturers and applications developers. This would lead to significant cost savings for IT managers by reducing software, software maintenance, and staff training costs. It would facilitate efficient management by significantly reducing the number of applications required to effectively manage a network and networked systems. It would allow greater integration with intelligent network management systems for automated response to outages. While this paper described add-on devices, the technology could be integrated into servers and network devices as a value-added feature. I anticipate that these devices would add no more than $500 to the cost of a system. This cost pales in comparison to the hard and soft costs of remote administration described throughout this paper.

Bibliography

Bradner, S. (1996). RFC 2026.
CERT (2000). Configure computers for secure remote administration.
Packard, Stephen L. and Andrews, Archie D. (2000). Remote System Administration.

Recent comments

03 Jan 2002 12:49 lingenfr

Author's Response 1
Some great input on some great devices. I am not going to critique each device or product, but suffice it to say that none of them addresses all of the requirements outlined. My focus is not on a specific device, but on an open standard, a published MIB, and out-of-band capabilities. So far the discussion has focused on specific devices or software that really exemplify the problem (i.e., vendor-specific and in-band) that an open standard could address. The only one that really heads down the path that I outline is the RealWeasel board. While it does not immediately address many of the requirements that I outlined, the fact that it is based on an open specification certainly sets the stage for accessing it from within any vendor's network management solution (e.g., OpenView, Unicenter), which is a key tenet of my paper.

While sharing solutions is valuable to the readers, I would also be interested in your thoughts on the requirements for the perfect solution for remote server and network management. Assume that it does not exist, as I am pretty confident that it does not. If you were king (or queen) for a day and could write the open standard, what capabilities would the specification include? My brain is probably limited to current technologies, and I had difficulty getting out of that box. I shared this with Freshmeat because I wanted feedback from SAs and NMs, and that is obviously what is happening. Thanks for your interest.

23 Dec 2001 08:11 FrancoisHarvey

Hum HP TopTools
A card with a lot of options (BIOS access, DMI browser, SNMP, server status, etc.) already exists: the TopTools card. Some of my customers use it for critical servers.

www.hp.com/toptools/

(A free version of the TopTools software exists; the card itself costs some money.)

22 Dec 2001 22:48 winsurfn

Seasons Greetings - CIM is free.
It worries me that anyone would suggest increasing the cost of a server by ~$500 for a tool that should never be required if the server is reliable and the sysadmin does his/her job properly. Personally I'd rather see my wages go up by $500 per server! :)

Compaq Insight Manager has been around for years, is free, and supports Linux as well as the other OSes. (Other server builders would do well to develop agents for CIM instead of TNG.)

Nevertheless, I'm still trying to think of a situation (other than a floppy or bootable CD left in the drive) where, if a server fails to boot, sysadmin attendance would not be required. Most server builders use the security of their systems as a selling point; a tool that provides an avenue for remote unattended access to the hardware whilst the OS is inoperable sounds dangerous. The other thing that would worry me is whether the server would continue to function if the device itself failed. Of the 150 servers I have managed over the past 2 years, none less than four years old have failed because of hardware problems.

A reliable supplier of spare parts sounds like a safer bet.

22 Dec 2001 19:59 twwlogin

Remote control
http://www.realweasel.com
http://www.apcc.com/products/masterswitch/index.cfm

Great combination!

22 Dec 2001 12:32 gafami

hrmmm... :)
I am not 100% sure if this device is what was meant in the article, but we use one of those:

http://www.avocent.de/enterp_loesung.asp

Pretty neat solution for remote and local management of multiple servers...
