How to Build a Beowulf

I've set up two Beowulfs so far, and in both cases it involved gathering material from various Web sites and somehow putting it all together. I got everything up and running, but it was quite a "time sink" for me, so I was interested to receive a book entitled "How to Build a Beowulf". Finally, information regarding Beowulfs would be available in one place and I could save my bandwidth for other stuff!

My overall impression of the book is that it is targeted toward scientists and engineers who want to use Beowulfs as computational tools. As a result, certain areas might seem to go into unnecessary detail regarding computer hardware and software, operating systems, etc. However, it provides a good description of the issues involved in setting up, maintaining, and running applications on Beowulf clusters. A lot of the material might seem redundant for an experienced system or network administrator, but one must keep in mind the target audience: scientists and engineers who solve computational problems, not network problems. The book gives a good all-around picture of the various components of a Beowulf.

The first two chapters give a brief overview of what Beowulfs are and what types of problems can be solved with them. There is a nice comparison to other types of proprietary parallel processing systems. The concise description of the hardware and software aspects of building a Beowulf cluster provides a good foundation for the later chapters in the book, which go into more detail about the various points raised in the first two chapters.

The rest of the book covers the hardware and software aspects of Beowulfs in detail. The common feature of all the chapters is that they stay general rather than tying themselves to particular products, which makes sense. However, the specific examples that are there are quite outdated. This is to be expected since the book was written before 1999, but it would have been nice if it had some updated examples. That said, the descriptions of various components like motherboards, RAM, the PCI bus, etc. are general enough to provide a foundation on which to base further reading from more specialized literature.

The chapter on Beowulf nodes is good, but for a person familiar with computer hardware, it can get a little tedious, with superfluous details. However, keeping in mind the audience to which this book is targeted, I feel the extra detail is justified. I learned a few things about the PCI bus myself! The chapter describes the various components that make up a machine, and then goes on to describe how to assemble one. Though the description is quite general, it provides good preparation for a person planning on building machines to create a cluster.

Networking is covered in much detail, including the large variety of network hardware and topologies available and the roles of the various network components. In general, most Beowulf installations will employ Fast Ethernet, the most cost-effective option. However, in many cases, the network may become the bottleneck, and high-performance network components are required. High-end hardware and protocols like FDDI, ATM, and Myrinet are all discussed. The software aspects of the network are also described in detail, including TCP/IP packets, sockets, RPC, and distributed filesystems (NFS and AFS). There is also a section on Java/RMI and CORBA. Mention is made of the r* commands. Overall, this is a very detailed and useful chapter, covering one of the areas of building clusters which can most affect performance.

After the first five chapters, the book starts on the software needed to manage and program a Beowulf. The chapter on managing clusters of machines is quite useful. It raises the issues of security and access to the cluster. Quite a bit of the chapter is devoted to discussion of how to allow access to the cluster and the implementation of a firewall (though it uses ipfwadm in the example!). The section on cloning nodes is very useful and goes a long way toward making installations on a large number of machines much more consistent. Some basic system administration and management tips are also provided. This part of the book is lacking in two ways. Firstly, though it does a good job of discussing basic security, it relies on the use of rsh for system administration tasks. SSH is mentioned as a viable option, but I would have preferred greater stress on it. Secondly, there is hardly any discussion of job submission and scheduling. These are very important components of a cluster used by several people with several jobs. Packages like DQS and PBS could have been discussed.

The last three chapters are devoted to programming a Beowulf. The reader should not expect to become an expert on parallel programming techniques with the help of these chapters. However, as a general introduction to the various aspects of parallel code, they do their job. Various features of parallel code such as graininess, synchronization, latency, and bandwidth are all discussed. A whole chapter is devoted to discussion of MPI. As before, the reader will not reach MPI guru status with the help of the chapter, but it does provide a concise overview of the functionality provided by MPI. Features such as synchronous and asynchronous I/O, data types, and parallel data structures are all covered. I was quite impressed with this chapter, as it provided a very clear and informative description of MPI, with non-trivial examples.
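The interplay of graininess, latency, and bandwidth mentioned above can be illustrated with a simple cost model, in which sending a message costs a fixed latency plus its size divided by the bandwidth. This is only a minimal sketch and is not taken from the book: the function names and the 100-microsecond latency are assumptions chosen for illustration, and 12.5 MB/s is simply the nominal line rate of 100 Mbit/s Fast Ethernet.

```python
def transfer_time(n_bytes, latency_s, bandwidth_bytes_per_s):
    """Time to send one message: fixed start-up latency plus time on the wire."""
    return latency_s + n_bytes / bandwidth_bytes_per_s


def send_in_messages(total_bytes, n_messages, latency_s, bandwidth_bytes_per_s):
    """Time to move total_bytes when it is split into n_messages equal messages."""
    per_message = total_bytes / n_messages
    return n_messages * transfer_time(per_message, latency_s, bandwidth_bytes_per_s)


if __name__ == "__main__":
    LATENCY = 100e-6    # assumed per-message latency in seconds (hypothetical)
    BANDWIDTH = 12.5e6  # nominal Fast Ethernet line rate in bytes per second
    TOTAL = 1_000_000   # one megabyte of data to exchange

    # Coarse grain: one large message. Fine grain: many small messages.
    for n in (1, 10, 100, 1000):
        t = send_in_messages(TOTAL, n, LATENCY, BANDWIDTH)
        print(f"{n:5d} messages: {t * 1000:.2f} ms")
```

Under these assumptions the time on the wire is the same in every case, so the extra cost of fine-grained communication comes entirely from paying the latency once per message.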

The final chapter provides a much more detailed example of MPI programming, using parallel sorting techniques as its basis. It concisely describes the pitfalls to look out for and the features of good parallel algorithms. Features such as communication costs (in terms of time), load balancing, redundancy, etc. are all described. It is detailed in its analysis of the sort algorithms in terms of time costs and bandwidth costs. Overall, the last chapters are very useful and helpful. Admittedly, they won't make you an expert, but they do provide a very good sampling of the problems and pitfalls of programming Beowulf clusters and provide tips on how to solve or get around them.
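To give a feel for the kind of algorithm involved, here is a small sketch of a sample sort, one common way to sort in parallel while keeping the load balanced. It is not the book's code and does not use MPI: the "nodes" are plain Python lists, the all-to-all exchange is simulated in a loop, and the function name `sample_sort` is hypothetical.

```python
import heapq
from bisect import bisect_right


def sample_sort(data, n_nodes):
    """Simulate a sample sort over n_nodes 'nodes' (plain lists, no real network).

    Returns one sorted bucket per node; concatenating the buckets in order
    gives the fully sorted data.
    """
    if not data:
        return [[] for _ in range(n_nodes)]

    # 1. Scatter: hand each node a contiguous block, which it sorts locally.
    block = -(-len(data) // n_nodes)  # ceiling division
    local = [sorted(data[i * block:(i + 1) * block]) for i in range(n_nodes)]

    # 2. Every non-empty node contributes evenly spaced samples of its block.
    samples = sorted(
        blk[(k * len(blk)) // n_nodes]
        for blk in local if blk
        for k in range(n_nodes)
    )

    # 3. Pick n_nodes - 1 splitters from the pooled samples.
    splitters = [samples[(k * len(samples)) // n_nodes] for k in range(1, n_nodes)]

    # 4. Simulated all-to-all exchange: each node cuts its sorted block at the
    #    splitters and "sends" piece number d to node d.
    inbox = [[] for _ in range(n_nodes)]
    for blk in local:
        start = 0
        for dest, s in enumerate(splitters):
            end = bisect_right(blk, s, start)
            inbox[dest].append(blk[start:end])
            start = end
        inbox[n_nodes - 1].append(blk[start:])

    # 5. Each node merges the sorted runs it received.
    return [list(heapq.merge(*runs)) for runs in inbox]


if __name__ == "__main__":
    import random

    random.seed(0)
    data = [random.randrange(1000) for _ in range(40)]
    buckets = sample_sort(data, 4)
    # Bucket sizes show how evenly the work ended up distributed.
    print([len(b) for b in buckets])
```

In a real MPI program, steps 1, 2, and 4 would be communication calls, and their cost relative to the local sorting and merging is the kind of trade-off the chapter analyzes.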

To sum up, the book provides an excellent description of the various components of Beowulf clusters. This is impressive, since there are so many variations possible when constructing clustered machines. The easygoing style of the authors makes the book an enjoyable read. It covers both hardware and software components in detail, with examples. In both cases, the descriptions are sufficiently detailed, but general enough so as not to be too specific to a certain piece of hardware or software. A few things do detract from the book's utility. It is definitely outdated: a lot of the hardware examples refer to very old CPUs (the PII was the top-of-the-line CPU for this book), and the firewall example uses ipfwadm. Security is covered, but I feel that SSH should have been given a little more stress and the dangers of rsh made more apparent. Finally, job management and scheduling should have been discussed in more detail.

Overall, I'd recommend this book to scientists and engineers who want to harness the power of a Beowulf cluster but aren't exactly sure how to go about it.

Recent comments

26 Mar 2002 06:23 bashoo

Beowulf cluster
Guha's rather personal review is good information
for individuals or groups who are not scientists to
know what to look for in building a cluster.
I have been doing some personal research into Beowulfing.

I am no super techie, but the idea of building powerful computing from off-the-shelf computers
is really great.

What got me interested was an article sometime
last year in Wired magazine by David HM Spector about
supercomputing. There he mentioned Beowulf clusters for the rest of us. He referenced Building Linux Clusters by David HM Spector, published by
O'Reilly. I bought the book, which isn't cheap. But I
found it very easy to read, with good references.
Both technical and non-technical people can find
it useful. It comes with a CD-ROM chock-full of info
both old and new. But there's a caveat of sorts: it's
geared towards Red Hat Linux. Despite that, I think it's a good start for anyone who is interested in the
subject.
At this time of writing I am still trying to get additional Linux boxes to start making my Beowulf.
A techno-romantic poet and painter at the moment,
I have limited funds, but I keep searching and planning.

09 Mar 2002 20:49 digitalhermit

Beowulf demo in S. Florida
Speaking of Beowulf (Beowulves?), Dr. Raul Salazar will be giving a demonstration of this technology at an upcoming FLUX (Florida Linux User's Exchange) talk this coming Thursday in Ft. Lauderdale, FL (USA). He'll demonstrate how to set up a Beowulf on three machines (a mighty small Beowulf, I know, but the principle is largely the same). If you're interested in this technology and can find your way to Ft. Lauderdale, show up and support the local Linux community. For more information, please take a look at http://www.flux.org.

09 Mar 2002 10:24 leimy

Hmmm I do this stuff for a living
I work for a company that is just breaking through in
the turnkey cluster solution area. I try not to speak for
them when I am off the clock, but if you go to
MPI-Software Technology Inc. you should be able to contact us
about possible commercial clustering solutions using
Linux and hopefully even Windows XP/NT/2000 soon.

We have been around for about 5 years and make a
high-quality implementation of the MPI standard for
various UN*Xes and Windows.

I also have the book you speak of, as well as many
others on the subject of building "Beowulfs". What I
have learned in my time at this company is that the
only things common across all clusters are
usually a high-speed network fabric for compute nodes,
possibly rack-mounted 1U servers, and a lot of custom
applications. :)

09 Mar 2002 07:26 rikm

How to Build a Beowulf
well, I've always wanted a Beowulf ...
