
Linux Clustering Software

Just a few years ago, to most people, the terms "Linux cluster" and "Beowulf cluster" were virtually synonymous. However, these days, many people are realizing that Linux clusters can not only be used to make cheap supercomputers, but can also be used for high availability, load balancing, rendering farms, and more.

In this review, I will try to go through most of the more than 100 projects in freshmeat's Clustering/Distributed Networks category that relate to Linux clustering. To do this effectively, I will break the projects down into a few categories. Here is a quick outline of how this review is structured:

Software for building and using clusters
  • High Performance Computing Software (Beowulf/Scyld, OSCAR, OpenMosix...), including special mention of specific groups of projects such as Single System Image systems and Cluster-specific Operating Systems.
  • High Availability Software (Kimberlite, Heartbeat...).
  • Load Balancing Software (Linux Virtual Server, Ultra Monkey...).
Software used on, and for using, clusters
  • File Systems (Intermezzo, ClusterNFS, DRBD...).
  • Installation and Configuration Software (FAI, System Installation Suite...).
  • Monitoring and Management Software (Ganglia, MOSIXVIEW, Performance Co-Pilot...).
  • Programming and Execution Environments and Tools (MPI, PVM, spread...).
  • Miscellaneous (things that don't necessarily fit into other categories well).

Software for building and using clusters

In this part of the review, I will talk about projects that offer "complete solutions" for the various kinds of Linux Clustering. This means that projects that offer complete start-to-finish solutions for the respective goals will be here, while specific, partial solutions (such as installation, management, monitoring, or programming environment tools) will be covered in Part II. Note that these "start-to-finish" solutions are different for the different types of clustering; for example, while this usually means some sort of slave node installation mechanism for HPC clusters, an installation mechanism is not needed for a High Availability or Load Balancing cluster.

High Performance Computing Software

High Performance Computing seems to be the term that everyone likes to use these days. With regards to Linux clustering, this refers to creating a cluster to do any type of task that involves a great deal of computing power, whether it be modeling galaxy collisions or rendering the animation of the latest box office hit.

In this arena, there are quite a few projects out there. Unfortunately, this is sometimes because people are more interested in building their own solution to a problem than in basing their work on what is already out there. I will try to concentrate on some of the better projects.

Let's start off by talking a little about the project that almost everyone has heard of, the Beowulf Project, also known these days as Scyld. Scyld contains an enhanced kernel and some tools and libraries that are used to present the cluster as a "Single System Image". A single system image means that processes running on slave nodes in the cluster are visible and manageable from the master node, giving the impression that the cluster is just a single system.

The important thing to remember about these single system image systems is that because they alter the way the kernel works, they either distribute their own version of the kernel or distribute kernel patches. This means that you can only use the software with the specific kernel they give you or the version of the kernel the patch was made for. There are other projects that use this idea of a single system image. Compaq has adapted its NonStop Clusters for UnixWare software to Linux as the Single System Image Clusters project. OpenMosix is a set of extensions to the standard kernel, along with userland tools being developed to help use the cluster more efficiently. SCE, or the Scalable Cluster Environment, is a set of tools that allow you to build and use a Beowulf cluster. Bproc, the program at the heart of the Beowulf project's ability to present a single system image, is used in Clubmask alongside other Open Source projects and tools like Kickstart, cfengine, the Maui scheduler, LAM/MPI, and more.

There are other HPC clustering solutions that do not change the way the kernel functions. These use other means to run jobs and deal with showing information about them. Cplant, the Ka Clustering Toolkit, and OSCAR all allow you to build, use, and manage your cluster in this manner.

There are certain Operating Systems that are geared towards high performance computing and clusters. They can serve various functions, from providing common HPC tools to actually being a specific cluster installation. Warewulf is a distro that you configure and burn onto a CD, then pop into the slave nodes. Boot them off it, and you have an instant cluster. ClumpOS is a MOSIX distribution that is put on a CD to allow the user to quickly add nodes to a MOSIX cluster. One of the nice things about these CD-based complete systems is that they can be temporary; slave nodes can be booted off the CD to be part of the cluster, and then rebooted without the CD to go back to the regular operating systems that reside on their hard disks. MSC.Linux is focused on high performance computing and comes with various kernel extensions, engineering tools, Beowulf tools, and desktop components that facilitate cluster computing.

High Availability Software

These days, maintaining system availability is important to everyone. No one wants to have any critical services unavailable, be they a mail server, Web server, database, or something else. In the past few years, many Linux projects have sprung up to try to meet this demand, which has for so many years been filled well by the commercial operating system and software market (AIX or Solaris, for example).

I like to break this idea of High Availability into two separate categories, as it makes the concepts easier to think about. These are straight High Availability, and Load Balancing and Sharing, which can be considered a specialized variant of High Availability. Let's talk about straight High Availability first. In its simplest form, you have at least two systems, one of which is live and is currently the box users connect to when they access your Web page or database. The other is in some sort of stand-by state, waiting and watching to take over should something happen to the live server. More complicated issues can come into play, such as access to shared resources, MAC address takeover, and dealing with "quorums" in setups that have more than two boxes. Each of the projects deals with some subset of these issues.
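
Before getting to the projects themselves, it may help to see how simple the core mechanism is. Here is a toy sketch in C (my own illustration, not code from any project mentioned below) of the standby side of a heartbeat: listen for periodic UDP datagrams from the live box, and if none arrive within a deadtime window, run a takeover script. The port, deadtime, and script path are all invented for the example.

    /* Toy standby-node heartbeat monitor -- an illustration of the
     * concept described above, not code from any real HA project. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <sys/time.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    #define HB_PORT  9000   /* port the live server sends heartbeats to */
    #define DEADTIME 10     /* seconds of silence before declaring death */
    #define TAKEOVER_SCRIPT "/etc/ha.d/takeover.sh"   /* hypothetical */

    int main(void)
    {
        int sock = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in addr = { 0 };
        struct timeval tv = { 1, 0 };          /* wake up once a second */
        char buf[64];
        time_t last_beat = time(NULL);

        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(HB_PORT);
        if (bind(sock, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("bind");
            return 1;
        }
        setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

        for (;;) {
            if (recvfrom(sock, buf, sizeof(buf), 0, NULL, NULL) > 0)
                last_beat = time(NULL);        /* peer is alive */
            if (time(NULL) - last_beat > DEADTIME) {
                fprintf(stderr, "peer dead, taking over\n");
                system(TAKEOVER_SCRIPT);  /* grab service IP, start daemons */
                return 0;
            }
        }
    }

A real implementation also has to worry about split-brain scenarios, resource fencing, and failback, which is precisely the territory the projects below cover.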

Kimberlite specializes in shared data storage and maintaining data integrity. Piranha (a.k.a. the Red Hat High Availability Server Project) can serve in one of two ways: it can be a two-node high availability failover solution or a multi-node load balancing solution. One of the better-known projects in this space is probably the High Availability Linux Project, also known as Linux-HA. The heart of Linux-HA is Heartbeat, which provides heartbeat, monitoring, and IP takeover functionality. It can run heartbeats over serial ports or UDP broadcast or multicast, and can re-allocate IP addresses and other resources to various members of the cluster when a node goes down, restoring them when the node comes back up. A very lightweight High Availability setup is peerd, which does simple monitoring and failover. The purpose of this project is not to provide sub-second failovers, but to have a total system failover in under a minute while allowing the systems to be located in physically separate locations (rather than in, say, the same rack in the server room). Another lightweight project is Poor Man's High Availability, which uses the Dynamic DNS service from dnsart.com to do simple failover and failback of a Web server.

Now, let's go on to talk a bit about that other type of High Availability:

Load Balancing Software

Load Balancing is a special kind of High Availability, because not only does it offer the same monitoring and failover services that the straight HA packages offer, but it can also balance and share incoming traffic so that the load is distributed somewhat evenly across multiple servers, allowing for faster responses and less likelihood of a server overloading.

Load Balancing is achieved by having at least one (though to achieve "real" High Availability, you'd need at least two) load balancers/routers and at least two backend servers. The live load balancer receives incoming requests, monitors the load and available system resources on the backend servers, and redirects requests to the most appropriate server. The backend servers are all live, in the sense that they are all handling requests they receive from the router. All of this is completely transparent to the end user.
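
To illustrate just the balancing decision, here is a minimal weighted round-robin chooser in C. It is only a sketch of the scheduling idea; real load balancers like LVS implement their schedulers in the kernel and track far more state (connection counts, server health, and so on). The addresses and weights are made up for the example.

    /* Minimal weighted round-robin backend selection -- a sketch of the
     * idea, not how LVS actually implements its schedulers. */
    #include <stdio.h>

    struct backend {
        const char *addr;   /* real server address (invented) */
        int weight;         /* relative capacity */
        int credits;        /* remaining picks in this round */
    };

    static struct backend pool[] = {
        { "192.168.1.10", 3, 0 },
        { "192.168.1.11", 1, 0 },
    };
    static const int npool = sizeof(pool) / sizeof(pool[0]);

    const char *pick_backend(void)
    {
        int i, live = 0;
        for (i = 0; i < npool; i++)
            if (pool[i].credits > 0)
                live++;
        if (live == 0)                      /* round over: refill credits */
            for (i = 0; i < npool; i++)
                pool[i].credits = pool[i].weight;
        for (i = 0; i < npool; i++)
            if (pool[i].credits > 0) {
                pool[i].credits--;
                return pool[i].addr;
            }
        return NULL;                        /* unreachable with weights > 0 */
    }

    int main(void)
    {
        int i;
        for (i = 0; i < 8; i++)             /* .10 gets picked 3x as often */
            printf("%s\n", pick_backend());
        return 0;
    }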

One of the best known projects in this area is the Linux Virtual Server Project. It uses the load balancers to pass along requests to the servers, and can "virtualize" almost any TCP or UDP service, such as HTTP(S), DNS, ssh, POP, IMAP, SMTP, and FTP. Many Load Balancing projects are based on LVS. Ultra Monkey incorporates LVS, a heartbeat, and service monitoring to provide highly available and load balanced services. As mentioned previously, Piranha has a load balancing mode, which it refers to in its documentation as LVS mode. Keepalived adds a strong and robust keepalive facility to LVS. It monitors the server pools, and when one of the servers goes down, it tells the kernel and has the server removed from the LVS topology. The Zeus Load Balancer is not based on LVS, but offers similar functionality. It combines content-aware traffic management, site health monitoring, and failover services in its Web site load balancing. Another project not based on LVS is Pen, a simple load balancer for TCP-based protocols like HTTP or SMTP. Turbolinux Cluster Server is the last of the load balancing projects I will talk about. It is from the folks at Turbolinux, and its load balancing and monitoring software allows detection and recovery from hardware and software failures (if recovery is possible).

Software used on, and for using, clusters

In this part of the review, I will talk about software that is used on a cluster for a specific purpose. This purpose can be an execution environment, a programming interface, monitoring and/or management software, filesystems that lend themselves well to clustering, and more.

Filesystems

Let's first talk about filesystems and filesystem-related projects. There are a lot of filesystems for Linux. These days, many of them have features like journaling, and some were designed specifically for use in clustered and distributed systems. The strong points, weak points, and differences among the various filesystems are a bit too large a topic to get into here, so I will just mention some of them, with the recommendation that you do more research before deciding what is best for you.

OpenAFS (an open version of the Andrew Filesystem originally developed at Carnegie Mellon University), GFS (the Global Filesystem), Coda, and InterMezzo are all distributed/cluster filesystems.

There are some other projects I want to mention in this section because they relate to filesystems, though they may not technically be considered filesystems by some. First, there is the ClusterNFS project, a set of patches to the Universal NFS Daemon that allows multiple clients to NFS mount the same root filesystem. It works by "tagging" files. This is quite useful for NFS mounting the root filesystem for a cluster of diskless slave nodes.

The next two projects I will mention both deal with block devices. ENBD, the Enhanced Network Block Device, and DRBD allow remote disks to appear as if they were local block devices. This is especially useful when setting up a RAID mirror, to have your data updated in realtime but located on separate machines so it can be highly available.

Installation and Configuration Software

When dealing with perhaps hundreds or thousands of slave nodes in a cluster, installing and configuring them all is a huge task, and it is quite boring, especially when each of the nodes is functionally the exact duplicate of its hundreds or thousands of counterparts. Any tool to automate as much of this process as possible is always a welcome addition to the arsenal of the cluster administrator. As this is a problem that many other people have already faced, there are a few different solutions, each attacking the issue a little differently and solving different aspects of the overall problem.

FAI stands for Fully Automatic Installation. It is a non-interactive system to install the Debian GNU/Linux distribution. The installation of one or many nodes is initiated, and after installation is complete, the systems are fully configured and running, with no intervention on the part of the user. While FAI is specific to the Debian GNU/Linux distro, there are other projects that are not.

To install boxes
System Installation Suite
It is the answer

(I thought I would present this next project in haiku form, just to see if you were still paying attention.) The System Installation Suite is an image-based installation tool. Images are created and stored on an image server and copied to nodes as needed. Currently, Red Hat, Mandrake, SuSE, Conectiva, and Turbolinux are supported by SIS, with the hope that Debian and Debian-variant support will be coming sometime soon. SIS is made up of three projects, SystemImager, System Installer, and System Configurator.

Monitoring and Management Software

Being able to monitor and manage a cluster from one central location is another key to not going insane while administering a cluster of a couple hundred nodes. In this space, there seems to be a large number of projects, and again, each one does things a little differently, solving different variations on the common problem.

ClusterIt focuses solely on the management and maintenance of large numbers of systems. Ganglia is a very stable, well-known, and well-tested realtime monitoring and remote execution environment. It is in use in universities, government labs, and complete cluster implementations all over the world, and is highly respected by everyone I know who has used it. Performance Co-Pilot is a monitoring and management package from SGI. Originally written for IRIX, SGI ported it to Linux and released it to the Open Source community back in February 2000. With years of development and testing behind its commercial version, SGI is able to use its knowledge to offer the Linux version the stability and functionality that is sometimes lacking in Open Source projects.

Besides these projects that can be used on any set of boxes, there are some projects that were developed to work specifically with other projects. MOSIXVIEW is a GUI for a MOSIX cluster. It supports both MOSIX and OpenMosix, acting as a frontend to the mosctl commands. LVSmon is a monitoring daemon written to be used for maintaining LVS tables (LVS referring to the Linux Virtual Server Project, previously mentioned in the Load Balancing Software section above).

In addition to these large projects that encompass a lot of capabilities, there are many projects that are simpler, with smaller scope. Syncopt is a solution to the problem of keeping software on multiple systems up-to-date without undertaking the task of installing the software on each system by hand. With Syncopt, the software is installed on a central server and propagated to all systems that need it. Fsync is similar to rsync and CVS; it allows file synchronization between remote hosts, with functionality to merge differences and hooks for tools to merge the trees. It is a single Perl script, designed to function over modem speed (read: slow) network connections. Ghosts (global hosts) is a system that allows creation of macros to define groups of machines. Using these macros, gsh, an included parallel task runner, can execute code on a group of machines simultaneously. Last, but not least in this section, there is pconsole, a tool with functionality somewhat similar to ghosts, but with a different approach. pconsole is an interactive administrative shell for clusters. It allows a user to open connections to a number of cluster nodes; commands typed into a special window are sent to each of the machines.
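
The trick at the core of gsh and pconsole is straightforward to sketch: fork one child per node, exec a remote shell running the same command, and wait for them all to finish. Here is a toy version in C; the hard-coded host list and the use of ssh are my assumptions for the example (the real tools read host groups from configuration and support other transports).

    /* Toy parallel remote command runner, in the spirit of gsh/pconsole.
     * The node names and use of ssh are invented for the example. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(int argc, char **argv)
    {
        /* hypothetical node group; gsh would expand a macro instead */
        const char *nodes[] = { "node01", "node02", "node03" };
        const int n = sizeof(nodes) / sizeof(nodes[0]);
        int i;

        if (argc < 2) {
            fprintf(stderr, "usage: %s command\n", argv[0]);
            return 1;
        }
        for (i = 0; i < n; i++) {
            pid_t pid = fork();
            if (pid == 0) {                      /* child: run remotely */
                execlp("ssh", "ssh", nodes[i], argv[1], (char *)NULL);
                perror("execlp");                /* only reached on error */
                _exit(127);
            }
        }
        while (wait(NULL) > 0)                   /* reap all children */
            ;
        return 0;
    }

Invoked as, say, ./prun "uptime", it runs the command on every node in the group at once.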

Programming and Execution Environments and Tools

Once you have a cluster built, what do you do with it? How do you write programs that take advantage of it? What libraries and programming tools are out there to help you write code optimized to run on a cluster? What software is out there to help you run a job on the cluster or keep a schedule of what is going to run, and when? These are the types of questions that the projects in this section answer.

PVM stands for Parallel Virtual Machine. It is a portable message passing interface that lets a heterogeneous set of machines function as a cluster. Applications that use it can be written in C, C++, or Fortran, and can be composed of a number of separate components (processes). PVM++ attempts to provide an easy-to-use library for programming with PVM in C++. Programs like pvmpov and PVM Gmake use the PVM interface. pvmpov is a patch for POV-Ray which allows rendering to take place on a cluster using PVM-based code. PVM Gmake is a variant of GNU make which uses PVM to build in parallel on several remote machines.
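
To give a flavor of what PVM code looks like, here is a minimal self-spawning master/worker sketch in C. It assumes a running pvmd and that the compiled binary ("pvmdemo" is a name invented for the example) is installed where PVM can find it; treat it as an illustration of the API's shape rather than a polished program.

    /* Minimal PVM master/worker sketch: the master spawns copies of this
     * same binary, sends each worker an integer, and collects the doubled
     * replies. Assumes a running pvmd. */
    #include <stdio.h>
    #include <pvm3.h>

    #define NWORK      4
    #define TAG_WORK   1
    #define TAG_RESULT 2

    int main(void)
    {
        pvm_mytid();                          /* enroll this process in PVM */

        if (pvm_parent() == PvmNoParent) {    /* no parent: we are master */
            int tids[NWORK], i, val, sum = 0;
            pvm_spawn("pvmdemo", NULL, PvmTaskDefault, "", NWORK, tids);
            for (i = 0; i < NWORK; i++) {
                val = i + 1;
                pvm_initsend(PvmDataDefault);
                pvm_pkint(&val, 1, 1);        /* pack one int, stride 1 */
                pvm_send(tids[i], TAG_WORK);
            }
            for (i = 0; i < NWORK; i++) {
                pvm_recv(-1, TAG_RESULT);     /* from any worker */
                pvm_upkint(&val, 1, 1);
                sum += val;
            }
            printf("sum of doubled values: %d\n", sum);
        } else {                              /* spawned: we are a worker */
            int val, parent = pvm_parent();
            pvm_recv(parent, TAG_WORK);
            pvm_upkint(&val, 1, 1);
            val *= 2;
            pvm_initsend(PvmDataDefault);
            pvm_pkint(&val, 1, 1);
            pvm_send(parent, TAG_RESULT);
        }
        pvm_exit();                           /* leave the virtual machine */
        return 0;
    }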

Another of these message passing interfaces is MPI, which, coincidentally, stands for "The Message Passing Interface". There are a few different implementations of MPI. LAM/MPI and MPICH are two of them, and Object-Oriented MPI is an OO interface to the MPI standard which can be used at a much higher level than the standard C++ bindings. PETSc is a set of data structures and routines used for parallel applications that employs the MPI standard for all its message passing.
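
MPI code has a similar feel. A minimal C program that should build with either LAM/MPI or MPICH via mpicc might look like this; each rank sums an interleaved slice of 1..N, and the partial sums are reduced to rank 0:

    /* Minimal MPI example: each rank sums its slice of 1..N, then the
     * partial sums are combined at rank 0 with MPI_Reduce. */
    #include <stdio.h>
    #include <mpi.h>

    #define N 1000000L

    int main(int argc, char **argv)
    {
        int rank, size;
        long i;
        double local = 0.0, total = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        for (i = rank + 1; i <= N; i += size)   /* interleaved slices */
            local += (double)i;

        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0,
                   MPI_COMM_WORLD);
        if (rank == 0)
            printf("sum 1..%ld = %.0f\n", N, total);

        MPI_Finalize();
        return 0;
    }

You would launch it with something like mpirun -np 4 ./sum, depending on your MPI implementation.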

Another messaging system that is available is Spread. Spread is a toolkit that provides a message passing interface that can be used for highly available applications, a message bus for clusters, replicated databases, and failover applications.
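
As a rough sketch of the style of Spread's C API (the daemon address, private name, and group name here are invented, and you should verify the exact signatures against sp.h and the Spread documentation), connecting, joining a group, and multicasting to it looks something like this:

    /* Sketch of Spread's group messaging API as I understand it from
     * sp.h; names are invented -- check the Spread docs for details. */
    #include <stdio.h>
    #include <string.h>
    #include <sp.h>

    int main(void)
    {
        mailbox mbox;
        char private_group[MAX_GROUP_NAME];
        const char *msg = "node1 is alive";

        /* connect to a local Spread daemon on its default port */
        if (SP_connect("4803@localhost", "node1", 0, 1,
                       &mbox, private_group) < 0) {
            fprintf(stderr, "SP_connect failed\n");
            return 1;
        }
        SP_join(mbox, "cluster-status");       /* join a named group */

        /* reliably multicast a message to every member of the group */
        SP_multicast(mbox, RELIABLE_MESS, "cluster-status", 0,
                     (int)strlen(msg), msg);

        SP_leave(mbox, "cluster-status");
        SP_disconnect(mbox);
        return 0;
    }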

Now, you may be asking how someone might keep track of all the jobs that are going to be run on a cluster. For that, you need a scheduler. There are a number of different schedulers out there, but when it comes to Linux Clustering, I mainly hear about two of them, Condor and Maui. These schedulers can handle scheduling policies, heterogeneous resources, dynamic priorities, reservations, and more.

Miscellaneous Cluster-Related Projects

There are some projects that are related to clustering but do not really fit into the categories I have used to group things above. I will now briefly mention two of these, as they are useful and worth noting.

First, there is the Distributed Lock Manager from IBM. The DLM provides an implementation of locking semantics modeled after the VAX Cluster locking semantics. This is a specific tool, not an entire cluster environment. An API is embodied in a shared library, and applications that wish to use the DLM can connect through this.

Secondly, the Linux Terminal Server Project is a great way of providing access to diskless systems (workstations, cluster slave nodes, or any other diskless systems you may have). LTSP provides the administrative tools that make accessing these systems easier than it might otherwise be.

Some Comments In Closing

This is by no means an all-encompassing list of the projects that exist in the world of Linux Clustering. There are a lot of other projects out there. I tried not to mention projects that have dead links or which haven't had any updates in a couple of years. At the pace the world of clustering moves, two years is pretty outdated, unfortunately. Also, I may have missed some projects. Maybe they aren't included in the Clustering/Distributed Networks category here on freshmeat, though there are several other good categories on the site worth looking in. Maybe they aren't listed on freshmeat at all. There are also many Web sites on the great big World Wide Web that are devoted to Linux Clustering in its various forms:

The High Availability Linux Project
This site is not only the home of the Linux-HA project, but also has lots of information about Linux High Availability.
The Linux Clustering Information Center
Ok, I may be a little biased, as this is my Web site, but I think it's a pretty useful place to find links to all sorts of information about all the types of clustering, from software to documentation to Linux Clustering haikus.
The Beowulf Project
The project that started it all. This page contains the history of the Beowulf Project, as well as some links to some other interesting sources of information.
The Sourceforge Clustering Foundry
A Sourceforge Foundry devoted to Clustering. What more could you ask for?

Again, this is a partial listing. I would recommend these places as good starting points in your search for more information; more likely than not, they will be able to point you in the direction you want to go.

What is the state of Linux Clustering? Where does it stand now, and where is it going?

These are good questions. Linux Clustering is, at least in my biased opinion, off to a great start. With recent announcements such as Pacific Northwest National Laboratory's purchase of a 1,400 node McKinley cluster running Linux (with an expected peak performance of 8.3 Teraflops) and vendors like Penguin Computing announcing their new 1U SMP Athlon system (ideal for clustering), the future is looking quite bright. Many people are taking Linux seriously these days, with support from companies like IBM, SGI, Intel, and Sun, just to name a few, as well as efforts at many universities and the DOE National Labs, where software development and testing take place. With these pieces at the base, there is no real ceiling on how far this can go. That may sound kind of hokey, but I think it is true. With supercomputer speeds and fault tolerant systems available for a fraction of what they used to cost, I think many people are going to embrace this and run with it.

Recent comments

01 Jun 2002 11:06 mattdm

xCAT
Don't forget xCAT.

01 Jun 2002 12:37 jgreenseid

Re: xCAT
> Don't forget xCAT.

Unfortunately, xCAT isn't registered here on Freshmeat, so I did not include it in the review, but yes, from what little I've seen of it, xCAT looks pretty cool.

02 Jun 2002 08:28 jeffcovey

Re: xCAT

> xCAT isn't registered here on Freshmeat, so I did not include it in
> the review

Just to be clear: In my message inviting you to write this article, I
said,

"We can add or recategorize any projects that you think
need to be listed under this category."

03 Jun 2002 10:57 imipak

Thoughts from the peanut gallery
First, some of the projects are in an indeterminate state. SCE, for example, has no downloads at the moment, and the download link on their entry is broken. (Hey, I'm as guilty as anyone, when it comes to link validation & project validation. FOLK is -hardly- the standard-bearer for Quality Maintenance.) Ideally, though, a write-up would red-flag suspect links and projects in limbo.

But remember: that may be the ideal, but writers have real lives, are human, and have finite patience with net-lag. The comments can red-flag a project or link just as easily as the master article. If I'd like to see anything in the follow-up comments, it's information from developers and users on what projects mentioned (or missed) are dead, alive, both, or neither.

The second thing I'd mention is that PVM and MPI, as far as I am able to understand them, don't actually require underlying clustering software to work. Given that PVM is designed to work with fixed numbers of physical machines, mixing it into a dynamically-resizable virtual architecture might be, ummm, entertaining?

Thirdly, I'd like to mention that OpenMosix is a political fork from MOSIX, and that the two are unlikely to be compatible. I don't want to get into the issues here - from everything I've heard, there's a LOT of ill-feeling involved, and I've learned that you really don't want to open a can of angry vampire worms.

The technical issue, though, is that heterogeneous clustering solutions don't yet exist - there are quite simply no standard frameworks - and project splits can diverge amazingly rapidly. This means that although you can have clusters containing different machines (at least, for some solutions), you can't mix-and-match solutions.

This brings me to my last point. At the present time, inter-communication between clustering methods doesn't exist. This means that whatever method you choose will be a compromise solution. If you have a problem which you want to cluster, but that problem contains elements which each would be better done with a different solution, all you can do is pick the one that is least worst overall.

A trivial, and undeniably lame, example would be clustering a Mandelbrot generator:

Your virtual screen is fixed, so you'd want to use PVM to split the individual operations. PVM is optimal when you've a fixed number of components. Your maths algorithm is also fixed (3 independent operations), but of a different size. You still want PVM, but now you're running a virtual machine inside a virtual machine. If you want to generate multiple regions, though, each region wants to be run independently.

If you don't know how many regions you want to generate, at compile time, then you're going to have to use MPI or some other dynamically-resizable clustering methodology.

MPI and PVM are not interoperable. They cannot talk to each other. Mixing them into the same program, therefore, is usually not a good idea.

This leaves two options. (1) Use PVM for the bulk of the work, as before, and then support different regions by having "launchers" on the network to which you can feed data to independently run those PVM sessions. You then collect the data at the end. Since, in this case, there's no communication between virtual machines except at the start and end, you can get away with this. If there were a significant amount, you could forget it. This is optimal for any given point, as PVM doesn't need to do the housework of a dynamic solution, so the architecture is much faster for this static problem. Only the work that -needs- to be done is done, and only the traffic that -needs- to be sent is sent.

(2) Use MPI or some other dynamic clustering technology for the whole system. Treat the whole problem space as a single, dynamically-stretchable problem. Because there are likely to be more points being calculated than computers, you can have a mechanism for farming out & re-distributing the work, on the fly. It's sub-optimal for any given point, but optimal for the problem as a whole. This is because your low-level work now has to involve a lot more processing, because you're not making so many simplifying assumptions. On the other hand, your higher layers are vastly better off, as ALL the regions can use ALL the capacity of the cluster. You don't have autonomous clusters sitting idle, because they ran out of work.
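
To make option (2) concrete, a bare-bones MPI work farm might look something like the sketch below -- rank 0 hands out region indices on demand, workers send results back, and a DONE tag shuts everything down. The tags and the stand-in compute_region() are invented for the illustration.

    /* Bare-bones MPI work farm: rank 0 farms out region indices on
     * demand; workers return results; a DONE tag ends the run.
     * Assumes at least one worker (run with -np 2 or more). */
    #include <stdio.h>
    #include <mpi.h>

    #define TAG_WORK   1
    #define TAG_RESULT 2
    #define TAG_DONE   3
    #define NREGIONS   64

    static double compute_region(int region)
    {
        return region * 2.0;             /* stand-in for real work */
    }

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {                 /* master: farm out regions */
            int next = 0, active = 0, w;
            double result;
            MPI_Status st;
            for (w = 1; w < size && next < NREGIONS; w++) {
                MPI_Send(&next, 1, MPI_INT, w, TAG_WORK, MPI_COMM_WORLD);
                next++;
                active++;
            }
            while (active > 0) {
                MPI_Recv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE,
                         TAG_RESULT, MPI_COMM_WORLD, &st);
                active--;
                if (next < NREGIONS) {   /* immediately hand out more */
                    MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE,
                             TAG_WORK, MPI_COMM_WORLD);
                    next++;
                    active++;
                }
            }
            for (w = 1; w < size; w++)   /* tell workers to quit */
                MPI_Send(&w, 1, MPI_INT, w, TAG_DONE, MPI_COMM_WORLD);
        } else {                         /* worker: loop until DONE */
            int region;
            double result;
            MPI_Status st;
            for (;;) {
                MPI_Recv(&region, 1, MPI_INT, 0, MPI_ANY_TAG,
                         MPI_COMM_WORLD, &st);
                if (st.MPI_TAG == TAG_DONE)
                    break;
                result = compute_region(region);
                MPI_Send(&result, 1, MPI_DOUBLE, 0, TAG_RESULT,
                         MPI_COMM_WORLD);
            }
        }
        MPI_Finalize();
        return 0;
    }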

All this boils down to: What assumptions can I make, and what clustering solution can exploit those assumptions the best? If you can answer that, then you've answered how you need to cluster your computers to best solve your problem.

03 Jun 2002 11:35 imipak

Re: xCAT

> Don't forget xCAT.

Oh, there were plenty "missed". COSM, for example. There were other aspects of clustering which were overlooked, too. "Fault Tolerant" systems (essentially HA with zero switch-over time) were not discussed.

However, let's be fair on the author here. However long, complete, or well-researched an article is, it's always going to miss something, for some reason or other. IMHO, that's one of the good things about comments. If there's something vitally important to add, we're able to add it, putting less pressure on authors to "get it perfect", so they can spend more time feeling it's worth doing at all.

Remember those 300-600 page monster manuals that cost more than your house? Those won't contain everything, either. (In fact, those usually contain less than your average Linux HOWTO, or a Freshmeat article, in terms of actual useful knowledge.)

I'm assuming that the vast number of apps and methods missed were missed:

a) Because the author had run out of Mountain Dew and just wanted to get the damn thing finished

b) Because researching every little app, every side-avenue, every derivative, etc, would take longer than the remaining estimated lifespan of the Universe, and the author felt it more important to post something than nothing

c) Some applications aren't "obvious" clustering apps, but could be used that way. nmap, for example, could be used as an alternative to heartbeat, where the system load is moved onto one machine. (Simply scan your cluster, repeatedly, for a change in port status or machine status.) Anything which allows you to use, manipulate, or process data from a network connection in a way that can simplify some aspect of clustering could be considered a clustering tool of sorts. In the end, you can either draw the line at some level of indirect obscurity, or you can go quietly insane, burbling to yourself in the corner.

d) Sometimes, a useful tool is going to get nudged out for other reasons. Too many others of the same ilk, for example. Or confusion over whether it meets a given criterion, as seems to have been the case with xCAT.

In my recent article on K5, on extreme tech, I listed maybe a dozen or so corporations involved in technology way outside the normal realms of sane environments. By the next day, I'd researched maybe a dozen more, and had strong pointers to maybe a few hundred more, again. Should I have waited, and published the article as a 10-tonne paperweight guide to companies you'll never ever want or need to know about?

Nah! As I've written elsewhere, the comments section can add all the "fixes" and addendums. That's what it's there for (besides actual comments!! :) Release Early, Release Often is a maxim that works well for code, and works well for weblogs, too.

03 Jun 2002 11:40 mattdm

Re: xCAT
My goodness. That was quite verbose. I wasn't chiding the author for missing something; I was just mentioning another cool clustering tool. You even end by saying that doing this is exactly what the comments are for -- what was your point exactly? To write a lot of words? To refer to your k5 article? I don't get it.

03 Jun 2002 15:50 jgreenseid

Re: xCAT
> xCAT isn't registered here on Freshmeat, so I did not include it in the review

> Just to be clear: In my message inviting you to write this article, I
> said, "We can add or recategorize any projects that you think need to
> be listed under this category."

Yes, you did. I think I just didn't realize that you meant this also included actually registering projects with Freshmeat itself; I thought you meant what we actually did -- adding/recategorizing some projects that should have been included in this category. I thought registering was left to project participants. Apparently a little hiccup in the lines of communication, though I think no great harm was done. :)

03 Jun 2002 16:15 jgreenseid

Re: xCAT

After reading through these two messages you've posted, I basically agree with most of what you said. When I approached the writing of this review, I ended up writing what I knew about. I have a very strong SA background, so I know things like tools better than programming environments, for example. And also, there are some projects out there that I have just never heard of, and may have missed.

I tried to do some research on everything that was in the clustering/distributed category, so I could include as much as possible, and I also did a few general Freshmeat searches to try to find projects not in this specific category.

In your above post, you mentioned some specific details about a few different projects (interoperability with similar projects, the current state of the project, and such). I tried to leave out projects that seemed completely dead or unreachable, for example. And on the matter of the interoperability example, I didn't want to get into this too much. I think I mentioned over 70 projects in this review. That is a lot. If I had gone into as much detail as I did in the first draft of the first section, this review would have been unreadable. I tried to give the reader a general overview of what's out there, and then let them hit the projects for more specifics. When you go to openMosix's Web site, for example, you find out that yes, they are not interoperable with MOSIX, but there are so many little things like that that the review would have become too unwieldy -- who would want to spend half an hour or an hour reading a project review? I know I'd get bored, and probably get a headache, from staring at the computer that long.

Oh, and by the way, xCAT is kind of cool, but according to www.x-cat.org/download/, "xCAT is for use by IBM and IBM Linux cluster customers only." That's kind of annoying. :)

16 Aug 2002 19:54 josevnz

What about SNMP monitoring tools?

First, thanks for the article; I'll take some ideas from it :)

I saw that SNMP is not mentioned anywhere in the article. Even if it is painful to configure (and sometimes buggy), it is a common standard, and if you have equipment (like Nokia firewalls, Cisco switches, routers, and hubs), there is a high chance that it speaks SNMP (and thus can be monitored).

Some cool open source tools (SNMP & NMS):

Net-SNMP (open source SNMP agent):
net-snmp.sourceforge.net

Netsaint (cool NMS and extendable monitoring tool, very flexible):
www.netsaint.org

OpenNMS (Java-based, scalable, good architecture):
www.opennms.org

19 Nov 2002 17:01 rsug

radmind
Anyone using radmind to manage the filesystems of computing clusters? We use it to manage about a hundred Solaris machines, in addition to around 800 Mac OS X machines.
