Upcoming mrsync version 2.0
A new feature has been added, tested, and put into production use.
Specifically, network congestion control
is implemented, which makes multicaster more network
friendly. The resulting code actually delivers data a little
faster than the code with congestion control turned off.
* A Python script (mrsync.py, replacing mrsync.c)
is used to set things up.
* rtt now measures the real round-trip time distribution.
* The IP address can be set on the command line (per Robert Dack
* Memory-mapped file I/O is replaced with the usual
seek() and write() sequence.
This change was echoed by Clint
* Verbose control is added so that by default mrsync prints
only essential info instead of a detailed status report.
This was suggested by Clint Byrum
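
As a rough illustration of the seek() and write() pattern mentioned in the list above, here is a minimal sketch in Python. It is not taken from the mrsync sources: the function name write_page, the page_index parameter, and the tiny PAGE_SIZE are all assumptions made for the example. The idea is that each received page is written at its own byte offset, so the receiver never has to map the whole target file into memory.

```python
import os
import tempfile

PAGE_SIZE = 4  # hypothetical page size for the demo; not mrsync's real value


def write_page(fd, page_index, payload, page_size=PAGE_SIZE):
    """Write one received page at its byte offset, without mapping the file."""
    os.lseek(fd, page_index * page_size, os.SEEK_SET)
    view = memoryview(payload)
    while len(view) > 0:
        # os.write may write fewer bytes than requested, so loop until done.
        written = os.write(fd, view)
        view = view[written:]


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    try:
        # Pages can arrive in any order; each lands at its own offset.
        write_page(fd, 1, b"BBBB")
        write_page(fd, 0, b"AAAA")
    finally:
        os.close(fd)
    with open(path, "rb") as f:
        print(f.read())  # b'AAAABBBB'
    os.remove(path)
```

With this approach, memory use stays at roughly one page per write regardless of file size, which is the concern raised about mmap in the comments on this page.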
Currently, the code is waiting for clearance from our
company before I can put it on Freshmeat.
If you want to take a look at it, drop me an email. :)
Re: Mrsync implementation
We did some more mods..
This patch takes mmap out and also reduces the output to a more suitable level when you need to see errors, but not status messages. ;)
> Overall the performance is not all that
> bad, sending 100GIG to about 300 nodes
> equipped with 100MB connections takes
> about 8 hours. When dropping to only a
> small number of nodes 15 GIG to 6-10
> nodes in 25minutes.
We're doing it with gigabit. We've seen a much better rate, though we're copying a lot less data. Also, our focus is on doing this in as short a time period as possible. We have 15 boxes on a gigabit LAN, and about 1.6GB to transfer. It takes 7 minutes no matter if we just go to 1 or 15 of them.
One thing that I did was patch multicatcher to not use mmap. We're transferring some larger files, and mmapping 500MB or more on all our nodes was not efficient for the other processes running on them. I've put the patch up here for anyone who might need it, until "HP" can incorporate it into the baseline mrsync:
We've been using Mrsync for about 1 year to move blocks of data around our clusters. I'm quite pleased with the utility, although it could use some additional features.
We found early on that the multicaster needed to be on the same subnet as the receiving hosts. We adapted the code to allow a user-defined interface for hosts which existed on more than one subnet for this purpose.
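
For readers wondering what a user-defined interface can look like at the socket level, here is a small Python sketch. It is an assumption about one common approach (the standard IP_MULTICAST_IF socket option), not the actual change described above; the function names and the example address 192.168.10.5 are hypothetical.

```python
import socket
import struct


def multicast_sender_options(interface_ip, ttl=1):
    """Return the (level, option, value) triples for a multicast sender."""
    return [
        # Limit how many router hops the multicast packets may cross.
        (socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, struct.pack("B", ttl)),
        # Send multicast traffic out of the interface that owns this address.
        (socket.IPPROTO_IP, socket.IP_MULTICAST_IF, socket.inet_aton(interface_ip)),
    ]


def open_sender(interface_ip, ttl=1):
    """Create a UDP socket bound to the chosen outgoing multicast interface."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    for level, option, value in multicast_sender_options(interface_ip, ttl):
        sock.setsockopt(level, option, value)
    return sock


if __name__ == "__main__":
    # Hypothetical address of the NIC that sits on the receivers' subnet.
    for level, option, value in multicast_sender_options("192.168.10.5"):
        print(level, option, value)
```

On a multi-homed host, open_sender would be called with the local address of whichever interface shares a subnet with the receiving nodes.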
Currently, we find that the software needs the ability to allow multiple multicasts to run simultaneously. This will require the multicast address and perhaps the port to change for each multicast running.
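
One possible way to give each simultaneous multicast its own address and port is sketched below in Python. This is only an illustration of the idea, not an mrsync feature: the base group 239.255.67.0 (chosen from the administratively scoped 239.0.0.0/8 range), the base port 7900, and the session_id parameter are all assumptions.

```python
import ipaddress

# Hypothetical defaults; pick values that suit the local network.
BASE_GROUP = ipaddress.IPv4Address("239.255.67.0")
BASE_PORT = 7900
MAX_SESSIONS = 256


def session_endpoint(session_id, base_group=BASE_GROUP, base_port=BASE_PORT):
    """Map a session number to a distinct (multicast address, port) pair."""
    if not 0 <= session_id < MAX_SESSIONS:
        raise ValueError("session_id out of range")
    group = base_group + session_id
    if not group.is_multicast:
        raise ValueError("derived address is not a multicast address")
    return str(group), base_port + session_id


if __name__ == "__main__":
    for sid in range(3):
        print(sid, session_endpoint(sid))
```

Changing both the address and the port keeps sessions apart even on hosts that deliver every datagram for a given port to one socket regardless of the group it was sent to.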
Overall the performance is not all that bad: sending 100 GB to about 300 nodes equipped with 100MB connections takes about 8 hours. With only a small number of nodes, sending 15 GB to 6-10 nodes takes 25 minutes.
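
To put the transfer figures quoted on this page on a common scale, the short Python calculation below converts them to average throughput. It assumes decimal units (1 GB = 10^9 bytes) and ignores protocol overhead, so the results are rough averages rather than measured rates; the function name is made up for the example.

```python
def throughput_mbit_per_s(gigabytes, minutes):
    """Average throughput in Mbit/s, assuming 1 GB = 10**9 bytes."""
    bits = gigabytes * 1e9 * 8
    return bits / (minutes * 60) / 1e6


if __name__ == "__main__":
    # 100 GB to about 300 nodes in about 8 hours: about 27.8 Mbit/s
    print(round(throughput_mbit_per_s(100, 8 * 60), 1))
    # 15 GB to 6-10 nodes in 25 minutes: 80.0 Mbit/s
    print(round(throughput_mbit_per_s(15, 25), 1))
    # 1.6 GB to 15 boxes on a gigabit LAN in 7 minutes: about 30.5 Mbit/s
    print(round(throughput_mbit_per_s(1.6, 7), 1))
```

Because the data is multicast, each figure is the rate seen by every receiver, not a total divided among the nodes.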