Comments for arch revision control system

17 Dec 2003 21:49 cduffy

Re: I'm not quite sure I buy that.
Pau,

I certainly agree that with advanced usage (such as trimming patch logs), space-efficiency concerns wrt arch can be reduced. Whether it's the application's fault or the filesystem's is irrelevant -- to a new user on a common-case (i.e. ext3) system who's accustomed to revision control systems designed with space efficiency as a higher-priority goal than a corruption-resistant repository format, arch *will* be relatively space-inefficient. (Consider the following pathological case: in CVS, a 1-line patch means at most a few hundred bytes' increase to a preexisting ,v file; in arch, it's represented by at least two new files, each of which the filesystem typically treats as at least 4k in size.)
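To put rough numbers on that pathological case, here is a minimal sketch of the block-rounding arithmetic. The 4096-byte block size, the 10,000-byte ,v file and the 300-byte patch are assumptions picked purely for illustration, and inode and directory overhead are ignored.

```python
BLOCK_SIZE = 4096  # assumed filesystem block size (a common ext3 default)


def allocated_bytes(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Disk space used by a file whose data is stored in whole blocks."""
    blocks = (file_size + block_size - 1) // block_size  # round up
    return blocks * block_size


def growth_when_appending(old_size: int, added: int,
                          block_size: int = BLOCK_SIZE) -> int:
    """Extra allocation when `added` bytes are appended to an existing file."""
    return (allocated_bytes(old_size + added, block_size)
            - allocated_bytes(old_size, block_size))


# Hypothetical sizes, for illustration only.
cvs_growth = growth_when_appending(old_size=10_000, added=300)  # one ,v file grows
arch_growth = 2 * allocated_bytes(300)                          # two new small files

print(cvs_growth, arch_growth)  # prints "0 8192"
```

With these made-up sizes the appended patch still fits in the last block of the existing file, while each new small file costs a whole block. A filesystem that packs small files together would change the result.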

Because of this, I'm uncomfortable with making space efficiency a selling point to newbies -- because if they do a quick import of a typical CVS project into arch (without bothering with the advanced-usage functionality) and discover that it actually takes *more* space, then they're likely to doubt the other, more defensible, claims made.

My issue here isn't that I think arch is a space hog -- far from it. However, I think that it's a very dangerous proposition to promise potential users that they'll use "only a fraction of the space" they're accustomed to, when there are far more compelling elements to arch's value proposition available.

17 Dec 2003 00:18 linux4all

Re: I'm not quite sure I buy that.

> % arch is now not only the most flexible
> % and distributed SCM system, but the
> % fastest and most space-efficient too.
> % You'll use only a fraction of the space
> % you need with other tools but without
> % loss of speed ;)
>
> tla 1.1 is certainly much faster than
> prior versions of tla, and there are
> many operations which it certainly does
> much faster than other tools. That said,
> I'm not sure I'd agree to "fastest"
> without hard numbers including
> comparisons of competitors (that is, svn
> and bk) --

In usual operations like get (checkout) and changes (diff -R), if you have a working tree (a project) hardlinked to the previous revision, it's almost impossible to be faster than arch. I'm not saying it can't be done, but neither the bk nor the svn architecture allows this kind of optimization. Even without hardlinks, the inode signatures of each file allow for lightning-fast comparisons. There's room for improvement in certain tagging methods like explicit tagging, but the current status is certainly very competitive.
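As an illustration of why hard links and inode signatures make comparisons cheap, here is a minimal sketch using only file metadata. It is not tla's actual code or signature format: the function names and the fields in the signature tuple are assumptions made up for this example.

```python
import os
import tempfile


def same_inode(path_a: str, path_b: str) -> bool:
    """True when both paths are hard links to one file on the same device."""
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


def signature(path: str) -> tuple:
    """Cheap fingerprint built from metadata only; file contents are not read."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino, st.st_size, st.st_mtime_ns)


def needs_content_diff(tree_file: str, reference_file: str,
                       recorded_sig=None) -> bool:
    """Decide whether a byte-by-byte comparison is necessary."""
    if same_inode(tree_file, reference_file):
        return False  # hard-linked: identical by construction
    if recorded_sig is not None and signature(tree_file) == recorded_sig:
        return False  # metadata untouched since the signature was recorded
    return True


with tempfile.TemporaryDirectory() as tmp:
    reference = os.path.join(tmp, "reference.c")
    linked = os.path.join(tmp, "linked.c")
    copied = os.path.join(tmp, "copied.c")
    with open(reference, "w") as f:
        f.write("int main(void) { return 0; }\n")
    os.link(reference, linked)       # working-tree file shares the inode
    with open(copied, "w") as f:     # separate inode, same contents
        f.write("int main(void) { return 0; }\n")
    sig = signature(copied)          # recorded while the file matched the reference

    linked_needs_diff = needs_content_diff(linked, reference)
    copied_with_sig_needs_diff = needs_content_diff(copied, reference, sig)
    copied_without_sig_needs_diff = needs_content_diff(copied, reference)
```

The signature shortcut is only valid if the signature was recorded at a moment when the file was known to match the reference revision. Either way, the common case costs a couple of stat calls instead of a read of both files.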
>
> and I certainly don't agree
> with arch using a minimal amount of disk
> space, at least if one is using a
> filesystem which penalizes having a
> large number of small files --

ext3 has this kind of problem with lots of small files; this is not an arch issue. I do see a problem if you use explicit tagging, because you double the number of files (one file to store each id), but this is surmountable and is one of the points to improve during 1.2 development. I think the ext3 group is working on improving this situation; BTW, reiserfs does not suffer from it.
>
> and if
> one has very large revision libraries
> (i.e. the complete gcc tree history),
> it's also for the better to be running a
> filesystem without a fixed maximum inode
> count.

With the new sparse revision libraries you do not need to keep all the revisions in the library. That's a big win because you usually don't need the ancient revisions, only the recent ones. Now it's up to you to decide which ones you need. And if you choose to, whenever you need a revision (e.g. to diff against), it will be added automatically. Moreover, you can have libraries in multiple locations, a very useful feature when you work on different devices or computers, or when you want to share libraries within a development group.
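The on-demand behaviour can be pictured with a toy model. This is only a sketch and not how tla is implemented: the `SparseLibrary` class is hypothetical, trees are plain dicts mapping file names to contents, and a changeset is a dict in which `None` means "file deleted".

```python
class SparseLibrary:
    """Toy model: full trees are kept only for revisions someone asked for."""

    def __init__(self, base_tree: dict, changesets: list):
        # changesets[i] turns revision i into revision i + 1
        self.changesets = changesets
        self.trees = {0: dict(base_tree)}  # revision number -> full tree

    def get(self, revision: int) -> dict:
        """Return the tree for `revision`, building and caching it on demand."""
        if revision not in self.trees:
            # Start from the nearest stored revision at or before the target.
            start = max(r for r in self.trees if r <= revision)
            tree = dict(self.trees[start])
            for changeset in self.changesets[start:revision]:
                for name, content in changeset.items():
                    if content is None:
                        tree.pop(name, None)   # file deleted
                    else:
                        tree[name] = content   # file added or modified
            self.trees[revision] = tree
        return self.trees[revision]


library = SparseLibrary(
    base_tree={"README": "v0"},
    changesets=[{"README": "v1"}, {"main.c": "int x;"}, {"README": None}],
)
tree3 = library.get(3)          # built from revision 0 plus three changesets
stored = sorted(library.trees)  # revisions 1 and 2 were never materialized
```

Only the revisions that were actually requested end up taking space in the library. The intermediate ones exist just as changesets until someone asks for them.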

One more plus: with tla-1.1 you no longer need pristine trees (full copies of a revision), which took a lot of space. They have been replaced by sparse revision libraries (that is the recommendation; in fact pristine trees are still there, but I'm sure they'll disappear in a couple of releases). Revision libraries are much more useful because they are hard-linked to one another (less space, more efficiency) and shared (pristines belong to a working tree and, although they were searched for in sibling directories, that was only a good hack, now luckily overcome).

> Keep in mind that many of Arch's design
> decisions emphasize robustness and
> simplicity over localized "common-case"
> optimizations which make presumptions
> about intended use-cases, with Tom's
> observations wrt how such optimizations
> are inappropriate given the state of
> modern computing hardware as the usual
> backing evidence. I certainly agree that
> this focus is The Right Thing to do, but
> it also means that arch isn't really
> optimized for tiny disk usage (at the
> expense of other things) in the same way
> that some of the competing systems are.

I don't agree with you on this point. Revisions are fully optimized caches for revision control use. They are like working trees that share all the unchanged inodes with the ancestor revision. You can even hardlink your working tree to a cached revision, making it still more efficient. Caches should improve speed and, in this case, they do so with the maximum optimization.
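A minimal sketch of the inode-sharing idea, assuming a flat directory per revision: unchanged files are hard-linked from the ancestor and only changed files get new inodes. The `build_revision` helper and the `patch-1`/`patch-2` directory names are made up for this example and are not tla's code.

```python
import os
import tempfile


def build_revision(ancestor_dir: str, new_dir: str, changed: dict) -> None:
    """Create new_dir from ancestor_dir, hard-linking every unchanged file.

    `changed` maps a file name to its new contents. Only a flat directory
    is handled, to keep the sketch short.
    """
    os.makedirs(new_dir)
    for name in os.listdir(ancestor_dir):
        if name not in changed:
            os.link(os.path.join(ancestor_dir, name),
                    os.path.join(new_dir, name))
    for name, text in changed.items():
        # A fresh file is written, so the ancestor's copy is left alone.
        with open(os.path.join(new_dir, name), "w") as f:
            f.write(text)


with tempfile.TemporaryDirectory() as tmp:
    rev1 = os.path.join(tmp, "patch-1")
    rev2 = os.path.join(tmp, "patch-2")
    os.makedirs(rev1)
    for name, text in {"a.c": "one\n", "b.c": "two\n"}.items():
        with open(os.path.join(rev1, name), "w") as f:
            f.write(text)

    build_revision(rev1, rev2, changed={"b.c": "two, edited\n"})

    a1, a2 = os.stat(os.path.join(rev1, "a.c")), os.stat(os.path.join(rev2, "a.c"))
    b1, b2 = os.stat(os.path.join(rev1, "b.c")), os.stat(os.path.join(rev2, "b.c"))
    shared = a1.st_ino == a2.st_ino      # unchanged file: one inode, two names
    separate = b1.st_ino != b2.st_ino    # changed file: its own inode
    link_count = a1.st_nlink
    with open(os.path.join(rev1, "b.c")) as f:
        old_b = f.read()
```

The catch with any scheme like this is that a hard-linked file must never be modified in place, since that would silently change every revision sharing the inode. Tools and editors have to write a new file instead.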

The archive is not stored in a fully optimized binary format but, as you say, this is a good decision (ext3 will catch up some day). It lets you host arch repositories without special servers and retrieve the information easily if you want to move to another system. A different backend server (svn, for instance) could be plugged into arch without great pain, as Tom has stated several times. The space trade-off is worth the simplicity and openness.

> Observe recent posts on arch-users
> regarding the cumulative disk usage of
> patch logs for an example of a case
> where arch uses more disk resources than
> might otherwise be the case. Is it an
> appropriate design decision for a modern
> revision control system, given the
> constraints within which it was
> designed? Absolutely. Does it make arch
> an optimally space-efficient revision
> control system? No, not really.

When you put a project under revision control, you should plan good storage for the repository. If you have hundreds of changes, some space will be used. There are tricks to cut those numbers down if you suffer from that syndrome: recycle your archives from time to time, or create new tagged branches, and store a cached revision in the archive so that you do not need to keep the older patch logs around. The solution is already there, and I doubt you can have the same kind of optimization with other SCM tools.

16 Dec 2003 22:29 cduffy

I'm not quite sure I buy that.

> arch is now not only the most flexible
> and distributed SCM system, but the
> fastest and most space-efficient too.
> You'll use only a fraction of the space
> you need with other tools but without
> loss of speed ;)

tla 1.1 is certainly much faster than prior versions of tla, and there are many operations which it certainly does much faster than other tools. That said, I'm not sure I'd agree to "fastest" without hard numbers including comparisons of competitors (that is, svn and bk) -- and I certainly don't agree with arch using a minimal amount of disk space, at least if one is using a filesystem which penalizes having a large number of small files -- and if one has very large revision libraries (i.e. the complete gcc tree history), it's also for the better to be running a filesystem without a fixed maximum inode count.

Keep in mind that many of Arch's design decisions emphasize robustness and simplicity over localized "common-case" optimizations which make presumptions about intended use-cases, with Tom's observations wrt how such optimizations are inappropriate given the state of modern computing hardware as the usual backing evidence. I certainly agree that this focus is The Right Thing to do, but it also means that arch isn't really optimized for tiny disk usage (at the expense of other things) in the same way that some of the competing systems are.

Observe recent posts on arch-users regarding the cumulative disk usage of patch logs for an example of a case where arch uses more disk resources than might otherwise be the case. Is it an appropriate design decision for a modern revision control system, given the constraints within which it was designed? Absolutely. Does it make arch an optimally space-efficient revision control system? No, not really.

16 Dec 2003 14:00 linux4all

It's time to move to arch and forget about CVS, bk and co
arch is now not only the most flexible and distributed SCM system, but the fastest and most space-efficient too. You'll use only a fraction of the space you need with other tools, but without loss of speed ;)

Come on, what are you waiting for?

25 Nov 2003 22:36 linux4all

improving like good wine
tla is getting better and better.
All the problems that arose in managing big source trees have been addressed, usability issues solved, friendliness of commands taken care of... what else can I say, it's what you should be using to control your software releases, especially if you want a distributed model.

02 Oct 2003 15:13 linux4all

it's lightning fast!
If you haven't tried it yet, go on! CVS and co are history.

26 Jun 2003 04:35 linux4all

RIP CVS, long live arch!
If you use CVS and are fed up with its limitations, restrictions and annoying workarounds, just bury it and use arch.

Things are so much easier to do: no need for a central repository, complex merges are done with just one instruction... too much to say in just one comment.

Read the tutorial, try it and you'll see. And no, it's not Subversion; it's much easier and, by now, powerful.
