Tired of fscking? Try a journaling filesystem!

One of the most anticipated recent Linux developments is the availability of journaling filesystems. In today's editorial, Philipp Tomsich provides an overview of the alternatives and his thoughts on which you should consider using, depending on your needs.

Journaling filesystems

Waiting for an fsck to complete on a server system can tax your patience more than it should. Fortunately, a new breed of filesystem is coming to your Linux machine soon. Journaling filesystems maintain a special file called a log (or journal), the contents of which are not cached. Whenever the filesystem is updated, a record describing the transaction is added to the log. A background thread processes these transactions, writes the data to the filesystem, and flags each processed transaction as completed. If the machine crashes, this background process is run on reboot and simply finishes copying updates from the journal to the filesystem. Incomplete transactions in the journal are discarded, so the filesystem's internal consistency is guaranteed.
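The log-then-replay cycle described above can be sketched in a few lines of Python. This is a minimal model under stated assumptions: the `Journal` class, the `replay` function, and the tuple record format are all hypothetical, and a plain dict stands in for the disk. A transaction only counts once its commit record reaches the log; recovery copies committed updates and drops the rest.

```python
from dataclasses import dataclass, field

# Minimal write-ahead journal sketch. The tuple record layout and the dict
# standing in for the disk are illustrative assumptions, not the on-disk
# format of any real journaling filesystem.


@dataclass
class Journal:
    records: list = field(default_factory=list)  # append-only log
    next_tid: int = 0

    def begin(self) -> int:
        """Start a new transaction and return its id."""
        tid = self.next_tid
        self.next_tid += 1
        return tid

    def log_write(self, tid: int, block: int, data: bytes) -> None:
        # Describe the update in the log before the filesystem is touched.
        self.records.append(("write", tid, block, data))

    def commit(self, tid: int) -> None:
        # A transaction takes effect only once its commit record is logged.
        self.records.append(("commit", tid))


def replay(journal: Journal, disk: dict) -> dict:
    """Copy committed updates to the disk and discard incomplete ones."""
    committed = {rec[1] for rec in journal.records if rec[0] == "commit"}
    for rec in journal.records:
        if rec[0] == "write" and rec[1] in committed:
            _, _, block, data = rec
            disk[block] = data
    journal.records.clear()  # checkpoint: the log is now empty
    return disk


# Simulated crash: transaction t2 never wrote its commit record.
j = Journal()
t1 = j.begin()
j.log_write(t1, 10, b"inode A")
j.commit(t1)
t2 = j.begin()
j.log_write(t2, 11, b"inode B")
disk = replay(j, {})
print(disk)  # {10: b'inode A'}
```

In this model each logged write carries the full new contents of a block, so replaying a committed transaction a second time leaves the disk unchanged. That property is what would make it safe to rerun recovery if the machine crashed again during replay.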

This cuts the complexity of a filesystem check by a couple of orders of magnitude. A full-blown consistency check is never necessary (in contrast to ext2fs and similar filesystems) and restoring a filesystem after a reboot is a matter of seconds at most.

The players

Today, at least four major players exist in the Linux journaling filesystem arena. They are in various stages of completion, with some of them becoming ready for use in production systems. They are ReiserFS, XFS/Linux, JFS, and ext3fs.

Each offers distinct advantages. A detailed technical comparison is available in issue 55 of Linux Gazette.

Most of the available options support dynamically extending the filesystem with a logical volume manager (such as LVM), which makes them well suited to large server installations.

ReiserFS

ReiserFS is a radical departure from the traditional Unix filesystems, which are block-structured. It will be available in the upcoming Red Hat 7.1 distribution and is already available in SuSE Linux 7.0.

Hans Reiser writes about the filesystem he designed: "In my approach, I store both files and filenames in a balanced tree, with small files, directory entries, inodes, and the tail ends of large files all being more efficiently packed as a result of relaxing the requirements of block alignment and eliminating the use of a fixed space allocation for inodes." The effect is that a wide array of common operations, such as filename resolution and file accesses, are optimized when compared to traditional filesystems such as ext2fs. Furthermore, optimizations for small files are well developed, reducing storage overheads due to fragmentation.
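To see why relaxing block alignment matters for small files, here is a rough back-of-the-envelope model in Python. It is a sketch under loudly stated assumptions: the 4096-byte block, the 24-byte per-item header, and the workload of 20,000 files of 250 bytes are hypothetical values, not ReiserFS's real parameters, and real tree leaves can never be packed perfectly.

```python
# Hypothetical comparison of the space taken by many small files when each
# file occupies whole blocks versus when file bodies are packed back to back
# into tree leaves. BLOCK_SIZE and ITEM_HEADER are illustrative assumptions,
# not ReiserFS's real on-disk parameters, and real leaves cannot be filled
# perfectly.
BLOCK_SIZE = 4096
ITEM_HEADER = 24  # assumed bookkeeping bytes per packed item


def round_up(n: int, unit: int) -> int:
    """Round n up to the next multiple of unit."""
    return (n + unit - 1) // unit * unit


def block_aligned_bytes(sizes, block=BLOCK_SIZE):
    # Every file is rounded up to a whole number of blocks on its own.
    return sum(round_up(s, block) for s in sizes)


def packed_bytes(sizes, header=ITEM_HEADER, block=BLOCK_SIZE):
    # Items share blocks; only the combined total is rounded up.
    return round_up(sum(s + header for s in sizes), block)


files = [250] * 20_000  # hypothetical workload: 20,000 files of 250 bytes
print(block_aligned_bytes(files))  # 81920000
print(packed_bytes(files))         # 5480448
```

Under these assumptions the packed layout needs roughly one fifteenth of the space; the whole difference comes from not rounding each 250-byte file up to a full block.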

ReiserFS is not yet a true journaling filesystem (although full journaling support is currently under development). Instead, buffering and preserve lists are used to track all tree modifications, which achieves a very similar effect. This reduces the risk of filesystem inconsistencies in the event of a crash and thus provides rapid recovery on restart.

Besides offering rapid restart capability after a crash and efficient storage of large numbers of small files, the developers intend to offer facilities to store objects much smaller than those that are normally saved as separate files. Future design plans include adding set-theoretic semantics, making it possible to retrieve files by specifying their attributes instead of an explicit pathname.
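The idea of retrieving files by attributes rather than by pathname can be illustrated with ordinary set intersection. The `AttributeIndex` class below is purely hypothetical and is not a ReiserFS interface; it only shows what "set-theoretic" lookup means: each attribute selects a set of files, and a query returns the files that lie in every selected set.

```python
from collections import defaultdict

# Illustration of retrieving files by attributes instead of by pathname,
# using set intersection over a hypothetical attribute index. This is not a
# ReiserFS interface; it only demonstrates the set-theoretic idea.


class AttributeIndex:
    def __init__(self):
        self._index = defaultdict(set)  # (name, value) -> set of file ids

    def tag(self, file_id, **attrs):
        """Record each attribute of file_id in the index."""
        for name, value in attrs.items():
            self._index[(name, value)].add(file_id)

    def find(self, **attrs):
        """Return the files that carry every requested attribute."""
        # .get() avoids creating empty entries for unknown attributes.
        sets = [self._index.get((n, v), set()) for n, v in attrs.items()]
        return set.intersection(*sets) if sets else set()


idx = AttributeIndex()
idx.tag("report.txt", owner="alice", kind="text")
idx.tag("photo.jpg", owner="alice", kind="image")
idx.tag("notes.txt", owner="bob", kind="text")
print(idx.find(owner="alice", kind="text"))  # {'report.txt'}
```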

ReiserFS was the first of this new breed that managed to be included in the standard Linux kernel distribution, giving it a head start in building a user community.

XFS/Linux

When SGI needed a high-performance, scalable filesystem to replace EFS in 1990, it developed XFS to handle the demands of increased disk capacity and bandwidth, and the parallelism required by new applications such as film, video, and large databases. These demands included extremely fast crash recovery, support for large filesystems, directories with large numbers of files, and good performance with both small and large files. Now SGI is contributing this technology to the Open Source community and is in the process of finalizing its port to Linux.

Technically, XFS is based on the use of B+ trees (similar to the use of balanced trees in ReiserFS) to replace the conventional linear filesystem structures. B+ trees provide an efficient way to index directory entries and manage file extents, free space, and filesystem metadata, which keeps directory listings and file accesses fast. The allocation of disk blocks to inodes is done dynamically, which means that you no longer need to create a filesystem with smaller block sizes for your mail server; your filesystem will handle this automatically for you. XFS is also a 64-bit filesystem, which theoretically allows files a few million terabytes in size, far beyond the limits of 32-bit filesystems. The ability to attach free-form metadata tags to files on an XFS volume is yet another useful feature of this filesystem.
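The benefit of extents is that a single record describes a whole run of contiguous blocks, so mapping a file offset to a disk location is a search over a few records rather than a walk over per-block pointers. The sketch below is an assumption-laden simplification: a sorted Python list searched with `bisect` stands in for the B+ tree, and the `Extent` and `ExtentMap` names are hypothetical rather than XFS structures.

```python
import bisect
from typing import NamedTuple, Optional


class Extent(NamedTuple):
    file_block: int  # first logical block of the file covered by the extent
    disk_block: int  # first physical block on disk
    length: int      # number of contiguous blocks


class ExtentMap:
    """Sorted extent list; a simplified stand-in for a B+ tree index."""

    def __init__(self, extents):
        self._extents = sorted(extents)  # tuples sort by file_block first
        self._starts = [e.file_block for e in self._extents]

    def lookup(self, file_block: int) -> Optional[int]:
        """Translate a logical file block into a disk block, or None."""
        # Find the last extent that starts at or before file_block.
        i = bisect.bisect_right(self._starts, file_block) - 1
        if i < 0:
            return None
        e = self._extents[i]
        if file_block < e.file_block + e.length:
            return e.disk_block + (file_block - e.file_block)
        return None  # the block falls in a hole (unallocated range)


emap = ExtentMap([Extent(0, 1000, 8), Extent(8, 5000, 4), Extent(20, 7000, 2)])
print(emap.lookup(3))   # 1003
print(emap.lookup(9))   # 5001
print(emap.lookup(15))  # None
```

The sorted list keeps lookups logarithmic but makes insertions linear; a B+ tree keeps both operations logarithmic, which is presumably why XFS uses one for files with very many extents.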

XFS also offers good support for multiprocessor machines. This is visible in the implementation of the page buffer subsystem, which uses an AVL tree that is kept separate from the objects to avoid locking problems and cache thrashing on larger SMP systems. Multithreaded operation has been a declared design goal of this filesystem and has been well tested on large multiprocessor IRIX systems worldwide.
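For readers unfamiliar with AVL trees, the following is a minimal Python sketch of one that maps keys (say, file offsets) to buffer objects. The nodes only hold references, so the buffer objects themselves carry no tree links, which loosely mirrors the "kept separate from the objects" idea. It is an illustration under that assumption, not XFS's page-buffer code, and it does not model any locking.

```python
from typing import Optional

# Minimal AVL tree mapping keys (for example, file offsets) to buffer
# objects. Tree nodes only hold references, so the buffer objects carry no
# tree links of their own. Illustrative sketch, not XFS's page-buffer code.


class _Node:
    __slots__ = ("key", "value", "left", "right", "height")

    def __init__(self, key, value):
        self.key = key
        self.value = value  # reference to the external object
        self.left: Optional["_Node"] = None
        self.right: Optional["_Node"] = None
        self.height = 1


def _height(n: Optional[_Node]) -> int:
    return n.height if n else 0


def _update(n: _Node) -> None:
    n.height = 1 + max(_height(n.left), _height(n.right))


def _rotate_right(y: _Node) -> _Node:
    x = y.left
    y.left, x.right = x.right, y
    _update(y)
    _update(x)
    return x


def _rotate_left(x: _Node) -> _Node:
    y = x.right
    x.right, y.left = y.left, x
    _update(x)
    _update(y)
    return y


def _rebalance(n: _Node) -> _Node:
    _update(n)
    balance = _height(n.left) - _height(n.right)
    if balance > 1:  # left-heavy
        if _height(n.left.left) < _height(n.left.right):
            n.left = _rotate_left(n.left)  # left-right case
        return _rotate_right(n)
    if balance < -1:  # right-heavy
        if _height(n.right.right) < _height(n.right.left):
            n.right = _rotate_right(n.right)  # right-left case
        return _rotate_left(n)
    return n


def insert(root: Optional[_Node], key, value) -> _Node:
    """Insert or replace key and return the new subtree root."""
    if root is None:
        return _Node(key, value)
    if key < root.key:
        root.left = insert(root.left, key, value)
    elif key > root.key:
        root.right = insert(root.right, key, value)
    else:
        root.value = value
        return root
    return _rebalance(root)


def find(root: Optional[_Node], key):
    """Return the object stored under key, or None."""
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root.value if root else None


# Sequential offsets would degrade a plain binary tree into a list.
root = None
for offset in range(0, 64 * 4096, 4096):
    root = insert(root, offset, f"buffer@{offset}")
print(find(root, 8192))  # buffer@8192
```

Rebalancing on every insert keeps the height logarithmic, so lookups stay short even when buffers are added in strictly increasing offset order.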

The Linux port is still under development, and some features have yet to be finalized. For example, loop-mounting a file containing an XFS volume does not work reliably yet. The X/Open data management API provided on IRIX is still incomplete in the Linux port, and guaranteed-rate I/O remains an IRIX exclusive so far. Even now, XFS is more than just a viable alternative on Linux. I've personally used it for a few months on my own systems and have been very happy with its performance, which is at least on a par with ext2fs. Now that an installable CD image (based on the first CD of the Red Hat 7.0 distribution) is available for download, it will be even easier to enjoy the benefits of this filesystem. The user-level tools for filesystem creation, maintenance, and resizing are more functional and easier to use than their ReiserFS counterparts, mostly because they have been around for far longer.

So why should one switch to XFS/Linux if ReiserFS will be readily available in Red Hat 7.1 and already ships with SuSE 7.0 (even though it will be a while until XFS is equally well integrated into and supported by the major distributions)? The main factors are trust, robustness, and maturity. XFS has been deployed on IRIX systems since 1994 and has been used in a wide array of mission-critical applications. It's a proven technology, while ReiserFS and ext3fs are relatively new and offer comparatively little new functionality.

JFS

IBM's JFS is a journaling filesystem used in its enterprise servers. It was designed for "high-throughput server environments, key to running intranet and other high-performance e-business file servers" according to IBM's Web site. Judging from the documentation available and the source drops, it will still be a while before the Linux port is completed and included in the standard kernel distribution.

JFS offers a sound design foundation and a proven track record on IBM servers. It takes an interesting approach to organizing free blocks: they are structured in a tree, and a special technique is used to collect and group contiguous runs of free logical blocks. Although JFS uses extents to address a file's blocks, it does not use extents to track free space. Small directories are supported in an optimized fashion (i.e., stored directly within an inode), although with different limitations than those of XFS. However, small files cannot be stored directly within an inode.
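The grouping step mentioned above, collecting contiguous free blocks into runs, can be sketched as follows. This is only an illustration under explicit assumptions: the flat bitmap (1 = free), the `free_runs` and `best_fit` helpers, and the best-fit policy are hypothetical, and JFS itself organizes this information in a tree that the sketch does not attempt to reproduce.

```python
from typing import List, Optional, Sequence, Tuple

Run = Tuple[int, int]  # (first free block, number of contiguous free blocks)


def free_runs(bitmap: Sequence[int]) -> List[Run]:
    """Collapse a free-block bitmap (1 = free, 0 = used) into runs."""
    runs: List[Run] = []
    start: Optional[int] = None
    for block, free in enumerate(bitmap):
        if free and start is None:
            start = block  # a run of free blocks begins
        elif not free and start is not None:
            runs.append((start, block - start))
            start = None
    if start is not None:  # the last run reaches the end of the map
        runs.append((start, len(bitmap) - start))
    return runs


def best_fit(runs: List[Run], count: int) -> Optional[Run]:
    """Pick the shortest run that can hold count contiguous blocks."""
    candidates = [r for r in runs if r[1] >= count]
    return min(candidates, key=lambda r: r[1], default=None)


bitmap = [0, 1, 1, 1, 0, 0, 1, 1, 0, 1]
runs = free_runs(bitmap)
print(runs)               # [(1, 3), (6, 2), (9, 1)]
print(best_fit(runs, 2))  # (6, 2)
```

Working with runs instead of individual blocks lets an allocator hand out a contiguous region in one step, which is the kind of decision a tree of free runs would make cheaper to search.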

The port of JFS is an interesting project and will benefit the Linux community. However, it seems to be farther from being usable for production systems than its competitors.

ext3fs

ext3fs is an alternative for all those who do not want to switch their filesystem, but require journaling capabilities. It is distributed in the form of a kernel patch and provides full backward compatibility. It also allows the conversion of an ext2fs partition without reformatting and a reverse conversion to ext2fs, if desired.

However, using such an add-on to ext2fs has the drawback that none of the advanced optimization techniques employed in the other journaling filesystems is available: no balanced trees, no extents for free space, etc.

My personal opinion on ext3fs is that it is about to meet its fate with the availability of more powerful journaling filesystems. A handful of successful sites, such as RPMFind, use this filesystem, but it lacks the momentum that the others have.

Conclusion

With the increasing size of hard disks, journaling filesystems are becoming important to an ever-increasing number of users. If you have ever waited for a filesystem check on a machine with an 80GB hard disk, you know what I'm talking about. Even if you do not plan to reboot your system often, journaling filesystems can save you a lot of time and trouble if you experience a power failure or a hardware glitch. With the large number of contenders striving to become the de facto standard in the journaling filesystem space on Linux, we can look forward to interesting months as these filesystems' code bases mature, are integrated into the standard kernel, and are supported in upcoming releases of the major Linux distributions.

However, keep in mind that migrating to another filesystem is not a trivial task. It usually requires backing up your data, reformatting, and restoring the data onto the newly created volume. You should thoroughly evaluate your options before making the switch.

Recent comments

10 Aug 2001 10:12 dgunia

Re: ReiserFS Availability - production systems

I think reiserfs still has some errors. I have a reproducible problem here: when I use alien to convert an RPM package of the commercial software sniff++ into a Debian package, the computer slows down more and more. I managed to shut down the system because working was no longer possible, and it could not unmount the filesystem. After a reboot I had some files in my /tmp directory that I could not delete any more (I was root and had all rights; it was definitely a filesystem problem). I tried this on my notebook and my desktop PC, had these problems both times, and could not convert the package. Both machines were running reiserfs.

Then I switched my filesystem to XFS and tried again (on my desktop PC, I tried converting it on an XFS partition), and I had no problems!


So there IS an error in reiserfs.

Right now we are trying to rescue 50GB of data that were on a reiserfs partition that got some bad sectors. reiserfsck says everything is ok, but when one tries to mount this partition, one gets an "Oops" and the mount process hangs.


And we had more of these problems on different computers. So now we will try XFS; both of my computers already use it as their root filesystem, and I have had no problems yet :)

01 Aug 2001 18:08 hodeleri

Tired of fscking? Try FFS with softupdates

Well, OK, so it doesn't completely eliminate fscking, but your disks can be brought up immediately after a crash and checked in the background.

For more information, see the site of Kirk McKusick (the author) (http://www.mckusick.com/softdep/); a paper on softupdates (http://www.usenix.org/publications/library/proceedings/usenix2000/general/seltzer.html) is also available.

(Available on *BSD)

05 May 2001 20:52 tsikora

Re: ReiserFS not yet ready for prime time, ext3fs "works for me"
No problems here. I've been running ReiserFS on Slack-current since 2.2.14 without a single glitch. In fact, I have found more problems with FFS/softupdates on FreeBSD than with Linux (actually only twice). I have it installed on a bunch of Slack production servers. I highly recommend it. It's the best thing that has happened to Linux in a while. Thanks, Hans.

12 Mar 2001 08:19 stic

reiser on large partitions is a great relief
I have been using reiser for 9 months on kernel 2.2.14 and have had no reason to complain.

Mostly my /reiser has to store lots of medium-sized files (100k), which caused a terribly long fsck on ext2. reiser solved that problem for me.

I cannot confirm the alleged storage economy for small files, though. One of my applications generates files with only a few hundred bytes of content. 20000 of them contain 5 MB (= 250 byte average), but du -sk says they occupied 80 MB of disk space (= 4 Kbyte average). Maybe I misconfigured something (but what?), or du and ReiserFS are badly coordinated on my SuSE 6.4... shrug, it's not a big drawback for me.

28 Feb 2001 14:49 jdanield

reiser
I have been using reiser since SuSE 6.4 (approx. a year now) and am very satisfied. I notice that deleting a bunch of files is far faster than with ext2.

However, there are problems booting from reiser, so I prefer keeping a small /boot partition with ext2.

Note that there is no Windows utility to read reiserfs. That is sometimes a problem, but can also be a good thing.

jdd
