Deduplicator is a simple and efficient data deduplicator that works by hard linking files that have the same content. It is ideal for reducing the size of backups. It can save and restore intermediate results, so a run can be split across several short sessions, and it lets you review changes before they are committed to disk.
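The hard-linking approach described above can be sketched in Python. This is an illustration of the general technique only, not Deduplicator's actual code: the function names `file_digest`, `plan_links`, and `apply_plan` are hypothetical, SHA-256 is an assumed choice of content hash, and all files are assumed to live on one filesystem (hard links cannot cross filesystems). Planning and applying are separate steps, mirroring the idea of reviewing changes before they reach the disk.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def plan_links(root):
    """Return (keep, duplicate) path pairs without modifying anything."""
    by_content = defaultdict(list)
    for dirpath, _dirs, names in os.walk(root):
        for name in sorted(names):
            path = os.path.join(dirpath, name)
            # Only regular files are candidates; symlinks are left alone.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            key = (os.path.getsize(path), file_digest(path))
            by_content[key].append(path)

    plan = []
    for paths in by_content.values():
        keep = paths[0]
        for dup in paths[1:]:
            # Skip pairs that already share an inode.
            if not os.path.samefile(keep, dup):
                plan.append((keep, dup))
    return plan


def apply_plan(plan):
    """Replace each duplicate with a hard link to the kept file."""
    for keep, dup in plan:
        tmp = dup + ".dedup-tmp"  # hypothetical temporary name
        os.link(keep, tmp)        # create the new link first
        os.replace(tmp, dup)      # then swap it over the duplicate
```

A caller would print or inspect the result of `plan_links(root)` and only then pass it to `apply_plan`. Note that hard-linked files share ownership, permissions, and timestamps, which this sketch does not compare.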
Gfarm is a distributed filesystem, generally used for large-scale cluster computing. It is implemented in userland and can be mounted via FUSE. It exploits file locality when choosing which data node to access, and supports Globus GSI for use over wide area networks. Users can explicitly control where file replicas are placed on Gfarm. Gfarm can be used as a storage backend for Hadoop (as an alternative to HDFS), Samba, MPI-IO, and GridFTP. Monitoring via ZABBIX and Ganglia is also supported.
fswalker is an indexer and query tool for large filesystems. On large filesystems, tools such as du can take impractically long to return results. fswalker crawls a filesystem and populates a SQLite database with information about each file. The fsq utility can then query the database and return answers much faster. fswalk is intended to be run periodically so that the sysadmin can monitor changes in the filesystem and produce reports.
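The crawl-then-query idea can be sketched in Python with the standard `sqlite3` module. This is not fswalker's schema or fsq's query interface: the table layout and the function names `index_tree` and `usage_under` are assumptions made for illustration.

```python
import os
import sqlite3


def index_tree(root, db_path=":memory:"):
    """Walk `root` and record one row per file in a SQLite database."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "path TEXT PRIMARY KEY, size INTEGER, mtime REAL, uid INTEGER)"
    )
    rows = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            rows.append((path, st.st_size, st.st_mtime, st.st_uid))
    with conn:  # commit all rows in one transaction
        conn.executemany(
            "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)", rows
        )
    return conn


def usage_under(conn, prefix):
    """Return (total_bytes, file_count) for files below `prefix`."""
    prefix = prefix.rstrip(os.sep) + os.sep
    return conn.execute(
        "SELECT COALESCE(SUM(size), 0), COUNT(*) FROM files "
        "WHERE substr(path, 1, ?) = ?",
        (len(prefix), prefix),
    ).fetchone()
```

Once the index exists, a du-style question becomes a single SQL aggregate over the table rather than a fresh walk of the filesystem. The answer is only as current as the last crawl, which is why periodic re-indexing matters.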
Hgfs is a read-only filesystem interface to Mercurial repositories. The interface gives access to the commit message, manifest, and files of each revision, and to a .tgz archive of each revision (the archives are generated as they are read). The filesystem is a front-end for the Mercurial library that comes with it. All code is written in Limbo, for Inferno.