2 projects tagged "de-duplication"
S3QL is a file system that stores all its data online. It supports Amazon S3, Google Storage, and OpenStack, and effectively provides you with a hard disk of dynamic, infinite capacity that can be accessed from any computer with Internet access. S3QL provides a standard, full-featured Unix file system that is conceptually indistinguishable from any local file system. Additional features include compression, encryption, data de-duplication, immutable trees, and snapshotting, which make it especially suitable for online backup and archiving. The design favors simplicity and elegance over performance and feature creep. Care has been taken to make the source code as readable and serviceable as possible. Solid error detection, error handling, and extensive automated test cases are provided.
Backshift is a deduplicating (variable-sized, content-based blocks) and compressing (xz or bz2) backup program. Full and incremental saves differ only in the amount of data transmitted, much as with "rsync --link-dest" but without the huge number of hard links. Large files are de-duplicated at a granularity of about 2 megabytes on average, while files smaller than roughly 2 megabytes tend to be stored as a single unique copy each.
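To make "variable-sized, content-based blocks" concrete, here is a minimal Python sketch of the general technique. It is not Backshift's actual code: the names split_chunks and ChunkStore, the gear-style rolling hash, the SHA-256 chunk keys, and all size parameters are assumptions chosen for illustration. Chunk boundaries are picked from the data itself, so inserting a few bytes near the start of a file shifts only the nearby chunks and the rest still de-duplicate.

```python
import hashlib
import lzma

# Hypothetical per-byte table for a gear-style rolling hash (derived
# deterministically so the sketch needs no random seed).
GEAR = [int.from_bytes(hashlib.sha256(bytes([i])).digest()[:4], "big")
        for i in range(256)]


def split_chunks(data, avg_bits=12, min_size=256, max_size=65536):
    """Split bytes into variable-sized chunks at content-defined boundaries.

    A boundary is declared where the low avg_bits bits of the rolling hash
    are zero, which gives chunks of very roughly 2**avg_bits bytes.
    """
    mask = (1 << avg_bits) - 1
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Older bytes are shifted out of the 32-bit hash, so the value
        # depends only on the most recent bytes of input.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= min_size and (h & mask) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


class ChunkStore:
    """Toy in-memory store: each distinct chunk is kept once, xz-compressed."""

    def __init__(self):
        self.blobs = {}  # SHA-256 hex digest -> compressed chunk

    def put(self, data):
        """Store data and return its recipe (ordered list of chunk digests)."""
        recipe = []
        for chunk in split_chunks(data):
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.blobs:  # only new content is stored
                self.blobs[digest] = lzma.compress(chunk)
            recipe.append(digest)
        return recipe

    def get(self, recipe):
        """Rebuild the original bytes from a recipe."""
        return b"".join(lzma.decompress(self.blobs[d]) for d in recipe)
```

In this sketch, saving the same data twice adds no new blobs, which is why a "full" save of mostly unchanged data costs little more than an incremental one. The small chunk sizes are only for demonstration; an average near the 2 megabytes mentioned above would correspond to something like avg_bits=21 with proportionally larger minimum and maximum sizes.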