Arkeia Network Backup is designed for organizations that require fast, easy-to-use, and affordable data protection. It backs up critical data to disk, tape, and cloud storage. Arkeia protects all major virtual platforms, including VMware, Hyper-V, and XenServer, as well as more than 200 physical platforms, including Windows, Mac OS X, Linux, NetWare, most UNIX flavors, and the BSDs. The company’s source-side Progressive Deduplication technology reduces data volumes, which helps users get better performance at a lower cost. It also accelerates replication of on-premises backups to private or public clouds.
Attic is a deduplicating backup program. Its main goal is to provide an efficient and secure way to back up data. Because deduplication stores only the data that has actually changed, Attic is well suited to daily backups. Main features: space-efficient storage, optional data encryption, and off-site backups.
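To illustrate why a deduplicating backup stores only changes, here is a minimal sketch of a content-addressed chunk store. It is not Attic's actual implementation: the `ChunkStore` class, the fixed 4-byte chunk size, and the in-memory dictionary are all assumptions made for illustration.

```python
import hashlib


class ChunkStore:
    """Toy content-addressed store: each unique chunk is kept only once."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size  # hypothetical, tiny on purpose
        self.chunks = {}              # SHA-256 hex digest -> chunk bytes

    def backup(self, data: bytes) -> list:
        """Store data and return its 'recipe' (ordered list of chunk digests)."""
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # no-op if already stored
            recipe.append(digest)
        return recipe

    def restore(self, recipe) -> bytes:
        """Rebuild the original data from a recipe."""
        return b"".join(self.chunks[d] for d in recipe)


store = ChunkStore(chunk_size=4)
monday = store.backup(b"AAAABBBBCCCC")
chunks_after_monday = len(store.chunks)    # 3 unique chunks
tuesday = store.backup(b"AAAABBBBDDDD")    # only the last chunk changed
chunks_after_tuesday = len(store.chunks)   # 4: one new chunk was added
```

The second backup adds a single chunk to the store even though it describes a full 12-byte file, and both backups can still be restored independently from their recipes.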
Pcompress is a utility that performs compression, decompression, and deduplication in parallel by splitting input data into chunks. It has a modular structure and supports multiple algorithms such as LZMA, Bzip2, PPMD, and LZ4, with KECCAK, BLAKE2, or SHA-256/512 chunk checksums. SSE optimizations for the bundled LZMA are included. It also implements chunk-level Content-Aware Deduplication and Delta Compression features based on a Polynomial Fingerprinting scheme. It has low metadata overhead and overlaps I/O and compression to achieve maximum parallelism. It supports AES encryption and uses Scrypt from Tarsnap to generate unique per-session keys from passwords. It can work in pipe mode, reading from stdin and writing to stdout. It also provides adaptive compression modes in which a suitable algorithm is chosen for each chunk based on heuristics.
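The idea behind content-aware chunking with a polynomial fingerprint can be sketched as follows. This is a generic Rabin-Karp style rolling hash, not Pcompress's actual code; the window size, base, modulus, mask, and the function name `split` are all assumptions chosen for illustration.

```python
import random

WINDOW = 16            # bytes covered by the rolling hash (assumed)
BASE = 257             # polynomial base (assumed)
MOD = (1 << 61) - 1    # large prime modulus (assumed)
MASK = 0x3F            # cut when the low 6 bits are zero, about 1 in 64 positions


def split(data: bytes) -> list:
    """Split data at positions chosen by the content, not by fixed offsets."""
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD  # drop the oldest byte
        h = (h * BASE + byte) % MOD                 # add the newest byte
        # Only cut once the window is full, so the decision depends
        # purely on the last WINDOW bytes.
        if i >= WINDOW - 1 and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


rng = random.Random(0)
original = bytes(rng.randrange(256) for _ in range(4096))
shifted = b"!" + original            # insert one byte at the front

orig_chunks = split(original)
shifted_chunks = split(shifted)
```

Because every cut depends only on the preceding `WINDOW` bytes, prepending a byte disturbs only the first chunk; every later chunk of `original` reappears unchanged in `shifted`, which is what lets a deduplicator recognize it. With fixed-size chunks, the same one-byte insertion would shift every chunk boundary.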
Duke is a fast and flexible record linkage engine. It does not use the traditional blocking (sort by key) approach, but instead relies on Lucene. This makes it fast: it can process 1,000,000 records in about 10 minutes. Duke can be run from the command line, and it also has an API that makes it easy to build incremental linking applications. It can read data from CSV, JDBC, SPARQL, and NTriples sources, and it supports a number of string comparators and string normalizers.
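The general pattern of index-based, incremental record linkage can be sketched as below. This is not Duke's API (Duke is a Java library built on Lucene); the `Linker` class, the token-based inverted index standing in for the search index, the Levenshtein comparator, and the 0.8 threshold are all assumptions for illustration.

```python
from collections import defaultdict


def normalize(s: str) -> str:
    """String normalizer: lowercase and collapse whitespace."""
    return " ".join(s.lower().split())


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """String comparator: 1.0 means identical, 0.0 means entirely different."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


class Linker:
    """Hypothetical incremental linker backed by a token inverted index."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.records = {}              # record id -> normalized name
        self.index = defaultdict(set)  # token -> ids of records containing it

    def add(self, rec_id, name):
        """Return ids of already-indexed records that match, then index this one."""
        name = normalize(name)
        candidates = set()
        for token in name.split():
            candidates |= self.index.get(token, set())
        matches = sorted(c for c in candidates
                         if similarity(name, self.records[c]) >= self.threshold)
        self.records[rec_id] = name
        for token in name.split():
            self.index[token].add(rec_id)
        return matches


linker = Linker()
first = linker.add(1, "Acme Corporation Ltd")
second = linker.add(2, "ACME  Corporation Ltd.")  # links to record 1
third = linker.add(3, "Globex Inc")               # shares no token, no candidates
fourth = linker.add(4, "Acme Rockets")            # candidates found, none similar enough
```

Only records that share at least one token with the new record are compared, so the expensive comparator runs on a small candidate set instead of on every pair of records.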
Lessfs is a high-performance inline data deduplicating file system for Linux. Lessfs complies with the POSIX standard and is very useful for backup purposes as well as for storing virtual machine images. Although lessfs is implemented in user space with FUSE, it offers decent performance and can handle data rates of up to 350 MB/s. It supports file system encryption.
image-deduplication-tool is a script that scans the specified paths and calculates DCT hashes of all the images found there. It compares the hashes to find the closest-looking image pairs despite various alterations (such as cropping, rotation, gamma/color correction, and noise), and can optionally present them in the feh image viewer so that the operator can easily compare them and remove one of the versions. It uses libpHash to produce and compare perceptual hashes.
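The comparison step can be sketched as a Hamming-distance search over 64-bit hashes. This is an illustration only, not the tool's code: the hash values, file names, the `closest_pairs` helper, and the distance threshold of 8 are all made up, and real hashes would come from libpHash.

```python
from itertools import combinations


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def closest_pairs(hashes: dict, max_distance: int = 8) -> list:
    """Return (distance, path1, path2) tuples for similar images, closest first."""
    pairs = []
    for (p1, h1), (p2, h2) in combinations(sorted(hashes.items()), 2):
        d = hamming(h1, h2)
        if d <= max_distance:
            pairs.append((d, p1, p2))
    return sorted(pairs)


# Hypothetical 64-bit perceptual hashes.
hashes = {
    "cat.jpg": 0xF0F0F0F0F0F0F0F0,
    "cat_cropped.jpg": 0xF0F0F0F0F0F0F0F1,  # differs from cat.jpg in 1 bit
    "dog.jpg": 0x0F0F0F0F0F0F0F0F,          # differs from cat.jpg in all 64 bits
}
result = closest_pairs(hashes)
# result == [(1, "cat.jpg", "cat_cropped.jpg")]
```

Comparing every pair is quadratic in the number of images, which is fine for a sketch but would need an index structure for large collections.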