4 projects tagged "Deduplication"

BlackHole (updated 26 Sep 2011; no download; website available)

Pop 92.81
Vit 1.43

BlackHole is a data-deduplicating network block device that also supports mirroring, snapshots, and multiple LUNs sharing the same data store. It is filesystem-agnostic and has been tested with ext2/3/4, NTFS, ReiserFS, and the Oracle Cluster File System (OCFS2). It supports encryption, compression, and multiple storage backends. The hashing scheme is user-configurable. The program exports an NBD device that can be mounted on Linux and GNU/Hurd.
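To make the idea of several LUNs sharing one deduplicated data store concrete, here is a minimal Python sketch. It is not BlackHole's implementation: the `DedupStore` class, its method names, and the fixed block size are all hypothetical, and the sketch ignores mirroring, snapshots, encryption, and the NBD protocol entirely. It only shows blocks being stored once under a digest computed with a configurable hash.

```python
import hashlib


class DedupStore:
    """Toy content-addressed block store shared by several virtual devices.

    Hypothetical illustration only; names and layout are assumptions.
    """

    def __init__(self, block_size=4096, hash_name="sha256"):
        self.block_size = block_size
        self.hash_name = hash_name   # configurable hashing scheme
        self.blocks = {}             # digest -> block bytes (stored once)
        self.luns = {}               # LUN name -> {block index: digest}

    def write(self, lun, index, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        digest = hashlib.new(self.hash_name, data).digest()
        # Identical blocks map to the same digest and are kept only once.
        self.blocks.setdefault(digest, data)
        self.luns.setdefault(lun, {})[index] = digest

    def read(self, lun, index):
        digest = self.luns.get(lun, {}).get(index)
        if digest is None:
            return bytes(self.block_size)   # unwritten blocks read as zeros
        return self.blocks[digest]


store = DedupStore(block_size=8)
store.write("lun0", 0, b"AAAAAAAA")
store.write("lun0", 1, b"AAAAAAAA")
store.write("lun1", 5, b"AAAAAAAA")
store.write("lun1", 6, b"BBBBBBBB")
print(len(store.blocks))  # 2 unique blocks back 4 logical writes
```

A real block device would also need reference counting so that blocks can be freed once no LUN points at them; the sketch leaves that out.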

Duke (updated 16 Feb 2014; no download; no website)

Pop 236.45
Vit 12.29

Duke is a fast and flexible record linkage engine. Instead of the traditional blocking (sort-by-key) approach, it relies on Lucene, which lets it process 1,000,000 records in about 10 minutes. Duke can be run from the command line, and it also has an API that makes it easy to build incremental linking applications. It reads data from CSV, JDBC, SPARQL, and NTriples sources, and supports a number of string comparators and string normalizers.
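The general shape of index-based, incremental record linkage can be sketched in a few lines of Python. This is not Duke's API or algorithm: the in-memory token index merely stands in for a search index such as Lucene, and `normalize`, `link_records`, and the 0.85 threshold are hypothetical names and values chosen for illustration. The standard library's `difflib.SequenceMatcher` plays the role of a string comparator.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def normalize(text):
    """A simple string normalizer: lowercase and collapse whitespace."""
    return " ".join(text.lower().split())


def link_records(records, threshold=0.85):
    """Link records incrementally.

    records: dict mapping record id -> name string.
    Returns a list of (earlier_id, later_id, score) tuples.
    """
    index = defaultdict(set)   # token -> ids of records already seen
    links = []
    for rec_id, raw in records.items():
        name = normalize(raw)
        tokens = set(name.split())
        # Candidate lookup: only records sharing a token are compared.
        candidates = set()
        for token in tokens:
            candidates |= index[token]
        for other in sorted(candidates):
            other_name = normalize(records[other])
            score = SequenceMatcher(None, name, other_name).ratio()
            if score >= threshold:
                links.append((other, rec_id, round(score, 2)))
        # Index the new record so later arrivals can match against it.
        for token in tokens:
            index[token].add(rec_id)
    return links


people = {
    1: "Acme Corporation Ltd",
    2: "ACME   Corporation Ltd.",
    3: "Globex Inc",
}
print(link_records(people))
```

Because each record is compared only against candidates returned by the index, the number of comparisons stays well below the all-pairs count when most records share no tokens.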

Pcompress (updated 02 Mar 2014; download available; no website)

Pop 368.74
Vit 16.56

Pcompress is an archiver that performs compression/decompression and deduplication in parallel by splitting input data into chunks. It has a modular structure and supports multiple algorithms, such as LZMA, Bzip2, PPMD, and LZ4, with Keccak/BLAKE2/SHA-256/512 chunk checksums. SSE optimizations for the bundled LZMA are included. It also implements chunk-level Content-Aware Deduplication and Delta Compression based on a Polynomial Fingerprinting scheme. It has low metadata overhead and overlaps I/O and compression to maximize parallelism. It supports AES encryption and uses Scrypt from Tarsnap to generate per-session unique keys from passwords. It can work in pipe mode, reading from stdin and writing to stdout. It also provides adaptive compression modes in which a suitable algorithm is chosen per chunk based on heuristics.
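As a rough illustration of chunk-level deduplication driven by a polynomial fingerprint, the Python sketch below cuts the input wherever a rolling polynomial hash over the last few bytes matches a bit pattern, then stores each distinct chunk once under its SHA-256 digest. It is not Pcompress's actual scheme: the constants `BASE`, `MOD`, and `WINDOW`, the boundary mask, the size limits, and the function names are all assumptions made for the example, and delta compression is not covered.

```python
import hashlib

BASE = 257               # polynomial base (assumed value)
MOD = (1 << 61) - 1      # large prime modulus (assumed value)
WINDOW = 16              # bytes covered by the rolling fingerprint


def chunk_boundaries(data, mask=0x3F, min_size=32, max_size=1024):
    """Yield (start, end) offsets of content-defined chunks."""
    top = pow(BASE, WINDOW - 1, MOD)   # weight of the byte leaving the window
    start = 0
    h = 0
    for i, byte in enumerate(data):
        pos = i - start                # offset within the current chunk
        if pos >= WINDOW:
            # Slide the window: drop the oldest byte's contribution.
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + byte) % MOD
        size = pos + 1
        if (size >= min_size and (h & mask) == mask) or size >= max_size:
            yield start, i + 1
            start = i + 1
            h = 0
    if start < len(data):
        yield start, len(data)         # trailing partial chunk


def dedup(data, **kwargs):
    """Return (store, recipe): unique chunks by digest, and the digest order."""
    store = {}
    recipe = []
    for s, e in chunk_boundaries(data, **kwargs):
        chunk = data[s:e]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)
        recipe.append(digest)
    return store, recipe


payload = b"0123456789abcdef" * 4096
store, recipe = dedup(payload)
print(len(recipe), "chunks,", len(store), "stored")
restored = b"".join(store[d] for d in recipe)
print(restored == payload)
```

Because boundaries depend on the bytes inside the window rather than on absolute offsets, an insertion near the start of a file tends to shift only nearby chunk boundaries, so later chunks can still be matched against the store.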

image-deduplication-tool (updated 12 Feb 2012; no download; no website)

Pop 40.66
Vit 28.26

image-deduplication-tool is a script that scans the specified paths and calculates the DCT hashes of all the images it finds. It compares the hashes to find the closest-looking image pairs despite various alterations (such as cropping, rotation, gamma/color correction, and noise), optionally presenting them in the feh image viewer so the operator can easily compare them and remove one of the versions. It uses libpHash to produce and compare perceptual hashes.
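The hash-and-compare step can be sketched in pure Python as follows. This is a simplified DCT hash in the spirit of perceptual hashing, not libpHash's exact algorithm or the tool's code: the function names, the choice of the 8x8 low-frequency block, and the distance threshold are assumptions, and image loading and resizing are left out (the input is already a small square grid of grayscale values).

```python
import math
from itertools import combinations


def dct_hash(pixels, low=8):
    """Hash a square grayscale grid (list of lists) via low-frequency DCT terms."""
    n = len(pixels)
    coeffs = []
    for u in range(low):
        for v in range(low):
            total = 0.0
            for x in range(n):
                cx = math.cos((2 * x + 1) * u * math.pi / (2 * n))
                for y in range(n):
                    cy = math.cos((2 * y + 1) * v * math.pi / (2 * n))
                    total += pixels[x][y] * cx * cy
            coeffs.append(total)
    ac = coeffs[1:]                       # skip the DC (average brightness) term
    median = sorted(ac)[len(ac) // 2]
    bits = 0
    for c in ac:                          # one bit per coefficient: above median or not
        bits = (bits << 1) | (1 if c > median else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def closest_pairs(hashes, max_distance=10):
    """hashes: dict name -> hash. Returns sorted (distance, name_a, name_b)."""
    pairs = []
    for (a, ha), (b, hb) in combinations(sorted(hashes.items()), 2):
        d = hamming(ha, hb)
        if d <= max_distance:
            pairs.append((d, a, b))
    return sorted(pairs)


# Synthetic 16x16 "images"; real use would load and downscale image files.
base = [[(x * 7 + y * 13) % 256 for y in range(16)] for x in range(16)]
noisy = [[min(255, p + (x + y) % 3) for y, p in enumerate(row)]
         for x, row in enumerate(base)]
hashes = {"a.png": dct_hash(base), "b.png": dct_hash(noisy)}
print(hamming(hashes["a.png"], hashes["b.png"]))
```

A small Hamming distance suggests visually similar images; the pairs returned by `closest_pairs` would then be shown to the operator for a final decision.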
