6 projects tagged "duplicate files"
Freedup finds and eliminates duplicate files by linking them, and thus reduces the amount of used disk space within one or more file systems. By default, hardlinks are used on a single device, symbolic links when the devices differ. A set of options allows you to modify the methods of file comparison, the hash functions, the linking behavior, and the reporting style. You may use batch or interactive mode. Freedup usually only considers identical files, but when comparing audio or graphics files, you may elect to ignore the tags.
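The linking approach described above can be illustrated with a minimal Python sketch. It is not Freedup's implementation: the function names `file_digest` and `link_duplicates`, the choice of SHA-256, and the temporary-file suffix are all assumptions made for illustration. It handles only the same-device hardlink case and skips files on different devices rather than creating symbolic links.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def link_duplicates(root):
    """Replace duplicate regular files under root with hard links.

    Returns the number of files that were replaced by a link.
    Hypothetical helper, not part of Freedup.
    """
    groups = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            groups[(os.path.getsize(path), file_digest(path))].append(path)

    linked = 0
    for paths in groups.values():
        keeper = paths[0]
        for dup in paths[1:]:
            if os.path.samefile(keeper, dup):
                continue  # already the same inode
            if os.stat(keeper).st_dev != os.stat(dup).st_dev:
                continue  # hard links cannot cross devices
            # Link under a temporary name first, then swap it into place,
            # so the duplicate is never missing if the link step fails.
            tmp = dup + ".dedup-tmp"
            os.link(keeper, tmp)
            os.replace(tmp, dup)
            linked += 1
    return linked
```

Calling `link_duplicates("/some/dir")` on a directory with two identical files leaves both names in place, pointing at one shared inode.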
HashCatalog is a program that can find duplicate files within one or more folders, or between two lists of one or more folders. It supports a regular expression mask to select the files to be evaluated for duplicates, and it always recurses into directories in search of files. HashCatalog can also create an XML file containing a listing of files, along with enough information to determine duplicates. This XML catalog makes it possible to search for duplicates against removable media: the catalog can be added to the list of search locations, alongside individual files and folders. The hash methods supported are: MD5, SHA, SHA1, SHA256, SHA384, and SHA512.
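As a rough illustration of the catalog idea, the following Python sketch builds an XML listing with a size and hash per file and then compares two such listings. The element and attribute names (`catalog`, `file`, `path`, `size`, `hash`) and the function names are assumptions made for this example; they are not HashCatalog's actual XML format or interface.

```python
import hashlib
import os
import re
import xml.etree.ElementTree as ET


def hash_file(path, algorithm="sha256"):
    """Hash a file in chunks with any algorithm name hashlib accepts."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def build_catalog(roots, mask=".*", algorithm="sha256"):
    """Recurse into each root and record files whose name matches mask."""
    pattern = re.compile(mask)
    catalog = ET.Element("catalog", algorithm=algorithm)  # hypothetical schema
    for root in roots:
        for dirpath, _dirs, files in os.walk(root):
            for name in sorted(files):
                if not pattern.search(name):
                    continue
                path = os.path.join(dirpath, name)
                ET.SubElement(
                    catalog, "file",
                    path=path,
                    size=str(os.path.getsize(path)),
                    hash=hash_file(path, algorithm),
                )
    return catalog


def duplicates_between(catalog_a, catalog_b):
    """Return (path_in_a, path_in_b) pairs with equal size and hash."""
    index = {}
    for entry in catalog_a.iter("file"):
        key = (entry.get("size"), entry.get("hash"))
        index.setdefault(key, []).append(entry.get("path"))
    pairs = []
    for entry in catalog_b.iter("file"):
        key = (entry.get("size"), entry.get("hash"))
        for other in index.get(key, []):
            pairs.append((other, entry.get("path")))
    return pairs
```

A catalog built this way can be saved with `ET.ElementTree(catalog).write("catalog.xml")` and loaded later with `ET.parse("catalog.xml").getroot()`, so the original media does not need to be present when the comparison runs.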
image-deduplication-tool is a script designed to scan specified paths and calculate the DCT hashes of all the images there. It compares the hashes to find the closest-looking image pairs, despite various alterations (such as crop, rotation, gamma/color correction, noise, etc.), optionally presenting them in the feh image viewer so that the operator can easily compare them and remove one of the versions. It uses libpHash to produce and compare perceptual hashes.
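The comparison step can be sketched in Python, assuming each image has already been reduced to a 64-bit perceptual hash and that similarity is measured by Hamming distance (the number of differing bits). The sketch does not compute DCT hashes and does not call libpHash; the function names, the threshold of 10 bits, and the hash values in the usage note are made up for illustration.

```python
from itertools import combinations


def hamming_distance(a, b):
    """Count the bits that differ between two integer hashes."""
    return bin(a ^ b).count("1")


def similar_pairs(hashes, max_distance=10):
    """Return (distance, path_a, path_b) tuples, closest pairs first.

    hashes maps an image path to its precomputed integer hash.
    max_distance is an assumed threshold, not a value from the tool.
    """
    pairs = []
    for (path_a, hash_a), (path_b, hash_b) in combinations(sorted(hashes.items()), 2):
        distance = hamming_distance(hash_a, hash_b)
        if distance <= max_distance:
            pairs.append((distance, path_a, path_b))
    return sorted(pairs)
```

For example, with the made-up values `{"a.jpg": 0xFFFF0000FFFF0000, "b.jpg": 0xFFFF0000FFFF0001, "c.jpg": 0x0000FFFF0000FFFF}`, only `a.jpg` and `b.jpg` are reported, since they differ in a single bit. Comparing every pair is quadratic in the number of images, which is acceptable for a sketch but would need an index for large collections.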