fdupe is a small Perl script that recursively scans directories and finds duplicate files. It compares file contents, not file names. It does not use any Perl modules and is very fast.
| Tags | Utilities tools |
|---|---|
| Licenses | Attribution-NonCommercial Creative Commons |
| Operating Systems | Mac OS X, Unix, Linux, BSD, Windows |
| Implementation | Perl |
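
The description above says duplicates are found by comparing contents rather than names. The following is a minimal Python sketch of that general idea, not fdupe's actual code: it assumes a common two-step approach (group by file size, then by SHA-256 digest) and treats equal digests as equal contents. The function names `file_digest` and `find_duplicates` are hypothetical and do not come from the script.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 16):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return sorted groups of paths under root whose contents hash alike."""
    # Step 1: bucket regular files by size; symbolic links are ignored.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Step 2: hash only the files that share a size with another file.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)
```

Grouping by size first means most files are never read at all, which is usually where a tool of this kind saves its time. A stricter variant could confirm each group with a byte-for-byte comparison instead of trusting the digest.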
Recent releases


Release Notes: A list of files can now be accepted from stdin.


Release Notes: This release skips hard links. This significantly shortens the runtime on file systems containing many hard-linked files.


No changes have been submitted for this release.
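
The release notes above mention two behaviours: accepting a list of files from stdin and skipping hard links. A rough Python sketch of how such behaviour could be implemented follows. It is an illustration under assumptions, not the script's own code: it assumes one path per input line and identifies hard links by their shared device and inode numbers. `read_paths` and `drop_hard_links` are hypothetical names.

```python
import os


def read_paths(stream):
    """Yield one path per non-empty line from a text stream."""
    for line in stream:
        path = line.rstrip("\n")
        if path:
            yield path


def drop_hard_links(paths):
    """Keep only the first path seen for each (device, inode) pair."""
    seen = set()
    kept = []
    for path in paths:
        st = os.stat(path)
        key = (st.st_dev, st.st_ino)
        if key in seen:
            continue  # another name for a file that is already queued
        seen.add(key)
        kept.append(path)
    return kept
```

Passing `sys.stdin` to `read_paths` and feeding the result to `drop_hard_links` would let the output of a command such as `find ... -type f` drive the scan. Names that point to the same inode are the same file, so comparing their contents is redundant work.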
Recent comments
20 Oct 2011 23:32
sorry, wrong numbers :\
freundt@gonzo:pts/6:~> find /data/data-source/ice/archive/2011 -type f | wc -l
212374
freundt@gonzo:pts/6:~> du -sm /data/data-source/ice/archive/2011
20158
I moved a .git directory into it earlier and forgot to move it out again before counting. Still, the timings were taken on the directory as it is now.
20 Oct 2011 23:26
It's an ftp mirror that is known to have tens of thousands of duplicates.
freundt@gonzo:pts/6:~> find /data/data-source/ice/archive/2011 -type f | wc -l
866949
freundt@gonzo:pts/6:~> du -sm /data/data-source/ice/archive/2011
49114
IMPORTANT: The directory is on nfs4 and there are NO subdirectories.
20 Oct 2011 21:53
Thanks for the comment, hropatyr.
Unfortunately, I can't reproduce your results.
Could anybody run some reproducible benchmarks?
20 Oct 2011 19:01
Disappointing, way too slow for practical purposes :(
I recommend fdupes (premium.caribe.net/~ad....
fdupes archive/2011 27.50s user 38.93s system 23% cpu 4:41.45 total
fdupe.pl archive/2011 259.65s user 289.21s system 49% cpu 18:31.07 total