Wednesday, August 12, 2015
fdupes and equivs3e
I recently saw an announcement of fdupes on linuxtoday.
Upon investigating it a bit, I noticed that it uses almost exactly the same algorithm as my equivs3e program.
Both are intended to find duplicate files in a filesystem, quickly.
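For readers unfamiliar with the general idea, here is a minimal sketch of one common way to find duplicate files: group the files by size first, then hash only the files that share a size. This is an illustration only, not the actual code of equivs3e or fdupes. The function names (file_digest, find_duplicates) and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Return a list of groups (sorted lists) of paths with identical digests."""
    # Cheap first pass: files of different sizes cannot be duplicates.
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate
        # Expensive second pass: hash only the files that share a size.
        by_digest = defaultdict(list)
        for path in same_size:
            by_digest[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_digest.values() if len(g) > 1)
    return groups
```

The size check is what keeps this kind of approach quick: most files are ruled out without ever being read. A more careful tool might also compare candidate files byte by byte rather than trusting the hash alone.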
The main difference seems to be that fdupes is written in C, and equivs3e is written in Python. Also, fdupes accepts a directory as a command-line argument (like tar), while equivs3e expects the output of "find /directory -type f -print0" to be piped into it (like cpio).
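As a rough illustration of the piped-in style, here is a minimal sketch of how a Python program might read the NUL-terminated filenames that "find ... -print0" produces. It is not taken from equivs3e; the function names (read_null_terminated, main) are hypothetical.

```python
import sys


def read_null_terminated(stream):
    """Return the NUL-terminated filenames found in a binary stream, as bytes."""
    data = stream.read()
    # Filenames may contain spaces or newlines, but never a NUL byte,
    # which is why -print0 output is safe to split on b"\0".
    return [name for name in data.split(b"\0") if name]


def main():
    # Intended use: find /directory -type f -print0 | python3 this_script.py
    for name in read_null_terminated(sys.stdin.buffer):
        sys.stdout.buffer.write(name + b"\n")
```

Calling main() at the bottom of a script would echo each filename received on standard input. Keeping the names as bytes avoids decoding problems with filenames that are not valid in the current locale.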
However, a quick performance comparison showed that fdupes is quite a bit faster on large collections of small files, while equivs3e is quite a bit faster on collections of large files. I really don't know why the Python code sometimes outperforms the C code, given that they're so similar internally.
I've added a "related work" section on my equivs3e page that compares equivs3e and fdupes.
Anyway, I hope people find one or both of these programs useful.