Wednesday, August 12, 2015

fdupes and equivs3e


I recently saw an announcement of fdupes on linuxtoday.

Upon investigating it a bit, I noticed that it uses almost exactly the same algorithm as my equivs3e program.

Both are intended to find duplicate files in a filesystem, quickly.
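
For readers who haven't seen this kind of tool, here is a minimal sketch of one common way to find duplicates quickly: group files by size first, then hash only the files that share a size. This is an illustration, not the actual code of either program; the names file_digest and find_duplicates and the choice of SHA-256 are my own assumptions.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Group paths with identical content: by size first, then by digest.

    Hypothetical helper for illustration; not taken from fdupes or equivs3e.
    """
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_digest = defaultdict(list)
        for path in same_size:
            by_digest[file_digest(path)].append(path)
        # Keep only digests shared by two or more files.
        groups.extend(sorted(g) for g in by_digest.values() if len(g) > 1)
    return groups
```

The size check is cheap (one stat per file) and usually rules out most files before any data is read. A careful tool would probably also confirm matches byte for byte rather than trusting the digest alone.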

The main difference seems to be that fdupes is written in C, and equivs3e is written in Python. Also, fdupes accepts a directory in argv (like tar), while equivs3e expects to have the output of "find /directory -type f -print0" piped into it (like cpio).
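
To make the cpio-style interface concrete, here is a small sketch of how a Python program can split the null-separated output of "find ... -print0". It is not equivs3e's actual code, and split_null_separated is a hypothetical name.

```python
import sys


def split_null_separated(data):
    """Split the bytes emitted by `find ... -print0` into a list of paths."""
    # Every filename is followed by a NUL byte, so the final piece is empty;
    # filtering out empty pieces drops it.
    return [name for name in data.split(b"\0") if name]


# In a real script the input would come from stdin as bytes, for example:
#     filenames = split_null_separated(sys.stdin.buffer.read())
```

Null separation matters because a filename can contain spaces and even newlines, but never a NUL byte, so this form of input is unambiguous.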

A quick performance comparison showed that fdupes is quite a bit faster on large collections of small files, while equivs3e is quite a bit faster on collections of large files. I really don't know why the Python code sometimes outperforms the C code, given that the two are so similar internally.

I've added a "related work" section on my equivs3e page that compares equivs3e and fdupes.

Anyway, I hope people find one or both of these programs useful.
