Wednesday, August 12, 2015
I recently completed another sorted dictionary comparison, and thought I'd share the results.
This time I've dropped the different get/set mixes; every test now uses 95% set and 5% get.
Also, I added sorteddict, which proved to be an excellent performer.
And I added standard deviation to the graph and collapsible detail.
The latest comparison can be found here.
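For readers who want a feel for what a 95% set / 5% get workload with a standard deviation looks like, here is a minimal sketch. It is not the benchmark code behind the comparison: the function names, operation count, and repetition count are made up for illustration, and a plain dict stands in for a sorted dictionary.

```python
import random
import statistics
import time


def run_workload(mapping, operations=10_000, set_fraction=0.95, seed=0):
    """Apply a mixed set/get workload to a dict-like object; return elapsed seconds."""
    rng = random.Random(seed)
    start = time.perf_counter()
    for _ in range(operations):
        key = rng.randrange(operations)
        if rng.random() < set_fraction:
            mapping[key] = key   # roughly 95% of operations are sets
        else:
            mapping.get(key)     # the remainder are gets
    return time.perf_counter() - start


def summarize(factory, repetitions=5):
    """Time several runs and return (mean, standard deviation) in seconds."""
    timings = [run_workload(factory(), seed=rep) for rep in range(repetitions)]
    return statistics.mean(timings), statistics.stdev(timings)


if __name__ == "__main__":
    # dict is only a placeholder; any class with item assignment and .get() works.
    mean, deviation = summarize(dict)
    print(f"dict: mean {mean:.4f}s, stdev {deviation:.4f}s")
```

Passing a sorted dictionary class in place of dict would time that implementation under the same mix, assuming it supports item assignment and .get().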
I recently saw an announcement of fdupes on linuxtoday.
Upon investigating it a bit, I noticed that it uses almost exactly the same algorithm as my equivs3e program.
Both are intended to find duplicate files in a filesystem, quickly.
The main difference seems to be that fdupes is written in C, and equivs3e is written in Python. Also, fdupes accepts a directory in argv (like tar), while equivs3e expects to have the output of "find /directory -type f -print0" piped into it (like cpio).
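To illustrate the piped-in style of interface, here is a rough sketch of a duplicate finder that reads NUL-terminated filenames on stdin. It is not the equivs3e or fdupes source. The size-then-hash grouping is a common approach that I'm assuming here for the sake of example, and all of the function names are hypothetical.

```python
import hashlib
import os
import sys
from collections import defaultdict


def read_nul_separated(stream):
    """Return filenames from a binary stream of NUL-terminated names (find -print0)."""
    return [name for name in stream.read().split(b"\0") if name]


def file_digest(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are never held in memory whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Group paths by size, then by content hash; return groups with 2+ members."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[file_digest(path)].append(path)
        groups.extend(group for group in by_hash.values() if len(group) > 1)
    return groups


if __name__ == "__main__":
    # Filenames are kept as bytes so unusual names survive the round trip.
    for group in find_duplicates(read_nul_separated(sys.stdin.buffer)):
        print(" ".join(os.fsdecode(name) for name in group))
```

Saved under a made-up name such as dupes_sketch.py, it would be run as: find /directory -type f -print0 | python3 dupes_sketch.py. The NUL separator is what lets filenames containing spaces or newlines pass through the pipe intact.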
However, a quick performance comparison showed that fdupes is quite a bit faster on large collections of small files, while equivs3e is quite a bit faster on collections of large files. I really don't know why the Python code sometimes outperforms the C code, given that they're so similar internally.
I've added a "related work" section on my equivs3e page that compares equivs3e and fdupes.
Anyway, I hope people find one or both of these programs useful.