Wednesday, August 12, 2015

Latest Python sorted dictionary comparison


I recently completed another sorted dictionary comparison, and thought I'd share the results.

This time I've eliminated the different mixes of get/set.  It's all 95% set and 5% get now.
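For anyone unsure what a 95% set / 5% get mix means in practice, here is a minimal sketch of such a workload. It is not the harness used for the comparison: the function name `run_mixed_workload`, the random integer keys, and the fixed seed are all assumptions made for illustration. It should work with any dict-like mapping.

```python
import random


def run_mixed_workload(mapping, operations, set_fraction=0.95, seed=0):
    """Apply roughly set_fraction sets and the rest gets to mapping.

    Hypothetical helper for illustration only; returns (sets, gets).
    """
    rng = random.Random(seed)  # fixed seed so every mapping sees the same keys
    sets = gets = 0
    for _ in range(operations):
        key = rng.randrange(operations)
        if rng.random() < set_fraction:
            mapping[key] = key  # the "set" side of the mix
            sets += 1
        else:
            try:
                mapping[key]  # the "get" side; missing keys are expected
            except KeyError:
                pass
            gets += 1
    return sets, gets


if __name__ == '__main__':
    # A plain dict stands in for a sorted dictionary implementation here.
    print(run_mixed_workload({}, 100000))
```

Using the same seed for each implementation keeps the key sequence identical, so differences in timing come from the data structure rather than from the input.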

Also, I added sorteddict, which proved to be an excellent performer.

I also added standard deviation to the graph, along with collapsible detail.
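A standard deviation needs several timings of the same run. A minimal sketch of collecting them with the standard library follows; the helper name `time_repeatedly` and the default of five repetitions are assumptions, not how the comparison itself was measured.

```python
import statistics
import time


def time_repeatedly(func, repetitions=5):
    """Call func several times; return (mean, sample standard deviation) in seconds.

    Hypothetical helper for illustration; repetitions must be at least 2,
    because statistics.stdev needs two or more data points.
    """
    durations = []
    for _ in range(repetitions):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations), statistics.stdev(durations)


if __name__ == '__main__':
    mean, deviation = time_repeatedly(lambda: sorted(range(100000)))
    print('mean %.6fs, stdev %.6fs' % (mean, deviation))
```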

The latest comparison can be found here.

HTH someone.

fdupes and equivs3e


I recently saw an announcement of fdupes on linuxtoday.

Upon investigating it a bit, I noticed that it uses almost exactly the same algorithm as my equivs3e program.

Both are intended to find duplicate files in a filesystem, quickly.
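This post doesn't spell out the shared algorithm, so the following is only a sketch of one common approach to the problem, not the actual code of either program: group files by size first, then hash only the files whose size matches another file's. The function name `find_duplicates`, the SHA-256 digest, and the 1 MiB chunk size are assumptions.

```python
import collections
import hashlib
import os


def find_duplicates(paths, chunk_size=1 << 20):
    """Return lists of paths with identical content (hypothetical sketch)."""
    # Cheap first pass: files of different sizes cannot be duplicates.
    by_size = collections.defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # unique size, so no need to read the file at all
        # Expensive second pass: hash the remaining candidates in chunks.
        by_digest = collections.defaultdict(list)
        for path in same_size:
            digest = hashlib.sha256()
            with open(path, 'rb') as handle:
                for chunk in iter(lambda: handle.read(chunk_size), b''):
                    digest.update(chunk)
            by_digest[digest.hexdigest()].append(path)
        groups.extend(sorted(group) for group in by_digest.values()
                      if len(group) > 1)
    return groups
```

The size pass is what makes this quick: most files are ruled out with a single stat call and never have to be read.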

The main difference seems to be that fdupes is written in C, and equivs3e is written in Python.  Also, fdupes accepts a directory in argv (like tar), while equivs3e expects to have "find /directory -type f -print0" piped into it (like cpio).
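To illustrate the cpio-style interface, here is a rough sketch of reading the NUL-terminated filenames that "find ... -print0" produces on standard input. It is not taken from equivs3e; the function names `split_nul_terminated` and `read_filenames` are made up for this example.

```python
import sys


def split_nul_terminated(data):
    """Split bytes on NUL, dropping the empty entry after the final NUL."""
    return [name for name in data.split(b'\0') if name]


def read_filenames(stream=None):
    """Read every NUL-terminated filename from a binary stream (stdin by default)."""
    if stream is None:
        stream = sys.stdin.buffer  # bytes, since filenames need not be valid text
    return split_nul_terminated(stream.read())


if __name__ == '__main__':
    for filename in read_filenames():
        print(repr(filename))
```

The NUL separator matters because filenames can legally contain spaces and even newlines, so newline-separated input would be ambiguous. This sketch reads all of standard input at once, which is fine for illustration but holds the whole list in memory.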

However, upon doing a quick performance comparison, it turns out that fdupes is quite a bit faster on large collections of small files, and equivs3e is quite a bit faster on collections of large files.  I really don't know why the Python code sometimes outperforms the C code, given that they're so similar internally.

I've added a "related work" section on my equivs3e page that compares equivs3e and fdupes.

Anyway, I hope people find one or both of these programs useful.