Friday, March 2, 2018
Tuesday, February 6, 2018
I've put a simple, Python 3.6 website dead link checker here.
You give it one or more URLs to search through, and one or more URL prefixes to mostly remain under, and it does the rest.
It's intended to be shell-callable, and can output CSV or JSON.
I hope people find it useful.
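As a rough illustration of the "URL prefixes to remain under" idea, here is a minimal sketch that pulls links out of a page and decides which of them to search further. It is not the checker's actual code. The names `LinkCollector`, `extract_links` and `should_crawl` are made up for this example, and it assumes that links outside the prefixes still get checked but are not searched for more links.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collect absolute link targets from <a href=...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop any #fragment.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def extract_links(base_url, html_text):
    """Return every link found in html_text, made absolute against base_url."""
    collector = LinkCollector(base_url)
    collector.feed(html_text)
    return collector.links


def should_crawl(url, prefixes):
    """Assumed rule: only pages under one of the prefixes are searched further."""
    return any(url.startswith(prefix) for prefix in prefixes)
```

A real checker would also fetch each URL and record its status; that part is left out so the sketch runs without network access.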
Saturday, January 13, 2018
I've put a Python 2.x / 3.x solution to the Alien Language Problem in my Subversion repo.
The rough idea is to take a list of sorted alien words (sorted in an alien order, even though they use the Roman alphabet), and to find what the order of that alphabet is.
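One common way to approach this problem is a topological sort: each adjacent pair of words yields at most one "this letter comes before that letter" constraint, and the constraints are then ordered. The sketch below shows that approach and is not necessarily what the version in the repo does; the function name `alien_order` is made up for this example.

```python
from collections import defaultdict, deque


def alien_order(words):
    """Return one alphabet order consistent with the sorted word list."""
    letters = set("".join(words))
    successors = defaultdict(set)
    indegree = {letter: 0 for letter in letters}

    # The first differing letter of each adjacent pair gives one constraint.
    for first, second in zip(words, words[1:]):
        for a, b in zip(first, second):
            if a != b:
                if b not in successors[a]:
                    successors[a].add(b)
                    indegree[b] += 1
                break
        else:
            # No differing letter: the shorter word has to come first.
            if len(first) > len(second):
                raise ValueError("word list is not validly sorted")

    # Kahn's algorithm: repeatedly emit letters with no unmet predecessors.
    queue = deque(sorted(l for l in letters if indegree[l] == 0))
    order = []
    while queue:
        letter = queue.popleft()
        order.append(letter)
        for nxt in sorted(successors[letter]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    if len(order) != len(letters):
        raise ValueError("constraints contain a cycle")
    return "".join(order)
```

When the word list does not fully determine the alphabet, this returns just one of the valid orders.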
Sunday, June 5, 2016
I've put from-table here.
It's a small Python 3 script that knows how to extract one or more HTML tables as CSV data. You can give it a URL or a file. It can extract to stdout or to a series of numbered filenames (one file per table).
I hope folks find it useful.
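To give a feel for what turning HTML tables into CSV involves, here is a minimal sketch using only the standard library. It is not from-table's code. `TableCollector` and `tables_to_csv` are names made up for this example, and it assumes reasonably well-formed, non-nested tables.

```python
import csv
import io
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Gather the cell text of each <table> as a list of rows."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside a cell to single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None


def tables_to_csv(html_text):
    """Return one CSV string per table found in html_text."""
    collector = TableCollector()
    collector.feed(html_text)
    results = []
    for rows in collector.tables:
        buffer = io.StringIO()
        csv.writer(buffer, lineterminator="\n").writerows(rows)
        results.append(buffer.getvalue())
    return results
```

Writing each returned string to its own numbered file would mirror the one-file-per-table mode described above.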
Sunday, October 18, 2015
- minimize storage requirements
- minimize bandwidth requirements
- emphasize parallel (concurrent backups of different computers) performance to some extent
- allow expiration of old data that is no longer needed
- backshift's variable-length, content-based blocking algorithm. This makes Python inspect every byte of the backup, one byte at a time.
- backshift's use of xz compression. xz packs files very hard, reducing storage and bandwidth requirements, but it is known to be slower than something like gzip that doesn't compress as well.
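To show why content-based blocking means touching every byte, here is a minimal sketch of one such scheme: a rolling sum over a small window decides where each block ends, and every block is then compressed with xz via the standard `lzma` module. This is not backshift's actual algorithm. The function names, the rolling sum, and the `window`, `mask`, `min_size` and `max_size` values are all assumptions chosen for illustration.

```python
import lzma


def content_defined_chunks(data, window=16, mask=0x3F, min_size=32, max_size=4096):
    """Yield variable-length chunks of bytes whose boundaries depend on content."""
    start = 0
    rolling = 0
    for index, byte in enumerate(data):
        # Keep a sum of at most the last `window` bytes of the current chunk.
        rolling += byte
        if index - start >= window:
            rolling -= data[index - window]
        length = index - start + 1
        # Cut when the rolling value matches the mask, or the chunk is too big.
        if (length >= min_size and (rolling & mask) == mask) or length >= max_size:
            yield data[start:index + 1]
            start = index + 1
            rolling = 0
    if start < len(data):
        yield data[start:]


def compress_chunks(data):
    """Split data into content-defined chunks and xz-compress each one."""
    return [lzma.compress(chunk) for chunk in content_defined_chunks(data)]
```

The per-byte loop is the expensive part in pure Python, which matches the first point above; a production chunker would normally use a stronger rolling hash than a plain sum.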
Wednesday, August 12, 2015
I recently completed another sorted dictionary comparison, and thought I'd share the results.
This time I've eliminated the different mixes of get/set. It's all 95% set and 5% get now.
Also, I added sorteddict, which proved to be an excellent performer.
And I added standard deviation to the graph and collapsible detail.
The latest comparison can be found here.
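For anyone wanting to try a similar 95% set / 5% get mix on their own mapping type, here is a minimal sketch of such a workload. It is not the harness behind the comparison. `run_workload` and its parameters are made up for this example, and it only assumes the mapping supports `mapping[key] = value` and `mapping[key]`.

```python
import random
import time


def run_workload(mapping, operations=10000, set_fraction=0.95, seed=0):
    """Apply a random mix of sets and gets; return (sets, gets, seconds)."""
    rng = random.Random(seed)
    sets = gets = 0
    started = time.perf_counter()
    for _ in range(operations):
        key = rng.randrange(operations)
        if rng.random() < set_fraction:
            mapping[key] = key
            sets += 1
        else:
            try:
                mapping[key]
            except KeyError:
                pass  # A miss still counts as a get.
            gets += 1
    return sets, gets, time.perf_counter() - started
```

Running it several times per implementation and taking the mean and standard deviation of the elapsed time would give numbers comparable in spirit to the graph.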
I recently saw an announcement of fdupes on linuxtoday.
Upon investigating it a bit, I noticed that it uses almost exactly the same algorithm as my equivs3e program.
Both are intended to find duplicate files in a filesystem, quickly.
The main difference seems to be that fdupes is in C, and equivs3e is in Python. Also, fdupes accepts a directory in argv (like tar), while equivs3e expects to have "find /directory -type f -print0" piped into it (like cpio).
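A typical approach for this kind of tool is to group files by size first, and only then hash the files whose sizes collide. The sketch below shows that approach together with reading NUL-separated filenames the way `find -print0` produces them. It is an assumption about the general technique, not the code of equivs3e or fdupes, and the function names are made up for this example.

```python
import hashlib
import os
from collections import defaultdict


def read_nul_separated(stream):
    """Split a binary stream of NUL-terminated filenames, as from find -print0."""
    return [name for name in stream.read().split(b"\0") if name]


def file_digest(path, block_size=1 << 20):
    """Hash a file in blocks so large files are never read into memory whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def find_duplicates(paths):
    """Return lists of paths whose contents hash to the same digest."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # A unique size cannot have a duplicate.
        by_digest = defaultdict(list)
        for path in same_size:
            by_digest[file_digest(path)].append(path)
        groups.extend(group for group in by_digest.values() if len(group) > 1)
    return groups
```

Passing `read_nul_separated(sys.stdin.buffer)` to `find_duplicates` would mirror the piped `find` usage, while walking a directory given in argv would mirror the tar-like style.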
However, upon doing a quick performance comparison, it turns out that fdupes is quite a bit faster on large collections of small files, and equivs3e is quite a bit faster on collections of large files. I really don't know why the Python code is sometimes outperforming the C code, given that they're so similar internally.
I've added a "related work" section on my equivs3e page that compares equivs3e and fdupes.
Anyway, I hope people find one or both of these programs useful.