Saturday, November 17, 2018

Python, Rust and C performance doing MD5

I've put a performance comparison of Python, Rust and C doing MD5 calculations here.

Interestingly, CPython and Pypy came out on top, even beating the C builds from gcc and clang.

Granted, CPython and Pypy are probably calling the highly-optimized OpenSSL, but it's still noteworthy that sometimes Python can be pretty zippy.
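
For reference, here is a minimal sketch of what an MD5 calculation looks like from Python's standard hashlib module. It is not the benchmark code from the comparison; the function names are made up for illustration. The hashing itself happens in compiled code, which is consistent with the guess above about why Python does well here.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of a bytes object."""
    return hashlib.md5(data).hexdigest()


def md5_file_hex(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


print(md5_hex(b"hello world"))  # 5eb63bbbe01eeed093cb22bb8f5acdc3
```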

Friday, March 2, 2018

The House Robber Problem

I've put a Genetic Algorithm-based solution to "The House Robber Problem" here.

The problem has us maximizing the value from houses robbed, subject to the constraint that no two adjacent houses can be robbed.
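
For checking answers, a short dynamic-programming baseline can be handy. Note that this is not the Genetic Algorithm solution mentioned above, just the classic exact method, and it assumes the houses sit in a single row with non-negative values. The name max_loot is made up for illustration.

```python
def max_loot(values):
    """Best total from a row of houses when no two adjacent ones are robbed."""
    prev, best = 0, 0  # best totals up to two houses back and one house back
    for value in values:
        # Either skip this house (keep best) or rob it (prev + value).
        prev, best = best, max(best, prev + value)
    return best


print(max_loot([2, 7, 9, 3, 1]))  # 12, from robbing the houses worth 2, 9 and 1
```

An exact answer like this gives a yardstick for how close a heuristic search gets.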

Tuesday, February 6, 2018

I've put a simple, Python 3.6 website dead link checker here.

You give it one or more URLs to search through and one or more URL prefixes that it should mostly stay under, and it does the rest.

It's intended to be shell-callable, and can output CSV or JSON.
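
A rough sketch of the two pieces such a checker needs, pulling links out of a page and deciding whether a link falls under one of the prefixes, might look like the following. This is not the checker's actual code: the names are hypothetical, it does no network access, and it assumes "remaining under" a prefix means a simple string-prefix test.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html_text):
    collector = LinkCollector(base_url)
    collector.feed(html_text)
    return collector.links


def is_under_prefix(url, prefixes):
    """Assumption: a URL is in scope if it starts with any given prefix."""
    return any(url.startswith(prefix) for prefix in prefixes)
```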

I hope people find it useful.

Saturday, January 13, 2018

A Python solution to the Alien Language Problem

I've put a Python 2.x / 3.x solution to the Alien Language Problem in my Subversion repo.

The rough idea is to take a list of sorted alien words (sorted in an alien order, even though they use the Roman alphabet), and to find what the order of that alphabet is.
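
One common way to attack this is a topological sort over "letter a comes before letter b" constraints taken from adjacent words. The sketch below shows that approach; it is not necessarily how the solution in the repo works, it is written for Python 3 only, and the function name is made up. When the words do not pin down a unique order, it returns one of the valid ones.

```python
from collections import defaultdict, deque


def alien_order(words):
    """Return a letter order consistent with the sorted words, or None."""
    letters = {ch for word in words for ch in word}
    successors = defaultdict(set)
    indegree = {ch: 0 for ch in letters}

    for first, second in zip(words, words[1:]):
        for a, b in zip(first, second):
            if a != b:
                # The first differing letters give one ordering constraint.
                if b not in successors[a]:
                    successors[a].add(b)
                    indegree[b] += 1
                break
        else:
            if len(first) > len(second):
                return None  # a word sorted before its own prefix

    # Kahn's algorithm: repeatedly emit letters with no unmet predecessors.
    queue = deque(sorted(ch for ch in letters if indegree[ch] == 0))
    order = []
    while queue:
        ch = queue.popleft()
        order.append(ch)
        for nxt in sorted(successors[ch]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    if len(order) != len(letters):
        return None  # the constraints contain a cycle
    return "".join(order)


print(alien_order(["wrt", "wrf", "er", "ett", "rftt"]))  # wertf
```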

Sunday, June 5, 2016


I've put from-table here.

It's a small Python 3 script that knows how to extract one or more HTML tables as CSV data. You can give it a URL or a file. It can extract to stdout or to a series of numbered files (one file per table).
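
To give a feel for the idea, here is a minimal sketch of table-to-CSV extraction using only the standard library. It is not from-table's code: the class and function names are hypothetical, and it assumes well-formed, non-nested tables with explicit closing tags.

```python
import csv
import io
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Gather each <table> as a list of rows, each row a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def tables_to_csv(html_text):
    """Return one CSV string per table found in the HTML text."""
    collector = TableCollector()
    collector.feed(html_text)
    results = []
    for table in collector.tables:
        buffer = io.StringIO()
        csv.writer(buffer, lineterminator="\n").writerows(table)
        results.append(buffer.getvalue())
    return results
```

Each returned string could then be written to stdout or to its own numbered file.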

I hope folks find it useful.

Sunday, October 18, 2015

Backshift not that slow, and for good reason

  • Backshift is a deduplicating backup program in Python.
  • A performance comparison between some backup applications has been published.

  • The comparison did not compare backshift, because backshift was believed to have prohibitively slow deduplication.
  • Backshift truly is not a speed demon. It is designed to:
    1. minimize storage requirements
    2. minimize bandwidth requirements
    3. emphasize parallel (concurrent backups of different computers) performance to some extent
    4. allow expiration of old data that is no longer needed
  • Also, it was almost certainly not backshift's deduplication that was slow; it was:
    1. backshift's variable-length, content-based blocking algorithm. This makes Python inspect every byte of the backup, one byte at a time.
    2. backshift's use of xz compression. xz packs files very hard, reducing storage and bandwidth requirements, but it is known to be slower than something like gzip that doesn't compress as well.
  • Also, while the initial full save is slow, subsequent backups are much faster because they do not reblock or recompress any files that still have the same mtime and size as found in one of (up to) three previous backups.
  • Also, if you run backshift on Pypy, its variable-length, content-based blocking algorithm is many times faster than if you run it on CPython. Pypy is not only faster than CPython, it's also much faster than CPython augmented with Cython.
  • I sent G. P. E. Keeling an e-mail about this some time ago (the date of this writing is October 2015), but never received a response.
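
To illustrate why variable-length, content-based blocking is expensive in pure Python, here is a toy sketch of the general technique. It is not backshift's actual algorithm: the hash, the mask and the size limits are all made-up assumptions. The point is that the loop has to touch every byte, which is exactly the kind of code a JIT like Pypy's speeds up.

```python
def split_blocks(data, mask=0x0FFF, min_size=64, max_size=65536):
    """Split bytes into variable-length blocks at content-defined boundaries."""
    blocks = []
    start = 0
    rolling = 0
    for index, byte in enumerate(data):
        # Cheap shift-and-add hash; its low bits depend only on recent bytes.
        rolling = ((rolling << 1) + byte) & 0xFFFFFFFF
        length = index - start + 1
        at_boundary = length >= min_size and (rolling & mask) == mask
        if at_boundary or length >= max_size:
            blocks.append(data[start:index + 1])
            start = index + 1
            rolling = 0
    if start < len(data):
        blocks.append(data[start:])  # whatever is left after the last boundary
    return blocks
```

Because boundaries depend on the content rather than on fixed offsets, inserting a few bytes near the start of a file tends to change only nearby blocks, so the later blocks can still deduplicate against a previous backup.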
Wednesday, August 12, 2015

Latest python sorted dictionary comparison

I recently completed another sorted dictionary comparison, and thought I'd share the results.

This time I've eliminated the different mixes of get/set. It's all 95% set and 5% get now.

Also, I added sorteddict, which proved to be an excellent performer.

And I added standard deviation to the graph and collapsible detail.

The latest comparison can be found here.

HTH someone.