Saturday, June 23, 2012

Python and tree datastructures: A performance comparison


At http://stromberg.dnsalias.org/~strombrg/python-tree-and-heap-comparison/ I've put together a comparison of a few in-memory tree datastructures for Python.  I hope to add heaps to the comparison someday, hence the URL.

The trees have each been made to run on cpython 2.x, cpython 3.x, pypy and jython, and they are all passing pylint now.

I tossed out two of the datastructures I wanted to compare, for the following reasons:

  1. 2-3 trees were tossed because they were giving incorrect results.  This  may or may not be my fault.  I had to add a simple class to make them work like a dictionary.
  2. Red-Black trees were tossed because they were very slow with large trees, so much so that they were making the other datastructures in which red-black trees appeared hard to read - that is, the things with better performance were getting all squished together and kinda merging into one fat line.  The problem seemed to be related to garbage collection - perhaps the red-black tree implementation was creating and discarding lots of temporary variables or something.  This too may or may not be my fault.
The top three datastructures I continued to examine were:
  1. Splay trees - sometimes first in performance.
  2. AVL trees - sometimes first in performance.
  3. Treaps - sometimes first in performance, never worse than second in performance.



At http://stromberg.dnsalias.org/~strombrg/python-tree-and-heap-comparison/ I've put together a comparison of a few in-memory tree datastructures for Python.  I hope to add heaps to the comparison someday, hence the URL.

The trees have each been made to run on cpython 2.x, cpython 3.x, pypy and jython, and they are all passing pylint now.

I tossed out two of the datastructures I wanted to compare, for the following reasons:

  1. 2-3 trees were tossed because they were giving incorrect results.  This  may or may not be my fault.  I had to add a simple class to make them work like a dictionary.
  2. Red-Black trees were tossed because they were very slow with large trees, so much so that they were making the other datastructures in which red-black trees appeared hard to read - that is, the things with better performance were getting all squished together and kinda merging into one fat line.  The problem seemed to be related to garbage collection - perhaps the red-black tree implementation was creating and discarding lots of temporary variables or something.  This too may or may not be my fault.
The top three datastructures I continued to examine were:
  1. Splay trees - sometimes first in performance.
  2. AVL trees - sometimes first in performance.
  3. Treaps - sometimes first in performance, never worse than second in performance.

Wednesday, June 6, 2012

Python Trees (and treap) evaluated

At http://stromberg.dnsalias.org/~strombrg/python-tree-and-heap-comparison/ I've put together a comparison of a few in-memory tree datastructures for Python.  I hope to add heaps to the comparison someday, hence the URL.

The trees have each been made to run on cpython 2.x, cpython 3.x, pypy and jython, and they are all passing pylint now.

I tossed out two of the datastructures I wanted to compare, for the following reasons:

  1. 2-3 trees were tossed because they were giving incorrect results.  This  may or may not be my fault.  I had to add a simple class to make them work like a dictionary.
  2. Red-Black trees were tossed because they were very slow with large trees, so much so that they were making the other datastructures in which red-black trees appeared hard to read - that is, the things with better performance were getting all squished together and kinda merging into one fat line.  The problem seemed to be related to garbage collection - perhaps the red-black tree implementation was creating and discarding lots of temporary variables or something.  This too may or may not be my fault.
The top three datastructures I continued to examine were:
  1. Splay trees - sometimes first in performance.
  2. AVL trees - sometimes first in performance.
  3. Treaps - sometimes first in performance, never worse than second in performance.