At http://stromberg.dnsalias.org/~strombrg/python-tree-and-heap-comparison/ I've put together a comparison of a few in-memory tree datastructures for Python. I hope to add heaps to the comparison someday, hence the URL.
The trees have each been made to run on cpython 2.x, cpython 3.x, pypy and jython, and they are all passing pylint now.
I tossed out two of the datastructures I wanted to compare, for the following reasons:
- 2-3 trees were tossed because they were giving incorrect results. This may or may not be my fault. I had to add a simple class to make them work like a dictionary.
- Red-Black trees were tossed because they were very slow with large trees, so much so that they were making the other datastructures in which red-black trees appeared hard to read - that is, the things with better performance were getting all squished together and kinda merging into one fat line. The problem seemed to be related to garbage collection - perhaps the red-black tree implementation was creating and discarding lots of temporary variables or something. This too may or may not be my fault.
The top three datastructures I continued to examine were:
- Splay trees - sometimes first in performance.
- AVL trees - sometimes first in performance.
- Treaps - sometimes first in performance, never worse than second in performance.