Tuesday, May 19, 2015


Python compared to C++ - machine performance

Someone posted to LinkedIn's Python Professionals forum with the subject "Why Python is so slow?" and said Python is 10x slower than C++ (on one specific microbenchmark).  You can guess what the rest of it was like.

So:
  • even though machine efficiency rarely matters anymore
  • even though algorithm improvements are usually better than worrying about using the fastest language available (assembler anyone?  I used to love it, but not anymore)
  • even though microbenchmarks are poor indicators of overall performance
  • even though the innermost loop of a program is usually the only part that makes a difference (if any part at all) for performance
...I decided to go ahead and compare raw performance for almost the same problem, across 2 C++ compilers and a smattering of Pythons.

First off, most of us probably know by now that Python no longer means just CPython.

The code I used for this comparison can be viewed using a web browser or checked out using Subversion at http://stromberg.dnsalias.org/svn/why-is-python-slow/trunk/ .

BTW, the OP used range in Python 2.x; this should of course be xrange.  range is fine in 3.x, but in 2.x it builds the entire list in memory before the loop even starts, which is awful for a big loop.
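
To illustrate the range point, here is a minimal sketch in Python 3, where list(range(n)) stands in for what 2.x's range returned. The variable names and the value of n are my own picks for illustration, and sys.getsizeof reports CPython's view of the object itself only (the list's int elements are not counted), so take the numbers as rough.

```python
import sys

n = 1_000_000  # arbitrary loop size, chosen only for illustration

lazy = range(n)          # Python 3 range: a small object that produces values on demand
eager = list(range(n))   # roughly what Python 2's range handed back: a full list

lazy_bytes = sys.getsizeof(lazy)
eager_bytes = sys.getsizeof(eager)  # the list object itself, not the ints inside it

print("range object:      %d bytes" % lazy_bytes)
print("materialized list: %d bytes" % eager_bytes)

# Looping over the lazy object gives the same values without building the list.
total = 0
for i in lazy:
    total += i
```

In 2.x, xrange plays the role of the lazy object here.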

Anyway, here are the results, normalized so that g++ -O3 is 1.0.  Lower numbers are better:
1.000000 ./g++-t
0.974000 ./clang++-t
4.993000 ./cython2-t
5.981000 ./cython3-t
4.341000 ./cython2_types_t
4.917000 ./cython3_types_t
4.850000 ./cpython2-t
5.563000 ./cpython3-t
5.567000 ./cpython2+numba-t
5.491000 ./cpython3+numba-t
1.957000 ./pypy-t
1.979000 ./pypy3-t
1.152000 ./shedskin-t
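
Numbers in this shape come from dividing each run time by a baseline's run time. Here is a minimal sketch of that idea, assuming two stand-in workloads inside a single Python process; best_time, normalize, loop_sum and builtin_sum are hypothetical names for illustration, not the harness from the Subversion repository.

```python
import time


def best_time(func, repeats=3):
    """Return the smallest wall-clock time (in seconds) over several runs of func."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


def normalize(timings, baseline):
    """Divide every timing by the baseline entry, so the baseline becomes 1.0."""
    base = timings[baseline]
    return {name: seconds / base for name, seconds in timings.items()}


def loop_sum(n=100_000):
    """Stand-in workload: an explicit Python-level loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def builtin_sum(n=100_000):
    """Stand-in workload: the same sum via the builtin."""
    return sum(range(n))


timings = {
    "loop_sum": best_time(loop_sum),
    "builtin_sum": best_time(builtin_sum),
}
relative = normalize(timings, "loop_sum")

# Print in the same "ratio name" layout as the results above.
for name, ratio in sorted(relative.items()):
    print("%f %s" % (ratio, name))
```

Taking the best of a few repeats is one common way to damp noise from other processes; it is a design choice in this sketch, not a claim about how the figures above were gathered.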


Some interesting things to note (all relative to g++ -O3):
  • clang++ was slightly faster than g++ sometimes.  The two C++'s were practically the same.
  • Naive Cython was slower than CPython.
  • Typed Cython was faster than CPython.
  • CPython was around 4.9x (2.x) and 5.6x (3.x) slower on this particular microbenchmark, not 10x (but perhaps using the wrong range would've made it 10x; I didn't check that).
  • Numba was slower than CPython once (on 2.x) and faster once (on 3.x) - but not by a lot in either case.
  • pypy was a touch less than 2x slower, but that's for pure python code!
  • shedskin was only a hair slower than the pure C++ compilers.
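
The comparisons in the list can be checked directly against the results. A small sketch, with the numbers copied from the results table; the ratio helper is just a convenience name for this example.

```python
# Normalized results copied from the table (g++ -O3 = 1.0; lower is better).
results = {
    "./g++-t": 1.000000,
    "./clang++-t": 0.974000,
    "./cython2-t": 4.993000,
    "./cython3-t": 5.981000,
    "./cython2_types_t": 4.341000,
    "./cython3_types_t": 4.917000,
    "./cpython2-t": 4.850000,
    "./cpython3-t": 5.563000,
    "./cpython2+numba-t": 5.567000,
    "./cpython3+numba-t": 5.491000,
    "./pypy-t": 1.957000,
    "./pypy3-t": 1.979000,
    "./shedskin-t": 1.152000,
}


def ratio(name, reference):
    """How many times as long `name` took compared to `reference`."""
    return results[name] / results[reference]


comparisons = [
    ("./cython2-t", "./cpython2-t"),         # naive Cython vs CPython 2
    ("./cython3-t", "./cpython3-t"),         # naive Cython vs CPython 3
    ("./cython2_types_t", "./cpython2-t"),   # typed Cython vs CPython 2
    ("./cython3_types_t", "./cpython3-t"),   # typed Cython vs CPython 3
    ("./cpython2+numba-t", "./cpython2-t"),  # Numba vs CPython 2
    ("./cpython3+numba-t", "./cpython3-t"),  # Numba vs CPython 3
    ("./pypy-t", "./cpython2-t"),            # pypy vs CPython 2
]

for name, reference in comparisons:
    print("%-20s / %-14s = %.3f" % (name, reference, ratio(name, reference)))
```

A ratio above 1 means the first entry was slower than the second; below 1 means it was faster.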
I hope that helps someone put things in machine-efficiency-context.

We perhaps should also measure how long it takes to debug undefined memory references in the two languages - that's developer efficiency, and it's usually more important than machine efficiency.

2018-02-10: I updated this here.