Last summer I wrote a post comparing the performance of Numba and Cython for optimizing array-based computation. Since posting, the page has received thousands of hits, and it has resulted in a number of interesting discussions.
But in the meantime, the Numba package has come a long way, both in its interface and its performance. Here I want to revisit those timing comparisons with a more recent Numba release, using the newer and more convenient autojit syntax, and also add a few additional benchmarks for completeness. I've also written this post entirely within an IPython notebook, so it can be easily downloaded and modified.
As before, I'll use a pairwise distance function. This will take an array representing M points in N dimensions, and return the M x M matrix of pairwise distances.
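For concreteness, here is a minimal sketch of such a function written as plain Python loops over a NumPy array. It assumes Euclidean distance (the metric isn't specified above), and the name `pairwise_python` is hypothetical. This is the kind of straightforward, loop-heavy code that Numba and Cython are meant to speed up.

```python
import numpy as np


def pairwise_python(X):
    """Return the M x M matrix of Euclidean distances between rows of X.

    X is assumed to be a 2D float array of shape (M, N): M points in N dimensions.
    """
    M, N = X.shape
    D = np.empty((M, M), dtype=float)
    for i in range(M):
        for j in range(M):
            # Accumulate the squared difference one dimension at a time
            d = 0.0
            for k in range(N):
                tmp = X[i, k] - X[j, k]
                d += tmp * tmp
            D[i, j] = np.sqrt(d)
    return D


# Three collinear points: (0, 0), (3, 4), (6, 8)
X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
D = pairwise_python(X)
```

Here `D[0, 1]` and `D[1, 2]` are both 5.0 and `D[0, 2]` is 10.0. The sketch computes every (i, j) pair even though the matrix is symmetric; that keeps the loop structure simple and easy to compare across implementations.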
This is a nice test function for a few reasons. First, it's a very clean and well-defined test. Second, it illustrates the kind of array-based operation that is common in statistics, data mining, and machine learning. Third, it results in large memory consumption if the standard NumPy broadcasting approach is used (it requires a temporary array containing M * M * N elements), making it a good candidate for an alternate approach.
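The broadcasting approach can be sketched as follows. This again assumes Euclidean distance, and `pairwise_numpy` is a hypothetical name. Inserting new axes turns the (M, N) input into an (M, 1, N) array and a (1, M, N) array. Subtracting them broadcasts to the full (M, M, N) temporary before the sum over the last axis reduces it to (M, M).

```python
import numpy as np


def pairwise_numpy(X):
    """Euclidean pairwise distances via broadcasting; X has shape (M, N)."""
    # (M, 1, N) - (1, M, N) broadcasts to an (M, M, N) temporary array
    diff = X[:, np.newaxis, :] - X[np.newaxis, :, :]
    # Square and sum over the N dimensions, leaving an (M, M) result
    return np.sqrt((diff ** 2).sum(axis=-1))


M, N = 1000, 3
X = np.random.random((M, N))
D = pairwise_numpy(X)

# Size in bytes of the (M, M, N) intermediate, for comparison with D.nbytes
temp_bytes = M * M * N * X.itemsize
```

For M = 1000 points in N = 3 dimensions with 64-bit floats, the temporary holds 3,000,000 elements (24 MB), three times the 8 MB output. The expression `diff ** 2` briefly allocates a second array of the same size. The overhead grows linearly with N, which is what makes loop-based versions compiled with Numba or Cython attractive here.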