Dynamic Programming in Python: Bayesian Blocks

Of all the programming styles I have learned, dynamic programming is perhaps the most beautiful. It can take problems that, at first glance, look ugly and intractable, and solve the problem with clean, concise code. Where a simplistic algorithm might accomplish something by brute force, dynamic programming steps back, breaks the task into a smaller set of sequential parts, and then proceeds in the most efficient way possible.

Bayesian Blocks

I'll go through an example here where the ideas of dynamic programming are vital to some very cool data analysis resuts. This post draws heavily from a recent paper by Jeff Scargle and collaborators (this is the Scargle of Lomb-Scargle Periodogram fame), as well as some conversations I had with Jeff at Astroinformatics 2012. The paper discusses a framework called Bayesian Blocks, which is essentially a method of creating histograms with bin sizes that adapt to the data (there's a bit more to it than that: here we'll focus on histograms for simplicity).

Quantum Python: Animating the Schrodinger Equation

Update: a reader contributed some improvements to the Python code presented below. Please see the pySchrodinger github repository for updated code

In a previous post I explored the new animation capabilities of the latest matplotlib release. It got me wondering whether it would be possible to simulate more complicated physical systems in real time in python. Quantum Mechanics was the first thing that came to mind. It turns out that by mixing a bit of Physics knowledge with a bit of computing knowledge, it's quite straightforward to simulate and animate a simple quantum mechanical system with python.

The Schrodinger Equation

The dynamics of a one-dimensional quantum system are governed by the time-dependent Schrodinger equation:

$$ i\hbar\frac{\partial \psi}{\partial t} = \frac{-\hbar^2}{2m} \frac{\partial^2 \psi}{\partial x^2} + V \psi $$

Numba vs Cython

For a more up-to-date comparison of Numba and Cython, see the newer post on this subject.

Often I'll tell people that I use python for computational analysis, and they look at me inquisitively. "Isn't python pretty slow?" They have a point. Python is an interpreted language, and as such cannot natively perform many operations as quickly as a compiled language such as C or Fortran. There is also the issue of the oft-misunderstood and much-maligned GIL, which calls into question python's ability to allow true parallel computing.

Many solutions have been proposed: PyPy is a much faster version of the core python language; numexpr provides optimized performance on certain classes of operations from within python; weave allows inline inclusion of compiled C/C++ code; cython provides extra markup that allows python and/or python-like code to be compiled into C for fast operations. But a naysayer might point out: many of these "python" solutions in practice are not really python at all, but clever hacks into Fortran or C.

Matplotlib Animation Tutorial

Matplotlib version 1.1 added some tools for creating animations which are really slick. You can find some good example animations on the matplotlib examples page. I thought I'd share here some of the things I've learned when playing around with these tools.

Basic Animation

The animation tools center around the matplotlib.animation.Animation base class, which provides a framework around which the animation functionality is built. The main interfaces are TimedAnimation and FuncAnimation, which you can read more about in the documentation. Here I'll explore using the FuncAnimation tool, which I have found to be the most useful.

Memoryview Benchmarks 2

In the previous post, I explored how cython typed memoryviews can be used to speed up repeated array operations. It became clear that typed memoryviews are superior to the ndarray syntax for slicing, and as fast as raw pointers for single element access. In the comments, Mathieu brought up an interesting question: is the ndarray syntax as good as typed memoryviews if you're not doing slicing?

The answer turns out to be yes, unless the compiler tries to inline your function.

Memoryview Benchmarks

There was recently a thread on cython-users which caught my eye. It has to do with memoryviews, a new way of working with memory buffers in cython.

I've been thinking recently about how to do fast and flexible memory buffer access in cython. I contributed the BallTree implementation for nearest neighbors searching in scikit-learn, and have been actively thinking about how to make it faster and more flexible, including adding the ability to specify distance metrics other than euclidean and minkowski.

In order to accomplish this, I'd like to have a set of distance metric functions which take two vectors and compute a distance. There would be many functions with similar call signatures which could then be plugged into a code that would iterate over a set of vectors and compute the appropriate distances.