Sparse SVDs in Python

After Fabian's post on the topic, I have recently returned to thinking about the subject of sparse singular value decompositions (SVDs) in Python.

For those who haven't used it, the SVD is an extremely powerful technique. It is the core routine of many applications, from filtering to dimensionality reduction to graph analysis to supervised classification and much, much more.

I first came across the need for a fast sparse SVD when applying a technique called Locally Linear Embedding (LLE) to astronomy spectra: it was the first astronomy paper I published, and you can read it here. In LLE, one visualizes the nonlinear relationship between high-dimensional observations. The computational cost is extreme: for N objects, one must compute the null space (intimately related to the SVD) of an N by N matrix. Using direct methods (e.g. LAPACK), this can scale as badly as $\mathcal{O}[N^3]$ in both memory and speed!
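To make the contrast with direct methods concrete, here is a minimal sketch of a truncated sparse SVD using scipy's ARPACK wrapper, scipy.sparse.linalg.svds. The matrix size, density, and number of singular values are arbitrary choices for illustration, not values from the LLE application. Note that it computes the largest singular values, whereas a null-space problem like LLE needs the smallest ones.

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# Hypothetical test matrix: 500 x 500 with roughly 2% nonzero entries.
A = sparse_random(500, 500, density=0.02, format="csr", random_state=0)

# Ask only for the k largest singular triplets instead of the full decomposition.
k = 5
U, s, Vt = svds(A, k=k)

# svds does not promise descending order, so sort explicitly.
order = np.argsort(s)[::-1]
U, s, Vt = U[:, order], s[order], Vt[order, :]

print(U.shape, s.shape, Vt.shape)
print(s)
```

Because only k triplets are computed and A is never densified, the memory footprint stays close to that of the sparse matrix itself.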

Minesweeper in Matplotlib

Lately I've been playing around with interactivity in matplotlib. A couple weeks ago, I discussed briefly how to use event callbacks to implement simple 3D visualization and later used this as a base for creating a working 3D Rubik's cube entirely in matplotlib.

Today I have a different goal: re-create minesweeper, that ubiquitous single-player puzzle game that most of us will admit to having binged on at least once or twice in our lives. In minesweeper, the goal is to discover and avoid hidden mines within a gridded minefield, and the process takes some logic and quick thinking.

A Primer on Python Metaclasses

Most readers are aware that Python is an object-oriented language. By object-oriented, we mean that Python can define classes, which bundle data and functionality into one entity. For example, we may create a class IntContainer which stores an integer and allows certain operations to be performed:

class IntContainer(object):
    def __init__(self, i):
        self.i = int(i)

    def add_one(self):
        self.i += 1

ic = IntContainer(2)
ic.add_one()
print(ic.i)
3

This is a bit of a silly example, but shows the fundamental nature of classes: their ability to bundle data and operations into a single object, which leads to cleaner, more manageable, and more adaptable code. Additionally, classes can inherit properties from parents and add or specialize attributes and methods. This object-oriented approach to programming can be very intuitive and powerful.

What many do not realize, though, is that quite literally everything in the Python language is an object.
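A quick way to convince yourself of this is to ask Python directly. The sketch below redefines a small IntContainer so that it is self-contained; the checks dictionary is just an illustrative helper.

```python
class IntContainer(object):
    def __init__(self, i):
        self.i = int(i)

# Integers, built-in functions, classes, and even `type` are all objects.
checks = {
    "integer": isinstance(1, object),
    "function": isinstance(len, object),
    "class": isinstance(IntContainer, object),
    "type itself": isinstance(type, object),
}
for name, result in checks.items():
    print(name, result)

# A class is itself an instance of `type`, which is where metaclasses come in.
print(type(IntContainer))
```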

Quaternions and Key Bindings: Simple 3D Visualization in Matplotlib

Matplotlib is a powerful framework, but its 3D capabilities still have a lot of room to grow. The mplot3d toolkit allows for several kinds of 3D plotting, but the ability to create and rotate solid 3D objects is hindered by the inflexibility of the zorder attribute: because it is not updated when the view is rotated, things in the "back" will cover things in the "front", obscuring them and leading to very unnatural-looking results.

I decided to see if I could create a simple script that addresses this. Though it would be possible to use the built-in mplot3d architecture to take care of rotating and projecting the points, I decided to do it from scratch for the sake of my own education.

We'll step through it below: by the end of this post we will have created a 3D viewer in matplotlib which I think is quite nice.
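As a rough sketch of the two ingredients involved, the snippet below builds a rotation from a unit quaternion and then sorts points by depth so that the farthest ones would be drawn first. The function names, the sample points, and the assumption that the viewer looks down the z axis are all illustrative choices, not the code from the post.

```python
import numpy as np

def quaternion_from_axis_angle(axis, angle):
    """Unit quaternion (w, x, y, z) for a rotation of `angle` radians about `axis`."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    return np.concatenate([[np.cos(angle / 2)], np.sin(angle / 2) * axis])

def rotation_matrix(q):
    """3x3 rotation matrix equivalent to the unit quaternion q."""
    w, x, y, z = q
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

# Rotate three sample points by 90 degrees about the x axis.
q = quaternion_from_axis_angle([1, 0, 0], np.pi / 2)
R = rotation_matrix(q)
points = np.array([[0.0, 1.0, 0.0], [0.0, -1.0, 0.0], [1.0, 0.0, 0.0]])
rotated = points @ R.T

# Assumed convention: larger z is closer to the viewer, so draw small z first.
draw_order = np.argsort(rotated[:, 2])
print(draw_order)
```

Recomputing draw_order after every rotation and assigning it to each artist's zorder is one way to keep the "front" objects on top.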

Sparse Graphs in Python: Playing with Word Ladders

The recent 0.11 release of scipy includes several new features, one of which is the sparse graph submodule which I contributed, with help from other developers. I'm pretty excited about this: there are some classic algorithms implemented, and it will open up whole new realms of computational possibilities in Python.

Before we start, I should say: this post is based on a lightning talk I gave at Scipy 2012, and some of the material below comes from a tutorial I wrote for the scipy documentation.
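To give a flavor of the submodule, here is a minimal sketch of a word ladder solved with scipy.sparse.csgraph.shortest_path. The six-word list and the ladder helper are made up for illustration; a real word ladder would use a full dictionary.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path

# Tiny hypothetical word list; "zap" is deliberately unreachable.
words = ["cat", "cot", "cog", "dog", "dot", "zap"]
n = len(words)

# Link every pair of words that differ by exactly one letter.
rows, cols = [], []
for i in range(n):
    for j in range(n):
        if sum(a != b for a, b in zip(words[i], words[j])) == 1:
            rows.append(i)
            cols.append(j)
graph = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n))

# All-pairs shortest paths, counting each link as one step.
dist, predecessors = shortest_path(
    graph, directed=False, unweighted=True, return_predecessors=True
)

def ladder(start, end):
    """Return one shortest chain of words from start to end, or None."""
    i, j = words.index(start), words.index(end)
    if np.isinf(dist[i, j]):
        return None
    path = [words[j]]
    while j != i:
        j = predecessors[i, j]
        path.append(words[j])
    return path[::-1]

print(ladder("cat", "dog"))
print(ladder("cat", "zap"))
```

When several ladders of the same length exist, which one is returned depends on the order in which the algorithm visits the nodes.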

XKCD-style plots in Matplotlib

Update: the matplotlib pull request has been merged! See this post for a description of the XKCD functionality now built into matplotlib!

One of the problems I've had with typical matplotlib figures is that everything in them is so precise, so perfect. For an example of what I mean, take a look at this figure:

Image('http://jakevdp.github.com/figures/xkcd_version.png')

Sometimes when showing schematic plots, this is the type of figure I want to display. But drawing it by hand is a pain: I'd rather just use matplotlib. The problem is, matplotlib is a bit too precise. Attempting to duplicate this figure in matplotlib leads to something like this:

Blogging with IPython in Octopress

A few weeks ago, Fernando Perez, the creator of IPython, wrote a post about blogging with IPython notebooks. I decided to take a stab at making this work in Octopress.

I started by following Fernando's outline: I first went to http://github.com/ipython/nbconvert and obtained the current version of the notebook converter. Running nbconvert.py -f blogger-html filename.ipynb produces a separate html and header file with the notebook content. I inserted the stylesheet info into my header (in octopress, the default location is source/_includes/custom/head.html) and copied the html directly into my post.

I immediately encountered a problem. nbconvert uses global CSS classes and style markups, and some of these (notably the "highlight" class and the <pre> tag formatting) conflict with styles defined in my octopress theme. The result was that every post in my blog ended up looking like an ugly hybrid of octopress and an ipython notebook. Not very nice.

So I did some surgery. Admittedly, this is a terrible hack, but the following code takes the files output by nbconvert, slices them up, and creates a specific set of CSS classes for the notebook markup, such that there's no longer a conflict with the native octopress styles (you can download this script here):

Why Python is the Last Language You'll Have To Learn

This week, for part of a textbook I'm helping to write, I spent some time reading and researching the history of Python as a scientific computing tool. I had heard bits and pieces of this in the past, but it was fascinating to put it all together and learn about all the individual contributions that have made Python what it is today. All of this got me thinking: for most of us, Python was a replacement for something: IDL, MATLAB, Java, Mathematica, Perl... you name it. But what will replace Python? Ten years down the road, what language will people be espousing in blogs with awkwardly alliterated titles? As I thought it through, I became more and more convinced that, at least in the scientific computing world, Python is here to stay.