This is a bit of a niche topic, but I figured there might be one or two people out there who would find this useful (including my future self)... today I managed to implement a simple Python object which exposes the buffer protocol. If that means nothing to you, you may want to stop reading and instead browse this gallery of puppy gifs.
But if you're the kind of person who becomes mildly excited at the words Python buffer protocol, I hope this short post will help you in your quest...
One of the first things a scientist hears about statistics is that there is are two different approaches: frequentism and Bayesianism. Despite their importance, many scientific researchers never have opportunity to learn the distinctions between them and the different practical approaches that result. The purpose of this post is to synthesize the philosophical and pragmatic aspects of the frequentist and Bayesian approaches, so that scientists like myself might be better prepared to understand the types of data analysis people do.
I'll start by addressing the philosophical distinctions between the views, and from there move to discussion of how these ideas are applied in practice, with some Python code snippets demonstrating the difference between the approaches.
It's been a few weeks since I introduced mpld3, a toolkit for visualizing matplotlib graphics in-browser via d3, and a lot of progress has been made. I've added a lot of features, and there have also been dozens of pull requests from other users which have made the package much more complete.
One thing I recognized early was the potential to use mpld3 to add interactive bells and whistles to matplotlib plots. Yesterday, at the #HackAAS event at the American Astronomical Society meeting, I had a solid chunk of time to work this out. The result is what I'll describe and demonstrate below: a general plugin framework for matplotlib plots, which, given some d3 skills, can be utilized to modify your matplotlib plots in almost any imaginable way.
There are many people pushing in this direction in the Python world. Bokeh and Plotly are new visualization packages which have built Python APIs from scratch. The demos are beautiful and impressive, and the APIs are clean and intuitive. But, because matplotlib is so well-established in the Python world, it would be nice to be able to continue using it even in the age of browser-based visualization. To this end, some of the matplotlib core devs have been working on a WebGL viewer for matplotlib figures. I've seen a working demo, and it's very cool, but last I heard it still has a long way to go.
I started to try things out, and over the course of a few late nights, came up with a first attempt at a partial interactive D3 viewer for matplotlib images: the result is the
mpld3 package, available on my GitHub page.
The inspiration of my previous kernel density estimation post was a blog post by Michael Lerner, who used my JSAnimation tools to do a nice interactive demo of the relationship between kernel density estimation and histograms.
This was on the heels of Brian Granger's excellent PyData NYC Keynote where he live-demoed the brand new IPython interactive tools. This new functionality is very cool. Wes McKinney remarked that day on Twitter that "IPython's interact machinery is going to be a huge deal". I completely agree: the ability to quickly generate interactive widgets to explore your data is going to change the way a lot of people do their daily scientific computing work.
But there's one issue with the new widget framework as it currently stands: unless you're connected to an IPython kernel (i.e. actually running IPython to view your notebook), the widgets are useless. Don't get me wrong: they're incredibly cool when you're actually interacting with data. But the bread-and-butter of this blog and many others is static notebook views: for this purpose, widgets with callbacks to the Python kernel aren't so helpful.
This is where
ipywidgets comes in.
There are several options available for computing kernel density estimates in Python. The question of the optimal KDE implementation for any situation, however, is not entirely straightforward, and depends a lot on what your particular goals are. Here are the four KDE implementations I'm aware of in the SciPy/Scikits stack:
Each has advantages and disadvantages, and each has its area of applicability. I'll start with a table summarizing the strengths and weaknesses of each, before discussing each feature in more detail and running some simple benchmarks to gauge their computational cost:
Regardless of what you might think of the ubiquity of the "Big Data" meme, it's clear that the growing size of datasets is changing the way we approach the world around us. This is true in fields from industry to government to media to academia and virtually everywhere in between. Our increasing abilities to gather, process, visualize, and learn from large datasets is helping to push the boundaries of our knowledge.
But where scientific research is concerned, this recently accelerated shift to data-centric science has a dark side, which boils down to this: the skills required to be a successful scientific researcher are increasingly indistinguishable from the skills required to be successful in industry. While academia, with typical inertia, gradually shifts to accommodate this, the rest of the world has already begun to embrace and reward these skills to a much greater degree. The unfortunate result is that some of the most promising upcoming researchers are finding no place for themselves in the academic community, while the for-profit world of industry stands by with deep pockets and open arms.
The goal of this post is to dive into the Cooley-Tukey FFT algorithm, explaining the symmetries that lead to it, and to show some straightforward Python implementations putting the theory into practice. My hope is that this exploration will give data scientists like myself a more complete picture of what's going on in the background of the algorithms we use.
By enforcing these rules in sequential steps, beautiful and unexpected patterns can appear.
I was thinking about classic problems that could be used to demonstrate the effectiveness of Python for computing and visualizing dynamic phenomena, and thought back to a high school course I took where we had an assignment to implement a Game Of Life computation in C++. If only I'd had access to IPython and associated tools back then, my homework assignment would have been a whole lot easier!
Here I'll use Python and NumPy to compute generational steps for the game of life, and use my JSAnimation package to animate the results.