In recent months, a host of new tools and packages have been announced for working with data at scale in Python. For an excellent and entertaining summary of these, I'd suggest watching Rob Story's Python Data Bikeshed talk from the 2015 PyData Seattle conference. Many of these new scalable data tools are relatively heavy-weight, involving brand new data structures or interfaces to other computing environments, but Dask stands out for its simplicity. Dask is a light-weight framework for working with chunked arrays or dataframes across a variety of computational backends. Under the hood, Dask simply uses standard Python, NumPy, and Pandas commands on each chunk, and transparently executes operations and aggregates results so that you can work with datasets that are larger than your machine's memory.
In this post, I'll take a look at how dask can be useful when looking at a large dataset: the full extracted points of interest from OpenStreetMap. We will use Dask to manipulate and explore the data, and also see the use of matplotlib's Basemap toolkit to visualize the results on a map.