Plotting with Altair

Plotting Libraries in Python

The linchpin library for plotting in Python is without a doubt matplotlib. matplotlib is extremely flexible, but for many applications it lacks user-friendly tools for quickly making common types of figures (scatter plots, linear fits, histograms, etc.). With that in mind, several other packages (most of which are built on matplotlib) have been created to provide a more user-friendly interface. Unlike matplotlib, where you have to think in terms of the geometric objects you want to place on axes, the three alternative libraries listed below allow for higher-level, more “declarative” code:

Imperative:
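A minimal matplotlib sketch of the imperative style, assuming a small made-up dataset (the column names `weight`, `height`, and `group` are hypothetical and not from the original). Note how the code spells out each step: loop over the groups, place each set of points on the axes, then label the axes and add the legend by hand.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data, for illustration only
df = pd.DataFrame(
    {
        "weight": [60, 72, 81, 55, 90, 68],
        "height": [165, 178, 183, 160, 188, 172],
        "group": ["a", "b", "a", "b", "a", "b"],
    }
)

fig, ax = plt.subplots()

# Draw one set of points per group, one scatter call at a time
for name, subset in df.groupby("group"):
    ax.scatter(subset["weight"], subset["height"], label=name)

# Axis labels and the legend are also added explicitly
ax.set_xlabel("weight")
ax.set_ylabel("height")
ax.legend(title="group")
```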

Declarative:

  • altair: altair, like plotnine, also has a clear philosophy that underlies its syntax, and seems to be relatively popular among pure Python users.

  • plotnine: plotnine is designed to replicate the syntax of the extremely popular R package ggplot2 in Python. In most cases it works wonderfully. As a result, if you are already familiar with ggplot2, plotnine is hard to beat.

  • seaborn: seaborn is also a declarative plotting library, although its syntax is a little less modular than that of plotnine or altair: for example, there is a distinct command for plotting histograms, a command for kernel densities, a command for bivariate fits, etc. It is still much easier to use than matplotlib.

Why Do We Plot Data?

Grammar of Graphics

  • Data

  • Transformations (skip for now?)

  • Mark

  • Encoding

  • Scale

  • Guide

Building a Scatterplot

Making a Plot Interactive

Saving a Plot

What’s Going on Beneath the Hood?

Altair -> Vega-Lite -> Vega -> D3 (JavaScript)

Exercises!

If you are enrolled in Practical Data Science at Duke, don’t do these exercises on your own – we’ll do them in class!

Plotting Exercises, Part 1