Plotting in Python#
Plotting is one of the most powerful tools available to the data scientist. Humans are naturally visual, and so perhaps it should come as no surprise that data visualizations are one of the best tools that we have for both understanding our data and communicating about it to others.
In this set of readings, we will begin with an overview of the ecosystem of tools available for data visualization. We will then take a deep dive into seaborn, one of the most powerful libraries for data science visualizations currently available, and discuss its organizing philosophy.
The Plotting Ecosystem#
Few parts of the Python ecosystem can feel quite as overwhelming as the world of plotting libraries. There are two main reasons for this. The first is that there are many philosophies of data visualization, and every approach has given rise to at least one plotting library (and sometimes many more!). The second reason is a little more subtle.
When you make a plot, the software you are working with can often be divided into two distinct layers. The top layer is what we will call the user layer. This is the plotting library that you knowingly installed and whose website and documentation you would consult if you were trying to figure out how to create a certain kind of plot.
Examples of popular user layers include seaborn, plotnine, Altair, and pyplot (from matplotlib).
But whenever you make a plot, you are also relying on a bottom layer, usually referred to as the plotting backend, that is actually responsible for creating an image that can be interpreted by your operating system and displayed on your monitor.
Examples of popular plotting backends include matplotlib, Plotly, Bokeh, Vega, and D3.
That makes plotting libraries a little like apps on your smartphone. Just as some apps feel exactly the same whether you’re running iOS or Android, some plotting libraries always look the same to the user but are able to use different plotting backends to actually generate their images. Just as there are apps that will only work on iOS or only on Android, some plotting packages only work with one plotting backend. And finally, just as some apps come bundled with your phone’s operating system, some plotting utilities come bundled with plotting backend libraries.
seaborn and matplotlib#
In the following tutorials, we will learn about the seaborn library for statistical data visualization. We have chosen seaborn for several reasons.
First, seaborn is user-friendly, and because of its focus on statistical data visualization, it comes with a number of tools that make it quite easy to put together powerful and aesthetically pleasing data science visualizations.
Second, seaborn has an extremely composable interface modeled after the grammar of graphics. If you’ve ever used ggplot in R, you will find it very familiar.
Finally, in the parlance we just introduced, seaborn is a “user layer” library that is built on top of the matplotlib “plotting backend.” When it comes to plotting in Python, matplotlib is the original plotting backend. That means that its functionality is extremely well documented, it is very mature, and it has great compatibility. Moreover, matplotlib provides users with near-infinite customizability.
seaborn inherits these benefits while sparing you most of the costs (matplotlib can be very finicky to work with directly). That’s because seaborn creates matplotlib objects, so if you ever find that its high-level interface isn’t allowing you to do something you want, you can always fall back on matplotlib itself and customize to your heart’s content.
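As a minimal sketch of that fallback, the example below draws a scatter plot with seaborn and then adjusts it with ordinary matplotlib calls. The dataset, column names, and the reference line at 70 are invented for illustration. The point is only that `sns.scatterplot` returns a matplotlib `Axes` object that you can keep working with.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Small made-up dataset, purely for illustration.
df = pd.DataFrame({"hours": [1, 2, 3, 4, 5], "score": [52, 60, 71, 78, 90]})

# seaborn draws the plot and hands back a matplotlib Axes object.
ax = sns.scatterplot(data=df, x="hours", y="score")

# Everything from here on is plain matplotlib.
ax.set_title("Score vs. hours studied")
ax.set_xlabel("Hours studied")
ax.axhline(70, linestyle="--", color="gray")  # hypothetical reference line

fig = ax.get_figure()  # the enclosing matplotlib Figure, e.g. for fig.savefig(...)
```

Because `ax` and `fig` are regular matplotlib objects, anything described in the matplotlib documentation for `Axes` and `Figure` can be applied to a seaborn plot in the same way.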