Plotting Exercises, Part 1

Exercise 1

Create a pandas dataframe from the “Datasaurus.txt” file using the code:

import pandas as pd
import numpy as np

pd.set_option("mode.copy_on_write", True)

df = pd.read_csv(
    "https://raw.githubusercontent.com/nickeubank/practicaldatascience"
    "/master/Example_Data/Datasaurus.txt",
    delimiter="\t",
)

Note that the file being downloaded is not actually a CSV file. It is tab-delimited, meaning that within each row, columns are separated by tabs rather than commas. We communicate this to pandas with the delimiter="\t" option ("\t" is how we write a tab, as we will discuss in future lessons).
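To see why the delimiter matters, here is a minimal sketch that parses a tiny tab-delimited string twice, once with delimiter="\t" and once without. The column names mimic the Datasaurus layout, but the values are made-up placeholders. In pd.read_csv, delimiter is an alias for sep, so sep="\t" behaves the same way.

```python
from io import StringIO

import pandas as pd

# A tiny tab-delimited snippet (placeholder values) standing in for Datasaurus.txt
raw = "example1_x\texample1_y\n1.5\t2.5\n3.0\t4.0\n"

# Telling pandas that columns are separated by tabs gives two proper columns
tabbed = pd.read_csv(StringIO(raw), delimiter="\t")
print(tabbed.shape)

# With the default comma separator, pandas finds no commas,
# so each whole line ends up in a single column
wrong = pd.read_csv(StringIO(raw))
print(wrong.shape)
```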

Exercise 2

This dataset actually contains 13 separate example datasets, each with two variables named example[number]_x and example[number]_y.

In order to get a better sense of what these datasets look like, write a loop that iterates over each example dataset (numbered 1 to 13) and prints the mean and standard deviation of example[number]_x and example[number]_y, along with the correlation between the two, for each dataset.

For example, the first iteration of this loop might print something like:

Example Dataset 1:
Mean x: 23.12321978429576,
Mean y: 98.23980921730972,
Std Dev x: 21.2389710287,
Std Dev y: 32.2389081209832,
Correlation: 0.73892819281

(Though you shouldn’t get those specific values.)
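One possible sketch of this loop is below. It wraps the loop in a function (summarize_examples is a hypothetical name, not part of the exercise) so it can be tried on a small made-up stand-in DataFrame; on the real data you would call summarize_examples(df, 13). Note that pandas' .std() computes the sample standard deviation (ddof=1) by default, and .corr() computes the Pearson correlation by default.

```python
import pandas as pd


def summarize_examples(df, n_examples):
    """Print summary statistics for example1 ... example{n_examples} and return them."""
    rows = []
    for i in range(1, n_examples + 1):
        x = df[f"example{i}_x"]
        y = df[f"example{i}_y"]
        stats = {
            "example": i,
            "mean_x": x.mean(),
            "mean_y": y.mean(),
            "std_x": x.std(),  # sample standard deviation (ddof=1)
            "std_y": y.std(),
            "corr": x.corr(y),  # Pearson correlation
        }
        print(f"Example Dataset {i}:")
        print(f"Mean x: {stats['mean_x']},")
        print(f"Mean y: {stats['mean_y']},")
        print(f"Std Dev x: {stats['std_x']},")
        print(f"Std Dev y: {stats['std_y']},")
        print(f"Correlation: {stats['corr']}")
        print()
        rows.append(stats)
    # Collecting the results in a DataFrame makes them easy to compare side by side
    return pd.DataFrame(rows)


# Made-up stand-in data that follows the same column naming scheme
demo = pd.DataFrame(
    {
        "example1_x": [1.0, 2.0, 3.0, 4.0],
        "example1_y": [2.0, 4.0, 6.0, 8.0],
        "example2_x": [1.0, 2.0, 3.0, 4.0],
        "example2_y": [8.0, 6.0, 4.0, 2.0],
    }
)
summary = summarize_examples(demo, n_examples=2)
```

Returning the statistics as well as printing them is a design choice: the printed text matches the format above, while the returned table is handy for Exercise 3.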

Exercise 3

Based only on these results, discuss with your partner what you might conclude about these example datasets. Write down your thoughts.

Exercise 4

Write a loop that iterates over these example datasets and, using the seaborn library, makes a simple scatter plot of each dataset with the x variable on the x-axis and the y variable on the y-axis.

Hint: When writing this type of code, it is often best to start by writing code to do what you want for the first iteration of the loop. Once you have code that works for the first example dataset, then write the full loop around it.

Hint 2: To force Jupyter to display your charts when they’re generated within a loop, use the method .show() (e.g. my_chart.show()).

Hint 3: You will need to change the range of the axes to make the plots look good!

Exercise 5

Review your plots. How does your impression of these datasets differ from what you wrote down in Exercise 3?

Economic Development and… Your Choice!

Exercise 6

Load the World Development Indicator data used in the plotting reading. Rather than picking a single year, pick a single country and look at how GDP per capita and one of the other variables in that dataset have evolved together over time.

Make any adjustments to the functional forms of your variables and/or axes needed to make the figure legible.

Exercise 7

Now add a second series. Facet your plot so that the two subplots effectively share the same time axis (e.g., if you draw a line up from 2010 on one plot, you reach 2010 on the other).

Rather than telling you exactly how to do it, however, I’ll point you to the seaborn tutorial. Its examples don’t do exactly what you want, but they should be close enough that you can guess and check your way to the solution!

Use your detective skills (and some guess and check work) to figure out how to get it to work!