Welcome to the Command Line Basics Exercises!

In this exercise we’re going to practice navigating and exploring files and folders from the command line by looking at some data from New York City’s 311 system. 311 is a hotline set up by the city of New York for reporting non-emergency issues. It takes calls about all sorts of problems, from noise complaints and broken street lights to restaurant hygiene violations and rodent sightings.

You can find the 311 data we’ll be working with in a zipped file called NYC_311calls_2018.zip here. Please download the file and place it somewhere easy to remember (desktop, downloads, etc.).

Exploring Files

Once you’ve made NYC_311calls_2018 your working directory, use ls to look at what’s in the folder. What you see should look something like this:

$ ls
311_SR_Data_Dictionary_2018.xlsx   NYC311_column_names.txt            raw data
CE-20170824.pdf                    README.md                          ~$311_SR_Data_Dictionary_2018.xlsx

Up until now, we’ve just been moving around at the level of the filesystem, seeing file names but not their contents. But if a file is a plain text file, we can also look at its contents. There are a few ways to do this, but the two most commonly used are cat (which prints the contents of the file to your screen) and less (which opens a small program that lets you read through the document in a controlled manner). cat is quicker, but if you use cat with a big file, the whole file will print out to your screen at once and you’ll end up overwhelmed (though you’ll be fine with the small file here).
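To see the difference for yourself without touching the exercise data, here is a minimal sketch that uses a throwaway file. The file name demo.txt and its contents are made up for illustration, and head and wc are standard commands that are not otherwise covered in this exercise:

```shell
# Work in a temporary directory so nothing in your real folders is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Create a small plain text file with 100 numbered lines (made-up content).
seq 1 100 | sed 's/^/line /' > demo.txt

# cat prints the entire file to the screen, all 100 lines at once.
cat demo.txt

# wc -l counts lines, a quick way to judge whether cat is a good idea.
line_count=$(wc -l < demo.txt | tr -d ' ')
echo "demo.txt has $line_count lines"

# head -n 3 prints only the first three lines.
first_lines=$(head -n 3 demo.txt)
echo "$first_lines"

# less is interactive, so try it by hand: less demo.txt (press q to quit).
```

If a file turns out to have thousands of lines, less or head is usually the kinder choice.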

Exercise 3

Do as the README.md suggests and read it, first with the command cat README.md, then with the command less README.md (press q to quit when you’re done).

Exercise 4

Now let’s do the same with CE-20170824.pdf. If less asks you a question, just type y.

What happened?! Unfortunately, CE-20170824.pdf was not a plain text file, but instead is what is referred to as a binary file. This distinction between plain text files and binary files will come up a lot, so let’s discuss it briefly.

The names “plain text” and “binary” are misnomers, since everything on your computer is stored as 1s and 0s (i.e. binary), so don’t read too much into the names. The actual distinction between plain text and binary files is what those 1s and 0s are meant to represent. In a plain text file, the 1s and 0s just encode numbers and letters based on simple, commonly used codes (like ASCII or Unicode). Not only do these files not contain anything complicated (pictures, media, etc.), they don’t even include information like fonts or formatting. But this simplicity makes them easy for even simple systems to read, which is why they’re so widely used in programming.

In a binary file, by contrast, the 1s and 0s encode much more complicated information. In this case, CE-20170824.pdf is a PDF file that includes images, different fonts, careful formatting, etc. As a result, it can only be opened by a PDF reader (like Preview or Adobe Reader) that knows how to interpret the file’s complicated content. If you open it with less, less tries to treat the 0s and 1s as if they were encoding simple letters and numbers, but since they aren’t, the result is just gobbledygook.
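If you’d like to peek at the raw bytes yourself, the sketch below builds two tiny stand-in files and prints their bytes with od, a standard tool not otherwise used in this exercise. The file names and contents are made up for illustration; binary_demo.bin is not a real PDF, just a few bytes that don’t correspond to readable characters:

```shell
# Work in a temporary directory so nothing in your real folders is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir"

# A plain text file: every byte is a printable character or a newline.
printf 'hello\n' > text_demo.txt

# A stand-in "binary" file: the bytes 0, 255, 1 and 254 (written in octal)
# do not correspond to readable letters or digits.
printf '\000\377\001\376' > binary_demo.bin

# od -c prints each byte, as a character when it is printable and as an
# octal code when it is not.
od -c text_demo.txt
od -c binary_demo.bin

# Both files are just bytes on disk; wc -c counts them.
text_bytes=$(wc -c < text_demo.txt | tr -d ' ')
binary_bytes=$(wc -c < binary_demo.bin | tr -d ' ')
echo "text: $text_bytes bytes, binary: $binary_bytes bytes"
```

Both files are stored the same way. What differs is whether the bytes map onto characters a text viewer like less can display.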

Exercise 5

Just because it’s not plain text doesn’t mean we don’t want to know what it is! So let’s use the open command (this is the macOS command; on most Linux desktops the equivalent is xdg-open). open FILENAME just asks your computer to do whatever it would do if you double-clicked on FILENAME. So if you type open CE-20170824.pdf, your computer will open the PDF in your default PDF reader.

Exercise 6

OK, so CE-20170824.pdf is just a paper someone wrote using this data. Since the name CE-20170824.pdf doesn’t tell us anything about this paper, let’s rename it using the mv command. Recall from DataCamp that mv stands for move, but that while it is moving files it can also rename them. If you “move” something from its current location back to its current location but with a different name, you’ve effectively renamed it! So try renaming CE-20170824.pdf to something more descriptive.
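As a quick illustration of renaming with mv, here is a minimal sketch on a throwaway file. The names old_name.txt and descriptive_name.txt are made up for the example, so you still get to pick your own name for the PDF:

```shell
# Work in a temporary directory so nothing in your real folders is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Create an empty stand-in file with an uninformative name (hypothetical).
touch old_name.txt

# "Move" it within the same folder under a new name: this is a rename.
mv old_name.txt descriptive_name.txt

# Only the new name should show up in the listing.
ls
```

Note that mv will silently overwrite an existing file that already has the new name, so it is worth running ls first to check.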

Organizing Files

Up till now, we haven’t done anything that wouldn’t have been easier to do using a mouse and a regular graphical user interface. But now let’s suppose we want to analyze the data from 311 calls placed on Thursdays and Fridays to see if city workers are less likely to address problems that are reported on Fridays.

In your normal operating system GUI, open up the raw data folder inside NYC_311calls_2018. As you will see, the folder is full of CSVs (comma-separated values, a plain-text format for storing spreadsheets), with one file for each day.

Exercise 7

Without using the command line (or another programming language), how would you pull out all the files for Thursdays and Fridays and move them to a new folder? Would your strategy work if you had 10 years of data instead of 1 year of data?

Exercise 8

One of the advantages of the command line is that you can use wildcards (the * symbol) to identify all the files whose names match a given pattern. For example, if I wanted to list all the CSV files in raw data from February, I would (from inside the raw data folder) type ls 311calls_2018_2_*.csv, since all the files from February (month 2) have the same prefix (311calls_2018_2_) and suffix (.csv). Now, using the mv command and the * symbol, move all the Thursday and Friday files to a new folder. (Hint: you’ll probably need to make a new folder to put the files into first.)
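Here is a minimal sketch of the wildcard-and-move pattern, using the February example rather than the Thursday and Friday files so the exercise is still yours to solve. It runs on made-up stand-in files: only the prefix and suffix come from the example above, while the part of each name after the month and the february folder name are assumptions for illustration. It also shows that a folder name containing a space, like raw data, needs quotes:

```shell
# Work in a temporary directory so nothing in your real folders is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir"

# A folder name containing a space must be quoted (or the space escaped).
mkdir "raw data"
cd "raw data"

# Made-up stand-in files: two from January, two from February.
touch 311calls_2018_1_30.csv 311calls_2018_1_31.csv
touch 311calls_2018_2_1.csv 311calls_2018_2_2.csv

# The * matches any run of characters, so this lists only February files.
ls 311calls_2018_2_*.csv

# Make the destination first, then move every matching file in one command.
mkdir february
mv 311calls_2018_2_*.csv february/

# Count what moved and what stayed behind.
moved_count=$(ls february | wc -l | tr -d ' ')
left_count=$(ls *.csv | wc -l | tr -d ' ')
echo "moved: $moved_count, left behind: $left_count"
```

A good habit is to run ls with a pattern before running mv with the same pattern, so you can see exactly which files will be affected.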