Git and Github, Part 1

In these exercises, we’ll be practicing creating and contributing to projects using the “Github Flow” workflow you can find summarized here. Note this is not the only workflow out there – different companies or groups will often develop their own modification of this basic strategy – but most are pretty similar to this approach.

In these exercises, for the first time we’ll want to use the computers of BOTH people working in each pair, so label one computer as Computer A, and one as Computer B.

Part 0: Change your default editor!

By default, when git needs you to label a commit, it will open an editor called vim that is very hard to work with (it’s not just weird – it’s even hard to get out of once it’s openned!)

So before you do anything else, first change your default editor. You can do this with the command:

git config --global core.editor "COMMANDTOLAUNCHYOUREDITOR"

If you have VS Code installed and the command line tool setup (i.e. you set up VS Code so that if you type code in the command line, VS Code opens, as discussed here), then I would do:

git config --global core.editor "code --wait"

If not, then the best option is probably to use nano, which is an in-terminal editor we’ve discussed before that’s much easier to use than vim, and which you should already have if you’re using a bash console:

git config --global core.editor "nano"

Part 1: Making a New Repo!

To begin, we will create a repository where we will use data from the World Development Indicators to quickly plots GDP per capita against infant mortality.

(1) On Computer A, go to github and create a new repository “GDP_and_CO2”. When creating the repository, also make sure to add a README file.

(2) Go to Settings and add the user of Computer B as a collaborator. That will empower you to access and work with the data in the repository from either computer.

(3) Create a local copy of the repository on your Computer A using git clone and the URL provided for cloning your repository. Note that you cannot just use the URL for your repository’s webpage! If you use that, when you push and pull you’ll be trying to push and pull to a webpage, not to a git server. The URL you need for cloning should end in .git.

Note: Cloning a repository on github is one of two ways of getting a repository setup on your computer. The other option is to make a folder on your computer, make it a git repo with the command git init, add some files and commit then, go to github, create an empty repository, add that repository’s URL to as the “remote” address in the repository on your computer using git remote add origin www.github.com/[path to your repo].git, then pushing. This is good to know in case you already have a repo with a lot of files on your computer. But if you’re starting a new, empty repo, it’s much easier to use git clone, since cloning also ensures that your repo knows the URL of the github repository it should connect to when doing git push and git pull.

Note: The folder you have now cloned is your git repository. Everything from the level of the folder you just cloned down is part of that repository, so don’t put anything into that folder you don’t want in the repository. And don’t put other repositories into that repository!

(4) Now that you’ve cloned your repository, on Computer A edit the README.md file by adding some text about the project. Then git add those changes, git commit, and push. Check on github to ensure that your edits are evident there.

Note you can edit your README.md file any way you want! You can use Atom, nano, RStudio, VS Code, or any other editor. This is a critical thing to understand about git and github: it is a system for managing your files, not for managing your entire workflow. You still edit and run your code the way you always have. In this case, README.md is just a text file on your computer you can work with like any other text file.

Part 2: Making a Pull Request

As you’ve read, a common way to work with a github repository is to create branches where you develop new code. Creating a branch, as the name implies, is a way of creating a version of your repository that branches off from the master branch at a point. Once created, a branch exists independently of the master branch so that you can make edits to your code without worrying about interfering with anyone else who wants to use the master branch. This is useful in software (since people may be running the master branch), but it’s also often useful in data science (where you may be experimenting with changing how variables are defined, and won’t want that to impact other people till you’ve decided what you want to do.

Once you’re done working on a branch, you may decide that the changes you’ve made are amazing, and you want to merge those changes back into the master branch (you also may not! It’s always ok to throw away a branch that didn’t work! Branches are for experimenting safely).

In order to ensure that only good code ends up in the master branch, it is common practice when working on teams to not just merge you branch back into the master branch. Instead, you submit a “Pull Request” on github, which is basically a request (on github) for other people to come review your code. Pull requests, in other words, are primarily a mechanism for peer review. So let’s make our first pull request (common referred to as a PR)!

(5) Still on Computer A, let’s create a new branch we can experiment on. We can do this in one of two ways:

  • We can type git branch <new_branch_name> (to create the branch), then git checkout <new_branch_name> to switch to that branch, or

  • We can type git checkout -b <new_branch_name> to both create a new branch and switch to that branch.

“Switching” branches means that git (potentially) changes what files are visible to your computer and operating system. Because our new branch is, well… new, it still looks exactly the same as our master branch, so you won’t see any changes in your files. If you’re using Cmder on windows or Oh-My-Zsh on a Mac, though, you can see what branch your on by looking at your command line prompt. It should move from showing either “master” or “main” to your new branch name (here: “my_new_branch”) when you switch:

** Note: the primary branch of a github repo USED to be called master, but as of late 2020 the default is not main, so you’ll likely see both for a while!**

branch_change_in_terminal

Now any changes you make and commit to the files in your repository will be changes made to the active branch, not to master.

(6) Still on Computer A, let’s start writing some analysis code! Before we start, though, an important note: Git is only really build for simple, plaintext files, so please don’t use Jupyter Notebooks (we’ll discuss that in more detail below). You can use text files you edit and work with in VS Code, but not notebooks. For a refresher on how to use text files in VS Code (in a manner analogous to RStudio) see Jupyter Exercises.

OK, so now let’s create a regular Python (.py) analysis file in our repository. If you’re using Jupyter Lab, just create a new textfile, save it into the repository folder, then get to work (right-click the text file, select “Create Console for Editor”, then select Python 3 to create a kernel where you can run your code as you work).

In particular, import World Develoment Indicators from this url using pd.read_csv: https://media.githubusercontent.com/media/nickeubank/MIDS_Data/master/World_Development_Indicators/wdi_small_tidy_2015.csv

Then pull out the columns:

  • ‘Mortality rate, infant (per 1,000 live births)’

  • ‘GDP per capita (constant 2010 US$)’

  • ‘Country Name’

and plot mortality against GDP per capita.

(7) Once you’re done, add and commit the new analysis file (from in your command line session). Then push these files to github.

When you try and push them to github, you will be told that you can’t push them because there’s no branch on github to receive the push. That’s because you created your new branch locally, but it doesn’t exist yet on github. Follow the directions for telling github to make a branch to receive your push, then push again.

(8) Now in your browswer, navigate to the repository on github. There, switch over to your new branch:

switch_to_new_branch

And look for the “Pull Request” button:

new_pr_button

(9) This should take you to a Pull Request screen where you can title your PR and add comments. In the comments, ask the person who has Computer B to review your PR (you can use the @ and their user name to address them. So type @otheruser would you please review this?. Then click “Submit Pull Request”.

Part 3: Reviewing a Pull Request

(10) Now on Computer B, navigate to the repository from your browser. You should now be able to see your partner’s PR.

From Computer B, review the PR from Computer A. You can do this on two levels:

  1. Click on the “Files Changed” tab to see what code is being added or changed in this PR. You can also add comments here: click the plus sign that appears when you mouse over a number and leave a comment about a specific line of code (like “you need more comments to explain this code!”).

  2. You can also pull this branch so you can run it on your own computer.

(11) Let’s try the second startegy. On Computer B, clone the repository.

(12) Navigate (from the command line) into the now cloned repository. By default, you should be on the master branch.

Remember when I said that switching branches changes the files visible to the operating system? Open your repository folder using your operating system (File Explorer or Finder) and look in it. If you’re on the master branch, you won’t see the new analysis file.

(13) Now, from the command line, swith to the branch you’re reviewing for the PR (it should already be there, so you just have to do git checkout <branch_name>.

(14) Now look at the folder using File Explorer or Finder again – see how the analysis file has appeared? Open it in Jupyter Lab on Computer B and see if you can get it to run.

(15) Now go back to github and the PR, and suggest one change for the person who wrote the original code to make.

Part 4: Revising a Pull Request

Updating a PR is no different from changing any project in github – just open your files, edit your code, then add, commit, and push your changes, and the changes manifest in the PR.

(16) On Computer A, make the changes suggested by the Computer B reviewer. Add, commit, and push those changes.

(17) On Computer B, look to see that those changes are present, then click “Merge this PR”! Now if you look at the master branch of your repository, you should see that the analysis file now appears on master.

Congratulations!

You’ve completed your first full git / github cycle!

Now move on to Part 2 for some more exercises.