# Setting Up Your Computer for Data Science¶

One of the major learning goals of this class is for you to be comfortable managing all the software and settings required for you to do data science on your own computer.

Why deal with all the headaches of setting up your own environment, you may ask? Why not just use a cloud platform like Google Colab or a virtual machine with everything already set up?

Getting data science tools installed and working together is, for better or worse, a pretty core part of the day-to-day life of data scientists, and learning how to troubleshoot problems quickly is an important skill for being productive in the profession. But it is a skill that takes time and energy to learn, and so in most classes — which want to focus on teaching topics like statistical analysis or programming concepts — instructors choose to provide students with clean, ready-to-use environments so everyone can focus on those topics. For example, if the MIDS Python Bootcamp included a module on setting up Python environments instead of providing you with a clean virtual machine, you’d probably end up learning ~25% less programming!

But the problem with this approach is that if every course you take pursues this strategy, you may find that you don’t feel empowered to go do data science yourself when those clean VMs are taken away at the end of the semester. Moreover, it means you may not know enough about how data science tools work to debug problems on your own when they come up.

So in this course, we’re going to address environment setup head-on. That will probably mean you’ll get a little annoyed at the fragility of many of these tools, and you may get frustrated spending hours trying to find a setting that got set wrong (though we’ll try to minimize these experiences!), but try to think of this time not as wasted, but instead as part of your data science education!

## What We’ll Be Setting Up¶

To set ourselves up for this course (and hopefully our careers!), we’ll need to set up the following things:

• Python and the conda package manager: This is a Python-centric course, so the first thing we’ll need to do is install Python and a robust, data-science-appropriate package manager.

• Visual Studio Code (VS Code): In this course, we’ll become familiar with two editors: VS Code and Jupyter Lab. Both are extremely common in data science, and while you may eventually decide you prefer one or the other, we’ll spend time introducing both. But the first one we’ll set up is VS Code, because it is super useful for a huge range of not just data science tasks, but also environment setup.

• Augmented Command Line: As a data scientist, you’ll spend a lot of time working at the command line, so it’s a good idea to invest a little in setting up something more advanced than the default command line tool offered by your operating system (e.g., Terminal/CMD Prompt/Powershell). In addition, this will give us a chance to learn a little about how the command line works, which will be really important to effective troubleshooting.

## Installing Python with Miniconda¶

The first thing you’ll likely want to do on any computer you work with is install both Python and the package manager conda. This is necessary because unlike a language like R where you can install packages with the install.packages() command, Python doesn’t have an internal tool for installing packages. This means that we need a tool like conda if we want to use anything other than vanilla Python (e.g., tools for plotting, numpy, pandas, etc.).

Python has two main package managers: pip and conda. While most software engineers use pip, most data scientists like conda. That’s because while pip is good at installing Python libraries, conda is better at installing many of the big dependencies that underlie data science tools. Plus, if we install conda, it will come with pip, so we get the best of both worlds!

### Why Miniconda?¶

So the first thing we need to do to get started with Python is go to the Miniconda download page and download the most recent installer for our system (as of July 9, 2021, that’s Python 3.9).

Note that there are actually two well-known ways to get conda on your system — installing Anaconda from anaconda.com, and installing Miniconda from docs.conda.io. It is my strong recommendation that you use Miniconda. That’s because if you install Anaconda from anaconda.com, you get not only Python and the conda package manager, but also dozens of pre-loaded packages. And while that sounds great, the reality is that it tends to cause lots of package conflicts once you start adding anything new to your installation. Miniconda, as the name implies, is the “mini” version of the Anaconda package, and basically only includes Python and a couple core utilities (conda, pip, etc.). As a result, a Miniconda installation is much less likely to cause package conflict problems down the road.

If you already have a conda installation: My recommendation is to delete it and start fresh. Deleting your Python installation can feel scary once you’ve set stuff up, but you don’t want to get in the practice of being too precious about your Python installations, as you’ll often have to just delete it all to deal with software conflicts.

Thankfully, deleting Anaconda/Miniconda is easy — just delete the miniconda3 / anaconda3 folder you created during installation! The great thing about conda is that everything lives in that folder, so you can easily delete it and start fresh!

### Installation¶

1. Go to the miniconda install page.

2. Download a 64-bit version of Miniconda. The latest Python 3.x package is probably best. On a Mac, go with the pkg installer.

3. Run the installer, paying attention to the following options:

1. When you’re asked where to install the software, you want to install it “For me only,” not “Install for all users of this computer.” Note that as of July 2021, you may find the “For me only” option has a warning saying you can’t install there, but if you click a different option then click on the “For me only” option again, the warning goes away.

2. On Windows, you’ll be asked if you want to add Miniconda to your PATH variable. Although it recommends that you do not do this, DO add it to your PATH. This will be important when we change how our command line works.

4. Miniconda is installed!

Why did we want to install it “for me only” in step 3? To install software for all users, you have to install software at the level of your operating system so it’s visible to all users. And your computer is very protective of anything installed at the level of the operating system because of the dangers of computer viruses, so anything installed there can run into “permission” problems when it tries to run. Anything installed “for me only” gets installed in your user folder which your computer is less paranoid about, leading to fewer problems.

Now we also want to change one setting in Miniconda. When you install packages using conda, conda can actually pull from a number of different package repositories (called “channels”). The default for this is the “anaconda” channel, but the best channel is actually called conda-forge, so to set that as the default:

• Open the default command line on your computer (on a Mac, it’s Terminal in Applications > Utilities; on Windows, you can use PowerShell), and run the following two commands:

• conda config --add channels conda-forge (you may be told you already have it listed)

• conda config --set channel_priority strict

And we’re done!

## Installing VS Code¶

VS Code is kinda stupid-easy to install — just download it here!

Then I’d recommend learning how to integrate your Python installation with VS Code. To do so, go check out this video tutorial I made here (you’ll probably want to skip the first few sections on Miniconda and VS Code installation and jump to setup at about minute 7).

Then, if you’re on a Mac, we want to do one more important thing: set up your system so that if you type code [filename] on the command line, VS Code will open [filename] (this gets set up automatically with Windows, but requires a deliberate step on Macs):

• In VS Code, type Command-Shift-P. This will cause the Command Pallet to open at the top of your open window.

• Type Shell Command: Install code command in PATH and open that.

• Wait for confirmation.

### Set terminal.integrated.inheritEnv to False in VS Code¶

In order to ensure that the integrated terminal in VS Code works properly with Miniconda, we have to modify one setting. To do this, in VS Code, go down to the bottom-left corner and click on the gear icon, select “Command Pallet…,” and type “Preferences: Open Settings (JSON)” (note the (JSON) at the end). That should bring you to a relatively blank document. Between the curly braces, add the following:

"terminal.integrated.inheritEnv": false,


So that your file looks something like:

(Note the commas at the end of each line). If you forget the step VS Code will try to prompt you to set it later with this notification:

So if you see that notification make sure to select yes.

## Set Up an Augmented Command Line¶

The last thing we’re gonna do to fully set up our environment is install and configure a better command line tool. How we do this depends on your operating system, though, so please follow the appropriate link below: