Practical Data Science in Python

Data science is an intrinsically applied field, yet all too often students are taught the advanced math and statistics behind data science tools and then left to fend for themselves when it comes to learning the tools we use to do data science day to day, or learning how to manage real projects. This site aims to fill that gap.

This site is designed to support several Duke Interdisciplinary Data Science courses, all of which take the form of flipped-classroom, exercise- and project-focused courses. These courses, and the material on this site, are designed to give students practical experience manipulating and analyzing real (often messy, error-ridden, and poorly documented) data using the full range of bread-and-butter Python data science tools (like the command line, git, Python (especially numpy and pandas), Jupyter notebooks, and more).

The courses supported by these materials are:

  • Practical Data Science I (IDS 590): The best choice for most Duke students. This course requires zero prior experience with programming and begins with an introduction to Python, computational thinking, and the principles of good programming using the 7 Steps method. The focus of the class then shifts to data analysis, with an emphasis on the types of analyses of interest to social scientists and public policy students.

  • Practical Data Science II (IDS 591): Building on the computational thinking skills developed in Practical Data Science I, this course introduces students to a range of methods of computational inquiry, including network analysis, geospatial analysis, and natural language processing (NLP). Throughout, the focus will be on developing hands-on experience implementing these methods with messy real-world data to ensure students are prepared to deploy these tools to answer the questions they care about. Requirements: Practical Data Science I, Intro Statistics.

  • Practical Data Science (MIDS) (IDS 720): A one-semester version of Practical Data Science specifically tailored to Masters of Interdisciplinary Data Science (MIDS) students. Because all MIDS students complete a mandatory, 4-week, in-person, intensive summer Python programming bootcamp in August before the start of classes, this class assumes a strong foundational understanding of the Python standard library. Since this class lasts only one semester and all MIDS students take a full NLP class, it skips some topics covered in Practical Data Science II, such as NLP and GIS.

The fact that Practical Data Science I & II are 500-level courses while Practical Data Science (MIDS) is a 700-level course does not reflect a difference in rigor! IDS 720 was created with MIDS students in mind, so we scheduled it as a 700-level (“graduate student only”) course. To make the same material accessible to advanced undergraduates, we chose to schedule Practical Data Science I & II as 500-level courses so that both graduate and undergraduate students can enroll.

For more on each class, click the appropriate class link in the left-hand menu!

This content is also a major component of the Python Data Science Foundations Specialization on Coursera, created by the authors of this site along with Drew Hilton and Genevieve Lipp.

Are You An Instructor Using This Material In A Class?

Feel free to reach out, and we can provide you with some additional resources.

Questions or comments?

Please let me know! All source files (and underlying Jupyter notebooks) for this site can be found on GitHub, and you can raise issues by creating a new issue there or by emailing me at nick@nickeubank.com.