The course site for the Duke MIDS Fall 2022 Practical Data Science (IDS 720) course
If you are not a Duke Masters in Data Science student, please see this page about how best to use this site!
Data Science is an intrinsically applied field, and yet all too often students are taught the advanced math and statistics behind data science tools, but are left to fend for themselves when it comes to learning the tools we use to do data science on a day-to-day basis or how to manage actual projects. This course is designed to fill that gap.
Practical Data Science is a flipped-classroom, exercise- and project-focused course. It is designed to give students practical experience manipulating and analyzing real (often messy, error-ridden, and poorly documented) data using the full range of bread-and-butter Python data science tools (like the command line, git, Python (especially numpy and pandas), Jupyter notebooks, and more). By the end of the course, students will be able to:
Manipulate and analyze data in any format, including cleaning, merging, and summarizing tabular data in all standard formats and at all levels of cleanliness, as well as working with large datasets and GIS data,
Identify and resolve data issues using defensive programming practices,
Set up and manage a data science programming environment on their own computers, including installing Python, managing packages with pip and conda, setting PATH variables, and working with VS Code,
Collaborate with colleagues effectively using git and GitHub,
Plan and execute a full data science project from planning data manipulations through analysis and presentation of findings.
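For readers unfamiliar with the term, "defensive programming" here roughly means writing your assumptions about the data into the code so that violations fail loudly instead of silently corrupting results. The following is only a minimal sketch of that idea, not material taken from the course: the tables, the `merge_one_to_one` helper, and all values are hypothetical, and plain Python is used so the example has no dependencies.

```python
# Hypothetical data: the names and values below are made up for illustration.
students = [
    {"student_id": 1, "name": "Ana"},
    {"student_id": 2, "name": "Ben"},
    {"student_id": 3, "name": "Chen"},
]
grades = [
    {"student_id": 1, "grade": 91},
    {"student_id": 2, "grade": 84},
    {"student_id": 3, "grade": 77},
]


def merge_one_to_one(left, right, key):
    """Merge two lists of records on `key`, failing loudly on bad input."""
    left_keys = [row[key] for row in left]
    right_keys = [row[key] for row in right]

    # Defensive checks: state assumptions about the inputs explicitly.
    assert len(left_keys) == len(set(left_keys)), f"duplicate {key} in left table"
    assert len(right_keys) == len(set(right_keys)), f"duplicate {key} in right table"
    assert set(left_keys) == set(right_keys), f"{key} values differ across tables"

    right_by_key = {row[key]: row for row in right}
    merged = [{**row, **right_by_key[row[key]]} for row in left]

    # Check the result too: a one-to-one merge should not change the row count.
    assert len(merged) == len(left), "merge changed the number of rows"
    return merged


merged = merge_one_to_one(students, grades, "student_id")
```

If a duplicate or missing `student_id` slipped into either table, the function would stop with an `AssertionError` at the merge step rather than returning a result with extra or dropped rows.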
The full syllabus for this course can be downloaded here. Please note that this syllabus is subject to change up until the first day of class.
Questions or comments?
Please let me know! All source files (and underlying Jupyter notebooks) for this site can be found on GitHub. You can raise problems or suggestions there by creating a new issue, or you can email me at firstname.lastname@example.org.