# MIDS Elective Suggestions

This page includes some guidance on how to think about choosing electives as well as a summary of electives that have been taken by MIDS students in the past.

To be clear, the summary of electives below is NOT the full list of all electives available at Duke or even the full list of electives MIDS students have taken in the past — MIDS students have taken dozens of different courses as electives over the years! This is only meant to provide you a sense of some of the most popular courses and the areas in which you may wish to investigate electives!

It’s also worth emphasizing that older MIDS students are the **best** resource on electives: they’ve taken many of these courses, and so can speak to things like instructor quality and class workload!

**Data science is a quickly changing field, so courses are constantly being added, removed, and changed! Please speak to your peers and faculty for their advice on electives and always check course syllabi since they often change from year to year.**

## How To Think About Electives

As you are deciding what electives to take, it can be helpful to start by thinking about your goals. To illustrate, here are a few common goals:

- **Improving some foundational skills or knowledge:** Feel like you want to improve your foundational programming, math, or statistics skills? Electives are a *great* opportunity for that, especially because many people prefer to learn this kind of material in a university class rather than on their own later.
- **Get a pre-requisite done:** If there are specific classes you *really* want to take, check whether they have prerequisites you need to complete before you can enroll. Many statistics courses at Duke, for example, require you to take an Introduction to Bayesian Statistics course, so even if you don’t want to take that *particular* class, you may find it useful so you can take other stats department classes.
- **See if you like a substantive domain/type of data science:** Data science is an *extremely* diverse field, and electives are an opportunity to try on different data science hats to see which fits best. Curious about biomedical applications, or finance, or public policy? Try a data science class in that specific area and see if it resonates with you!

## Most Popular Electives

The following is a list of the most popular electives taken by MIDS students from 2020-2022 (based on number of students who have taken each class, ordered by past enrollment):

IDS 721: Data Analysis at Scale in Cloud

## Course Description

Data Analysis at Scale in the Cloud is a project-based course with extensive hands-on assignments. This course is designed to give students a comprehensive view of cloud computing including Big Data and Machine Learning. A variety of learning resources will be used including interactive labs on Cloud Platforms (Google, AWS, Azure).

ECE 661: Computer Engineering Machine Learning and Deep Neural Nets

## Course Description

This course examines various computer engineering methods commonly performed in developing machine learning and deep neural network models. The focus of the course is on how to improve the training and inference performance in terms of model accuracy, size, runtime, etc. Techniques that are widely investigated and adopted in industrial companies and academic communities will be discussed and practiced. Programming practices on these techniques are designed with heavy utilization of the PyTorch package. Prerequisites: Computer Science 201 or ECE 551D or ECE 751D.

STA 602: Bayesian Statistical Modeling and Data Analysis

## Course Description

Principles of data analysis and modern statistical modeling. Exploratory data analysis. Introduction to Bayesian inference, prior and posterior distributions, hierarchical models, model checking and selection, missing data, introduction to stochastic simulation by Markov chain Monte Carlo using a higher level statistical language such as R or Matlab. Applications drawn from various disciplines. Not open to undergraduate students or students who have taken Statistical Science 360. Recommended prerequisite: Statistical Science 611 or the following: Statistical Science 210 and (Statistical Science 230 or 240L) and (Mathematics 202, 202D, 212, or 222) and (Mathematics 216, 218, or 221, any of which may be taken concurrently).

ECE 685D: Intro to Deep Learning

## Course Description

Provides an introduction to the machine learning technique called deep learning or deep neural networks. A focus will be the mathematical formulations of deep networks and an explanation of how these networks can be structured and ‘learned’ from big data. Discussion section covers practical applications, programming, and modern implementation practices. Example code and assignments will be given in Python with heavy utilization of the PyTorch (or TensorFlow) package. The course and a project will cover various applications including image classification, text analysis, object detection, etc. Prerequisite: ECE 580, ECE 681, ECE 682D, Statistical Science 561D, or Computer Science 571D.

BIOSTAT 823: Statistical Programming for Big Data

## Course Description

This course will extend the foundation laid in software tools for data science to allow for efficient computing involving very large data sets. This course will explore the use of appropriate algorithms and data structures for intensive computations, improving computational performance by use of native code compilation, use of parallel computing to accelerate intensive computations, use of appropriate algorithms and data structures for massive data sets, and use of distributed computing to process massive data sets. Prerequisite: BIOSTAT 821 or permission of the Director of Graduate Studies. Credits: 2

ECE 590-1: Theory and Practice of Algorithms

## Course Description

This course ties the mathematical theory of algorithms and graphs to their practical implementations. Students will learn about the mathematical structures that form the foundations for the behavior and analysis of algorithms from a variety of domains, with a particular emphasis on graphs. Students will also tie that theory to practice by writing code to implement those algorithms, and comparing experimentally observed runtimes to those projected by the mathematical theory.

MATH 641: Probability

## Course Description

Designed to be a sequel to Statistical Science 711. The basic five topics are: martingales, Markov chains from an advanced viewpoint, ergodic theory, Brownian motion and its applications to random walks, Donsker’s theorem and the law of the iterated logarithm, and multidimensional Brownian motion, connection to PDEs. For those who have not had 711, we will prove the law of large numbers using martingales and obtain versions of the central limit theorem from Donsker’s theorem. Course requires a knowledge of measure theory. Prerequisite: Statistical Science 711 or Mathematics 631.

STA 663L: Statistical Computing and Computation

## Course Description

Statistical modeling and machine learning involving large data sets and challenging computation. Data pipelines and databases, big data tools, sequential algorithms and subsampling methods for massive data sets, efficient programming for multi-core and cluster machines, including topics drawn from GPU programming, cloud computing, Map/Reduce and general tools of distributed computing environments. Intense use of statistical and data manipulation software will be required. Data from areas such as astronomy, genomics, finance, social media, networks, neuroscience.

BIOSTAT 821: Software Tools for Data Science

## Course Description

A data scientist needs to master several different tools to obtain, process, analyze, visualize and interpret large biomedical data sets such as electronic health records, medical images, and genomic sequences. It is also critical that the data scientist masters the best practices associated with using these tools, so that the results are robust and reproducible. The course covers foundational tools that will allow students to assemble a data science toolkit, including the Unix shell, text editors, regular expressions, relational and NoSQL databases, and the Python programming language for data munging, visualization and machine learning. Best practices that students will learn include the Findable, Accessible, Interoperable and Reusable (FAIR) practices for data stewardship, as well as reproducible analysis with literate programming, version control and containerization. Prerequisite: Permission of the director of graduate studies.

ECE 551D: Programming, Data Structures, and Algorithms in C++

## Course Description

Students learn to program in C and C++ with coverage of data structures (linked lists, binary trees, hash tables, graphs), Abstract Data Types (Stacks, Queues, Maps, Sets), and algorithms (sorting, graph search, minimal spanning tree). Efficiency of these structures and algorithms is compared via Big-O analysis. Brief coverage of concurrent (multi-threaded) programming. Emphasis is placed on defensive coding, and use of standard UNIX development tools in preparation for students’ entry into real world software development jobs.

## Electives by Topic Area

Below is a list of electives by topic area. Again, this is **not** an exhaustive list of electives you *can* take, or even an exhaustive list of electives MIDS students *have* taken; these are just some classes and topic areas to get you thinking!

### Bayesian Statistics

POLSCI 643S: Applied Bayesian Modeling

## Course Description

This course covers the theoretical and applied foundations of Bayesian statistical analysis. It introduces the logic of Bayesian inference, the idea of regularization, the role of subjective priors, the likelihood, and the posterior distribution. We will discuss model checking and model comparison. Applied Bayesian models include hierarchical models, factor analysis and item response theory models, treatment effect models, and generalized additive models. Throughout the course, we will focus on the flexible modeling of data arising in social/political science, as well as in public health. We will also pay close attention to the presentation and interpretation of substantive results.

STA 601L / STA 602L: Bayesian Statistical Modeling and Data Analysis

## Course Description

Principles of data analysis and modern statistical modeling. Exploratory data analysis. Introduction to Bayesian inference, prior and posterior distributions, predictive distributions, hierarchical models, model checking and selection, missing data, introduction to stochastic simulation by Markov chain Monte Carlo using a higher level statistical language such as R or Matlab. Applications drawn from various disciplines.

BIOSTAT 724: Introduction to Applied Bayesian Analysis

## Course Description

This is a first course in Bayesian statistical analysis for graduate students in biostatistics. The fundamentals of Bayesian inference are introduced, including Bayes’ Theorem and prior and posterior distributions. Bayesian inference is compared and contrasted with frequentist methods through application to common problems in biostatistics. Inference based on conjugate families, as well as a computation-based introduction to Markov chain Monte Carlo methods is presented. Bayesian regression models are introduced, including model checking and selection, followed by an introduction to Bayesian hierarchical regression models. The course format emphasizes applied data analysis and is more heavily weighted toward heuristics and computation-based exploration of Bayesian methods rather than an intense mathematical treatment. Students should have a working knowledge of probability theory, likelihood, and applied frequentist data analysis including linear and logistic regression, and an understanding of how calculus is used in biostatistical applications. Prerequisite: None. Credits: 3

### Computer Vision

BME 548L: Machine Learning and Imaging

## Course Description

Welcome to Duke University’s Machine Learning and Imaging (BME 548) class! This class aims to teach you how to improve the performance of your deep learning algorithms by jointly optimizing the hardware that acquires your data. It primarily focuses on imaging data (from cameras, microscopes, MRI, CT, and ultrasound systems, for example). It begins with an overview of machine learning and imaging science, and then focuses on the intersection of the two fields. This class is for you if 1) you work with imaging systems and would like to learn more about machine learning, 2) you are familiar with machine learning and would like to know more about how your data is gathered, 3) you work with both imaging systems and machine learning and would like to hear a new perspective on the topic, or 4) you work with neither imaging systems nor machine learning but have a strong mathematical background and are motivated to learn about both.

## Course Description

Image formation and analysis; feature computation and tracking; image, object, and activity recognition and retrieval; 3D reconstruction from images. Prerequisites: Mathematics 221, 218 or 216; Mathematics 212; Mathematics 230 or Statistical Science 230; Computer Science 101; Computer Science 230.

### Entrepreneurship & Business

I&E 748: New Ventures: Discover

## Course Description

This course is designed to lead you to a eureka moment by teaching you how to explore the world around you for problems worth solving. Instead of jumping directly into problem solving and solution development (which can often be wasteful without a clear understanding of a given market and customer need), this course focuses on research, exploration, and discovery. It asks students to set aside pre-conceived notions, avoiding some of their own blind spots, in order to do the necessary work of collecting data about the market and learning to assess it as objectively as possible. This course is ideal for anyone who wants to excel at finding white space for new innovation and entrepreneurial action.

I&E 748: New Ventures: Deliver

## Course Description

Did your idea pass muster in New Ventures Develop? Do you have early revenue or evidence of product-market fit and want to continue to refine your go-to-market strategy? New Ventures Deliver is the ideal course for serious entrepreneurs ready to push themselves to take the leap. In this course you will continue to test core hypotheses while you develop a milestone-driven plan for go-to-market, sales, staffing, and fundraising.

I&E 800: Business Fundamentals

## Course Description

Using entrepreneurship as a backdrop, this course provides a broad overview of business, including practical business fundamentals and theoretical frameworks for critical thinking. Students will experience the early stages of a typical startup, examine the theoretical basis for startup success, understand managing and operating within an organization, and conduct a business analysis of competing companies.

### Ethics

BIOETHICS 676: Ethical Technology Practicum

## Course Description

Interdisciplinary practicum aiming to provide foundational knowledge in legal, ethical and policy frameworks for developing safe and ethical approaches to use of technological developments together with a practical opportunity to use this knowledge and principles of ‘ethics by design’ to create ethical policies and uses of technology or design of the products or platform itself. In addition to developing substantive knowledge around ethical tech, the students are expected to develop practical skills around collaboration, analysis, research, drafting, and written and oral communication.

### Finance

For Quantitative Finance Electives, please see this page.

### Geospatial (GIS)

ENVIRON 559: Fundamentals of GIS and Geospatial Analysis

## Course Description

Fundamental aspects of geographic information systems and satellite remote sensing for environmental applications. Covers concepts of geographic data development, cartography, image processing, and spatial analysis. Gateway into more advanced training in geospatial analysis curriculum. Consent of instructor required.

ENVIRON 558: Satellite Remote Sensing for Environmental Analysis

## Course Description

Environmental analysis using satellite remote sensing. Theoretical and technical underpinnings of remote sensing (corrections/pre-processing, image enhancement, analysis) with practical applications (land cover mapping, change detection e.g. deforestation mapping, forest health monitoring). Strong emphasis on hands-on processing and analysis. Will include variety of image types: multi-spectral, hyper-spectral, radar and others. Recommended prerequisite: familiarity with GIS.

ENVIRON 859: Geospatial Data Analytics

## Course Description

Provides training in more advanced skills such as: GIS database programming, modeling applications, spatial decision support systems and Internet map server technologies. The course requires a fundamental knowledge of geospatial analysis theory, analysis tools, and applications. Consent of instructor required. Prerequisite: Environment 559 and Environment 761, 765, or 789.

One may also pursue a full Nicholas School GIS Certificate.

### Math, Probability, and Statistics

MATH 641: Probability

## Course Description

Designed to be a sequel to Statistical Science 711. The basic five topics are: martingales, Markov chains from an advanced viewpoint, ergodic theory, Brownian motion and its applications to random walks, Donsker’s theorem and the law of the iterated logarithm, and multidimensional Brownian motion, connection to PDEs. For those who have not had 711, we will prove the law of large numbers using martingales and obtain versions of the central limit theorem from Donsker’s theorem. Course requires a knowledge of measure theory. Prerequisite: Statistical Science 711 or Mathematics 631.

MATH 712: Multivariate Calculus

## Course Description

Partial differentiation, multiple integrals, and topics in differential and integral vector calculus, including Green’s theorem, the divergence theorem, and Stokes’s theorem. An assignment will ask the student to relate this course to their research.

MATH 718: Matrices and Vector Spaces

## Course Description

Solving systems of linear equations, matrix factorizations and fundamental vector subspaces, orthogonality, least squares problems, eigenvalues and eigenvectors, the singular value decomposition and principal component analysis, applications to data-driven problems. An assignment will ask the student to relate this course to their research.

MATH 730: Probability

## Course Description

Probability models, random variables with discrete and continuous distributions. Independence, joint distributions, conditional distributions. Expectations, functions of random variables, central limit theorem. An assignment will ask the student to relate this course to their research.

MATH 780: Calculus and Probability

## Course Description

Introduction to calculus of real-valued functions with an emphasis on applications to probability. Topics include an introduction to elementary functions, differentiation and applications, integration, and continuous probability distributions. Intended for graduate students in social and applied sciences.

STA 611: Introduction to Mathematical Statistics

## Course Description

Formal introduction to basic theory and methods of probability and statistics: probability and sample spaces, independence, conditional probability and Bayes’ theorem; random variables, distributions, moments and transformations. Parametric families of distributions and central limit theorem. Sampling distributions, traditional methods of estimation and hypothesis testing. Elements of likelihood and Bayesian inference. Basic discrete and continuous statistical models.

### Machine Learning

*Note: Courses in this area are constantly changing, so be sure to keep an eye out for new courses!*

COMPSCI 675D: Introduction to Deep Learning

## Course Description

Provides an introduction to the machine learning technique called deep learning or deep neural networks. A focus will be the mathematical formulations of deep networks and an explanation of how these networks can be structured and ‘learned’ from big data. Discussion section covers practical applications, programming, and modern implementation practices. Example code and assignments will be given in Python with heavy utilization of the PyTorch (or TensorFlow) package. The course and a project will cover various applications including image classification, text analysis, object detection, etc. Prerequisite: ECE 580, ECE 681, ECE 682D, Statistical Science 561D, or Computer Science 571D.

ECE 661: Computer Engineering Machine Learning and Deep Neural Nets

## Course Description

This course examines various computer engineering methods commonly performed in developing machine learning and deep neural network models. The focus of the course is on how to improve the training and inference performance in terms of model accuracy, size, runtime, etc. Techniques that are widely investigated and adopted in industrial companies and academic communities will be discussed and practiced. Programming practices on these techniques are designed with heavy utilization of the PyTorch package. Prerequisites: Computer Science 201 or ECE 551D or ECE 751D. Instructors: Y. Chen or H. Li

ECE 685D: Intro to Deep Learning

## Course Description

Provides an introduction to the machine learning technique called deep learning or deep neural networks. A focus will be the mathematical formulations of deep networks and an explanation of how these networks can be structured and ‘learned’ from big data. Discussion section covers practical applications, programming, and modern implementation practices. Example code and assignments will be given in Python with heavy utilization of the PyTorch (or TensorFlow) package. The course and a project will cover various applications including image classification, text analysis, object detection, etc. Prerequisite: ECE 580, ECE 681, ECE 682D, Statistical Science 561D, or Computer Science 571D. Instructor: Tarokh

ECE 689: Advanced Topics in Deep Learning

## Course Description

Focus on advanced topics in deep learning, with an emphasis on methodology. This includes discriminative models (e.g., infinite/infinitesimal/physics-informed neural networks), generative models (normalizing flows, graphical models, Bayesian neural networks, non-parametric approaches), and topics on inference (e.g., exact and approximate inference methods). Assignments will provide an opportunity to implement techniques. Instructor: Tarokh

### Programming

IDS 721: Data Analysis at Scale in Cloud

## Course Description

Data Analysis at Scale in the Cloud is a project-based course with extensive hands-on assignments. This course is designed to give students a comprehensive view of cloud computing including Big Data and Machine Learning. A variety of learning resources will be used including interactive labs on Cloud Platforms (Google, AWS, Azure).

BIOSTAT 821: Software Tools for Data Science

## Course Description

A data scientist needs to master several different tools to obtain, process, analyze, visualize and interpret large biomedical data sets such as electronic health records, medical images, and genomic sequences. It is also critical that the data scientist masters the best practices associated with using these tools, so the results are robust and reproducible. The course covers foundational tools that will allow students to assemble a data science toolkit, including the Unix shell, text editors, regular expressions, relational and NoSQL databases, and the Python programming language for data munging, visualization and machine learning. Best practices that students will learn include the Findable, Accessible, Interoperable and Reusable (FAIR) practices for data stewardship, as well as reproducible analysis with literate programming, version control and containerization. Credits: 3

MATH 560: Theory and Practice of Algorithms

## Course Description

The mathematical theory of algorithms and graphs and their practical implementations. Examines the foundational mathematical structures for the behavior and analysis of algorithms from a variety of domains, with a particular emphasis on graphs. Students tie theory to practice by writing code to implement algorithms, and compare experimentally observed run-times to those predicted by the mathematical theory. Recommended prerequisite: Computer Science 201; or recommended corequisite: ECE 551; or equivalent.

MATH 561: Numerical Linear Algebra, Optimization and Monte Carlo Simulation

## Course Description

Singular Value Decomposition, Principal Component Analysis, QR Factorization, Least Squares Problems, Conditioning and Stability, Direct Methods for Linear Systems (Gaussian Elimination, Cholesky Factorization), Iterative Methods for Linear Systems (Conjugate Gradients, GMRES, Preconditioning), Eigenvalue Problems (Power Method, Rayleigh Quotient, Inverse Iteration, QR Algorithms), Newton Method for Nonlinear Equations, Multigrid Method and Fast Fourier Transform.

STA 663L: Statistical Computing and Computation

## Course Description

Statistical modeling and machine learning involving large data sets and challenging computation. Data pipelines and databases, big data tools, sequential algorithms and subsampling methods for massive data sets, efficient programming for multi-core and cluster machines, including topics drawn from GPU programming, cloud computing, Map/Reduce and general tools of distributed computing environments. Intense use of statistical and data manipulation software will be required. Data from areas such as astronomy, genomics, finance, social media, networks, neuroscience. Instructor consent required. Prerequisite: Statistics 521L, 523L; Statistics 532 (or co-registration).

ECE 551D: Programming, Data Structures, and Algorithms in C++

*Editorial Comment:* An extremely difficult course. Do **not** enroll lightly or concurrently with other difficult courses.

## Course Description

Students learn to program in C and C++ with coverage of data structures (linked lists, binary trees, hash tables, graphs), Abstract Data Types (Stacks, Queues, Maps, Sets), and algorithms (sorting, graph search, minimal spanning tree). Efficiency of these structures and algorithms is compared via Big-O analysis. Brief coverage of concurrent (multi-threaded) programming. Emphasis is placed on defensive coding, and use of standard UNIX development tools in preparation for students’ entry into real world software development jobs. Not open to undergraduates. Instructors: Hilton, Lipp, Pastorino, or Younes

ECE 590-1: Theory and Practice of Algorithms

## Course Description

This course ties the mathematical theory of algorithms and graphs to their practical implementations. Students will learn about the mathematical structures that form the foundations for the behavior and analysis of algorithms from a variety of domains, with a particular emphasis on graphs. Students will also tie that theory to practice by writing code to implement those algorithms, and comparing experimentally observed runtimes to those projected by the mathematical theory.

ECE 651: Software Engineering

## Course Description

Teaches students about all steps of the software development lifecycle: requirements definition, design, development, testing, and maintenance. The course assumes students are skilled object-oriented programmers from prior courses, but will include a rapid introduction to Java. Students complete a team-based, semester-long software project which will progress through all phases of the software lifecycle. Prerequisite: Electrical and Computer Engineering 551D or 751D. Instructors: Derby, Hilton, Noyce, Pastorino, or Rahbar

BIOSTAT 823: Statistical Programming for Big Data

## Course Description

This course will extend the foundation laid in software tools for data science to allow for efficient computing involving very large data sets. This course will explore the use of appropriate algorithms and data structures for intensive computations, improving computational performance by use of native code compilation, use of parallel computing to accelerate intensive computations, use of appropriate algorithms and data structures for massive data sets, and use of distributed computing to process massive data sets. Prerequisite: BIOSTAT 821 or permission of the Director of Graduate Studies. Credits: 2

### Time Series

ENVIRON 797: Time Series Analysis for Energy and Environment Applications

## Course Description

This course focuses on time series analysis, modeling, and forecasting, specifically within the context of energy and the environment. Lectures will include theory and applications using the R programming language. Datasets from organizations like the US Energy Information Administration (EIA), National Oceanic and Atmospheric Administration (NOAA), National Renewable Energy Laboratory (NREL) and US Geological Survey (USGS) will be used. Upon completion of the course, students will be able to use R to carry out basic statistical modeling and analysis as well as fitting models to data. The primary objective of the course is to empower students to extract meaningful predictions and insights from data.

### Social Networks

SOCIOL 728: Introduction to Social Networks

## Course Description

Introduction to social network analysis (SNA). History of SNA; social-theoretical foundations of modern network analysis; data collection; data management; analysis and visualization tools. Survey of current applications of SNA within the social sciences.