Data 311: Machine Learning

Welcome

All of the course material can also be accessed through Canvas. The syllabus can be found in the Syllabus tab in the Navigation bar.

Lectures

Lectures will be uploaded here. Quarto includes a built-in version of the reveal.js-menu plugin. You can open the navigation menu with the button in the bottom-left corner of the presentation[1]. Clicking the button opens a slide navigation menu that lets you jump to any slide.

Print/Save to PDF:

Reveal presentations can be exported to PDF via a special print stylesheet.

  1. Toggle into Print View using the E key (or using the Navigation Menu).
  2. Open the in-browser[2] print dialog (CTRL/CMD+P).
  3. Change the Destination setting to Save as PDF.
  4. Change the Layout to Landscape.
  5. Change the Margins to None.
  6. Enable the Background graphics option.
  7. Click Save 🎉

Schedule

Lecture Topic Supplementary Reading
1 Welcome! Introduction To R and RStudio
2 Notation and Terminology ISLR Ch 1
3 Assessing Regression Models: MSE and testing vs. training MSE; decomposition of MSE; reducible and irreducible error; bias-variance tradeoff ISLR 2.2.1, 2.2.2
4 Linear Regression ISLR Section 3.1, 3.2
5 Extensions to the linear regression model (interactions, categorical predictors, polynomial regression); KNN regression (a non-parametric approach) ISLR Sections 3.3, 3.4, 3.5, Lab 3.6
6 Logistic Regression ISLR Section 4.1, 4.2, 4.3
7 Assessing Classification Models ISLR Section 2.2.3
8 Classification models: Bayes Classifier, KNN Classification and Discriminant Analysis ISLR Sections 2.2.3, 4.4.1, 4.4.2, 4.4.3
9 Distance measures: Euclidean Distance, Manhattan Distance, Mahalanobis Distance, Matching Binary Distance, Asymmetric Binary Distance, Gower’s Distance Ch 3 of MSR[3]
10 Hierarchical Clustering and \(k\)-means clustering ISLR 12.4.1, 12.4.2 and 12.4.3
11 Cross-Validation ISLR 5.1
12 Bootstrap ISLR 5.2, 5.3
13 Classification and Regression trees ISLR Chapter 8.1
14 Bagging and Random Forests ISLR Chapter 8.2.1, 8.2.2[4]
15 Boosting ISLR 8.2.3
16 Ridge Regression and the LASSO ISLR 6.1, 6.2
17 Dimensionality reduction with PCA ISLR 12.2
Midterm 2 Session
18 PCA regression and PLS ISLR 6.3.1, 6.3.2
19 Gaussian Mixture Models (GMM) (see slides for references)
20 Neural Networks ISLR 10.1, 10.2
Review session

Lab Schedule

Lab Topic
1 An Introduction to R and R Markdown
2 Assessing Regression Models. This will require you to download this clock auction data set. (see Lab 3.6 of ISLR for more examples)
3 Make predictions, analyze diagnostic plots, identify potential problems in multiple linear regression, and compare multiple regression models using the test MSE
4 Logistic Regression and Classification Simulation
5 LDA/QDA and classification metrics
6 Hierarchical and k-means clustering
7 Cross-validation and Bootstrapping
8 Tree-based methods
9 Ridge Regression/LASSO and PCA
10 PCA regression, PLS and Neural Nets

Footnotes

  1. You can also open the navigation menu by pressing the M key.

  2. Note: This feature has only been confirmed to work in Google Chrome and Chromium.

  3. Multivariate Statistics with R by Paul J. Hewson.

  4. For more details, see Random Forests with R by Robin Genuer and Jean-Michel Poggi (2020).