DATA 311: Machine Learning

Author: Dr. Irene Vrbik

Affiliation: University of British Columbia Okanagan

Welcome to Data 311: Machine Learning!

Lectures have been created using Quarto, which includes a built-in version of the reveal.js-menu plugin. You can access the navigation menu using the button located in the bottom-left corner of the presentation. Clicking the button opens a slide navigation menu that lets you jump to any slide.

Print/Save to PDF:

Reveal presentations can be exported to PDF via a special print stylesheet.

  1. Toggle into Print View using the E key (or using the Navigation Menu).
  2. Open the in-browser print dialog (CTRL/CMD+P).
  3. Change the Destination setting to Save as PDF.
  4. Change the Layout to Landscape.
  5. Change the Margins to None.
  6. Enable the Background graphics option.
  7. Click Save 🎉

Below is a tentative week-by-week schedule:

| Lecture | Topic | Supporting Reading |
|---|---|---|
| 1 | Welcome! Introduction to R, RStudio, and Quarto | 🎥 How-to videos on installing R and RStudio; 📄 R Basics Cheat Sheet |
| 2 | Notation and Terminology | ISLR Ch 1 |
| | Lab 0: A refresher on R and introduction to Quarto documents | |
| 3 | Assessing Regression Models: MSE and Testing vs. Training MSE; Reducible vs. Irreducible Error | ISLR 2.2.1, 2.2.3. Good reads: A Gentle Intro to Model Selection for ML |
| 4 | Bias-Variance Tradeoff: Decomposition of MSE | ISLR 2.2.2 |
| 5 | Linear Regression | ISLR Section 3.1, 3.2 |
| | Lab 1: Assessing Regression Models | ISLR 2.2 |
| 5b | Extensions to the linear regression model: interaction, categorical predictors, polynomial regression. KNN Regression (non-parametric approach) | ISLR Section 3.3, 3.4, 3.5, Lab 3.6 |
| | Lab 2: Regression Diagnostics and Predictive Modeling: Exploring Linear, Polynomial, and KNN Regression | |
| 6 | Logistic Regression | ISLR Section 4.1, 4.2, 4.3; ESL 4.4; ISLR Section 2.2.3 (for assessing classification models) |
| | Lab 3: Fitting and Evaluating Logistic Regression Models | |
| 7 | Classification models: Bayes Classifier, KNN Classification, and Discriminant Analysis | ISLR Sections 2.2.3 and 4.4.1, 4.4.2, 4.4.3; ESL 4.3 |
| | Lab 4: Fitting and Assessing Classification Models (logistic regression, KNN, and LDA/QDA) | |
| 8 | Cross-Validation | ISLR 5.1 |
| 9 | Bootstrap | ISLR 5.2, 5.3 |
| | Lab 5: Cross-Validation and the Bootstrap | |
| 10 | Classification and Regression Trees | ISLR Chapter 8.1 |
| 11 | Bagging and Random Forests | ISLR Chapter 8.2.1, 8.2.2 (see footnote 3) |
| | Lab 6: Classification and Regression Trees (CART) | |
| | Lab 7: Bagging, Boosting, and Random Forests | |
| 12 | Boosting | ISLR 8.2.3, ESL Chapter 10, gbm() vignette |
| 13 | Ridge Regression and the LASSO | ISLR 6.1, 6.2 |
| 14 | Distance measures: Euclidean Distance, Manhattan Distance, Mahalanobis Distance, Matching Binary Distance, Asymmetric Binary Distance, Gower’s Distance | Ch 3 of MSR (see footnote 4) |
| | Lab 8: Ridge Regression and LASSO | |
| 15 | Hierarchical Clustering and k-means Clustering | ISLR 12.4.1, 12.4.2, and 12.4.3 |
| | Lab 9: Clustering Techniques | |
| 16 | Dimensionality Reduction with PCA | ISLR 12.2 |
| 17 | PCA Regression and PLS | ISLR 6.3.1, 6.3.2 |
| | Lab 10: | |
| 18 | Neural Networks | ISLR 10.1, 10.2 |
| 19 | Gaussian Mixture Models (GMM) | (see slides for references) |
| | Review session | |

Footnotes

  3. For more details see Random Forests with R by Robin Genuer and Jean-Michel Poggi (2020).

  4. Multivariate Statistics with R by Paul J. Hewson.