Lecture 1: Introduction

DATA 311: Machine Learning

Dr. Irene Vrbik

University of British Columbia Okanagan

Welcome!

Welcome to DATA 311: Machine Learning

DATA_O 311 (3) Machine Learning

Regression, classification, resampling, model selection and validation, fundamental properties of matrices, dimension reduction, tree-based methods, unsupervised learning. [3-2-0]

Prerequisite: Either (a) one of STAT 205, STAT 230 or (b) a score more than 75% in one of APSC 254, BIOL 202, PSYO 373; and one of COSC 111, APSC 177.

A little about me

  • I am currently a Tenure-track Assistant Professor of Teaching

  • I have taught a variety of courses (from introductory data science to graduate courses in statistics) at several institutions (Guelph, McGill, MDS Program)

  • I am currently the Data Science program advisor and the articulation and curriculum representative

Where can you find me?

Office: SCI 104 email: irene.vrbik@ubc.ca

Websites: irene.vrbik.ok.ubc.ca

Educational Background

  1. McMaster University, BSc (Mathematics & Statistics)

  2. University of Guelph, MSc (Applied Statistics)

    Thesis: Using Individual-level Models to Model Spatio-temporal Combustion Dynamics. This involved modelling the spatio-temporal combustion dynamics of fire in a Bayesian framework. Supervisors: Rob Deardon and Zeng Feng.

  3. University of Guelph, PhD (Applied Statistics)

    Thesis: Non-Elliptical and Fractionally-Supervised Classification. This involved model-based classification with a particular emphasis on non-elliptical distributions. Supervisor: Paul D. McNicholas.

Experience

Postdoctoral Fellow at McGill University: Under the supervision of Dr. David Stephens, this work focused on the statistical and computational challenges associated with analyzing genetic data. It involved clustering and modeling HIV DNA sequences.

Postdoctoral Fellow at UBCO: Awarded by NSERC (Natural Sciences and Engineering Research Council of Canada), this research involved collaborations with faculty from several disciplines (e.g. Medical Physics, Biology, and Chemistry) and was supervised by Dr. Jason Loeppky.

Instructor at UBCO: A three-year contract position in the Department of Computer Science, Mathematics, Physics, and Statistics.

Research Interests

Statistics and Machine Learning in Curriculum Design

  • e.g. topic modeling in Data Science course calendars

Curricular Analytics: the systematic analysis and evaluation of educational curricula to gain insights into various aspects of curriculum design, delivery, and assessment.

  • e.g. metric calculation for various pathways, curriculum visualization, course recommendation systems

Tools for teaching, learning, and technology

  • e.g. PrairieLearn: an online problem-driven learning system for creating homework and tests

Course Syllabus

The course syllabus is a dynamic document which has been posted to Canvas and the course website. Many administrative questions can be answered there.

Course Tools

Canvas will be used for most course-related material:

  • Grades
  • Assignments (downloading/submitting)
  • Course announcements/discussions
  • Supplementary files (e.g. data sets, code, etc.)

Lectures

  • Lectures will be posted on our course webpage.

  • Take time to learn how to:

    • navigate through the slides

    • export to PDF (good for tablet annotation)

    • use the clipboard (see the Clipboard code example; HTML only)

Programming Language

  • Any necessary coding will be done in R.
  • Relevant code will be posted to Supplementary Files on Canvas when necessary. The most relevant pieces will be embedded in the slides and/or included in Labs.
  • It is also recommended that you complete assignments using R Markdown in RStudio.

Clipboard code

How to use the clipboard

Hover over the code block below and you will see a copy icon in the top-right corner. This will copy the code to your clipboard for easy pasting into your R session. ⚠️ This feature only works in HTML output

x <- c(2, 4, 6, 8)
mean(x)
#> [1] 5
x + 2
#> [1]  4  6  8 10

Paged Data Frames

iris
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1            5.1         3.5          1.4         0.2     setosa
2            4.9         3.0          1.4         0.2     setosa
3            4.7         3.2          1.3         0.2     setosa
4            4.6         3.1          1.5         0.2     setosa
5            5.0         3.6          1.4         0.2     setosa
6            5.4         3.9          1.7         0.4     setosa
7            4.6         3.4          1.4         0.3     setosa
8            5.0         3.4          1.5         0.2     setosa
9            4.4         2.9          1.4         0.2     setosa
10           4.9         3.1          1.5         0.1     setosa
11           5.4         3.7          1.5         0.2     setosa
12           4.8         3.4          1.6         0.2     setosa
13           4.8         3.0          1.4         0.1     setosa
14           4.3         3.0          1.1         0.1     setosa
15           5.8         4.0          1.2         0.2     setosa
16           5.7         4.4          1.5         0.4     setosa
17           5.4         3.9          1.3         0.4     setosa
18           5.1         3.5          1.4         0.3     setosa
19           5.7         3.8          1.7         0.3     setosa
20           5.1         3.8          1.5         0.3     setosa
21           5.4         3.4          1.7         0.2     setosa
22           5.1         3.7          1.5         0.4     setosa
23           4.6         3.6          1.0         0.2     setosa
24           5.1         3.3          1.7         0.5     setosa
25           4.8         3.4          1.9         0.2     setosa
26           5.0         3.0          1.6         0.2     setosa
27           5.0         3.4          1.6         0.4     setosa
28           5.2         3.5          1.5         0.2     setosa
29           5.2         3.4          1.4         0.2     setosa
30           4.7         3.2          1.6         0.2     setosa
31           4.8         3.1          1.6         0.2     setosa
32           5.4         3.4          1.5         0.4     setosa
33           5.2         4.1          1.5         0.1     setosa
34           5.5         4.2          1.4         0.2     setosa
35           4.9         3.1          1.5         0.2     setosa
36           5.0         3.2          1.2         0.2     setosa
37           5.5         3.5          1.3         0.2     setosa
38           4.9         3.6          1.4         0.1     setosa
39           4.4         3.0          1.3         0.2     setosa
40           5.1         3.4          1.5         0.2     setosa
41           5.0         3.5          1.3         0.3     setosa
42           4.5         2.3          1.3         0.3     setosa
43           4.4         3.2          1.3         0.2     setosa
44           5.0         3.5          1.6         0.6     setosa
45           5.1         3.8          1.9         0.4     setosa
46           4.8         3.0          1.4         0.3     setosa
47           5.1         3.8          1.6         0.2     setosa
48           4.6         3.2          1.4         0.2     setosa
49           5.3         3.7          1.5         0.2     setosa
50           5.0         3.3          1.4         0.2     setosa
51           7.0         3.2          4.7         1.4 versicolor
52           6.4         3.2          4.5         1.5 versicolor
53           6.9         3.1          4.9         1.5 versicolor
54           5.5         2.3          4.0         1.3 versicolor
55           6.5         2.8          4.6         1.5 versicolor
56           5.7         2.8          4.5         1.3 versicolor
57           6.3         3.3          4.7         1.6 versicolor
58           4.9         2.4          3.3         1.0 versicolor
59           6.6         2.9          4.6         1.3 versicolor
60           5.2         2.7          3.9         1.4 versicolor
61           5.0         2.0          3.5         1.0 versicolor
62           5.9         3.0          4.2         1.5 versicolor
63           6.0         2.2          4.0         1.0 versicolor
64           6.1         2.9          4.7         1.4 versicolor
65           5.6         2.9          3.6         1.3 versicolor
66           6.7         3.1          4.4         1.4 versicolor
67           5.6         3.0          4.5         1.5 versicolor
68           5.8         2.7          4.1         1.0 versicolor
69           6.2         2.2          4.5         1.5 versicolor
70           5.6         2.5          3.9         1.1 versicolor
71           5.9         3.2          4.8         1.8 versicolor
72           6.1         2.8          4.0         1.3 versicolor
73           6.3         2.5          4.9         1.5 versicolor
74           6.1         2.8          4.7         1.2 versicolor
75           6.4         2.9          4.3         1.3 versicolor
76           6.6         3.0          4.4         1.4 versicolor
77           6.8         2.8          4.8         1.4 versicolor
78           6.7         3.0          5.0         1.7 versicolor
79           6.0         2.9          4.5         1.5 versicolor
80           5.7         2.6          3.5         1.0 versicolor
81           5.5         2.4          3.8         1.1 versicolor
82           5.5         2.4          3.7         1.0 versicolor
83           5.8         2.7          3.9         1.2 versicolor
84           6.0         2.7          5.1         1.6 versicolor
85           5.4         3.0          4.5         1.5 versicolor
86           6.0         3.4          4.5         1.6 versicolor
87           6.7         3.1          4.7         1.5 versicolor
88           6.3         2.3          4.4         1.3 versicolor
89           5.6         3.0          4.1         1.3 versicolor
90           5.5         2.5          4.0         1.3 versicolor
91           5.5         2.6          4.4         1.2 versicolor
92           6.1         3.0          4.6         1.4 versicolor
93           5.8         2.6          4.0         1.2 versicolor
94           5.0         2.3          3.3         1.0 versicolor
95           5.6         2.7          4.2         1.3 versicolor
96           5.7         3.0          4.2         1.2 versicolor
97           5.7         2.9          4.2         1.3 versicolor
98           6.2         2.9          4.3         1.3 versicolor
99           5.1         2.5          3.0         1.1 versicolor
100          5.7         2.8          4.1         1.3 versicolor
101          6.3         3.3          6.0         2.5  virginica
102          5.8         2.7          5.1         1.9  virginica
103          7.1         3.0          5.9         2.1  virginica
104          6.3         2.9          5.6         1.8  virginica
105          6.5         3.0          5.8         2.2  virginica
106          7.6         3.0          6.6         2.1  virginica
107          4.9         2.5          4.5         1.7  virginica
108          7.3         2.9          6.3         1.8  virginica
109          6.7         2.5          5.8         1.8  virginica
110          7.2         3.6          6.1         2.5  virginica
111          6.5         3.2          5.1         2.0  virginica
112          6.4         2.7          5.3         1.9  virginica
113          6.8         3.0          5.5         2.1  virginica
114          5.7         2.5          5.0         2.0  virginica
115          5.8         2.8          5.1         2.4  virginica
116          6.4         3.2          5.3         2.3  virginica
117          6.5         3.0          5.5         1.8  virginica
118          7.7         3.8          6.7         2.2  virginica
119          7.7         2.6          6.9         2.3  virginica
120          6.0         2.2          5.0         1.5  virginica
121          6.9         3.2          5.7         2.3  virginica
122          5.6         2.8          4.9         2.0  virginica
123          7.7         2.8          6.7         2.0  virginica
124          6.3         2.7          4.9         1.8  virginica
125          6.7         3.3          5.7         2.1  virginica
126          7.2         3.2          6.0         1.8  virginica
127          6.2         2.8          4.8         1.8  virginica
128          6.1         3.0          4.9         1.8  virginica
129          6.4         2.8          5.6         2.1  virginica
130          7.2         3.0          5.8         1.6  virginica
131          7.4         2.8          6.1         1.9  virginica
132          7.9         3.8          6.4         2.0  virginica
133          6.4         2.8          5.6         2.2  virginica
134          6.3         2.8          5.1         1.5  virginica
135          6.1         2.6          5.6         1.4  virginica
136          7.7         3.0          6.1         2.3  virginica
137          6.3         3.4          5.6         2.4  virginica
138          6.4         3.1          5.5         1.8  virginica
139          6.0         3.0          4.8         1.8  virginica
140          6.9         3.1          5.4         2.1  virginica
141          6.7         3.1          5.6         2.4  virginica
142          6.9         3.1          5.1         2.3  virginica
143          5.8         2.7          5.1         1.9  virginica
144          6.8         3.2          5.9         2.3  virginica
145          6.7         3.3          5.7         2.5  virginica
146          6.7         3.0          5.2         2.3  virginica
147          6.3         2.5          5.0         1.9  virginica
148          6.5         3.0          5.2         2.0  virginica
149          6.2         3.4          5.4         2.3  virginica
150          5.9         3.0          5.1         1.8  virginica

Warning

This interactivity will only work on the HTML version of the slides, not PDF.

Why R?

Pros

  • exposure to R in Statistics prerequisite course
  • Rich Ecosystem
  • Reproducibility
  • Textbook

Cons

  • Steep learning curve
  • Performance
  • Package Quality
  • Limited Industry Adoption

Lab Delivery

  • Labs will be held in person; students must be enrolled in a lab (which cannot conflict with other courses)

  • TAs provide guidance on carrying out analyses in R for the techniques discussed in lecture.

  • Knowledge of commands and programming techniques will be evaluated throughout the course.

  • Follow the instructions carefully and practice skills by completing (and redoing!) labs and assignments

Textbook

The main textbook reference for this course is:

ISLR: An Introduction to Statistical Learning with Applications in R (Second Edition). By: Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani.

This book is available for free at statlearning.com.

A secondary (less referenced) textbook is:

ESL: The Elements of Statistical Learning: data mining, inference, and prediction, 2nd edition. By: Hastie, Tibshirani, Friedman.

Can be downloaded for free at: hastie.su.domains/ElemStatLearn.

Lecture format

  • Slides will occasionally be supplemented with handwritten material.
  • Aside from doodling, substantial written material will be done digitally (on my iPad) and uploaded to Canvas.
  • Lectures may also include discussions which you will only gain access to by attending class.
  • You will not get the whole story by reading the slides!

Class Etiquette

  1. Please be respectful, especially to other students
  2. Please be present. Attendance will not be taken, but you are encouraged to come and learn together.
  3. Please restrict the use of electronic devices to course related material; other content could be distracting.
  4. Please be forgiving; instructors are people too, we will make mistakes.

Course Questions

In class

  • If you are stuck on a concept during lecture, please feel free to raise your hand and ask for clarification.
  • If you need help understanding something, chances are other students do too!
  • I will do my best to answer questions on the fly or organize a more thoughtful answer to be presented first thing next class or posted to Canvas.

Course Questions

Outside of class

Outside of class, I suggest asking course-related questions in the following order:

  1. Consult the course syllabus
  2. Post your question on the public forum on Canvas
  3. Come see me during student hours or visit your TA during lab (whichever comes first)
  4. E-mail me (weekdays are best)

Machine Learning

Machine learning (ML) is a subfield of artificial intelligence (AI) that uses algorithms and statistical models to learn from data to perform complex tasks.

  • e.g. recommend TV shows you might like, determine if an e-mail is spam or not, predict the selling price of a home.

How does Machine Learning work

Machine Learning has been described as:

“The field of study that makes computers capable of learning without being explicitly programmed.”

Cats

Dogs

Traditional Programming

Input:

e.g. If claws are sharp and nose is small …

Output: cat
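As a rough sketch of the traditional approach, the rule on this slide could be hand-coded as follows. The function name `classify_pet` and the arguments `claws_sharp` and `nose_small` are hypothetical. The slide's rule is truncated, so only the two conditions shown are used, and labelling every other case as a dog is an assumption made purely for illustration.

```r
# Hypothetical hand-written rule: the programmer, not the data, decides the logic.
classify_pet <- function(claws_sharp, nose_small) {
  # Only the two conditions shown on the slide are encoded here;
  # returning "dog" for every other case is an assumption for illustration.
  ifelse(claws_sharp & nose_small, "cat", "dog")
}

classify_pet(claws_sharp = TRUE, nose_small = TRUE)
classify_pet(claws_sharp = FALSE, nose_small = TRUE)
```

The point of the sketch is that every condition has to be thought of and typed in by a person, which quickly becomes impractical for realistic inputs such as images.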

Machine Learning

In supervised machine learning, the computer must learn these distinguishing patterns for itself in order to determine a set of rules by which future data will be sorted.
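To make "learning the rule from data" concrete, here is a minimal sketch using R's built-in iris data rather than cats and dogs. The task (separating setosa from the other species using Petal.Length alone) and the names `cuts`, `errors`, and `best_cut` are illustrative choices, not a method prescribed by the course.

```r
# Learn a one-variable rule from labelled examples instead of writing it by hand.
x <- iris$Petal.Length
y <- iris$Species == "setosa"   # TRUE/FALSE labels supplied with the data

# Candidate cut-offs: midpoints between consecutive distinct observed values.
vals <- sort(unique(x))
cuts <- (head(vals, -1) + tail(vals, -1)) / 2

# Training error of the rule "predict setosa when Petal.Length < cut".
errors <- sapply(cuts, function(cut) mean((x < cut) != y))

best_cut <- cuts[which.min(errors)]   # the learned rule
best_cut
min(errors)
```

On this data the search should settle on a cut-off between the largest setosa petal length (1.9) and the smallest petal length among the other species (3.0). The rule was never typed in by hand; it was picked because it makes the fewest mistakes on the labelled examples.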

General concepts

  • To continue with the cats and dogs example: the more examples a human is given, the better they become at distinguishing between the two species.
  • The more variety in the samples, the easier it may become to detect patterns and ultimately predict the result.
  • This process is often iterative.

Why is ML important?

  • Many of the statistical techniques you’ve (probably) learned thus far are either completely inapplicable to much of this data, or only applicable to a subset of it.
  • With growing access to data and computing power, models can be built faster than ever and used in countless fields to gain useful insights.
  • This course will guide you through a few of the classical approaches in Machine Learning (ML).

Supervised ML

Involves training a model on labeled examples. There are two main goals:

  • Classification: Assigning unseen examples to predefined categories (e.g., spam vs. not spam).

  • Regression: Predicting continuous values (e.g., predicting house prices based on features).
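The two goals can be sketched with base R and the built-in iris data. The choice of variables, the use of linear and logistic regression, and the names `reg_fit`, `two`, and `clf_fit` are assumptions made for illustration; they are not necessarily the models used later in the course.

```r
# Regression: predict a continuous response (Petal.Width) from Petal.Length.
reg_fit <- lm(Petal.Width ~ Petal.Length, data = iris)
predict(reg_fit, newdata = data.frame(Petal.Length = 4.5))

# Classification: predict a category. Logistic regression handles two classes,
# so setosa is dropped here (an illustrative simplification).
two <- droplevels(subset(iris, Species != "setosa"))
clf_fit <- glm(Species ~ Petal.Length + Petal.Width,
               data = two, family = binomial)

# Fitted probabilities refer to the second factor level ("virginica").
prob <- predict(clf_fit, type = "response")
pred <- ifelse(prob > 0.5, "virginica", "versicolor")

# Training accuracy: optimistic, since the same data was used to fit the model.
mean(pred == two$Species)
```

Both fits use labelled examples: the response (a number for regression, a species for classification) is known for every training row.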

Unsupervised ML

Involves training a model on unlabeled examples.

  • Clustering: Grouping similar data points into clusters (e.g., customer segmentation).
  • Dimensionality Reduction: Reducing the number of features while retaining the most important information (e.g., compressing an image).
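A minimal sketch of both tasks with base R and the built-in iris data follows. The use of k-means and principal component analysis, the choice of three clusters, and the seed value are assumptions made for illustration.

```r
# Drop the Species column: unsupervised methods never see the labels.
features <- iris[, 1:4]

# Clustering: k-means with 3 groups (choosing k = 3 is an assumption made here).
set.seed(311)
km <- kmeans(scale(features), centers = 3, nstart = 20)

# The labels are used only afterwards, to inspect what the clusters captured.
table(cluster = km$cluster, species = iris$Species)

# Dimension reduction: principal component analysis on standardized features.
pca <- prcomp(features, scale. = TRUE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(cumsum(var_explained), 3)

# Keep the first two components: 4 features reduced to 2.
scores <- pca$x[, 1:2]
```

The cumulative proportions show how much of the total variance is retained when only the leading components are kept.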

iClicker

Students are expected to participate in iClicker questions by enrolling at: https://join.iclicker.com/YUHA

iClicker Example

iClicker Test Question (Not for Marks)

How are you feeling about your first semester so far?

  1. 🎉 Excited and ready

  2. 😅 A little overwhelmed

  3. 🤔 Still figuring things out

  4. ☕ Ask me after more coffee

If you cannot answer this question please follow the iClicker Cloud Student Guide: https://lthub.ubc.ca/guides/iclicker-cloud-student-guide/