Lecture 1: Introduction
University of British Columbia Okanagan
Welcome to DATA 311: Machine Learning
Regression, classification, resampling, model selection and validation, fundamental properties of matrices, dimension reduction, tree-based methods, unsupervised learning. [3-2-0]
Prerequisite: Either (a) one of STAT 205, STAT 230 or (b) a score of more than 75% in one of APSC 254, BIOL 202, PSYO 373; and one of COSC 111, APSC 177.
I am currently a Tenure-track Assistant Professor of Teaching
I have taught a variety of courses (from introductory data science to graduate courses in statistics) at several institutions (Guelph, McGill, MDS Program)
I am currently the Data Science Program advisor and the articulation and curriculum representative
Office: SCI 104 email: irene.vrbik@ubc.ca
Websites: irene.quarto.pub, irene.vrbik.ok.ubc.ca
McMaster University, BSc (Mathematics & Statistics)
University of Guelph, MSc (Applied Statistics)
Thesis: Using Individual-level Models to Model Spatio-temporal Combustion Dynamics. This involved modelling the spatio-temporal combustion dynamics of fire in a Bayesian framework. Supervisors: Rob Deardon and Zeng Feng.
University of Guelph, PhD (Applied Statistics)
Thesis: Non-Elliptical and Fractionally-Supervised Classification. This involved model-based classification with a particular emphasis on non-elliptical distributions. Supervisor: Paul D. McNicholas.
Postdoctoral Fellow at McGill University Under the supervision of Dr. David Stephens, this work focused on the statistical and computational challenges associated with analyzing genetic data. It involved clustering and modeling HIV DNA sequences.
Postdoctoral Fellow at UBCO Awarded by NSERC (Natural Sciences and Engineering Research Council of Canada), this research involved collaborations with faculty from several disciplines (e.g., Medical Physics, Biology, and Chemistry) and was supervised by Dr. Jason Loeppky.
Instructor at UBCO a three-year contract position in the Department of Computer Science, Mathematics, Physics, and Statistics.
Statistics and Machine Learning in Curriculum Design
Curricular Analytics: the systematic analysis and evaluation of educational curricula to gain insights into various aspects of curriculum design, delivery, and assessment.
Tools for teaching, learning, and technology
The course syllabus is a dynamic document that has been posted to our Canvas shell. Many administrative questions can be answered there:
A Flexible Assessment tool has been integrated into Canvas to allow students to choose how their final grades will be weighted (within predefined ranges).
This system was created for a Teaching & Learning Enhancement Fund (TLEF) project to streamline the processes involved in Flexible Assessment.
It is based on a Flexible Assessment approach devised by Dr. Candice Rideout, a Professor of Teaching in the Faculty of Land & Food Systems; this approach has been shown to increase student satisfaction and self-regulation [1].
| Grading Item | Default | Min Weight | Max Weight | Desired |
|---|---|---|---|---|
| Assignments | 20% | | | |
| (asgn 1) | | 0 | 5 | [0--5] |
| (asgn 2) | | 0 | 5 | [0--5] |
| (asgn 3) | | 0 | 5 | [0--5] |
| (asgn 4) | | 0 | 5 | [0--5] |
| Midterms | 40% | | | |
| (mid 1) | | 0 | 20 | [0--20] |
| (mid 2) | | 0 | 20 | [0--20] |
| Final Exam | 40% | 40 | 100 | [40--100] |
| Total | 100% | | | must add to 100 |
You will select the desired weight of each grading item in the Flexible Assessment tab in Canvas (opens after class).
For example, if you know that assignment 3 and midterm 2 fall in a heavy week for you, you might choose to move some or all of that weight to the final exam (see the next slide for what that would look like in the Flexible Assessment tool).
Notice that only the weight of the final exam can increase from the default.
| Grading Item | Default | Allowable Range | Selected Weight |
|---|---|---|---|
| Assignments | | | |
| (asgn 1) | 5% | [0--5] | 5 |
| (asgn 2) | 5% | [0--5] | 5 |
| (asgn 3) | 5% | [0--5] | 0 |
| (asgn 4) | 5% | [0--5] | 5 |
| Midterms | | | |
| (mid 1) | 20% | [0--20] | 20 |
| (mid 2) | 20% | [0--20] | 10 |
| Final Exam | 40% | [40--100] | 40 + 10 + 5 = 55 |
| Total | 100% | | 100 |
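To make the arithmetic concrete, here is a minimal R sketch (using hypothetical scores, not anything from the course) of how the selected weights in the example above would combine into a final grade:

```r
# A minimal sketch with hypothetical scores: how the selected weights
# from the example table combine into a weighted final grade.
weights_pct <- c(asgn1 = 5, asgn2 = 5, asgn3 = 0, asgn4 = 5,
                 mid1 = 20, mid2 = 10, final = 55)
stopifnot(sum(weights_pct) == 100)   # selected weights must add to 100

# Hypothetical percentage scores on each grading item
scores <- c(asgn1 = 90, asgn2 = 85, asgn3 = 40, asgn4 = 88,
            mid1 = 75, mid2 = 60, final = 80)

final_grade <- sum(weights_pct / 100 * scores)
final_grade   # weighted average under the selected scheme (78.15 here)
```

Note how the 40% scored on assignment 3 contributes nothing under this scheme, since its weight was moved to the final exam.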
Students may enter their desired percentages and comments from September 5, 6:30 PM to September 18, 11:59 PM.
For students who do not enter desired percentages, final grades will be calculated using the default weighting scheme.
Canvas will be used for most course-related material:
Lectures will be posted at irene.quarto.pub/data311-2023/. Take time to learn how to navigate through the slides, how to annotate them, and how to export to PDF.
A clipboard button appears when you hover over code blocks
Labs will be held in person.
Students must be enrolled in a lab (which cannot conflict with other courses)
TAs provide guidance on carrying out analyses in R for the techniques discussed in lecture.
Knowledge of commands and programming techniques will be evaluated throughout the course.
Follow the instructions carefully and practice skills by completing (and redoing!) labs and assignments
The main textbook reference for this course is:
ISLR: An Introduction to Statistical Learning with Applications in R (Second Edition). By: Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani.
This book, along with additional resources, is available for free at statlearning.com.
A secondary (less referenced) textbook is:
ESL: The Elements of Statistical Learning: data mining, inference, and prediction, 2nd edition. By: Hastie, Tibshirani, Friedman.
Can be downloaded for free at: hastie.su.domains/ElemStatLearn.
Outside of class, the general order in which I would suggest asking course-related questions is:
Machine Learning is often described as:
“The field of study that makes computers capable of learning without being explicitly programmed.”
The above is attributed to Arthur Lee Samuel, an early American leader in AI, who also happened to coin the term “Machine Learning” in 1959 while at IBM.
Input:
e.g. If claws are sharp and nose is small …
Output: cat
In machine learning, the computer must learn these distinguishing patterns for itself in order to determine a set of rules by which future data will be sorted.
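To make that contrast concrete, here is a minimal R sketch on a made-up toy data set (not from the lecture): a rule written by hand versus a rule learned from the data by a classification tree (using the rpart package).

```r
# A minimal sketch with a hypothetical toy data set: hand-coded rules
# versus rules learned from data.
library(rpart)

# Made-up measurements and labels
animals <- data.frame(
  claw_sharpness = c(9, 8, 7, 9, 2, 3, 1, 2),
  nose_size      = c(2, 3, 2, 1, 8, 9, 7, 8),
  species        = factor(c("cat", "cat", "cat", "cat",
                            "dog", "dog", "dog", "dog"))
)

# Traditional programming: a human writes the rule explicitly
classify_by_hand <- function(claw_sharpness, nose_size) {
  if (claw_sharpness > 5 && nose_size < 5) "cat" else "dog"
}

# Machine learning: the computer learns a rule from the data itself
fit <- rpart(species ~ claw_sharpness + nose_size, data = animals,
             method = "class",
             control = rpart.control(minsplit = 2, cp = 0))
print(fit)   # the learned splits play the role of the hand-written rule

# Both can then be applied to new, unseen data
new_animal <- data.frame(claw_sharpness = 8, nose_size = 3)
classify_by_hand(new_animal$claw_sharpness, new_animal$nose_size)
predict(fit, new_animal, type = "class")
```

The point is not the tree itself (trees come later in the course), but that the splitting rule was determined from the data rather than written down by the programmer.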
Prediction
We will mainly stay in the prediction landscape, but there are many other examples of ML: