Lecture 1: Introduction
University of British Columbia Okanagan
Welcome to DATA 311: Machine Learning!
Regression, classification, resampling, model selection and validation, fundamental properties of matrices, dimension reduction, tree-based methods, unsupervised learning. [3-2-0]
Prerequisite: Either (a) one of STAT 205, STAT 230 or (b) a score of more than 75% in one of APSC 254, BIOL 202, PSYO 373; and one of COSC 111, APSC 177.
I am currently a tenure-track Assistant Professor of Teaching
I have taught a variety of courses (from introductory data science to graduate courses in statistics) at several institutions (Guelph, McGill, MDS Program)
I am currently the Data Science program advisor and the articulation and curriculum representative
Office: SCI 104; email: irene.vrbik@ubc.ca
Websites: irene.quarto.pub, irene.vrbik.ok.ubc.ca
McMaster University, BSc (Mathematics & Statistics)
University of Guelph, MSc (Applied Statistics)
Thesis: Using Individual-level Models to Model Spatio-temporal Combustion Dynamics. This involved modelling the spatio-temporal combustion dynamics of fire in a Bayesian framework. Supervisors: Rob Deardon and Zeny Feng.
University of Guelph, PhD (Applied Statistics)
Thesis: Non-Elliptical and Fractionally-Supervised Classification. This involved model-based classification with a particular emphasis on non-elliptical distributions. Supervisor: Paul D. McNicholas.
Postdoctoral Fellow at McGill University: Under the supervision of Dr. David Stephens, this work focused on the statistical and computational challenges associated with analyzing genetic data. It involved clustering and modeling HIV DNA sequences.
Postdoctoral Fellow at UBCO: Awarded by NSERC (Natural Sciences and Engineering Research Council of Canada), this fellowship involved research collaborations with faculty from several disciplines (e.g. Medical Physics, Biology, and Chemistry) and was supervised by Dr. Jason Loeppky.
Instructor at UBCO: A three-year contract position in the Department of Computer Science, Mathematics, Physics, and Statistics.
Statistics and Machine Learning in Curriculum Design
Curricular Analytics: the systematic analysis and evaluation of educational curricula to gain insights into various aspects of curriculum design, delivery, and assessment.
Tools for teaching, learning, and technology
The course syllabus is a dynamic document which has been posted to Canvas and the course website. Many administrative questions can be answered there.
Canvas will be used for most course-related material:
Lectures will be posted on our course webpage.
Take time to learn how to:
navigate through the slides
export to PDF (good for tablet annotation)
use the clipboard to copy code (HTML only)
How to use the clipboard
Hover over the code block below and you will see a copy icon in the top-right corner:
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
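The output above has the shape of the first six rows of R's built-in iris data set. The code that produced it is not shown here, so the following is only a minimal sketch, assuming the slide's code block was a call along these lines (the name first_rows is made up for illustration):

```r
# Assumption: the slide printed the first rows of the built-in iris data.
first_rows <- head(iris)  # head() returns the first 6 rows by default
print(first_rows)
```

On the HTML slides, a block like this is what the copy icon would place on your clipboard.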
Warning
This interactivity will only work on the HTML version of the slides, not PDF.
Labs will be held in person; students must be enrolled in a lab (which cannot conflict with other courses)
TAs provide guidance on carrying out analyses in R for the techniques discussed in lecture.
Knowledge of commands and programming techniques will be evaluated throughout the course.
Follow the instructions carefully and practice skills by completing (and redoing!) labs and assignments
The main textbook reference for this course is:
ISLR: An Introduction to Statistical Learning with Applications in R (Second Edition). By: Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani.
This book is available for free at statlearning.com.
A secondary (less referenced) textbook is:
ESL: The Elements of Statistical Learning: data mining, inference, and prediction, 2nd edition. By: Hastie, Tibshirani, Friedman.
Can be downloaded for free at: hastie.su.domains/ElemStatLearn.
Outside of class, the general order in which I would suggest asking course-related questions is:
Machine learning (ML) is a subfield of artificial intelligence (AI) that uses algorithms and statistical models to learn from data to perform complex tasks.
Machine learning has been described as:
“The field of study that makes computers capable of learning without being explicitly programmed.”
Input:
e.g. If claws are sharp and nose is small …
Output: cat
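To make the contrast concrete, here is a rough sketch of what a hand-written rule like the one above might look like in R. Everything in it is hypothetical: the function name classify_animal, the two logical inputs, and the "not a cat" fallback are assumptions for illustration, and only the two conditions quoted above are used (the rest of the rule is elided on the slide).

```r
# Hypothetical hand-coded rule: a person, not the computer, chose these conditions.
classify_animal <- function(claws_sharp, nose_small) {
  if (claws_sharp && nose_small) {
    "cat"
  } else {
    "not a cat"  # assumed fallback; the slide does not say what happens otherwise
  }
}

classify_animal(claws_sharp = TRUE, nose_small = TRUE)
classify_animal(claws_sharp = FALSE, nose_small = TRUE)
```

The point of the sketch is that every condition had to be typed in by hand; nothing here is learned from data.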
In supervised machine learning, the computer must learn these distinguishing patterns for itself in order to determine a set of rules by which future data will be sorted.
Supervised learning involves training a model on labeled examples. There are two main goals:
Classification: Assigning unseen examples to predefined categories (e.g., spam vs. not spam).
Regression: Predicting continuous values (e.g., predicting house prices based on features).
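As a small illustration of the two goals, the sketch below fits one model of each kind in base R using the built-in iris data. The choice of data set, of lm() and glm() as the models, and of the made-up flower with petal length 4.5 are all assumptions for illustration, not the methods prescribed by the course.

```r
# A new, unseen flower (hypothetical measurement).
new_flower <- data.frame(Petal.Length = 4.5)

# Regression: predict a continuous value (petal width) from petal length.
reg_fit <- lm(Petal.Width ~ Petal.Length, data = iris)
predicted_width <- predict(reg_fit, newdata = new_flower)

# Classification: predict a category (versicolor vs. virginica) from petal length.
# Keep only two species so that logistic regression applies.
two_species <- droplevels(subset(iris, Species != "setosa"))
clf_fit <- glm(Species ~ Petal.Length, data = two_species, family = binomial)

# With a factor response, glm() models the probability of the second level (virginica).
prob_virginica <- predict(clf_fit, newdata = new_flower, type = "response")
predicted_class <- ifelse(prob_virginica > 0.5, "virginica", "versicolor")

predicted_width
predicted_class
```

In both cases the model is trained on examples where the answer (the label) is known, and is then asked about an example it has not seen.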
Unsupervised learning involves training a model on unlabeled examples.
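One common example of learning from unlabeled data is clustering. The sketch below is illustrative only: it assumes k-means as the method, the built-in iris measurements as the data, and three clusters as the target, none of which are specified on this slide.

```r
set.seed(311)  # k-means starts from random centres; fix the seed so results repeat

# Drop the Species column so the algorithm sees measurements only, no labels.
measurements <- iris[, 1:4]

# Group the 150 flowers into 3 clusters (the number 3 is an assumption).
clusters <- kmeans(measurements, centers = 3, nstart = 10)

table(clusters$cluster)                # how many flowers fall in each cluster
table(clusters$cluster, iris$Species)  # compare with the held-back labels, for illustration only
```

The labels are used here only after the fact, to see how the discovered groups line up with the species; the algorithm itself never sees them.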
Students may choose to participate in iClicker questions by enrolling at: https://join.iclicker.com/BODT