Suggested Practice Problems

Legend

  • 🔴 Very Important
  • 🟠 Important
  • 🟡 Kind of Important
  • 🟢 Not Very Important

Anything with a strikeout will not be tested on the final exam.

1: Introduction to Statistics

Descriptive statistics

Suggested Practice Problems
  • JB exercises (solutions found here) Ch 3: 2, 7, 10, 14
  • Diez, Barr, and Çetinkaya-Rundel (2016) Ch 2: 2.27
  • 🟠 LO: able to accurately calculate and interpret key descriptive statistics, including:
    • measures of central tendency (mean, median, mode),
    • measures of variability (range, variance, standard deviation), and
    • measures of location of data (quartiles and percentiles).
  • 🔴 LO: proficient in creating and analyzing graphical representations of data, such as
    • histograms, box plots, and scatter plots, to summarize and describe data distributions effectively.
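
The descriptive statistics listed above can be computed in base R. This is a minimal sketch on an invented sample (the values in x are an assumption made only for illustration); note that quantile() uses R's default interpolation rule, which may differ slightly from a hand calculation.

```r
# Hypothetical sample; the values are invented for illustration only.
x <- c(2, 4, 4, 5, 7, 9, 12)

# Measures of central tendency
mean(x)
median(x)
mode_x <- as.numeric(names(which.max(table(x))))  # most frequent value (base R has no mode function)
mode_x

# Measures of variability
range(x)     # smallest and largest value
var(x)       # sample variance (divides by n - 1)
sd(x)        # sample standard deviation, sqrt(var(x))

# Measures of location
quantile(x, c(0.25, 0.5, 0.75))   # quartiles (R's default type 7 rule)
IQR(x)

# Graphical summaries
hist(x)
boxplot(x)
```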

Principles of random sampling

Suggested Practice Problems
  • JB exercises (solutions found here) Ch 2: 1, 2, 3, 4, 9, 15, 16, 20.
  • Diez, Barr, and Çetinkaya-Rundel (2016) Ch 1: 1.13, 1.15, 1.17
  • Diez, Barr, and Çetinkaya-Rundel (2016) Ch 2: 2.5, 2.6
  • 🟠 LO: Understand the principles of probability sampling and how they form the basis for making statistical inferences from a sample to a population.
  • 🔴 LO: Understand the difference between a sample (of size \(n\)) and a population (either “infinite” or of size \(N\))
  • 🔴 LO: Understand the difference between sample statistics (e.g. \(\bar x\), \(s\), \(\hat p\)) and population parameters (e.g. \(\mu\), \(\sigma\), \(p\))
  • 🟢 LO: Distinguish between different sampling designs (e.g. simple random, stratified, cluster)
  • 🟡 LO: Identify a target population

Types of data

Suggested Practice Problems
  • 🔴 LO: Identify variables as numerical (continuous or discrete) or categorical (nominal or ordinal)
  • 🟠 LO: Differentiate between variables that are associated (positively or negatively) and those that are independent.
  • 🟠 LO: Detect outliers in various types of data sets using graphical methods (box plots, scatter plots)
  • 🟢 LO: Understand the difference between observational studies and experiments

2: Summarizing Data

Suggested Practice Problems
  • Diez, Barr, and Çetinkaya-Rundel (2016) Ch 2: 2.10, 2.11, 2.12, 2.13, 2.15, 2.17
  • Diez, Barr, and Çetinkaya-Rundel (2016) Ch 2: 2.28, 2.31, 2.33, 2.34
  • JB exercises (solutions found here) Ch 3 Exercises: 31, 32, 34, 39, 36, 37, 39, 40, 41, 45, 46, 48, 49, 50

Storing Data

  • 🟡 LO: Describe and identify the basic data types in R: vectors, factors (ordered or unordered), lists, and data frames (with observations typically in rows and variables in columns)
  • 🟡 LO: Data indexing: index vectors using [] and columns in a data frame using $ (see footnote 1)
  • 🟢 LO: Understand the character data type (character or string) and the logical data type (TRUE or FALSE) in R
  • 🟡 LO: Apply coercion in R to convert data to the appropriate data type (e.g. use as.factor() to convert a numeric vector to a factor)
  • 🟢 LO: Display data using str(), head(), or View()
  • 🟠 LO: construct and interpret contingency tables (along with marginals) to organize two categorical variables.
  • 🟡 LO: Define robust statistics (e.g. median, IQR) as measures that are not heavily affected by skewness and extreme outliers, and determine when they are more appropriate measures of center and spread than their non-robust counterparts (e.g. mean, standard deviation).
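
The storage, indexing, coercion, and tabulation outcomes above can be tried out together. This is a sketch only: the data frame df and all of its values are invented, and the column names are assumptions.

```r
# Hypothetical data frame: observations in rows, variables in columns.
df <- data.frame(
  id     = 1:6,
  group  = c("a", "b", "a", "c", "b", "a"),
  passed = c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE),
  score  = c(71, 58, 90, 66, 49, 83)
)

str(df)        # compact display of each column's type
head(df, 3)    # first three rows

df$score[2]                       # $ selects a column, [] indexes the resulting vector
df$group <- as.factor(df$group)   # coerce character to an unordered factor
levels(df$group)

# Two-way contingency table with row and column totals (marginals)
tab <- table(df$group, df$passed)
addmargins(tab)

# Robust vs non-robust: one extreme value moves the mean far more than the median
with_outlier <- c(df$score, 1000)
c(mean = mean(df$score), median = median(df$score))
c(mean = mean(with_outlier), median = median(with_outlier))
```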

Plotting Data

  • 🟡 LO: create simple plots using functions like plot(), hist(), boxplot(), etc.
  • 🟢 LO: create advanced plots like stacked/side-by-side bar plots and side-by-side boxplots
  • 🟢 LO: customize plot appearance by modifying attributes such as colors, labels, titles, axis limits, line types, etc.
  • 🟢 LO: explore themes and packages (like ggplot2) for more advanced and polished visualizations.
  • 🟡 LO: recognize and describe the common shapes of data distributions, including normal, skewed (right or left), uniform, and bimodal distributions.
  • 🟠 LO: visually identify, interpret, and estimate key statistical metrics such as the mean, mode, and interquartile range (IQR) from graphical representations including histograms and box plots.
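
As one possible illustration of the plotting outcomes above, the sketch below uses ToothGrowth, a data set that ships with base R. The titles, colours, and axis limits are arbitrary choices, not course requirements.

```r
# ToothGrowth ships with base R (datasets package); no external data is assumed.
hist(ToothGrowth$len,
     main = "Tooth length", xlab = "Length", col = "grey80")

# Side-by-side boxplots: one box per level of the grouping variable
b <- boxplot(len ~ supp, data = ToothGrowth,
             xlab = "Supplement", ylab = "Length",
             ylim = c(0, 40), col = c("orange", "skyblue"))

# Side-by-side bar plot of a two-way table of counts
counts <- table(ToothGrowth$supp, ToothGrowth$dose)
barplot(counts, beside = TRUE, legend.text = TRUE,
        xlab = "Dose", ylab = "Count")
```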

3: Sampling Distributions

Suggested Practice Problems
  • JB exercises (solutions found here) Ch 7: 1, 6, 7, 8, 9, 10, 11, 12, 13, 19, 20, 21, 24, 25
  • 🔴 LO: Explain the concept of a sampling distribution and its importance in statistical inference.
  • 🔴 LO: Define the Central Limit Theorem (CLT) and its significance in statistical theory.
  • 🔴 LO: Understand the conditions under which the CLT applies and its implications for sample means and proportions
  • 🔴 LO: Define standard error as the standard deviation of a sampling distribution, representing the variability of sample statistics around the population parameter.
  • 🔴 LO: Explain the conceptual difference between standard error and standard deviation, emphasizing their respective roles in describing variability in populations and samples.
  • 🟢 LO: Derive the sampling distributions for the sample mean, proportion, and variance
  • 🟠 LO: Use the sampling distribution of a sample statistic to create point estimates and confidence intervals.
  • 🟠 LO: Apply knowledge of sampling distributions to the practical applications of hypothesis testing and constructing confidence intervals.
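
A small simulation can make the sampling distribution, the CLT, and the standard error concrete. This sketch assumes an Exponential(1) population (mean 1, standard deviation 1); the seed, sample size, and number of replicates are arbitrary choices.

```r
set.seed(1)      # arbitrary seed for reproducibility
n    <- 30       # sample size (assumed)
reps <- 5000     # number of simulated samples (assumed)
rate <- 1        # Exponential(1): mean 1, sd 1, right-skewed

# Each replicate draws a sample of size n and records its mean
xbar <- replicate(reps, mean(rexp(n, rate)))

mean(xbar)       # should be close to the population mean, 1
sd(xbar)         # empirical standard error of the sample mean
1 / sqrt(n)      # theoretical standard error, sigma / sqrt(n)

# Although the population is skewed, the histogram of sample means
# is expected to look roughly bell-shaped (CLT)
hist(xbar, main = "Simulated sampling distribution of the sample mean")
```

Re-running with a larger n should shrink sd(xbar), which is the distinction between the standard deviation of the data and the standard error of a statistic.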

4: Getting Started with Quarto

Suggested Practice Problems
  • Wickham, Çetinkaya-Rundel, and Grolemund (2023) 28.3.1: 1, 2, 3; 28.5.5: 1, 2; 28.6.3: 1, 2, 3
  • 🔴 LO: Understand the advantages of using Quarto for reproducible document generation
  • 🔴 LO: Create .qmd documents using RStudio and demonstrate the ability to integrate:
    • executable code chunks and in-line code,
    • embedded figures and images,
    • basic Markdown syntax (e.g. tables, headers, bold, italics, lists)
  • 🟡 LO: describe the key features of the YAML header
  • 🟢 LO: identify and use common keyboard shortcuts
  • 🟡 LO: navigate the RStudio interface proficiently and explain its major components, including the script editor, console, environment pane, and visualization tools.
  • 🟡 LO: understand and customize code chunk options (e.g. the echo option controls whether the code within a code chunk is displayed in the output document)
  • 🟢 LO: write LaTeX equations
  • 🟠 LO: explain the importance of setting a seed in random number generation for reproducibility
  • 🟡 LO: demonstrate the ability to use the set.seed() function
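
The role of set.seed() in reproducibility can be checked directly. In this sketch the seed value 2024 is arbitrary; any fixed integer behaves the same way.

```r
set.seed(2024)   # the seed value itself is arbitrary
a <- rnorm(3)

set.seed(2024)   # resetting the same seed replays the same random draws
b <- rnorm(3)

identical(a, b)  # TRUE: the two vectors are exactly the same
```

Without the second set.seed() call, b would continue the random number stream and differ from a, which is why a rendered document with simulations should set a seed near the top.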

5: Likelihood and Parameter Estimation

Suggested Practice Problems
  • JB exercises (solutions found here) Ch 7: 14, 15
  • Rice (2007) Section 8.10: 4, 5, 6, 7 (excluding part d), 16, 21, 27, 47, 50, 52, 60
  • 🔴 LO: Define, calculate, and identify point estimators
  • 🔴 LO: Define, construct, and interpret confidence intervals
  • 🟠 LO: define and describe the Method of Moments for parameter estimation.
  • 🟠 LO: use sample moments (such as sample means, variances, and higher moments) to estimate the parameters of a specified distribution.
  • 🟢 LO: derive moment equations (moments will be provided on an exam if needed)
  • 🟠 LO: Define likelihood (and log-likelihood) in the context of statistical inference.
  • 🟠 LO: interpret the likelihood function as a tool for statistical inference and explain how it differs from a probability.
  • 🔴 LO: Derive a maximum likelihood estimator (MLE)
  • 🟡 LO: Define and explain common considerations in statistical estimation, including bias, consistency, efficiency, sufficiency, and asymptotic normality.
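
To see likelihood-based estimation in action, the sketch below maximizes a log-likelihood numerically and compares the result with the closed-form answer. It assumes an Exponential(rate) model with simulated data; the true rate of 2, the seed, and the search interval are all assumptions made for illustration.

```r
set.seed(42)
x <- rexp(50, rate = 2)   # simulated data; the true rate 2 is an assumption

# Log-likelihood of an Exponential(rate) sample: n * log(rate) - rate * sum(x)
loglik <- function(rate) sum(dexp(x, rate = rate, log = TRUE))

# Numerical maximization over a plausible range of rates
fit <- optimize(loglik, interval = c(0.01, 20), maximum = TRUE, tol = 1e-8)

fit$maximum   # numerical MLE
1 / mean(x)   # closed-form MLE (for this model it is also the method-of-moments estimate)
```

For this model, setting the derivative of the log-likelihood to zero gives the estimator 1 / x̄, so the two printed values should agree up to numerical tolerance.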

6/7: Confidence Intervals for Means, Proportions, and Variance

Suggested Practice Problems
  • JB exercises (solutions found here) Ch 8:

    Breakdown by topic:

    • interpretation: 4, 5, 6, 18, 19, 24, 35
    • CI for \(\mu\) with \(\sigma\) unknown: 8, 9, 10, 11, 13, 14, 26, 27, 29, 31, 32, 34, 42
    • basic calculations: 1, 2, 3, 7, 15, 16, 21
  • CI for \(p\): Diez, Barr, and Çetinkaya-Rundel (2016) Ch 6: 6.1, 6.5, 6.7, 6.9, 6.10, 6.11, 6.13, 6.15

  • CI for \(\sigma\): Ramachandran and Tsokos (2020) Ch 5: 5.6.1, 5.6.3, 5.6.5, 5.6.7, 5.6.11

  • 🟠 LO: Understand what a pivotal quantity is and explain its role in statistical inference.
  • 🔴 LO: Construct a confidence interval given a particular confidence level, either in (a, b) form or in the form point estimate \(\pm\) margin of error.
  • 🔴 LO: Identify and compute a margin of error
  • 🔴 LO: Calculate the standard error for sample statistics using appropriate formulas.
  • 🔴 LO: Interpret a given confidence interval as the plausible range of values for a population parameter (e.g. \(\mu\), \(p\), or \(\sigma^2\)) in the context of probability and uncertainty: “We are XX% confident that the true population parameter is in this interval”, where XX% is the desired confidence level
  • 🟠 LO: Understand and describe why we use the term “confidence” instead of “probability”
  • 🟠 LO: Determine appropriate sample sizes based on the desired confidence level and precision (margin of error).
  • 🟠 LO: Check the assumptions required for using this method (e.g. success-failure check: \(np \geq 10\) and \(n(1-p) \geq 10\))
  • 🟠 LO: Identify factors that influence the margin of error (effectively the width) of a confidence interval, including sample size, confidence level, and population variability.
  • 🟠 LO: Calculate the required minimum sample size for a given margin of error at a given confidence level
  • 🟢 LO: Derive confidence intervals from the knowledge of sampling distributions and probability statements.
  • 🟠 LO: Understand the relationship between confidence intervals and hypothesis testing.
  • 🟠 LO: Use confidence intervals to make inferences about population parameters
  • 🔴 LO: Interpret QQ plots to assess the normality assumption
  • 🔴 LO: Extract critical values from Z-tables, t-tables, and chi-square tables.
  • 🔴 LO: Approximate probabilities from Z-tables, t-tables, and chi-square tables.
  • 🔴 LO: Demonstrate the ability to use functions like qnorm(), qt(), and qchisq() to find critical values
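
The pieces of a confidence interval (critical value, standard error, margin of error) can be assembled by hand and checked against t.test(). This is a sketch on an invented sample; the planning values sigma and m in the sample-size step are also assumptions.

```r
# Hypothetical sample of n = 10 measurements (values invented)
x <- c(9.8, 10.2, 10.4, 9.9, 10.0, 10.1, 9.7, 10.3, 10.5, 9.6)
n <- length(x)
conf  <- 0.95
alpha <- 1 - conf

# Critical values
qnorm(1 - alpha / 2)                              # z critical value, about 1.96
t_crit <- qt(1 - alpha / 2, df = n - 1)           # t critical value
qchisq(c(alpha / 2, 1 - alpha / 2), df = n - 1)   # chi-square critical values

# CI for mu with sigma unknown: xbar +/- t * s / sqrt(n)
se  <- sd(x) / sqrt(n)    # standard error
moe <- t_crit * se        # margin of error
ci  <- mean(x) + c(-1, 1) * moe
ci
t.test(x, conf.level = conf)$conf.int   # should match ci

# CI for sigma^2 from the chi-square pivot (n - 1) * s^2 / sigma^2
ci_var <- (n - 1) * var(x) / qchisq(c(1 - alpha / 2, alpha / 2), df = n - 1)
ci_var

# Minimum n for a margin of error m when sigma is treated as known
sigma <- 0.3; m <- 0.1    # assumed planning values
n_min <- ceiling((qnorm(1 - alpha / 2) * sigma / m)^2)
n_min

# QQ plot to assess the normality assumption
qqnorm(x); qqline(x)
```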

Nonparametric confidence intervals

Suggested Practice Problems

Ramachandran and Tsokos (2020) 12.1, 12.4, 12.5

  • 🟠 LO: Explain what nonparametric tests are and identify situations where they are more appropriate than parametric tests.

  • LO: Construct a nonparametric confidence interval for the median

  • LO: Construct a nonparametric confidence interval for the variance using resampling (bootstrap) methods
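
One common resampling approach is the percentile bootstrap, sketched below for the variance. This is only one of several bootstrap interval variants and may not be the exact procedure used in lecture; the simulated sample, the seed, and the number of resamples B are assumptions.

```r
set.seed(7)
x <- rexp(40)   # simulated skewed sample (assumption)
B <- 2000       # number of bootstrap resamples (assumed)

# Resample the data with replacement and recompute the statistic each time
boot_var <- replicate(B, var(sample(x, replace = TRUE)))

# Percentile interval: the middle 95% of the bootstrap distribution
ci <- quantile(boot_var, c(0.025, 0.975))
ci
```

No distributional form is assumed for the population here, which is what makes the interval nonparametric.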

8: Sampling Distribution Theory

Suggested Practice Problems
  • 🟠 LO: Define sampling distribution and its significance in inferential statistics.
  • 🟢 LO: Develop an understanding of how the CDF, PDF and MGF can be used to derive relationships between different types of random variables
  • 🟢 LO: Gain hands-on experience in proving the distribution characteristics of pivotal quantities

9: Sampling from Finite Populations

Suggested Practice Problems

N/A: you can skip this section

  • 🟡 LO: Define finite population sampling and its significance in survey research and applied statistics.
  • 🟠 LO: Define a simple random sample (SRS) from a population
  • 🔴 LO: Explain the importance of simple random sampling (SRS) for statistical inference
  • 🟠 LO: Describe the difference between a simple random sample without replacement (SRS, also written SRSWOR) and a simple random sample with replacement (SRSWR)
  • 🟡 LO: Explain the differences between sampling from finite populations and sampling from infinite populations.
  • 🟡 LO: Understand how finite population characteristics influence sampling design, estimation, and inference.
  • 🟠 LO: Define the finite population correction factor and its role in adjusting variance estimates for SRSWOR samples from finite populations
  • 🟠 LO: apply the finite population correction factor to correct standard errors and confidence intervals for finite populations.
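
The effect of the finite population correction on a standard error can be seen with a short calculation. The sketch uses the form sqrt((N - n) / (N - 1)), which applies when the population standard deviation is treated as known; some texts use sqrt(1 - n / N) when it is estimated by s, so check which convention the lecture notes follow. All numbers are hypothetical.

```r
N <- 500; n <- 50; sigma <- 12   # hypothetical population size, sample size, and sd

se_infinite <- sigma / sqrt(n)          # standard error ignoring the finite population
fpc <- sqrt((N - n) / (N - 1))          # finite population correction factor
se_finite <- se_infinite * fpc          # corrected standard error under SRSWOR

c(se_infinite = se_infinite, fpc = fpc, se_finite = se_finite)
```

The correction is always at most 1, so it narrows the confidence interval, and it approaches 1 as N grows relative to n.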

10: Properties of Parameter Estimators

Suggested Practice Problems

Devore, Berk, and Carlton (2021) 7.4: 51, 52, 53, 54, 55, 56, …

Study the concepts from lecture; there will be no long-answer questions about these concepts.

  • 🔴 LO: Define an unbiased estimator
  • 🔴 LO: Determine whether a given estimator is unbiased
  • 🟠 LO: Define and interpret the Mean Squared Error (MSE) (see footnote 2)
  • 🟡 LO: Understand the decomposition of MSE and its significance in evaluating the performance of estimators.
  • 🟠 LO: Calculate and compare the relative efficiency of two estimators
  • 🟠 LO: Understand how to determine which estimator provides more precise estimates under given conditions.
  • 🟡 LO: Understand the concept of consistency in estimators (know that MLEs are consistent under certain conditions)
  • 🟡 LO: Define the Minimum Variance Unbiased Estimator (MVUE) within the context of statistical estimation.
  • 🟡 LO: Apply the Cramér-Rao Lower Bound (CRLB) to determine whether an unbiased estimator is the MVUE
  • 🟠 LO: Identify and derive MVUEs using the Cramér-Rao Lower Bound (CRLB) theorem and understand the conditions under which an unbiased estimator achieves minimum variance.
  • 🟠 LO: Use the Cramér-Rao Lower Bound to find the lower bound of the variance of unbiased estimators
  • 🟢 LO: Define Fisher’s information
  • 🟢 LO: Explain the significance of Fisher’s information in statistical inference and parameter estimation.
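
The decomposition MSE = variance + bias² can be verified by simulation. The sketch below compares two estimators of \(\sigma^2\) for normal data, one dividing by n - 1 and one dividing by n; the sample size, true variance, seed, and number of replicates are assumptions.

```r
set.seed(10)
n <- 10; sigma2 <- 4; reps <- 20000   # assumed simulation settings

# Two estimators of sigma^2 computed on the same simulated normal samples
est <- replicate(reps, {
  x <- rnorm(n, mean = 0, sd = sqrt(sigma2))
  c(unbiased = var(x), mle = var(x) * (n - 1) / n)
})

bias <- rowMeans(est) - sigma2                               # estimated bias of each estimator
vari <- apply(est, 1, function(e) mean((e - mean(e))^2))     # variance across replicates
mse  <- rowMeans((est - sigma2)^2)                           # mean squared error

rbind(bias, variance = vari, mse, bias2_plus_var = bias^2 + vari)

# Ratio of MSEs; a value above 1 means the n-divisor estimator has the smaller MSE here
mse["unbiased"] / mse["mle"]
```

The last two rows of the printed table should agree, and the comparison shows that an unbiased estimator does not necessarily have the smallest MSE.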

11/12/13: Hypothesis Testing for one-sample

Steps involved in hypothesis testing:

1. State the Hypotheses:

  • Null Hypothesis (\(H_0\)): Represents the default assumption, often stating that there is no effect or no difference (i.e. no-change or status-quo).
  • Alternative Hypothesis (\(H_A\)): Represents the claim or research question you are testing for, stating that there is an effect or a difference.
  • Always construct hypotheses about population parameters (e.g. population mean \(\mu\)) and not the sample statistics (e.g. sample mean \(\bar x\))

2. Identify/choose the Significance Level (\(\alpha\))

  • This is the probability of rejecting the null hypothesis when it is actually true.
  • Common choices for \(\alpha\) include 5% (most common) or 1%, 2%, or 10%
  • Unless otherwise specified, you can assume \(\alpha = 0.05\)

3. Select a Statistical Test

  • Choose an appropriate statistical test based on the type of data and the hypothesis being tested. Common tests include t-tests, z-tests, chi-square tests, ANOVA, etc.

4. Collect Data and Calculate Test Statistic

  • Collect a sample of data from the population of interest.
  • Calculate the appropriate test statistic based on the chosen test and the sample data.

5. Determine the Distribution of the Test Statistic

  • Identify the null distribution, i.e. the distribution of the test statistic “under the null hypothesis” (i.e. assume \(H_0\) is true).

6. Calculate Critical Value or p-value

  • Based on the assumed sampling distribution of the test statistic, calculate either the critical value or the \(p\)-value.
    • You should be able to do this using the appropriate table as well as in R.

7. Make a Decision

  • For the critical value approach: if the test statistic falls within the rejection region, reject \(H_0\) in favour of \(H_A\); otherwise, fail to reject \(H_0\).
  • For the \(p\)-value approach: if the \(p\)-value is less than the significance level (\(\alpha\)), reject \(H_0\) in favour of \(H_A\); otherwise, fail to reject \(H_0\).
  • Note that we can never “accept” the null hypothesis since the hypothesis testing framework does not allow us to confirm it.

8. Draw Conclusions

  • Based on the decision in step 7, draw conclusions about the population parameter(s) being tested.
  • Note that your interpretation must always be in the context of the data: mention what the population is and what the parameter is (e.g. mean or proportion), and be sure to make reference to your alternative hypothesis.
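
The steps above can be traced in R for a one-sample \(t\)-test. This is a sketch on an invented sample with an assumed null value of 50 and an upper-tailed alternative; it is not tied to any particular exercise.

```r
# Hypothetical sample; H0: mu = 50 versus HA: mu > 50 (values invented)
x     <- c(52.1, 48.3, 55.0, 51.2, 49.8, 53.4, 50.9, 54.2)
mu0   <- 50
alpha <- 0.05      # Step 2: significance level
n     <- length(x)

# Step 4: observed test statistic
t_obs <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Steps 5-6: the null distribution is t with n - 1 df (upper-tailed test)
t_crit <- qt(1 - alpha, df = n - 1)
p_val  <- pt(t_obs, df = n - 1, lower.tail = FALSE)

# Step 7: the critical value and p-value approaches always give the same decision
t_obs > t_crit
p_val < alpha

# The built-in function reports the same statistic and p-value
t.test(x, mu = mu0, alternative = "greater")
```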

Suggested Practice Problems
  • Test for proportions: Diez, Barr, and Çetinkaya-Rundel (2016) Ch 6: 6.11
  • JB exercises (solutions found here) Chapter 9:
    • Tests for \(\mu\) when \(\sigma\) is known: 1, 5, 6, 10, 12, 13, 15
    • Tests for \(\mu\) when \(\sigma\) is unknown: 30, 33, 35, 36, 37, 38, 39, 40, 41, 44, 47, 48, 56, 59, 60, 61, 62, 63, 64, 65, 66, 67
  • 🔴 LO: Explain the concepts of null (\(H_0\)) and alternative (\(H_A\)) hypotheses, test statistics, significance levels, and p-values.
  • 🔴 LO: Formulate the appropriate null and alternative hypotheses (either in symbols or in words) given a word problem, and determine the appropriate direction of the alternative (is it an upper-tailed, lower-tailed, or two-sided hypothesis test?)
  • 🟠 LO: Define Type I and Type II errors. Note that the conclusion of a hypothesis test might be erroneous regardless of the decision we make.
    • A Type I error is rejecting the null hypothesis when the null hypothesis is actually true; its probability is the significance level \(\alpha\).
    • A Type II error is failing to reject the null hypothesis when the alternative hypothesis is actually true; its probability is commonly denoted \(\beta\).
  • 🔴 LO: Define the significance level (\(\alpha\)) and explain its role in hypothesis testing
  • 🔴 LO: Identify the appropriate test statistic for a given problem
  • 🟠 LO: Define the null distribution and explain its role in hypothesis testing.
  • 🔴 LO: Calculate and identify an observed test statistic given some data
  • 🟠 LO: Understand how sample size impacts the SE of point estimators
  • 🔴 LO: Define a sample statistic as a point estimate for a population parameter (e.g. the sample mean \(\bar x\) is used to estimate the population mean \(\mu\)) and note that point estimate and sample statistic are synonymous.
  • 🔴 LO: Explain the theoretical foundations of z-tests and t-tests, including their assumptions, conditions, and applicability
  • 🔴 LO: Explain why the t-distribution helps make up for the additional variability introduced by using \(s\) (sample standard deviation) in the calculation of the standard error, in place of \(\sigma\) (population standard deviation).
  • 🔴 LO: Identify and check the assumptions and conditions necessary for valid z-tests and t-tests
    • Note: the independence of observations in a sample is provided by a simple random sampling design.
    • 🔴 LO: Use graphical methods (e.g., Q-Q plots, histograms) to assess the normality assumption
  • 🔴 LO: know when to use a \(t\)-test vs. \(z\)-test (refer to flowchart)
  • 🔴 LO: Describe the different characteristics of the standard normal (i.e. \(Z\)-distribution) as compared to the Student \(t\) distribution.
    • e.g. the \(t\)-distribution has a single parameter, degrees of freedom, and as the degrees of freedom increase this distribution approaches the normal distribution.
  • 🔴 LO: Perform one-sample z-tests and t-tests, including assumptions checking and the steps outlined above.
  • 🟠 LO: Understand the connection between a \(100(1-\alpha)\%\) CI and two-sided hypothesis tests.
  • 🔴 LO: Define a \(p\)-value as the conditional probability of obtaining a sample statistic at least as extreme as the one observed, given that the null hypothesis is true, i.e. \(\Pr(\text{observed or more extreme sample statistic} \mid H_0 \text{ true})\)
  • 🟠 LO: Visualize on a null distribution the rejection region (values of the test statistic for which \(H_0\) will be rejected) and/or \(p\)-values (area under the curve)

15/16: Inference for Two Samples

Suggested Practice Problems
  • JB exercises (solutions found here) Chapter 10: 4, 6, 7, 9, 10, 11, 17, 23, 26, 27, 30, 31, 33
  • Diez, Barr, and Çetinkaya-Rundel (2016)

Many of the Learning Outcomes from 11/12/13: Hypothesis Testing for one-sample will carry over to this unit. In addition we have:

  • 🟠 LO: Explain the objectives and applications of inference for two samples in research and data analysis.
  • 🔴 LO: Differentiate between independent and dependent populations.
  • 🔴 LO: Differentiate between the three types of \(t\)-tests: paired \(t\)-tests, the Welch procedure, and pooled \(t\)-tests (refer to flowchart)
  • 🔴 LO: Perform two-sample \(t\)-tests, including assumptions checking and the steps outlined above.
  • 🔴 LO: Perform the two-sample \(t\)-tests in R using the t.test() function
    • Specify the appropriate arguments: alternative = c("two.sided", "less", "greater"), mu = 0, paired = FALSE, var.equal = FALSE, conf.level = 0.95.
    • Specify the data either using x, y, or formulas with data = ...
  • 🔴 LO: Interpret the output of the t.test() function, including test statistics, degrees of freedom, \(p\)-values, and confidence intervals.
  • 🔴 LO: Identify the degrees of freedom associated with different statistical tests.
  • 🟠 LO: Define pooled variance and explain the conceptual basis in the context of two-sample hypothesis testing
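
The three two-sample procedures differ only in a few t.test() arguments. The sketch below uses invented data; treating a and b as matched pairs in the paired call is an assumption made purely to show the syntax.

```r
# Hypothetical data (values invented): two groups of six observations
a <- c(12.1, 11.4, 13.0, 12.7, 11.9, 12.5)
b <- c(10.8, 11.6, 10.9, 11.2, 12.0, 10.5)

welch  <- t.test(a, b)                     # default var.equal = FALSE (Welch procedure)
pooled <- t.test(a, b, var.equal = TRUE)   # pooled-variance t-test
paired <- t.test(a, b, paired = TRUE)      # only valid if a[i] and b[i] form a matched pair

# Pooled variance: a df-weighted average of the two sample variances
n1 <- length(a); n2 <- length(b)
sp2 <- ((n1 - 1) * var(a) + (n2 - 1) * var(b)) / (n1 + n2 - 2)
t_pooled <- (mean(a) - mean(b)) / sqrt(sp2 * (1 / n1 + 1 / n2))

# Degrees of freedom differ by procedure
c(welch  = unname(welch$parameter),
  pooled = unname(pooled$parameter),
  paired = unname(paired$parameter))

# Formula interface with a data frame
d <- data.frame(y = c(a, b), g = rep(c("a", "b"), each = 6))
t.test(y ~ g, data = d, alternative = "two.sided", conf.level = 0.95)
```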

17: ANOVA

Suggested Practice Problems
  • JB exercises (solutions found here) Chapter 14: 2, 3, 5, 8, 10, 12, 13, 16, 18, 19, 22, 23, 25, 26, 28
  • 🟡 LO: Explain the conceptual basis of ANOVA, including the partitioning of variance and the F-test for assessing group differences.
  • 🔴 LO: Define Analysis of Variance (ANOVA) as a statistical method used to compare means across multiple groups or treatments.
  • 🔴 LO: State the null and alternative hypotheses for a one-way ANOVA (either in words or symbols)
  • 🔴 LO: State and check the assumptions underlying the model.
    • for checking normality we use visual aids (e.g. QQ plot)
    • for equal variance we use the rule of thumb \(0.5 < \frac{s_a}{s_b} < 2\), where \(s_a\) and \(s_b\) are the smallest and largest sample standard deviations, respectively (i.e. the largest is less than twice the smallest).
  • 🔴 LO: interpret side-by-side boxplots to assess if the equal variance assumption is reasonable
  • 🔴 LO: Identify and calculate the appropriate degrees of freedom in a one-way ANOVA
  • 🟢 LO: Explain the conceptual basis of degrees of freedom and its importance in hypothesis testing and parameter estimation.
  • 🟡 LO: Explain the meaning and importance of balanced study designs
  • 🔴 LO: Identify and describe the main components of an ANOVA table, including the sources of variation, degrees of freedom, sum of squares, mean squares, and the F-statistic.
  • 🔴 LO: Calculate missing values in an ANOVA table based on the relationships between cells
  • 🟢 LO: Interpret the sources of variation presented in the ANOVA table, such as between-groups variation, within-groups variation, and total variation.
  • 🟠 LO: Perform a one-way ANOVA in R using the aov() function
    • know the formula notation y ~ x with data specification
  • 🔴 LO: Interpret the results of hypothesis testing (either in an ANOVA table or R aov() output) based on the F-statistic and \(p\)-value, including decisions to reject or fail to reject the null hypothesis.
  • 🟡 LO: Describe why calculation of the p-value for ANOVA is always “one sided”.
  • 🟠 LO: Explain the purpose and rationale for conducting post-hoc tests to identify specific group differences.
  • 🟠 LO: Identify and apply appropriate post-hoc tests when necessary (e.g. pairwise pooled \(t\)-tests with the Bonferroni correction to discover which group means are different after a significant ANOVA result)
  • 🟠 LO: Explain why multiple comparison procedures like the Bonferroni correction are necessary to control the Type I error rate in hypothesis testing.
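
A one-way ANOVA workflow in R might look like the following sketch. The data are invented (a balanced design with three groups of five), and the variable names y and g are assumptions.

```r
# Hypothetical balanced design: 3 groups, 5 observations each (values invented)
d <- data.frame(
  y = c(20, 22, 19, 24, 21,  25, 27, 26, 24, 28,  30, 29, 31, 33, 32),
  g = factor(rep(c("A", "B", "C"), each = 5))
)

# Equal-variance rule of thumb: smallest sd divided by largest sd should exceed 0.5
s <- tapply(d$y, d$g, sd)
min(s) / max(s)
boxplot(y ~ g, data = d)   # visual check of spread across groups

fit <- aov(y ~ g, data = d)
summary(fit)               # ANOVA table: Df, Sum Sq, Mean Sq, F value, Pr(>F)

# Degrees of freedom: k - 1 between groups, N - k within groups
k <- nlevels(d$g); N <- nrow(d)
c(between = k - 1, within = N - k)

# Post-hoc pairwise pooled t-tests with Bonferroni-adjusted p-values
pairwise.t.test(d$y, d$g, p.adjust.method = "bonferroni")
```

In the ANOVA table, each mean square is the sum of squares divided by its degrees of freedom, and the F value is the ratio of the two mean squares, which is the relationship used to fill in missing cells.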

18: Linear Regression and Correlation

Suggested Practice Problems
  • JB exercises (solutions found here) Chapter 15: 2, 4, 6, 8, 9, 12, 14, 15, 16, 18, 20, 37, 40, 41, 42, 43

Corresponding practice problems: JB exercises (solutions found here) Ch 2: 12. Diez, Barr, and Çetinkaya-Rundel (2016) Ch 2: 2.1, 2.2

  • 🟠 LO: Define and identify the explanatory variable (aka independent variable or predictor) and the response variable (aka dependent variable or outcome).
  • 🟠 LO: Use scatter plots (explanatory variable (\(x\)) on the x-axis and the response variable (\(y\)) on the y-axis) to describe the strength and direction (positive or negative) of the linear relationship
  • 🟠 LO: Define simple linear regression (SLR) as a statistical method used to model the relationship between a single independent variable (predictor) and a continuous dependent variable (outcome or response variable).
  • 🔴 LO: State and check the assumptions for using SLR, i.e. linearity, nearly normal residuals, constant variability (homoscedasticity).
  • 🔴 LO: Identify and interpret the parameters of an SLR model (\(\beta_0\) and \(\beta_1\))
    • Interpret the slope as
      • “For each unit increase in x, we would expect y to increase/decrease on average by \(|\hat \beta_1|\) units”
      • Note that whether the response variable increases or decreases is determined by the sign of \(\hat \beta_1\)
    • Interpret the intercept as
      • “When x = 0, we would expect y to equal, on average, \(\hat \beta_0\)”
      • Explain why the intercept often does not have any practical significance
  • 🟠 LO: Plot the data and the fitted SLR line and understand the graphical representation of the slope (\(\hat \beta_1\)) and intercept (\(\hat \beta_0\))
  • 🟡 LO: Define and identify residuals \(e_i = y_i - \hat y_i\) as the difference between the observed \(y\) and predicted \(\hat y\) values of the response variable.
  • 🟡 LO: Explain how parameters are estimated using ordinary least squares (OLS), i.e. the OLS estimators are those that minimize the sum of the squared residuals
  • LO: Derive the OLS estimators
  • 🔴 LO: Make predictions based on the fitted line using \(\hat y = \hat \beta_0 + \hat \beta_1 x\)
  • 🟡 LO: Interpret the values of Pearson’s correlation coefficient (\(r\))
  • 🟡 LO: Describe the relationship between Pearson’s correlation coefficient (\(r\)) and the coefficient of determination, denoted \(R^2\) (AKA R-squared value) in SLR:
    • This value is calculated as the square of the correlation coefficient, and is between 0 and 1, inclusive.
    • An R-squared value of 1 indicates a perfect fit of the regression model to the observed data
  • 🟠 LO: Use residual plots to identify potential outliers (any unusual observations that stand out)
  • 🟠 LO: Assess the Residuals vs. Fitted plot to check the linearity and constant variability assumptions
  • 🟠 LO: Assess the QQ Residuals plot to check the normality assumption
  • 🟡 LO: Define extrapolation (predicting for values of \(x\) outside the range of the observed data) and distinguish it from interpolation (predicting for values of \(x\) that are in the range of the observed data).
  • 🟡 LO: Perform hypothesis tests on the slope coefficient in a simple linear regression model (see footnote 3).
  • 🔴 LO: Fit a linear model in R using the lm(formula, data) function.
  • 🔴 LO: Identify (from the summary output of an lm() model in R) the parameter estimates (and therefore the fitted OLS line)
  • 🔴 LO: Interpret the summary output of an lm() model in R to determine if a significant linear relationship exists (by interpreting the \(p\)-value associated with the slope parameter)
  • 🔴 LO: Identify (from the summary output of an lm() model in R) the \(R^2\) value and interpret its meaning as the percentage of the variability in the response variable explained by the explanatory variable.
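
The regression outcomes above map onto a handful of R calls. This sketch fits a simple linear regression to invented data; the variable names x and y and the prediction point 4.5 are assumptions.

```r
# Hypothetical data (values invented for illustration)
d <- data.frame(
  x = c(1, 2, 3, 4, 5, 6, 7, 8),
  y = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1)
)

fit <- lm(y ~ x, data = d)
coef(fit)       # estimated intercept and slope
summary(fit)    # estimates, p-value for the slope, R-squared

# In SLR, R-squared equals the squared Pearson correlation
r <- cor(d$x, d$y)
c(r_squared = r^2, from_lm = summary(fit)$r.squared)

# Prediction inside the observed range of x (interpolation)
predict(fit, newdata = data.frame(x = 4.5))

# Residuals are observed minus fitted values
e <- d$y - fitted(fit)
head(e)

plot(d$x, d$y); abline(fit)   # scatter plot with the fitted OLS line
plot(fit, which = 1:2)        # Residuals vs Fitted, and QQ plot of residuals
```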

19: Chi-square tests

Suggested Practice Problems
  • Diez, Barr, and Çetinkaya-Rundel (2016) Ch 2: 2.25

  • JB exercises (solutions found here) Chapter 13: 2, 3, 4, 6, 7, 8, 10, 11, 14, 15, 16, 17, 19, 21, 22, 23

  • 🔴 LO: construct and interpret contingency tables (along with marginals) to organize two categorical variables.

  • 🔴 LO: Use side-by-side box plots for assessing the relationship between a numerical and a categorical variable

  • 🟠 LO: Determine the size of a contingency table ( \(r \times c\) )

  • 🔴 LO: Conduct goodness-of-fit tests to determine how well observed data fit a specific distribution.

  • 🔴 LO: perform the two types of chi-square tests:

    1. tests for one-way tables (AKA goodness of fit tests) and
    2. tests for two-way tables (AKA tests for independence)

    following the steps outlined above. Note that you should be able to:

    • calculate an expected cell count
    • compute the chi-squared test statistic by hand
    • find critical values/approximate \(p\)-values for these tests using the chi-square table
  • 🟠 LO: State and check the assumptions for these tests

  • 🟠 LO: Use chisq.test() to conduct the appropriate chi-squared tests

  • 🟠 LO: Interpret the output of chisq.test()
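
The hand calculation for a test of independence can be checked against chisq.test(). The sketch below uses an invented 2 × 3 table of counts; the row and column labels and the goodness-of-fit counts are assumptions.

```r
# Hypothetical 2 x 3 table of counts (values invented)
obs <- matrix(c(30, 20, 25,
                20, 30, 25), nrow = 2, byrow = TRUE,
              dimnames = list(group  = c("g1", "g2"),
                              answer = c("yes", "no", "unsure")))

# Expected count for each cell: (row total * column total) / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Test statistic, degrees of freedom, and p-value by hand
x2 <- sum((obs - expected)^2 / expected)
df <- (nrow(obs) - 1) * (ncol(obs) - 1)
p  <- pchisq(x2, df = df, lower.tail = FALSE)
c(x2 = x2, df = df, p = p)

# Test of independence with the built-in function
res <- chisq.test(obs)
res
res$expected   # used to check the expected-count assumption

# Goodness of fit for a one-way table against hypothesized proportions
chisq.test(c(18, 22, 20), p = c(1/3, 1/3, 1/3))
```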

References

Devore, J. L., K. N. Berk, and M. A. Carlton. 2021. Modern Mathematical Statistics with Applications. Springer Texts in Statistics. Springer International Publishing. https://books.google.ca/books?id=ghcsEAAAQBAJ.
Diez, D. M., C. D. Barr, and M. Çetinkaya-Rundel. 2016. OpenIntro Statistics. OpenIntro, Incorporated. https://books.google.ca/books?id=wfcPswEACAAJ.
Ramachandran, K. M., and C. P. Tsokos. 2020. Mathematical Statistics with Applications in R. Elsevier Science. https://books.google.ca/books?id=t3bLDwAAQBAJ.
Rice, J. A. 2007. Mathematical Statistics and Data Analysis. Advanced Series. Cengage Learning. https://books.google.ca/books?id=KfkYAQAAIAAJ.
Wickham, H., M. Çetinkaya-Rundel, and G. Grolemund. 2023. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly. https://books.google.ca/books?id=xU-gzwEACAAJ.

Footnotes

  1. Alternatively, you can attach() your data frame, which makes its columns available as if they were named vector objects in R.

  2. Understand how larger MSE values indicate greater discrepancy between estimated and true values, while smaller MSE values indicate better performance.

  3. Note that a hypothesis test for the intercept is often irrelevant since \(x = 0\) is usually outside the range of the data, and hence the intercept is usually an extrapolation.