Type I/II Errors and Power

STAT 205: Introduction to Mathematical Statistics

Dr. Irene Vrbik

University of British Columbia Okanagan

Review

Population

👩‍⚕️👨‍🍳🧑‍🔬👩‍🎨👨‍🚀🧑‍🏫👩‍✈️👨‍⚖️🧑‍🌾👩‍🔧👨‍🎤🧑‍🏭👩‍🚒👨‍🎓🧑‍⚕️👩‍🔬👨‍🎨🧑‍🚀👩‍🏭👨‍🔧

\(\theta = ?\)

\(\downarrow\)

Sample

👨‍🚀 👩‍✈️ 👩‍🔬 🧑‍🌾 🧑‍🚀

\(\hat \theta\)

Review

Sample

👨‍🚀 👩‍✈️ 👩‍🔬 🧑‍🌾 🧑‍🚀

\(\downarrow\)

\(\hat \theta\)

point estimates

Review

Sample

👨‍🚀 👩‍✈️ 👩‍🔬 🧑‍🌾 🧑‍🚀

\(\downarrow\)

\[ \begin{align} \Big[\hat \theta - \text{ME}, \hat \theta + \text{ME}\Big] \\ \text{ confidence intervals} \end{align} \]

Review

Sample

👨‍🚀 👩‍✈️ 👩‍🔬 🧑‍🌾 🧑‍🚀

\(\downarrow\)

\[\begin{align} H_0: \theta = \theta_0 \quad\quad \text{ vs } \quad\quad H_A: \begin{cases} \theta < \theta_0\\ \theta > \theta_0\\ \theta \neq \theta_0 \end{cases} \end{align}\]

Assuming \(H_0: \theta = \theta_0\) is true, we get our null distribution

\(\downarrow\)

Observed test statistics falling in the rejection region \(\implies\) reject \(H_0\).

Null distribution with two-tailed critical regions (\(\alpha\) = 0.05).

Simulation under the Null

Cases where we reject

In this simulation, 52 out of 1000 samples (5.2%) generated under the null hypothesis fall in the rejection regions.

Type I errors

  • All of the red dots represent the case where we rejected the null hypothesis when we shouldn’t have.

  • In other words, the 52 unlucky samples are examples of Type I errors.

  • The simulation demonstrates the Type I error rate, which theoretically is equal to \(\alpha\) (default 0.05).
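
A simulation of this kind can be sketched in a few lines of R. The slides do not say how the samples were generated, so the setup below is an assumption: a standard Normal population, samples of size 30, and a two-sided z test with known \(\sigma\). The seed and all variable names are illustrative, and the observed rate will vary from run to run around \(\alpha\).

```r
set.seed(205)     # arbitrary seed, only so the run is reproducible
n_sims <- 1000    # number of samples generated under H0
n      <- 30      # assumed sample size
mu0    <- 0       # assumed null mean
sigma  <- 1       # assumed known population standard deviation
alpha  <- 0.05    # significance level

# z statistic for each sample, with every sample drawn while H0 is true
z_stats <- replicate(n_sims, {
  x <- rnorm(n, mean = mu0, sd = sigma)
  (mean(x) - mu0) / (sigma / sqrt(n))
})

z_crit     <- qnorm(1 - alpha / 2)   # two-tailed critical value
reject     <- abs(z_stats) > z_crit  # TRUE for samples in the rejection regions
type1_rate <- mean(reject)           # observed Type I error rate
type1_rate
```

Every rejection counted here is a Type I error, because the data were generated with \(H_0\) true.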

iClicker: Identifying a Type I Error

Exercise 1 A university administrator is investigating whether the average GPA of students at her university is different from the national average of 3.2. She collects a random sample of students and performs a hypothesis test. Let \(\mu\) represent the true average GPA of students at her university and

\[ H_0: \mu = 3.2 \quad\quad H_a: \mu \neq 3.2 \]

Under which condition would the administrator commit a Type I error?

  1. She concludes the university’s average GPA is not 3.2 when it actually is.

  2. She concludes the university’s average GPA is not 3.2 when it actually is not.

  3. She concludes the university’s average GPA is 3.2 when it actually is.

  4. She concludes the university’s average GPA is 3.2 when it actually is not.

Type I Error

A Type I Error occurs when we incorrectly reject the null hypothesis (\(H_0\)) even though it is actually TRUE.

The probability of making a Type I error is \(\alpha\), i.e. the significance level.

Why don’t we just set our significance level to 1%?

Well, there is a tradeoff with another type of error…

Type II Errors

  • A Type II Error occurs when we incorrectly fail to reject the null hypothesis (\(H_0\)) even though it is actually FALSE.
  • We denote the probability of making a Type II error by \(\beta\).
  • As we will see, if we lower \(\alpha\) there will be less chance of Type I errors but an increased risk of Type II errors.
  • Visualizing this one is a little harder since there are infinitely many alternative hypotheses for a given null hypothesis.

Decision Matrix

Reality

| Decision | \(H_0\) True | \(H_0\) False |
| --- | --- | --- |
| Reject \(H_0\) | \(\textcolor{red}{\textsf{Type I error}}\) | \(\textcolor{green}{\textsf{Correct}}\) |
| Fail to reject \(H_0\) | \(\textcolor{green}{\textsf{Correct}}\) | \(\textcolor{red}{\textsf{Type II error}}\) |


Here columns represent the reality or underlying truth (that we never know), and rows represent the decision we make based on the hypothesis test.

Example

Error Calculations

Exercise 2 A packaging machine is supposed to fill 500 g bags of rice. Historical data suggest the fill weights are approximately Normal with known \(\sigma\) = 12 g. At \(\alpha\) = 0.05 you plan to take a sample of size 36 to test:

\[ \begin{align} H_0: \mu = 500 \quad \text{vs} \quad H_A: \mu < 500 \end{align} \]

If the true mean is \(\mu = 496\),

  1. what is the probability of making a Type I error?
  2. what is the probability of making a Type II error?
  3. what is the power of the test?

Type I Error visualized

Figure 1: Null distribution with lower-tailed critical value at \(\bar x\) = 496.7
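
The slides quote the critical value without deriving it. One way to recover it, assuming the lower-tailed z test with known \(\sigma\) from Exercise 2, is to move \(z_\alpha\) standard errors away from the null mean. The variable names are illustrative.

```r
mu0   <- 500    # hypothesized mean under H0
sigma <- 12     # known population standard deviation
n     <- 36     # sample size
alpha <- 0.05   # significance level

se        <- sigma / sqrt(n)          # standard error of the sample mean
xbar_crit <- mu0 + qnorm(alpha) * se  # reject H0 when xbar < xbar_crit
round(xbar_crit, 4)
```

Rounded to one decimal place this gives the 496.7 shown in Figure 1; the unrounded value is the 496.7102927 used in the `pnorm()` calls.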

Type I in symbols

\[ \begin{align} \alpha &= \Pr(\text{Type I Error})\\ &= \Pr(\text{Reject }H_0 \mid \textcolor{NavyBlue}{H_0 \text{ is true}})\\ &= \Pr(\text{Reject }H_0 \mid \textcolor{NavyBlue}{\mu = 500})\\ &= \Pr(\textcolor{NavyBlue}{\bar X} < 496.7 \mid \textcolor{NavyBlue}{\mu = 500}) = 0.05 \end{align} \]

where \(\textcolor{NavyBlue}{\bar X \sim \text{Normal}(\mu_{\bar X} = 500, \sigma_{\bar X} = 12/\sqrt{36})}\)

pnorm(496.7102927,       # critical xbar value
    mean = 500,          # true mu value
    sd = 12/sqrt(36))    # SE = sigma/sqrt(n)
[1] 0.05

Null vs True Distribution

Figure 2: Null distribution (green) next to the true sampling distribution of \(\bar X\) (orange).

True Distribution

Figure 3: A Type II error is failing to reject the null when the null is false.

Type II Visualized

The Type II error probability \(\beta\) (shaded region) is the probability of failing to reject the null when the null is false.

Type II in symbols

\[ \begin{align} \beta &= \Pr(\text{Type II Error})\\ &= \Pr(\text{Fail to reject }H_0 \mid \textcolor{orange}{H_0 \text{ is false}})\\ &= \Pr(\text{Fail to reject }H_0 \mid \textcolor{orange}{\mu = 496})\\ &= \Pr(\textcolor{orange}{\bar X} \geq 496.7 \mid \textcolor{orange}{\mu = 496}) = 0.3612 \end{align} \]

where \(\textcolor{orange}{\bar X \sim \text{Normal}(\mu_{\bar X} = 496, \sigma_{\bar X} = 12/\sqrt{36})}\)

pnorm(496.7102927,       # critical xbar value
    mean = 496,          # true mu value
    sd = 12/sqrt(36),    # SE = sigma/sqrt(n)
    lower.tail = FALSE)
[1] 0.36124
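
As a rough check on this value, the Type II error rate can also be estimated by simulation: draw many samples from the true distribution (\(\mu = 496\)) and count how often the sample mean lands outside the rejection region. This is a sketch, not part of the original exercise; the seed and the number of simulated samples are arbitrary choices.

```r
set.seed(205)                  # arbitrary seed, only so the run is reproducible
mu0     <- 500                 # null mean
mu_true <- 496                 # true mean from Exercise 2
sigma   <- 12
n       <- 36
alpha   <- 0.05
n_sims  <- 20000               # number of simulated samples (arbitrary)

xbar_crit <- mu0 + qnorm(alpha) * sigma / sqrt(n)

# sample mean of each simulated sample drawn from the TRUE distribution
xbars <- replicate(n_sims, mean(rnorm(n, mean = mu_true, sd = sigma)))

beta_hat <- mean(xbars >= xbar_crit)   # proportion that fail to reject H0
beta_hat
```

The estimate should land close to the exact 0.3612, with some simulation noise.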

True Distribution

Figure 4: Power is the probability of rejecting the null when the null is false.

Power Visualized

Power (green shaded region) is the probability of rejecting the null when the null is false.

Power in symbols

\[ \begin{align} \text{Power} &= \Pr(\text{Reject }H_0 \mid \textcolor{orange}{H_0 \text{ is false}})\\ &= \Pr(\text{Reject }H_0 \mid \textcolor{orange}{\mu = 496})\\ &= \Pr(\textcolor{orange}{\bar X} < 496.7 \mid \textcolor{orange}{\mu = 496}) = 0.6388 \end{align} \]

where \(\textcolor{orange}{\bar X \sim \text{Normal}(\mu_{\bar X} = 496, \sigma_{\bar X} = 12/\sqrt{36})}\)

pnorm(496.7102927,       # critical xbar value
    mean = 496,          # true mu value
    sd = 12/sqrt(36))    # SE = sigma/sqrt(n)
[1] 0.63876

Power in symbols

\[ \begin{align} \text{Power} &= \Pr(\text{Reject }H_0 \mid \textcolor{orange}{H_0 \text{ is false}})\\ &= \Pr(\text{Reject }H_0 \mid \textcolor{orange}{\mu = 496})\\ &= \Pr(\textcolor{orange}{\bar X} < 496.7 \mid \textcolor{orange}{\mu = 496}) = 0.6388\\ &= 1 - \Pr(\textcolor{orange}{\bar X} \geq 496.7 \mid \textcolor{orange}{\mu = 496}) \\ &= 1 - \Pr(\text{Failing to reject }H_0 \mid \textcolor{orange}{\mu = 496})\\ & = 1 - \Pr(\text{Type II Error}) \end{align} \]

\[\text{Power} = 1 - \beta\]
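
The two calculations can be bundled into one helper so that \(\beta\) and power always come from the same critical value. The function name `lower_tail_power` and its arguments are hypothetical, and the sketch covers only the lower-tailed z test with known \(\sigma\) used in Exercise 2.

```r
# Type II error probability and power for a lower-tailed z test (known sigma)
lower_tail_power <- function(mu_true, mu0, sigma, n, alpha = 0.05) {
  se        <- sigma / sqrt(n)                      # standard error of xbar
  xbar_crit <- mu0 + qnorm(alpha) * se              # rejection cutoff under H0
  power     <- pnorm(xbar_crit, mean = mu_true, sd = se)  # P(reject | mu_true)
  c(beta = 1 - power, power = power)
}

res <- lower_tail_power(mu_true = 496, mu0 = 500, sigma = 12, n = 36)
round(res, 4)
```

With the Exercise 2 inputs this should reproduce \(\beta = 0.3612\) and power \(= 0.6388\).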

iClicker

Example 1 All else equal, if we reduce \(\alpha\) from 0.05 to 0.01, what happens to power?

  1. Power increases
  2. Power decreases
  3. Power stays the same
  4. Cannot be determined

Why is Power Important?

  • A test with low power means we might not detect a real effect, leading to Type II errors.

  • A high-power test increases the likelihood of detecting true differences.

  • So how can we increase the power?

iClicker

Effect of sample size on Type II errors

If we increase sample size but keep \(\alpha\) = 0.01 fixed, what happens to power?

  1. Power increases
  2. Power decreases
  3. Power stays the same
  4. Cannot be determined

Sample Size effect on Type II

Figure sequence: the probability of making a Type II error with sample sizes of 36, 41, 43, 46, 51, 56, 61, 66, 76, 86, 96, 106, 116, 126, and 136.
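
The numbers behind this sequence can be sketched in R for the Exercise 2 setup (\(\mu_0 = 500\), true \(\mu = 496\), \(\sigma = 12\)). The significance level used for the figures is not stated; \(\alpha = 0.01\) is assumed here because it is the value in the preceding clicker question.

```r
mu0     <- 500
mu_true <- 496
sigma   <- 12
alpha   <- 0.01   # assumption: taken from the preceding clicker question
n <- c(36, 41, 43, 46, 51, 56, 61, 66, 76, 86, 96, 106, 116, 126, 136)

se        <- sigma / sqrt(n)             # standard error shrinks as n grows
xbar_crit <- mu0 + qnorm(alpha) * se     # cutoff moves toward mu0 as n grows
beta      <- pnorm(xbar_crit, mean = mu_true, sd = se, lower.tail = FALSE)

data.frame(n = n, beta = round(beta, 4))
```

The `beta` column should fall steadily as `n` increases.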

iClicker

Effect of sample size on Power

If we increase sample size but keep \(\alpha\) = 0.01 fixed, what happens to power?

  1. Power increases
  2. Power decreases
  3. Power stays the same
  4. Cannot be determined

\[ \begin{align} \bar X &\sim \text{Normal}(\mu_{\bar X} = 496, \sigma_{\bar X} = 12/\sqrt{36}) & \text{(True)}\\ \bar X &\sim \text{Normal}(\mu_{\bar X} = 500, \sigma_{\bar X} = 12/\sqrt{36}) & \text{(Null)} \end{align} \]

Figure 5: The null and true distributions with a sample size of 36

\[ \begin{align} \bar X &\sim \text{Normal}(\mu_{\bar X} = 496, \sigma_{\bar X} = 12/\sqrt{108}) & \text{(True)}\\ \bar X &\sim \text{Normal}(\mu_{\bar X} = 500, \sigma_{\bar X} = 12/\sqrt{108}) & \text{(Null)} \end{align} \]

The null and true distributions with a sample size of 36 (solid) versus 108 (dotted)

\[ \begin{align} \bar X &\sim \text{Normal}(\mu_{\bar X} = 496, \sigma_{\bar X} = 12/\sqrt{36}) & \text{critical } \bar X = 495.35 \\ \bar X &\sim \text{Normal}(\mu_{\bar X} = 496, \sigma_{\bar X} = 12/\sqrt{108}) & \text{critical } \bar X = 497.31 \end{align} \]

Null distribution (green) next to the true sampling distribution of \(\bar X\) (orange).
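
The two critical values can be checked directly. The sketch below assumes the same lower-tailed z test with \(\mu_0 = 500\), \(\sigma = 12\), and \(\alpha = 0.01\), and also reports the power against the true mean of 496 for each sample size.

```r
mu0     <- 500
mu_true <- 496
sigma   <- 12
alpha   <- 0.01
n       <- c(36, 108)

se        <- sigma / sqrt(n)                           # standard error for each n
xbar_crit <- mu0 + qnorm(alpha) * se                   # critical xbar for each n
power     <- pnorm(xbar_crit, mean = mu_true, sd = se) # P(reject | mu = 496)

data.frame(n, se = round(se, 3),
           xbar_crit = round(xbar_crit, 2),
           power = round(power, 4))
```

The `xbar_crit` column should match the 495.35 and 497.31 quoted above; tripling \(n\) pulls the cutoff toward 500 and narrows both curves, so more of the true distribution falls in the rejection region.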

iClicker

Effect of sample size on significance level

If we increase sample size but keep \(\alpha\) = 0.01 fixed, what happens to the significance level?

  1. \(\alpha\) increases
  2. \(\alpha\) decreases
  3. \(\alpha\) stays the same
  4. Cannot be determined

Effect of Sample Size

What changes as \(n\) increases

\[ n \uparrow \implies \text{SE} \downarrow \implies \text{overlap} \downarrow \]

  • Type II error decreases: \(\beta \downarrow\)

  • Power increases: \(1-\beta \uparrow\)

iClicker

Effect of \(\alpha\) on Power

All else equal, if we increase our significance level \(\alpha\), what happens to power?

  1. Power increases
  2. Power decreases
  3. Power stays the same
  4. Cannot be determined

Increase Significance Level

Influences on Power

  1. Significance Level (\(\alpha\))
    • Higher \(\alpha\) \(\implies\) Higher power
    • However, this also increases Type I errors
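
To see the size of this tradeoff, the Exercise 2 power calculation can be repeated at several significance levels. This is a sketch under the same assumptions (lower-tailed z test, \(\sigma = 12\), \(n = 36\), true \(\mu = 496\)); the three \(\alpha\) values are chosen for illustration.

```r
mu0     <- 500
mu_true <- 496
sigma   <- 12
n       <- 36
alpha   <- c(0.01, 0.05, 0.10)   # illustrative significance levels

se        <- sigma / sqrt(n)
xbar_crit <- mu0 + qnorm(alpha) * se                   # larger alpha -> cutoff nearer mu0
power     <- pnorm(xbar_crit, mean = mu_true, sd = se) # P(reject | mu = 496)

data.frame(alpha, xbar_crit = round(xbar_crit, 2), power = round(power, 4))
```

Each row's `alpha` is also that test's Type I error rate, so the gain in power is paid for with more false rejections.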

Increase Sample Size

Influences on Power

  1. Significance Level (\(\alpha\))

    • Higher \(\alpha\) \(\implies\) Higher power
    • However, this also increases Type I errors
  2. Sample Size (\(n\))

    • Increasing the sample size reduces variability, making it easier to detect small effects.
    • A test with \(n = 20\) might fail to detect a difference that a test with \(n = 2000\) would detect with much higher power.

Increase Effect Size

Effect Size

  • In hypothesis tests, effect size refers to how different the true mean (\(\mu\)) is from the hypothesized value (\(\mu_0\)).

  • Larger effect sizes make it easier to detect a difference, while smaller effect sizes require larger sample sizes for detection.

  • For two-sample tests for \(\mu\), the effect size is the magnitude of the difference between group means. For the one-sample tests considered here, the effect size (often denoted by \(\delta\)) is

\[ \delta = |\mu - \mu_0| \]

Influences on Power

  1. Significance Level (\(\alpha\))

    • Higher \(\alpha\) \(\implies\) higher power
    • However, this also increases Type I errors
  2. Sample Size (\(n\))

    • Increasing the sample size reduces variability, making it easier to detect small effects.
  3. Effect Size (\(\delta\))

    • The larger the effect size, the higher the power
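
All three influences can be read off a single expression. For the lower-tailed z test with known \(\sigma\) and true mean \(\mu = \mu_0 - \delta\), standardizing the critical value gives \(\text{Power} = \Phi(z_\alpha + \delta\sqrt{n}/\sigma)\), where \(z_\alpha\) is the lower \(\alpha\) quantile of the standard Normal. This restatement is not on the slides, and the function name `power_from_delta` is hypothetical.

```r
# Power of a lower-tailed z test as a function of effect size, n, sigma, alpha
power_from_delta <- function(delta, n, sigma, alpha) {
  pnorm(qnorm(alpha) + delta * sqrt(n) / sigma)
}

delta <- c(1, 2, 4, 6)   # illustrative effect sizes, in grams
power <- power_from_delta(delta, n = 36, sigma = 12, alpha = 0.05)

data.frame(delta, power = round(power, 4))
```

The row with \(\delta = 4\) corresponds to Exercise 2 (\(\mu = 496\) versus \(\mu_0 = 500\)) and should give the same power of 0.6388. Raising `alpha`, `n`, or `delta` each increases the argument of `pnorm()` and therefore the power.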

Definition

Type I Error (False Positive)

A Type I Error occurs when we incorrectly reject the null hypothesis (\(H_0\)) even though it is actually true. \(\Pr(\text{Type I error} \mid H_0 \text{ true}) = \alpha\).

Type II Error (False Negative)

A Type II Error occurs when we fail to reject the null hypothesis (\(H_0\)) even though the alternative hypothesis (\(H_A\)) is actually true. \(\Pr(\text{Type II error} \mid H_A \text{ true}) = \beta\).

Power (\(1- \beta\))

The power of a test is the probability of correctly rejecting \(H_0\) when \(H_A\) is true. It measures the sensitivity of a test to detect an effect when one truly exists.

iClicker: Power

Identifying Statistical Power

Exercise 3 In the decision matrix shown below, which cell represents statistical power?

Reality

| Decision | \(H_0\) True | \(H_0\) False |
| --- | --- | --- |
| Reject \(H_0\) | (a) Type I error | (b) Correct |
| Fail to reject \(H_0\) | (c) Correct | (d) Type II error |

(e) None of the above

Resources

Power Calculations