Lecture 12: Hypothesis Tests for One-Sample Proportions

STAT 205: Introduction to Mathematical Statistics

Dr. Irene Vrbik

University of British Columbia Okanagan

March 15, 2024

Introduction

  • Last class we were introduced to the critical value approach for tests of hypotheses about the population mean (with \(\sigma^2\) known) based on a single sample.

  • We will see how the same approach can be used for tests on population proportions.

  • We also look at an alternative approach involving \(p\)-values.

  • Finally, we show the connection between two-tailed hypothesis tests and confidence intervals.

Outline

In this lecture we will cover:

  • a recap of the rejection (critical value) method

  • the critical value approach for tests on a single proportion

  • the \(p\)-value approach and how to interpret \(p\)-values

  • the connection between two-sided hypothesis tests and confidence intervals

Recap of Rejection Method

  • This test procedure relies on a test statistic, i.e. a function of the sample data which serves as a basis for deciding whether to reject or fail to reject \(H_0\).

  • It requires one or more critical values which separate the

    • rejection region (RR): the set of observed test statistics for which \(H_0\) will be rejected, from the

    • “acceptance”/fail-to-reject region: the set of observed test statistics for which we would fail to reject \(H_0\).

RR for two-tailed test

RR for upper-tailed test

RR for lower-tailed test

Critical Value approach for proportions

  1. Check assumptions. If they are satisfied, proceed.

  2. State hypotheses \[\begin{equation} H_0 : p = p_0 \quad \text{ vs. } \quad H_A: \begin{cases} p \neq p_0& \text{ two-sided test} \\ p < p_0&\text{ one-sided (lower-tail) test} \\ p > p_0&\text{ one-sided (upper-tail) test} \end{cases} \end{equation}\]

  3. Find the critical value(s):

    \[\begin{cases} P(-z_{crit} < Z < z_{crit}) = 1 - \alpha &\text{ if } H_A: p \neq p_0 \\ P(Z < z_{crit}) = \alpha &\text{ if } H_A: p < p_0 \\ P(Z > z_{crit}) = \alpha &\text{ if } H_A: p > p_0 \end{cases}\]
  4. Compute the test statistic \(z_{obs} = \dfrac{\hat p - p_0}{\sqrt{p_0(1-p_0)/n}}.\)

  5. Conclusion: reject \(H_0\) if \(z_{obs}\) falls in the rejection region; otherwise, fail to reject \(H_0\). (A short sketch of this procedure in R is given below.)
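To make these steps concrete, here is a minimal R sketch of the critical-value procedure. The function name `prop_z_test_crit` and its arguments are illustrative only, not part of any package.

```r
# Sketch of the critical-value z-test for a single proportion
prop_z_test_crit <- function(x, n, p0,
                             alternative = c("two.sided", "less", "greater"),
                             alpha = 0.05) {
  alternative <- match.arg(alternative)
  phat  <- x / n
  z_obs <- (phat - p0) / sqrt(p0 * (1 - p0) / n)   # step 4: test statistic
  if (alternative == "two.sided") {
    z_crit <- qnorm(1 - alpha / 2)                 # step 3: +/- z_crit
    reject <- abs(z_obs) > z_crit
  } else if (alternative == "less") {
    z_crit <- qnorm(alpha)                         # lower-tail critical value (negative)
    reject <- z_obs < z_crit
  } else {
    z_crit <- qnorm(1 - alpha)                     # upper-tail critical value
    reject <- z_obs > z_crit
  }
  list(z_obs = z_obs, z_crit = z_crit, reject = reject)  # step 5: decision
}
```

For example, `prop_z_test_crit(x = 13, n = 80, p0 = 0.13)` carries out the two-sided test that appears in Exercise 1 below.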

Assumptions

Assumptions for Hypothesis Tests for Proportions

  1. We have a simple random sample
  2. The experiment can be modeled by the binomial distribution:
    • number of trials is fixed
    • trials are independent
    • only two possible outcomes (‘success’ and ‘failure’)
    • Probabilities constant for each trial
  3. Success-failure condition: \(np \geq 10\) and \(n(1-p) \geq 10\)

Exercise 1: ChatGPT

According to a November 2023 research survey conducted by Pew Research, about 13% of all U.S. teens have used the generative artificial intelligence (AI) chatbot ChatGPT in their schoolwork. Suppose we wish to investigate whether the proportion of students on our campus using generative AI to do homework differs from this. We survey 80 randomly selected UBCO students and find that 13 admit to using generative AI to do homework. Perform a formal hypothesis test to determine whether the proportion at UBCO differs from that of U.S. teens.

Check Assumption

  • To complete this question as outlined, we need to check the success-failure condition stated in Lecture 3.

  • Check that \(np \geq 10\) and \(n(1-p) \geq 10\). We use the hypothesized value of \(p\) for these checks (verified in the short R check below).

    • Here \(n = 80\) and our hypothesized value is \(p_0 = 0.13\)

    • \(np_0\) = 10.4 ✅

    • \(n(1-p_0)\) = 69.6 ✅
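A quick version of this check in R (a sketch; the object names are arbitrary):

```r
n  <- 80
p0 <- 0.13
n * p0        # 10.4 >= 10  (expected number of successes)
n * (1 - p0)  # 69.6 >= 10  (expected number of failures)
```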

Solution to Exercise 1

  1. State null and alternative hypotheses

    \(H_0: p = 0.13\)

    \(H_A: p \neq 0.13\)

  2. Find the critical values: \(\pm z_{\alpha/2} = \pm z_{0.025} = \pm 1.96\)

  3. Calculate the test statistic:

    \[ \begin{equation} Z_{obs} = \dfrac{\hat p - p_0}{\sqrt{p_0(1-p_0)/n}} = \dfrac{0.1625 - 0.13}{\sqrt{0.13(1-0.13)/80}} = 0.8644 \end{equation} \]

  4. Conclusion: since \(|Z_{obs}| = 0.8644 < 1.96\), the observed test statistic does not fall in the rejection region, so we fail to reject \(H_0\). (The R sketch below reproduces these numbers.)
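A minimal R sketch of the computations above; the values agree with the \(p\)-value slides that follow:

```r
n     <- 80
x     <- 13
p0    <- 0.13
alpha <- 0.05

phat   <- x / n                                   # 0.1625
z_obs  <- (phat - p0) / sqrt(p0 * (1 - p0) / n)   # 0.8643648
z_crit <- qnorm(1 - alpha / 2)                    # 1.959964 (two-sided critical value)
abs(z_obs) > z_crit                               # FALSE -> fail to reject H0
```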

An Alternative Test Procedure

  • We now consider an alternative hypothesis-testing method for deciding whether to reject \(H_0\).

  • Like the rejection method, it will rely on a test statistic.

  • Unlike the rejection method, we will no longer require a critical value, but instead calculate a certain probability that goes by the name of a \(p\)-value.

  • While these two procedures should yield the same conclusion, the \(p\)-value will provide an intuitive measure of the strength of evidence in the data against \(H_0\).

P-value

Definition 1: \(p\)-value

The \(p\)-value is the probability, calculated assuming that the null hypothesis is true, of obtaining a value of the test statistic at least as contradictory to \(H_0\) as the value calculated from the available sample.

It quantifies the chances of obtaining the observed data or data more favorable to the alternative than our current data set if the null hypothesis were true.

\(p\)-value approach for proportions

  1. State hypotheses \[\begin{equation} H_0 : p = p_0 \quad \text{ vs. } \quad H_A: \begin{cases} p \neq p_0& \text{ two-sided test} \\ p < p_0&\text{ one-sided (lower-tail) test} \\ p > p_0&\text{ one-sided (upper-tail) test} \end{cases} \end{equation}\]

  2. Compute the test statistic \(z_{obs} = \dfrac{\hat p - p_0}{\sqrt{p_0(1-p_0)/n}}.\)

  3. Calculate the \(p\)-value

    \[\begin{cases} 2P(Z \geq |z_{obs}|) &\text{ if } H_A: p \neq p_0 \\ P(Z \leq z_{obs}) &\text{ if } H_A: p < p_0 \\ P(Z \geq z_{obs}) &\text{ if } H_A: p > p_0 \end{cases}\]
  4. Conclusion: reject \(H_0\) if the \(p\)-value is less than \(\alpha\) (typically 0.05); otherwise, fail to reject \(H_0\). (See the sketch below.)
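A minimal R sketch of these steps; the helper name `prop_z_pvalue` is made up for illustration:

```r
# p-value for the one-sample proportion z-test (sketch)
prop_z_pvalue <- function(x, n, p0, alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  z_obs <- (x / n - p0) / sqrt(p0 * (1 - p0) / n)
  switch(alternative,
         two.sided = 2 * pnorm(abs(z_obs), lower.tail = FALSE),  # 2 P(Z >= |z_obs|)
         less      = pnorm(z_obs),                                # P(Z <= z_obs)
         greater   = pnorm(z_obs, lower.tail = FALSE))            # P(Z >= z_obs)
}

prop_z_pvalue(x = 13, n = 80, p0 = 0.13)   # 0.3873875 (two-sided; Exercise 1)
```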

Redo example using p-values

Returning to Exercise 1, and using the same hypotheses, the \(p\)-value can be calculated as follows:

\[\begin{align} 2P(Z \geq |z_{obs}|) &= 2P(Z \geq |0.8643648|) \\ &= 2 (0.1936938)\\ &= 0.3873875\\ \end{align}\]

Since this \(p\)-value of \(0.3873875\) is greater than \(\alpha = 0.05\), we fail to reject \(H_0\).

Comment

  • Note that the critical value approach and the \(p\)-value approach should provide the same conclusion (provided the hypotheses, data, and significance level are the same).

  • However, rather than a binary outcome (reject vs. fail to reject) the \(p\)-value gives information about the strength of evidence against the null hypothesis.

  • e.g., \(p\)-values of 0.04999 and 0.0000001 are both significant, but 0.0000001 indicates stronger evidence against \(H_0\) than 0.04999.

Figure: the distribution of the test statistic for testing \(H_0: p = 0.13\) vs \(H_A: p \neq 0.13\).

Comments

  • When the observed test statistic falls in the rejection region, we will necessarily have a significant \(p\)-value, and vice versa.

  • Observing 21 out of 80 in our sample who have used generative AI for homework (i.e. \(\hat p = \frac{21}{80}\) = 0.2625) yields a very small \(p\)-value (<0.001). Hence we have very strong evidence against the null hypothesis \(H_0\).

  • 17 out of 80 (i.e. \(\hat p = \frac{17}{80} = 0.2125\)) yields a significant \(p\)-value (0.028), but the evidence is not as strong as for a sample with 21 “yes”s.

  • 16 out of 80 (i.e. \(\hat p = \frac{16}{80} = 0.2\)) yields an almost significant \(p\)-value (0.063). Hence we have insufficient evidence against \(H_0\), but may still have our suspicions. (These values are reproduced in the sketch below.)
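These \(p\)-values can be reproduced with the following sketch (same two-sided test of \(H_0: p = 0.13\) with \(n = 80\)):

```r
p0 <- 0.13; n <- 80
for (x in c(21, 17, 16)) {
  z <- (x / n - p0) / sqrt(p0 * (1 - p0) / n)
  cat(x, "of 80: two-sided p-value =",
      signif(2 * pnorm(abs(z), lower.tail = FALSE), 2), "\n")
}
# 21 of 80: p-value < 0.001;  17 of 80: ~ 0.028;  16 of 80: ~ 0.063
```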

Interpreting \(p\)-values

Here are some guidelines for using the \(p\)-value to assess the evidence against the null hypothesis at \(\alpha = 0.05\).
| \(p\)-value | Evidence against \(H_0\) | Significance code in R |
|---|---|---|
| \(0.1 < p \leq 1\) | no evidence | |
| \(0.05 < p \leq 0.1\) | weak evidence | . |
| \(0.01 < p \leq 0.05\) | sufficient evidence | * |
| \(0.001 < p \leq 0.01\) | strong evidence | ** |
| \(0 < p \leq 0.001\) | very strong evidence | *** |
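The cut-points in this table follow the significance codes R prints in model summaries. A small helper along these lines (the name `evidence_label` is made up for illustration) maps a \(p\)-value to the categories above:

```r
evidence_label <- function(p) {
  cut(p,
      breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
      labels = c("very strong (***)", "strong (**)", "sufficient (*)",
                 "weak (.)", "no evidence"),
      include.lowest = TRUE)
}

evidence_label(c(0.3873875, 0.063, 0.028, 0.0004))
# no evidence, weak (.), sufficient (*), very strong (***)
```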

Heinz Example

All Alternatives

Returning to our Heinz ketchup example from last class: conduct all three hypothesis tests of \(H_0: \mu = 20\) using the \(p\)-value approach, with \(\bar x = 19.86\), \(n = 7\), and known population standard deviation \(\sigma = 0.2\). Do they all yield the same conclusion at a 5% significance level?

Lower Tailed Test

Upper Tailed Test

Two-sided Test
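A minimal R sketch of all three tests, assuming (as in the confidence-interval comparison later in the lecture) that the hypothesized mean is \(\mu_0 = 20\):

```r
xbar <- 19.86; mu0 <- 20; sigma <- 0.2; n <- 7
z_obs <- (xbar - mu0) / (sigma / sqrt(n))   # about -1.852

pnorm(z_obs)                               # lower-tailed: P(Z <= z_obs) ~ 0.032
pnorm(z_obs, lower.tail = FALSE)           # upper-tailed: P(Z >= z_obs) ~ 0.968
2 * pnorm(abs(z_obs), lower.tail = FALSE)  # two-sided:    2 P(Z >= |z_obs|) ~ 0.064
```

Under this assumption, only the lower-tailed test rejects \(H_0\) at the 5% level; the upper-tailed and two-sided tests do not.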

Connection with Confidence Intervals

Suppose we find a (1-\(\alpha\))100% confidence interval for \(\mu\) using:

\[ \begin{align} \bar x \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \end{align} \]

And we wish to carry out a test of:

\[ \begin{align} H_0&: \mu = \mu_0 & H_A&: \mu \neq \mu_0 \end{align} \]

with a significance level of \(\alpha\)

Steps in formal hypothesis test

To answer this we would usually calculate our test statistic:

\[ Z_{obs} = \dfrac{\bar X - \mu_0}{\sigma/\sqrt{n}} \]

and check if it falls into the rejection region, or alternatively, calculate the corresponding \(p\)-value and compare it with \(\alpha\).

Alternatively, I could tell you the decision of that test by looking at the confidence interval….

Connection between Confidence Intervals and Two-Sided Hypothesis Tests

Consider a two-sided hypothesis test at a significance level of \(\alpha\)

\[ \begin{align} H_0&: \mu = \mu_0 & H_A&: \mu \neq \mu_0 \end{align} \]

Suppose we construct the \((1-\alpha)100\%\) confidence interval (CI) for \(\mu\), using

\[ \begin{align} \bar x \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \end{align} \]

Test Procedure:

  • If \(\mu_0\) falls within the CI, we do not have sufficient evidence to reject \(H_0\).

  • If \(\mu_0\) falls outside the CI, we have sufficient evidence to reject \(H_0\).

Example: Heinz

Returning to the two-sided test for the Heinz ketchup example, let's find the corresponding 95% CI:

\[ \begin{align} \bar x &\pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\\ 19.86 &\pm 1.959964 \cdot \frac{0.2}{\sqrt{7}}\\ 19.86 &\pm 0.1481594 \\ [19.71184&, 20.00816] \end{align} \]

Since \(\mu_0 = 20\) falls within the 95% CI, we have insufficient evidence to reject the null hypothesis.
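This interval and the resulting decision can be reproduced with a few lines of R:

```r
xbar <- 19.86; sigma <- 0.2; n <- 7; mu0 <- 20
me <- qnorm(0.975) * sigma / sqrt(n)   # margin of error: 0.1481594
ci <- xbar + c(-1, 1) * me             # 19.71184 20.00816
mu0 >= ci[1] & mu0 <= ci[2]            # TRUE -> fail to reject H0 at alpha = 0.05
```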

Example: generative AI

A similar argument can be made with tests for proportions.

Here we have a 95% CI based on 16 out of 80 “yes”s.

\[ \begin{align} \hat p &\pm z_{\alpha/2} \sqrt{\frac{p_0(1-p_0)}{n}}\\ 0.2 &\pm 1.959964 \sqrt{\frac{0.13(1- 0.13)}{80}}\\ 0.2 &\pm 0.07369439 \\ [0.1263056&, 0.2736944] \end{align} \]

Since \(p_0 = 0.13\) lies within this CI, we would fail to reject the null hypothesis that \(p = 0.13\).


Here we have a 95% CI based on 17 out of 80 “yes”s.

\[ \begin{align} \hat p &\pm z_{\alpha/2} \sqrt{\frac{p_0(1-p_0)}{n}}\\ 0.2125 &\pm 1.959964 \sqrt{\frac{0.13(1- 0.13)}{80}}\\ 0.2125 &\pm 0.07369439 \\ [0.1388056&, 0.2861944] \end{align} \]

Since \(p_0 = 0.13\) does not lie within this CI, we have sufficient evidence to reject the null hypothesis that \(p = 0.13\).
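Both intervals can be reproduced in R; note that, to match the test, the margin of error below uses the hypothesized \(p_0 = 0.13\) rather than \(\hat p\):

```r
p0 <- 0.13; n <- 80
me <- qnorm(0.975) * sqrt(p0 * (1 - p0) / n)  # 0.07369439

(16 / n) + c(-1, 1) * me  # [0.1263056, 0.2736944]: contains 0.13 -> fail to reject H0
(17 / n) + c(-1, 1) * me  # [0.1388056, 0.2861944]: excludes 0.13 -> reject H0
```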

References

Devore, J. L., K. N. Berk, and M. A. Carlton. 2021. Modern Mathematical Statistics with Applications. Springer Texts in Statistics. Springer International Publishing. https://books.google.ca/books?id=ghcsEAAAQBAJ.