STAT 205: Introduction to Mathematical Statistics
University of British Columbia Okanagan
March 15, 2024
Last class we were introduced to the critical value approach for tests of hypotheses about the population mean (with \(\sigma^2\) known) based on a single sample.
We will see how the same approach can be used for tests on population proportions.
We also look at an alternative approach involving \(p\)-values.
We also show the connection between two-tailed hypothesis tests and confidence intervals.
In this lecture we will be covering
The critical value approach relies on a test statistic, i.e. a function of the sample data that serves as a basis for deciding whether to reject or fail to reject \(H_0\).
It requires one or more critical values, which separate the
rejection region (RR): the set of observed test statistics for which \(H_0\) will be rejected, from the
“acceptance”/fail-to-reject region: the set of observed test statistics for which we would fail to reject \(H_0\).
Critical Value approach for proportions
Check assumptions. If satisfied,
1. State hypotheses \[\begin{equation} H_0 : p = p_0 \quad \text{ vs. } \quad H_A: \begin{cases} p \neq p_0& \text{ two-sided test} \\ p < p_0&\text{ one-sided (lower-tail) test} \\ p > p_0&\text{ one-sided (upper-tail) test} \end{cases} \end{equation}\]
2. Find the critical value:
\[\begin{cases} P(-z_{crit} < Z < z_{crit}) = 1 - \alpha &\text{ if } H_A: p \neq p_0 \\ P(Z < z_{crit}) = \alpha &\text{ if } H_A: p < p_0 \\ P(Z > z_{crit}) = \alpha &\text{ if } H_A: p > p_0 \end{cases}\]
3. Compute the test statistic \(z_{obs} = \dfrac{\hat p - p_0}{\sqrt{p_0(1-p_0)/n}}.\)
4. Conclusion: reject \(H_0\) if \(z_{obs} \in\) rejection region, otherwise, fail to reject \(H_0\).
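The procedure above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not the course's official code: the function name `critical_value_test`, the `alternative` labels, and the example counts (60 successes in 100 trials with \(p_0 = 0.5\)) are all made up for demonstration, and the assumptions check is taken as already done.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def critical_value_test(x, n, p0, alpha=0.05, alternative="two-sided"):
    """Return (z_obs, z_crit, reject) for a one-sample z-test on a proportion."""
    p_hat = x / n
    z_obs = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    if alternative == "two-sided":
        z_crit = Z.inv_cdf(1 - alpha / 2)  # P(-z_crit < Z < z_crit) = 1 - alpha
        reject = abs(z_obs) > z_crit
    elif alternative == "less":
        z_crit = Z.inv_cdf(alpha)          # P(Z < z_crit) = alpha (negative value)
        reject = z_obs < z_crit
    elif alternative == "greater":
        z_crit = Z.inv_cdf(1 - alpha)      # P(Z > z_crit) = alpha
        reject = z_obs > z_crit
    else:
        raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")
    return z_obs, z_crit, reject


# Made-up illustration: 60 successes in 100 trials, H0: p = 0.5, two-sided test
z_obs, z_crit, reject = critical_value_test(60, 100, 0.5)
print(f"z_obs = {z_obs:.3f}, z_crit = {z_crit:.3f}, reject H0: {reject}")
```

Note that for the lower-tail test the critical value is negative, so the rejection region is everything to its left.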
Assumptions for Hypothesis Tests for Proportions
Exercise 1: ChatGPT
According to a November 2023 survey conducted by Pew Research, about 13% of all U.S. teens have used the generative artificial intelligence (AI) chatbot ChatGPT in their schoolwork. Suppose we wish to investigate whether this is also the proportion of students on our campus using generative AI to do homework. We survey 80 randomly selected UBCO students and find that 13 admit to using generative AI to do homework. Perform a formal hypothesis test to determine whether the proportion at UBCO differs from that of U.S. teens.
State null and alternative hypotheses
\(H_0:\)
\(H_A:\)
Find the critical values:
Calculate the test statistic:
\[ \begin{equation} Z_{obs} = \dfrac{\hat p - p_0}{\sqrt{p_0(1-p_0)/n}} = \end{equation} \]
Conclusion:
We now consider an alternative hypothesis-testing method for deciding whether to reject \(H_0\).
Like the critical value (rejection region) approach, it will rely on a test statistic.
Unlike the critical value approach, we will no longer require a critical value, but instead calculate a certain probability that goes by the name of a \(p\)-value.
While these two procedures should yield the same conclusion, the \(p\)-value will provide an intuitive measure of the strength of evidence in the data against \(H_0\).
Definition 1: \(p\)-value
The \(p\)-value is the probability, calculated assuming that the null hypothesis is true, of obtaining a value of the test statistic at least as contradictory to \(H_0\) as the value calculated from the available sample.
It quantifies the chances of obtaining the observed data or data more favorable to the alternative than our current data set if the null hypothesis were true.
\(p\)-value approach for proportions
1. State hypotheses \[\begin{equation} H_0 : p = p_0 \quad \text{ vs. } \quad H_A: \begin{cases} p \neq p_0& \text{ two-sided test} \\ p < p_0&\text{ one-sided (lower-tail) test} \\ p > p_0&\text{ one-sided (upper-tail) test} \end{cases} \end{equation}\]
2. Compute the test statistic \(z_{obs} = \dfrac{\hat p - p_0}{\sqrt{p_0(1-p_0)/n}}.\)
3. Calculate the \(p\)-value
\[\begin{cases} 2P(Z \geq |z_{obs}|) &\text{ if } H_A: p \neq p_0 \\ P(Z \leq z_{obs}) &\text{ if } H_A: p < p_0 \\ P(Z \geq z_{obs}) &\text{ if } H_A: p > p_0 \end{cases}\]
4. Conclusion: reject \(H_0\) if the \(p\)-value is less than \(\alpha\) (typically 0.05), otherwise, fail to reject \(H_0\).
Returning to Exercise 1, and using the same hypotheses, the \(p\)-value can be calculated as follows:
\[\begin{align} 2P(Z \geq |z_{obs}|) &= 2P(Z \geq |0.8643648|) \\ &= 2 (0.1936938)\\ &= 0.3873875 \end{align}\]
Since this \(p\)-value of \(0.3873875\) is greater than \(\alpha = 0.05\), we fail to reject \(H_0\) …
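The calculation for Exercise 1 can be reproduced with a short Python sketch. The helper name `proportion_p_value` and its `alternative` labels are hypothetical choices for illustration; the data (13 out of 80, \(p_0 = 0.13\)) are those of the exercise.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def proportion_p_value(x, n, p0, alternative="two-sided"):
    """Return (z_obs, p_value) for a one-sample z-test on a proportion."""
    z_obs = (x / n - p0) / sqrt(p0 * (1 - p0) / n)
    if alternative == "two-sided":
        p_value = 2 * (1 - Z.cdf(abs(z_obs)))  # 2 P(Z >= |z_obs|)
    elif alternative == "less":
        p_value = Z.cdf(z_obs)                 # P(Z <= z_obs)
    elif alternative == "greater":
        p_value = 1 - Z.cdf(z_obs)             # P(Z >= z_obs)
    else:
        raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")
    return z_obs, p_value


# Exercise 1: 13 of 80 students, H0: p = 0.13 vs HA: p != 0.13
z_obs, p_value = proportion_p_value(13, 80, 0.13)
print(f"z_obs = {z_obs:.4f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

Because the \(p\)-value is compared directly with \(\alpha\), no critical value is needed here.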
Note that the critical value approach and the \(p\)-value approach should provide the same conclusion (provided the hypotheses, data, and significance level are the same).
However, rather than a binary outcome (reject vs. fail to reject), the \(p\)-value gives information about the strength of evidence against the null hypothesis.
e.g., \(p\)-values of 0.04999 and 0.0000001 are both significant, but 0.0000001 indicates stronger evidence than 0.04999.
\(p\)-value | Evidence against \(H_0\) | Significance code in R |
---|---|---|
\(0.10 < p \leq 1\) | no evidence | |
\(0.05 < p \leq 0.10\) | weak evidence | . |
\(0.01 < p \leq 0.05\) | sufficient evidence | * |
\(0.001 < p \leq 0.01\) | strong evidence | ** |
\(0< p \leq 0.001\) | very strong evidence | *** |
All Alternatives
Returning to our Heinz ketchup example from last class, conduct all three hypothesis tests using the \(p\)-value approach with \(\bar x = 19.86\), \(n = 7\), and known population standard deviation \(\sigma = 0.2\). Do they all yield the same conclusion at a 5% significance level?
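One way to check your answers is the sketch below, which computes the \(p\)-value under each of the three alternatives. It assumes the null value is \(\mu_0 = 20\) (the value these notes use for the Heinz example in the confidence-interval discussion); the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

# Heinz example: sample mean, sample size, known sigma; mu0 = 20 is assumed
x_bar, n, sigma, mu0, alpha = 19.86, 7, 0.2, 20, 0.05

z_obs = (x_bar - mu0) / (sigma / sqrt(n))

p_values = {
    "two-sided (mu != mu0)": 2 * (1 - Z.cdf(abs(z_obs))),
    "lower-tail (mu < mu0)": Z.cdf(z_obs),
    "upper-tail (mu > mu0)": 1 - Z.cdf(z_obs),
}

print(f"z_obs = {z_obs:.4f}")
for label, p in p_values.items():
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"{label}: p-value = {p:.4f} -> {decision}")
```

Comparing each printed \(p\)-value with \(\alpha = 0.05\) answers the question posed in the exercise.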
Suppose we find a (1-\(\alpha\))100% confidence interval for \(\mu\) using:
\[ \begin{align} \bar x \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \end{align} \]
And we wish to carry out a test of:
\[ \begin{align} H_0&: \mu = \mu_0 & H_A&: \mu \neq \mu_0 \end{align} \]
with a significance level of \(\alpha\)…
To answer this we would usually calculate our test statistic:
\[ Z_{obs} = \dfrac{\bar X - \mu_0}{\sigma/\sqrt{n}} \]
and check if it falls into the rejection region, or alternatively, calculate the corresponding \(p\)-value and compare it with \(\alpha\).
Alternatively, I could tell you the decision of that test by looking at the confidence interval….
Connection between Confidence Intervals and Two-Sided Hypothesis Tests
Consider a two-sided hypothesis test at a significance level of \(\alpha\)
\[ \begin{align} H_0&: \mu = \mu_0 & H_A&: \mu \neq \mu_0 \end{align} \]
Suppose we construct the (1-\(\alpha\))100% confidence interval (CI) for \(\mu\), using
\[ \begin{align} \bar x \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \end{align} \]
Test Procedure:
If \(\mu_0\) falls within the CI, we do not have sufficient evidence to reject \(H_0\).
If \(\mu_0\) falls outside the CI, we have sufficient evidence to reject \(H_0\).
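To see why this procedure agrees with the two-sided test, recall that the test fails to reject \(H_0\) when \(|z_{obs}| < z_{\alpha/2}\). Rearranging this inequality (a short sketch, using the same symbols as above) gives:

\[ \begin{align} |z_{obs}| < z_{\alpha/2} &\iff -z_{\alpha/2} < \frac{\bar x - \mu_0}{\sigma/\sqrt{n}} < z_{\alpha/2} \\ &\iff -z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \bar x - \mu_0 < z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \\ &\iff \bar x - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu_0 < \bar x + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \end{align} \]

The last line says exactly that \(\mu_0\) lies inside the (1-\(\alpha\))100% CI, so "fail to reject" and "\(\mu_0\) inside the CI" are the same event.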
Returning to the two-sided test for the Heinz ketchup example, let’s find the corresponding 95% CI:
\[ \begin{align} \bar x &\pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\\ 19.86 &\pm 1.959964 \frac{0.2}{\sqrt{7}}\\ 19.86 &\pm 0.1481594 \\ [19.71184&, 20.00816] \end{align} \]
Since \(\mu_0 = 20\) falls within the 95% CI, we have insufficient evidence to reject the null hypothesis.
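The same interval, and its agreement with the two-sided test, can be checked numerically. The following is a small sketch with illustrative variable names, using the values from the Heinz example.

```python
from math import sqrt
from statistics import NormalDist

# Heinz example values from the notes
x_bar, n, sigma, mu0, alpha = 19.86, 7, 0.2, 20, 0.05

z_half = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96
margin = z_half * sigma / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin
mu0_inside = lower <= mu0 <= upper

# Two-sided critical value test for comparison
z_obs = (x_bar - mu0) / (sigma / sqrt(n))
reject = abs(z_obs) > z_half

print(f"95% CI: [{lower:.5f}, {upper:.5f}]")
print(f"mu0 inside CI: {mu0_inside}, reject H0: {reject}")
```

Whenever `mu0_inside` is true, `reject` is false, and vice versa, which is the duality stated in the test procedure above.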
A similar argument can be made with tests for proportions.
Here we have a 95% CI based on 16 out of 80 “yes”s.
\[ \begin{align} \hat p &\pm z_{\alpha/2} \sqrt{\frac{p_0(1-p_0)}{n}}\\ 0.2 &\pm 1.959964 \sqrt{\frac{0.13(1- 0.13)}{80}}\\ 0.2 &\pm 0.07369439 \\ [0.1263056&, 0.2736944] \end{align} \]
Since \(p_0\) = 0.13 lies within this CI, we would fail to reject the null hypothesis that \(p = 0.13\).
Here we have a 95% CI based on 17 out of 80 “yes”s.
\[ \begin{align} \hat p &\pm z_{\alpha/2} \sqrt{\frac{p_0(1-p_0)}{n}}\\ 0.2125 &\pm 1.959964 \sqrt{\frac{0.13(1- 0.13)}{80}}\\ 0.2125 &\pm 0.07369439 \\ [0.1388056&, 0.2861944] \end{align} \]
Since \(p_0\) = 0.13 does not lie within this CI, we would have sufficient evidence to reject the null hypothesis that \(p = 0.13\).
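Both intervals can be reproduced with the sketch below. Following the formula used in these notes, the standard error is computed from \(p_0\) so that the interval matches the two-sided test exactly; many textbooks instead use \(\hat p\) in the standard error when reporting a CI on its own. The function name `ci_with_null_se` is a hypothetical label.

```python
from math import sqrt
from statistics import NormalDist


def ci_with_null_se(x, n, p0, alpha=0.05):
    """CI centred at p_hat, with the standard error evaluated at p0 (as in the notes)."""
    p_hat = x / n
    z_half = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z_half * sqrt(p0 * (1 - p0) / n)
    return p_hat - margin, p_hat + margin


p0, n = 0.13, 80
for x in (16, 17):
    lower, upper = ci_with_null_se(x, n, p0)
    inside = lower <= p0 <= upper
    decision = "fail to reject H0" if inside else "reject H0"
    print(f"{x}/{n}: [{lower:.7f}, {upper:.7f}] -> {decision}")
```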
Comments
When the observed test statistic falls in the rejection region, we will necessarily have a significant \(p\)-value and vice versa.
Observing 21 out of 80 in our sample who have used generative AI for homework (i.e. \(\hat p = \frac{21}{80}\) = 0.2625) yields a very small \(p\)-value (<0.001). Hence we have very strong evidence against the null hypothesis \(H_0\).
17 out of 80 (i.e. \(\hat p = \frac{17}{80}\) = 0.2125) yields a significant \(p\)-value (0.028), but the evidence is not as strong as for a sample with 21 “yes”s.
16 out of 80 (i.e. \(\hat p = \frac{16}{80}\) = 0.2) yields an almost significant \(p\)-value (0.063). Hence we have insufficient evidence against \(H_0\) but may still have our suspicions.
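The \(p\)-values quoted in these comments can be recomputed and matched against the evidence table given earlier in these notes. This is a sketch only; the helper names `two_sided_p_value` and `evidence_label` are hypothetical, and the cut-offs are taken from that table.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(x, n, p0):
    """Two-sided p-value for H0: p = p0 based on x successes out of n."""
    z_obs = (x / n - p0) / sqrt(p0 * (1 - p0) / n)
    return 2 * (1 - NormalDist().cdf(abs(z_obs)))


def evidence_label(p):
    """Map a p-value to (evidence against H0, R significance code)."""
    if p <= 0.001:
        return "very strong evidence", "***"
    if p <= 0.01:
        return "strong evidence", "**"
    if p <= 0.05:
        return "sufficient evidence", "*"
    if p <= 0.10:
        return "weak evidence", "."
    return "no evidence", ""


# Counts of "yes" answers out of 80 discussed in the notes, with p0 = 0.13
results = {x: two_sided_p_value(x, 80, 0.13) for x in (13, 16, 17, 21)}
for x, p in results.items():
    label, code = evidence_label(p)
    print(f"{x}/80: p-value = {p:.4f} ({label}) {code}")
```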