
(p-value approach)
University of British Columbia Okanagan
We are still testing
\[ \begin{align} H_0&: \mu = \mu_0 &&& H_A&: \begin{cases} \mu \ne \mu_0 & \text{two-sided}\\ \mu < \mu_0 & \text{left-tailed}\\ \mu > \mu_0 & \text{right-tailed} \end{cases} \end{align} \]
Last class we covered the critical value approach for hypothesis tests concerning the population mean.
Today we will cover the (more common) p-value approach.
The p-value approach relies on a null distribution (the distribution assuming the null hypothesis is correct), determined by the appropriate test statistic.
When \(\sigma\) is known
\[ \dfrac{\bar X - \mu_0}{\sigma/\sqrt{n}} \sim N(0,1) \]
Conditions for \(Z\)-statistic
Same conditions as for the \(t\)-statistic (1️⃣ 2️⃣ 3️⃣), plus 4️⃣ the population standard deviation \(\sigma\) is known.
When \(\sigma\) is unknown
\[ \dfrac{\bar X - \mu_0}{s/\sqrt{n}} \sim t_{\nu = n-1} \]
Conditions for \(t\)-statistic
1️⃣ Simple Random Sample
2️⃣ Observations independent
3️⃣ Population is normal or \(n \geq 30\)
With the critical value approach we asked:
Did our observed test statistic cross the critical value threshold (aka the rejection boundary)?
If our test statistic falls in the Rejection Region
\[ \implies \text{Reject } H_0 \]
If our test statistic falls outside the Rejection Region
\[ \implies \text{Fail to reject } H_0 \]
In the more realistic situation where \(\sigma\) is unknown, the null distribution is the \(t\)-distribution with \(\nu = n-1\) degrees of freedom.
For a right-tailed test at \(\alpha = 0.05\) and a null distribution with 5 degrees of freedom, the critical value is 2.015.
An observed test statistic of 0 leads us to fail to reject the null hypothesis.
An observed test statistic of 1.99 leads us to fail to reject the null hypothesis.
An observed test statistic of 2.02 leads us to reject the null hypothesis.
An observed test statistic of 5.99 leads us to reject the null hypothesis.
Two very different results are treated identically using the critical value approach.
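As a quick check, the 2.015 cutoff used in this example can be reproduced in R with `qt()`:

```r
# Critical value for a right-tailed test: alpha = 0.05, df = 5
qt(0.95, df = 5)  # upper 5% point of t_5; approximately 2.015
```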
The critical value approach does not tell us:
how close \(t_{obs}\) is to the rejection boundary
how strongly the data disagree with \(H_0\)
The \(p\)-value approach solves this problem \(\dots\)
Critical value approach asks:
Did \(t_{obs}\) fall into the rejection region?
\(\implies\) binary result Yes (reject)/ No (fail to reject)
\(p\)-value approach asks:
How surprising is \(t_{obs}\) if \(H_0\) were true?
\(\implies\) \(p\)-value (probability between 0 and 1)
The \(p\)-value provides a measure of evidence \(\dots\)
Evidence measures:
How inconsistent the observed data are with the assumption that \(H_0\) is true.
Results that happen often under \(H_0 \implies\) weak evidence
Results that rarely happen under \(H_0 \implies\) strong evidence
An observed test statistic of 0 is not surprising under the null hypothesis (little evidence).
An observed test statistic of 1.99 is somewhat surprising under the null hypothesis (some evidence).
An observed test statistic of 2.02 is surprising under the null hypothesis (significant evidence).
An observed test statistic of 5.99 is very surprising under the null hypothesis (strong evidence).
\(p\)-value
The \(p\)-value is the probability of observing a result at least as extreme as the one obtained, assuming the null hypothesis is true.
Right-tailed test
\(\Pr(t_{\nu} \geq t_{obs} \mid H_0)\)
Left-tailed test
\(\Pr(t_{\nu} \leq t_{obs} \mid H_0)\)
Two-sided test
\(2\cdot\Pr(t_{\nu} \geq |t_{obs}| \mid H_0)\)
The alternative hypothesis determines which tail(s) are “extreme”.
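These tail probabilities can be computed in R with `pt()`; the values of `t_obs` and the degrees of freedom below are illustrative:

```r
t_obs <- 2.02  # illustrative observed test statistic
nu <- 19       # illustrative degrees of freedom (n - 1)

p_right <- pt(t_obs, df = nu, lower.tail = FALSE)           # right-tailed
p_left  <- pt(t_obs, df = nu)                               # left-tailed
p_two   <- 2 * pt(abs(t_obs), df = nu, lower.tail = FALSE)  # two-sided
```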
Decision (critical value approach)
🚫 If \(t_{obs}\) falls in rejection region
\[ \implies \text{Reject } H_0 \]
🟢 If \(t_{obs}\) falls outside rejection region
\[ \implies \text{Fail to reject } H_0 \]
Decision (\(p\)-value approach)
🚫 If \(p\)-value \(< \alpha\)
\[ \implies \text{Reject }H_0 \]
🟢 If \(p\)-value \(\geq \alpha\)
\[ \implies \text{Fail to reject }H_0 \]
Both methods always give the same conclusion.
| \(p\)-value | Evidence against \(H_0\) | Decision |
|---|---|---|
| \(0.10 < p \leq 1\) | no evidence | 🟢 Fail to reject \(H_0\) |
| \(0.05 < p \leq 0.10\) | weak evidence | 🟢 Fail to reject \(H_0\) |
| \(0.01 < p \leq 0.05\) | sufficient evidence | 🚫 Reject \(H_0\) |
| \(0.001 < p \leq 0.01\) | strong evidence | 🚫 Reject \(H_0\) |
| \(0< p \leq 0.001\) | very strong evidence | 🚫 Reject \(H_0\) |
Hypothesis Tests: Critical value approach (using \(Z\))
State hypotheses \(H_0 : \mu = \mu_0 \quad \text{ vs. } \quad H_A: \begin{cases} \mu \neq \mu_0& \text{ two-sided test} \\ \mu < \mu_0&\text{ one-sided (lower-tail) test} \\ \mu > \mu_0&\text{ one-sided (upper-tail) test} \end{cases}\)
Compute test statistic \(z_{obs} = \dfrac{\bar x - \mu_0}{\frac{\sigma}{\sqrt{n}}}\sim N(0,1)\)
Determine the critical value \(z^* = \begin{cases} P(Z \geq z^*) = \alpha/2 &\text{ if } H_A: \mu \neq \mu_0 \\ P(Z \leq z^*) = \alpha &\text{ if } H_A: \mu < \mu_0 \\ P(Z \geq z^*) = \alpha &\text{ if } H_A: \mu > \mu_0 \end{cases}\)
Make a Decision \(\rightarrow \begin{cases} \text{reject $H_0$} & \text{if $z_{obs}$ falls in the RR}\\ \text{fail to reject $H_0$} & \text{if $z_{obs}$ falls outside the RR} \end{cases}\)
State the Conclusion in context.
Hypothesis Tests: Critical value approach (using \(t\))
State hypotheses \(H_0 : \mu = \mu_0 \quad \text{ vs. } \quad H_A: \begin{cases} \mu \neq \mu_0& \text{ two-sided test} \\ \mu < \mu_0&\text{ one-sided (lower-tail) test} \\ \mu > \mu_0&\text{ one-sided (upper-tail) test} \end{cases}\)
Compute test statistic \(t_{obs} = \dfrac{\bar x - \mu_0}{\frac{s}{\sqrt{n}}}\sim t_{\nu = n - 1}\)
Determine the critical value \(t^* = \begin{cases} P(t_{\nu = n - 1} \geq t^*) = \alpha/2 &\text{ if } H_A: \mu \neq \mu_0 \\ P(t_{\nu = n - 1} \leq t^*) = \alpha &\text{ if } H_A: \mu < \mu_0 \\ P(t_{\nu = n - 1} \geq t^*) = \alpha &\text{ if } H_A: \mu > \mu_0 \end{cases}\)
Make a Decision \(\rightarrow \begin{cases} \text{reject $H_0$} & \text{if $t_{obs}$ falls in the RR}\\ \text{fail to reject $H_0$} & \text{if $t_{obs}$ falls outside the RR} \end{cases}\)
State the Conclusion in context.
Hypothesis Tests: \(p\)-value approach (using \(Z\))
State hypotheses \(H_0 : \mu = \mu_0 \quad \text{ vs. } \quad H_A: \begin{cases} \mu \neq \mu_0& \text{ two-sided test} \\ \mu < \mu_0&\text{ one-sided (lower-tail) test} \\ \mu > \mu_0&\text{ one-sided (upper-tail) test} \end{cases}\)
Compute test statistic \(z_{obs} = \dfrac{\bar x - \mu_0}{\frac{\sigma}{\sqrt{n}}}\sim N(0,1)\)
Calculate the \(p\)-value \(= \begin{cases} 2P(Z \geq |z_{obs}|) &\text{ if } H_A: \mu \neq \mu_0 \\ P(Z \leq z_{obs}) &\text{ if } H_A: \mu < \mu_0 \\ P(Z \geq z_{obs}) &\text{ if } H_A: \mu > \mu_0 \end{cases}\)
Make a Decision \(\rightarrow \begin{cases} \text{reject $H_0$} & \text{if $p$-value $< \alpha$}\\ \text{fail to reject $H_0$} & \text{if $p$-value $\geq \alpha$} \end{cases}\)
State the Conclusion in context.
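The corresponding tail probabilities for the \(Z\) statistic come from `pnorm()`; `z_obs` below is illustrative:

```r
z_obs <- 1.75  # illustrative observed z statistic

2 * pnorm(abs(z_obs), lower.tail = FALSE)  # two-sided
pnorm(z_obs)                               # lower-tail
pnorm(z_obs, lower.tail = FALSE)           # upper-tail
```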
Hypothesis Tests: \(p\)-value approach (using \(t\))
State hypotheses \(H_0 : \mu = \mu_0 \quad \text{ vs. } \quad H_A: \begin{cases} \mu \neq \mu_0& \text{ two-sided test} \\ \mu < \mu_0&\text{ one-sided (lower-tail) test} \\ \mu > \mu_0&\text{ one-sided (upper-tail) test} \end{cases}\)
Compute test statistic \(t_{obs} = \dfrac{\bar x - \mu_0}{\frac{s}{\sqrt{n}}}\sim t_{\nu=n-1}\)
Calculate the \(p\)-value \(= \begin{cases} 2P(t_{\nu=n-1} \geq |t_{obs}|) &\text{ if } H_A: \mu \neq \mu_0 \\ P(t_{\nu=n-1} \leq t_{obs}) &\text{ if } H_A: \mu < \mu_0 \\ P(t_{\nu=n-1} \geq t_{obs}) &\text{ if } H_A: \mu > \mu_0 \end{cases}\)
Make a Decision \(\rightarrow \begin{cases} \text{reject $H_0$} & \text{if $p$-value $< \alpha$}\\ \text{fail to reject $H_0$} & \text{if $p$-value $\geq \alpha$} \end{cases}\)
State the Conclusion in context.
Coffee shop fill machine ☕
Exercise 1 A coffee shop advertises that its machine dispenses 12 oz of coffee per cup. The owner wants to check if the machine is properly calibrated. To investigate, the owner takes a random sample of size 20 cups and records the amount dispensed (in oz).
coffee <- c(
  12.05, 11.87, 12.23, 11.73, 11.84, 12.03, 11.96, 12.02, 11.84, 12.07,
  11.87, 11.90, 12.01, 11.87, 11.90, 11.89, 12.00, 11.99, 11.92, 12.02
)

Using a significance level of \(\alpha = 0.05\), conduct the appropriate hypothesis test to determine whether the machine is properly calibrated. You may assume the population distribution of dispensed amounts is normally distributed.
The owner is concerned if the machine is overfilling or underfilling; both directions are a problem. Hence:
\[ \begin{align} H_0&: \mu= 12 &&& H_A&: \mu \neq 12 \end{align} \]
With \(\sigma\) unknown and a normal population:
\[ \begin{align} t_{obs} &= \dfrac{\bar x - \mu_0}{\frac{s}{\sqrt{n}}}\\ &= \dfrac{11.9505 - 12}{\frac{0.1095673}{\sqrt{20}}} \\ &= -2.0204082 \end{align} \]
Calculations in R (mu0 = 12)
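The whole calculation can be carried out in a single call to `t.test()` (two-sided by default); a sketch using the sample above:

```r
coffee <- c(
  12.05, 11.87, 12.23, 11.73, 11.84, 12.03, 11.96, 12.02, 11.84, 12.07,
  11.87, 11.90, 12.01, 11.87, 11.90, 11.89, 12.00, 11.99, 11.92, 12.02
)
t.test(coffee, mu = 12)  # t = -2.0204, df = 19, p-value = 0.05767
```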

Since the \(p\)-value (0.058) \(> \alpha\) (0.05)
\[ \implies \text{Fail to reject }H_0 \]
Note
We would have come to the same conclusion using the critical value approach, and using confidence intervals.
Since our observed test statistic falls outside the rejection region, we fail to reject the null hypothesis.
\[ \begin{align} \bar x &\pm t_{\nu, \alpha/2} \frac{s}{\sqrt{n}}\\ 11.9505 &\pm 2.0930241 \frac{0.1095673}{\sqrt{20}}\\ 11.9505 &\pm 0.0512791\\ [ 11.899&, 12.002] \quad \leftarrow 95\text{\% CI for }\mu \end{align} \] Since the hypothesized value \(\mu_0\) = 12 falls in the 95% CI for the population mean \(\mu\) we fail to reject the null hypothesis.
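The same interval can be computed by hand in R (`coffee` is the sample from the exercise):

```r
coffee <- c(
  12.05, 11.87, 12.23, 11.73, 11.84, 12.03, 11.96, 12.02, 11.84, 12.07,
  11.87, 11.90, 12.01, 11.87, 11.90, 11.89, 12.00, 11.99, 11.92, 12.02
)
n <- length(coffee)
mean(coffee) + c(-1, 1) * qt(0.975, df = n - 1) * sd(coffee) / sqrt(n)
# approximately (11.899, 12.002)
```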
With a \(p\)-value of 0.058, we fail to reject the null hypothesis. There is not sufficient evidence to conclude that the average amount of coffee dispensed by the machine differs from 12 oz.
iClicker
When testing \(H_0: \mu = 20\) vs. \(H_A: \mu \neq 20\), we obtain a \(p\)-value of 0.015. Which statement is correct about a 99% confidence interval for \(\mu\) based on the same data?
iClicker
When testing \(H_0: \mu = 20\) vs. \(H_A: \mu \neq 20\), we obtain a \(p\)-value of 0.015. Which statement is correct about a 95% confidence interval for \(\mu\) based on the same data?
Suppose we approach this situation from the consumer’s perspective.
Now, we are only concerned if the machine is underfilling cups (we’re not going to complain if we get more coffee than we paid for).
In this case, we perform a lower-tailed test.
1️⃣ \(H_0:\mu=12 \quad H_A:\mu<12\)
2️⃣ \(t_{obs} = -2.02\)
3️⃣ \(\begin{align} p\text{-val} &= \Pr(t_{19} \leq -2.02)\\ &= 0.0288327 \end{align}\)
4️⃣ Reject \(H_0\) (\(p\)-value \(< \alpha\))
5️⃣ With a \(p\)-value of 0.029, there is sufficient evidence to suggest that the coffee machine is underfilling and dispensing less than 12 oz per cup on average.
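In R, the lower-tailed version of the test only changes the `alternative` argument (`coffee` as defined in the exercise):

```r
coffee <- c(
  12.05, 11.87, 12.23, 11.73, 11.84, 12.03, 11.96, 12.02, 11.84, 12.07,
  11.87, 11.90, 12.01, 11.87, 11.90, 11.89, 12.00, 11.99, 11.92, 12.02
)
t.test(coffee, mu = 12, alternative = "less")  # p-value = 0.02883
```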

Suppose we approach this situation from the greedy owner perspective.
They are only concerned if the machine is overfilling cups (and are not concerned if customers receive slightly less than they paid for).
In this case, we perform an upper-tailed test.
1️⃣ \(H_0:\mu=12 \quad H_A:\mu>12\)
2️⃣ \(t_{obs} = -2.02\)
3️⃣ \(\begin{align} p\text{-val} &= \Pr(t_{19} \geq -2.02)\\ &= 0.9711673 \end{align}\)
4️⃣ Fail to reject \(H_0\) (\(p\)-value \(> \alpha\))
5️⃣ With a \(p\)-value of 0.971, there is insufficient evidence to suggest that the coffee machine is overfilling and dispensing more than 12 oz per cup on average.
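The upper-tailed version is again a one-argument change (`coffee` as defined in the exercise):

```r
coffee <- c(
  12.05, 11.87, 12.23, 11.73, 11.84, 12.03, 11.96, 12.02, 11.84, 12.07,
  11.87, 11.90, 12.01, 11.87, 11.90, 11.89, 12.00, 11.99, 11.92, 12.02
)
t.test(coffee, mu = 12, alternative = "greater")  # p-value = 0.9712
```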

| Argument | Description |
|---|---|
| `x` | a numeric vector of data |
| `alternative` | specifies \(H_A\); either `"two.sided"` (default), `"greater"`, or `"less"` |
| `mu` | a number indicating the hypothesized value of the mean, i.e., \(\mu_0\) |
| `conf.level` | confidence level \(= 1 - \alpha\) (default 0.95, i.e., \(\alpha = 0.05\)) |
One Sample t-test
data: coffee
t = -2.0204, df = 19, p-value = 0.05767
alternative hypothesis: true mean is not equal to 12
95 percent confidence interval:
11.89922 12.00178
sample estimates:
mean of x
11.9505
Notice we also get the same CI we constructed earlier.
Notice that for one-sided tests, the R output reports a one-sided CI, constructed using
\[ \begin{align} (-\infty &, \bar x + t_{19, 1-\alpha} \times s/\sqrt{n})\\ (-\infty &, 11.9505 + 1.729 \times \frac{0.11}{\sqrt{20}})\\ (-\infty &, 11.9505 + 0.0423638)\\ \end{align} \] \[ (-\infty, 11.993) \]
Here we have the one-sided CI using the other tail:
\[ \begin{align} (\bar x - t_{19, 1-\alpha} \times s/\sqrt{n}&, \infty)\\ (11.9505 -1.729 \times \frac{0.11}{\sqrt{20}}&, \infty)\\ (11.9505 -0.0423638&, \infty)\\ \end{align} \] \[ (11.908, \infty) \]
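Both one-sided intervals can be pulled directly from `t.test()` via the `conf.int` component (`coffee` as defined in the exercise):

```r
coffee <- c(
  12.05, 11.87, 12.23, 11.73, 11.84, 12.03, 11.96, 12.02, 11.84, 12.07,
  11.87, 11.90, 12.01, 11.87, 11.90, 11.89, 12.00, 11.99, 11.92, 12.02
)
t.test(coffee, mu = 12, alternative = "less")$conf.int     # approximately (-Inf, 11.993)
t.test(coffee, mu = 12, alternative = "greater")$conf.int  # approximately (11.908, Inf)
```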
iClicker
Consider the following output from t.test()
One Sample t-test
data: x
t = 1.9622, df = 31, p-value = 0.05876
alternative hypothesis: true mean is not equal to 18
97 percent confidence interval:
17.66720 22.51405
sample estimates:
mean of x
20.09062
What was the sample size used for this hypothesis test?
iClicker
Consider the following output from t.test()
One Sample t-test
data: x
t = 1.9622, df = 31, p-value = 0.05876
alternative hypothesis: true mean is not equal to 18
97 percent confidence interval:
17.66720 22.51405
sample estimates:
mean of x
20.09062
What is \(\mu_0\), the hypothesized value for the mean?
iClicker
Consider the following output from t.test()
One Sample t-test
data: x
t = 1.9622, df = 31, p-value = 0.05876
alternative hypothesis: true mean is not equal to 18
97 percent confidence interval:
17.66720 22.51405
sample estimates:
mean of x
20.09062
What is the significance level of this test?
iClicker
Consider the following output from t.test()
One Sample t-test
data: x
t = 1.9622, df = 31, p-value = 0.05876
alternative hypothesis: true mean is not equal to 18
97 percent confidence interval:
17.66720 22.51405
sample estimates:
mean of x
20.09062
Using the same data and significance level, would we reject the hypothesis
\[ H_0: \mu = 18 \text { vs. } H_A: \mu > 18 \]
The critical value approach and the \(p\)-value approach provide the same conclusion (provided the hypotheses, data, and significance level are the same).
Rather than a binary outcome (reject vs. fail to reject) the \(p\)-value gives information about the strength of evidence against the null hypothesis.
e.g., \(p\)-values of 0.04999 and 0.0000001 are both significant at \(\alpha = 0.05\), but 0.0000001 indicates much stronger evidence against \(H_0\) than 0.04999.
t.test() can be used to conduct these tests in R.