stat205 – Hypothesis Testing for the Mean

Introduction

We are still testing

\[ \begin{align} H_0&: \mu = \mu_0 &&& H_A&: \begin{cases} \mu \ne \mu_0 & \text{two-sided}\\ \mu < \mu_0 & \text{left-tailed}\\ \mu > \mu_0 & \text{right-tailed} \end{cases} \end{align} \]

Last class we covered the critical value approach for hypothesis tests concerning the population mean.
Today we will cover the (more common) p-value approach.

Test Statistic

The p-value approach relies on a null distribution (the distribution assuming the null hypothesis is correct), determined by the appropriate test statistic.

When $\sigma$ is known

\[ \dfrac{\bar X - \mu_0}{\sigma/\sqrt{n}} \sim N(0,1) \]

Conditions for $Z$-statistic

Same conditions as for the $t$-statistic (1️⃣, 2️⃣, and 3️⃣), plus 4️⃣ the population standard deviation $\sigma$ is known.

When $\sigma$ is unknown

\[ \dfrac{\bar X - \mu_0}{s/\sqrt{n}} \sim t_{\nu = n-1} \]

Conditions for $t$-statistic

1️⃣ Simple Random Sample
2️⃣ Observations independent
3️⃣ Population is normal or $n \geq 30$

Critical Value Approach

With the critical value approach we asked:

Did our observed test statistic cross the critical value threshold (aka the rejection boundary)?

If our test statistic falls in the Rejection Region

\[ \implies \text{Reject } H_0 \]

If our test statistic falls outside the Rejection Region

\[ \implies \text{Fail to reject } H_0 \]

t-statistic

In the more realistic situation where $\sigma$ is unknown, the null distribution is the $t$-distribution with $\nu = n-1$ degrees of freedom.
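As a rough check of this claim, the sketch below simulates many samples from a normal population with $H_0$ true and compares the simulated test statistics with the $t$-distribution. The seed, the sample size (`n = 6`, giving 5 degrees of freedom), and the number of replications are illustrative assumptions, not values from the lecture.

```r
# Simulate the null distribution of the t-statistic (illustrative settings)
set.seed(205)          # arbitrary seed for reproducibility
n   <- 6               # sample size, so nu = n - 1 = 5
mu0 <- 0               # hypothesized mean; H0 is true in this simulation

t_sim <- replicate(10000, {
  x <- rnorm(n, mean = mu0, sd = 1)        # sample from a normal population
  (mean(x) - mu0) / (sd(x) / sqrt(n))      # t-statistic with sigma unknown
})

# Proportion of simulated statistics beyond the 95th percentile of t_5;
# this should be close to 0.05 if t_5 is the right null distribution
prop_exceed <- mean(t_sim >= qt(0.95, df = n - 1))
prop_exceed
```

If the $t_{n-1}$ null distribution is appropriate, `prop_exceed` should land near 0.05, up to simulation noise.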

Critical value

For a right-tailed test at $\alpha = 0.05$ and a null distribution with 5 degrees of freedom, the critical value is $t^* = 2.015$.
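This critical value can be looked up in R with `qt()`. The variable names below are illustrative.

```r
alpha <- 0.05   # significance level
df    <- 5      # degrees of freedom of the null distribution

# Upper-tail critical value: the point with area alpha to its right
t_crit <- qt(alpha, df = df, lower.tail = FALSE)
round(t_crit, 3)
```

For a left-tailed test the same call without `lower.tail = FALSE` returns the negative of this value, since the $t$-distribution is symmetric about 0.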

Limitation

An observed test statistic of 0 leads us to fail to reject the null hypothesis.

Limitation

An observed test statistic of 1.99 leads us to fail to reject the null hypothesis.

Limitation

An observed test statistic of 2.02 leads us to reject the null hypothesis.

Limitation

An observed test statistic of 5.99 leads us to reject the null hypothesis.

Limitations

Two very different results are treated identically using the critical value approach.
The critical value approach does not tell us:
- how close $t_{obs}$ is to the rejection boundary
- how strongly the data disagree with $H_0$

The $p$-value approach solves this problem $\dots$

Critical values to $p$-values

Critical value approach asks:

Did $t_{obs}$ fall into the rejection region?

$\implies$ binary result Yes (reject)/ No (fail to reject)

$p$-value approach asks:

How surprising is $t_{obs}$ if $H_0$ were true?

$\implies$ $p$-value (probability between 0 and 1)

The $p$-value provides a measure of evidence $\dots$

Evidence

Evidence measures:

How inconsistent the observed data are with the assumption that $H_0$ is true.

Results that happen often under $H_0 \implies$ weak evidence
Results that rarely happen under $H_0 \implies$ strong evidence

p-value

An observed test statistic of 0 is not surprising under the null hypothesis (little evidence).

p-value

An observed test statistic of 1.99 is somewhat surprising under the null hypothesis (some evidence).

p-value

An observed test statistic of 2.02 is somewhat surprising under the null hypothesis (significant evidence).

p-value

An observed test statistic of 5.99 is very surprising under the null hypothesis (strong evidence).

p-value (zoomed)

An observed test statistic of 5.99 is very surprising under the null hypothesis (strong evidence).
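To put numbers on these four cases, the sketch below computes the tail area for each observed statistic. It assumes the same setting as the critical value slide (a right-tailed test with 5 degrees of freedom); that setting is an assumption carried over for illustration.

```r
t_obs <- c(0, 1.99, 2.02, 5.99)   # the four observed statistics discussed above
df    <- 5                        # assumed: same null distribution as before

# Right-tailed p-values: area under the t_5 curve to the right of each t_obs
p_vals <- pt(t_obs, df = df, lower.tail = FALSE)

data.frame(t_obs, p_value = round(p_vals, 4), reject = p_vals < 0.05)
```

The statistics 1.99 and 2.02 land on opposite sides of $\alpha = 0.05$ yet have nearly identical $p$-values, while 5.99 has a far smaller one.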

$p$-value

The $p$-value is the probability of observing a result at least as extreme as the one obtained, assuming the null hypothesis is true.

Right-tailed test

$\Pr(t_{\nu} \geq t_{obs} \mid H_0)$

Left-tailed test

$\Pr(t_{\nu} \leq t_{obs} \mid H_0)$

Two-sided test

$2\cdot\Pr(t_{\nu} \geq |t_{obs}| \mid H_0)$

The alternative hypothesis determines which tail(s) are “extreme”.
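The three cases can be collected into one small helper. The function name `p_value_t` and the example inputs are hypothetical; the `alternative` labels mirror those used by R's `t.test()`.

```r
# p-value for a t-statistic under each form of the alternative hypothesis
p_value_t <- function(t_obs, df,
                      alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  switch(alternative,
    two.sided = 2 * pt(abs(t_obs), df = df, lower.tail = FALSE),  # both tails
    less      = pt(t_obs, df = df),                               # left tail
    greater   = pt(t_obs, df = df, lower.tail = FALSE)            # right tail
  )
}

# Hypothetical example: t_obs = 1.5 with 10 degrees of freedom
p_value_t(1.5, df = 10, alternative = "greater")
p_value_t(1.5, df = 10, alternative = "two.sided")
```

For any $t_{obs}$, the left-tailed and right-tailed $p$-values sum to 1, and the two-sided $p$-value is twice the smaller of the two.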

Making a Decision

Decision (critical value approach)

🚫 If $t_{obs}$ falls in rejection region

\[ \implies \text{Reject } H_0 \]

🟢 If $t_{obs}$ falls outside rejection region

\[ \implies \text{Fail to reject } H_0 \]

Decision ($p$-value approach)

🚫 If $p$-value $< \alpha$

\[ \implies \text{Reject }H_0 \]

🟢 If $p$-value $\geq \alpha$

\[ \implies \text{Fail to reject }H_0 \]

Both methods always give the same conclusion (for the same hypotheses, data, and significance level).
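A quick numerical check of this agreement for a right-tailed test is sketched below. The grid of observed statistics and the choice of 5 degrees of freedom are illustrative assumptions.

```r
alpha  <- 0.05
df     <- 5
t_grid <- seq(-6, 6, by = 0.01)   # hypothetical observed test statistics

# Critical value approach: reject when t_obs exceeds the critical value
t_crit      <- qt(alpha, df = df, lower.tail = FALSE)
reject_crit <- t_grid > t_crit

# p-value approach: reject when the right-tail area is below alpha
p_vals   <- pt(t_grid, df = df, lower.tail = FALSE)
reject_p <- p_vals < alpha

# TRUE when the two decision rules agree at every grid point
all(reject_crit == reject_p)
```

The rules agree because the $p$-value drops below $\alpha$ exactly when $t_{obs}$ passes the point that has tail area $\alpha$, which is the critical value.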

Interpreting $p$-values

*Guideline language for quantifying evidence against* $H_0$ *using* $p$*-values at* $\alpha = 0.05$.
$p$-value	Evidence against $H_0$	Decision
$0.1 \leq p \leq 1$	no evidence	🟢 Fail to reject $H_0$
$0.05 < p \leq 0.10$	weak evidence	🟢 Fail to reject $H_0$
$0.01 < p \leq 0.05$	sufficient evidence	🚫 Reject $H_0$
$0.001 < p \leq 0.01$	strong evidence	🚫 Reject $H_0$
$0< p \leq 0.001$	very strong evidence	🚫 Reject $H_0$

Hypothesis Tests Critical value approach (using $Z$)

State hypotheses $H_0 : \mu = \mu_0 \quad \text{ vs. } \quad H_A: \begin{cases} \mu \neq \mu_0& \text{ two-sided test} \\ \mu < \mu_0&\text{ one-sided (lower-tail) test} \\ \mu > \mu_0&\text{ one-sided (upper-tail) test} \end{cases}$
Compute test statistic $z_{obs} = \dfrac{\bar x - \mu_0}{\frac{\sigma}{\sqrt{n}}}\sim N(0,1)$
Determine the critical value $z^* = \begin{cases} P(Z \geq z^*) = \alpha/2 &\text{ if } H_A: \mu \neq \mu_0 \\ P(Z \leq z^*) = \alpha &\text{ if } H_A: \mu < \mu_0 \\ P(Z \geq z^*) = \alpha &\text{ if } H_A: \mu > \mu_0 \end{cases}$
Make a Decision $\rightarrow \begin{cases} \text{reject $H_0$} & \text{if $z_{obs}$ falls in the RR}\\ \text{fail to reject $H_0$} & \text{if $z_{obs}$ falls outside the RR} \end{cases}$
State the Conclusion in context.
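As a sketch of the critical value step, the standard normal critical values for each alternative can be found with `qnorm()`. The value $\alpha = 0.05$ and the variable names are illustrative choices.

```r
alpha <- 0.05

z_two   <- qnorm(alpha / 2, lower.tail = FALSE)  # two-sided: reject if |z_obs| >= z_two
z_lower <- qnorm(alpha)                          # lower-tail: reject if z_obs <= z_lower
z_upper <- qnorm(alpha, lower.tail = FALSE)      # upper-tail: reject if z_obs >= z_upper

round(c(two_sided = z_two, lower = z_lower, upper = z_upper), 3)
```

The lower-tail critical value is negative and the upper-tail one is positive, so each rejection region sits in the tail named by $H_A$.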

Hypothesis Tests Critical value approach (using $t$)

State hypotheses $H_0 : \mu = \mu_0 \quad \text{ vs. } \quad H_A: \begin{cases} \mu \neq \mu_0& \text{ two-sided test} \\ \mu < \mu_0&\text{ one-sided (lower-tail) test} \\ \mu > \mu_0&\text{ one-sided (upper-tail) test} \end{cases}$
Compute test statistic $t_{obs} = \dfrac{\bar x - \mu_0}{\frac{s}{\sqrt{n}}}\sim t_{\nu = n - 1}$
Determine the critical value $t^* = \begin{cases} P(t_{\nu = n - 1} \geq t^*) = \alpha/2 &\text{ if } H_A: \mu \neq \mu_0 \\ P(t_{\nu = n - 1} \leq t^*) = \alpha &\text{ if } H_A: \mu < \mu_0 \\ P(t_{\nu = n - 1} \geq t^*) = \alpha &\text{ if } H_A: \mu > \mu_0 \end{cases}$
Make a Decision $\rightarrow \begin{cases} \text{reject $H_0$} & \text{if $t_{obs}$ falls in the RR}\\ \text{fail to reject $H_0$} & \text{if $t_{obs}$ falls outside the RR} \end{cases}$
State the Conclusion in context.

Hypothesis Tests $p$-value approach (using $Z$)

State hypotheses $H_0 : \mu = \mu_0 \quad \text{ vs. } \quad H_A: \begin{cases} \mu \neq \mu_0& \text{ two-sided test} \\ \mu < \mu_0&\text{ one-sided (lower-tail) test} \\ \mu > \mu_0&\text{ one-sided (upper-tail) test} \end{cases}$
Compute test statistic $z_{obs} = \dfrac{\bar x - \mu_0}{\frac{\sigma}{\sqrt{n}}}\sim N(0,1)$
Calculate the $p$-value $= \begin{cases} 2P(Z \geq |z_{obs}|) &\text{ if } H_A: \mu \neq \mu_0 \\ P(Z \leq z_{obs}) &\text{ if } H_A: \mu < \mu_0 \\ P(Z \geq z_{obs}) &\text{ if } H_A: \mu > \mu_0 \end{cases}$
Make a Decision $\rightarrow \begin{cases} \text{reject $H_0$} & \text{if $p$-value $< \alpha$}\\ \text{fail to reject $H_0$} & \text{if $p$-value $\geq \alpha$} \end{cases}$
State the Conclusion in context.
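Base R has no built-in one-sample $Z$-test, so the steps above could be sketched with a small helper. The function name `z_test_p` and the summary numbers in the example are hypothetical, chosen only so the arithmetic is easy to follow.

```r
# One-sample Z-test (sigma known) via the p-value approach
z_test_p <- function(xbar, mu0, sigma, n,
                     alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  z_obs <- (xbar - mu0) / (sigma / sqrt(n))          # observed test statistic
  p <- switch(alternative,
    two.sided = 2 * pnorm(abs(z_obs), lower.tail = FALSE),
    less      = pnorm(z_obs),
    greater   = pnorm(z_obs, lower.tail = FALSE)
  )
  list(z_obs = z_obs, p_value = p)
}

# Hypothetical summary statistics: xbar = 52, mu0 = 50, sigma = 8, n = 64
res <- z_test_p(xbar = 52, mu0 = 50, sigma = 8, n = 64)
res$z_obs     # (52 - 50) / (8 / 8) = 2
res$p_value   # compare with alpha to make the decision
```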

Hypothesis Tests $p$-value approach (using $t$)

State hypotheses $H_0 : \mu = \mu_0 \quad \text{ vs. } \quad H_A: \begin{cases} \mu \neq \mu_0& \text{ two-sided test} \\ \mu < \mu_0&\text{ one-sided (lower-tail) test} \\ \mu > \mu_0&\text{ one-sided (upper-tail) test} \end{cases}$
Compute test statistic $t_{obs} = \dfrac{\bar x - \mu_0}{\frac{s}{\sqrt{n}}}\sim t_{\nu=n-1}$
Calculate the $p$-value $= \begin{cases} 2P(t_{\nu=n-1} \geq |t_{obs}|) &\text{ if } H_A: \mu \neq \mu_0 \\ P(t_{\nu=n-1} \leq t_{obs}) &\text{ if } H_A: \mu < \mu_0 \\ P(t_{\nu=n-1} \geq t_{obs}) &\text{ if } H_A: \mu > \mu_0 \end{cases}$
Make a Decision $\rightarrow \begin{cases} \text{reject $H_0$} & \text{if $p$-value $< \alpha$}\\ \text{fail to reject $H_0$} & \text{if $p$-value $\geq \alpha$} \end{cases}$
State the Conclusion in context.

Example

Coffee shop fill machine ☕

Exercise 1: A coffee shop advertises that its machine dispenses 12 oz of coffee per cup. The owner wants to check if the machine is properly calibrated. To investigate, the owner takes a random sample of 20 cups and records the amount dispensed (in oz).

coffee <- c(
  12.05, 11.87, 12.23, 11.73, 11.84, 12.03, 11.96, 12.02, 11.84, 12.07,
  11.87, 11.90, 12.01, 11.87, 11.90, 11.89, 12.00, 11.99, 11.92, 12.02
)

Using a significance level of $\alpha = 0.05$, conduct the appropriate hypothesis test to determine whether the machine is properly calibrated. You may assume the population distribution of dispensed amounts is normally distributed.

1. Hypotheses

The owner is concerned if the machine is:

underfilling cups ❌ (customers unhappy)
overfilling cups ❌ (losing money)

Both directions are a problem. Hence:

\[ \begin{align} H_0&: \mu= 12 &&& H_A&: \mu \neq 12 \end{align} \]

2. Test Statistic (Null Distribution)

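With $\sigma$ unknown and a normal population, the test statistic follows a $t$-distribution with $\nu = n - 1 = 19$ degrees of freedom under $H_0$:

\[ \dfrac{\bar X - \mu_0}{s/\sqrt{n}} \sim t_{\nu = 19} \]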

2. (Observed) Test Statistic

\[ \begin{align} t_{obs} &= \dfrac{\bar x - \mu_0}{\frac{s}{\sqrt{n}}}\\ &= \dfrac{11.9505 - 12}{\frac{0.1095673}{\sqrt{20}}} \\ &= -2.0204082 \end{align} \]

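Calculations in R (mu0 = 12)

mu0 = 12                          # hypothesized mean under H0
xbar = mean(coffee)               # sample mean
s = sd(coffee)                    # sample standard deviation
n = length(coffee)                # sample size
tobs = (xbar - mu0)/(s/sqrt(n))   # observed t-statistic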

3. p-value

$p$-value \[ \begin{align} &= 2\cdot\Pr(t_{19} > |t_{obs}|)\\ &= 2\cdot\Pr(t_{19} > 2.0204082)\\ &= 2\cdot 0.0288327\\ &= 0.0576654 \end{align} \]

2*pt(abs(-2.0204082), 
    df=19, 
    lower.tail = FALSE)

[1] 0.05766541

4. Decision

Since the $p$-value (0.058) $> \alpha$ (0.05)

\[ \implies \text{Fail to reject }H_0 \]

Note

We would have come to the same conclusion using the critical value approach, and using confidence intervals.

5. Decision (critical value)

Since $|t_{obs}| = 2.02$ is smaller than the critical value $t_{\nu, \alpha/2} = 2.093$, our observed test statistic falls outside the rejection region, so we fail to reject the null hypothesis.
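The critical value comparison for this two-sided test can be sketched in R as follows, using the observed statistic from step 2. The variable names are illustrative.

```r
alpha <- 0.05
df    <- 19            # n - 1 for the 20 cups
t_obs <- -2.0204082    # observed test statistic from step 2

# Two-sided test: alpha/2 in each tail
t_crit <- qt(alpha / 2, df = df, lower.tail = FALSE)

# The rejection region is |t| >= t_crit
in_rejection_region <- abs(t_obs) >= t_crit
c(t_crit = t_crit, in_rejection_region = in_rejection_region)
```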

5. Decision (confidence interval)

\[ \begin{align} \bar x &\pm t_{\nu, \alpha/2} \frac{s}{\sqrt{n}}\\ 11.9505 &\pm 2.0930241 \frac{0.1095673}{\sqrt{20}}\\ 11.9505 &\pm 0.0512791\\ [ 11.899&, 12.002] \quad \leftarrow 95\text{\% CI for }\mu \end{align} \]

Since the hypothesized value $\mu_0 = 12$ falls in the 95% CI for the population mean $\mu$, we fail to reject the null hypothesis.
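The interval can be reproduced in R from the summary statistics of the coffee sample. The variable names are illustrative.

```r
xbar <- 11.9505      # sample mean
s    <- 0.1095673    # sample standard deviation
n    <- 20           # sample size
mu0  <- 12           # hypothesized mean
conf <- 0.95         # confidence level, so alpha = 0.05

alpha <- 1 - conf
moe   <- qt(alpha / 2, df = n - 1, lower.tail = FALSE) * s / sqrt(n)  # margin of error
ci    <- c(lower = xbar - moe, upper = xbar + moe)

# Two-sided test decision via the CI: fail to reject H0 if mu0 lies inside
contains_mu0 <- ci[["lower"]] <= mu0 && mu0 <= ci[["upper"]]
round(ci, 3)
contains_mu0
```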

5. Conclusion

With a $p$-value of 0.058, we fail to reject the null hypothesis. There is not sufficient evidence to conclude that the average amount of coffee dispensed by the machine differs from 12 oz.

iClicker

When testing $H_0: \mu = 20$ vs. $H_A: \mu \neq 20$, we obtain a $p$-value of 0.015. Which statement is correct about a 99% confidence interval for $\mu$ based on the same data?

It will definitely contain 20
It will definitely not contain 20
It might contain 20
There is not enough information to determine

iClicker

When testing $H_0: \mu = 20$ vs. $H_A: \mu \neq 20$, we obtain a $p$-value of 0.015. Which statement is correct about a 95% confidence interval for $\mu$ based on the same data?

It will definitely contain 20
It will definitely not contain 20
It might contain 20
There is not enough information to determine

Same Data Different Alternative

Suppose we approach this situation from the consumer’s perspective.
Now, we are only concerned if the machine is underfilling cups (we’re not going to complain if we get more coffee than we paid for).
In this case, we perform a lower-tailed test.

Lower-tailed test

1️⃣ $H_0:\mu=12 \quad H_A:\mu<12$

2️⃣ $t_{obs} = -2.02$

3️⃣ $\begin{align} p\text{-val} &= \Pr(t_{19} \leq -2.02)\\ &= 0.0288327 \end{align}$

4️⃣ Reject $H_0$ ($p$-value $< \alpha$)

5️⃣ With a $p$-value of 0.029, there is sufficient evidence to suggest that the coffee machine is underfilling and dispensing less than 12 oz per cup on average.

Same Data Different Alternative

Suppose we approach this situation from a greedy owner's perspective.
They are only concerned if the machine is overfilling cups (and are not concerned if customers receive slightly less than they paid for).
In this case, we perform an upper-tailed test.

Upper-tailed test

1️⃣ $H_0:\mu=12 \quad H_A:\mu>12$

2️⃣ $t_{obs} = -2.02$

3️⃣ $\begin{align} p\text{-val} &= \Pr(t_{19} \geq -2.02)\\ &= 0.9711673 \end{align}$

4️⃣ Fail to reject $H_0$ ($p$-value $> \alpha$)

5️⃣ With a $p$-value of 0.971, there is insufficient evidence to suggest that the coffee machine is overfilling and dispensing more than 12 oz per cup on average.

t.test

t.test(x, alternative = c("two.sided", "less", "greater"),
       mu = 0, conf.level = 0.95, ...)

`x`	a numeric vector of data
`alternative`	specifies $H_A$. Either: `"two.sided"` (default), `"greater"` or `"less"`.
`mu`	a number indicating the hypothesized value of the mean, i.e. $\mu_0$
`conf.level`	Confidence level = 1 - $\alpha$ (default 95%, i.e. $\alpha = 0.05$)
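As a sketch of how these arguments fit together, the call below runs the two-sided test on the coffee data and pulls individual pieces out of the returned object. The names `res` and `decision` are illustrative.

```r
coffee <- c(
  12.05, 11.87, 12.23, 11.73, 11.84, 12.03, 11.96, 12.02, 11.84, 12.07,
  11.87, 11.90, 12.01, 11.87, 11.90, 11.89, 12.00, 11.99, 11.92, 12.02
)

# Two-sided test of H0: mu = 12 at alpha = 0.05
res <- t.test(coffee, mu = 12, alternative = "two.sided", conf.level = 0.95)

res$statistic   # observed t-statistic
res$parameter   # degrees of freedom (n - 1)
res$p.value     # p-value
res$conf.int    # confidence interval for mu

# Decision rule from the p-value approach
decision <- if (res$p.value < 0.05) "Reject H0" else "Fail to reject H0"
decision
```

Storing the result makes it possible to reuse the $p$-value or interval in later code instead of reading them off the printed output.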

R results

Two-tailed test

t.test(coffee, mu = 12)


    One Sample t-test

data:  coffee
t = -2.0204, df = 19, p-value = 0.05767
alternative hypothesis: true mean is not equal to 12
95 percent confidence interval:
 11.89922 12.00178
sample estimates:
mean of x 
  11.9505

Notice we also get the same 95% CI we constructed by hand earlier, [11.899, 12.002].

R results

Lower-tailed test

t.test(coffee, mu = 12, alternative = "less")


    One Sample t-test

data:  coffee
t = -2.0204, df = 19, p-value = 0.02883
alternative hypothesis: true mean is less than 12
95 percent confidence interval:
     -Inf 11.99286
sample estimates:
mean of x 
  11.9505

One-sided CI

Notice that in the R output for one-sided tests we get a one-sided CI, constructed using

\[ \begin{align} (-\infty &, \bar x + t_{19, 1-\alpha} \times s/\sqrt{n})\\ (-\infty &, 11.9505 + 1.729 \times \frac{0.1095673}{\sqrt{20}})\\ (-\infty &, 11.9505 + 0.0423638) \end{align} \] \[ (-\infty, 11.993) \]

alpha = 0.05   # significance level
xbar + qt(alpha, df=n-1, lower.tail=FALSE)*s/sqrt(n)

[1] 11.99286

R results

Upper-tailed test

t.test(coffee, mu = 12, alternative = "greater")


    One Sample t-test

data:  coffee
t = -2.0204, df = 19, p-value = 0.9712
alternative hypothesis: true mean is greater than 12
95 percent confidence interval:
 11.90814      Inf
sample estimates:
mean of x 
  11.9505

One-sided CI

Here we have the one-sided CI using the other tail:

\[ \begin{align} (\bar x + t_{19, \alpha} \times s/\sqrt{n}&, \infty)\\ (11.9505 -1.729 \times \frac{0.1095673}{\sqrt{20}}&, \infty)\\ (11.9505 -0.0423638&, \infty) \end{align} \] \[ (11.908, \infty) \]

xbar - qt(alpha, df=n-1, lower.tail=FALSE)*s/sqrt(n)

[1] 11.90814

iClicker

Consider the following output from t.test()


    One Sample t-test

data:  x
t = 1.9622, df = 31, p-value = 0.05876
alternative hypothesis: true mean is not equal to 18
97 percent confidence interval:
 17.66720 22.51405
sample estimates:
mean of x 
 20.09062

What was the sample size used for this hypothesis test?

31
32
18
Impossible to say

iClicker

Consider the following output from t.test()


    One Sample t-test

data:  x
t = 1.9622, df = 31, p-value = 0.05876
alternative hypothesis: true mean is not equal to 18
97 percent confidence interval:
 17.66720 22.51405
sample estimates:
mean of x 
 20.09062

What is $\mu_0$, the hypothesized value for the mean?

1.9622
0.058764
20.09062
18

iClicker

Consider the following output from t.test()


    One Sample t-test

data:  x
t = 1.9622, df = 31, p-value = 0.05876
alternative hypothesis: true mean is not equal to 18
97 percent confidence interval:
 17.66720 22.51405
sample estimates:
mean of x 
 20.09062

What is the significance level of this test?

0.90
0.95
0.97
0.98
None of the above

iClicker

Consider the following output from t.test()


    One Sample t-test

data:  x
t = 1.9622, df = 31, p-value = 0.05876
alternative hypothesis: true mean is not equal to 18
97 percent confidence interval:
 17.66720 22.51405
sample estimates:
mean of x 
 20.09062

Using the same data and significance level, would we reject $H_0$ when testing

\[ H_0: \mu = 18 \text { vs. } H_A: \mu > 18 \]

Yes, we would reject $H_0$
No, we would fail to reject $H_0$
Not enough information to determine

Summary

The critical value approach and the $p$-value approach give the same conclusion (provided the hypotheses, data, and significance level are the same).
Rather than a binary outcome (reject vs. fail to reject), the $p$-value gives information about the strength of evidence against the null hypothesis.
e.g., $p$-values of 0.04999 and 0.0000001 are both significant at $\alpha = 0.05$, but 0.0000001 indicates much stronger evidence than 0.04999.
t.test() can be used to conduct these tests in R.
