STAT 205: Introduction to Mathematical Statistics
University of British Columbia Okanagan
March 15, 2024
Estimating a parameter from sample data can involve:
A single number (a point estimate).
An interval of plausible values (a confidence interval).
Often, the goal of an investigation isn’t just parameter estimation, but deciding between two contradictory claims about the parameter.
This falls under statistical inference, specifically hypothesis testing.
In this lecture we will cover:
Terminology:
Significance level (\(\alpha\)-level)
The critical value approach for hypothesis tests for a population mean
A statistical hypothesis, or just hypothesis, is a claim or assertion either about the value of a single parameter (i.e., a characteristic of a population or a probability distribution), about the values of several parameters, or about the form of an entire probability distribution.
Today we’ll be focusing on hypothesis testing for single parameters
In any hypothesis-testing problem, there are two contradictory hypotheses under consideration.
Definition 1: Null and Alternative Hypothesis
A null hypothesis (\(H_0\)) often represents a skeptical perspective or a claim to be tested. The alternative hypothesis (\(H_A\) or \(H_a\) or \(H_1\)) represents an alternative claim under consideration and is often represented by a range of possible parameter values.
Objective: based on sample information, decide whether there is enough evidence to reject \(H_0\) in favour of \(H_A\).
Example 1: GPA of American Colleges
We want to test whether the mean GPA of students in American colleges is different from 2.0 (out of 4.0).
Example 2: Voter turnout
We want to test whether the proportion of registered voters in Santa Clara County who voted in the primary election is more than 30%.
Example 3: Average college duration
We want to test whether college students take less than five years, on average, to graduate from college.
In statistics, hypothesis-testing problems are formulated so that the null hypothesis is initially assumed to be true.
This initial claim will not be rejected in favor of the alternative claim unless sample evidence provides strong evidence for the latter.
As a mathematical convention, \(H_0\) will always be written with an equal sign.
The choice of comparison symbol for \(H_A\) will depend on the wording of the research question.
Example 1: GPA of American Colleges
\[ \begin{align} H_0: \mu &= 2.0 & H_A: \mu &\neq 2.0 \end{align} \]
Example 2: Voter turnout
\[ \begin{align} H_0: p &= 0.30 & H_A: p &> 0.30 \end{align} \]
Example 3: Average college duration
\[ \begin{align} H_0: \mu &= 5 & H_A: \mu &< 5 \end{align} \]
Null Hypothesis | Alternative Hypothesis | Description |
---|---|---|
\(H_0: \mu = 2\) | \(H_A: \mu \neq 2\) | Two-tailed test for population mean |
\(H_0: p = 0.30\) | \(H_A: p > 0.30\) | Upper-tailed test for population proportion |
\(H_0: \mu = 5\) | \(H_A: \mu < 5\) | Lower-tailed test for population mean |
Hypothesis testing works much like the US court system. We start off with the belief that the defendant is not guilty and seek strong evidence to suggest guilt, i.e.
\[ \begin{align} H_0: &\text{ not guilty} & H_A: &\text{ guilty} \end{align} \]
Note: Even if the jurors leave unconvinced of guilt beyond a reasonable doubt, this does not mean they believe the defendant is innocent.
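There are two important errors that can be made in this courtroom analogy:
convicting an innocent defendant (rejecting \(H_0\) when it is true), and
acquitting a guilty defendant (failing to reject \(H_0\) when it is false).
We refer to these as Type I and Type II errors, respectively.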
Decision | \(H_0\) is true | \(H_0\) is false |
---|---|---|
Fail to reject \(H_0\) | Correct | Type II error |
Reject \(H_0\) | Type I error | Correct |
The probability of making a Type I error is
\[ \Pr({\text{Reject $H_0$}\mid H_0 \text{ is TRUE}}) = \alpha \]
The probability of making a Type II error is
\[ \Pr({\text{Fail to Reject $H_0$}\mid H_0 \text{ is FALSE}}) = \beta \]
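A short Monte Carlo sketch can make \(\alpha\) and \(\beta\) concrete. It assumes a normal population with known \(\sigma\) and a lower-tailed \(z\)-test that rejects \(H_0\) when \(z_{obs} < z_{crit}\) (the critical value approach described later in these notes). The fraction of simulated samples that lead to rejection estimates \(\alpha\) when the true mean equals \(\mu_0\), and estimates \(1 - \beta\) when it does not. The numbers (\(\mu_0 = 20\), \(\sigma = 0.2\), \(n = 30\), and a true mean of 19.9 for the "\(H_0\) false" case) are illustrative choices that anticipate the ketchup example below, not data; the function name `rejection_rate` is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def rejection_rate(true_mu, mu0=20.0, sigma=0.2, n=30, alpha=0.05,
                   reps=10_000, seed=1):
    """Fraction of simulated samples for which a lower-tailed z-test rejects H0.

    Illustrative assumptions: normal population, known sigma, H_A: mu < mu0.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(alpha)  # lower-tail cut-off: P(Z < z_crit) = alpha
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        z_obs = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if z_obs < z_crit:
            rejections += 1
    return rejections / reps


alpha_hat = rejection_rate(true_mu=20.0)  # H0 true: rejection rate estimates alpha
power_hat = rejection_rate(true_mu=19.9)  # H0 false: rejection rate estimates 1 - beta
print(f"estimated alpha: {alpha_hat:.3f}")
print(f"estimated beta:  {1 - power_hat:.3f}")
```

The first estimate should land close to the nominal 0.05. The second depends on how far the assumed true mean is from \(\mu_0\): moving it closer to 20 makes \(\beta\) larger.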
Heinz claims that its ketchup bottles contain 20 oz. Of course, there is some inherent variability at the manufacturing plant, so not all bottles will weigh exactly 20 oz.
We’ll assume that the amount of ketchup dispensed into Heinz tomato ketchup bottles is normally distributed and that the standard deviation is known to be \(\sigma = 0.2\) oz.
Customers suspect that Heinz is actually under-filling its bottles, so they collect data from 30 bottles to test this hypothesis …
We formulate the claim as a null hypothesis, which represents the status quo, or the commonly accepted fact.
\[\begin{equation*} H_0: \mu = 20\text{ oz} \end{equation*}\]
More generally, we write:
\[\begin{equation} H_0: \mu = \mu_0 \end{equation}\]
where \(\mu_0\) is some number (in this case 20). We refer to \(\mu_0\) as the null value or hypothesized value.
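Even if this null hypothesis is true, we don’t expect \(\bar X\) to be exactly 20. In fact, we know exactly what its sampling distribution should look like: since the population is normal with known \(\sigma\), under \(H_0\) we have \(\bar X \sim N\left(\mu_0, \sigma^2/n\right) = N\left(20, 0.2^2/30\right)\).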
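The sampling distribution of \(\bar X\) under \(H_0\) can be checked numerically. This sketch uses the values from the ketchup example (\(\mu_0 = 20\), \(\sigma = 0.2\), \(n = 30\)) and Python's standard-library `statistics.NormalDist`. It compares the theoretical standard error \(\sigma/\sqrt{n}\) with the spread of sample means from 5,000 simulated samples drawn assuming \(H_0\) is true. The seed and number of replications are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, stdev

mu0, sigma, n = 20.0, 0.2, 30  # null value, known sd, sample size (ketchup example)

# Theoretical sampling distribution of the sample mean under H0
se = sigma / sqrt(n)
xbar_dist = NormalDist(mu=mu0, sigma=se)
print(f"standard error: {se:.4f}")
print(f"central 95% of x-bar under H0: "
      f"{xbar_dist.inv_cdf(0.025):.3f} to {xbar_dist.inv_cdf(0.975):.3f}")

# Monte Carlo check: draw many samples of 30 bottles assuming H0 is true
rng = random.Random(205)
xbars = [fmean([rng.gauss(mu0, sigma) for _ in range(n)]) for _ in range(5_000)]
print(f"simulated mean of x-bar: {fmean(xbars):.4f}")
print(f"simulated sd of x-bar:   {stdev(xbars):.4f}")
```

Under \(H_0\), roughly 95% of sample means of 30 bottles fall between about 19.93 and 20.07 oz, so small departures from 20 are expected even when Heinz is telling the truth.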
Exercise 1: Alternative Hypothesis
Which of the following correctly states the null and alternative hypotheses for this Heinz tomato ketchup example?
A. \(H_0: \bar x = 20 \text{ vs. } H_A: \bar x < 20\)
B. \(H_0: \mu = 20 \text{ vs. } H_A: \mu~ {\leq} ~ 20\)
C. \(H_0: \mu = 20 \text{ vs. } H_A: \mu < 20\)
D. \(H_0: \mu < 20 \text{ vs. } H_A: \mu = 20\)
E. \(H_0: \mu = 20 \text{ vs. } H_A: \mu \neq 20\)
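Hypotheses can also be expressed in words; notice that they always involve population parameter(s), not sample statistics.
Null hypothesis \(H_0 : \mu =20 \text{ oz}\) or, in words, the average amount of ketchup in a Heinz bottle is 20 oz.
Alternative hypothesis \(H_A: \mu < 20 \text{ oz}\) or, in words, the average amount of ketchup in a Heinz bottle is less than 20 oz.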
To test our hypotheses, we need to compare them against our observed data.
Suppose we found that the average weight in that sample of size 30 was 19.88 oz.
Thus the customers have gathered some evidence that Heinz bottles contain less than 20 oz of ketchup. But what if \(\bar x\) were 19.99 oz, 19.98 oz, or 19.97 oz?
🤔 So how small does \(\bar x\) need to get before we stop believing \(H_0\) in favour of \(H_A\)?
To put it another way, could the \(\bar x\) obtained from a sample of size \(n\) plausibly have arisen by chance alone if \(H_0\) were true?
If our observed \(\bar x\) is very unlikely, we call this finding statistically significant.
Since \(\bar X\) is a continuous random variable, the probability of observing \(\bar X\) exactly equal to any particular value \(\bar x\) is zero, so instead we compute the probability of obtaining results at least as extreme as our observed \(\bar x\).
Since we know the sampling distribution of \(\bar X\) we can easily find these probabilities.
Assuming that Heinz is not lying, what is the probability that we observe an \(\bar x\) of 19.88 oz or less in a sample of 30 bottles?
\[ \begin{align} \Pr(\bar X \leq 19.88) &= \Pr\left(\dfrac{\bar X - \mu_0}{\sigma/\sqrt{n}} \leq \dfrac{19.88 - \mu_0}{\sigma/\sqrt{n}}\right) \\ &= \Pr\left(Z \leq \dfrac{19.88 - 20}{0.2/\sqrt{30}}\right) \\ &= \Pr\left(Z \leq -3.2863353\right) \\ &= 0.0005075005 \text{ (from R)}\\ & < 0.001 \text{ (from Z-table) } \end{align} \]
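The same calculation can be reproduced outside R. This minimal sketch uses Python's standard-library `statistics.NormalDist` (not the tool used in the notes) to standardize \(\bar x = 19.88\) under \(H_0\) and evaluate the lower-tail probability \(\Pr(Z \leq z_{obs})\).

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 20.0, 0.2, 30  # null value, known sd, sample size
xbar = 19.88                   # observed sample mean

z_obs = (xbar - mu0) / (sigma / sqrt(n))  # standardize under H0
p_lower = NormalDist().cdf(z_obs)         # P(Z <= z_obs): lower-tail probability
print(f"z_obs = {z_obs:.4f}, P(Xbar <= {xbar}) = {p_lower:.7f}")
```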
Note that this probability was computed “under the null hypothesis”, i.e. assuming \(H_0\) was true.
The lower that probability, the more evidence that Heinz might be lying.
The higher that probability, the less evidence that Heinz might be lying.
The significance level, or “alpha-level”, denoted by \(\alpha\) (most commonly \(\alpha = 0.05\)), sets how strong the evidence must be before we stop believing the null. It is also the probability of a Type I error.
To put this another way, it provides a cut-off value that splits the possible values of our statistic into “acceptance” and “rejection” regions.
If we end up with an \(\bar x\) that falls in the rejection region, then it is safe(-ish) to reject \(H_0\).
If we observe an \(\bar x\) that falls into the “acceptance” region, then we fail to reject \(H_0\).
Now, rather than defining a cut-off value for \(\bar x\), we define a critical value on the standardized \(z\) scale, i.e. the most extreme value of the test statistic \(z_{obs}\) we will tolerate before we reject the null.
The critical value, denoted \(z_{crit}\) or \(z_{\alpha}\), is the cut-off point on the standard normal curve that separates the rejection region from the acceptance region.
For our Heinz example (a lower-tailed test) with \(\alpha = 0.05\), \(z_{crit}\) is the value that satisfies the following probability statement:
\[ P(Z < z_{crit}) = \alpha = 0.05 \]
Exercise 2: Critical Value
Using the \(Z\)-table or R, find the critical value defined by \(P(Z < z_{crit}) = 0.05\).
Critical Value Approach
Critical Value approach for a single population mean
State hypotheses \[\begin{equation} H_0 : \mu = \mu_0 \quad \text{ vs } \quad H_A: \begin{cases} \mu \neq \mu_0& \text{ two-sided test} \\ \mu < \mu_0&\text{ one-sided (lower-tail) test} \\ \mu > \mu_0&\text{ one-sided (upper-tail) test} \end{cases} \end{equation}\]
Find critical value:
\[\begin{cases} P(-z_{crit} < Z < z_{crit}) = 1 - \alpha &\text{ if } H_A: \mu \neq \mu_0 \\ P(Z < z_{crit}) = \alpha &\text{ if } H_A: \mu < \mu_0 \\ P(Z > z_{crit}) = \alpha &\text{ if } H_A: \mu > \mu_0 \end{cases}\]
Compute the test statistic \(z_{obs} = \dfrac{\bar x - \mu_0}{\sigma/\sqrt{n}}.\)
Conclusion: reject \(H_0\) if \(z_{obs}\) falls in the rejection region, i.e. \(|z_{obs}| > z_{crit}\) (two-sided), \(z_{obs} < z_{crit}\) (lower-tail), or \(z_{obs} > z_{crit}\) (upper-tail); otherwise, fail to reject \(H_0\).
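The steps above can be sketched as one small function. This is an illustration under the lecture's assumptions (normal population, known \(\sigma\)), not a library routine: the name `z_test_critical_value` and the labels `"two-sided"`, `"less"` and `"greater"` are hypothetical choices. Critical values come from the inverse standard normal CDF in Python's `statistics.NormalDist`, matching the three probability statements for \(z_{crit}\).

```python
from math import sqrt
from statistics import NormalDist


def z_test_critical_value(xbar, mu0, sigma, n, alpha=0.05, alternative="less"):
    """Critical value approach for H0: mu = mu0 with known sigma.

    alternative: "two-sided", "less" (lower tail) or "greater" (upper tail).
    Returns (z_obs, z_crit, reject).
    """
    std_normal = NormalDist()
    z_obs = (xbar - mu0) / (sigma / sqrt(n))  # test statistic
    if alternative == "two-sided":
        z_crit = std_normal.inv_cdf(1 - alpha / 2)  # P(-z_crit < Z < z_crit) = 1 - alpha
        reject = abs(z_obs) > z_crit
    elif alternative == "less":
        z_crit = std_normal.inv_cdf(alpha)          # P(Z < z_crit) = alpha (negative)
        reject = z_obs < z_crit
    elif alternative == "greater":
        z_crit = std_normal.inv_cdf(1 - alpha)      # P(Z > z_crit) = alpha
        reject = z_obs > z_crit
    else:
        raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")
    return z_obs, z_crit, reject


# Heinz example: lower-tailed test at alpha = 0.05
z_obs, z_crit, reject = z_test_critical_value(19.88, mu0=20, sigma=0.2, n=30)
print(f"z_obs = {z_obs:.3f}, z_crit = {z_crit:.3f}, reject H0: {reject}")
```

Keeping the lower-tail \(z_{crit}\) negative (rather than using \(-z_{crit}\)) mirrors the probability statement \(P(Z < z_{crit}) = \alpha\) used in the notes, so each branch compares \(z_{obs}\) directly against the cut-off it defines.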
Warning
Rejecting the null hypothesis is not the same as proving that \(H_A\) is true.
Generic conclusion when \(H_0\) is rejected:
Since the observed test statistic falls in the rejection region, we reject the null hypothesis in favour of the alternative.
Specific conclusion for the Heinz example (\(\bar x = 19.88\), \(\alpha = 0.05\)):
At a 5% significance level, there is sufficient evidence to suggest that Heinz is under-filling its ketchup bottles, i.e. that the average amount of ketchup in its bottles is less than 20 oz.
Warning
Failing to reject the null hypothesis is not the same as proving that \(H_0\) is true.
Generic conclusion when we fail to reject \(H_0\):
Since the observed test statistic falls in the “acceptance” (fail to reject) region, we fail to reject the null hypothesis.
Specific conclusion for the Heinz example if \(z_{obs}\) had instead fallen in the acceptance region (\(\alpha = 0.05\)):
At a 5% significance level, there is insufficient evidence to suggest that Heinz is filling its ketchup bottles below the advertised 20 oz.
What we just performed was a lower-tailed hypothesis test.
In general, we may choose any one of these three alternative hypotheses:
\[ H_A: \begin{cases} \mu \neq \mu_0& \text{ two-sided test}\\ \mu < \mu_0&\text{ one-sided (lower-tail) test} \\ \mu > \mu_0&\text{ one-sided (upper-tail) test} \end{cases} \]
The direction of our alternative hypothesis will depend on the situation at hand.
Consider our Heinz ketchup example:
consumers of Heinz ketchup may be more concerned with the lower-tailed test (i.e. are we getting ripped off and getting less than we paid for?)
Heinz corporation, on the other hand, would be more concerned with the upper-tailed test (i.e. is the company supplying more ketchup than it needs to?)
As a general rule of thumb, choose the two-sided alternative unless you have good reason to only be interested in one particular side.
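One way to see why the two-sided alternative is the more cautious default: at the same \(\alpha\), a two-sided test splits \(\alpha\) between both tails, so the cut-off in each tail is more extreme than the one-sided cut-off. In this sketch, \(z_{obs} = -1.8\) is a hypothetical test statistic chosen only for illustration (it is not from the Heinz data); it is rejected by the lower-tailed test but not by the two-sided test.

```python
from statistics import NormalDist

alpha = 0.05
z_obs = -1.8  # hypothetical test statistic, for illustration only

lower_crit = NormalDist().inv_cdf(alpha)              # one-sided cut-off, about -1.645
two_sided_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided cut-off, about 1.960

print("lower-tailed test rejects:", z_obs < lower_crit)
print("two-sided test rejects:   ", abs(z_obs) > two_sided_crit)
```

Picking a one-sided alternative only after seeing which direction the data point therefore makes rejection easier than the stated \(\alpha\) suggests, which is why the direction should be justified by the situation before looking at the data.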