Lecture 11: Hypothesis Testing one-sample mean

STAT 205: Introduction to Mathematical Statistics

Dr. Irene Vrbik

University of British Columbia Okanagan

March 15, 2024

Motivation

  • Estimating a parameter from sample data can involve:

    • A single number (a point estimate).

    • An interval of plausible values (a confidence interval).

  • Often, the goal of an investigation isn’t just parameter estimation, but deciding between two contradictory claims about the parameter.

  • This falls under statistical inference, specifically hypothesis testing.

Outline

In this lecture we will be covering

What is a hypothesis test?

  • A statistical hypothesis, or just hypothesis, is a claim or assertion either about the value of a single parameter (i.e., a characteristic of a population or a probability distribution), about the values of several parameters, or about the form of an entire probability distribution.

  • Today we’ll be focusing on hypothesis testing for single parameters

Null/Alternative Hypotheses

In any hypothesis-testing problem, there are two contradictory hypotheses under consideration.

Definition 1: Null and Alternative Hypothesis

The null hypothesis (\(H_0\)) often represents a skeptical perspective or a claim to be tested. The alternative hypothesis (\(H_A\), also written \(H_a\) or \(H_1\)) represents an alternative claim under consideration and is often represented by a range of possible parameter values.

Objective: based on sample information, decide which of the two hypotheses is better supported by the data.

Example 1: GPA of American Colleges

We want to test whether the mean GPA of students in American colleges is different from 2.0 (out of 4.0).

Example 2: Voting turnout

We want to test whether the proportion of registered voters in Santa Clara County who voted in the primary election is more than 30%.

Example 3: Average college duration

We want to test if college students take less than five years to graduate from college, on the average.

Setup

  • In statistics, hypothesis-testing problems are formulated so that the null hypothesis is initially assumed to be true.

  • This initial claim will not be rejected in favor of the alternative claim unless the sample provides strong evidence for the latter.

  • As a mathematical convention, \(H_0\) will always be written with an equal sign.

  • The choice of comparison symbol for \(H_A\) will depend on the wording of the hypothesis test.

Example 1: GPA of American Colleges

\[ \begin{align} H_0: \mu &= 2.0 & H_A: \mu &\neq 2.0 \end{align} \]

Example 2: Voting turnout

\[ \begin{align} H_0: p &= 0.30 & H_A: p &> 0.30 \end{align} \]

Example 3: Average college duration

\[ \begin{align} H_0: \mu &= 5 & H_A: \mu &< 5 \end{align} \]

Terminology

| Null Hypothesis | Alternative Hypothesis | Description |
|---|---|---|
| \(H_0: \mu = 2\) | \(H_A: \mu \neq 2\) | Two-tailed test for population mean |
| \(H_0: p = 0.30\) | \(H_A: p > 0.30\) | Upper-tailed test for population proportion |
| \(H_0: \mu = 5\) | \(H_A: \mu < 5\) | Lower-tailed test for population mean |

Evidence

  • The null hypothesis will be rejected in favour of the alternative hypothesis only if the sample provides strong evidence against \(H_0\).
  • If the sample does not strongly contradict \(H_0\), we will continue to accept the status quo formalized in the null hypothesis.
  • The two possible conclusions from a hypothesis-testing analysis are then reject \(H_0\) or fail to reject \(H_0\).

Court System

The hallmarks of hypothesis testing are akin to the US court system. We start off with the belief that the defendant is not guilty and seek strong evidence to suggest guilt, i.e.

\[ \begin{align} H_0: &\text{ not guilty} & H_A: &\text{ guilty} \end{align} \]

Note: Even if the jurors leave unconvinced of guilt beyond a reasonable doubt, this does not mean they believe the defendant is innocent.

Type of errors

There are two important errors that can be made in this court room analogy:

  1. We could convict an innocent person (reject the null hypothesis even though it is true), or
  2. We could let a guilty person go free (fail to reject the null hypothesis even though it is false).

We refer to these as Type I and Type II errors respectively.

Error probabilities

Types of decisions one can make in a formal hypothesis test.

| Decision | Null Hypothesis is True | Null Hypothesis is False |
|---|---|---|
| Fail to reject \(H_0\) | Correct | Type II error |
| Reject \(H_0\) | Type I error | Correct |

The probability of making a Type I error is

\[ \Pr({\text{Reject $H_0$}\mid H_0 \text{ is TRUE}}) = \alpha \]

The probability of making a Type II error is

\[ \Pr({\text{Fail to Reject $H_0$}\mid H_0 \text{ is FALSE}}) = \beta \]

Ketchup Example

  • Heinz claims that their ketchup bottles contain 20 oz. Of course, there is some inherent error at the manufacturing plant, so not all bottles will weigh exactly 20 oz.

  • We’ll assume that the amount of ketchup dispensed into Heinz tomato ketchup bottles is normally distributed and the standard deviation is known to be \(\sigma = 0.2\) oz.

  • Customers are suspicious that Heinz is actually under-filling their bottles so they collect some data from 30 bottles to test this hypothesis …

Null Hypothesis

We formulate the claim in a null hypothesis, which represents the status quo or the commonly accepted fact.

\[\begin{equation*} H_0: \mu = 20\text{oz} \end{equation*}\]

More generally we write:

\[\begin{equation} H_0: \mu = \mu_0 \end{equation}\]

where \(\mu_0\) is some number (in this case 20). We refer to \(\mu_0\) as the null value or hypothesized value.

Sampling Distribution

Even if this null hypothesis is true, we don’t expect \(\bar X\) to be exactly 20. In fact, we know exactly what its distribution should look like …

Under the null hypothesis, the sampling distribution of the sample mean is Normal\((20, \sigma/\sqrt{n} = 0.2/\sqrt{30} \approx 0.0365)\), where the second argument is the standard deviation of \(\bar X\).
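The standard deviation of \(\bar X\) (its standard error) is \(\sigma/\sqrt{n}\). As a quick check in R, using \(\sigma = 0.2\) and \(n = 30\) from the ketchup example (the variable names are just illustrative):

```r
sigma <- 0.2   # known population standard deviation (oz)
n     <- 30    # number of bottles sampled

# Standard error of the sample mean under H0
se <- sigma / sqrt(n)
se             # roughly 0.0365
```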

Question: Alternative Hypothesis

Exercise 1: Alternative Hypothesis

Which of the following correctly states the null and alternative hypotheses for this Heinz tomato ketchup example?

A. \(H_0: \bar x = 20 \text{ vs. } H_A: \bar x < 20\)

B. \(H_0: \mu = 20 \text{ vs. } H_A: \mu~ {\leq} ~ 20\)

C. \(H_0: \mu = 20 \text{ vs. } H_A: \mu < 20\)

D. \(H_0: \mu < 20 \text{ vs. } H_A: \mu = 20\)

E. \(H_0: \mu = 20 \text{ vs. } H_A: \mu \neq 20\)

Alternative Hypothesis

  • In addition to a null hypothesis, we need to also formulate an alternative hypothesis denoted \(H_A\) (or \(H_1\) in some texts)
  • This is based on our research question of interest and is often the hypothesis the researcher is hoping to “prove”.
  • In our example, we are trying to investigate whether Heinz is under-filling their bottles, i.e. \[ H_A: \mu < 20 \text{ oz} \]
  • To “prove” this we will need to see strong supporting evidence in the form of data.

Hypothesis in words

Hypotheses can be expressed in words. Notice that they will always involve population parameter(s), not sample statistics.

Null hypothesis \(H_0 : \mu =20 \text{ oz}\) or

  • \(H_0\): the true (or “long-run”) average amount of ketchup in Heinz tomato ketchup bottles is 20 ounces.

Alternative hypothesis \(H_A: \mu < 20 \text{ oz}\) or

  • \(H_A:\) the long-run average amount of ketchup in Heinz tomato ketchup bottles is less than 20 ounces.

Gathering Evidence

  • To test our hypotheses, we need to compare them against our observed data.

  • Suppose we found that the average weight in that sample of size 30 was 19.88 oz.

  • Thus the customers have gathered some evidence that Heinz bottles contain less than 20 oz of ketchup. But what if \(\bar x\) were 19.99 oz, 19.98 oz, or 19.97 oz?

🤔 So how small does \(\bar x\) need to get before we stop believing \(H_0\) and side with \(H_A\)?

Making Decisions based on data

  • To put another way, is the \(\bar x\) obtained from a sample of size \(n\) likely to have arisen by chance?

  • If our observed \(\bar x\) is very unlikely under \(H_0\), we call this finding statistically significant.

  • Since \(\bar X\) is continuous, the probability of observing \(\bar X\) exactly equal to any particular \(\bar x\) is zero, so we instead compute the probability of obtaining results at least as extreme as our observed \(\bar x\).

  • Since we know the sampling distribution of \(\bar X\) we can easily find these probabilities.

Evidence

Assuming that Heinz is not lying, what is the probability that we observe an \(\bar x\) of 19.88 oz or less in a sample of 30 bottles?

\[ \begin{align} \Pr(\bar X \leq 19.88) &= \Pr\left(Z \leq \dfrac{19.88 - \mu_0}{\sigma/\sqrt{n}}\right) \\ &= \Pr\left(Z \leq \dfrac{19.88 - 20}{0.2/\sqrt{30}}\right) \\ &= \Pr\left(Z \leq -3.2863353\right) \\ &= 0.0005075005 \text{ (from R)}\\ & < 0.001 \text{ (from Z-table) } \end{align} \]
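The "from R" value can be reproduced with `pnorm()`. This is a minimal sketch using the numbers from the ketchup example; the variable names are illustrative.

```r
mu0   <- 20      # null value (oz)
sigma <- 0.2     # known population standard deviation (oz)
n     <- 30      # sample size
xbar  <- 19.88   # observed sample mean (oz)

se      <- sigma / sqrt(n)      # standard error of the sample mean
z_obs   <- (xbar - mu0) / se    # standardized sample mean
p_lower <- pnorm(z_obs)         # Pr(Z <= z_obs), computed assuming H0 is true

z_obs
p_lower

# Equivalent: work on the original scale of the sample mean
pnorm(xbar, mean = mu0, sd = se)
```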

Comment

  • Note that this probability was computed “under the null hypothesis”, i.e. assuming \(H_0\) was true.

  • The lower that probability, the more evidence that they might be lying.

  • The higher that probability, the less evidence that they might be lying.

High Evidence

Here, the probability of observing a \(\bar x\) as low as 19.88 is very low (< 0.001). Hence, this sample is very unlikely to have happened by chance assuming \(\mu\) is indeed 20 oz. So we have strong evidence to suggest Heinz is lying.

Low Evidence

If instead \(\bar x = 19.99\) then \(P(\bar X \leq 19.99) = 0.3920956\). Since this sample is much more likely under the assumption \(\mu\)=20 oz we would have insufficient evidence to suggest that Heinz is lying.
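To see how the strength of evidence changes with the observed sample mean, the same lower-tail probability can be computed for several candidate values of \(\bar x\) at once. This sketch assumes the ketchup setup (\(\mu_0 = 20\), \(\sigma = 0.2\), \(n = 30\)); the vector of \(\bar x\) values is just the set mentioned earlier.

```r
xbar <- c(19.88, 19.97, 19.98, 19.99)   # candidate sample means (oz)
se   <- 0.2 / sqrt(30)                  # standard error under H0

# Pr(Xbar <= xbar) for each candidate, assuming mu = 20
p <- pnorm(xbar, mean = 20, sd = se)
data.frame(xbar = xbar, prob = round(p, 4))
```

The closer \(\bar x\) is to 20, the larger the probability and the weaker the evidence against \(H_0\).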

Significance Level

  • The significance level, or “alpha-level” denoted by \(\alpha\) (most commonly \(\alpha = 0.05\)), provides a measure for the strength of the evidence needed before we stop believing the null.

  • To put this another way, it provides a cut-off value that splits the possible values of \(\bar x\) into “acceptance” and “rejection” regions.

  • If we end up with an \(\bar x\) that falls in the rejection region, then it is safe(-ish) to reject \(H_0\).

  • If we observe an \(\bar x\) that falls into the “acceptance” region, then we fail to reject \(H_0\).

Sample mean cutoff

At \(\alpha = 0.05\), any sample with an \(\bar x\) of about 19.94 oz or less will be statistically significant and supply sufficient evidence to reject \(H_0\).
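The 19.94 cutoff is the 5th percentile of the sampling distribution of \(\bar X\) under \(H_0\). A short sketch with `qnorm()`, assuming the ketchup values and \(\alpha = 0.05\):

```r
alpha <- 0.05
se    <- 0.2 / sqrt(30)   # standard error under H0

# Value of xbar with probability alpha below it when mu = 20
xbar_cut <- qnorm(alpha, mean = 20, sd = se)
round(xbar_cut, 2)        # approximately 19.94
```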

Test statistic

  • While it is easy to understand the cutoff value in terms of \(\bar x\), in practice we will be defining this cut-off in terms of the standardized \(z\)-score.
  • That is, rather than working with probabilities on \(\bar X\) we consider the following test statistic
\[\begin{equation}\label{eq:teststat} Z_{obs} = \dfrac{\bar X - \mu_0}{\sigma_{\bar X}} = \dfrac{\bar X - \mu_0}{\sigma/{\sqrt{n}}} \sim N(0,1) \end{equation}\]
  • We denote the observed test statistic by \(z_{obs}\)
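As an illustration, the observed test statistic can be computed with a small helper. The function name `z_stat` is hypothetical (not part of base R), and the inputs are the two sample means from the ketchup example.

```r
# Hypothetical helper: standardized distance of xbar from the null value
z_stat <- function(xbar, mu0, sigma, n) {
  (xbar - mu0) / (sigma / sqrt(n))
}

z1 <- z_stat(19.88, mu0 = 20, sigma = 0.2, n = 30)   # about -3.29
z2 <- z_stat(19.99, mu0 = 20, sigma = 0.2, n = 30)   # about -0.27
c(z1, z2)
```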

Critical Value

  • Now rather than defining a cut-off value for \(\bar x\), we define a critical value, i.e. the smallest \(z_{obs}\) value we will tolerate before we reject the null.

  • The critical value, denoted \(z_{crit}\) or \(z_{\alpha}\), is the cut-off point on the standard normal curve which separates the rejection region from the “acceptance” region.

  • For our Heinz example, with \(\alpha = 0.05\), \(z_{crit}\) is the value that satisfies the following probability statement:

\[\begin{align*} P(Z < z_{crit}) &= \alpha = 0.05 \end{align*}\]

Question: Finding z-crit

Exercise 2: Finding z-crit

Using the \(Z\)-table or R, find the critical value \(z_{crit}\) defined by \(P(Z < z_{crit}) = 0.05\).

  1. -1.645

  2. -1.96

  3. 1.645

  4. 1.96

  5. none of the above

qnorm(0.05)
[1] -1.644854

Critical z-value for a lower-tailed hypothesis test with a significance level of \(\alpha = 0.05\)

Notice how the observed test statistic from our first example (\(\bar x\) = 19.88) falls in the so-called rejection region.

Notice how the observed test statistic from our second example (\(\bar x\) = 19.99) falls in the so-called “acceptance” (or fail to reject) region.
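A small simulation can make the role of \(\alpha\) concrete: if \(H_0\) is true and we reject whenever \(z_{obs} < z_{crit}\), we should wrongly reject in roughly 5% of samples. The sketch below assumes the ketchup setup; the seed and the number of simulated samples are arbitrary choices.

```r
set.seed(205)    # arbitrary seed, for reproducibility only
mu0    <- 20     # null value; data are generated with H0 TRUE
sigma  <- 0.2
n      <- 30
alpha  <- 0.05
n_sim  <- 10000  # number of simulated samples (arbitrary)
z_crit <- qnorm(alpha)

# For each simulated sample, record whether z_obs lands in the rejection region
reject <- replicate(n_sim, {
  x     <- rnorm(n, mean = mu0, sd = sigma)
  z_obs <- (mean(x) - mu0) / (sigma / sqrt(n))
  z_obs < z_crit
})

type1_rate <- mean(reject)   # proportion of false rejections
type1_rate                   # should be close to alpha
```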

Critical Value Approach

  1. Choose significance level \(\alpha\) (usually taken to be 0.05)
  2. Formulate your null and alternative hypothesis
  3. Find the critical value that defines our rejection region
  4. Calculate the observed test statistic \(z_{obs}\)
  5. Make a decision:
    • If \(z_{obs}\) falls in the rejection region we reject \(H_0\)
    • If \(z_{obs}\) falls outside the rejection region we fail to reject \(H_0\)
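The five steps above can be sketched in R for the lower-tailed ketchup test with \(\bar x = 19.88\). The variable names are illustrative.

```r
# Step 1: choose the significance level
alpha <- 0.05

# Step 2: hypotheses H0: mu = 20 vs HA: mu < 20
mu0 <- 20

# Step 3: critical value for a lower-tailed test
z_crit <- qnorm(alpha)

# Step 4: observed test statistic
xbar  <- 19.88
sigma <- 0.2
n     <- 30
z_obs <- (xbar - mu0) / (sigma / sqrt(n))

# Step 5: decision
decision <- if (z_obs < z_crit) "reject H0" else "fail to reject H0"
decision
```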

Critical Value approach for a single population mean

  1. State hypotheses \[\begin{equation} H_0 : \mu = \mu_0 \quad \text{ vs } \quad H_A: \begin{cases} \mu \neq \mu_0& \text{ two-sided test} \\ \mu < \mu_0&\text{ one-sided (lower-tail) test} \\ \mu > \mu_0&\text{ one-sided (upper-tail) test} \end{cases} \end{equation}\]

  2. Find critical value:

    \[\begin{cases} P(-z_{crit} < Z < z_{crit}) = 1 - \alpha &\text{ if } H_A: \mu \neq \mu_0 \\ P(Z < z_{crit}) = \alpha &\text{ if } H_A: \mu < \mu_0 \\ P(Z > z_{crit}) = \alpha &\text{ if } H_A: \mu > \mu_0 \end{cases}\]
  3. Compute the test statistic \(z_{obs} = \dfrac{\bar x - \mu_0}{\sigma/\sqrt{n}}.\)

  4. Conclusion: reject \(H_0\) if \(z_{obs} \in\) rejection region, otherwise, fail to reject \(H_0\).
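One way to package this procedure for all three alternatives is a small function. This is only a sketch: the name `z_test_crit` and its argument names are hypothetical, and it assumes \(\sigma\) is known.

```r
# Hypothetical helper: critical value approach for a one-sample z-test
z_test_crit <- function(xbar, mu0, sigma, n, alpha = 0.05,
                        alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  z_obs <- (xbar - mu0) / (sigma / sqrt(n))
  # Rejection rule depends on the direction of HA
  reject <- switch(alternative,
    two.sided = abs(z_obs) > qnorm(1 - alpha / 2),
    less      = z_obs < qnorm(alpha),
    greater   = z_obs > qnorm(1 - alpha)
  )
  list(z_obs = z_obs, alternative = alternative, reject = reject)
}

# Ketchup example, lower-tailed test
res1 <- z_test_crit(19.88, mu0 = 20, sigma = 0.2, n = 30, alternative = "less")
res2 <- z_test_crit(19.99, mu0 = 20, sigma = 0.2, n = 30, alternative = "less")
c(res1$reject, res2$reject)
```

With \(\bar x = 19.88\) the function rejects \(H_0\); with \(\bar x = 19.99\) it does not.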

Critical Regions

Rejecting the null

Warning

Rejecting the null hypothesis is not the same as proving \(H_A\) is true.

Generic conclusion when \(H_0\) is rejected:

Since the observed test statistic falls in the rejection region, we reject the null hypothesis in favour of the alternative.

Specific conclusion for Example 1 (\(\bar x = 19.88\), at \(\alpha = 0.05\)):

At a 5% significance level, there is sufficient evidence to suggest that Heinz is under-filling their ketchup bottles and that the average amount of ketchup in their bottles is less than 20 oz.

Failing to reject the null

Warning

Failing to reject the null hypothesis is not the same as proving \(H_0\) is true.

Generic conclusion when we fail to reject \(H_0\):

Since the observed test statistic falls in the “acceptance” (fail to reject) region, we fail to reject the null hypothesis.

Specific conclusion for Example 2 (\(\bar x = 19.99\), at \(\alpha = 0.05\)):

At a 5% significance level, there is insufficient evidence to suggest that Heinz is filling their ketchup bottles below the advertised 20 oz.

Direction of Alternative Hypothesis

  • What we just performed was a lower-tailed hypothesis test.

  • Depending on the situation, we may choose any one of these three alternative hypotheses:

\[ H_A: \begin{cases} \mu \neq \mu_0& \text{ two-sided test}\\ \mu < \mu_0&\text{ one-sided (lower-tail) test} \\ \mu > \mu_0&\text{ one-sided (upper-tail) test} \end{cases} \]

The direction of our alternative hypothesis will depend on the situation at hand.
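The direction of \(H_A\) changes which tail(s) of the standard normal curve form the rejection region. As a sketch, the critical values for each case at \(\alpha = 0.05\) can be found with `qnorm()` (the vector names are illustrative):

```r
alpha <- 0.05

crit <- c(
  two_sided = qnorm(1 - alpha / 2),  # reject if |z_obs| exceeds this value
  lower     = qnorm(alpha),          # reject if z_obs is below this value
  upper     = qnorm(1 - alpha)       # reject if z_obs is above this value
)
round(crit, 3)
```

Note that the two-sided test splits \(\alpha\) equally between the two tails, which is why its critical value is further from zero.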

Alternative Alternatives

Consider our Heinz ketchup example:

  • Consumers of Heinz ketchup may be concerned with the lower-tailed test (i.e. are we getting ripped off and getting less than we paid for?)

  • Heinz corporation, on the other hand, would be more concerned with the upper-tailed test (i.e. is the company supplying more than they need to?)

As a general rule of thumb, choose the two-sided alternative unless you have good reason to only be interested in one particular side.

References

Devore, J. L., K. N. Berk, and M. A. Carlton. 2021. Modern Mathematical Statistics with Applications. Springer Texts in Statistics. Springer International Publishing. https://books.google.ca/books?id=ghcsEAAAQBAJ.