Lecture 13: One-sample \(t\)-test

STAT 205: Introduction to Mathematical Statistics

Dr. Irene Vrbik

University of British Columbia Okanagan

March 15, 2024

Introduction

So far we have covered two hypothesis tests for a single sample:

  1. Hypothesis test for the mean \(\mu\)
  2. Hypothesis test for the proportion \(p\)

The test statistic associated with each of these tests follows the standard normal distribution. For this reason, they are referred to as \(Z\)-tests.

Z-tests

Test statistic for \(\mu\) with \(\sigma\) known

\(H_0: \mu = \mu_0 \quad \text{vs} \quad H_A: \mu \neq \mu_0\)

\[ \dfrac{\bar{X} - \mu_0}{\dfrac{\sigma}{\sqrt{n}}} \sim N(0,1) \]

Test statistic for \(p\)

\(H_0: p = p_0 \quad \text{vs} \quad H_A: p \neq p_0\)

\[ \dfrac{\hat p - p_0}{\sqrt{ \dfrac{ p_0 ( 1 - p_0 )}{n} }} \sim N(0,1) \]

\[ \begin{equation} \text{test statistic} = \dfrac{\text{point estimator} - \text{null value}}{SE(\text{point estimator})} \end{equation} \]
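This generic recipe can be sketched in R. The summary statistics below are hypothetical, chosen only to illustrate the calculation:

```r
# Hypothetical summary statistics for a mean test with sigma known
xbar <- 51.2; mu0 <- 50; sigma <- 4; n <- 36
z_mean <- (xbar - mu0) / (sigma / sqrt(n))      # (point estimate - null value) / SE

# Hypothetical summary statistics for a proportion test
phat <- 0.56; p0 <- 0.5; m <- 200
z_prop <- (phat - p0) / sqrt(p0 * (1 - p0) / m) # SE uses the null value p0

c(z_mean, z_prop)
```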

Sampling Distribution of \(\bar{X}\) (CLT)

For a sufficiently large i.i.d. sample (usually \(n > 30\)), the sampling distribution of \(\bar{X}\) is Normal with mean \(\mu_{\bar{X}} = \mu\) (equal to the mean of the population), and standard error \(\sigma_{{\bar{X}}} = \sigma/\sqrt{n}\) (the standard deviation of the population divided by the square root of the sample size \(n\)). Note: the only assumptions about the population are that its mean and variance exist.
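As a quick sanity check of the CLT, here is a small simulation sketch: we repeatedly draw samples from a skewed Exp(1) population (mean 1, standard deviation 1) and look at the distribution of \(\bar{X}\):

```r
set.seed(1)
n <- 50                                        # sample size (> 30)
# 5000 sample means, each computed from n i.i.d. Exp(1) draws
means <- replicate(5000, mean(rexp(n, rate = 1)))

mean(means)   # close to the population mean, 1
sd(means)     # close to sigma / sqrt(n) = 1 / sqrt(50)
```

Even though the population is far from Normal, the sample means center on \(\mu\) with spread close to \(\sigma/\sqrt{n}\).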

Distribution of \(\bar{X}\) under the null

The null hypothesis assumes that the population has a mean of \(\mu_0\). Hence the sampling distribution of \(\bar{X}\), assuming \(H_0\) is correct, is Normal with mean \(\mu_{\bar{X}} = \mu_0\), and standard error \(\sigma_{{\bar{X}}} = \sigma/\sqrt{n}\) (assuming population standard deviation is known).

Distribution of \(\frac{\bar{X}- \mu_0}{\sigma/\sqrt{n}}\) under the null

We can standardize any \(X \sim N(\mu, \sigma)\) random variable by subtracting the mean then dividing by the standard deviation, i.e. \(Z = \frac{X - \mu}{\sigma} \sim N(0,1)\). Doing this with the random variable \(\bar{X}\) yields our test statistic \(\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0,1)\).

Recap of Critical Value Approach

  • From our theoretical knowledge of the sampling distribution, we know that test statistic values near 0 are the most likely.

  • On the flip side, values of 1.96 or more (or -1.96 or less) have only a 5% chance of occurring under \(H_0\).

  • For a two-sided hypothesis test at \(\alpha = 0.05\), we fail to reject the null hypothesis for all test statistics between -1.96 and 1.96.

Rejection Region

More generally, we reject the null hypothesis whenever the test statistic falls in the rejection region.

It is determined by:

  • \(\alpha\) (most often 0.05)
  • the direction of \(H_A\)
  • the distribution of the test statistic

Critical Value approach for a single population mean

  1. State hypotheses \[\begin{equation} H_0 : \mu = \mu_0 \quad \text{ vs } \quad H_A: \begin{cases} \mu \neq \mu_0& \text{ two-sided test} \\ \mu < \mu_0&\text{ one-sided (lower-tail) test} \\ \mu > \mu_0&\text{ one-sided (upper-tail) test} \end{cases} \end{equation}\]

  2. Find critical value:

    \[\begin{cases} P(-z_{crit} < Z < z_{crit}) = 1 - \alpha &\text{ if } H_A: \mu \neq \mu_0 \\ P(Z < z_{crit}) = \alpha &\text{ if } H_A: \mu < \mu_0 \\ P(Z > z_{crit}) = \alpha &\text{ if } H_A: \mu > \mu_0 \end{cases}\]
  3. Compute the test statistic \(z_{obs} = \dfrac{\bar x - \mu_0}{\sigma/\sqrt{n}}.\)

  4. Conclusion: reject \(H_0\) if \(z_{obs} \in\) rejection region, otherwise, fail to reject \(H_0\).
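The four steps above can be sketched in R for a two-sided test; the summary statistics here are hypothetical:

```r
alpha <- 0.05
xbar <- 51.2; mu0 <- 50; sigma <- 4; n <- 36   # hypothetical summary statistics

# Step 2: critical value for a two-sided test
z_crit <- qnorm(1 - alpha/2)                   # about 1.96

# Step 3: observed test statistic
z_obs <- (xbar - mu0) / (sigma / sqrt(n))

# Step 4: reject H0 if |z_obs| exceeds z_crit
abs(z_obs) > z_crit                            # FALSE -> fail to reject H0
```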

P-value Approach

Alternatively we could make decisions by comparing the \(p\)-value with the significance-level \(\alpha\).

  • The \(p\)-value is the probability, assuming \(H_0\) is true, of observing a test statistic at least as extreme as the one we observed.

  • If \(p\)-value \(<\alpha\), we reject \(H_0\)

P-value approach for a single population mean

  1. State hypotheses \[\begin{equation} H_0 : \mu = \mu_0 \quad \text{ vs } \quad H_A: \begin{cases} \mu \neq \mu_0& \text{ two-sided test} \\ \mu < \mu_0&\text{ one-sided (lower-tail) test} \\ \mu > \mu_0&\text{ one-sided (upper-tail) test} \end{cases} \end{equation}\]

  2. Compute the test statistic \(z_{obs} = \dfrac{\bar x - \mu_0}{\sigma/\sqrt{n}}.\)

  3. Calculate the \(p\)-value

    \[\begin{cases} 2P(Z \geq |z_{obs}|) &\text{ if } H_A: \mu \neq \mu_0 \\ P(Z \leq z_{obs}) &\text{ if } H_A: \mu < \mu_0 \\ P(Z \geq z_{obs}) &\text{ if } H_A: \mu > \mu_0 \end{cases}\]
  4. Conclusion: reject \(H_0\) if the \(p\)-value is less than \(\alpha\) (typically 0.05); otherwise, fail to reject \(H_0\).
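These steps can likewise be sketched in R; the summary statistics below are hypothetical:

```r
alpha <- 0.05
xbar <- 51.2; mu0 <- 50; sigma <- 4; n <- 36          # hypothetical summary statistics
z_obs <- (xbar - mu0) / (sigma / sqrt(n))

# p-values for the three possible alternatives
p_two   <- 2 * pnorm(abs(z_obs), lower.tail = FALSE)  # H_A: mu != mu0
p_lower <- pnorm(z_obs)                               # H_A: mu <  mu0
p_upper <- pnorm(z_obs, lower.tail = FALSE)           # H_A: mu >  mu0

p_two < alpha    # FALSE -> fail to reject H0 at the 5% level
```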

From \(Z\) to \(t\)

  • Notice that the test statistic used for hypothesis test for the population mean \(\mu\) requires that \(\sigma\) is known.

  • However, it is more common that \(\sigma\) is unknown and that we need to estimate \(\sigma\) from the sample data.

  • In this case, we use the \(t\)-test whose test statistic follows a t-distribution with \(n-1\) degrees of freedom. \[\begin{equation} \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}} \sim t_{\nu = n-1} \end{equation}\]

The critical values plotted are based on the standard normal. The larger the degrees of freedom, the more closely the \(t\)-distribution resembles the standard normal distribution.

Notice how the critical values change when we calculate them based on a \(t\)-distribution with 3 degrees of freedom (standard normal in light red, \(t_{\nu = 3}\) in darker red).

Comment

  • The concepts from \(Z\)-tests and CIs carry over to \(t\)-tests and CIs using the t-distribution in place of the standard normal.

  • Even when \(\sigma\) is known, the \(t\)-test remains useful because it is more robust to the variability of smaller samples.

  • For large samples, the distinction between assuming \(\sigma\) is known or unknown becomes less critical, since \(s\) tends to be a good estimate of \(\sigma\).

  • Since the \(t\)-distribution approaches the standard normal when \(n\) gets large, some would argue to always use the \(t\) over the \(Z\)-test.
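The convergence of the \(t\) critical values toward the normal critical value \(z^* \approx 1.96\) is easy to check in R:

```r
alpha <- 0.05
df <- c(3, 13, 30, 100, 1000)
t_crit <- qt(1 - alpha/2, df)   # two-sided critical value for each df
round(t_crit, 3)                # decreases toward qnorm(0.975) = 1.96 as df grows
```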

Assumptions

Assumptions for a \(Z\)-test for population mean

Same as the CLT: independence of sample observations, and sufficiently large sample size (typically \(n >30\)). When a sample is small, we also require Normality, i.e. that the sample observations come from a normally distributed population.

Assumptions for Z-test for proportions

  1. The sample’s observations are independent, e.g. are from a simple random sample.
  2. Success-failure condition: we expect to see at least 10 successes and 10 failures in the sample, i.e. \(np \geq 10\) and \(n(1 - p) \geq 10\).

Assumptions: \(X_1, X_2, \dots, X_n\) are i.i.d., plus the additional conditions described above.

Risso dolphin example

Exercise 1: Risso dolphins

Image source: seahistory.org

Elevated mercury concentrations in marine mammals, such as Risso’s dolphins, pose significant health risks to both the animals themselves and humans who consume them. In a study conducted in the Taiji area of Japan, researchers investigated the mercury content in the muscle tissue of 19 randomly sampled Risso’s dolphins. The summary statistics of the mercury content, measured in micrograms of mercury per wet gram of muscle (μg/wet g), are

\(n\) \(\bar{x}\) \(s\) minimum maximum
19 4.4 2.3 1.7 9.2

Checking Assumptions:

Are the independence and normality conditions satisfied for this data set?

  • The observations are a simple random sample, therefore independence is reasonable.

  • As we don’t have the raw data, we cannot assess for normality using qqnorm plots for example.

  • In lieu of this, we can verify that the summary statistics do not suggest any clear outliers, since all observations are within 2.5 standard deviations of the mean.

Based on this evidence, the normality condition seems reasonable.

Standard error calculations

Using the summary statistics, compute the standard error for the average mercury content in the \(n\) = 19 dolphins.

We plug in \(s\) and \(n\) into the formula: \[\begin{align} SE = s/\sqrt{n} = 2.3/\sqrt{19} = 0.5276562 \approx 0.528. \end{align}\]

Confidence Interval

Compute and interpret the 95% confidence interval for the average mercury content in Risso’s dolphins.

Recall the formula for the CI for the mean from our previous unit: \[\begin{align*} \bar{x} \pm z^* \times SE \end{align*}\] We replace \(z^*\) by \(t^*\), which can be found in a similar way as before, only now we reference the \(t\)-distribution.

t-star

In other words, we need to find the \(t^*\) that satisfies the following expression: \[\begin{align} \Pr(-t^* < t_{n-1} < t^*) = 1-\alpha \end{align}\]

In R this is found using (you should verify this on the \(t\)-table):

alpha = 0.05; n = 19
qt(alpha/2, df = n-1, lower.tail = FALSE)
[1] 2.100922

Returning to our CI, we now have: \[\begin{align*} \bar{x} &\pm t^* \times s/\sqrt{n}\\ 4.4 &\pm 2.100922 \times 0.5276562\\ 4.4 &\pm 1.1085645 \end{align*}\]

The last line represents the point estimate (\(\bar{x}\)) plus/minus the margin of error. This is a perfectly legitimate (and common) way of expressing a CI. Alternatively we could express it as: \[\begin{equation} (3.291, 5.509) \end{equation}\]

We are 95% confident the average mercury content of muscles in Risso’s dolphins is between 3.29 and 5.51 \(\mu\)g/wet gram, which is considered extremely high.
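Putting the whole dolphin CI calculation together in R (keeping full precision until the final rounding):

```r
n <- 19; xbar <- 4.4; s <- 2.3     # summary statistics from the dolphin study
se <- s / sqrt(n)                  # standard error, about 0.528
t_star <- qt(0.975, df = n - 1)    # 95% critical value on 18 df, about 2.101
ci <- xbar + c(-1, 1) * t_star * se
round(ci, 3)                       # (3.291, 5.509)
```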

Connection with a two-sided test

  • Recall that we can also make decisions based on the 100(1 - \(\alpha\))% CI for a two-tailed hypothesis test with significance level \(\alpha\).

  • For this example, if we are testing at \(\alpha = 0.05\) with \(H_0: \mu = \mu_0\) vs. \(H_A: \mu \neq \mu_0\)

  • Hence, all values of \(\mu_0\) outside the interval (3.291, 5.509) will lead to a significant \(p\)-value.

  • To put it another way, \(\mu_0\) values outside the interval (3.291, 5.509) will yield a \(t_{obs}\) that falls in the rejection region.

Connection between Confidence Intervals and Two-Sided Hypothesis Tests

Consider a two-sided hypothesis test at a significance level of \(\alpha\) \[\begin{align} H_0&: \mu = \mu_0 & H_A&: \mu \neq \mu_0 \end{align}\]

Suppose we construct the 100(1 - \(\alpha\))% confidence interval (CI) for \(\mu\), using \[\begin{align} \bar x \pm t^* \times s/\sqrt{n} \end{align}\]

where \(t^*\) satisfies \(\Pr(-t^* < t_{n-1} < t^*) = 1-\alpha\), and \(t_{n-1}\) follows a \(t\)-distribution with degrees of freedom (\(df\) or \(\nu\)) equal to the sample size minus 1.

  • If \(\mu_0\) falls within the 100(1 - \(\alpha\))% CI, we do not have sufficient evidence to reject \(H_0\).

  • If \(\mu_0\) falls outside the 100(1 - \(\alpha\))% CI, we have sufficient evidence to reject \(H_0\).

A note on rounding

  • You may be tempted to round intermediate results to a certain number of decimal places before proceeding to the next step to simplify the process.

  • When doing these calculations in R, you should keep all digits (storing intermediate results in objects) and refrain from rounding until the final step.

  • Failure to do so could introduce a degree of error, which may carry over into subsequent calculations or final results.
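A small sketch of the danger, reusing the dolphin numbers: rounding the standard error to two decimals before computing the margin of error already shifts the answer in the second decimal place:

```r
n <- 19; s <- 2.3
se_full    <- s / sqrt(n)          # 0.5276562...
se_rounded <- round(se_full, 2)    # 0.53 -- rounded too early

t_star <- qt(0.975, df = n - 1)
moe_full    <- t_star * se_full    # margin of error at full precision
moe_rounded <- t_star * se_rounded # margin of error after premature rounding
moe_rounded - moe_full             # error of about 0.005
```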

Cherry Blossom Race example

Exercise 2: Cherry Blossom Race

The Cherry Blossom Race, held annually, attracts participants from various backgrounds and fitness levels. In 2006, the average time for all runners who completed the race was recorded as 93.29 minutes (5597 seconds). To investigate whether there has been a significant change in the average race time since then, data from 100 participants in the 2017 Cherry Blossom Race were randomly selected.

Using the data from the 2017 race, conduct a hypothesis test to determine whether runners in the 2017 race are getting faster or slower compared to the 2006 race.

Choosing a test

What test is most appropriate for this problem?

  • Here we are interested in the mean race time, so we will be conducting a hypothesis test for the population mean \(\mu\).

  • Since we do not know the population standard deviation and need to estimate it using \(s\) (the sample standard deviation), the one-sample \(t\)-test for the mean is the most appropriate test.

Testing assumptions

Are the assumptions for using this test met?

  • ✅ The data come from a simple random sample of all participants, so the observations are independent.

  • ✅ There are more than 30 observations in our sample, hence we can rely on the CLT to approximate the sampling distribution of \(\bar{X}\) with the normal distribution.

How to check for Normality

If the sample size were less than 30, we would need to check for normality. While there are formal tests for this (e.g. the Shapiro-Wilk test), we can use visual checks:

  • Plot a histogram of the data and visually inspect whether the distribution resembles a bell curve.

  • Normal QQ Plot: If the points fall approximately along a diagonal line, the data is likely normally distributed.

  • Box Plot: Plot a box plot of the data and check for symmetry and the presence of outliers.

Visual Inspection of Plots

❌ Examining the plots in Figure 1, we see the data is approximately bell-shaped; however, there are a number of outliers and departures from the expected pattern in the QQ plot, so it is reasonable to question the normality assumption.

Warning

Conducting a test when the assumptions are not met raises doubts about the validity of statistical inference and the generalizability of findings to a broader population.

Code
library(openintro)
library(dplyr)
n = 100

# to place the plots side-by-side in a 1x3 grid 
par(mfrow=c(1,3))

# Randomly sample 100 runners from the run17 data set
set.seed(17)
chr_race <-  run17 %>%
  filter(event == "10 Mile") %>% 
  sample_n(100, replace = FALSE)

hist(chr_race$clock_sec, xlab="Time to complete the race (in seconds)", main = "", breaks = 20)

boxplot(chr_race$clock_sec)

qqnorm(chr_race$clock_sec); qqline(chr_race$clock_sec)

Figure 1: A histogram (left), boxplot (center) and Normal Q-Q plot for the time to complete the race from 100 randomly sampled runners in the 2017 Cherry Blossom run.

Population insight 👀

  • Since we are actually randomly sampling 100 observations from the full population (all 19,961 runners in the 2017 Cherry Blossom Run), we can actually plot the population distribution.

  • For fun, let’s investigate if our suspicion of non-normality is correct, understanding that this is not typically something we have access to …

Code
par(mfrow=c(1,3))
library(car)
# Create QQ plot with 95% confidence bands
qqPlot(chr_race$clock_sec, main = "QQ Plot with Confidence Bands", envelope = 0.95)


qqPlot(run17$clock_sec, main = "QQ Plot with Confidence Bands (full data)", envelope = 0.95)

hist(run17$clock_sec, main = "Histogram (full data)", xlab = "Time to complete race in seconds")

The normal qq-plot on the sample of 100 observations (left), the normal qq-plot on the entire data set of 19,961 runners (center), the histogram of the entire data set of 19,961 runners (right)

Comments

  • We can see that the normality assumption is indeed violated in this case; the population appears to be bimodal.

  • Checking the assumptions is a very important step that should never be skipped in your own analyses.

  • While the \(t\)-test is quite a robust method, since the normality assumption is not met, we would do better to use an alternative (e.g. a nonparametric test) or ensure we have an even larger sample size so that the CLT provides a good approximation.

Here are the summary statistics for these 100 randomly sampled runners. Note that the times are given in seconds.

\(n\) \(\bar{x}\) \(s\) minimum maximum
100 6436.64 1324.51 3273 9152

Homework

Conduct the appropriate hypothesis test for this question.

Exercise 3: Piano lessons

Georgianna claims that in a small city renowned for its music school, the average child takes less than 5 years of piano lessons. We have a random sample of 20 children from the city, with a mean of 4.6 years of piano lessons and a standard deviation of 2.2 years. Evaluate Georgianna’s claim using a formal hypothesis test.

Choosing a test

What test is most appropriate for this problem?

  • Here we are interested in the years of piano lessons in a small city, so we will be conducting a hypothesis test for the population mean \(\mu\).

  • Since we do not know the population standard deviation and need to estimate it using \(s\) (the sample standard deviation), the one-sample \(t\)-test for the mean is the most appropriate test.

Checking Assumptions:

What are the assumptions for using this test?

  • We will assume that the observations are a simple random sample from the population; i.e., observations are independent (e.g. not siblings, or kids from one school).

  • As we don’t have the raw data, we cannot assess normality using qqnorm plots, for example, so we will assume that the population from which we are sampling is normally distributed.

  • Since unspecified, we assume a significance-level of \(\alpha = 0.05\).

Confidence Interval for the mean

Construct a 95% confidence interval for the number of years students in this city take piano lessons, and interpret it in context of the data.

\[\begin{align*} \bar{x} &\pm t^* \times s/\sqrt{n}\\ 4.6 &\pm 2.093 \times 2.2/\sqrt{20}\\ 4.6 &\pm 2.093 \times 0.491935\\ 4.6 &\pm 1.03\text{ or } (3.570, 5.630)\\ \end{align*}\] We are 95% confident the average number of years students in this city take piano lessons is between 3.570 and 5.630.

Connection with two-sided test

Do your results from the hypothesis test and the confidence interval agree?

  • From our knowledge about the connection between CIs and their respective two-sided hypothesis tests with the same level of \(\alpha\), we can simply check if the hypothesized (claimed) value of \(\mu_0 = 5\) lies within the 95% CI.

  • Since 5 lies between 3.570 and 5.630, we know that the two-sided test will fail to reject the null. Let’s verify this …

Critical Value Method

P-value Method
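Both decision rules can be sketched in R using the summary statistics from the piano example (\(n = 20\), \(\bar{x} = 4.6\), \(s = 2.2\), lower-tail test of \(\mu_0 = 5\)):

```r
n <- 20; xbar <- 4.6; s <- 2.2; mu0 <- 5; alpha <- 0.05
t_obs <- (xbar - mu0) / (s / sqrt(n))   # observed test statistic, about -0.813

# Critical value method (lower-tail test): reject H0 if t_obs < t_crit
t_crit <- qt(alpha, df = n - 1)         # about -1.729
t_obs < t_crit                          # FALSE -> fail to reject H0

# P-value method: reject H0 if the p-value is less than alpha
p_val <- pt(t_obs, df = n - 1)          # about 0.21
p_val < alpha                           # FALSE -> fail to reject H0
```

Both methods agree with the CI check: since 5 lies inside (3.570, 5.630), we fail to reject \(H_0\) at \(\alpha = 0.05\).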