STAT 205: Introduction to Mathematical Statistics
University of British Columbia Okanagan
March 15, 2024
So far we have covered two hypothesis tests for a single sample:
The distribution of the test statistic associated with each of these tests is the standard normal. For this reason, they are referred to as \(Z\)-tests.
Test statistic for \(\mu\) with \(\sigma\) known
\(H_0: \mu = \mu_0 \quad \text{vs} \quad H_A: \mu \neq \mu_0\)
\[ \dfrac{\bar{X} - \mu_0}{\dfrac{\sigma}{\sqrt{n}}} \sim N(0,1) \]
Test statistic for \(p\)
\(H_0: p = p_0 \quad \text{vs} \quad H_A: p \neq p_0\)
\[ \dfrac{\hat p - p_0}{\sqrt{ \dfrac{ p_0 ( 1 - p_0 )}{n} }} \sim N(0,1) \]
\[ \begin{equation} \text{test statistic} = \dfrac{\text{point estimator} - \text{null value}}{SE(\text{point estimator})} \end{equation} \]
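This recipe can be sketched in R; the sample values below are made up purely for illustration (they do not come from these notes):

```r
# Hypothetical sample summaries (for illustration only)
xbar <- 51.2; mu0 <- 50; sigma <- 4; n <- 25
# Z statistic for a mean with sigma known
z_mean <- (xbar - mu0) / (sigma / sqrt(n))

phat <- 0.56; p0 <- 0.5
# Z statistic for a proportion, using the null value in the SE
z_prop <- (phat - p0) / sqrt(p0 * (1 - p0) / n)
```

In both cases the numerator is "point estimator minus null value" and the denominator is the standard error of the point estimator, matching the general formula above.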
From our theoretical knowledge of the sampling distribution, we know that test statistic values near 0 are the most likely.
On the flip side, values of 1.96 or greater (or \(-1.96\) or less) have only a 5% chance of occurring when the null hypothesis is true.
For a two-sided hypothesis test at \(\alpha = 0.05\), we fail to reject the null hypothesis for all test statistics between \(-1.96\) and 1.96.
More generally, we will reject any null hypothesis for which the test statistic falls in the rejection region.
The rejection region is determined by the significance level \(\alpha\) and the direction of the alternative hypothesis.
Critical Value approach for a single population mean
State hypotheses \[\begin{equation} H_0 : \mu = \mu_0 \quad \text{ vs } \quad H_A: \begin{cases} \mu \neq \mu_0& \text{ two-sided test} \\ \mu < \mu_0&\text{ one-sided (lower-tail) test} \\ \mu > \mu_0&\text{ one-sided (upper-tail) test} \end{cases} \end{equation}\]
Find critical value:
\[\begin{cases} P(-z_{crit} < Z < z_{crit}) = 1 - \alpha &\text{ if } H_A: \mu \neq \mu_0 \\ P(Z < z_{crit}) = \alpha &\text{ if } H_A: \mu < \mu_0 \\ P(Z > z_{crit}) = \alpha &\text{ if } H_A: \mu > \mu_0 \end{cases}\]
Compute the test statistic \(z_{obs} = \dfrac{\bar x - \mu_0}{\sigma/\sqrt{n}}.\)
Conclusion: reject \(H_0\) if \(z_{obs} \in\) rejection region, otherwise, fail to reject \(H_0\).
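In R, the critical values for each form of the alternative can be found with the standard normal quantile function qnorm(); a sketch at \(\alpha = 0.05\):

```r
alpha <- 0.05
# Two-sided: reject if |z_obs| > z_crit (about 1.96)
z_crit_two   <- qnorm(1 - alpha / 2)
# Lower-tail: reject if z_obs < z_crit (about -1.64)
z_crit_lower <- qnorm(alpha)
# Upper-tail: reject if z_obs > z_crit (about 1.64)
z_crit_upper <- qnorm(1 - alpha)
```

Note that the two-sided test splits \(\alpha\) between the two tails, which is why it uses \(1 - \alpha/2\) rather than \(1 - \alpha\).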
Alternatively we could make decisions by comparing the \(p\)-value with the significance-level \(\alpha\).
The \(p\)-value is the probability, assuming \(H_0\) is true, of observing a test statistic at least as extreme as the one we observed; we then compare this probability with \(\alpha\).
If \(p\)-value \(<\alpha\), we reject \(H_0\)
P-value approach for a single population mean
State hypotheses \[\begin{equation} H_0 : \mu = \mu_0 \quad \text{ vs } \quad H_A: \begin{cases} \mu \neq \mu_0& \text{ two-sided test} \\ \mu < \mu_0&\text{ one-sided (lower-tail) test} \\ \mu > \mu_0&\text{ one-sided (upper-tail) test} \end{cases} \end{equation}\]
Compute the test statistic \(z_{obs} = \dfrac{\bar x - \mu_0}{\sigma/\sqrt{n}}.\)
Calculate the \(p\)-value
\[\begin{cases} 2P(Z \geq |z_{obs}|) &\text{ if } H_A: \mu \neq \mu_0 \\ P(Z \leq z_{obs}) &\text{ if } H_A: \mu < \mu_0 \\ P(Z \geq z_{obs}) &\text{ if } H_A: \mu > \mu_0 \end{cases}\]
Conclusion: reject \(H_0\) if the \(p\)-value is less than \(\alpha\) (typically 0.05); otherwise, fail to reject \(H_0\).
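These \(p\)-value calculations map directly onto pnorm() in R; a sketch using a hypothetical observed statistic:

```r
z_obs <- 2.1   # hypothetical observed test statistic (for illustration)
# Two-sided: double the upper-tail area beyond |z_obs|
p_two   <- 2 * pnorm(abs(z_obs), lower.tail = FALSE)
# Lower-tail alternative
p_lower <- pnorm(z_obs)
# Upper-tail alternative
p_upper <- pnorm(z_obs, lower.tail = FALSE)
```

Here the two-sided \(p\)-value is about 0.036, so at \(\alpha = 0.05\) this hypothetical test would reject \(H_0\).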
Notice that the test statistic used for the hypothesis test for the population mean \(\mu\) requires that \(\sigma\) be known.
However, it is more common that \(\sigma\) is unknown and that we need to estimate \(\sigma\) from the sample data.
In this case, we use the \(t\)-test whose test statistic follows a t-distribution with \(n-1\) degrees of freedom. \[\begin{equation} \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}} \sim t_{\nu = n-1} \end{equation}\]
The concepts from \(Z\)-tests and CIs carry over to \(t\)-tests and CIs using the t-distribution in place of the standard normal.
Even when \(\sigma\) is known, the \(t\)-test is useful because it is more robust to the variability of smaller samples.
For large samples, the distinction between assuming \(\sigma\) is known or unknown becomes less critical, since \(s\) tends to be a good estimate of \(\sigma\).
Since the \(t\)-distribution approaches the standard normal when \(n\) gets large, some would argue to always use the \(t\) over the \(Z\)-test.
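This convergence is easy to see numerically by comparing \(t\) quantiles with the corresponding standard normal quantile as the degrees of freedom grow:

```r
# 97.5th percentile of the t-distribution for increasing df
qt(0.975, df = c(10, 30, 100, 1000))   # approx. 2.23, 2.04, 1.98, 1.96
# 97.5th percentile of the standard normal
qnorm(0.975)                           # approx. 1.96
```

By \(df = 1000\) the \(t\) critical value is essentially indistinguishable from the normal one.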
Assumptions for a \(Z\)-test for population mean
Same as the CLT: independence of sample observations, and sufficiently large sample size (typically \(n >30\)). When a sample is small, we also require Normality, i.e. that the sample observations come from a normally distributed population.
Assumptions for a \(Z\)-test for proportions
Same as the CLT for proportions: independence of sample observations, and a sufficiently large sample, typically checked with the success-failure condition (\(np_0 \geq 10\) and \(n(1-p_0) \geq 10\)).
Exercise 1: Risso’s dolphins
Elevated mercury concentrations in marine mammals, such as Risso’s dolphins, pose significant health risks to both the animals themselves and humans who consume them. In a study conducted in the Taiji area of Japan, researchers investigated the mercury content in the muscle tissue of 19 randomly sampled Risso’s dolphins. The summary statistics of the mercury content, measured in micrograms of mercury per wet gram of muscle (μg/wet g), are
\(n\) | \(\bar{x}\) | \(s\) | minimum | maximum |
---|---|---|---|---|
19 | 4.4 | 2.3 | 1.7 | 9.2 |
Checking Assumptions:
Are the independence and normality conditions satisfied for this data set?
The observations are a simple random sample, therefore independence is reasonable.
As we don’t have the raw data, we cannot assess normality using, for example, a normal QQ plot.
In lieu of this, we can verify that the summary statistics do not suggest any clear outliers: all observations are within 2.5 standard deviations of the mean (e.g. the maximum, 9.2, is only \((9.2 - 4.4)/2.3 \approx 2.1\) standard deviations above it).
Based on this evidence, the normality condition seems reasonable.
Standard error calculations
Using the summary statistics, compute the standard error for the average mercury content in the \(n\) = 19 dolphins.
We plug in \(s\) and \(n\) into the formula: \[\begin{align} SE = s/\sqrt{n} = 2.3/\sqrt{19} = 0.5276562 \approx 0.528. \end{align}\]
Confidence Interval
Compute and interpret the 95% confidence interval for the average mercury content in Risso’s dolphins.
Recall the formula for the CI for the mean from our previous unit: \[\begin{align*} \bar{x} \pm z^* \times SE \end{align*}\] We replace \(z^*\) by \(t^*\), which can be found in a similar way as before, only now we reference the \(t\)-distribution.
In other words, we need to find the \(t^*\) that satisfies the following expression: \[\begin{align} \Pr(-t^* < t_{n-1} < t^*) = 1-\alpha \end{align}\]
In R this is found using the qt() function (you should verify the value on the t-table).
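Assuming \(\alpha = 0.05\) and \(df = n - 1 = 18\), the call would be:

```r
alpha <- 0.05
n <- 19
# t* such that the middle (1 - alpha) of the t distribution lies in (-t*, t*)
t_star <- qt(1 - alpha / 2, df = n - 1)
t_star   # 2.100922
```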
Returning to our CI, we now have: \[\begin{align*} \bar{x} &\pm t^* \times s/\sqrt{n}\\ 4.4 &\pm 2.100922 \times 0.5276562\\ 4.4 &\pm 1.1085645 \end{align*}\]
The last line represents the point estimate (\(\bar{x}\)) plus/minus the margin of error. This is a perfectly legitimate (and common) way of expressing a CI. Alternatively we could express it as: \[\begin{equation} (3.291, 5.509) \end{equation}\]
We are 95% confident the average mercury content of muscles in Risso’s dolphins is between 3.29 and 5.51 \(\mu\)g/wet gram, which is considered extremely high.
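The whole calculation can be reproduced in R from the summary statistics, keeping all digits until the final rounding:

```r
n <- 19; xbar <- 4.4; s <- 2.3
se <- s / sqrt(n)                  # standard error, approx. 0.528
t_star <- qt(0.975, df = n - 1)    # critical value, approx. 2.101
# 95% confidence interval for the mean mercury content
ci <- c(xbar - t_star * se, xbar + t_star * se)
ci   # approx. (3.291, 5.509)
```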
Recall that we can also make decisions based on the \(100(1 - \alpha)\)% CI for a two-sided hypothesis test with significance level \(\alpha\).
For this example, suppose we are testing at \(\alpha = 0.05\) with \(H_0: \mu = \mu_0\) vs. \(H_A: \mu \neq \mu_0\).
Then all values of \(\mu_0\) outside the interval (3.291, 5.509) will lead to a significant \(p\)-value.
Put another way, \(\mu_0\) values outside the interval (3.291, 5.509) will yield a \(t_{obs}\) that falls in the rejection region.
Connection between Confidence Intervals and Two-Sided Hypothesis Tests
Consider a two-sided hypothesis test at a significance level of \(\alpha\) \[\begin{align} H_0&: \mu = \mu_0 & H_A&: \mu \neq \mu_0 \end{align}\]
Suppose we construct the \(100(1-\alpha)\)% confidence interval (CI) for \(\mu\), using \[\begin{align} \bar x \pm t^* \times s/\sqrt{n} \end{align}\]
where \(t^*\) satisfies \(\Pr(-t^* < t_{n-1} < t^*) = 1-\alpha\), and \(t_{n-1}\) follows a \(t\)-distribution with degrees of freedom (\(df\) or \(\nu\)) equal to the sample size minus 1.
If \(\mu_0\) falls within the 100(1 - \(\alpha\))% CI, we do not have sufficient evidence to reject \(H_0\).
If \(\mu_0\) falls outside the 100(1 - \(\alpha\))% CI, we have sufficient evidence to reject \(H_0\).
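This connection can be seen algebraically: failing to reject at level \(\alpha\) means \(|t_{obs}| < t^*\), which rearranges to exactly the statement that the CI contains \(\mu_0\): \[\begin{align} \left| \dfrac{\bar x - \mu_0}{s/\sqrt{n}} \right| < t^* \iff \bar x - t^* \times s/\sqrt{n} < \mu_0 < \bar x + t^* \times s/\sqrt{n} \end{align}\]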
You may be tempted to round intermediate results to a certain number of decimal places before proceeding to the next step, to simplify the process.
When doing these calculations in R, you should keep all digits (by storing intermediate results in objects) and refrain from rounding until the final step.
Failure to do so could introduce rounding error, which may carry over into subsequent calculations or final results.
Exercise 2: Cherry Blossom Race
The Cherry Blossom Race, held annually, attracts participants from various backgrounds and fitness levels. In 2006, the average time for all runners who completed the race was recorded as 93.29 minutes (5597 seconds). To investigate whether there has been a significant change in the average race time since then, data from 100 participants in the 2017 Cherry Blossom Race were randomly selected.
Using the data from the 2017 race, conduct a hypothesis test to determine whether runners in the 2017 race are getting faster or slower compared to the 2006 race.
Choosing a test
What test is most appropriate for this problem?
Here we are interested in the mean race time, so we will be conducting a hypothesis test for the population mean \(\mu\).
Since we do not know the population standard deviation and need to estimate it using \(s\) (the sample standard deviation), the one-sample \(t\)-test for the mean is the most appropriate test.
Testing assumptions
Are the assumptions for using this test met?
✅ The data come from a simple random sample of all participants, so the observations are independent.
✅ There are more than 30 observations in our sample, hence we can rely on the CLT to approximate the sampling distribution of \(\bar{X}\) with the normal distribution.
If the sample size were less than 30, we would need to check for normality. While there are formal tests for this (e.g. the Shapiro-Wilk test), we can use visual checks:
Histogram: Plot a histogram of the data and visually inspect whether the distribution resembles a bell curve.
Normal QQ Plot: If the points fall approximately along a diagonal line, the data is likely normally distributed.
Box Plot: Plot a box plot of the data and check for symmetry and the presence of outliers.
❌ Examining the plots in Figure 1, we see the data is approximately bell-shaped; however, there are a number of outliers and departures from the expected pattern in the QQ plot, so it is reasonable to question the normality assumption.
Warning
Conducting a test when the assumptions are not met raises doubts about the validity of statistical inference and the generalizability of findings to a broader population.
library(openintro)   # provides the run17 data set
library(dplyr)
n <- 100
# Randomly sample n runners from the run17 data set
set.seed(17)
chr_race <- run17 %>%
  filter(event == "10 Mile") %>%
  sample_n(n, replace = FALSE)
# place the plots side-by-side in a 1x3 grid
par(mfrow = c(1, 3))
hist(chr_race$clock_sec, xlab = "Time to complete the race (in seconds)", main = "", breaks = 20)
boxplot(chr_race$clock_sec)
qqnorm(chr_race$clock_sec); qqline(chr_race$clock_sec)
Since we are randomly sampling 100 observations from the full population (all 19,961 runners in the 2017 Cherry Blossom Run), we can actually plot the population distribution.
For fun, let’s investigate whether our suspicion of non-normality is correct, understanding that this is not typically something we have access to …
par(mfrow=c(1,3))
library(car)
# Create QQ plot with 95% confidence bands
qqPlot(chr_race$clock_sec, main = "QQ Plot with Confidence Bands", envelope = 0.95)
qqPlot(run17$clock_sec, main = "QQ Plot with Confidence Bands (full data)", envelope = 0.95)
hist(run17$clock_sec, main = "Histogram (full data)", xlab = "Time to complete race in seconds")
Here are the summary statistics for these 100 randomly sampled runners. Note that the times are given in seconds.
\(n\) | \(\bar{x}\) | \(s\) | minimum | maximum |
---|---|---|---|---|
100 | 6436.64 | 1324.51 | 3273 | 9152 |
Homework
Conduct the appropriate hypothesis test for this question.
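As a check on your work, a sketch of how the computation could proceed from the summary statistics, assuming a two-sided test at \(\alpha = 0.05\):

```r
# Summary statistics from the table above
n <- 100; xbar <- 6436.64; s <- 1324.51
mu0 <- 5597   # 2006 average time in seconds (the null value)
# Observed t statistic
t_obs <- (xbar - mu0) / (s / sqrt(n))   # approx. 6.34
# Two-sided p-value from the t distribution with n - 1 df
p_value <- 2 * pt(abs(t_obs), df = n - 1, lower.tail = FALSE)
```

You should still state the hypotheses and write the conclusion in context yourself.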
Exercise 3: Piano lessons
Georgianna claims that in a small city renowned for its music school, the average child takes less than 5 years of piano lessons. We have a random sample of 20 children from the city, with a mean of 4.6 years of piano lessons and a standard deviation of 2.2 years. Evaluate Georgianna’s claim using a formal hypothesis test.
Choosing a test
What test is most appropriate for this problem?
Here we are interested in the years of piano lessons in a small city, so we will be conducting a hypothesis test for the population mean \(\mu\).
Since we do not know the population standard deviation and need to estimate it using \(s\) (the sample standard deviation), the one-sample \(t\)-test for the mean is the most appropriate test.
Checking Assumptions:
What are the assumptions for using this test?
We will assume that the observations are a simple random sample of children from the city; i.e. observations are independent (e.g. not siblings, or kids from one school).
As we don’t have the raw data, we cannot assess normality using, for example, a normal QQ plot, so we will assume that the population from which we are sampling is normally distributed.
Since unspecified, we assume a significance-level of \(\alpha = 0.05\).
Confidence Interval for the mean
Construct a 95% confidence interval for the number of years students in this city take piano lessons, and interpret it in context of the data.
\[\begin{align*} \bar{x} &\pm t^* \times s/\sqrt{n}\\ 4.6 &\pm 2.093 \times 2.2/\sqrt{20}\\ 4.6 &\pm 2.093 \times 0.491935\\ 4.6 &\pm 1.03\text{ or } (3.570, 5.630)\\ \end{align*}\] We are 95% confident the average number of years students in this city take piano lessons is between 3.570 and 5.630.
Connection with two-sided test
Do your results from the hypothesis test and the confidence interval agree?
From our knowledge of the connection between CIs and their respective two-sided hypothesis tests at the same level \(\alpha\), we can simply check whether the hypothesized (claimed) value of \(\mu_0 = 5\) lies within the 95% CI.
Since 5 lies between 3.570 and 5.630, we know that the two-sided test will fail to reject the null. Let’s verify this …
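The verification code is not shown above; from the summary statistics it would look something like this (two-sided test at \(\alpha = 0.05\)):

```r
# Summary statistics for the piano-lesson sample
n <- 20; xbar <- 4.6; s <- 2.2; mu0 <- 5
# Observed t statistic (approx. -0.813)
t_obs <- (xbar - mu0) / (s / sqrt(n))
# Two-sided p-value from the t distribution with n - 1 df
p_value <- 2 * pt(abs(t_obs), df = n - 1, lower.tail = FALSE)
p_value   # well above 0.05, so we fail to reject H0
```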
Comments
We can see that the normality assumption is indeed violated in this case; the population appears to be bimodal.
Checking the assumptions is a very important step that should never be skipped in your own analyses.
While the \(t\)-test is quite a robust method, since the normality assumption is not met, we would do better to use an alternative approach (e.g. a nonparametric test) or ensure we have an even larger sample size so that the CLT provides a good approximation.