Confidence Intervals for the Mean

(Known population variance \(\sigma^2\))

Dr. Irene Vrbik

University of British Columbia Okanagan

Recall

Sampling Distribution of the Sample Mean

Let \(X_1,\dots,X_n\) be a random sample from a population with mean \(\mu\) and variance \(\sigma^2\). The sample mean, defined as \(\bar{X}=\frac{1}{n}\sum_{i=1}^n X_i\), has

\[\mathbb{E}(\bar{X})=\mu\]

\[\mathrm{Var}(\bar{X})=\frac{\sigma^2}{n}\]

\[\mathrm{SD}(\bar{X})=\frac{\sigma}{\sqrt{n}} \quad \leftarrow \text{ we call this the standard error}\]

If the population is normal or \(n\) is sufficiently large, then
\[ \bar{X}\sim\mathcal{N}\!\left(\mu,\frac{\sigma^2}{n}\right) \]

This distribution is called the sampling distribution of the sample mean.
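This can be checked empirically. Below is a quick R sketch (the population values \(\mu = 100\), \(\sigma = 5\), and \(n = 30\) are chosen only for illustration) that draws many samples and compares the simulated mean and standard deviation of \(\bar X\) to the theory:

```r
# Simulate the sampling distribution of the sample mean
set.seed(1)
mu <- 100; sigma <- 5; n <- 30

# 10,000 sample means, each from a sample of size n
xbars <- replicate(10000, mean(rnorm(n, mean = mu, sd = sigma)))

mean(xbars)   # close to mu = 100
sd(xbars)     # close to sigma/sqrt(n) = 5/sqrt(30), about 0.91
```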

Introduction

  • Last lecture, we studied the sampling distribution of the sample mean \(\bar X\).

  • Today, we will see how to construct a confidence interval (CI) for the population mean \(\mu\) based on that sampling distribution.

  • For simplicity, we assume (for now) that we are sampling from a normal population with known standard deviation \(\sigma\).

The sampling distribution of \(\bar X\) assuming samples are drawn from a normal population with mean 100 and variance 5.

Point Estimation

  • In most real-world settings, the population mean \(\mu\) is unknown and we rely on sample data to learn about it.

  • The sample mean \(\bar X\) is a point estimate of \(\mu\).

  • A point estimate gives a single best guess, but provides no information about the uncertainty associated with this guess.

  • We can do better than a single number.

From a Point Estimate to an Interval

  • Rather than reporting a single number, we report a best guess together with a plus-or-minus term.

  • This plus-or-minus term is called the margin of error (ME).

  • The ME tells us how uncertain we are about our estimate.

    • A large ME means there is more uncertainty.
    • A small ME means the estimate is more precise.

Review

  • The sampling distribution of the sample mean describes the distribution of \(\bar X\) across repeated samples drawn from the same population.

  • In practice, however, we usually observe only one sample.

  • The observed value of the statistic \(\bar X\) can be viewed as one realization from its sampling distribution.

  • Since the sampling distribution tells us how much \(\bar X\) varies from sample to sample, we can use it to construct an interval that likely contains \(\mu\).

Confidence Interval

❓ What is a confidence interval?

A confidence interval (CI) provides an interval or range of plausible values for the population parameter.

  • It is constructed using a sample statistic and its sampling distribution.

  • A CI provides some prescribed degree of confidence (C) of capturing the true parameter (typically 90%, 95%, or 99%)

Comments

\(L\) and \(U\) will be used to denote the lower and upper confidence limits, respectively.

For this interval \((L, U)\) to be useful,

  1. the true population parameter \(\mu\) should have a good chance of falling between \(L\) and \(U\)
  2. the interval should be relatively narrow

Garfield Comic Strip

Form of a confidence interval

The general form of a confidence interval (CI) in this unit:

\[\text{point estimate} \pm \text{(Margin of Error)}\]

\[ \begin{align} \text{or } \big[\text{point est.} - \text{ME}&,\ \text{point est.} + \text{ME}\big]\\ \big[L&, U\big] \quad\quad \text{where }L <U \end{align} \]

For these types of CI,

  • Point estimate = center of the interval
  • CI width = \(U - L = 2 \times \text{ME}\)

Point Estimators

  • In general, let \(\theta\) be some population parameter of interest.

  • A point estimator of \(\theta\) is a statistic, denoted by \(\hat{\theta}\), used to estimate \(\theta\).

  • The “hat” notation reminds us that \(\hat{\theta}\) is a sample-based estimate computed from the data, distinguishing it from the population parameter \(\theta\), which is fixed and unknown.

Multiple Estimators

  • In general, there may be many possible estimators for a parameter.

  • For example, estimators of the population mean \(\mu\) include:

    • \(\hat \mu\) = the sample mean \(\bar X\),
    • \(\hat \mu\) = the sample median,
    • \(\hat \mu\) = trimmed means, etc.
  • The sample mean \(\bar X\) is especially appealing because it is unbiased for \(\mu\).

Unbiased Estimator

  • \(\bar X\) is an unbiased estimator for \(\mu\)
  • \(\bar x\) is a point estimate for \(\mu\)

Unbiased Estimator

A statistic \(\hat \theta\) is said to be an unbiased estimator, or its value an unbiased estimate, of \(\theta\) if and only if:

\[ \mathbb{E}[\hat \theta]= \theta \]

From Point Estimates to Intervals

  • We know that when we use \(\bar X\) to estimate \(\mu\), the estimate will almost surely be wrong, i.e. \(\Pr(\bar X = \mu) = 0\)

  • To examine this error, recall that for large \(n\)

    \[ Z = \frac{\bar X - \mu}{\sigma/\sqrt{n}} \sim N(0,1) \]

Standard Normal Distribution

Figure 1: The sampling distribution of \(Z = \frac{\bar X - \mu}{\sigma/\sqrt{n}} \sim N(0,1)\)


Sampling Distribution Probabilities

As shown in Figure 1, we can assert the following:

\[ \begin{align} \Pr(-z_{\alpha/2} < Z < z_{\alpha/2}) &= 1- \alpha\\ \Pr(-z_{\alpha/2} < \frac{\bar X - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}) &= 1- \alpha \end{align} \]

where \(z_{\alpha/2}\) represents the \(z\)-score that cuts off an area of \(\alpha/2\) in the upper tail of the standard normal curve.

Rearranging this inequality, we can write …

\[ \begin{align} &\Pr\left(-z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} < \bar X - \mu < z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} \right) = 1- \alpha\\ &\Pr\left(- \bar X -z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} < - \mu < - \bar X + z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} \right)= 1- \alpha\\ &\Pr\left(\bar X + z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} > \mu > \bar X - z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} \right)= 1- \alpha\\ &\Pr\left( \bar X - z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} < \mu < \bar X + z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} \right)= 1- \alpha \end{align} \]

From Probabilities to Confidence

  • The previous probability statement describes how the random variable \(\bar X\) behaves across repeated samples.

  • Once we observe a sample and compute \(\bar x\), the resulting interval is fixed (no more randomness/probability).

  • Since the parameter is also considered fixed, there is no more randomness associated with this interval: it either contains the true population parameter or it doesn’t.

Interpreted through repeated sampling: If we repeatedly sample from the population, we would expect \((1-\alpha)100\)% of the confidence intervals to contain \(\mu\).
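This repeated-sampling interpretation can be demonstrated with a small simulation (the population values below are arbitrary, chosen only for illustration):

```r
# Coverage check: what fraction of 95% CIs contain the true mean?
set.seed(42)
mu <- 50; sigma <- 10; n <- 40; reps <- 10000
z <- qnorm(0.975)   # z_{0.025}, about 1.96

covered <- replicate(reps, {
  xbar <- mean(rnorm(n, mean = mu, sd = sigma))  # one observed sample mean
  me   <- z * sigma / sqrt(n)                    # margin of error
  (xbar - me <= mu) && (mu <= xbar + me)         # did the CI capture mu?
})

mean(covered)   # approximately 0.95
```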

Interpreting a CI

Correct Interpretation

✅ Over repeated sampling, this confidence interval procedure is expected to produce intervals that contain \(\mu\) about \((1-\alpha)100\%\) of the time.

✅ We are \((1-\alpha)100\%\) confident that \(\mu\) lies in the \((1-\alpha)100\%\) CI.

  • e.g. we are 95 percent confident that the true population mean lies within \(1.96\cdot \frac{\sigma}{\sqrt{n}}\) of the observed sample mean.

Incorrect Interpretations

❌ We are confident that the interval contains the true population parameter \(\mu\).

❌ There is a \((1-\alpha)100\%\) probability that \(\mu\) lies in this interval.

Large Sample Confidence Intervals

Large Sample Confidence Intervals

For large (\(n \geq 30\)) random samples from a population with mean \(\mu\) and variance \(\sigma^2\), a \((1- \alpha)100\)% confidence interval for \(\mu\) is given by:

\[ \bar x \pm \underbrace{z_{\alpha/2}\cdot \frac{\sigma}{\sqrt{n}}}_{\text{Margin of Error}} \]

  • where \(z_{\alpha/2}\) is the value such that \(\Pr(-z_{\alpha/2} < Z< z_{\alpha/2}) = 1-\alpha\).
  • \(z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}\) is the margin of error.

Confidence and Significance Level

  • The confidence level \(C = (1 - \alpha)100\)% describes the reliability of the CI procedure.

  • The quantity \(\alpha\) is called the significance level.

  • We can choose the confidence level to suit the situation. Common choices include:

    • \(C = 90\% \implies \alpha = 0.10\)
    • \(C = 95\% \implies \alpha = 0.05\) \(\leftarrow\) most common
    • \(C = 99\% \implies \alpha = 0.01\)
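These confidence levels correspond to the critical values \(z_{\alpha/2}\), which can be verified in R with qnorm():

```r
# z_{alpha/2} for the common confidence levels
conf  <- c(0.90, 0.95, 0.99)
alpha <- 1 - conf
round(qnorm(1 - alpha/2), 3)
# 1.645 1.960 2.576
```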

Calculating a 95% CI

Constructing a 95% confidence interval

Exercise 1 A random sample of size \(n\) = 100 has a sample mean of \(\bar x\) = 21.6. Assuming the population standard deviation is known to be \(\sigma =\) 5.1, construct a 95% confidence interval for \(\mu\).

Finding \(z_{\alpha/2}\)

  • If we are constructing a 95% CI, this implies \(\alpha = 0.05\) (since (1 - 0.05)*100 = 95).
  • Hence we need to find \(z_{\alpha/2} = z_{0.05/2} = z_{0.025}\)
  • We can either compute this in R, or from our Z-table

Using R

  • To find the \(z_{\alpha/2}\) in R, you can use the qnorm() function

  • There will be several ways to get at \(z_{\alpha/2}\)

  • Choose whichever way makes the most sense to you

Tip

I highly recommend that you draw the Normal curve to help you visualize.

qnorm

qnorm() is the quantile function for the normal distribution with mean equal to mean and standard deviation equal to sd.

qnorm(p, mean = 0, sd = 1, lower.tail = TRUE)
  • p vector (or single value) of probabilities
  • mean (default value is 0)
  • sd (default value is 1)
  • lower.tail logical; if TRUE (default), probabilities are \(\Pr(X \leq x)\), otherwise \(\Pr(X > x)\)

Warning

\(z_{\alpha}\) denotes the upper-tail \(z\)-scores of the standard normal distribution, i.e. \(\Pr(Z > z_{\alpha})= \alpha\) where \(Z \sim N(0,1)\)

The default of qnorm(p) is to find lower-tail quantiles associated with lower-tail probabilities.

Finding \(z_{\alpha/2}\) and \(-z_{\alpha/2}\)

  • For a 95% confidence interval, we set \(\alpha = 0.05\), so we need the cutoff \(z_{0.025}\).

  • The value \(-z_{0.025}\) is just the same number with a minus sign in front.

  • Because the standard normal curve is symmetric around zero, we don’t need to find both numbers.

  • Either value can be used to construct the interval, so compute the one you find easiest.

Option 1: find \(z_{\alpha/2}\)

Find \(z_{\alpha/2} = z_{0.025}\) by finding the \(z\)-score that gives 2.5% in the upper tail:

qnorm(0.025, lower.tail = FALSE)
[1] 1.959964

which is equivalent to

qnorm(p = 0.025, 
      mean = 0,
      sd = 1, 
      lower.tail = FALSE)
[1] 1.959964

The z-score yielding 2.5% in the upper tail.

Option 2: find \(z_{\alpha/2}\)

Find \(z_{\alpha/2} = z_{0.025}\) by finding the \(z\)-score that gives 97.5% in the lower tail:

qnorm(0.975)
[1] 1.959964

which is equivalent to

qnorm(p = 0.975, 
      mean = 0,
      sd = 1, 
      lower.tail = TRUE)
[1] 1.959964

The z-score yielding 97.5% in the lower tail.

Option 3: find \(-z_{\alpha/2}\)

Find \(-z_{\alpha/2} = z_{1-\alpha/2} = z_{0.975}\) by finding the \(z\)-score that gives 97.5% in the upper tail:

qnorm(0.975, lower.tail = FALSE)
[1] -1.959964

which is equivalent to

qnorm(p = 0.975, 
      mean = 0,
      sd = 1, 
      lower.tail = FALSE)
[1] -1.959964

The z-score yielding 97.5% in the upper tail.

Option 4: find \(-z_{\alpha/2}\)

Find \(-z_{\alpha/2} = z_{1-\alpha/2} = z_{0.975}\) by finding the \(z\)-score that gives 2.5% in the lower tail:

qnorm(0.025)
[1] -1.959964

which is equivalent to

qnorm(p = 0.025, 
      mean = 0,
      sd = 1, 
      lower.tail = TRUE)
[1] -1.959964

The z-score yielding 2.5% in the lower tail.

Using the Tables

Since our Z-table gives lower-tail probabilities, it makes sense to find either

\(\Pr(Z < -z_{0.025}) = 0.025\), or

\(\Pr(Z < z_{0.025}) = 1- 0.025 = 0.975\)

Notice how the table gives lower-tail probabilities.

\[ \Pr(Z < -z_{0.025}) = 0.025 \]

Our reference distribution

\[ \Pr(Z < z_{0.025}) = 1- 0.025 = 0.975 \]

Our reference distribution

Constructing 95% CI for the mean

The form of a 95% CI for \(\mu\) is therefore given by:

\[ \begin{align} \text{point estimate} &\pm \textcolor{red}{\boxed{\text{Margin of Error}}}\\ \bar x &\pm \textcolor{red}{\boxed{z_{0.025} \times \sigma_{\bar X}}}\\ \bar x &\pm \textcolor{red}{\boxed{1.96 \times \sigma/\sqrt{n}}} \end{align} \]

where \(\sigma_{\bar X}\) is the standard error (SE) of the point estimate.
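Putting the pieces together, the computation for Exercise 1 can be sketched in R (using the values stated in that exercise):

```r
# Exercise 1: n = 100, xbar = 21.6, sigma = 5.1, 95% CI
xbar <- 21.6; sigma <- 5.1; n <- 100

z  <- qnorm(0.975)           # z_{0.025}, about 1.96
me <- z * sigma / sqrt(n)    # margin of error, about 1.0

c(lower = xbar - me, upper = xbar + me)
# roughly (20.6, 22.6)
```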

iClicker: finding \(z_{\alpha/2}\)

What is the \(z_{\alpha/2}\) for a 98% confidence interval?

  1. 2.33

  2. 2.05

  3. 2.58

  4. None of the above

Warning


For this CI to be exact, we must have the following:

  1. The sample data must be a simple random sample from the population of interest
  2. The population must be normally distributed

When it is not guaranteed that our population is normal, this interval is approximate for large \(n\) (thanks to the CLT)

iClicker


Exercise 2 (Margin of Error) Suppose a confidence interval for a population mean has the form \([a, b]\). Which of the following is the margin of error?

  1. \(b -a\)
  2. \((a+b)/2\)
  3. \((b-a)/2\)
  4. \(a\)

iClicker


Exercise 3 (Margin of Error) What does a larger margin of error indicate?

  1. The estimate is more accurate
  2. The population mean is larger
  3. The sample mean is biased
  4. There is more uncertainty in the estimate

iClicker


Exercise 4 (Margin of Error) Two confidence intervals are centered at the same point estimate. One is wider than the other. Which statement is true?

  1. The wider interval has a larger margin of error
  2. The wider interval has a smaller margin of error
  3. Both intervals have the same margin of error
  4. Width and margin of error are unrelated

Example: Cereal

Let’s consider a subset of cereal from the Breakfast Cereal data from Kaggle.

Code
cereal <- read.csv("http://irene.vrbik.ok.ubc.ca/data/cereal-subset.csv")
cereal$mfr      <- factor(cereal$mfr)
cereal$type     <- factor(cereal$type)
cereal$shelf    <- factor(cereal$shelf)
cereal$vitamins <- factor(cereal$vitamins)

Cereal calories

Exercise 5 Based on this sample, a nutrition researcher wants to estimate the mean number of calories per serving for breakfast cereals. Given the sample size of \(n\) = 12 and the sample mean of \(\bar x\) = 109.17 compute a 98% confidence interval for \(\mu\). You may assume the population standard deviation is \(\sigma = 19.8\) calories

Small Sample Size

Because this is a small sample, we cannot rely on the CLT. Before we compute our CI, we need to ask ourselves whether it is reasonable to assume a normal population.

Checking for Normality

While formal tests for normality exist, for now we will use graphical checks.

Histogram

  • Look for a bell-shaped, symmetric distribution

  • Mild skewness is usually fine for large samples

Normal Q–Q Plot

  • If points fall roughly along a straight line → normality is reasonable

  • Systematic curves or strong deviations → non-normality

📌 In practice, Q–Q plots are preferred over histograms.

Visual Checks for Normality

Histogram

Code
par(mar = c(4,2,0,0))
hist(cereal$calories, main = "", xlab = "Calories")

Normal Q–Q Plot

Code
par(mar = c(4,2,1,0))
qqnorm(cereal$calories) # normal QQ plot
qqline(cereal$calories) # Reference line: points should fall roughly along this line if normality is reasonable

QQ-plot with bands

The Q–Q plot on the right includes shaded bands showing typical sampling variation under normality.

✅ Because the points fall within these bands, the normality assumption appears reasonable

Code
library(qqplotr)
library(ggplot2)

ggplot(cereal, aes(sample = calories)) +
  stat_qq_band(conf = 0.95) +
  stat_qq_line() +
  stat_qq_point() +
  labs(
    title = "Normal Q–Q Plot with Confidence Bands",
    x = "Theoretical Quantiles",
    y = "Sample Quantiles"
  ) +
  theme_minimal(base_size = 16)

QQ-plot produced using ggplot and qqplotr

Solution
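One way to carry out the computation in R (a sketch using the values stated in Exercise 5):

```r
# Exercise 5: n = 12, xbar = 109.17, sigma = 19.8, 98% CI
xbar <- 109.17; sigma <- 19.8; n <- 12

z  <- qnorm(0.99)            # z_{0.01}, about 2.326
me <- z * sigma / sqrt(n)    # margin of error, about 13.3

c(lower = xbar - me, upper = xbar + me)
# roughly (95.9, 122.5)
```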

Interpretation

We are 98% confident that the true mean calories per serving in breakfast cereals lies somewhere between 95.9 and 122.5 calories.

Interpretation matters

Confidence intervals must be interpreted in context. Generic statements such as

We are 98% confident that \(\mu\) lies within this interval.

without referencing the population and units will receive partial credit or no credit on tests and assignments.

Wrong interpretation 1

❌ We are 98% confident that the sample mean calories per serving falls within this interval.

Why this is wrong

  • The CI is about the population mean \(\mu\), NOT a statistic (\(\bar X\))
  • The sample mean \(\bar x\) is already known and fixed

Wrong interpretation 2

❌ There is a 98% probability that the true mean calories per serving lies in this interval.

Why this is wrong

Once the interval is computed, it is fixed. Probability refers to the procedure, not to this particular interval.

Wrong interpretation 3

❌ 98% of breakfast cereals have calorie counts that fall within 95.9 and 122.5 calories.

Why this is wrong

A confidence interval for the mean does not describe the distribution of individual observations or the proportion of the population within the interval.

Non-symmetric CI

  • The CIs we have constructed so far are symmetric: they split \(\alpha\) evenly between the two tails.

  • This is not a requirement; however, it is the most common choice.

  • Consider instead splitting a 5% significance level to have 2% in the lower tail and 3% in the upper tail…

Symmetric vs Non-symmetric CI

Q: What would be an advantage of using the symmetric confidence interval on the left over the non-symmetric confidence interval on the right?


One-sided CI

In the most extreme case, we place all of \(\alpha\) in one tail, resulting in a one-sided confidence interval.

There are two natural versions:

  • Lower one-sided CI: \(\left[\bar x - z_{\alpha}\cdot \frac{\sigma}{\sqrt{n}},\ \infty\right)\)
  • Upper one-sided CI: \(\left(-\infty,\ \bar x + z_{\alpha}\cdot \frac{\sigma}{\sqrt{n}}\right]\)

In both cases, the confidence interval has infinite width.
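As a sketch, reusing the numbers from Exercise 1, the one-sided 95% bounds are computed with \(z_{0.05}\) rather than \(z_{0.025}\):

```r
# One-sided 95% confidence bounds (illustrative, Exercise 1 values)
xbar <- 21.6; sigma <- 5.1; n <- 100

z <- qnorm(0.95)                  # z_{0.05}, about 1.645

xbar - z * sigma / sqrt(n)        # lower bound: about 20.76, so [20.76, Inf)
xbar + z * sigma / sqrt(n)        # upper bound: about 22.44, so (-Inf, 22.44]
```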