STAT 205: Introduction to Mathematical Statistics
University of British Columbia Okanagan
In this lecture, we will cover mathematical results related to the concepts introduced in the previous lecture.
We will also discuss an important Finite Population Correction which will be needed in some special circumstances.
Finally, we will look at sample size calculations.
Assumptions for the Sampling Distribution of \(\bar X\)
To use the sampling distribution of the sample mean \(\bar X\), we assume:
Sampling distribution of \(\bar X\)
Under these assumptions the sample mean \(\bar X\) is approximately Normal with mean \(\mu\) and standard error \(\sigma_{\bar X} = \frac{\sigma}{\sqrt{n}}\).
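The claim above can be checked numerically. The following is a minimal simulation sketch, assuming a Normal population with hypothetical values \(\mu = 50\), \(\sigma = 10\) and \(n = 25\) (none of these come from the lecture); the function name `simulate_sample_means` is illustrative only.

```python
import math
import random
import statistics


def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Draw `reps` i.i.d. samples of size n and return their sample means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(reps)
    ]


# Hypothetical population parameters, chosen only for illustration.
mu, sigma, n = 50.0, 10.0, 25

means = simulate_sample_means(mu, sigma, n, reps=5000)

# The first value should be close to mu, the second close to sigma / sqrt(n).
print("mean of sample means:", round(statistics.fmean(means), 3))
print("SD of sample means:  ", round(statistics.stdev(means), 3))
print("sigma / sqrt(n):     ", sigma / math.sqrt(n))
```

With enough repetitions, the simulated mean and standard deviation of \(\bar X\) should land near \(\mu\) and \(\sigma/\sqrt{n}\); the match is approximate because the simulation itself is random.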
Expected value of \(\bar X\)
Exercise 1 Let \(X_1, \dots, X_n\) be independent identically distributed random variables from the same distribution with mean \(\mu\) and variance \(\sigma^2\). Show that
\[\mu_{\bar X} = \mathbb{E}[\bar X] = \mu\]
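One possible route for this exercise, sketched here as a guide rather than as the official solution, uses the definition \(\bar X = \frac{1}{n}\sum_{i=1}^{n} X_i\) and the linearity of expectation:

\[ \begin{align} \mathbb{E}[\bar X] &= \mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right]\\ &= \frac{1}{n}\sum_{i=1}^{n} \mathbb{E}[X_i]\\ &= \frac{1}{n}\,(n\mu)\\ &= \mu \end{align} \]

Note that this step only needs each \(X_i\) to have mean \(\mu\); independence is not used.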
Variance of \(\bar X\)
Let \(X_1, \dots, X_n\) be independent identically distributed random variables from the same distribution with mean \(\mu\) and variance \(\sigma^2\). Show that
\[\sigma^2_{\bar X} = \mathrm{Var}[\bar X] = \frac{\sigma^2}{n}\]
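A sketch of one way to show this, assuming the standard facts that \(\mathrm{Var}[aY] = a^2\,\mathrm{Var}[Y]\) and that the variance of a sum of independent random variables is the sum of their variances:

\[ \begin{align} \mathrm{Var}[\bar X] &= \mathrm{Var}\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right]\\ &= \frac{1}{n^2}\,\mathrm{Var}\left[\sum_{i=1}^{n} X_i\right]\\ &= \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{Var}[X_i] \qquad \text{(independence)}\\ &= \frac{1}{n^2}\,(n\sigma^2)\\ &= \frac{\sigma^2}{n} \end{align} \]

Taking the square root gives \(\sigma_{\bar X} = \frac{\sigma}{\sqrt{n}}\). The third line is the only place where independence is needed.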
The standard error \(\sigma_{\bar X} = \frac{\sigma}{\sqrt{n}}\) assumes that the samples are independent and identically distributed (i.i.d.).
The i.i.d. property is guaranteed by sampling with replacement.
A sampling with replacement (SWR) scheme. Source: Towards Data Science: An Introduction to Probability Sampling Methods.
In practice, we often sample without replacement (SWOR)
SWOR induces dependence; this is negligible for large populations but violates i.i.d. in small finite populations.
Sampling with replacement (SWR) vs. sampling without replacement (SWOR). Source: twd
SWR vs SWOR on a large population \(N = 100,000\). The dependence introduced by SWOR in large populations is negligible. Image generated by ChatGPT.
If all three of the following conditions hold, a Finite Population Correction to the standard error calculation is needed \(\dots\)
Conditions for using the FPC (all must hold)
Finite population: population size \(N\) is known and fixed
SWOR: Each sampled unit is removed from the population
Large sampling fraction: A common rule of thumb is that the sample comprises at least 5% of the population, i.e. when
\[n/N \geq 0.05\]
Finite Population Correction (FPC)
When all of the conditions for using the FPC hold, the usual standard error (SE) should be multiplied by a correction factor:
\[ SE \sqrt{\frac{N-n}{N-1}} \]
where \(\sqrt{\frac{N-n}{N-1}}\) is called the finite population correction factor (FPC).
Note
When the sample size is small relative to the population size, FPC \(\approx 1\)
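The effect of the correction can be illustrated with a small simulation. This is a sketch under stated assumptions: the population (200 values drawn uniformly between 0 and 100), the sample size of 100, and the helper names `fpc` and `swor_se_comparison` are all hypothetical and chosen only for illustration. Here \(\sigma\) is the population standard deviation computed with divisor \(N\).

```python
import math
import random
import statistics


def fpc(n, N):
    """Finite population correction factor sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))


def swor_se_comparison(N=200, n=100, reps=3000, seed=1):
    """Compare the simulated SE of the mean under SWOR with two formulas.

    Returns (empirical SE, uncorrected sigma/sqrt(n), FPC-corrected SE).
    """
    rng = random.Random(seed)
    # Hypothetical finite population of N values.
    population = [rng.uniform(0, 100) for _ in range(N)]
    sigma = statistics.pstdev(population)  # population SD (divisor N)

    # rng.sample draws WITHOUT replacement.
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]

    empirical = statistics.pstdev(means)
    uncorrected = sigma / math.sqrt(n)
    corrected = uncorrected * fpc(n, N)
    return empirical, uncorrected, corrected


empirical, uncorrected, corrected = swor_se_comparison()
print("simulated SE under SWOR:", round(empirical, 3))
print("sigma / sqrt(n):        ", round(uncorrected, 3))
print("with FPC applied:       ", round(corrected, 3))
```

With a sampling fraction of 50%, the simulated standard error should sit much closer to the FPC-corrected value than to the uncorrected \(\sigma/\sqrt{n}\).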
Calculating an FPC (smaller sampling fraction)
Exercise 2 Find the value of the FPC for \(n = 30\) and \(N = 300\).
Note
Exercise 3 (Calculating an FPC (large sampling fraction)) Find the value of the FPC for \(n = 180\) and \(N = 300\).
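As a check on the arithmetic for the two FPC exercises, substituting directly into \(\sqrt{\frac{N-n}{N-1}}\) gives (values rounded to three decimals):

\[ \begin{align} n = 30,\ N = 300: \quad \sqrt{\frac{300-30}{300-1}} &= \sqrt{\frac{270}{299}} \approx 0.950\\ n = 180,\ N = 300: \quad \sqrt{\frac{300-180}{300-1}} &= \sqrt{\frac{120}{299}} \approx 0.634 \end{align} \]

The sampling fractions are \(30/300 = 0.10\) and \(180/300 = 0.60\), so both meet the \(n/N \geq 0.05\) rule of thumb, but the correction matters far more in the second case.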
FPC
For some fixed population size \(N\), as the sampling fraction \(n/N\) increases, what happens to the FPC?
✅ Correct answer: A. The FPC decreases: the more of the population you see, the less variability remains.
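To see how the FPC behaves as the sampling fraction grows, the short sketch below tabulates it for a fixed population size. The choice \(N = 300\) and the particular sample sizes are assumptions made for illustration, and `fpc` is a hypothetical helper name.

```python
import math


def fpc(n, N):
    """Finite population correction factor sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))


N = 300  # fixed population size (illustrative)
sample_sizes = (3, 15, 30, 75, 150, 225, 300)

print(f"{'n':>4} {'n/N':>6} {'FPC':>6}")
for n in sample_sizes:
    print(f"{n:>4} {n / N:>6.2f} {fpc(n, N):>6.3f}")
```

The factor starts near 1 for tiny sampling fractions and shrinks to exactly 0 when \(n = N\), which corresponds to a census with no sampling variability left.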
One question that we should also think about is
“How many people do I need to survey?”
This is not guesswork.
We can determine the required sample size directly from the confidence interval formula.
For a mean (with \(\sigma\) known)
\[ \bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]
The part that controls the width is:
\[ z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]
Recall: this is called the margin of error (ME).
Instead of computing ME from \(n\)
we decide how much error we are willing to tolerate:
\[ ME = E \]
where \(E\) is the maximum acceptable error.
We decide on \(E\) and solve for \(n\)
\[ \begin{align} ME &= E\\ z_{\alpha/2}\frac{\sigma}{\sqrt{n}} &= E\\ \vdots \end{align} \]
\[ n = \left(\frac{z_{\alpha/2}\sigma}{E}\right)^2 \]
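One way to fill in the intermediate algebra, isolating \(\sqrt{n}\) and then squaring both sides:

\[ \begin{align} z_{\alpha/2}\frac{\sigma}{\sqrt{n}} &= E\\ z_{\alpha/2}\,\sigma &= E\sqrt{n}\\ \sqrt{n} &= \frac{z_{\alpha/2}\,\sigma}{E}\\ n &= \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 \end{align} \]

Each step is valid because \(E > 0\) and \(n > 0\).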
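Sample size gets larger when: the confidence level increases (larger \(z_{\alpha/2}\)), the population standard deviation \(\sigma\) is larger, or the maximum acceptable error \(E\) is smaller.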
Commute Time
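Example 1 A researcher wants to estimate the average commute time. With \(z_{\alpha/2} = 1.96\) (95% confidence), \(\sigma = 12\), and maximum acceptable error \(E = 3.5\):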
\[ \begin{align} n &= \left(\frac{1.96 \times 12}{3.5}\right)^2\\ &= (6.72)^2\\ & = 45.1584 \end{align} \]
iClicker
The sample size required to achieve a ME of 3.5 is: (choose the most correct answer)
45.1584
46
45
None of the above
Always Round Up for Sample Size
When performing sample size calculations you will almost always get a decimal.
ALWAYS ROUND UP to the next whole number
Why? Rounding down would make the margin of error larger than the value you promised, meaning
\[ ME > E \] and your sample would no longer achieve the desired precision.
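The round-up rule can be expressed as a short calculation. This is a minimal sketch; the function names `required_sample_size`, `margin_of_error` and `z_for_confidence` are hypothetical, and the inputs reuse the numbers from the commute time example (\(z_{\alpha/2} = 1.96\), \(\sigma = 12\), \(E = 3.5\)).

```python
import math
from statistics import NormalDist


def z_for_confidence(level):
    """Two-sided critical value z_{alpha/2} for a confidence level such as 0.95."""
    alpha = 1 - level
    return NormalDist().inv_cdf(1 - alpha / 2)


def required_sample_size(z, sigma, E):
    """Smallest whole n with z * sigma / sqrt(n) <= E (sigma known)."""
    n_exact = (z * sigma / E) ** 2
    return math.ceil(n_exact)  # always round UP


def margin_of_error(z, sigma, n):
    """Margin of error z * sigma / sqrt(n) for a mean with sigma known."""
    return z * sigma / math.sqrt(n)


z, sigma, E = 1.96, 12, 3.5
n = required_sample_size(z, sigma, E)
print("required n:", n)
print("ME with n - 1:", round(margin_of_error(z, sigma, n - 1), 4))
print("ME with n:    ", round(margin_of_error(z, sigma, n), 4))
```

Comparing the margin of error at \(n - 1\) and \(n\) shows why rounding down fails: only the rounded-up value keeps the margin of error at or below \(E\). The helper `z_for_confidence(0.95)` returns a value slightly below 1.96, which is the usual rounded table value.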
We covered the FPC and the sample size formula for a mean with \(\sigma\) known.
These were specific examples, but the underlying ideas are much more general.
The same reasoning can be applied to (for example):
Means when \(\sigma\) is unknown (using \(s\) as an estimate),
Population proportions,
Other parameters with different confidence interval formulas \(\dots\)