stat205 – Finite Populations, and Choosing a Sample Size

Introduction

In this lecture, we will cover mathematical results related to the concepts introduced in the previous lecture.
We will also discuss the Finite Population Correction, which is needed in some special circumstances.
Finally, we will cover sample size calculations.

Assumptions for the Sampling Distribution of \(\bar X\)

To use the sampling distribution of the sample mean \(\bar X\), we assume:

The data are obtained from a random sample.
One of the following holds:
- The population distribution is normal, or
- The sample size \(n\) is sufficiently large (CLT conditions).

Sampling distribution of \(\bar X\)

Under these assumptions the sample mean \(\bar X\) is approximately¹ Normal with mean \(\mu\) and a standard error \(\sigma_{\bar X} = \frac{\sigma}{\sqrt{n}}\).

Expected value of \(\bar X\)

Exercise 1 Let \(X_1, \dots, X_n\) be independent and identically distributed (i.i.d.) random variables with mean \(\mu\) and variance \(\sigma^2\). Show that

\[\mu_{\bar X} = \mathbb{E}[\bar X] = \mu\]

Variance of \(\bar X\)

Let \(X_1, \dots, X_n\) be independent and identically distributed (i.i.d.) random variables with mean \(\mu\) and variance \(\sigma^2\). Show that

\[\sigma^2_{\bar X} = \operatorname{Var}[\bar X] = \frac{\sigma^2}{n}\]
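The two results above can be checked numerically before proving them. The following is a minimal simulation sketch, assuming a Normal population; the function name `simulate_sample_means` and the values of `mu`, `sigma`, and `n` are illustrative choices, not part of the lecture.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Return the means of `reps` i.i.d. samples of size n from Normal(mu, sigma)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

# Illustrative values only; they do not come from the lecture.
mu, sigma, n = 50.0, 12.0, 36
means = simulate_sample_means(mu, sigma, n, reps=10_000)

empirical_mean = statistics.fmean(means)     # should be close to mu
empirical_var = statistics.pvariance(means)  # should be close to sigma^2 / n
theoretical_var = sigma**2 / n

print(f"mean of sample means:     {empirical_mean:.3f}  (theory: {mu})")
print(f"variance of sample means: {empirical_var:.3f}  (theory: {theoretical_var:.3f})")
```

A simulation like this does not replace the proof, but it is a quick sanity check that the algebra is heading toward the right answer.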

SWR

The standard error \(\sigma_{\bar X} = \frac{\sigma}{\sqrt{n}}\) assumes that the observations are independent and identically distributed (i.i.d.).
The i.i.d. property is guaranteed by sampling with replacement.

A sampling with replacement (SWR) scheme. Source: Towards Data Science: An Introduction to Probability Sampling Methods.

SWOR

In practice, we often sample without replacement (SWOR).
SWOR induces dependence; this is negligible for large populations but violates i.i.d. in small finite populations.

Sampling with replacement (SWR) vs. sampling without replacement (SWOR). Source: twd

Large populations

SWR vs. SWOR on a large population (\(N = 100{,}000\)). The dependence introduced by SWOR in large populations is negligible. Image generated by ChatGPT.

When SWOR matters

If all three of the following conditions hold, a Finite Population Correction to the standard error calculation is needed \(\dots\)

Conditions for using the FPC (all must hold)

Finite population: population size \(N\) is known and fixed
SWOR: Each sampled unit is removed from the population
Large sampling fraction: A common rule of thumb is that the sample comprises at least 5% of the population¹, i.e. when

\[n/N \geq 0.05\]

Finite Population Correction

Finite Population Correction (FPC)

When all three conditions for using the FPC hold, the usual standard error should be multiplied by a correction factor:

\[ \sigma_{\bar X} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}} \]

where \(\sqrt{\frac{N-n}{N-1}}\) is called the finite population correction factor (FPC).
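To see the correction at work, the sketch below draws repeated samples without replacement from a small made-up population and compares the observed spread of \(\bar X\) with the uncorrected and corrected standard errors. The population (uniform values on 0 to 100), \(N = 200\), and \(n = 80\) are assumptions chosen for illustration only.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed for reproducibility

# Hypothetical finite population; N and n are illustrative, with n/N = 0.4.
N, n = 200, 80
population = [rng.uniform(0, 100) for _ in range(N)]
sigma = statistics.pstdev(population)  # population SD (divisor N)

# Random.sample draws WITHOUT replacement, i.e. SWOR.
means = [statistics.fmean(rng.sample(population, n)) for _ in range(10_000)]
empirical_se = statistics.pstdev(means)

uncorrected_se = sigma / math.sqrt(n)
corrected_se = uncorrected_se * math.sqrt((N - n) / (N - 1))

print(f"empirical SE of the mean: {empirical_se:.3f}")
print(f"sigma / sqrt(n):          {uncorrected_se:.3f}")
print(f"with FPC applied:         {corrected_se:.3f}")
```

With a sampling fraction this large, the empirical value should sit close to the corrected standard error and clearly below the uncorrected one.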

Note

When the sample size is small relative to the population size, FPC \(\approx 1\), so the correction has almost no effect.

Calculating an FPC

Calculating an FPC (small sampling fraction)

Exercise 2 Find the value of the FPC for \(n = 30\) and \(N = 300\).

Calculating an FPC

Calculating an FPC (large sampling fraction)

Exercise 3 Find the value of the FPC for \(n = 180\) and \(N = 300\).

iClicker: FPC

FPC

For some fixed population size \(N\), as the sampling fraction \(n/N\) increases, what happens to the FPC?

A. It decreases toward 0
B. It increases toward 1
C. It stays constant
D. It depends

✅ Correct answer: A. The more of the population you sample, the less variability remains.
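The behaviour in the question above can be tabulated directly from the FPC formula. This is a small sketch; the helper name `fpc` and the choice \(N = 1000\) are assumptions made for illustration.

```python
import math

def fpc(N, n):
    """Finite population correction factor sqrt((N - n) / (N - 1))."""
    if N < 2 or not 1 <= n <= N:
        raise ValueError("need N >= 2 and 1 <= n <= N")
    return math.sqrt((N - n) / (N - 1))

N = 1000  # hypothetical population size
sample_sizes = (1, 10, 50, 100, 500, 900, 1000)

for n in sample_sizes:
    print(f"n/N = {n / N:5.3f}   FPC = {fpc(N, n):.4f}")
```

The factor equals 1 when \(n = 1\), stays close to 1 for small sampling fractions, and reaches 0 when the whole population is sampled (\(n = N\)).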

How big should my sample be?

One question that we should also think about is

“How many people do I need to survey?”

This is not guesswork.
We can determine an appropriate sample size directly from the confidence interval formula.

Confidence Interval Formula

For a mean (with \(\sigma\) known)

\[ \bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]

The part that controls the width is:

\[ z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]

Recall: this is called the margin of error (ME).
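As a rough sketch of how the margin of error and the interval could be computed, the code below uses Python's standard-library `statistics.NormalDist` to obtain \(z_{\alpha/2}\). The function names and the example inputs (\(\bar X = 27\), \(\sigma = 12\), \(n = 36\)) are hypothetical.

```python
import math
from statistics import NormalDist

def z_critical(confidence):
    """Return z_{alpha/2} for a two-sided interval at the given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def margin_of_error(sigma, n, confidence=0.95):
    """ME = z_{alpha/2} * sigma / sqrt(n), for a mean with sigma known."""
    return z_critical(confidence) * sigma / math.sqrt(n)

def confidence_interval(xbar, sigma, n, confidence=0.95):
    """Return (lower, upper) = xbar -/+ ME."""
    me = margin_of_error(sigma, n, confidence)
    return xbar - me, xbar + me

# Hypothetical inputs, chosen only to exercise the functions.
lower, upper = confidence_interval(xbar=27.0, sigma=12.0, n=36)
print(f"z for 95% confidence: {z_critical(0.95):.4f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

Note that `z_critical(0.95)` is slightly below the rounded table value 1.96, so hand calculations with 1.96 may differ in the last decimal places.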

Determining an appropriate \(n\)

Instead of computing the ME from \(n\), we decide how much error we are willing to tolerate:

\[ ME = E \]

where \(E\) is the maximum acceptable error.

Determining an appropriate \(n\)

We decide on \(E\) and solve for \(n\):

\[ \begin{align} ME &= E\\ z_{\alpha/2}\frac{\sigma}{\sqrt{n}} &= E\\ \vdots \end{align} \]
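One way to fill in the remaining algebra is to isolate \(\sqrt{n}\) and then square both sides, which is valid because every quantity involved is positive:

\[ \begin{aligned} z_{\alpha/2}\frac{\sigma}{\sqrt{n}} &= E\\ \sqrt{n} &= \frac{z_{\alpha/2}\,\sigma}{E}\\ n &= \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 \end{aligned} \]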

What Controls Sample Size

\[ n = \left(\frac{z_{\alpha/2}\sigma}{E}\right)^2 \]

Sample size gets larger when:

Confidence level increases (\(z_{\alpha/2}\) bigger)
Population variability increases (\(\sigma\) bigger)
Desired error gets smaller (\(E\) smaller)
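The three effects listed above can be seen by changing one input at a time. This is a minimal sketch; the function name `required_n` and the baseline values (\(z = 1.96\), \(\sigma = 10\), \(E = 2\)) are assumptions for illustration, and the function returns the raw, unrounded value of the formula.

```python
def required_n(z, sigma, E):
    """Unrounded sample size n = (z * sigma / E)^2 for a mean with sigma known."""
    if E <= 0:
        raise ValueError("E must be positive")
    return (z * sigma / E) ** 2

# Hypothetical baseline: 95% confidence, sigma = 10, E = 2.
baseline = required_n(1.96, 10, 2)
higher_confidence = required_n(2.576, 10, 2)  # 99% confidence instead of 95%
more_variability = required_n(1.96, 20, 2)    # sigma doubled
smaller_error = required_n(1.96, 10, 1)       # E halved

print(f"baseline:          {baseline:.2f}")
print(f"higher confidence: {higher_confidence:.2f}")
print(f"more variability:  {more_variability:.2f}")
print(f"smaller error:     {smaller_error:.2f}")
```

Because \(E\) appears squared in the denominator, halving the acceptable error quadruples the required sample size, and doubling \(\sigma\) has the same effect.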

Example

Commute Time

Example 1 A researcher wants to estimate the average commute time. How large a sample is needed, given the following?

\(\sigma = 12\) minutes (from past data)
95% confidence \(\rightarrow z_{0.025} = 1.96\)
The estimate should be within 3.5 minutes of the true mean (\(E = 3.5\)).

Solution

\[ \begin{align} n &= \left(\frac{1.96 \times 12}{3.5}\right)^2\\ &= (6.72)^2\\ & = 45.1584 \end{align} \]

iClicker

The sample size required to achieve a margin of error (ME) of 3.5 minutes is: (choose the most correct answer)

A. 45.1584
B. 46
C. 45
D. None of the above

Always Round Up

Always Round Up for Sample Size

When performing sample size calculations, you will almost always get a decimal.

ALWAYS ROUND UP to the next whole number.

Why? Rounding down would make the margin of error larger than the value you promised, meaning

\[ ME > E \]

and your sample would no longer achieve the desired precision.
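The effect of rounding can be checked with the numbers from the commute time example (\(z = 1.96\), \(\sigma = 12\), \(E = 3.5\)). The sketch below is one way to do that check; the variable names and the helper `margin_of_error` are illustrative.

```python
import math

# Values from the commute time example.
z, sigma, E = 1.96, 12, 3.5

def margin_of_error(n):
    """ME = z * sigma / sqrt(n) for the example's z and sigma."""
    return z * sigma / math.sqrt(n)

raw_n = (z * sigma / E) ** 2  # the unrounded result of the formula
n_down = math.floor(raw_n)    # rounding down
n_up = math.ceil(raw_n)       # rounding up

print(f"raw n = {raw_n:.4f}")
print(f"n = {n_down}: ME = {margin_of_error(n_down):.4f}")
print(f"n = {n_up}: ME = {margin_of_error(n_up):.4f}")
```

Rounding down leaves the margin of error slightly above 3.5 minutes, while rounding up brings it below the target.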

A Final Note

We introduced the FPC and derived the sample size formula for a mean with \(\sigma\) known.
These were specific examples, but the underlying ideas are much more general.
The same reasoning can be applied to (for example):
- Means when \(\sigma\) is unknown (using \(s\) as an estimate),
- Population proportions,
- Other parameters with different confidence interval formulas \(\dots\)

References

Johnson, R. A., I. Miller, and J. E. Freund. 2000. Miller & Freund’s Probability and Statistics for Engineers. Prentice Hall. https://books.google.ca/books?id=yaxQPwAACAAJ.