Statistical Inference for Proportions

STAT 205: Introduction to Mathematical Statistics

Dr. Irene Vrbik

University of British Columbia Okanagan

Introduction

So far, we have done inference for population means \(\mu\).

The exact same ideas extend to population proportions.

Population Sample Statistic Sampling Distribution
\(\mu\) \(\bar X\) \(\bar X \sim N(\mu, \sigma/\sqrt{n})\)
\(p\) \(\hat p\) \(\hat p \sim ?\)

Sampling Distribution of \(\hat p\)

What is the sampling distribution of \(\hat p\)?

Let’s look at an example to help us conceptualize.

Proportion of children diagnosed with ADHD

Suppose we are interested in the proportion of children in our region who have been diagnosed with ADHD.

Example: ADHD

Two possible outcomes:

  • ✅ the child has been diagnosed with ADHD

  • ❌ the child has NOT been diagnosed with ADHD

Binary outcome:

  • ✅ 1

  • ❌ 0


Population Distribution

We can denote random variable

\[ X_i = \begin{cases} 1 & \text{ADHD diagnosis}\\ 0 & \text{No diagnosis} \end{cases} \] If we take a simple random sample of size \(n\), our estimated proportion is:

\[ \hat p = \frac{X_1 + X_2 + \dots + X_n}{n} = \bar X \]
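This identity is immediate in R: the proportion of a 0/1 vector is just its mean (toy data for illustration):

```r
# A small vector of binary outcomes: 1 = ADHD diagnosis, 0 = no diagnosis
x <- c(1, 0, 0, 1, 0, 0, 0, 1, 0, 0)

# The sample proportion is the sample mean of the 0/1 data
p_hat <- mean(x)
p_hat  # 0.3 (3 successes out of 10)
```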

CLT callback

Under certain conditions, the CLT tells us that

  • \(\bar X\) is normally distributed with
  • \(\mu_{\bar X}\): mean equal to the population mean \(\mu\)
  • \(\sigma_{\bar X}\): standard error equal to the population standard deviation divided by the square root of the sample size, i.e. \(\sigma/\sqrt{n}\).

Population Distribution

So all that is left to do is recognize the population distribution and identify the mean and variance.

\[ X_i \sim \quad ? \]

Population Distribution

So all that is left to do is recognize the population distribution and identify the mean and variance.

\[ X_i \sim \text{Bernoulli}(p) \]

where \(p\) is the probability that a randomly selected child has an ADHD diagnosis.

Population Mean and Variance

For \(X_i \sim \text{Bernoulli}(p)\), \(\quad \quad \boxed{\text{recall } \bar X \sim N(\mu, \sigma/\sqrt{n})}\)

\[ \begin{align} \mathbb{E}[X_i] =\mu & = p &\mathrm{Var}(X_i) = \sigma^2 = p(1-p)\\ & & \implies \sigma = \sqrt{p(1-p)} \end{align} \]

\[ \hat p \sim N\left(p, \sqrt{\frac{p(1-p)}{n}} \right) \]

Sampling Distribution of \(\hat p\)

\[ \hat p \sim N\left(\mu_{\hat p} = p, \sigma_{\hat p} = \sqrt{\frac{p(1-p)}{n}} \right) \]

Conditions for Using the Sampling Distribution of \(\hat p\)

To ensure the CLT applies and the distribution of \(\hat p\) is approximately Normal, we need:

  1. Random sample / independent observations
  2. Large sample condition (Success–Failure condition)

\[ \begin{align} np \geq 10 && \text{and} && n(1-p) \geq 10 \end{align} \]
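A quick simulation illustrates the approximation (the values \(p = 0.13\) and \(n = 80\), which satisfy the success–failure condition, are illustrative):

```r
set.seed(205)
p <- 0.13  # illustrative values of p and n
n <- 80

# Draw 10,000 samples of size n and compute p-hat for each
p_hats <- rbinom(10000, size = n, prob = p) / n

mean(p_hats)           # close to p = 0.13
sd(p_hats)             # close to the theoretical standard error
sqrt(p * (1 - p) / n)  # theoretical standard error, about 0.0376
```

A histogram of `p_hats` would look approximately Normal, centred at \(p\).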

Recap of the Rejection Method

  • This test uses a test statistic, i.e., a function of the sample data used to decide whether to reject or fail to reject \(H_0\).

  • It requires critical value(s) which separate the

    • rejection region (RR): the set of observed test statistics for which \(H_0\) will be rejected

    • “acceptance”/fail-to-reject region: the set of observed test statistics for which we would fail to reject \(H_0\)

Null Distribution

Our base assumption for hypothesis testing is that \(H_0\) is true.

\[H_0: \mu = \mu_0\] null dist. for mean

\[ \begin{align} \bar X &\sim N(\mu_\bar X, \sigma_{\bar X})\\ \bar X &\sim N(\mu_0, \sigma/\sqrt{n}) \end{align} \]

\[H_0: p = p_0\] null dist. for proportions

\[ \begin{align} \hat p &\sim N(\mu_{\hat p}, \sigma_{\hat p})\\ \hat p &\sim N\left(p_0, \sqrt{\frac{p_0(1-p_0)}{n}}\right) \end{align} \]

Test Statistic

As before, we standardize our sample statistic to \(N(0,1)\).

standardization for mean

\[ Z = \frac{\bar X - \mu_{\bar X}}{\sigma_{\bar X}} \] where \(\bar X \sim N(\mu_{\bar X}, \sigma_{\bar X})\)

standardization for proportions

\[ Z = \frac{\hat p - \mu_{\hat p}}{\sigma_{\hat p}} \] where \(\hat p \sim N(\mu_{\hat p}, \sigma_{\hat p})\)

Test Statistic

As before, we standardize our sample statistic to \(N(0,1)\).

standardization for mean

\[ Z = \frac{\bar X - \mu_0}{\dfrac{\sigma}{\sqrt{n}}} \] where \(Z \sim N(0,1)\)

standardization for proportions

\[ Z = \frac{\hat p - p_0}{ \sqrt{\dfrac{p_0(1-p_0)}{n}} } \] where \(Z \sim N(0,1)\)

Rejection Regions

(Figures: the rejection region for a two-tail test occupies both tails, for an upper-tail test the right tail, and for a lower-tail test the left tail.)

Critical Value approach for proportions

  1. Check assumptions. If satisfied, state the hypotheses \[\begin{equation} H_0 : p = p_0 \quad \text{ vs. } \quad H_A: \begin{cases} p \neq p_0& \text{ two-sided test} \\ p < p_0&\text{ one-sided (lower-tail) test} \\ p > p_0&\text{ one-sided (upper-tail) test} \end{cases} \end{equation}\]

  2. Find critical value:

    \[\begin{cases} P(-z_{crit} < Z < z_{crit}) = 1 - \alpha &\text{ if } H_A: p \neq p_0 \\ P(Z < z_{crit}) = \alpha &\text{ if } H_A: p < p_0 \\ P(Z > z_{crit}) = \alpha &\text{ if } H_A: p > p_0 \end{cases}\]
  3. Compute the test statistic \(z_{obs} = \dfrac{\hat p - p_0}{\sqrt{p_0(1-p_0)/n}}.\)

  4. Conclusion: reject \(H_0\) if \(z_{obs} \in\) rejection region, otherwise, fail to reject \(H_0\).
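Steps 2–4 can be sketched as a small R function (the name `prop_z_test` is ours, shown for the two-sided case only):

```r
# Illustrative helper: critical-value approach for a two-sided proportion test
prop_z_test <- function(x, n, p0, alpha = 0.05) {
  p_hat <- x / n
  z_obs <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # step 3: test statistic
  z_crit <- qnorm(1 - alpha / 2)                   # step 2: e.g. 1.96 at alpha = 0.05
  reject <- abs(z_obs) > z_crit                    # step 4: is z_obs in the RR?
  list(z_obs = z_obs, z_crit = z_crit, reject = reject)
}
```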

Assumptions

Assumptions for Hypothesis Tests for Proportions

  1. We have a simple random sample
  2. The experiment can be modeled by the binomial distribution:
    • number of trials is fixed
    • trials are independent
    • only two possible outcomes (‘success’ and ‘failure’)
    • Probabilities constant for each trial
  3. Success–failure condition: \(np \geq 10\) and \(n(1-p) \geq 10\)

ChatGPT

Exercise 1 According to a November 2023 research survey conducted by Pew Research, about 13% of all U.S. teens have used the generative artificial intelligence (AI) chatbot ChatGPT in their schoolwork. Suppose we wish to investigate whether this is also the proportion of students on our campus using generative AI to do homework. We survey 80 randomly selected UBCO students and find that 14 admit to using generative AI to do homework. Perform a formal hypothesis test to determine whether the proportion at UBCO differs from that of U.S. teens.

Check Assumption

We need to check the success–failure condition: that \(np \geq 10\) and \(n(1-p) \geq 10\).

Important

We use the hypothesized value \(p_0\) for these checks.

Here \(n = 80\) and our hypothesized value is \(p_0 = 0.13\)

  • \(np_0 = 80 \times 0.13 = 10.4 \geq 10\) ✅

  • \(n(1-p_0) = 80 \times 0.87 = 69.6 \geq 10\) ✅

Solution to Exercise 1

  1. State null and alternative hypotheses

    \(H_0: p = 0.13\)

    \(H_A: p \neq 0.13\)

  2. Find the critical values: for a two-sided test at \(\alpha = 0.05\), \(\pm z_{0.025} = \pm 1.96\)

  3. Calculate the test statistic:

    \[ \begin{equation} Z_{obs} = \dfrac{\hat p - p_0}{\sqrt{p_0(1-p_0)/n}} = \dfrac{0.175 - 0.13}{\sqrt{0.13(1-0.13)/80}} \approx 1.197 \end{equation} \]

  4. Conclusion: since \(|z_{obs}| \approx 1.197 < 1.96\), the test statistic does not fall in the rejection region, so we fail to reject \(H_0\).
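The arithmetic for this exercise can be checked in R:

```r
p0 <- 0.13; n <- 80; x <- 14
p_hat <- x / n                                    # 0.175
z_obs <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # about 1.1968
z_crit <- qnorm(0.975)                            # about 1.96
abs(z_obs) > z_crit                               # FALSE: fail to reject H0
```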

P-value

Alternatively we could have used the \(p\)-value approach …

\(p\)-value

Definition 1 The \(p\)-value is the probability, calculated assuming that the null hypothesis is true, of obtaining a value of the test statistic at least as contradictory to \(H_0\) as the value calculated from the available sample.

In other words, it quantifies the chances of obtaining the observed data or data more favorable to the alternative than our current data set if the null hypothesis were true.

\(p\)-value approach for proportions

  1. State hypotheses \[\begin{equation} H_0 : p = p_0 \quad \text{ vs. } \quad H_A: \begin{cases} p \neq p_0& \text{ two-sided test} \\ p < p_0&\text{ one-sided (lower-tail) test} \\ p > p_0&\text{ one-sided (upper-tail) test} \end{cases} \end{equation}\]

  2. Compute the test statistic \(z_{obs} = \dfrac{\hat p - p_0}{\sqrt{p_0(1-p_0)/n}}.\)

  3. Calculate the \(p\)-value

    \[\begin{cases} 2P(Z \geq |z_{obs}|) &\text{ if } H_A: p \neq p_0 \\ P(Z \leq z_{obs}) &\text{ if } H_A: p < p_0 \\ P(Z \geq z_{obs}) &\text{ if } H_A: p > p_0 \end{cases}\]
  4. Conclusion: reject \(H_0\) if \(p\)-value is less than \(\alpha\) (typically 0.05), otherwise, fail to reject \(H_0\).

Redo example using p-values

Returning to Exercise 1, and using the same hypotheses, the \(p\)-value can be calculated as follows:

\[\begin{align} 2P(Z \geq |z_{obs}|) &= 2P(Z \geq |1.1968127|) \\ &= 2 (0.1156898)\\ &= 0.2313796\\ \end{align}\]

Since this \(p\)-value (\(\approx 0.231)\) is greater than \(\alpha = 0.05\), we fail to reject \(H_0\).
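This calculation can be reproduced in R with `pnorm`:

```r
z_obs <- 1.1968127                                 # test statistic from Exercise 1
p_value <- 2 * pnorm(abs(z_obs), lower.tail = FALSE)  # two-sided p-value
p_value  # about 0.2314, greater than alpha = 0.05
```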

Evidence

  • If 13% of UBCO students use ChatGPT to do homework, we would expect \(0.13\times80= 10.4\) of our survey respondents to answer “yes”.

  • As our sample moves farther from that expected value, the null hypothesis becomes less and less plausible.

  • This provides increasing evidence against the null \(H_0\).

(Figures: the distribution of the test statistic for testing \(H_0: p = 0.13\) vs \(H_A: p \neq 0.13\). Shaded in yellow is the \(p\)-value as the observed number of yes’s increases from 14 to 20; the shaded area, and hence the \(p\)-value, shrinks as the observed count moves farther from the expected 10.4.)

Comments

When the observed test statistic falls in the rejection region, we will necessarily have a significant \(p\)-value, and vice versa. Observing…

  • 21 out of 80 (i.e. \(\hat p = \frac{21}{80}\) = 0.2625) yields a very small \(p\)-value (<0.001). \(\implies\) very strong evidence against the null hypothesis \(H_0\).
  • 17 out of 80 (i.e. \(\hat p = \frac{17}{80}\) = 0.2125 ) yields a significant \(p\)-value (0.028), but the evidence is not as strong as a sample with 21 “yes”s.
  • 16 out of 80 (i.e. \(\hat p = \frac{16}{80}\) = 0.2) yields an almost significant \(p\)-value (0.063). Hence we have insufficient evidence against \(H_0\) but may still have our suspicions (“weak evidence”).

Interpreting \(p\)-values

Here are some guidelines for using the \(p\)-value to assess the evidence against the null hypothesis at \(\alpha = 0.05\).
\(p\)-value Evidence against \(H_0\) Significance code in R
\(0.1 \leq p \leq 1\) no evidence
\(0.05 < p \leq 0.10\) weak evidence .
\(0.01 < p \leq 0.05\) sufficient evidence *
\(0.001 < p \leq 0.01\) strong evidence **
\(0< p \leq 0.001\) very strong evidence ***

Confidence Interval for p

We build a \(100(1-\alpha)\%\) confidence interval based on:

\[ \begin{align} \text{point estimate} &\pm \textcolor{red}{\boxed{\text{Margin of Error}}}\\ \hat p \ &\pm \textcolor{red}{\boxed{z_{\alpha/2} \times \sqrt{\frac{\hat p(1-\hat p)}{n}}}} \end{align} \]

Alternatively, we could express our CI as:

\[ (\hat p - \textcolor{red}{\boxed{\text{ME}}}, \hat p + \textcolor{red}{\boxed{\text{ME}}}) \]

Important Note

Hypothesis tests use \(p_0\) in the standard error \[ \sigma_{\hat p} = \sqrt{\frac{p_0(1-p_0)}{n}} \]

Confidence Intervals use \(\hat p\) in the standard error \[ \sigma_{\hat p} = \sqrt{\frac{\hat p(1-\hat p)}{n}} \]

Confidence Interval for p

Exercise 1: 95% CI based on 14 out of 80 yes’s \(\left(\frac{14}{80} = 0.175\right)\)

\[ \begin{align} \hat p &\pm z_{\alpha/2} \sqrt{\frac{\hat p(1-\hat p)}{n}}\\ 0.175 &\pm 1.96 \sqrt{\frac{0.175(1- 0.175)}{80}}\\ 0.175 &\pm 0.08326396 \\ [0.0917&, 0.2583] \end{align} \]

Since \(p_0 = 0.13\) lies within this CI, we would fail to reject the null hypothesis that \(p = 0.13\).
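The Wald interval is straightforward to compute directly in R (the slide rounds \(z_{0.025}\) to 1.96; `qnorm` gives 1.959964, so the margin of error differs only in the sixth decimal):

```r
x <- 14; n <- 80; alpha <- 0.05
p_hat <- x / n
me <- qnorm(1 - alpha / 2) * sqrt(p_hat * (1 - p_hat) / n)  # margin of error
me                          # about 0.0833
c(p_hat - me, p_hat + me)   # about (0.0917, 0.2583)
```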

Confidence Interval for p

Exercise 1: 95% CI based on 17 out of 80 yes’s \(\left(\frac{17}{80} = 0.2125\right)\)

\[ \begin{align} \hat p &\pm z_{\alpha/2} \sqrt{\frac{\hat p(1-\hat p)}{n}}\\ 0.2125 &\pm 1.96 \sqrt{\frac{0.2125(1- 0.2125)}{80}}\\ 0.2125 &\pm 0.08964289 \\ [0.1229&, 0.3021] \end{align} \]

Since \(p_0 = 0.13\) does not lie within this CI, we would have sufficient evidence to reject the null hypothesis that \(p = 0.13\).

iClicker

Margin of Error for a Proportion I

Suppose the sample proportion value changed from \(\hat p=0.2125\) to \(\hat p=0.4\). What happens to the margin of error? (Assume the same \(\alpha\) and \(n\))

  1. The ME becomes narrower
  2. The ME becomes wider
  3. The ME stays the same
  4. There is not enough information to say

iClicker

Margin of Error for a Proportion II

Suppose the sample proportion value changed from \(\hat p=0.2125\) to \(\hat p=0.9\). What happens to the margin of error? (Assume the same \(\alpha\) and \(n\))

  1. The ME becomes narrower
  2. The ME becomes wider
  3. The ME stays the same
  4. There is not enough information to say

iClicker

Margin of Error for a Proportion III

All else equal, for which value of \(p\) is the standard error largest?

  1. \(p=0.10\)
  2. \(p=0.25\)
  3. \(p=0.50\)
  4. \(p=0.80\)
  5. None of the above
  6. It depends

Sample Size

As we have seen before, we can do sample size calculations to achieve a desired margin of error.

Lightbulbs

Exercise 2 Suppose that we want to estimate the true proportion of defective light bulbs in a very large shipment, and that we want to be at least 95% confident that the error in our estimate is at most 0.03. How large a sample will we need if …

  1. we have no idea what the true proportion might be?
  2. we know that the true proportion does not exceed 0.08?

Solving for \(n\)

\[ \begin{align} \text{point estimate} &\pm \textcolor{red}{\boxed{\text{Margin of Error}}}\\ \hat p \ &\pm \textcolor{red}{\boxed{z_{\alpha/2} \times \sqrt{\frac{p(1-p)}{n}}}} \end{align} \]

If we have a desired margin of error \(E\), we simply solve for \(n\):

\[ \begin{align} z_{\alpha/2} \times \sqrt{\frac{p(1-p)}{n}} &= E\\ \sqrt{\frac{p(1-p)}{n}} &= \frac{E}{z_{\alpha/2}}\\ \frac{p(1-p)}{n} &= \left(\frac{E}{z_{\alpha/2}}\right)^2\\ \implies n &= p(1-p) \left(\frac{z_{\alpha/2}}{E}\right)^2 \end{align} \]

Solution 1

Using the conservative formula (worst case when \(p = 0.5\)),

Code
alpha = 0.05
E = 0.03
zcrit = qnorm(alpha/2, lower.tail = FALSE)  # 1.959964
p = 0.5  # conservative worst case
n = p*(1-p)*(zcrit/E)^2
nup = ceiling(n)  # 1068

\[ \begin{align} n &= p(1-p) \left(\frac{z_{\alpha/2}}{E}\right)^2\\ &= 0.5(1-0.5) \left(\frac{z_{0.025}}{0.03}\right)^2\\ &= 0.5(1-0.5) \left(\frac{1.959964}{0.03}\right)^2\\ &= 1067.0718946 \implies n \text{ must be at least } 1068 \end{align} \]

Solution 2

Using the formula with the prior information that \(p\) is at most 0.08, we plug in the worst case pknown = 0.08 (in the code below); any smaller value of \(p\) leads to a smaller required sample size.

Code
pknown = 0.08
n = pknown*(1-pknown)*(zcrit/E)^2
nup = ceiling(n)  # 315

\[ \begin{align} n &= p(1-p) \left(\frac{z_{\alpha/2}}{E}\right)^2\\ &= 0.08(1-0.08) \left(\frac{z_{0.025}}{0.03}\right)^2\\ &= 0.0736 \left(\frac{1.959964}{0.03}\right)^2\\ &= 314.1459658 \implies n \text{ must be at least } 315 \end{align} \]
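Both calculations can be wrapped in a small helper function (the name `sample_size_prop` is ours):

```r
# Illustrative helper: required sample size for margin of error E at
# confidence level 1 - alpha, given a planning value for p
# (p = 0.5 is the conservative worst case)
sample_size_prop <- function(E, alpha = 0.05, p = 0.5) {
  z <- qnorm(1 - alpha / 2)
  ceiling(p * (1 - p) * (z / E)^2)
}

sample_size_prop(E = 0.03)            # 1068 (no prior information)
sample_size_prop(E = 0.03, p = 0.08)  # 315 (p known to be at most 0.08)
```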

Tests for Proportions in R

prop.test(x, n, p = NULL,
          alternative = c("two.sided", "less", "greater"),
          conf.level = 0.95, correct = TRUE)
  • x: count of successes
  • n: count of trials
  • p: the hypothesized value \(p_0\)
  • conf.level: the confidence level \(1-\alpha\)
  • correct: whether Yates’ continuity correction is applied

Example: x = 14

Note that when we run this test, the \(p\)-value agrees with what we obtained earlier; however, the test statistic and CI do not. We will return to this later.

prop.test(x = 14, n = 80, p = 0.13, correct = FALSE)

    1-sample proportions test without continuity correction

data:  14 out of 80, null probability 0.13
X-squared = 1.4324, df = 1, p-value = 0.2314
alternative hypothesis: true p is not equal to 0.13
95 percent confidence interval:
 0.1072064 0.2725754
sample estimates:
    p 
0.175 
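A quick preview of why the \(p\)-values match: without continuity correction, the X-squared statistic reported by prop.test is the square of our \(z\) statistic, so the two tests are equivalent. (The CI differs because prop.test reports a Wilson score interval rather than the Wald interval we computed.)

```r
# Our z statistic for x = 14, n = 80, p0 = 0.13
z_obs <- (14/80 - 0.13) / sqrt(0.13 * (1 - 0.13) / 80)
z_obs^2  # about 1.4324, the X-squared value reported above
```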

Example: x = 17


prop.test(x = 17, n = 80, p = 0.13, correct = FALSE)

    1-sample proportions test without continuity correction

data:  17 out of 80, null probability 0.13
X-squared = 4.8143, df = 1, p-value = 0.02822
alternative hypothesis: true p is not equal to 0.13
95 percent confidence interval:
 0.1371239 0.3142216
sample estimates:
     p 
0.2125 

Example: x = 20


prop.test(x = 20, n = 80, p = 0.13, correct = FALSE)

    1-sample proportions test without continuity correction

data:  20 out of 80, null probability 0.13
X-squared = 10.186, df = 1, p-value = 0.001415
alternative hypothesis: true p is not equal to 0.13
95 percent confidence interval:
 0.1680623 0.3548467
sample estimates:
   p 
0.25 

Big Picture

  • Proportion inference is just mean inference on 0/1 data

  • The quality of the Normal approximation depends strongly on \(p\)

    • The closer \(p\) is to 0.5, the better everything behaves
  • Note: There is more than one way to construct a confidence interval for \(p\).

    • Today we used the Wald CI.

Resources

Devore, J. L., K. N. Berk, and M. A. Carlton. 2021. Modern Mathematical Statistics with Applications. Springer Texts in Statistics. Springer International Publishing. https://books.google.ca/books?id=ghcsEAAAQBAJ.