stat205 – Hypothesis Test for One Variance

Introduction

We have covered three hypothesis tests for a single sample:

Hypothesis test for the mean \(\mu\) with \(\sigma\) known (\(Z\)-test)
Hypothesis test for the proportion \(p\) (\(Z\)-test)
Hypothesis test for the mean \(\mu\) with \(\sigma\) unknown (\(t\)-test)

Today we consider hypothesis tests involving the population variance \(\sigma^2\).

Assumptions: \(X_1, X_2, \dots, X_n\) are i.i.d., plus the assumptions in the rhombuses.

Recap

For random samples from normal populations, we saw (see Lecture 4) that:

\[ \dfrac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1} \]

where \(S^2 = \frac{\sum_{i = 1}^n (X_i - \bar{X})^2}{n-1}\) is the sample variance and \(\chi^2_{n-1}\) is the chi-squared distribution with \(n-1\) degrees of freedom.
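A quick simulation can make this result concrete. The sketch below uses arbitrary illustrative values that do not come from the lecture (n = 11, a population mean of 18, \(\sigma\) = 0.5, and 10,000 replications) and compares the simulated values of \((n-1)S^2/\sigma^2\) with the \(\chi^2_{n-1}\) distribution.

```r
set.seed(205)                      # arbitrary seed, for reproducibility
n     <- 11                        # illustrative sample size (assumption)
sigma <- 0.5                       # illustrative population sd (assumption)
reps  <- 10000                     # number of simulated samples

# For each replicate: draw a normal sample and compute (n-1)S^2/sigma^2
stat <- replicate(reps, {
  x <- rnorm(n, mean = 18, sd = sigma)
  (n - 1) * var(x) / sigma^2
})

# A chi-square with n-1 df has mean n-1 and variance 2(n-1)
c(sim_mean = mean(stat), theory_mean = n - 1)
c(sim_var  = var(stat),  theory_var  = 2 * (n - 1))

# Overlay the theoretical density on the simulated histogram
hist(stat, breaks = 50, freq = FALSE, main = "", xlab = expression(chi^2))
curve(dchisq(x, df = n - 1), add = TRUE, lwd = 2)
```

If the result holds, the simulated mean and variance should land close to 10 and 20, and the histogram should follow the plotted density.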

Hypotheses

We may wish to test whether there is evidence to suggest that the population variance differs from some hypothesized value \(\sigma_0^2\).
As before, we start with a null hypothesis (\(H_0\)) that the population variance equals a specified value (\(\sigma^2 = \sigma_0^2\)).
We test this against the alternative hypothesis \(H_A\) which can either be one-sided (\(\sigma^2 < \sigma_0^2\) or \(\sigma^2 > \sigma_0^2\)) or two-sided (\(\sigma^2 \neq \sigma_0^2\)).

Test Statistic

Recall that our test statistic is calculated assuming the null hypothesis is true. Hence, if we are testing \(H_0: \sigma^2 = \sigma_0^2\), the test statistic we use is: \[ \chi^2 = \dfrac{(n-1)S^2}{\sigma_0^2} \] where \(\chi^2 \sim \chi^2_{n-1}\) when \(H_0\) is true.
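As a minimal sketch, the statistic can be wrapped in a small function. The name chisq_var_stat and the simulated sample are illustrative assumptions, not part of the lecture. Because \((n-1)S^2 = \sum (X_i - \bar{X})^2\), the statistic is simply the sum of squared deviations scaled by the hypothesized variance.

```r
# Hypothetical helper: chi-square statistic for H0: sigma^2 = sigma0_sq
chisq_var_stat <- function(x, sigma0_sq) {
  n <- length(x)                 # sample size
  (n - 1) * var(x) / sigma0_sq   # var() uses the n-1 denominator
}

set.seed(1)
x <- rnorm(15, mean = 10, sd = 2)      # simulated sample; true variance is 4

stat_h0_true  <- chisq_var_stat(x, sigma0_sq = 4)   # H0 matches the truth
stat_h0_small <- chisq_var_stat(x, sigma0_sq = 1)   # H0 understates the variance
c(stat_h0_true, stat_h0_small)
```

Dividing by a smaller \(\sigma_0^2\) inflates the statistic, which is why large values count as evidence for \(\sigma^2 > \sigma_0^2\).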

Chi-square distribution

Assumptions

For the following inference procedures to be valid we require:

A simple random sample from the population
A normally distributed population (very important, even for large sample sizes)

Warning

It is important to note that if the population is not approximately normally distributed, the chi-squared distribution may not accurately represent the sampling distribution of the test statistic.
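The warning can be explored by simulation. The sketch below assumes an upper-tailed test with n = 30 and \(\alpha\) = 0.05, and compares a normal population with an exponential one. Both have true variance 1, so \(H_0: \sigma^2 = 1\) is true in each case. All of these settings are illustrative choices, not values from the lecture.

```r
set.seed(205)
n <- 30; alpha <- 0.05; reps <- 5000
sigma0_sq <- 1                                   # true variance in both populations
crit <- qchisq(alpha, df = n - 1, lower.tail = FALSE)

# Proportion of simulated samples in which a true H0 is rejected
reject_rate <- function(draw) {
  mean(replicate(reps, (n - 1) * var(draw(n)) / sigma0_sq >= crit))
}

rate_normal <- reject_rate(function(n) rnorm(n, mean = 0, sd = 1))
rate_exp    <- reject_rate(function(n) rexp(n, rate = 1))   # variance 1, right-skewed
c(normal = rate_normal, exponential = rate_exp)
```

The rejection rate for the normal population should sit near \(\alpha\), while the rate for the skewed population should be noticeably higher, even though n = 30 is often treated as "large".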

Rejection Regions and \(p\)-values for the chi-square test concerning one variance
Alternative	Reject \(H_0\) if…	\(p\)-value
\(H_A: \sigma^2 < \sigma_0^2\)	\(\chi^2_{\text{obs}} \leq \chi^2_{1-\alpha}\)	Area to the left of \(\chi^2_{\text{obs}}\)
\(H_A: \sigma^2 > \sigma_0^2\)	\(\chi^2_{\text{obs}} \geq \chi^2_\alpha\)	Area to the right of \(\chi^2_{\text{obs}}\)
\(H_A: \sigma^2 \neq \sigma_0^2\)	\(\chi^2_{\text{obs}} \geq \chi^2_{\alpha/2}\) or \(\chi^2_{\text{obs}} \leq \chi^2_{1-\alpha/2}\)	Double the area to the left or right of \(\chi^2_{\text{obs}}\), whichever is smaller.
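The critical values for each alternative can be looked up with qchisq(). Here \(\chi^2_\alpha\) denotes the value with area \(\alpha\) to its right, so a lower-tailed test uses the lower \(\alpha\) quantile and an upper-tailed test uses the upper one. The helper name chisq_var_crit is a hypothetical choice for this sketch.

```r
# Hypothetical helper: critical value(s) for the one-variance chi-square test.
# chi^2_alpha in the notes is the value with area alpha to its RIGHT.
chisq_var_crit <- function(n, alpha = 0.05,
                           alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  df <- n - 1
  switch(alternative,
    less      = c(lower = qchisq(alpha, df)),                          # chi^2_{1-alpha}
    greater   = c(upper = qchisq(alpha, df, lower.tail = FALSE)),      # chi^2_{alpha}
    two.sided = c(lower = qchisq(alpha / 2, df),                       # chi^2_{1-alpha/2}
                  upper = qchisq(alpha / 2, df, lower.tail = FALSE)))  # chi^2_{alpha/2}
}

crit_upper <- chisq_var_crit(n = 11, alternative = "greater")    # 10 df
crit_two   <- chisq_var_crit(n = 11, alternative = "two.sided")
crit_upper
crit_two
```

Reject \(H_0\) when the observed statistic is at or below a "lower" value, or at or above an "upper" value.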

Critical Region (upper-tailed)

The rejection region associated with an upper-tailed test for the population variance. Note that the critical value will depend on the chosen significance level (\(\alpha\)) and the d.f.

Critical Region (lower-tailed)

The rejection region associated with a lower-tailed test for the population variance. Note that the critical value will depend on the chosen significance level (\(\alpha\)) and the d.f.

Critical Region (two-tailed)

The rejection region associated with a two-tailed test for the population variance. Note that the two critical values will depend on the chosen significance level (\(\alpha\)) and the d.f.

P-values

Similarly, we can find \(p\)-values from chi-squared tables or R.

\(p\)-value for lower-tailed: \[\Pr(\chi^2 < \chi^2_{\text{obs}})\]
\(p\)-value for upper-tailed: \[\Pr(\chi^2 > \chi^2_{\text{obs}})\]
\(p\)-value for two-tailed: \[2\cdot \min \{ \Pr(\chi^2 < \chi^2_{\text{obs}}), \Pr(\chi^2 > \chi^2_{\text{obs}})\}\]
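These three formulas map directly onto pchisq(). The following is a minimal sketch; the function name chisq_var_pvalue is an assumption made for illustration, and the inputs (a statistic of 20.67 on 10 df) are the ones used in the worked example in these notes.

```r
# Hypothetical helper: p-value for the one-variance chi-square test
chisq_var_pvalue <- function(stat, df,
                             alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  lower <- pchisq(stat, df)                       # Pr(chi^2 < stat)
  upper <- pchisq(stat, df, lower.tail = FALSE)   # Pr(chi^2 > stat)
  switch(alternative,
    less      = lower,
    greater   = upper,
    two.sided = 2 * min(lower, upper))
}

p_upper <- chisq_var_pvalue(20.67, df = 10, alternative = "greater")
p_lower <- chisq_var_pvalue(20.67, df = 10, alternative = "less")
p_two   <- chisq_var_pvalue(20.67, df = 10, alternative = "two.sided")
c(greater = p_upper, less = p_lower, two.sided = p_two)
```

The lower- and upper-tailed \(p\)-values sum to 1, and the two-tailed value doubles whichever of the two is smaller.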

Beyond Burger Fat

Exercise 1 Beyond Burgers claim to have 18 grams of fat. A random sample of 11 burgers had a mean of 18.15 grams and a variance of 0.5167 grams\(^2\). Suppose that the quality assurance team at the company will only accept a \(\sigma\) of at most 0.5 grams. Use the 0.05 level of significance to test the null hypothesis \(\sigma = 0.5\) against the appropriate alternative.

\[\begin{align} H_0: \sigma^2 &= 0.5^2 & H_A: \sigma^2 &> 0.5^2 \end{align}\]

Distribution of Test Statistic

Code

n <- 11  # sample size, so df = n - 1 = 10
par(mar = c(4, 4, 0, 0) + 0.1)
xmax <- 30  # upper limit of the x-axis
curve(dchisq(x, df = n - 1), from = 0, to = xmax, ylab = "Density", xlab = expression(chi^2))

Under the null hypothesis, the test statistic \(\chi^2 = (n-1)S^2/0.5^2\) follows a chi-square distribution with df = 10.

Critical value

The critical value can be found by determining which value on the chi-square curve with 10 df yields a 5 percent probability in the upper tail (since we are doing an upper-tailed test). In R: qchisq(alpha, df = n-1, lower.tail = FALSE), which gives approximately 18.307 for alpha = 0.05 and n = 11. Verify using the \(\chi^2\) table.

Observed Test Statistic

Compute the observed test statistic, which we denote by \(\chi^2_{\text{obs}}\). Here \(s^2 \approx 0.5167\) grams\(^2\) is the sample variance of the 11 burgers.

\[\begin{align} \chi^2_{\text{obs}} &= \dfrac{(n-1)s^2}{\sigma_0^2}\\ &= \dfrac{(10)(0.5167)}{0.5^2} = \dfrac{5.167}{0.25}\\ &\approx 20.67 \end{align}\]

Critical value

Since the observed test statistic falls in the rejection region, i.e. \(\chi^2_{\text{obs}} > \chi^2_{\alpha}\), we reject the null hypothesis in favour of the alternative.

P-value in R

pchisq(20.67, df = 10, lower.tail = FALSE)

[1] 0.02351589

We could compute the exact p-value in R (0.0235) or approximate using the \(\chi^2\) table.

Using the chi-square distribution table, we can see that our observed test statistic falls between two tabulated values. We can use the neighbouring values to approximate our \(p\)-value.

Approximate P-value

\(p\text{-value} = \Pr(\chi^2_{10} > 20.67)\)

It is clear from the visualization that \[\begin{align} \Pr(\chi^2_{10} > 20.67) > \Pr(\chi^2_{10} > \textcolor{dodgerblue}{23.209})\\ \Pr(\chi^2_{10} > 20.67) < \Pr(\chi^2_{10} > \textcolor{deeppink}{20.483}) \\ \end{align}\]

Hence the \(p\)-value can be expressed as:

\[\begin{align} \textcolor{dodgerblue}{0.01} < p\text{-value } < \textcolor{deeppink}{0.025} \end{align}\]
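The table-based bounds can be checked against R. This sketch recovers the two neighbouring table entries with qchisq() and confirms that the exact upper-tail area at 20.67 lies between their tail areas. The variable names are illustrative.

```r
df      <- 10
chi_obs <- 20.67

# Table entries bracketing the observed statistic (upper-tail areas 0.025 and 0.01)
table_vals <- qchisq(c(0.025, 0.01), df = df, lower.tail = FALSE)
round(table_vals, 3)        # 20.483 23.209

# Upper-tail areas at the lower table value, the observed statistic, and the upper table value
tail_areas <- pchisq(c(table_vals[1], chi_obs, table_vals[2]), df = df, lower.tail = FALSE)
tail_areas
```

The tail area shrinks as the cut-off moves right, so the middle value (the exact \(p\)-value) must fall between 0.01 and 0.025.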

Conclusion

We reject the null hypothesis in favour of the alternative since:

the \(p\)-value (0.0235) is less than \(\alpha\) = 0.05, OR
the observed test statistic (\(\chi^2_{\text{obs}}\) = 20.669) is larger than the critical value \(\chi^2_{\alpha}\) = 18.307.

Hence, at the 0.05 significance level, there is sufficient evidence to suggest that the population variance \(\sigma^2\) is greater than \(0.5^2\).

Hypothesis Test for \(\sigma\) in R

There is no base function for performing this test in R.
This is perhaps some indication that it's not a common test in modern applied work.
If you want to do this in R, you can always use the varTest() function from the EnvStats package (you will NOT be tested on this in the final exam).

Optional material

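library(EnvStats)  # provides varTest()
# Fat content (grams) of the 11 sampled burgers
fat_samples <- c(17.6, 17.8, 19.2, 18.1, 18.1, 19.3, 18.3, 17.1, 17.5, 17.7, 18.9)
# Upper-tailed chi-square test of H0: sigma^2 = 0.25
varTest(x = fat_samples, alternative = "greater", sigma.squared = 0.25)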


Results of Hypothesis Test
--------------------------

Null Hypothesis:                 variance = 0.25

Alternative Hypothesis:          True variance is greater than 0.25

Test Name:                       Chi-Squared Test on Variance

Estimated Parameter(s):          variance = 0.5167273

Data:                            fat_samples

Test Statistic:                  Chi-Squared = 20.66909

Test Statistic Parameter:        df = 10

P-value:                         0.02352291

95% Confidence Interval:         LCL = 0.2822561
                                 UCL =       Inf
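The varTest() output above can also be reproduced with base R alone, which may help connect each reported number to the formulas in these notes. This is a sketch under the same inputs (the 11 fat measurements, \(\sigma_0^2\) = 0.25, \(\alpha\) = 0.05). The lower confidence limit uses the standard one-sided interval \((n-1)s^2/\chi^2_{\alpha}\), which is assumed here to be what varTest() reports.

```r
fat_samples <- c(17.6, 17.8, 19.2, 18.1, 18.1, 19.3, 18.3, 17.1, 17.5, 17.7, 18.9)
sigma0_sq <- 0.25    # hypothesized variance under H0
alpha     <- 0.05    # significance level

n  <- length(fat_samples)        # 11 burgers, so df = 10
s2 <- var(fat_samples)           # sample variance

chi_obs <- (n - 1) * s2 / sigma0_sq                          # observed test statistic
p_value <- pchisq(chi_obs, df = n - 1, lower.tail = FALSE)   # upper-tailed p-value

# One-sided 95% lower confidence limit for sigma^2 (the upper limit is Inf)
lcl <- (n - 1) * s2 / qchisq(alpha, df = n - 1, lower.tail = FALSE)

c(variance = s2, statistic = chi_obs, p_value = p_value, LCL = lcl)
```

The four values should agree with the "Estimated Parameter(s)", "Test Statistic", "P-value" and "LCL" lines of the output.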