round(median(absenteeism$days),2)
[1] 11
STAT 205: Introduction to Mathematical Statistics
Dr. Irene Vrbik
University of British Columbia Okanagan
Exercise 1.1 What is the file extension for Quarto documents?
.rmd
.qmd
.html
.r
.qrt
.qmd
Exercise 1.2 What does echo: false do in a Quarto code chunk?
Exercise 1.3 Which of the following is NOT part of the YAML section in a Quarto document?
title
author
output
subtitle
Exercise 1.4 What is the syntax for producing code chunks in Quarto?
r inside
{r} at the start
{r} inside
{.r} at the start
Use triple backticks with {r} at the start.
Exercise 1.5 What does echo: false do in a Quarto code chunk?
Displays the output but hides the code
Exercise 1.6 Why is setting a seed important in R, and how do you set one?
Setting a seed ensures reproducibility by making random number generation consistent across runs. In R, you set a seed using set.seed(number), where number is any fixed integer (e.g., set.seed(123)).
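A minimal sketch of what reproducibility means in practice. The seed value 123 and the use of rnorm() are illustrative choices only; any fixed integer and any random-number function behave the same way.

```r
# Same seed, same draws: resetting the seed reproduces the sample exactly.
set.seed(123)
first_run <- rnorm(5)

set.seed(123)
second_run <- rnorm(5)

identical(first_run, second_run)  # TRUE

# Without resetting the seed, the next call continues the random stream,
# so it returns different values.
third_run <- rnorm(5)
identical(first_run, third_run)
```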
Exercise 1.7 Based on the output below, identify which data type x has been coded as.
Exercise 1.8 In the context of variables, which of the following is an example of a categorical variable?
Gender of individuals.
Exercise 1.9 Which measure of central tendency is defined as the middle value when data is ordered?
Median
Exercise 1.10 The Interquartile Range (IQR) measures:
The spread of the middle 50% of the data
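As a quick illustration of the median and the IQR, here is a sketch on a small made-up vector. The values in x are hypothetical and are not taken from any course data set.

```r
# Hypothetical data, chosen only for illustration.
x <- c(2, 4, 4, 5, 7, 9, 10, 12, 15)

sort(x)                      # order the values first
median(x)                    # middle (5th of 9) ordered value: 7
quantile(x, c(0.25, 0.75))   # first and third quartiles (R's default method)
IQR(x)                       # Q3 - Q1, the spread of the middle 50%: 6
```

R's quantile() supports several definitions of the quartiles; hand calculations with a different convention can give a slightly different IQR.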
Exercise 1.11 What does the breaks argument in the hist function do?
The breaks argument in the hist function determines how data is divided into bins. It can be:
- A single number specifying the approximate number of bins.
- A vector specifying exact breakpoints.
- A function that computes breakpoints dynamically.
More bins result in more bars and smaller bin widths.
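The three forms of breaks can be sketched as follows. The data are simulated purely for illustration, and plot = FALSE is used so that the bins can be inspected without drawing a figure.

```r
set.seed(1)
x <- rnorm(200)  # simulated data, for illustration only

# A single number is only a suggestion: hist() rounds to "pretty"
# breakpoints, so the actual number of bins may differ.
h_few  <- hist(x, breaks = 5,  plot = FALSE)
h_many <- hist(x, breaks = 30, plot = FALSE)
length(h_few$counts)
length(h_many$counts)

# A vector gives exact breakpoints (they must cover the range of x).
h_exact <- hist(x, breaks = seq(-5, 5, by = 1), plot = FALSE)
h_exact$breaks

# A function computes the breakpoints from the data.
half_unit_breaks <- function(v) seq(floor(min(v)), ceiling(max(v)), by = 0.5)
h_fun <- hist(x, breaks = half_unit_breaks, plot = FALSE)
diff(h_fun$breaks)  # every bin has width 0.5
```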
Exercise 1.12 Consider Figure 1.1 and estimate the median number of absent days.
Exercise 1.13 What is the primary difference between descriptive and inferential statistics?
Descriptive statistics summarize data; inferential statistics draw conclusions about a population based on a sample.
Exercise 1.14 In the context of sampling methods, what is a simple random sample?
A sample where each member of the population has an equal chance of being selected.
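In R, a simple random sample can be drawn with sample(). The sketch below assumes a hypothetical population of 1000 labelled units; the population size, sample size, and seed are arbitrary.

```r
set.seed(205)
population <- 1:1000   # hypothetical population of 1000 labelled units

# sample() draws without replacement by default, so every unit has the
# same chance of selection and no unit can appear twice.
srs <- sample(population, size = 50)
head(srs)
```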
Exercise 1.15 Explain what selection bias is and provide an example of how it might occur in a research study.
Selection bias occurs when the sample chosen for a study is not representative of the population, leading to biased results. An example is surveying only individuals who have internet access for an online study, which excludes those without internet access.
Exercise 1.16 Which type of study involves observing individuals or groups and collecting data without intervening or manipulating any aspect of the study participants?
Exercise 1.17 Suppose a researcher wants to study the average income of households in Kelowna.
1. All households in Kelowna.
2. Sampling houses only in a rich neighborhood.
3. The population parameter of interest is the average income of households in Kelowna.
Exercise 2.1 Which of the following statements best describes statistics?
Exercise 2.2 True or False: Parameters are descriptive measures computed from a sample, while statistics are descriptive measures computed from a population.
False
It’s the other way around: parameters are descriptive measures computed from a population, while statistics are descriptive measures computed from a sample.
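The distinction can be illustrated with a small simulation. The population below is entirely hypothetical (simulated values, not real data); the point is only that the parameter is computed from every unit, while the statistic is computed from a sample and changes from sample to sample.

```r
set.seed(42)

# Hypothetical population of 10,000 values (simulated, not real data).
population <- rexp(10000, rate = 1 / 4)

mu <- mean(population)         # parameter: uses the whole population

smp  <- sample(population, 35)
xbar <- mean(smp)              # statistic: uses only the sample

c(parameter = mu, statistic = xbar)
```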
Exercise 2.3 Let
Recall the Probability Density Function (PDF) of the Gamma Distribution:
where
Using the method of moments, we equate the population moments to the sample moments:
Setting the first population moment equal to the first sample moment and solving for
The second moment equation (using variance):
Substituting
Thus, the moment estimators are:
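As a numerical sketch of the method-of-moments recipe, the code below assumes the shape-scale parameterization, with shape $\alpha$ and scale $\beta$, so that $E[X] = \alpha\beta$ and $\mathrm{Var}(X) = \alpha\beta^2$. That parameterization and the function name gamma_mom are assumptions made for illustration; if the rate form is used instead, the second estimator is the reciprocal.

```r
# Assumed parameterization: shape alpha, scale beta, so
# E[X] = alpha * beta and Var(X) = alpha * beta^2.
gamma_mom <- function(x) {
  m1 <- mean(x)             # first sample moment
  v  <- mean((x - m1)^2)    # second central sample moment (divides by n)
  c(alpha = m1^2 / v, beta = v / m1)
}

# Simulated data with known true values, to check the estimators.
set.seed(2)
x <- rgamma(10000, shape = 2, scale = 3)
gamma_mom(x)
```

With a sample this large the estimates should land close to the true values of 2 and 3.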
Exercise 2.4 Which of the following best describes the concept of sampling distribution?
Exercise 2.5 Let
The likelihood function is:
Taking the log-likelihood:
When
Setting the derivative to zero and solving for
Thus, the MLE estimator for
When
Setting the derivative to zero and solving for
The MLE estimator for
To check if it is unbiased, we compute its expectation:
Since
Since
Since
Exercise 2.6 Let
Find the efficiency of
The relative efficiency is given by:
Substituting the given variances:
Simplifying:
Thus, the relative efficiency of
Exercise 2.7 Suppose we have a manufacturing process where widgets are produced, and the time it takes for a widget to be completed follows an exponential distribution with a mean time of 4 hours. If thirty-five widgets from this manufacturer are chosen at random:
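Let $X_1, \dots, X_{35}$ denote the completion times (in hours) of the 35 sampled widgets, each exponentially distributed with mean $\mu = 4$.
By the Central Limit Theorem, the mean completion time $\bar{X}$ is approximately normally distributed with:
Mean: $\mu_{\bar{X}} = \mu = 4$
Standard Error: $\sigma_{\bar{X}} = \sigma / \sqrt{n}$
Since the variance of an exponential distribution is $\sigma^2 = \mu^2 = 16$, we have $\sigma = 4$ and $\sigma_{\bar{X}} = 4/\sqrt{35} \approx 0.676$.
Thus, we have: $\bar{X} \sim N(4,\ 0.676^2)$ approximately,
or we can keep it in fraction form: $\bar{X} \sim N\left(4,\ \tfrac{16}{35}\right)$
We compute: $P(\bar{X} > 5) = P\left(Z > \dfrac{5 - \mu}{\sigma/\sqrt{n}}\right)$
Substituting values: $P\left(Z > \dfrac{5 - 4}{4/\sqrt{35}}\right) = P(Z > 1.479)$
Using standard normal tables: $P(Z > 1.479) \approx 0.0696$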
Thus, the probability that the mean completion time exceeds 5 hours is approximately 0.0696.
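The same normal-approximation calculation can be checked in R. This is a sketch using only the values stated in the exercise (mean 4 hours, 35 widgets, threshold 5 hours); the variable names are illustrative. The last line is an optional extra, not part of the CLT solution: it uses the fact that a sum of independent exponentials follows a Gamma distribution to get the exact probability for comparison.

```r
mu    <- 4            # mean completion time (hours)
n     <- 35           # number of widgets sampled
sigma <- mu           # for an exponential distribution, sd equals the mean

se <- sigma / sqrt(n) # standard error of the sample mean
z  <- (5 - mu) / se   # standardized threshold

# CLT approximation: P(Xbar > 5)
p_clt <- pnorm(5, mean = mu, sd = se, lower.tail = FALSE)
round(c(se = se, z = z, p = p_clt), 4)

# Exact check: Xbar > 5 is the same event as sum(X) > 5 * n,
# and the sum of n iid exponentials is Gamma(shape = n, rate = 1 / mu).
p_exact <- pgamma(5 * n, shape = n, rate = 1 / mu, lower.tail = FALSE)
p_exact
```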
A graphical representation is shown below:
Exercise 2.8 Let
Using the result that if
In this case:
How would you calculate this probability in R?
pchisq(2, df = n-1, lower.tail = FALSE)
qchisq(2, df = n-1, lower.tail = FALSE)
pchisq(2, df = n-1)
pchisq(1-2, df = n-1)
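To compare the candidate calls, it helps to see what each one computes. The sketch below uses a hypothetical n = 10 (the exercise supplies its own sample size) and does not by itself say which tail the exercise asks for.

```r
n <- 10  # hypothetical sample size, for illustration only

upper <- pchisq(2, df = n - 1, lower.tail = FALSE)  # P(X > 2)
lower <- pchisq(2, df = n - 1)                      # P(X <= 2)
upper + lower                                       # the two tails sum to 1

# 1 - 2 = -1, and a chi-square variable is never negative, so this is 0.
pchisq(1 - 2, df = n - 1)

# qchisq() is the quantile function: its first argument must be a
# probability in [0, 1], so passing 2 yields NaN (with a warning).
suppressWarnings(qchisq(2, df = n - 1, lower.tail = FALSE))
```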
Exercise 3.1 Which of the following best describes the purpose of a confidence interval?
Exercise 3.2 As part of an investigation from Union Carbide Corporation, the following data represent naturally occurring amounts of sulfate SO4 (in parts per million) in well water. The data is from a random sample of 24 water wells in Northwest Texas.
No, there is no indication of a violation of the normality assumption. The boxplot appears roughly symmetric, and there is no systematic curvature or evident outliers in the normal QQ plot.
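Based on the following summary statistics, estimate the standard error for the data.
Since we don’t have the population standard deviation $\sigma$, we estimate it with the sample standard deviation $s$, so the standard error of $\bar{x}$ is $SE = s/\sqrt{n} = s/\sqrt{24}$.
Using the $t$-distribution with $n - 1 = 23$ degrees of freedom, the critical value for 90% confidence is $t^* \approx 1.714$.
The 90% confidence interval for $\mu$ is $\bar{x} \pm t^* \cdot s/\sqrt{n}$.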
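A t-based interval of this kind can be sketched in R as below. The helper t_interval is a hypothetical function written for illustration, and the inputs in the last call are placeholders, not the sulfate summary statistics; substitute the sample mean and standard deviation from the exercise.

```r
# Hypothetical helper: t-based confidence interval from summary statistics.
t_interval <- function(xbar, s, n, level = 0.90) {
  se     <- s / sqrt(n)                          # estimated standard error
  t_star <- qt(1 - (1 - level) / 2, df = n - 1)  # two-sided critical value
  c(lower = xbar - t_star * se, upper = xbar + t_star * se)
}

# Critical value for 90% confidence with n = 24 (df = 23).
qt(0.95, df = 23)

# Placeholder inputs (hypothetical, NOT the values from the exercise).
t_interval(xbar = 100, s = 20, n = 24)
```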
Exercise 3.3 In New York City on October 23rd, 2014, a doctor who had recently been treating Ebola patients in Guinea went to the hospital with a slight fever and was subsequently diagnosed with Ebola. Soon thereafter, an NBC 4 New York/The Wall Street Journal/Marist Poll found that 82% of New Yorkers favored a “mandatory 21-day quarantine for anyone who has come into contact with an Ebola patient.” This poll included responses from 1042 New York adults between October 26th and 28th, 2014.
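The point estimate, based on a sample size of $n = 1042$, is $\hat{p} = 0.82$.
To check whether $\hat{p}$ can be reasonably modeled using a normal distribution, we check the success-failure condition: $n\hat{p} = 1042 \times 0.82 \approx 854 \geq 10$ and $n(1 - \hat{p}) = 1042 \times 0.18 \approx 188 \geq 10$.
Since both conditions are met, we can assume the sampling distribution of $\hat{p}$ is approximately normal.
The standard error is given by: $SE_{\hat{p}} = \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$
Substituting values: $SE_{\hat{p}} = \sqrt{\dfrac{0.82 \times 0.18}{1042}} \approx 0.0119$
Using $z^* = 1.96$ for 95% confidence, the margin of error is $1.96 \times 0.0119 \approx 0.0233$, so the interval is $0.82 \pm 0.0233$, that is, $(0.797,\ 0.843)$.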
Either form is an acceptable final answer: the point estimate plus or minus the margin of error, or the confidence interval itself.
We are 95% confident that the proportion of New York adults in October 2014 who supported a quarantine for anyone who had come into contact with an Ebola patient was between 0.797 and 0.843.
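The interval can be reproduced in R from the two numbers given in the exercise (82% in favor, 1042 respondents). This is a sketch of the normal-approximation interval; the variable names are illustrative.

```r
p_hat <- 0.82   # sample proportion in favor
n     <- 1042   # number of respondents

# Success-failure check: both counts should be at least 10.
c(successes = n * p_hat, failures = n * (1 - p_hat))

se     <- sqrt(p_hat * (1 - p_hat) / n)  # standard error of p_hat
z_star <- qnorm(0.975)                   # about 1.96 for 95% confidence
me     <- z_star * se                    # margin of error

ci <- p_hat + c(-1, 1) * me
round(ci, 3)   # 0.797 0.843
```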