SMaC: Statistics, Math, and Computing

Applied Statistics for Social Science Research

Eric Novik | Summer 2024 | Session 7

Session 7 Outline

  • Statistical inference
  • Sampling distribution
  • Standard errors
  • Confidence intervals
  • Degrees of freedom and t distribution
  • Bias and uncertainty
  • Statistical significance


Introduction to Statistical Inference

  • Statistical Inference: A process of learning from noisy measurements
  • Key Challenges:
    • Generalizing from a sample to the population of interest
    • Learning what would have happened under a different treatment
    • Understanding the relationship between the measurement and the estimand

Image source: https://diff.healthpolicydatascience.org/

Measurement Error Models

  • We are trying to estimate the parameters of some data-generating process
  • For our car example, we assumed that the car position data are coming from the following model:

\[ x_i(t) = a + bt_i^2 + \epsilon_i \]

  • We also assumed that time \(t\) was observed precisely, but we can also have an error in \(t\)
  • Errors may be multiplicative, not just additive
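As a minimal sketch (the parameter values and noise scales below are illustrative, not from the slides), additive noise, noise in the measured time, and multiplicative noise can be simulated like this:
set.seed(2)
n <- 50
t <- seq(0, 5, length.out = n)   # true measurement times
a <- 0; b <- 2                   # illustrative parameter values
x_true <- a + b * t^2
x_additive <- x_true + rnorm(n, 0, 1)                 # additive error in x
t_obs <- t + rnorm(n, 0, 0.05)                        # error in the time measurement
x_error_in_t <- a + b * t_obs^2
x_multiplicative <- x_true * exp(rnorm(n, 0, 0.05))   # multiplicative error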

Sampling Distribution

  • Generative model for our data: given the generative model, we can create replicas of the dataset
  • The sampling distribution is typically unknown and depends on how the data were collected, how subjects were assigned to treatment conditions, and how the sample was drawn
  • Examples
    • Simple random sample of size \(n\) from the population of size \(N\). Here, each person has the same probability of being selected
    • Our constant acceleration model \(x_i(t) = a + bt_i^2 + \epsilon_i\), where we fix \(a\), \(b\), and the times \(t_i\), and draw \(\epsilon_i\) from an error distribution, such as \(\text{Normal}(0, 1)\)
    • We can write some code to generate observations according to this process (which is what I did for the car example)
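A minimal sketch of such a simulation (with illustrative values \(a = 1\), \(b = 2\), and \(\text{Normal}(0, 1)\) errors, chosen here for illustration only): generate many replicated datasets from the model and look at the resulting sampling distribution of the estimated coefficient.
set.seed(123)
a <- 1; b <- 2                    # illustrative "true" values
t <- seq(0, 5, length.out = 30)   # fixed measurement times
reps <- 1000
b_hat <- numeric(reps)
for (r in 1:reps) {
  x <- a + b * t^2 + rnorm(length(t), 0, 1)   # one replicated dataset
  fit <- lm(x ~ I(t^2))                       # re-estimate a and b
  b_hat[r] <- coef(fit)[2]
}
mean(b_hat); sd(b_hat)   # empirical sampling distribution of the slope estimate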

Standard Errors

  • Standard error is the estimated standard deviation of the estimate
  • It provides a measure of uncertainty around the estimate
  • Standard error decreases as the sample size increases
  • To estimate the SE of the mean of a sample of size \(n\) from a large population with standard deviation \(\sigma\): \(\frac{\sigma}{\sqrt{n}}\)
set.seed(1)
n <- 100; mu <- 5; sigma <- 1.5
x <- rnorm(n, mean = mu, sd = sigma)
x_bar <- mean(x); round(x_bar, 2)
[1] 5.16
(se <- sigma/sqrt(n))
[1] 0.15
  • If \(\sigma\) is unknown, we can estimate it from the sample: \(s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}\)
var(x)
[1] 1.815215
sum((x - x_bar)^2) / (n - 1)
[1] 1.815215
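When \(\sigma\) is unknown, a natural plug-in estimate of the SE uses the sample standard deviation \(s\) (a sketch continuing with the simulated x from above; the numbers in the comments are approximate):
s <- sd(x)               # sample standard deviation, about 1.35 here
(se_hat <- s / sqrt(n))  # estimated standard error of the mean, about 0.13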

Confidence Intervals

  • For a given sampling distribution, a confidence interval provides a range of parameter values consistent with the data
  • For example, for a normal sampling distribution, the interval \(\hat{\theta} \pm 2\cdot\text{se}\) will contain the true parameter value about 95% of the time over repeated samples (see the simulation sketch below)
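A quick simulation sketch of this coverage property (all values below are illustrative): draw many samples from a Normal distribution with a known mean, form the \(\pm 2\,\text{se}\) interval each time, and check how often it contains the true mean.
set.seed(42)
mu <- 5; sigma <- 1.5; n <- 100; reps <- 10000
covered <- replicate(reps, {
  x <- rnorm(n, mu, sigma)
  ci <- mean(x) + c(-2, 2) * sigma / sqrt(n)
  ci[1] < mu && mu < ci[2]
})
mean(covered)   # should be close to 0.95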

Confidence Intervals for Proportions

  • In surveys, we are often interested in estimating the standard error of the proportion
  • Suppose we want to estimate the proportion of the US population that supports same-sex marriage
  • Say you randomly survey \(n = 500\) people from the population, and \(y = 355\) respond yes
  • The estimate of the proportion is \(\hat{\theta} = y/n\) with the \(\text{se} = \sqrt{\hat{\theta}(1 -\hat{\theta})/n }\)
n <- 500
y <- 355
theta_hat <- y/n
theta_hat |> round(2)
[1] 0.71
se <- sqrt(theta_hat * (1 - theta_hat) / n)
se |> round(2)
[1] 0.02
(theta_hat + 2*se) |> round(2)
[1] 0.75
(theta_hat - 2*se) |> round(2)
[1] 0.67
ci_95 <- theta_hat + qnorm(c(0.025, 0.975)) * se
ci_95 |> round(2)
[1] 0.67 0.75

Your Turn

In a national survey of \(n\) people, how large does \(n\) have to be so that you can estimate presidential approval to within a standard error of ±3 percentage points? What about ±1 percentage point?

Standard Error for Differences

  • We already saw an example of how to add variances of two independent RVs; a quick simulation check of the resulting standard error of a difference appears below

\[ \text{se}_{\text{diff}} = \sqrt{\text{se}_1^2 + \text{se}_2^2} \]

  • Your turn: How large does \(n\) have to be so that you can estimate the gender gap in approval to within a standard error of ±3 percentage points?
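A small simulation check of this formula (the group sizes and proportions below are made up for illustration): simulate two independent sample proportions and compare the standard deviation of their difference to \(\sqrt{\text{se}_1^2 + \text{se}_2^2}\).
set.seed(7)
n1 <- 400; p1 <- 0.55   # hypothetical approval in one group
n2 <- 350; p2 <- 0.48   # hypothetical approval in the other group
reps <- 10000
diffs <- rbinom(reps, n1, p1) / n1 - rbinom(reps, n2, p2) / n2
sd(diffs)                                       # simulated se of the difference
sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # analytic se_diff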

Computer Demo: Computing CIs

# Generate fake data
p <- 0.3
n <- 20
data <- rbinom(1, n, p)
print(data)

# Estimate proportion and calculate confidence interval
p_hat <- data / n
se <- sqrt(p_hat * (1 - p_hat) / n)
ci <- p_hat + c(-2, 2) * se
print(ci) 

# Put it in a loop
reps <- 100
for (i in 1:reps) {
  data <- rbinom(1, n, p)
  p_hat <- data / n
  se <- sqrt(p_hat * (1 - p_hat) / n)
  ci <- p_hat + c(-2, 2) * se
  print(ci) 
}
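A natural follow-up (a sketch, not part of the original demo): record whether each interval covers the true p and compute the empirical coverage; with n = 20 the normal approximation is rough, so coverage may deviate from 95%.
# Record coverage instead of printing each interval
reps <- 10000
covered <- logical(reps)
for (i in 1:reps) {
  data <- rbinom(1, n, p)
  p_hat <- data / n
  se <- sqrt(p_hat * (1 - p_hat) / n)
  ci <- p_hat + c(-2, 2) * se
  covered[i] <- ci[1] < p && p < ci[2]
}
mean(covered)   # empirical coverage across simulated datasets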

Computer Demo: Proportions, Means, and Differences of Means

# Read data from here:  https://github.com/avehtari/ROS-Examples
library("foreign")
library("dplyr")
pew_pre <- read.dta(
  paste0(
    "https://raw.githubusercontent.com/avehtari/",
    "ROS-Examples/master/Pew/data/",
    "pew_research_center_june_elect_wknd_data.dta"
  )
)
pew_pre <- pew_pre |>
  select(c("age", "regicert")) |>
  na.omit() |>
  filter(age != 99)
n <- nrow(pew_pre)

# Estimate a proportion (certain to have registered for voting?)
registered <- ifelse(pew_pre$regicert == "absolutely certain", 1, 0)
p_hat <- mean(registered)
se_hat <- sqrt((p_hat * (1 - p_hat)) / n)
round(p_hat + c(-2, 2) * se_hat, 4) # ci

# Estimate an average (mean age)
age <- pew_pre$age
y_hat <- mean(age)
se_hat <- sd(age) / sqrt(n)
round(y_hat + c(-2, 2) * se_hat, 4) # ci

# Estimate a difference of means
age2 <- age[registered == 1]
age1 <- age[registered == 0]
y_2_hat <- mean(age2)
se_2_hat <- sd(age2) / sqrt(length(age2))
y_1_hat <- mean(age1)
se_1_hat <- sd(age1) / sqrt(length(age1))
diff_hat <- y_2_hat - y_1_hat
se_diff_hat <- sqrt(se_1_hat ^ 2 + se_2_hat ^ 2)
round(diff_hat + c(-2, 2) * se_diff_hat, 4) # ci

Degrees of Freedom

  • Some distributions rely on the concept of degrees of freedom
  • Degrees of freedom balance the need for accurate estimation against the amount of data available; they are a way to correct for overfitting
  • The more parameters you estimate, the fewer degrees of freedom you have left for other calculations
  • The data give us \(n\) (the sample size) degrees of freedom, and estimating \(p\) parameters uses up \(p\) of them, leaving \(n - p\)
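A quick simulation sketch of the idea (illustrative numbers): estimating the mean uses up one degree of freedom, which is why dividing the sum of squared deviations by \(n - 1\) rather than \(n\) gives an unbiased variance estimate.
set.seed(11)
n <- 5; sigma <- 2; reps <- 100000
ss <- replicate(reps, {
  x <- rnorm(n, 0, sigma)
  sum((x - mean(x))^2)   # sum of squared deviations from the sample mean
})
mean(ss / n)         # biased low: roughly sigma^2 * (n - 1) / n = 3.2
mean(ss / (n - 1))   # close to sigma^2 = 4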

Normal and t Distribution

  • The t distribution has a degrees-of-freedom parameter, and with a low number of degrees of freedom, it has heavier tails than the Normal
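For example (a small comparison sketch), the 97.5% quantile of the t distribution is noticeably larger than the Normal's 1.96 when the degrees of freedom are small:
qnorm(0.975)                        # about 1.96
qt(0.975, df = c(3, 10, 30, 100))   # about 3.18, 2.23, 2.04, 1.98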

Confidence Intervals from the t Distribution

  • t distribution has location and scale parameters and is symmetric like the Normal
  • The standard error has \(n-1\) degrees of freedom, since one degree of freedom is used up estimating the mean
  • Suppose you threw 5 darts and the distances from the bull’s eye were as follows (standard dart board radius is about 23 cm)
# Distance from the bull's eye in cm
data <- c(8, 6, 10, 5, 18)

# Calculate the sample mean
mean_data <- mean(data)

# Calculate the standard error of the mean
se_mean <- sd(data) / sqrt(length(data))

# Degrees of freedom
df <- length(data) - 1

# Calculate the 95% and 50% confidence interval 
ci_95 <- mean_data + qt(c(0.025, 0.975), df) * se_mean
ci_50 <- mean_data + qt(c(0.25, 0.75), df) * se_mean

# Output the results
mean_data |> round(2)
[1] 9.4
se_mean |> round(2)
[1] 2.32
ci_95 |> round(2)
[1]  2.97 15.83
ci_50 |> round(2)
[1]  7.69 11.11
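For comparison (a sketch, not part of the original example), the same interval computed with Normal quantiles is noticeably narrower, which is why the t correction matters with only 5 observations:
# Normal-based 95% interval, for comparison with the t-based one above
ci_95_normal <- mean_data + qnorm(c(0.025, 0.975)) * se_mean
ci_95_normal |> round(2)   # about 4.86 to 13.94, vs 2.97 to 15.83 from the t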

Bias and Uncertainty

  • There is a lot more to it than what is in the following standard picture
  • Discuss among yourselves: what are the potential sources of bias?

Statistical Significance

  • Comes up in NHST: Null Hypothesis Significance Testing
  • A bad decision filter: if the p-value is less than 0.05 (relative to some null), the results can be trusted; otherwise, they are likely noise
  • Often, estimates (coefficients) are labeled as not significant if they are within 2 SEs of zero: selecting models this way is also problematic
  • You flip a coin 10 times and observe 7 heads. Is there evidence of coin bias?
  • What about if you got 70 heads out of 100?
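A quick sketch of how one could check this with an exact binomial test in base R (against the null of a fair coin):
binom.test(7, 10, p = 0.5)$p.value     # about 0.34: no real evidence of bias
binom.test(70, 100, p = 0.5)$p.value   # about 8e-05: strong evidence of bias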

Some Problems with Statistical Significance

  • Statistical significance does not mean practical (or, in the case of biostatistics, clinical) significance
  • No significance does not mean there is no effect
  • The difference between “significant” and “not significant” is not itself statistically significant
  • Researcher degrees of freedom, p-hacking