Part I focused on causal inference in settings where we conceptualized study populations as effectively infinite, allowing us to ignore random variability and focus solely on systematic bias from confounding, selection, and measurement. Part II now introduces random variability and the use of statistical models for causal inference. This chapter bridges identification (Part I) and estimation (Part II), explaining why we need models and how to quantify uncertainty.
Up to now, we have focused on identification: determining whether causal effects can be computed from observed data under certain assumptions. Now we turn to estimation: using finite data to approximate those causal effects.
Estimand: The population parameter of interest (e.g., \(Pr[Y = 1|A = a]\) in the super-population).
Estimator: A rule for computing an estimate of the estimand from sample data.
Estimate: The numerical value obtained by applying the estimator to a particular sample (a point estimate).
Example 1 (Sample Proportion as an Estimator) Estimand: Super-population risk \(Pr[Y = 1|A = 1]\)
Estimator: Sample proportion \(\widehat{Pr}[Y = 1|A = 1]\)
Estimate: From our 20-person study, \(\widehat{Pr}[Y = 1|A = 1] = 7/13 \approx 0.54\)
An estimator \(\hat{\theta}_n\) of a parameter \(\theta\) is consistent if its estimates converge in probability to the true value as the sample size \(n\) increases:
\[Pr\left[|\hat{\theta}_n - \theta| > \epsilon\right] \rightarrow 0 \text{ as } n \rightarrow \infty, \text{ for all } \epsilon > 0\]
The sample proportion \(\widehat{Pr}[Y = 1|A = a]\) is a consistent estimator of \(Pr[Y = 1|A = a]\).
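A small simulation illustrates this convergence for the sample proportion (the "true" risk of 0.54 below is a hypothetical value chosen for the example):

```python
import random

def sample_proportion(p, n, rng):
    """Draw n Bernoulli(p) outcomes and return the sample proportion."""
    return sum(rng.random() < p for _ in range(n)) / n

rng = random.Random(0)
p_true = 0.54  # hypothetical super-population risk Pr[Y = 1 | A = 1]

# As n grows, the estimate concentrates around the true parameter.
for n in (10, 100, 10_000):
    print(n, sample_proportion(p_true, n, rng))
```

With \(n = 10\) the estimate can miss badly; by \(n = 10{,}000\) it is typically within about one percentage point of the truth.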
A 95% confidence interval quantifies uncertainty due to random sampling.
Construction (Wald interval): \(\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/n}\), where 1.96 is the 97.5th percentile of the standard normal distribution.
Example: For \(\hat{p} = 7/13 \approx 0.54\) with \(n = 13\), the standard error is \(\sqrt{0.54 \times 0.46/13} \approx 0.14\), so the 95% confidence interval is \(0.54 \pm 1.96 \times 0.14\), approximately \((0.27, 0.81)\).
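As a quick check, a minimal sketch of the Wald interval applied to the 7/13 example:

```python
import math

def wald_ci(successes, n, z=1.96):
    """95% Wald confidence interval for a binomial proportion."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

lo, hi = wald_ci(7, 13)
print(round(lo, 2), round(hi, 2))  # → 0.27 0.81
```

Note the interval's width: with only 13 individuals, the data are consistent with risks anywhere from about 0.27 to 0.81.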
In randomized experiments with random sampling, standard statistical methods can be used to estimate causal effects and compute confidence intervals.
Suppose the study participants are a random sample from a near-infinite super-population and treatment \(A\) is marginally randomized, so the treated and the untreated are exchangeable.
Because of exchangeability, the causal risk difference equals the associational risk difference in the super-population:
\[Pr[Y^{a=1} = 1] - Pr[Y^{a=0} = 1] = Pr[Y = 1|A = 1] - Pr[Y = 1|A = 0]\]
Estimators: the sample proportions \(\widehat{Pr}[Y = 1|A = 1]\) and \(\widehat{Pr}[Y = 1|A = 0]\); their difference is a consistent estimator of the causal risk difference.
Standard statistical methods provide confidence intervals for these causal effects.
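A minimal sketch of the risk difference estimator with a Wald-type confidence interval. The treated counts (7 events among 13) come from the example above; the untreated counts (3 events among the 7 remaining individuals of the 20-person study) are a hypothetical assumption for illustration:

```python
import math

def risk_difference_ci(y1, n1, y0, n0, z=1.96):
    """Point estimate and 95% Wald CI for the risk difference
    Pr[Y=1|A=1] - Pr[Y=1|A=0] in a marginally randomized experiment."""
    p1, p0 = y1 / n1, y0 / n0
    rd = p1 - p0
    # Standard error of a difference of independent proportions.
    se = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    return rd, rd - z * se, rd + z * se

# Hypothetical counts: 7/13 events among the treated, 3/7 among the untreated.
rd, lo, hi = risk_difference_ci(7, 13, 3, 7)
print(round(rd, 2), round(lo, 2), round(hi, 2))
```

Under exchangeability, this associational risk difference consistently estimates the causal risk difference in the super-population.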
In observational studies, similar methods apply after adjusting for confounding:
The concept of a “super-population” is a useful fiction that allows us to apply statistical methods, but it raises important questions about the sources of randomness.
The standard binomial confidence interval for \(p = Pr[Y = 1|A = a]\) is valid in two scenarios:
Scenario 1: Random sampling from a super-population. The study individuals are randomly sampled from an essentially infinite super-population, and the randomness of \(\hat{p}\) arises from the sampling process.
Scenario 2: Nondeterministic counterfactuals. Even if no sampling took place, each individual's outcome is a random draw from an individual-specific distribution, so \(\hat{p}\) is still a random variable.
Most applied researchers use confidence intervals computed under the super-population framework, even when the study population was not actually sampled at random from any well-defined super-population.
This practice is justified by the convenience and familiarity of standard methods, though it requires careful interpretation.
When should we condition on variables when computing causal effects and their standard errors?
Conditionality principle: if the distribution of a variable \(L\) carries no information about the causal effect of \(A\) on \(Y\) (i.e., \(L\) is ancillary), we may condition on the observed value of \(L\) when computing estimates and standard errors without compromising validity.
Example 1: Stratified randomization
If treatment is randomized within strata of sex \(L\), the analysis should condition on \(L\): effects are estimated within each stratum, and standard errors reflect the observed stratum sizes.
Example 2: Baseline covariates in randomized trials
Even when baseline covariates \(L\) happen to be balanced across treatment groups, conditioning on (adjusting for) \(L\) remains valid and can increase the precision of the effect estimates.
As the number of confounders or effect modifiers increases, nonparametric estimation becomes increasingly difficult. This is the curse of dimensionality.
Suppose we need to adjust for 10 binary confounders: there are \(2^{10} = 1024\) possible covariate strata.
Consequences: with realistic sample sizes, many strata contain few or no individuals, so stratum-specific risks are imprecisely estimated or undefined, and fully nonparametric (stratified) estimation breaks down.
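The combinatorial problem can be illustrated with a small simulation (hypothetical setup: 1,000 individuals whose covariate patterns are drawn uniformly at random):

```python
import random

rng = random.Random(0)
n, k = 1000, 10  # 1000 individuals, 10 binary confounders

# Assign each individual one of the 2**k = 1024 covariate patterns.
strata = [tuple(rng.randint(0, 1) for _ in range(k)) for _ in range(n)]
occupied = set(strata)
counts = {s: strata.count(s) for s in occupied}

print("possible strata:", 2 ** k)
print("occupied strata:", len(occupied))
print("largest stratum:", max(counts.values()))
```

With 1,000 individuals spread over 1,024 strata, most occupied strata contain only a handful of people, and many strata are empty, so stratum-specific risks by treatment level cannot be estimated reliably.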
Parametric models make assumptions about the functional form relating variables:
Advantages: parametric models borrow information across strata, require far fewer parameters, and remain feasible when the number of covariates is large.
Disadvantages: if the assumed functional form is incorrect (model misspecification), the resulting estimates are biased.
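As a sketch of how a parametric model borrows information across strata, consider a hypothetical dose-response example: instead of estimating one risk per dose level nonparametrically, a linear mean model \(E[Y|A] = \theta_0 + \theta_1 A\) summarizes risk at every dose with two parameters, fit here by ordinary least squares:

```python
# Hypothetical data: binary outcomes ys at four dose levels of A.
doses = [0, 0, 1, 1, 2, 2, 3, 3]
ys    = [0, 1, 0, 1, 1, 1, 1, 1]

n = len(doses)
mean_a = sum(doses) / n
mean_y = sum(ys) / n

# Closed-form OLS estimates of the two model parameters.
theta1 = (sum((a - mean_a) * (y - mean_y) for a, y in zip(doses, ys))
          / sum((a - mean_a) ** 2 for a in doses))
theta0 = mean_y - theta1 * mean_a

# The model predicts risk even at doses never observed, e.g. A = 1.5.
print(theta0 + theta1 * 1.5)  # fitted risk at A = 1.5 (≈ 0.75 here)
```

The price of this parsimony is the modeling assumption itself: if the true dose-response curve is not linear, the two-parameter summary is biased, which is the trade-off the disadvantages above describe.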
The remainder of Part II describes methods that use parametric and semiparametric models to adjust for confounding and estimate causal effects in settings where nonparametric estimation is infeasible.
Understanding the curse of dimensionality motivates the need for these modeling approaches.
This chapter introduced random variability and bridged Part I (identification) and Part II (estimation).
Key concepts: estimand, estimator, and estimate; consistency; confidence intervals; the super-population as a source of randomness; the conditionality principle; and the curse of dimensionality.