This chapter describes standardization and the parametric g-formula, methods for computing standardized means and risks by outcome modeling. While IP weighting models the treatment assignment mechanism, standardization models the outcome mechanism. Both approaches can estimate the same causal effects under conditional exchangeability.
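We’ve seen two approaches to confounding adjustment: IP weighting, which models the treatment mechanism, and standardization, which models the outcome mechanism.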
Both can estimate \(E[Y^a]\) under conditional exchangeability.
From Chapter 2, standardization computes:
\[E[Y^a] = \sum_{\ell} E[Y \mid A = a, L = \ell] \Pr[L = \ell]\]
This is a weighted average of the stratum-specific means \(E[Y \mid A = a, L = \ell]\), where each stratum is weighted by its share of the population, \(\Pr[L = \ell]\).
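As a quick arithmetic check with made-up numbers (not taken from the chapter), suppose \(L\) is binary with \(\Pr[L = 0] = 0.6\) and \(\Pr[L = 1] = 0.4\), and the mean outcome among the treated is 0.2 in stratum \(L = 0\) and 0.5 in stratum \(L = 1\). The standardized mean under treatment would then be:

\[E[Y^{a=1}] = 0.2 \times 0.6 + 0.5 \times 0.4 = 0.12 + 0.20 = 0.32\]

The weights come from the whole population, not from the treated alone, which is what removes the imbalance in \(L\) between treatment groups.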
Definition 1 (Standardization) Standardization estimates the mean outcome under treatment \(a\) by:
\[\hat{E}[Y^a] = \sum_{\ell} \hat{E}[Y \mid A = a, L = \ell] \times \hat{\Pr}[L = \ell]\]
where \(\hat{\Pr}[L = \ell]\) is the observed proportion with \(L = \ell\).
Setting: Binary \(A\), binary \(Y\), discrete \(L\) with \(k\) levels
Step 1: Compute proportion \(Y = 1\) within each stratum \((A = a, L = \ell)\)
Step 2: Standardize to population distribution:
\[\hat{E}[Y^{a=1}] = \sum_{\ell=1}^k \hat{\Pr}[Y = 1 \mid A = 1, L = \ell] \times \hat{\Pr}[L = \ell]\]
\[\hat{E}[Y^{a=0}] = \sum_{\ell=1}^k \hat{\Pr}[Y = 1 \mid A = 0, L = \ell] \times \hat{\Pr}[L = \ell]\]
Causal effect: \(\hat{E}[Y^{a=1}] - \hat{E}[Y^{a=0}]\)
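The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not the chapter's own code: the function `standardized_risk`, the helper `cell`, and the data are all hypothetical, and the counts are made up so that treatment is more common in the higher-risk stratum.

```python
from collections import defaultdict

def standardized_risk(rows, a):
    """Nonparametric standardized risk for treatment level a.

    rows: list of (A, L, Y) tuples with binary A and Y and discrete L.
    """
    n = len(rows)
    n_l = defaultdict(int)        # individuals with L = l
    n_al = defaultdict(int)       # individuals with A = a and L = l
    events_al = defaultdict(int)  # of those, how many have Y = 1
    for A, L, Y in rows:
        n_l[L] += 1
        if A == a:
            n_al[L] += 1
            events_al[L] += Y
    risk = 0.0
    for l, count in n_l.items():
        if n_al[l] == 0:
            raise ValueError(f"no individuals with A={a} in stratum L={l}")
        # Step 1: stratum-specific risk; Step 2: weight it by Pr[L = l]
        risk += (events_al[l] / n_al[l]) * (count / n)
    return risk

def cell(a, l, n, events):
    """Hypothetical helper: n individuals in cell (A=a, L=l), `events` with Y=1."""
    return [(a, l, 1)] * events + [(a, l, 0)] * (n - events)

# Made-up data: treatment is more common in the high-risk stratum L=1.
rows = cell(1, 0, 4, 1) + cell(0, 0, 8, 2) + cell(1, 1, 6, 3) + cell(0, 1, 2, 1)

risk1 = standardized_risk(rows, a=1)
risk0 = standardized_risk(rows, a=0)
print(risk1 - risk0)  # standardized risk difference
```

In this toy data the crude risks are 4/10 among the treated and 3/10 among the untreated, yet within each stratum of \(L\) the risks are identical, so the standardized risk difference is zero. The crude difference here is entirely due to confounding by \(L\).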
When confounders are continuous or high-dimensional, many strata contain few or no individuals, so stratum-specific means cannot be estimated directly. Instead, we use parametric models.
Model: Specify a model for \(E[Y \mid A, L]\), such as:
\[E[Y \mid A, L] = \beta_0 + \beta_1 A + \beta_2^{\top} L + \beta_3^{\top} (A \times L)\]
This includes:

- Main effects of \(A\) and \(L\)
- Interactions between \(A\) and \(L\) to allow effect modification
Estimation: Fit the model using standard regression (e.g., linear regression for continuous \(Y\), logistic regression for binary \(Y\)).
Definition 2 (Parametric G-Formula) Given a fitted model \(\hat{E}[Y \mid A, L]\), the parametric g-formula estimates the mean outcome under treatment \(a\) as:
\[\hat{E}[Y^a] = \frac{1}{n} \sum_{i=1}^n \hat{E}[Y \mid A = a, L = L_i]\]
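Algorithm:
Step 1: Fit the outcome model \(\hat{E}[Y \mid A, L]\) to the observed data.
Step 2: For each individual \(i\), set treatment to \(a\) and compute the predicted outcome \(\hat{E}[Y \mid A = a, L = L_i]\).
Step 3: Average the \(n\) predicted outcomes to obtain \(\hat{E}[Y^a]\).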
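The estimator in Definition 2 can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than a reference implementation: `design` and `g_formula_mean` are hypothetical names, the outcome model is the linear model with product terms shown earlier, and the data are made up and noise-free so the result can be checked by hand.

```python
import numpy as np

def design(A, L):
    """Design matrix with columns: intercept, A, L, and the product A*L."""
    A = np.asarray(A, dtype=float).reshape(-1, 1)
    L = np.asarray(L, dtype=float).reshape(len(A), -1)
    return np.hstack([np.ones_like(A), A, L, A * L])

def g_formula_mean(A, L, Y, a):
    """Estimate E[Y^a]: fit E[Y | A, L] by least squares, set A = a, average."""
    beta, *_ = np.linalg.lstsq(design(A, L), np.asarray(Y, dtype=float), rcond=None)
    A_set = np.full(len(Y), float(a))  # everyone assigned treatment a
    return float(np.mean(design(A_set, L) @ beta))

# Made-up, noise-free data in which treated individuals tend to have larger L.
L = np.arange(10.0)
A = np.array([0, 0, 1, 0, 0, 1, 1, 0, 1, 1])
Y = 1 + 2 * A + 0.5 * L + 0.3 * A * L

effect = g_formula_mean(A, L, Y, 1) - g_formula_mean(A, L, Y, 0)
crude = Y[A == 1].mean() - Y[A == 0].mean()
print(effect, crude)
```

By construction the individual-level effect is \(2 + 0.3 L\), so the average effect over the observed \(L\) (mean 4.5) is 3.35, which the g-formula recovers. The crude contrast is 5.3 because the treated have larger \(L\) on average.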
Outcome model: Linear regression for weight change
\[E[Y \mid A, L] = \beta_0 + \beta_1 A + \sum_{j} \beta_j L_j + \sum_{j} \gamma_j (A \times L_j)\]
Procedure:
The g-formula standardizes to the observed distribution of confounders. We can also standardize to other distributions.
Options for standardization:
Average treatment effect (ATE): \[E[Y^{a=1}] - E[Y^{a=0}]\] Standardized to the population (or sample) distribution of \(L\).
Average treatment effect in the treated (ATT): \[E[Y^{a=1} \mid A = 1] - E[Y^{a=0} \mid A = 1]\] Standardized to the distribution of \(L\) among the treated.
G-formula for ATT: \[\hat{E}[Y^a \mid A = 1] = \frac{1}{n_1} \sum_{i: A_i = 1} \hat{E}[Y \mid A = a, L = L_i]\] where \(n_1 = \sum_i I(A_i = 1)\).
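The only change from the ATE to the ATT is the set of individuals over which predictions are averaged. A minimal sketch, assuming a scalar confounder and a least-squares outcome model (the function names and data are hypothetical):

```python
import numpy as np

def fit_outcome_model(A, L, Y):
    """Least-squares fit of E[Y | A, L] with columns 1, A, L, A*L (scalar L)."""
    X = np.column_stack([np.ones(len(Y)), A, L, A * L])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return beta

def predict(beta, a, L):
    """Predicted mean outcome with treatment set to a for confounder values L."""
    return beta[0] + beta[1] * a + beta[2] * L + beta[3] * a * L

# Made-up, noise-free data; treated individuals tend to have larger L.
L = np.arange(10.0)
A = np.array([0, 0, 1, 0, 0, 1, 1, 0, 1, 1])
Y = 1 + 2 * A + 0.5 * L + 0.3 * A * L
beta = fit_outcome_model(A, L, Y)

contrast = predict(beta, 1, L) - predict(beta, 0, L)  # per-individual predicted effect
ate = contrast.mean()          # average over all n individuals
att = contrast[A == 1].mean()  # average over the n_1 treated individuals only
print(ate, att)
```

Here the effect grows with \(L\) and the treated have larger \(L\), so the ATT (3.8) exceeds the ATE (3.35). When there is no effect modification by \(L\), the two estimands coincide.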
Both IP weighting and standardization can estimate causal effects. How do they compare?
| Aspect | IP Weighting | Standardization |
|---|---|---|
| Models | \(\Pr[A \mid L]\) (treatment) | \(E[Y \mid A, L]\) (outcome) |
| Target | Marginal effect | Marginal effect (via averaging) |
| Natural for | Marginal structural models | Conditional models |
| Time-varying treatment | Handled naturally | Possible, but more complex |
| Efficiency | Less efficient (if outcome model correct) | More efficient (if outcome model correct) |
| Robustness | Unaffected by outcome model misspecification | Unaffected by treatment model misspecification |
Use IP weighting when:

- Treatment mechanism is simple to model
- Outcome is complex or multiply measured
- Time-varying treatments
- Interested in marginal effects explicitly

Use standardization when:

- Outcome mechanism is simple to model
- Treatment assignment is complex
- Efficiency is important
- Natural to think about outcome modeling

Use both:

- Doubly robust estimation combines both approaches
- Agreement between methods is reassuring
- Disagreement suggests model misspecification
Parametric models are approximations. How much does misspecification matter?
Reality: No model is exactly correct
Consequences:
Include product terms (interactions):

- Between treatment and confounders: \(A \times L\)
- Allows effect modification
- Lets the confounder-outcome relationship be fit separately in the treated and the untreated
Add polynomial terms:

- Quadratic: \(L + L^2\)
- Cubic: \(L + L^2 + L^3\)
- Flexible fit for continuous \(L\)

Use flexible methods:

- Splines
- Generalized additive models
- Machine learning methods (with care)

Model checking:

- Residual plots
- Goodness-of-fit tests
- Cross-validation
- Subject-matter knowledge
Simple model: \[E[Y \mid A, L] = \beta_0 + \beta_1 A + \beta_2 \text{Age} + \beta_3 \text{Sex} + \ldots\]
Flexible model: \[E[Y \mid A, L] = \beta_0 + \beta_1 A + \beta_2 \text{Age} + \beta_3 \text{Age}^2 + \beta_4 A \times \text{Age} + \ldots\]
Including \(A \times L\) interactions is especially important, as it allows the confounder-outcome relationship to differ between treated and untreated.
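To see how much the functional form can matter, the sketch below compares a simple and a flexible outcome model on made-up data where the outcome depends on \(L^2\) and treatment is given to individuals with extreme \(L\). Everything here is an assumption for illustration: the function names, the data-generating mechanism, and the true effect of 2.

```python
import numpy as np

def g_formula_effect(design, A, L, Y):
    """E[Y^1] - E[Y^0] from a least-squares outcome model built by design(A, L)."""
    beta, *_ = np.linalg.lstsq(design(A, L), Y, rcond=None)
    ones, zeros = np.ones_like(L), np.zeros_like(L)
    return float(np.mean(design(ones, L) @ beta - design(zeros, L) @ beta))

def simple(A, L):
    """Main effects only, linear in L."""
    return np.column_stack([np.ones(len(L)), A, L])

def flexible(A, L):
    """Adds a quadratic term in L and products of A with L and L^2."""
    return np.column_stack([np.ones(len(L)), A, L, L**2, A * L, A * L**2])

# Made-up, noise-free data: the outcome is quadratic in L, and individuals
# with extreme values of L are the ones who receive treatment.
L = np.linspace(-2, 2, 41)
A = (np.abs(L) > 1.05).astype(float)
Y = 1 + 2 * A + L**2  # true effect of A is 2 for everyone

simple_effect = g_formula_effect(simple, A, L, Y)
flexible_effect = g_formula_effect(flexible, A, L, Y)
print(simple_effect, flexible_effect)
```

The flexible model contains the true outcome mechanism and recovers the effect of 2. The simple model adjusts for \(L\) only linearly, leaves the confounding through \(L^2\) untouched, and roughly doubles the estimated effect in this example.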
The g-formula extends naturally to continuous treatments.
Setting: Treatment \(A\) is continuous (e.g., dose, duration, intensity)
G-formula: \[E[Y^a] = E_L[E[Y \mid A = a, L]]\]
The formula is the same as before, but \(a\) can now take any value in the range of the treatment.
Estimation:
Definition 3 (Dose-Response Curve) The dose-response curve is the function \(a \mapsto E[Y^a]\) showing how the mean potential outcome varies with treatment level \(a\).
For continuous treatment, this is a smooth curve rather than discrete points.
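A dose-response curve can be traced by repeating the g-formula over a grid of treatment values. The sketch below is one possible way to do this, assuming a least-squares outcome model that is quadratic in \(A\); the function `dose_response`, the model form, and the simulated data are all hypothetical.

```python
import numpy as np

def dose_response(A, L, Y, grid):
    """Estimate E[Y^a] for each a in grid via a fitted outcome model.

    Assumed model (illustrative): E[Y | A, L] = b0 + b1*A + b2*A^2 + b3*L + b4*A*L.
    """
    def design(a, l):
        return np.column_stack([np.ones(len(l)), a, a**2, l, a * l])
    beta, *_ = np.linalg.lstsq(design(A, L), Y, rcond=None)
    # For each dose a: set A = a for everyone, predict, and average over observed L
    return np.array([np.mean(design(np.full(len(L), a), L) @ beta) for a in grid])

# Simulated, noise-free data: dose depends on a binary confounder L.
rng = np.random.default_rng(0)
L = rng.integers(0, 2, size=50).astype(float)
A = rng.uniform(0, 40, size=50) + 5 * L
Y = 5 - 0.1 * A + 0.002 * A**2 - 0.5 * L

grid = np.linspace(0, 40, 9)
curve = dose_response(A, L, Y, grid)
print(np.column_stack([grid, curve]))
```

Each point on the curve averages over the same observed distribution of \(L\), so differences between points reflect the change in dose alone, provided the outcome model is correctly specified. Estimates at doses rarely observed in the data rest heavily on the model's extrapolation.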
Example: Effect of smoking intensity (cigarettes/day) on lung function
For binary outcomes, we can estimate causal risk ratios and risk differences using either approach.
Outcome model: Logistic regression (or log-binomial model)
\[\text{logit} \Pr[Y = 1 \mid A, L] = \beta_0 + \beta_1 A + \beta_2^{\top} L + \beta_3^{\top} (A \times L)\]
G-formula: \[\hat{\Pr}[Y^a = 1] = \frac{1}{n} \sum_{i=1}^n \text{expit}(\hat{\beta}_0 + \hat{\beta}_1 a + \hat{\beta}_2^{\top} L_i + \hat{\beta}_3^{\top} (a \times L_i))\]
where \(\text{expit}(x) = \frac{e^x}{1 + e^x}\).
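The logistic version of the g-formula can be sketched as below. To keep the example self-contained, the logistic model is fit with a small Newton-Raphson routine instead of a statistics library; `fit_logistic`, `design`, `cell`, and the counts are hypothetical and chosen so that the answer can be verified by hand.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression by Newton-Raphson (no regularization)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * (p * (1 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta

def design(A, L):
    """Columns: intercept, A, L, A*L (scalar L)."""
    return np.column_stack([np.ones(len(L)), A, L, A * L])

def cell(a, l, n, events):
    """Hypothetical helper: rows (A, L, Y) for n individuals, `events` with Y=1."""
    return [(a, l, 1)] * events + [(a, l, 0)] * (n - events)

# Made-up counts: stratum risks are 0.4 vs 0.2 when L=0 and 0.6 vs 0.4 when L=1.
data = np.array(cell(1, 0, 5, 2) + cell(0, 0, 15, 3)
                + cell(1, 1, 15, 9) + cell(0, 1, 5, 2), dtype=float)
A, L, Y = data.T

beta = fit_logistic(design(A, L), Y)
risk1 = expit(design(np.ones_like(A), L) @ beta).mean()   # Pr[Y^{a=1} = 1]
risk0 = expit(design(np.zeros_like(A), L) @ beta).mean()  # Pr[Y^{a=0} = 1]

risk_difference = risk1 - risk0
risk_ratio = risk1 / risk0
odds_ratio = (risk1 / (1 - risk1)) / (risk0 / (1 - risk0))
print(risk_difference, risk_ratio, odds_ratio)
```

Because \(L\) is binary and the model includes the product term, the model is saturated and the fitted probabilities equal the observed stratum proportions. The standardized risks are therefore 0.5 and 0.3, the same values nonparametric standardization would give. Note that the averaging happens on the probability scale, after applying expit, not on the logit scale.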
Causal measures:
Marginal structural model:
For risk difference: \[\Pr[Y^a = 1] = \beta_0 + \beta_1 a\]
For risk ratio (log-binomial): \[\log \Pr[Y^a = 1] = \beta_0 + \beta_1 a\]
For odds ratio (logistic): \[\text{logit} \Pr[Y^a = 1] = \beta_0 + \beta_1 a\]
Estimation: Fit the marginal structural model by weighted regression, using the IP weights.
Caution: A logistic MSM estimates odds ratios, not risk ratios. For risk ratios, use a log-binomial or Poisson model.
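For a binary treatment, the risk-difference MSM can be fit by weighted least squares, as in the sketch below. The assumptions are loud ones: the treatment probabilities are estimated nonparametrically within strata of a discrete \(L\), the weights are nonstabilized, and `ip_weights`, `fit_linear_msm`, `cell`, and the counts are hypothetical.

```python
import numpy as np

def ip_weights(A, L):
    """Nonstabilized weights 1 / Pr[A = A_i | L = L_i], estimated within strata of L."""
    w = np.empty(len(A))
    for l in np.unique(L):
        in_l = L == l
        p_treated = A[in_l].mean()  # estimated Pr[A = 1 | L = l]
        w[in_l] = np.where(A[in_l] == 1, 1 / p_treated, 1 / (1 - p_treated))
    return w

def fit_linear_msm(A, Y, w):
    """Weighted least squares for Pr[Y^a = 1] = b0 + b1*a; b1 is the risk difference."""
    X = np.column_stack([np.ones(len(A)), A])
    XtW = X.T * w
    return np.linalg.solve(XtW @ X, XtW @ Y)

def cell(a, l, n, events):
    """Hypothetical helper: rows (A, L, Y) for n individuals, `events` with Y=1."""
    return [(a, l, 1)] * events + [(a, l, 0)] * (n - events)

# Made-up counts: stratum risks are 0.4 vs 0.2 when L=0 and 0.6 vs 0.4 when L=1.
data = np.array(cell(1, 0, 5, 2) + cell(0, 0, 15, 3)
                + cell(1, 1, 15, 9) + cell(0, 1, 5, 2), dtype=float)
A, L, Y = data.T

w = ip_weights(A, L)
b0, b1 = fit_linear_msm(A, Y, w)

risk0, risk1 = b0, b0 + b1
risk_ratio = risk1 / risk0
odds_ratio = (risk1 / (1 - risk1)) / (risk0 / (1 - risk0))
print(b1, risk_ratio, odds_ratio)
```

With a binary treatment, each of the three MSMs above has two parameters for two treatment levels, so all three imply the same pair of weighted risks; the choice of link only changes which contrast \(\beta_1\) represents. In this toy data the IP-weighted risks are 0.5 and 0.3, which is what standardization gives on the same counts. That agreement is expected when both the treatment and outcome models are saturated. Standard errors from an ordinary weighted regression do not account for the estimated weights, so robust or bootstrap variance estimates are generally preferred.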
Key concepts introduced:
Relationship to IP weighting:
Practical advice: