Chapter 12: IP Weighting and Marginal Structural Models
This chapter introduces inverse probability (IP) weighting, a method for estimating causal effects that creates a pseudo-population in which treatment is independent of measured confounders. IP weighting is used to fit marginal structural models, which provide a natural framework for estimating marginal causal effects when treatment and confounding vary over time.
This chapter is based on Hernán and Robins (2020, chap. 12, pp. 157-174).
Key concepts: IP weighting creates a pseudo-population in which the measured confounders no longer predict treatment, so confounding by those variables is removed provided the weights are correctly estimated. This approach is particularly powerful for time-varying treatments and confounders, which will be explored further in Part III.
1 12.1 The Causal Question (pp. 157-159)
We return to the NHEFS study to estimate the average causal effect of quitting smoking on weight gain.
1.1 Research Question
Population: 1,566 cigarette smokers from NHEFS who had a baseline visit and were seen again approximately 10 years later.
Treatment: \(A = 1\) if quit smoking between visits, \(A = 0\) if continued smoking
Outcome: \(Y\) = weight change in kg between visits
Causal estimand: \[E[Y^{a=1}] - E[Y^{a=0}]\]
This is the average causal effect of smoking cessation on weight gain.
1.2 Measured Confounders
We have measured baseline covariates \(L\) that may confound the relationship:
- Sex
- Age
- Race
- Education
- Intensity and duration of smoking
- Physical activity
- Weight and weight change in past year
- Other lifestyle and health factors
Assumption: Conditional exchangeability given \(L\): \[Y^a \perp\!\!\!\perp A \mid L\]
Why we need adjustment: People who quit smoking differ systematically from those who continue. For example, those who quit may be more health-conscious, may have experienced health scares, or may differ in baseline weight. These factors could also affect weight gain independently of smoking cessation.
The assumption of conditional exchangeability says that within levels of the measured covariates \(L\), treatment assignment is as good as random with respect to the potential outcomes.
2 12.2 Estimating IP Weights (pp. 159-163)
The core idea of IP weighting is to create a pseudo-population by weighting each individual by the inverse of their probability of receiving the treatment they actually received.
2.1 IP Weights Definition
Definition 1 (Inverse Probability Weights) For individual \(i\), the IP weight is:
\[W^A_i = \frac{1}{f(A_i \mid L_i)}\]
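where \(f(A_i \mid L_i) = \Pr[A = A_i \mid L = L_i]\) is the probability of receiving the treatment actually received, given confounders. For treated individuals this equals the propensity score \(\Pr[A = 1 \mid L]\); for untreated individuals it equals one minus the propensity score.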
For a dichotomous treatment:
- If \(A_i = 1\): \(W^A_i = \frac{1}{\Pr[A = 1 \mid L_i]}\)
- If \(A_i = 0\): \(W^A_i = \frac{1}{\Pr[A = 0 \mid L_i]} = \frac{1}{1 - \Pr[A = 1 \mid L_i]}\)
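The definition can be made concrete with a small sketch in Python. The data below are hypothetical (a single binary confounder \(L\)), and \(f(A \mid L)\) is taken to be the observed proportion in each stratum of \(L\) rather than a fitted model; the names `data` and `ip_weight` are illustrative only.

```python
from collections import Counter

# Hypothetical toy data: one (L, A) pair per individual, with binary L and A.
data = [(0, 0)] * 6 + [(0, 1)] * 2 + [(1, 0)] * 3 + [(1, 1)] * 9

n_l = Counter(l for l, _ in data)   # number of individuals at each level of L
n_la = Counter(data)                # number of individuals in each (L, A) cell

def ip_weight(l, a):
    """W^A = 1 / Pr[A = a | L = l], with the probability taken as the cell proportion."""
    return n_l[l] / n_la[(l, a)]

weights = [ip_weight(l, a) for l, a in data]

for l, a in sorted(n_la):
    print(f"L={l}, A={a}: f(A|L)={n_la[(l, a)] / n_l[l]:.2f}, W={ip_weight(l, a):.2f}")
print("pseudo-population size:", round(sum(weights), 6))
```

Because each weight is the inverse of a probability, no weight is below 1, and with a dichotomous treatment the weights sum to twice the sample size.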
2.2 Estimating Propensity Scores
Step 1: Fit a model for \(\Pr[A = 1 \mid L]\)
For dichotomous treatment, use logistic regression:
\[\text{logit}\Pr[A = 1 \mid L] = \beta_0 + \beta_1 L_1 + \beta_2 L_2 + \ldots + \beta_p L_p\]
Step 2: Predict \(\hat{f}(A_i \mid L_i)\) for each individual
- For \(A_i = 1\): \(\hat{f}(1 \mid L_i) = \hat{\Pr}[A = 1 \mid L_i]\)
- For \(A_i = 0\): \(\hat{f}(0 \mid L_i) = 1 - \hat{\Pr}[A = 1 \mid L_i]\)
Step 3: Calculate IP weights
\[\hat{W}^A_i = \frac{1}{\hat{f}(A_i \mid L_i)}\]
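The three steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the data are hypothetical, there is a single binary covariate, and the hand-written Newton-Raphson routine `fit_logistic` stands in for the logistic regression function of a statistics package.

```python
import math

# Hypothetical toy data: one (L, A) pair per individual.
data = [(0, 0)] * 6 + [(0, 1)] * 2 + [(1, 0)] * 3 + [(1, 1)] * 9

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def fit_logistic(data, iterations=25):
    """Step 1: Newton-Raphson for logit Pr[A = 1 | L] = b0 + b1 * L."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for l, a in data:
            p = expit(b0 + b1 * l)
            r = a - p            # contribution to the score
            v = p * (1.0 - p)    # contribution to the information matrix
            g0 += r
            g1 += r * l
            h00 += v
            h01 += v * l
            h11 += v * l * l
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system (information matrix) * step = score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

b0, b1 = fit_logistic(data)

weights = []
for l, a in data:
    p1 = expit(b0 + b1 * l)             # Step 2: predicted Pr[A = 1 | L]
    f = p1 if a == 1 else 1.0 - p1      # probability of the treatment actually received
    weights.append(1.0 / f)             # Step 3: IP weight

print("Pr[A=1 | L=0] =", round(expit(b0), 4))
print("Pr[A=1 | L=1] =", round(expit(b0 + b1), 4))
```

With one binary covariate the model is saturated, so the fitted probabilities reproduce the stratum proportions. With many covariates, as in NHEFS, the fitted probabilities depend on the model specification.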
Why this works: In the pseudo-population created by IP weighting, each individual is weighted by how “surprising” their treatment assignment was given their covariates.
- Individuals with \(\Pr[A \mid L]\) close to 1 receive small weights (their treatment was expected)
- Individuals with \(\Pr[A \mid L]\) close to 0 receive large weights (their treatment was unexpected)
This reweighting creates a pseudo-population in which treatment is independent of \(L\).
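The independence of \(A\) and \(L\) in the pseudo-population can be checked numerically. The sketch below uses hypothetical data and nonparametric weights (stratum proportions), and compares \(\Pr[A = 1 \mid L]\) before and after weighting; `treated_share` is an illustrative helper.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: one (L, A) pair per individual.
data = [(0, 0)] * 6 + [(0, 1)] * 2 + [(1, 0)] * 3 + [(1, 1)] * 9
n_l = Counter(l for l, _ in data)
n_la = Counter(data)

def treated_share(weighted):
    """Pr[A = 1 | L = l] for each l, in the original data or in the pseudo-population."""
    total = defaultdict(float)
    treated = defaultdict(float)
    for l, a in data:
        w = n_l[l] / n_la[(l, a)] if weighted else 1.0   # W^A = 1 / f(A | L)
        total[l] += w
        treated[l] += w * a
    return {l: treated[l] / total[l] for l in sorted(total)}

crude = treated_share(weighted=False)
pseudo = treated_share(weighted=True)
print("original data:    ", crude)
print("pseudo-population:", pseudo)
```

In the original data the probability of treatment differs between strata of \(L\) (0.25 versus 0.75). After weighting it is one half in both strata, so \(L\) no longer predicts \(A\).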
2.3 Example: NHEFS Data
In the NHEFS study:
Propensity score model: Logistic regression including sex, age, race, education, smoking intensity, smoking duration, exercise, weight, etc.
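Estimated weights:
- Range: 1.05 to 16.7
- Mean: 2.00, as expected for a dichotomous treatment, because the pseudo-population is twice the size of the study population
Unstabilized weights can never be smaller than 1, because each is the inverse of a probability.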
Some individuals have very large weights, indicating their treatment was unusual given their covariates.
3 12.3 Stabilized IP Weights (pp. 163-165)
Unstabilized IP weights can take extreme values, which makes estimates unstable. Stabilized weights reduce this variability.
Definition 2 (Stabilized IP Weights) \[SW^A = \frac{f(A)}{f(A \mid L)}\]
where \(f(A) = \Pr[A]\) is the marginal probability of treatment.
For dichotomous \(A\):
- If \(A = 1\): \(SW^A = \frac{\Pr[A = 1]}{\Pr[A = 1 \mid L]}\)
- If \(A = 0\): \(SW^A = \frac{\Pr[A = 0]}{\Pr[A = 0 \mid L]} = \frac{1 - \Pr[A = 1]}{1 - \Pr[A = 1 \mid L]}\)
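A short Python sketch comparing the two kinds of weights on hypothetical data. Both \(f(A)\) and \(f(A \mid L)\) are estimated by sample proportions here, which is an assumption made for illustration.

```python
from collections import Counter

# Hypothetical toy data: one (L, A) pair per individual.
data = [(0, 0)] * 6 + [(0, 1)] * 2 + [(1, 0)] * 3 + [(1, 1)] * 9
n = len(data)
n_a = Counter(a for _, a in data)
n_l = Counter(l for l, _ in data)
n_la = Counter(data)

def unstabilized(l, a):
    """W^A = 1 / f(A | L)."""
    return 1.0 / (n_la[(l, a)] / n_l[l])

def stabilized(l, a):
    """SW^A = f(A) / f(A | L)."""
    return (n_a[a] / n) / (n_la[(l, a)] / n_l[l])

w = [unstabilized(l, a) for l, a in data]
sw = [stabilized(l, a) for l, a in data]
print("W : min %.2f, max %.2f, mean %.2f" % (min(w), max(w), sum(w) / n))
print("SW: min %.2f, max %.2f, mean %.2f" % (min(sw), max(sw), sum(sw) / n))
```

In this toy example the unstabilized weights average 2 and the stabilized weights average 1, and the largest stabilized weight is smaller than the largest unstabilized weight.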
3.1 Properties of Stabilized Weights
Advantages:
1. Expected mean of 1, so the pseudo-population has the same size as the study population
2. Narrower range than unstabilized weights
3. Typically narrower confidence intervals when the MSM is not saturated
4. Still create a pseudo-population with \(A \perp\!\!\!\perp L\)
Estimation:
- Numerator: Estimate \(\Pr[A = 1]\) (the sample proportion treated, or equivalently an intercept-only logistic regression)
- Denominator: Same as unstabilized weights
3.2 Example: NHEFS Data
Stabilized weights:
- Range: 0.33 to 4.30 (compared to a maximum of 16.7 for unstabilized)
- Mean: approximately 1.0
Intuition: Stabilized weights still create a pseudo-population where treatment is independent of confounders, but they “shrink” extreme weights toward 1.0. The numerator ensures that the marginal distribution of treatment is preserved, while the denominator still removes the association between treatment and confounders.
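The expected value of each type of weight follows from iterated expectations. The derivation below is a sketch for a discrete treatment and assumes positivity, so that \(f(a \mid L) > 0\) for every \(a\):

```latex
\begin{aligned}
E[SW^A] &= E\!\left[ E\!\left[ \frac{f(A)}{f(A \mid L)} \,\middle|\, L \right] \right]
         = E\!\left[ \sum_a f(a \mid L)\, \frac{f(a)}{f(a \mid L)} \right]
         = \sum_a f(a) = 1, \\
E[W^A]  &= E\!\left[ \sum_a f(a \mid L)\, \frac{1}{f(a \mid L)} \right]
         = \text{number of treatment levels} \; (= 2 \text{ for dichotomous } A).
\end{aligned}
```

A mean of the estimated stabilized weights far from 1 therefore suggests a misspecified treatment model or a positivity problem.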
4 12.4 Marginal Structural Models (pp. 165-169)
IP weighting is used to fit marginal structural models - models for the marginal distribution of the potential outcomes.
Definition 3 (Marginal Structural Model) A marginal structural model (MSM) is a model for the marginal mean of the potential outcome \(Y^a\) as a function of treatment \(a\) (and possibly other variables):
\[E[Y^a] = \beta_0 + \beta_1 a\]
For dichotomous \(A\), parameter \(\beta_1\) equals the average causal effect:
\[\beta_1 = E[Y^{a=1}] - E[Y^{a=0}]\]
4.1 Fitting Marginal Structural Models
Procedure:
- Estimate IP weights \(\hat{SW}^A\) for all individuals
- Fit a weighted regression model:
- Outcome: \(Y\)
- Predictor: \(A\)
- Weights: \(\hat{SW}^A\)
- The coefficient of \(A\) estimates the marginal causal effect
Important: The model is fit using the observed data \((A, Y)\), but weighted by IP weights. This approximates what we would see if we fit an unweighted model in the pseudo-population.
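The procedure can be sketched in Python on hypothetical data. The helper `wls` is an illustrative closed-form weighted least squares fit for one predictor, standing in for a regression routine; the weights are nonparametric stabilized weights.

```python
from collections import Counter

# Hypothetical toy data: outcomes Y listed per (L, A) cell.
cells = {
    (0, 0): [0, 1, 2, 0, 1, 2],
    (0, 1): [3, 5],
    (1, 0): [4, 5, 6],
    (1, 1): [7, 8, 9, 7, 8, 9, 7, 8, 9],
}
data = [(l, a, y) for (l, a), ys in cells.items() for y in ys]
n = len(data)
n_a = Counter(a for _, a, _ in data)
n_l = Counter(l for l, _, _ in data)
n_la = Counter((l, a) for l, a, _ in data)

# Step 1: stabilized weights SW^A = f(A) / f(A | L) from sample proportions.
sw = [(n_a[a] / n) / (n_la[(l, a)] / n_l[l]) for l, a, _ in data]

def wls(x, y, w):
    """Weighted least squares for y = b0 + b1 * x; returns (b0, b1)."""
    total = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / total
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

A = [a for _, a, _ in data]
Y = [y for _, _, y in data]

# Step 2: weighted regression of Y on A.
b0_crude, b1_crude = wls(A, Y, [1.0] * n)   # ignores confounding by L
b0_msm, b1_msm = wls(A, Y, sw)              # IP weighted fit of the MSM
print(f"unweighted slope:  {b1_crude:.3f}")
print(f"IP weighted slope: {b1_msm:.3f}")
```

In this toy data the effect within each stratum of \(L\) is 3, and the IP weighted slope recovers that value, while the unweighted slope is pulled upward by confounding.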
4.2 Example: NHEFS Study
MSM: \[E[Y^a] = \beta_0 + \beta_1 a\]
Weighted linear regression:
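# Sketch in R; assumes nhefs already contains columns with these (illustrative) names
fit <- lm(weight_change ~ quit_smoking,
          weights = stabilized_weights,
          data = nhefs)
# lm's model-based standard errors ignore the weighting;
# use a robust (sandwich) variance or the bootstrap for confidence intervals
Results:
- \(\hat{\beta}_1 \approx 3.4\) kg (95% CI: 2.4, 4.5)
- Interpretation: Under the assumptions above, quitting smoking increases weight gain by an estimated 3.4 kg on average, compared with continuing to smoke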
Comparison to stratification:
- IP weighting estimates the marginal effect: \(E[Y^{a=1}] - E[Y^{a=0}]\)
- Stratification + standardization can estimate the same quantity
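- IP weighting and standardization give identical estimates when both are computed nonparametrically; with parametric models the estimates can differ, because IP weighting relies on a model for treatment and standardization on a model for the outcome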
- IP weighting is often more convenient, especially for continuous confounders or time-varying treatments
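The equivalence of the two approaches in the nonparametric case can be verified on hypothetical data. In the Python sketch below, all conditional means and probabilities are plain sample proportions and cell means; the function names are illustrative.

```python
from collections import Counter

# Hypothetical toy data: outcomes Y listed per (L, A) cell.
cells = {
    (0, 0): [0, 1, 2, 0, 1, 2],
    (0, 1): [3, 5],
    (1, 0): [4, 5, 6],
    (1, 1): [7, 8, 9, 7, 8, 9, 7, 8, 9],
}
data = [(l, a, y) for (l, a), ys in cells.items() for y in ys]
n = len(data)
n_l = Counter(l for l, _, _ in data)
n_la = Counter((l, a) for l, a, _ in data)
cell_mean = {cell: sum(ys) / len(ys) for cell, ys in cells.items()}

def standardized_mean(a):
    """Standardization: sum over l of E[Y | A = a, L = l] * Pr[L = l]."""
    return sum(cell_mean[(l, a)] * n_l[l] / n for l in n_l)

def ip_weighted_mean(a):
    """IP weighting: weighted mean of Y among A = a, with W = 1 / f(A | L)."""
    num = den = 0.0
    for l, ai, y in data:
        if ai == a:
            w = n_l[l] / n_la[(l, ai)]
            num += w * y
            den += w
    return num / den

for a in (0, 1):
    print(a, round(standardized_mean(a), 6), round(ip_weighted_mean(a), 6))
```

Both methods return the same estimate of \(E[Y^a]\) for each treatment level, which is the expected behavior when no parametric models are involved.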
5 12.5 Effect Modification and Marginal Structural Models (pp. 169-171)
MSMs can model effect modification by including interactions with baseline covariates.
5.1 MSM with Effect Modification
Definition 4 (MSM with Effect Modifier) To assess effect modification by variable \(V\):
\[E[Y^a \mid V] = \beta_0 + \beta_1 a + \beta_2 V + \beta_3 a \times V\]
where \(\beta_3\) quantifies effect modification:
- If \(\beta_3 \neq 0\), the causal effect differs across levels of \(V\)
- The causal effect at \(V = v\) is \(\beta_1 + \beta_3 v\)
5.2 Fitting MSMs with Effect Modification
Procedure:
- Estimate IP weights as before (adjustment set includes \(V\) and other confounders)
- Fit weighted regression including \(A\), \(V\), and \(A \times V\)
- Interpret \(\beta_3\) as the change in causal effect per unit increase in \(V\)
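A Python sketch of this procedure on hypothetical data with a binary modifier \(V\) and one further binary confounder \(L\). Because the MSM with \(A\), \(V\), and \(A \times V\) is saturated for binary variables, its weighted least squares fit equals the IP weighted mean in each \((a, V)\) cell, so the sketch computes those means directly instead of calling a regression routine.

```python
from collections import Counter

# Hypothetical toy data: (V, L, A) -> (number of individuals, outcome value).
cells = {
    (0, 0, 0): (4, 1.0), (0, 0, 1): (2, 3.0),
    (0, 1, 0): (2, 4.0), (0, 1, 1): (4, 6.0),
    (1, 0, 0): (3, 2.0), (1, 0, 1): (1, 6.0),
    (1, 1, 0): (1, 5.0), (1, 1, 1): (3, 9.0),
}
data = [(v, l, a, y) for (v, l, a), (count, y) in cells.items() for _ in range(count)]

n_vl = Counter((v, l) for v, l, _, _ in data)
n_vla = Counter((v, l, a) for v, l, a, _ in data)

def weighted_mean(a, v):
    """IP weighted mean of Y among individuals with A = a and V = v."""
    num = den = 0.0
    for vi, l, ai, y in data:
        if ai == a and vi == v:
            w = n_vl[(vi, l)] / n_vla[(vi, l, ai)]   # 1 / f(A | L, V)
            num += w * y
            den += w
    return num / den

# Parameters of E[Y^a | V] = beta0 + beta1*a + beta2*V + beta3*a*V.
beta0 = weighted_mean(0, 0)
beta1 = weighted_mean(1, 0) - beta0
beta2 = weighted_mean(0, 1) - beta0
beta3 = (weighted_mean(1, 1) - weighted_mean(0, 1)) - beta1
print(f"effect at V=0: {beta1:.2f}, effect at V=1: {beta1 + beta3:.2f}")
```

In this constructed example the effect is 2 when \(V = 0\) and 4 when \(V = 1\), so the interaction parameter is 2.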
5.3 Example: Effect Modification by Sex
MSM: \[E[Y^a \mid \text{Sex}] = \beta_0 + \beta_1 a + \beta_2 \text{Sex} + \beta_3 a \times \text{Sex}\]
Results (hypothetical, with Sex coded 0 for men and 1 for women):
- \(\hat{\beta}_1 = 2.5\) kg (effect in men)
- \(\hat{\beta}_3 = 1.8\) kg (additional effect in women)
- Effect in women: \(2.5 + 1.8 = 4.3\) kg
Important distinction:
- The MSM \(E[Y^a \mid V]\) models the mean of the potential outcome within levels of \(V\)
- This is a conditional causal effect (conditional on \(V\))
- \(V\) appears in the model, but we still adjust for confounders \(L\) through IP weighting
- \(V\) must not be affected by treatment (must be a pre-treatment variable)
6 12.6 Censoring and Missing Data (pp. 171-173)
IP weighting can also handle censoring and missing data under appropriate assumptions.
6.1 IP Weights for Censoring
Let \(C = 1\) if censored (data missing), \(C = 0\) if uncensored (data observed).
IP weight for censoring:
\[W^C = \frac{1}{\Pr[C = 0 \mid A, L]}\]
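These weights create a pseudo-population in which nobody is censored: each uncensored individual also stands in for the censored individuals with the same values of \(A\) and \(L\).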
6.2 Joint Weights for Treatment and Censoring
When we have both confounding and censoring:
\[W^{A,C} = W^A \times W^C = \frac{1}{\Pr[A \mid L]} \times \frac{1}{\Pr[C = 0 \mid A, L]}\]
Stabilized version:
\[SW^{A,C} = \frac{\Pr[A]}{\Pr[A \mid L]} \times \frac{\Pr[C = 0 \mid A]}{\Pr[C = 0 \mid A, L]}\]
6.3 Example: NHEFS with Loss to Follow-up
Setting: Some individuals lost to follow-up by the second visit
Assumption: Censoring is independent of potential outcomes given \((A, L)\):
\[C \perp\!\!\!\perp Y^a \mid A, L\]
Procedure:
- Fit model for \(\Pr[C = 0 \mid A, L]\) among all individuals
- Calculate censoring weights for uncensored individuals
- Multiply by treatment weights: \(SW^{A,C}\)
- Fit weighted MSM using uncensored data only
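The steps above can be sketched in Python on hypothetical data with loss to follow-up. All probabilities are sample proportions, the outcome of censored individuals is stored as `None`, and the MSM is fit as a difference of weighted means among the uncensored; the names are illustrative.

```python
from collections import Counter

# Hypothetical toy data: (L, A) -> (observed outcomes, number lost to follow-up).
cells = {
    (0, 0): ([0, 1, 2, 0, 1, 2], 2),
    (0, 1): ([3, 5], 2),
    (1, 0): ([4, 5, 6], 1),
    (1, 1): ([7, 8, 9, 7, 8, 9, 7, 8, 9], 3),
}
data = []  # records (L, A, C, Y); Y is None when C = 1
for (l, a), (ys, n_cens) in cells.items():
    data += [(l, a, 0, y) for y in ys]
    data += [(l, a, 1, None)] * n_cens

n = len(data)
n_l = Counter(l for l, _, _, _ in data)
n_a = Counter(a for _, a, _, _ in data)
n_la = Counter((l, a) for l, a, _, _ in data)
u_a = Counter(a for _, a, c, _ in data if c == 0)          # uncensored, by A
u_la = Counter((l, a) for l, a, c, _ in data if c == 0)    # uncensored, by (L, A)

def sw_joint(l, a):
    """SW^{A,C} = f(A)/f(A|L) * Pr[C=0|A]/Pr[C=0|A,L], estimated among all individuals."""
    sw_a = (n_a[a] / n) / (n_la[(l, a)] / n_l[l])
    sw_c = (u_a[a] / n_a[a]) / (u_la[(l, a)] / n_la[(l, a)])
    return sw_a * sw_c

# Fit the MSM among the uncensored only, weighting by SW^{A,C}.
uncensored = [(l, a, y) for l, a, c, y in data if c == 0]
num = {0: 0.0, 1: 0.0}
den = {0: 0.0, 1: 0.0}
for l, a, y in uncensored:
    w = sw_joint(l, a)
    num[a] += w * y
    den[a] += w
effect = num[1] / den[1] - num[0] / den[0]
print(f"IP weighted effect estimate: {effect:.3f}")
```

The models for treatment and censoring use everyone, but only uncensored individuals contribute outcomes. In this toy example the stratum-specific effect is 3, and the weighted estimate recovers it.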
Assumptions required:
- Treatment: \(Y^a \perp\!\!\!\perp A \mid L\) (conditional exchangeability)
- Censoring: \(C \perp\!\!\!\perp Y^a \mid A, L\) (missing at random)
- Positivity: \(\Pr[A = a \mid L] > 0\) and \(\Pr[C = 0 \mid A, L] > 0\) for all \(a, L\)
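Missing at random (MAR) means that censoring may depend on observed variables (here \(A\) and \(L\)) but, given those variables, not on the unobserved outcome. This is a weaker assumption than missing completely at random (MCAR), under which censoring depends on neither.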
7 12.7 A Likelihood Approach (pp. 173-174)
IP weighting can be viewed through the lens of likelihood theory.
7.1 Weighted Likelihood
The IP weighted estimator solves the weighted estimating equations:
\[\sum_{i=1}^n W^A_i \times \frac{\partial \log f(Y_i \mid A_i; \beta)}{\partial \beta} = 0\]
This is equivalent to maximizing a weighted likelihood:
\[L_W(\beta) = \prod_{i=1}^n [f(Y_i \mid A_i; \beta)]^{W^A_i}\]
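A numerical check of this equivalence, sketched in Python under the assumption that \(f(Y \mid A; \beta)\) is Gaussian with mean \(\beta_0 + \beta_1 A\) and fixed variance. The data and weights are hypothetical. For this saturated mean model the weighted group means solve the weighted estimating equations, and the weighted log-likelihood is lower at nearby parameter values.

```python
import math

# Hypothetical toy data with arbitrary positive weights.
A = [0, 0, 0, 1, 1, 1]
Y = [1.0, 2.0, 3.0, 4.0, 6.0, 5.0]
W = [0.6, 0.6, 1.8, 2.2, 0.7, 0.7]

def weighted_loglik(b0, b1, sigma=1.0):
    """log L_W(beta) = sum_i W_i * log f(Y_i | A_i; beta), with f Gaussian (an assumption)."""
    const = -0.5 * math.log(2.0 * math.pi * sigma ** 2)
    return sum(w * (const - (y - b0 - b1 * a) ** 2 / (2.0 * sigma ** 2))
               for a, y, w in zip(A, Y, W))

def weighted_score(b0, b1):
    """Weighted estimating equations for (b0, b1), up to the factor 1 / sigma^2."""
    s0 = sum(w * (y - b0 - b1 * a) for a, y, w in zip(A, Y, W))
    s1 = sum(w * a * (y - b0 - b1 * a) for a, y, w in zip(A, Y, W))
    return s0, s1

def weighted_mean(a_level):
    num = sum(w * y for a, y, w in zip(A, Y, W) if a == a_level)
    den = sum(w for a, w in zip(A, W) if a == a_level)
    return num / den

b0_hat = weighted_mean(0)
b1_hat = weighted_mean(1) - b0_hat
print("score at estimate:", weighted_score(b0_hat, b1_hat))
print("log-likelihood at estimate:", weighted_loglik(b0_hat, b1_hat))
print("log-likelihood nearby:     ", weighted_loglik(b0_hat + 0.1, b1_hat - 0.1))
```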
7.2 Connection to Maximum Likelihood
Without confounding: Standard MLE of \(\beta\) in model \(E[Y \mid A] = g(A; \beta)\)
With confounding: IP weighted MLE of \(\beta\) in MSM \(E[Y^a] = g(a; \beta)\)
The IP weights “adjust” the likelihood to account for confounding.
Theoretical advantages:
- IP weighted estimators have well-studied asymptotic properties
- Can derive standard errors from weighted likelihood theory
- Provides a unified framework for various causal inference methods
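Practical note: Most software can report sandwich (robust) standard errors for IP weighted analyses. These remain valid under some forms of model misspecification and can accommodate clustering. When the estimated weights are treated as fixed, however, the robust variance does not reflect that the weights were estimated, and the resulting 95% confidence intervals are valid but conservative. The nonparametric bootstrap, with the weights re-estimated in every resample, is a common alternative.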
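One way to carry the uncertainty from estimating the weights into the confidence interval is a nonparametric bootstrap that re-estimates the weights in each resample. The Python sketch below uses hypothetical data, 500 resamples, a fixed seed, and simple percentile limits; all of these are illustrative choices rather than recommendations.

```python
import random
from collections import Counter

# Hypothetical toy data: outcomes Y listed per (L, A) cell, replicated to n = 200.
cells = {
    (0, 0): [0, 1, 2, 0, 1, 2],
    (0, 1): [3, 5],
    (1, 0): [4, 5, 6],
    (1, 1): [7, 8, 9, 7, 8, 9, 7, 8, 9],
}
base = [(l, a, y) for (l, a), ys in cells.items() for y in ys]
data = base * 10

def ipw_effect(sample):
    """Re-estimate the weights from `sample`, then return the IP weighted mean difference."""
    n_l = Counter(l for l, _, _ in sample)
    n_la = Counter((l, a) for l, a, _ in sample)
    # Every observed level of L must contain both treated and untreated individuals.
    if any(n_la[(l, a)] == 0 for l in n_l for a in (0, 1)):
        return None
    num = {0: 0.0, 1: 0.0}
    den = {0: 0.0, 1: 0.0}
    for l, a, y in sample:
        w = n_l[l] / n_la[(l, a)]   # W^A = 1 / f(A | L)
        num[a] += w * y
        den[a] += w
    return num[1] / den[1] - num[0] / den[0]

point = ipw_effect(data)

rng = random.Random(2020)
estimates = []
for _ in range(500):
    resample = [rng.choice(data) for _ in data]   # draw n individuals with replacement
    est = ipw_effect(resample)
    if est is not None:                           # skip resamples that violate positivity
        estimates.append(est)

estimates.sort()
lower = estimates[int(0.025 * len(estimates))]
upper = estimates[int(0.975 * len(estimates)) - 1]
print(f"estimate {point:.2f}, 95% percentile interval ({lower:.2f}, {upper:.2f})")
```

Resampling whole individuals and repeating every estimation step, including the weight model, is what lets the interval reflect the variability of the weights.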
8 Summary
Key concepts introduced:
- Inverse probability weighting: Create a pseudo-population where treatment is independent of confounders
- Stabilized weights: Reduce variability while maintaining confounding control
- Marginal structural models: Models for the mean potential outcome as a function of treatment
- Effect modification in MSMs: Include interactions to assess heterogeneity
- Censoring weights: Handle missing data under MAR assumption
- Weighted likelihood: Theoretical foundation for IP weighting
Advantages of IP weighting:
- Natural for marginal effects
- Handles continuous confounders easily
- Extends naturally to time-varying treatments (Part III)
- Can combine treatment and censoring weights
Limitations:
- Requires correct specification of treatment model
- Can be unstable with extreme weights
- Positivity violations lead to extreme weights
- Efficiency loss compared to outcome modeling (when that model is correct)
Looking ahead: Part III will extend these ideas to time-varying treatments and confounders, where IP weighting and MSMs are particularly powerful. We’ll see that conventional regression adjustment generally cannot handle time-varying confounders that are affected by prior treatment, whereas IP weighting handles them naturally.