This chapter explores outcome regression and propensity scores in greater depth, clarifying their roles in causal inference. We examine when simple regression adjustment is sufficient, when it fails, and how propensity scores can be used for confounding adjustment through matching, stratification, or weighting.
Outcome regression estimates causal effects by modeling the outcome as a function of treatment and confounders.
Definition 1 (Outcome Regression) Outcome regression for causal inference fits a parametric model for the conditional mean \(E[Y \mid A, L]\) and reads the treatment effect off the fitted coefficients.
For simple cases, the treatment coefficient may approximate the causal effect, but this requires strong assumptions.
Model: \(E[Y \mid A, L] = \beta_0 + \beta_1 A + \beta_2^{\top} L\)
Question: When does \(\beta_1 = E[Y^{a=1}] - E[Y^{a=0}]\)?
Answer: Only under restrictive conditions: the model must be correctly specified, and the treatment effect must be constant across levels of \(L\) (no effect modification on the additive scale).
If effect modification exists, \(\beta_1\) is a weighted average of conditional effects, not generally equal to the marginal causal effect.
Simple model: \[E[\text{Weight Change} \mid A, L] = \beta_0 + \beta_1 \text{Quit} + \beta_2 \text{Age} + \beta_3 \text{Sex} + \ldots\]
Issues: the model assumes a constant effect of quitting across all covariate levels and a correctly specified additive functional form; if either assumption fails, \(\beta_1\) does not equal the marginal causal effect.
Better approach: \[E[Y \mid A, L] = \beta_0 + \beta_1 A + \beta_2^{\top} L + \beta_3^{\top} (A \times L)\]
Then use the g-formula (standardization over the observed distribution of \(L\)) to compute the marginal effect.
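As a sketch of this workflow, the following simulation (variable names and data-generating values are illustrative, not from the chapter) fits an outcome model with a treatment-confounder interaction and then standardizes via the g-formula:

```python
# Sketch: outcome regression with an A*L interaction, then the g-formula
# (standardization) to recover the marginal effect. Simulated data.
import numpy as np

rng = np.random.default_rng(0)
n = 5000
L = rng.normal(size=n)                       # single confounder
A = rng.binomial(1, 1 / (1 + np.exp(-L)))    # treatment depends on L
Y = 2 * A + 1.5 * L + 1.0 * A * L + rng.normal(size=n)  # effect modified by L

# Fit E[Y | A, L] = b0 + b1 A + b2 L + b3 A*L by ordinary least squares
X = np.column_stack([np.ones(n), A, L, A * L])
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)

def predict(a):
    # g-formula step: predict Y for everyone with treatment set to a
    Xa = np.column_stack([np.ones(n), np.full(n, a), L, a * L])
    return Xa @ beta

ate = predict(1).mean() - predict(0).mean()  # close to the true value 2
```

Note that \(\beta_1\) alone is not the answer here; the marginal effect is recovered only after averaging the interaction term over the distribution of \(L\).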
The propensity score is the probability of receiving treatment given confounders. It plays a central role in observational studies.
Definition 2 (Propensity Score) The propensity score is:
\[e(L) = \Pr[A = 1 \mid L]\]
For individual \(i\) with covariates \(L_i\), the propensity score is \(e(L_i) = \Pr[A = 1 \mid L = L_i]\).
Key theorem: If \(Y^a \perp\!\!\!\perp A \mid L\), then:
\[Y^a \perp\!\!\!\perp A \mid e(L)\]
Interpretation: Conditional on the propensity score, treatment assignment is independent of potential outcomes.
Implication: We can adjust for confounding by adjusting for the propensity score alone, rather than all components of \(L\).
Common approach: Logistic regression
\[\text{logit} \Pr[A = 1 \mid L] = \alpha_0 + \alpha_1^{\top} L\]
Estimation: fit the model by maximum likelihood; each individual's estimated propensity score is the predicted probability \(\hat{e}(L_i)\).
Model selection: the goal is confounding control, not prediction of \(A\). Include the variables needed for conditional exchangeability; covariate balance, not predictive accuracy (e.g., the C-statistic), is the relevant criterion.
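A minimal numpy-only sketch of propensity score estimation, using a few Newton-Raphson steps for the logistic MLE in place of a packaged routine (the simulated data and true coefficients are illustrative):

```python
# Sketch: estimate e(L) = Pr[A=1 | L] by logistic regression fit with
# Newton-Raphson. Simulated data; in practice a standard GLM fitter is used.
import numpy as np

rng = np.random.default_rng(1)
n = 2000
L = rng.normal(size=n)
true_p = 1 / (1 + np.exp(-(0.5 + 1.0 * L)))
A = rng.binomial(1, true_p)

X = np.column_stack([np.ones(n), L])
alpha = np.zeros(2)
for _ in range(25):                          # Newton-Raphson for the MLE
    p = 1 / (1 + np.exp(-X @ alpha))
    grad = X.T @ (A - p)                     # score
    hess = X.T @ (X * (p * (1 - p))[:, None])  # observed information
    alpha += np.linalg.solve(hess, grad)

e_hat = 1 / (1 + np.exp(-X @ alpha))         # estimated propensity scores
```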
Propensity scores can be used to stratify the population and then standardize.
Procedure: estimate \(\hat{e}(L)\), divide the population into \(S\) strata (e.g., quintiles of \(\hat{e}(L)\)), compute the stratum-specific mean outcome under each treatment level, then standardize:
\[\hat{E}[Y^a] = \sum_{s=1}^S \hat{E}[Y \mid A = a, \text{stratum } s] \times \Pr[\text{stratum } s]\]
After stratification, check whether confounders are balanced within strata:
Balance: Within stratum \(s\), the distribution of \(L\) should be similar for treated and untreated.
Diagnostics: compute standardized mean differences of each confounder between treated and untreated within strata, and plot the within-stratum distributions of \(\hat{e}(L)\) and of each component of \(L\) by treatment group.
If balance is poor, refine the propensity score model (add interactions, polynomials, etc.).
Steps: estimate the propensity score, form quintiles, check balance within each quintile, refit the propensity model if balance is poor, and only then estimate effects.
Common finding: Most of the confounding is removed by stratifying on propensity score quintiles, though finer stratification may improve balance.
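The stratify-and-standardize procedure above can be sketched as follows, using the true propensity score in place of an estimated one to keep the example short (all names and values illustrative):

```python
# Sketch: stratify on propensity score quintiles, then standardize the
# stratum-specific treated-untreated differences. Simulated data.
import numpy as np

rng = np.random.default_rng(2)
n = 10000
L = rng.normal(size=n)
e = 1 / (1 + np.exp(-L))                     # true propensity score
A = rng.binomial(1, e)
Y = 2 * A + 3 * L + rng.normal(size=n)       # true marginal effect = 2

edges = np.quantile(e, [0.2, 0.4, 0.6, 0.8])
stratum = np.digitize(e, edges)              # quintile index 0..4

est = 0.0
for s in range(5):
    idx = stratum == s
    w = idx.mean()                           # Pr[stratum s], about 0.2
    diff = Y[idx & (A == 1)].mean() - Y[idx & (A == 0)].mean()
    est += w * diff                          # standardized difference
```

Consistent with the "common finding" above, `est` lands near the true effect of 2 but retains some residual within-stratum confounding; finer strata would shrink it.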
Propensity score matching creates pairs (or sets) of treated and untreated individuals with similar propensity scores.
1-to-1 nearest neighbor matching: each treated individual is paired with the untreated individual whose propensity score is closest; each untreated individual is used at most once.
Matching with replacement: the same untreated individual may serve as the match for several treated individuals, which improves match quality at the cost of more complicated variance estimation.
Caliper matching: a pair is formed only if the propensity score difference falls within a pre-specified caliper (a common choice is 0.2 standard deviations of the logit of the propensity score); treated individuals with no acceptable match are discarded.
After matching, assess balance:
Target: Standardized differences < 0.1 for all confounders
Procedure: estimate the propensity score, form matched pairs, verify balance, then compare mean outcomes between treated and untreated in the matched sample.
Advantages: Intuitive, allows checking balance on all confounders
Disadvantages: Discards some individuals, may not achieve perfect balance
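A greedy sketch of 1-to-1 nearest-neighbor caliper matching without replacement, followed by a standardized-difference balance check (the caliper value and simulated data are illustrative):

```python
# Sketch: greedy 1-to-1 nearest-neighbor matching on the propensity score
# with a caliper, then a standardized-difference check on L.
import numpy as np

rng = np.random.default_rng(3)
n = 2000
L = rng.normal(size=n)
e = 1 / (1 + np.exp(-L))
A = rng.binomial(1, e)

treated = np.flatnonzero(A == 1)
control = list(np.flatnonzero(A == 0))
caliper = 0.05                               # max allowed |e_i - e_j|

pairs = []
for i in treated:                            # greedy: best available control
    if not control:
        break
    dists = np.abs(e[i] - e[control])
    j = int(np.argmin(dists))
    if dists[j] <= caliper:
        pairs.append((i, control.pop(j)))    # without replacement

t_idx = np.array([p[0] for p in pairs])
c_idx = np.array([p[1] for p in pairs])

# standardized difference of L in the matched sample (target: < 0.1)
pooled_sd = np.sqrt((L[t_idx].var() + L[c_idx].var()) / 2)
smd = (L[t_idx].mean() - L[c_idx].mean()) / pooled_sd
```

The sketch also illustrates the stated disadvantage: treated individuals without an in-caliper control are discarded from `pairs`.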
Clarifying terminology: propensity scores, treatment models, and MSMs.
Propensity score: \(e(L) = \Pr[A = 1 \mid L]\)
Treatment model: Any model for \(\Pr[A \mid L]\) (or \(f(A \mid L)\) for non-binary \(A\))
Marginal structural model (MSM): Model for \(E[Y^a]\) or \(E[Y^a \mid V]\)
IP weighting uses the treatment model to create weights:
\[W^A = \frac{1}{\Pr[A \mid L]}\]
For binary \(A\), this uses the propensity score:
\[W^A = \frac{1}{e(L)} \text{ if } A = 1, \quad W^A = \frac{1}{1 - e(L)} \text{ if } A = 0\]
MSM is then fit using IP weights:
\[E[Y^a] = \beta_0 + \beta_1 a\]
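A sketch of the IP-weighting pipeline: build \(W^A\) from the propensity score, then fit the saturated MSM \(E[Y^a] = \beta_0 + \beta_1 a\), which reduces to a weighted difference in means (the true \(e(L)\) stands in for an estimated one; simulated data):

```python
# Sketch: IP weights from the propensity score, then the MSM fit as a
# weighted (Hajek) difference of means. Simulated data.
import numpy as np

rng = np.random.default_rng(4)
n = 10000
L = rng.normal(size=n)
e = 1 / (1 + np.exp(-L))                     # true propensity score
A = rng.binomial(1, e)
Y = 2 * A + 3 * L + rng.normal(size=n)       # true E[Y^1] - E[Y^0] = 2

W = np.where(A == 1, 1 / e, 1 / (1 - e))     # W^A = 1 / f(A | L)

# weighted means in each arm == weighted least squares of Y on (1, A)
ey1 = np.sum(W * A * Y) / np.sum(W * A)          # beta0 + beta1
ey0 = np.sum(W * (1 - A) * Y) / np.sum(W * (1 - A))  # beta0
msm_effect = ey1 - ey0                           # beta1, close to 2
```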
Can we combine propensity scores with outcome regression?
Idea: Use both a treatment model and an outcome model.
Estimator: Fit outcome model within propensity score strata (or matched sets), then standardize.
Double robustness: The estimator is consistent if EITHER the treatment model \(\Pr[A \mid L]\) OR the outcome model \(E[Y \mid A, L]\) is correctly specified (but not necessarily both).
Advanced approach: Combine IP weighting with outcome modeling:
\[\hat{E}[Y^a] = \frac{1}{n}\sum_{i=1}^n \left[\frac{I(A_i = a) Y_i}{f(a \mid L_i)} - \frac{I(A_i = a) - f(a \mid L_i)}{f(a \mid L_i)} m(a, L_i)\right]\]
where \(m(a, L) = \hat{E}[Y \mid A = a, L]\) is the outcome model.
Properties: doubly robust (consistent if either model is correct); efficient when both models are correct; can still be unstable when the estimated \(f(a \mid L_i)\) is close to zero.
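The estimator above can be sketched directly from the formula. Here the outcome model is deliberately misspecified to illustrate double robustness: the correct treatment model still yields a consistent answer (simulated data, illustrative values):

```python
# Sketch of the doubly robust (AIPW) estimator with a correct treatment
# model and a deliberately misspecified outcome model m(a, L).
import numpy as np

rng = np.random.default_rng(5)
n = 20000
L = rng.normal(size=n)
e = 1 / (1 + np.exp(-L))                     # correct f(1 | L)
A = rng.binomial(1, e)
Y = 2 * A + 3 * L + rng.normal(size=n)       # true E[Y^1] - E[Y^0] = 2

def m(a, L):
    # misspecified outcome model: ignores the confounder L entirely
    return np.full_like(L, 1.0 * a)

def aipw(a):
    f = e if a == 1 else 1 - e               # f(a | L)
    I = (A == a).astype(float)
    # IPW term plus augmentation term from the formula above
    return np.mean(I * Y / f - (I - f) / f * m(a, L))

dr_effect = aipw(1) - aipw(0)                # close to 2 despite bad m
```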
Propensity scores extend to continuous treatments, though with additional complexity.
For continuous treatment \(A\), the generalized propensity score is the conditional density:
\[f(A \mid L)\]
Balancing property: Under conditional exchangeability,
\[Y^a \perp\!\!\!\perp A \mid f(A \mid L)\]
Common approach: Model the conditional distribution of \(A\) given \(L\).
Example: Normal model
\[A \mid L \sim \text{Normal}(\mu(L), \sigma^2)\]
where \(\mu(L) = \alpha_0 + \alpha_1^{\top} L\)
GPS: \(f(A_i \mid L_i) = \frac{1}{\sigma}\,\phi\!\left(\frac{A_i - \mu(L_i)}{\sigma}\right)\) where \(\phi\) is the standard normal density.
IP weighting: Create weights
\[W_i = \frac{f(A_i)}{f(A_i \mid L_i)}\]
where \(f(A_i)\) is the marginal density of \(A\) (unconditional).
Stratification: Stratify on the GPS and standardize within strata.
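The GPS weighting approach can be sketched as follows, computing \(W = f(A)/f(A \mid L)\) under the normal model and fitting a linear MSM for the dose-response slope (all data-generating values illustrative; the true densities stand in for estimated ones):

```python
# Sketch: generalized propensity score for a continuous treatment under a
# normal model, used to build stabilized IP weights W = f(A) / f(A | L).
import numpy as np

rng = np.random.default_rng(6)
n = 20000
L = rng.normal(size=n)
A = 0.5 * L + rng.normal(size=n)             # A | L ~ Normal(0.5 L, 1)
Y = 2 * A + 3 * L + rng.normal(size=n)       # true dose-response slope = 2

def normal_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

gps = normal_pdf(A, 0.5 * L, 1.0)            # f(A | L), the GPS
marg = normal_pdf(A, 0.0, np.sqrt(1.25))     # f(A): Var(A) = 0.25 + 1
W = marg / gps                               # stabilized weights

# weighted least squares of Y on (1, A) estimates the MSM slope
X = np.column_stack([np.ones(n), A])
XtW = X.T * W
beta = np.linalg.solve(XtW @ X, XtW @ Y)     # beta[1] close to 2
```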
Key concepts: the propensity score \(e(L)\) and its balancing property; matching, stratification, and weighting as alternative ways to adjust for it; double robustness; and the generalized propensity score for continuous treatments.
Methods comparison:
| Method | Uses | Advantages | Disadvantages |
|---|---|---|---|
| Outcome regression | \(E[Y \mid A, L]\) | Natural, efficient when correct | Requires correct outcome model |
| IP weighting | \(\Pr[A \mid L]\) | Natural for MSMs, handles time-varying | Can be unstable, needs correct treatment model |
| Propensity matching | \(e(L)\) | Intuitive, easy to check balance | Discards data, complex inference |
| Propensity stratification | \(e(L)\) | Reduces dimensionality | Requires choosing # of strata |
| Doubly robust | Both models | Robust to one misspecification | More complex, needs both models |
Practical recommendations: check positivity/overlap of the estimated propensity scores; assess covariate balance after any propensity-based adjustment; prefer doubly robust estimators when both models can be specified; and report diagnostics (weight distributions, standardized differences) alongside effect estimates.