Chapter 14: G-Estimation of Structural Nested Models

This chapter introduces G-estimation, a method for estimating the parameters of structural nested models (SNMs). Whereas IP weighting models the treatment mechanism and standardization models the outcome mechanism, G-estimation models the causal effect itself. It therefore needs no model for the full conditional mean of the outcome, although in practice it is paired with a model for treatment given the confounders.

1 14.1 The Structure of Structural Nested Models (pp. 189-192)

Structural nested models directly parameterize the causal effect rather than the mean outcome or treatment probability.

Definition 1 (Structural Nested Mean Model) A structural nested mean model (SNMM) specifies how the mean of \(Y^a\) differs from the mean of \(Y^{a'}\) as a function of treatment and covariates:

\[\text{E}{\left[Y^a - Y^{a'} \mid L\right]} = \gamma(a, a'; \psi, L)\]

For dichotomous treatment with \(a = 1\) and \(a' = 0\):

\[\text{E}{\left[Y^1 - Y^0 \mid L\right]} = \psi_0 + \psi_1^{\top} L\]

where \(\psi = (\psi_0, \psi_1)\) are the parameters of interest.

Comparison to Previous Approaches

Marginal structural model (IP weighting): \[\text{E}{\left[Y^a\right]} = \beta_0 + \beta_1 a\] Models the mean outcome under treatment \(a\).

Outcome regression (standardization): \[\text{E}{\left[Y \mid A, L\right]} = \beta_0 + \beta_1 A + \beta_2^{\top} L + \beta_3^{\top} (A \times L)\] Models the conditional mean outcome given treatment and confounders.

Structural nested model (G-estimation): \[\text{E}{\left[Y^1 - Y^0 \mid L\right]} = \psi_0 + \psi_1^{\top} L\] Models the conditional causal effect directly.

2 14.2 Rank Preservation (pp. 192-194)

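G-estimation is usually introduced under the assumption that treatment preserves the ordering of individuals by their outcome (though it may change outcomes by different amounts). Hernán and Robins use this assumption for ease of exposition; G-estimation of structural nested mean models does not strictly require it.

Definition 2 (Rank Preservation) Rank preservation assumes: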

If \(Y_i^1 > Y_j^1\), then \(Y_i^0 > Y_j^0\) for all individuals \(i, j\).

Equivalently: Treatment does not reverse the ranking of individuals with respect to the outcome.

Implications

Allowed under rank preservation:

Individual causal effects \(Y_i^1 - Y_i^0\) can differ across individuals
Some individuals can have large effects, others small effects
Effects can vary with covariates \(L\)

NOT allowed under rank preservation:

Treatment reverses the order of two individuals (\(Y_i^0 > Y_j^0\) but \(Y_i^1 < Y_j^1\))
Effect differences large enough to change any individual's rank

Example: Smoking Cessation and Weight

Rank preservation:

Some people gain more weight than others when quitting
But anyone who would weigh more than another person without quitting would also weigh more than that person after quitting

Violation:

Quitting changes who weighs more: person \(i\) would be heavier than person \(j\) without quitting but lighter than \(j\) after quitting
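The definition can be checked mechanically when both potential outcomes are known, which is only possible in simulated or hypothetical data. The sketch below applies Definition 2 pair by pair; the function name `is_rank_preserving` and the weights are made up for illustration.

```python
import numpy as np

def is_rank_preserving(y0, y1):
    """Check Definition 2: Y_i^1 > Y_j^1 must imply Y_i^0 > Y_j^0 for every pair."""
    y0 = np.asarray(y0, dtype=float)
    y1 = np.asarray(y1, dtype=float)
    # Pairwise differences: d0[i, j] = y0[i] - y0[j], and likewise for d1.
    d0 = y0[:, None] - y0[None, :]
    d1 = y1[:, None] - y1[None, :]
    # A violation is a pair ordered strictly under treatment but not under no treatment.
    return not np.any((d1 > 0) & (d0 <= 0))

# Hypothetical weights in kg: effects differ (2, 5, 9) but the ordering is unchanged.
y0 = [60.0, 70.0, 80.0]
y1_preserved = [62.0, 75.0, 89.0]
# Here the first two individuals swap order under treatment.
y1_reversed = [72.0, 71.0, 89.0]

print(is_rank_preserving(y0, y1_preserved))  # True
print(is_rank_preserving(y0, y1_reversed))   # False
```

With real data only one potential outcome per individual is observed, so a check like this cannot be run; rank preservation is an untestable assumption.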

3 14.3 The G-Null Hypothesis (pp. 194-196)

The key idea of G-estimation: if the structural model holds with parameter value \(\psi\), we can construct a pseudo-outcome that is independent of treatment given the confounders.

Definition 3 (G-Null Hypothesis) For a given parameter value \(\psi\), define the G-null hypothesis \(H_0(\psi)\):

\[H_0(\psi): Y^1 - Y^0 = \psi_0 + \psi_1^{\top} L \text{ for all individuals}\]

With individual indices made explicit, this is an additive rank-preserving model:

\[H_0(\psi): Y_i^1 - Y_i^0 = \psi_0 + \psi_1^{\top} L_i \text{ for all } i\]

Creating the Pseudo-Outcome

Under \(H_0(\psi)\), we can construct:

\[H(\psi) = Y - A(\psi_0 + \psi_1^{\top} L)\]

Key property: If \(H_0(\psi)\) is true, then:

\[H(\psi) = Y^0 \text{ for all individuals}\]

Since \(Y^0\) is the potential outcome under no treatment, conditional exchangeability implies that it is independent of the actual treatment \(A\) given confounders \(L\):

\[H(\psi) \perp\!\!\!\perp A \mid L\]
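A small simulation can make the key property concrete. The sketch below generates both potential outcomes under an additive rank-preserving model and confirms that \(H(\psi)\) recovers \(Y^0\) only at the true \(\psi\). All numeric values, the single scalar covariate, and the helper name `pseudo_outcome` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
L = rng.normal(size=n)                      # single simulated confounder
A = rng.binomial(1, 1 / (1 + np.exp(-L)))   # treatment probability depends on L
Y0 = 1.0 + 2.0 * L + rng.normal(size=n)     # potential outcome under no treatment
psi_true = np.array([0.5, 0.3])             # hypothetical (psi_0, psi_1)
Y1 = Y0 + psi_true[0] + psi_true[1] * L     # individual effect is psi_0 + psi_1 * L
Y = np.where(A == 1, Y1, Y0)                # consistency: observed outcome is Y^A

def pseudo_outcome(psi, Y, A, L):
    """H(psi) = Y - A * (psi_0 + psi_1 * L) for a scalar covariate L."""
    return Y - A * (psi[0] + psi[1] * L)

H_true = pseudo_outcome(psi_true, Y, A, L)
H_wrong = pseudo_outcome(np.array([0.0, 0.0]), Y, A, L)

print(np.allclose(H_true, Y0))   # True: removing the true effect recovers Y^0
print(np.allclose(H_wrong, Y0))  # False: a wrong psi leaves the treated shifted
```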

4 14.4 Estimating the Causal Effect (pp. 196-199)

G-estimation finds the value of \(\psi\) that makes \(H(\psi)\) independent of \(A\) conditional on \(L\).

G-Estimation Algorithm

Step 1: Specify a structural nested model \[\text{E}{\left[Y^1 - Y^0 \mid L\right]} = \psi_0 + \psi_1^{\top} L\]

Step 2: For a candidate value \(\psi\), compute pseudo-outcome \[H(\psi) = Y - A(\psi_0 + \psi_1^{\top} L)\]

Step 3: Test whether \(H(\psi) \perp\!\!\!\perp A \mid L\) by fitting \[\text{E}{\left[H(\psi) \mid A, L\right]} = \alpha_0 + \alpha_1 A + \alpha_2^{\top} L\]

Step 4: The correct \(\psi\) is the one that makes \(\alpha_1 = 0\)

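Step 5: In practice, solve the estimating equation: \[\sum_{i=1}^n \left(A_i - \text{E}{\left[A \mid L_i\right]}\right)(1, L_i^{\top})^{\top}[Y_i - A_i(\psi_0 + \psi_1^{\top} L_i)] = 0\] or more generally: \[\sum_{i=1}^n U_i(\psi)[Y_i - A_i(\psi_0 + \psi_1^{\top} L_i)] = 0\] where \(U_i(\psi)\) is a function of \(A_i\) and \(L_i\) with conditional mean zero given \(L_i\), and \(\text{E}{\left[A \mid L_i\right]}\) is replaced by its estimate from a treatment model. Choosing \(U_i = A_i\) alone would not work: it does not have mean zero given \(L_i\) and so does not adjust for confounding.

Example: Simple Model

SNMM: \(\text{E}{\left[Y^1 - Y^0 \mid L\right]} = \psi_0\) (constant effect)

Estimating equation: \[\sum_{i=1}^n \left(A_i - \text{E}{\left[A \mid L_i\right]}\right)(Y_i - A_i \psi_0) = 0\]

Solution: \[\hat{\psi}_0 = \frac{\sum_i \left(A_i - \text{E}{\left[A \mid L_i\right]}\right) Y_i}{\sum_i \left(A_i - \text{E}{\left[A \mid L_i\right]}\right) A_i}\]

When there is no confounding and \(\text{E}{\left[A \mid L_i\right]}\) is estimated by the sample proportion treated \(n_1 / n\), where \(n_1 = \sum_i A_i\), this reduces to the difference in mean outcomes between the treated and the untreated.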
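The algorithm can be sketched for the constant-effect model as follows. The code scans candidate values of \(\psi_0\), regresses \(H(\psi_0)\) on \(A\) and \(L\) as in Step 3, and keeps the candidate whose coefficient \(\alpha_1\) is closest to zero. It then compares the result with a closed form that uses \(U_i = A_i - \hat{\text{E}}[A \mid L_i]\). The simulated data, the grid, and the linear working model for \(\text{E}[A \mid L]\) are assumptions of this sketch; a logistic treatment model would be the more usual choice.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
L = rng.normal(size=n)
A = rng.binomial(1, 1 / (1 + np.exp(-L))).astype(float)
Y = 2.0 + 1.5 * L + 1.0 * A + rng.normal(size=n)   # simulated truth: psi_0 = 1.0

# Design matrix for E[H(psi) | A, L] = alpha_0 + alpha_1 * A + alpha_2 * L.
X = np.column_stack([np.ones(n), A, L])

def alpha1(psi0):
    """Coefficient on A when H(psi0) = Y - A * psi0 is regressed on (1, A, L)."""
    H = Y - A * psi0
    coef, *_ = np.linalg.lstsq(X, H, rcond=None)
    return coef[1]

# Steps 2 to 4: scan candidates and keep the one whose alpha_1 is closest to zero.
grid = np.linspace(-2.0, 4.0, 601)                  # step 0.01
psi_grid = grid[np.argmin([abs(alpha1(p)) for p in grid])]

# Closed form: residualize A on (1, L), then solve sum(U * (Y - A * psi)) = 0.
Z = np.column_stack([np.ones(n), L])
A_hat = Z @ np.linalg.lstsq(Z, A, rcond=None)[0]    # linear working model for E[A | L]
U = A - A_hat
psi_closed = np.sum(U * Y) / np.sum(U * A)

print(round(psi_grid, 2), round(psi_closed, 3))
```

The two answers agree up to the grid resolution because \(\alpha_1\) is linear in \(\psi_0\) here. The grid search is shown only because it mirrors the steps; the closed form is what one would normally compute for a linear SNMM.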

5 14.5 G-Estimation with Model Misspecification (pp. 199-201)

G-estimation has robustness properties that differ from IP weighting and standardization.

Robustness Properties

When the SNMM and the treatment model are both correctly specified:

G-estimation is consistent without any model for \(\text{E}{\left[Y \mid A, L\right]}\)
Only the effect \(\text{E}{\left[Y^1 - Y^0 \mid L\right]}\) must be modeled correctly, not the full outcome model

When the treatment model is correctly specified but the SNMM is too simple:

The estimator converges to a summary of the conditional effects rather than to the effect within each level of \(L\)
Which summary depends on the estimating function \(U(\psi)\); for example, a constant-effect SNMM with \(U = A - \text{E}{\left[A \mid L\right]}\) yields an average of the conditional effects weighted by \(\text{Var}(A \mid L)\)

Double robustness:

Some G-estimators are doubly robust: given a correct SNMM, they are consistent if either the treatment model \(\text{E}{\left[A \mid L\right]}\) or a working model for \(\text{E}{\left[H(\psi) \mid L\right]}\) is correct
This is similar to doubly robust IP weighted estimators

Comparison to Other Methods

Method	Requires Correct	Robust To
IP weighting	\(\Pr[A \mid L]\)	Outcome model misspec.
Standardization	\(\text{E}{\left[Y \mid A, L\right]}\)	Treatment model misspec.
G-estimation	\(\text{E}{\left[Y^1 - Y^0 \mid L\right]}\) and \(\Pr[A \mid L]\)	Full outcome model misspec.
Doubly robust	Either model	One model misspecification

6 14.6 Estimating the Average Causal Effect (pp. 201-202)

From the SNMM, we can compute the average causal effect.

From Conditional to Marginal Effects

SNMM: \(\text{E}{\left[Y^1 - Y^0 \mid L\right]} = \psi_0 + \psi_1^{\top} L\)

Average causal effect: \[\text{E}{\left[Y^1 - Y^0\right]} = E_L[\text{E}{\left[Y^1 - Y^0 \mid L\right]}] = E_L[\psi_0 + \psi_1^{\top} L] = \psi_0 + \psi_1^{\top} \text{E}{\left[L\right]}\]

Estimator: \[\widehat{\text{E}{\left[Y^1 - Y^0\right]}} = \hat{\psi}_0 + \hat{\psi}_1^{\top} \bar{L}\]

where \(\bar{L} = n^{-1} \sum_i L_i\) is the sample mean of \(L\).

Effect in Specific Subgroups

Effect at \(L = \ell\): \[\text{E}{\left[Y^1 - Y^0 \mid L = \ell\right]} = \psi_0 + \psi_1^{\top} \ell\]

Effect in the treated (ATT): \[\text{E}{\left[Y^1 - Y^0 \mid A = 1\right]} = \psi_0 + \psi_1^{\top} \text{E}{\left[L \mid A = 1\right]}\]

Estimator for ATT: \[\widehat{\text{E}{\left[Y^1 - Y^0 \mid A = 1\right]}} = \hat{\psi}_0 + \hat{\psi}_1^{\top} \bar{L}_{A=1}\]

where \(\bar{L}_{A=1}\) is the mean of \(L\) among the treated.
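Both estimators are plug-in averages of the fitted effect function. A minimal sketch, assuming \(\hat{\psi}\) has already been obtained; the helper name `average_effects` and the toy numbers are hypothetical.

```python
import numpy as np

def average_effects(psi_hat, L, A):
    """Return (ATE, ATT) implied by a linear SNMM psi_0 + psi_1' L."""
    A = np.asarray(A)
    psi0 = psi_hat[0]
    psi1 = np.asarray(psi_hat[1:], dtype=float)
    # Treat L as an n x k matrix even when there is a single covariate.
    L = np.asarray(L, dtype=float).reshape(len(A), -1)
    ate = psi0 + psi1 @ L.mean(axis=0)            # average over everyone
    att = psi0 + psi1 @ L[A == 1].mean(axis=0)    # average over the treated only
    return ate, att

# Hypothetical estimates and data: one covariate, four individuals.
psi_hat = [2.0, 0.5]
L = np.array([0.0, 2.0, 4.0, 6.0])
A = np.array([0, 0, 1, 1])

ate, att = average_effects(psi_hat, L, A)
print(ate, att)  # 3.5 4.5
```

The two numbers differ only because the treated have a different covariate distribution from the whole sample (mean 5 versus mean 3 in this toy data).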

7 14.7 Structural Nested Models with Two or More Parameters (pp. 202-204)

SNMMs can include multiple effect modifiers.

General SNMM

\[\text{E}{\left[Y^1 - Y^0 \mid L\right]} = \psi_0 + \psi_1 L_1 + \psi_2 L_2 + \psi_3 L_1 L_2 + \ldots\]

Parameters: \(\psi = (\psi_0, \psi_1, \psi_2, \psi_3, \ldots)\)

Estimating equations: Need as many equations as parameters

\[\sum_{i=1}^n U_{ij}(\psi)[Y_i - A_i(\psi_0 + \psi_1 L_{i1} + \psi_2 L_{i2} + \ldots)] = 0\]

for \(j = 1, 2, \ldots, p\) where \(p\) is the number of parameters.

Choice of Estimating Functions

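Common choices for \(U_{ij}\):

Standard: \(U_i = (A_i - \text{E}{\left[A \mid L_i\right]})(1, L_{i1}, L_{i2}, \ldots)^{\top}\), which has mean zero given \(L_i\) when the treatment model is correct
Unadjusted: \(U_i = (A_i - \text{E}{\left[A\right]})(1, L_{i1}, L_{i2}, \ldots)^{\top}\), valid only when treatment is assigned independently of \(L\), as in a simple randomized trial
Doubly robust: the standard choice combined with a working model for \(\text{E}{\left[H(\psi) \mid L\right]}\) that is subtracted from \(H(\psi)\)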

The choice affects:

Efficiency (variance of estimator)
Robustness properties
Computational complexity
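For a linear SNMM the stacked estimating equations are linear in \(\psi\), so they can be solved as a single linear system. The sketch below uses \(U_i = (A_i - \hat{\text{E}}[A \mid L_i])(1, L_{i1}, L_{i2})^{\top}\) with a logistic treatment model fitted by a hand-written Newton-Raphson loop. The simulated data and the helper names `fit_logistic` and `g_estimate` are assumptions made for illustration.

```python
import numpy as np

def fit_logistic(X, a, n_iter=25):
    """Plain Newton-Raphson logistic regression; returns fitted probabilities."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X @ beta))
        hess = (X * (p * (1 - p))[:, None]).T @ X
        beta += np.linalg.solve(hess, X.T @ (a - p))
    return 1 / (1 + np.exp(-X @ beta))

def g_estimate(Y, A, S, e):
    """Solve sum_i (A_i - e_i) S_i [Y_i - A_i S_i' psi] = 0 for psi."""
    U = (A - e)[:, None] * S          # one estimating function per parameter
    M = U.T @ (A[:, None] * S)        # p x p matrix that multiplies psi
    return np.linalg.solve(M, U.T @ Y)

rng = np.random.default_rng(2)
n = 20000
L1, L2 = rng.normal(size=n), rng.normal(size=n)
S = np.column_stack([np.ones(n), L1, L2])      # effect modifiers (1, L1, L2)
A = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * L1 - 0.5 * L2)))).astype(float)
psi_true = np.array([1.0, 0.5, -0.5])          # simulated truth
Y = L1 + L2 + A * (S @ psi_true) + rng.normal(size=n)

e_hat = fit_logistic(S, A)                     # estimated E[A | L]
psi_hat = g_estimate(Y, A, S, e_hat)
print(np.round(psi_hat, 2))
```

Note that no model for \(\text{E}[Y \mid A, L]\) is fitted anywhere; only the treatment model and the effect model enter the computation.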

8 14.8 Censoring and Missing Data (pp. 204-206)

G-estimation extends to handle censoring and missing outcomes.

Censoring Weights

Let \(C = 1\) if censored, \(C = 0\) if observed.

Assumption: \(C \perp\!\!\!\perp Y^a \mid A, L\) (censoring independent of potential outcomes given treatment and covariates)

Weighted estimating equation:

\[\sum_{i: C_i = 0} \frac{1}{\Pr[C_i = 0 \mid A_i, L_i]} U_i(\psi)[Y_i - \gamma(A_i, 0; \psi, L_i)] = 0\]

where \(\gamma(A_i, 0; \psi, L_i) = A_i(\psi_0 + \psi_1^{\top} L_i)\) for the linear SNMM used above. This weights each uncensored observation by the inverse probability of being uncensored.
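For a one-parameter SNMM the weighted equation again has a closed-form solution. The sketch below restricts to uncensored rows and weights them by \(1 / \Pr[C = 0 \mid A, L]\). The toy data, the censoring probabilities, and the function name `g_estimate_censored` are hypothetical; with no covariates, the treatment model reduces to a weighted proportion treated.

```python
import numpy as np

def g_estimate_censored(Y, A, C, e, p_uncensored):
    """One-parameter g-estimate from uncensored rows, weighted by 1 / Pr[C=0 | A, L]."""
    keep = C == 0
    w = 1.0 / p_uncensored[keep]
    U = A[keep] - e[keep]
    return np.sum(w * U * Y[keep]) / np.sum(w * U * A[keep])

# Hypothetical toy data: the last two outcomes are censored (NaN).
A = np.array([1.0, 1.0, 0.0, 0.0, 1.0, 0.0])
C = np.array([0, 0, 0, 0, 1, 1])
Y = np.array([5.0, 7.0, 2.0, 4.0, np.nan, np.nan])
p_uncensored = np.where(A == 1, 0.8, 0.5)      # assumed known censoring model

# No covariates here, so E[A] in the weighted population is a weighted proportion.
w_unc = 1.0 / p_uncensored[C == 0]
e = np.full(A.shape, np.sum(w_unc * A[C == 0]) / np.sum(w_unc))

psi_hat = g_estimate_censored(Y, A, C, e, p_uncensored)
print(psi_hat)  # about 3.0: the difference in uncensored means, 6 - 3
```

In a real analysis both the censoring probabilities and the treatment model would be estimated, with the treatment model fitted in the weighted uncensored sample.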

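Combining Confounding and Censoring Adjustment

When we have both confounding and censoring, G-estimation handles the confounding and IP weights handle the censoring:

\[\sum_{i: C_i = 0} W_i U_i(\psi)[Y_i - \gamma(A_i, 0; \psi, L_i)] = 0\]

where:

\[W_i = \frac{1}{\Pr[C_i = 0 \mid A_i, L_i]}\]

No additional treatment weight \(1 / \Pr[A_i \mid L_i]\) is needed, because confounding by \(L\) is already addressed through \(\text{E}{\left[A \mid L_i\right]}\) in the estimating function \(U_i(\psi)\). Stabilized censoring weights can be used in place of \(W_i\).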

9 14.9 Marginal vs Conditional Effects (p. 206)

G-estimation naturally estimates conditional effects \(\text{E}{\left[Y^1 - Y^0 \mid L\right]}\). We can average to get marginal effects.

Three Types of Effects

Marginal effect (population average): \[\text{E}{\left[Y^1 - Y^0\right]}\]

Conditional effect (within levels of \(L\)): \[\text{E}{\left[Y^1 - Y^0 \mid L\right]}\]

Individual effect: \[Y_i^1 - Y_i^0\]

Methods and Natural Estimands

Method	Natural Estimand	To Get Other Estimands
IP weighting	Marginal effect	Model \(\text{E}{\left[Y^a \mid V\right]}\) for conditional
Standardization	Conditional effect	Average over \(L\) for marginal
G-estimation	Conditional effect	Average over \(L\) for marginal

Advantage of SNMMs: By modeling \(\text{E}{\left[Y^1 - Y^0 \mid L\right]}\) directly, G-estimation provides natural inference for effect modification while still allowing marginal effect estimation.

10 Summary

Key concepts introduced:

Structural nested models: Model the causal effect directly rather than the outcome or treatment mechanism
Rank preservation: Assumption that treatment doesn’t reverse individual rankings
G-null hypothesis: Under the correct parameters, the pseudo-outcome \(H(\psi)\) equals \(Y^0\)
G-estimation: Find \(\psi\) that makes \(H(\psi)\) independent of \(A\) given \(L\)
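Robustness: G-estimation needs no model for the full conditional outcome mean, but it does require a correct effect model together with a correct treatment model (or, for doubly robust versions, a correct model for \(\text{E}{\left[H(\psi) \mid L\right]}\))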
Effect modification: SNMMs naturally model how effects vary with covariates
Censoring: G-estimation extends to handle missing data via inverse probability weighting

Comparison of methods:

Aspect	IP Weighting	Standardization	G-Estimation
Models	Treatment mechanism	Outcome mechanism	Causal effect
Estimand	Marginal effect	Conditional effect	Conditional effect
Assumptions	Conditional exchangeability	Conditional exchangeability	Conditional exchangeability
Must be correctly specified	Treatment model	Outcome model	Effect model and treatment model

Advantages of G-estimation:

Models the quantity of scientific interest (the causal effect) directly
Robust to certain outcome model misspecifications
Natural for effect modification
Can be doubly robust

Limitations:

Usually taught under rank preservation, which is rarely plausible (structural nested mean models do not strictly require it)
Can be computationally more intensive
Less familiar to many practitioners
Requires solving estimating equations (closed form for linear SNMMs, search or iteration otherwise)

Hernán, Miguel A, and James M Robins. 2020. Causal Inference: What If. Chapman & Hall/CRC. https://miguelhernan.org/whatifbook.