Chapter 18: Variable Selection for Causal Inference

This chapter addresses a critical question in causal inference: Which variables should we adjust for? Not all variables that predict the outcome should be included in causal models. Some variables, if adjusted for, can introduce bias rather than remove it. We provide guidance on variable selection using causal diagrams.

18.1 The Traditional Approach (pp. 265-267)

Traditional variable selection methods are designed for prediction, not causal inference.

Prediction vs Causal Inference

Prediction goal: Minimize prediction error for \(Y\) given covariates

  • Include any variable that improves prediction
  • Use criteria like AIC, BIC, cross-validation
  • More variables (if not overfitting) → better prediction

Causal inference goal: Estimate \(E[Y^a]\) or \(E[Y^{a=1}] - E[Y^{a=0}]\)

  • Include variables that remove confounding
  • Exclude variables that introduce bias
  • More variables ≠ better causal estimates

Stepwise Selection

Traditional approach: Stepwise regression (forward, backward, or both)

  • Add/remove variables based on statistical significance or information criteria
  • Maximize \(R^2\) or minimize AIC/BIC

Problem for causal inference:

  • May exclude important confounders (if weak predictors)
  • May include colliders or mediators (if strong predictors)
  • Ignores causal structure

Recommendation: Do not use stepwise selection for causal inference.
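A small simulation can make the danger concrete. The setup below is our own invented illustration (not from the book): \(L\) strongly determines treatment but is only a weak direct cause of the outcome, so an outcome-based predictive screen might drop it, yet omitting it biases the effect estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# L strongly determines treatment but is only a weak direct cause of Y
L = rng.normal(size=n)
A = 2.0 * L + rng.normal(size=n)            # L -> A (strong)
Y = 1.0 * A + 0.3 * L + rng.normal(size=n)  # true effect of A on Y is 1.0

def ols(y, *covs):
    """OLS coefficients of y on an intercept plus the given covariates."""
    X = np.column_stack([np.ones(len(y)), *covs])
    return np.linalg.lstsq(X, y, rcond=None)[0]

unadjusted = ols(Y, A)[1]   # drops the "weak predictor" L
adjusted = ols(Y, A, L)[1]  # keeps the confounder

print(f"unadjusted: {unadjusted:.3f}")  # biased upward (about 1.12 here)
print(f"adjusted:   {adjusted:.3f}")    # close to the true value 1.0
```

The bias of the unadjusted estimate does not shrink with sample size; no predictive criterion computed from \((A, Y)\) alone can detect it.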

18.2 Confounding and Confounders (pp. 267-270)

What exactly is a confounder, and when should we adjust for it?

Definition 1 (Confounder (Formal Definition)) A variable \(L\) is a confounder for the effect of \(A\) on \(Y\) if:

  1. \(L\) is associated with treatment \(A\)
  2. \(L\) is a cause of outcome \(Y\)
  3. \(L\) is not affected by \(A\) (not a descendant of \(A\) on a causal DAG)

Causal criterion: \(L\) is on a backdoor path from \(A\) to \(Y\).

Backdoor Paths

Backdoor path: A path from \(A\) to \(Y\) that starts with an arrow into \(A\)

\[A \leftarrow L \to Y\]

Such paths create non-causal association between \(A\) and \(Y\).

Goal: Block all backdoor paths to eliminate confounding.

Sufficient Adjustment Sets

Definition 2 (Sufficient Adjustment Set) A set of variables \(L\) is sufficient for confounding adjustment if conditioning on \(L\) blocks all backdoor paths from \(A\) to \(Y\).

Equivalently: \((Y^a \perp\!\!\!\perp A \mid L)\) for all \(a\) (conditional exchangeability).

Multiple sufficient sets: There may be many sufficient adjustment sets. We want to choose one that:

  1. Blocks all backdoor paths (necessary)
  2. Doesn’t introduce new bias (important)
  3. Is measurable and measured
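For small DAGs, the backdoor criterion can be checked mechanically. Below is a minimal pure-Python sketch (all helper names are ours): enumerate simple paths in the undirected skeleton, keep those whose first edge points into \(A\), and test whether a candidate set blocks each one using the d-separation rules for colliders and non-colliders.

```python
# Hypothetical DAG: A <- L -> Y, A <- U -> Y, A -> Y (edges point cause -> effect)
EDGES = {("L", "A"), ("L", "Y"), ("U", "A"), ("U", "Y"), ("A", "Y")}

def neighbors(node, edges):
    return {v for u, v in edges if u == node} | {u for u, v in edges if v == node}

def simple_paths(src, dst, edges, path=None):
    """All simple paths src..dst in the underlying undirected skeleton."""
    path = path or [src]
    if src == dst:
        yield path
        return
    for nxt in neighbors(src, edges) - set(path):
        yield from simple_paths(nxt, dst, edges, path + [nxt])

def backdoor_paths(a, y, edges):
    """Paths from a to y whose first edge points INTO a."""
    return [p for p in simple_paths(a, y, edges) if (p[1], p[0]) in edges]

def descendants(node, edges):
    out, stack = set(), [node]
    while stack:
        cur = stack.pop()
        for u, v in edges:
            if u == cur and v not in out:
                out.add(v)
                stack.append(v)
    return out

def blocked(path, z, edges):
    """Is this path blocked when conditioning on the set z?"""
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, mid) in edges and (nxt, mid) in edges
        if collider:
            # a collider blocks unless it (or one of its descendants) is in z
            if not ({mid} | descendants(mid, edges)) & z:
                return True
        elif mid in z:  # conditioning on a non-collider blocks the path
            return True
    return False

def sufficient(z, a="A", y="Y", edges=EDGES):
    return all(blocked(p, z, edges) for p in backdoor_paths(a, y, edges))

print(sufficient({"L", "U"}))  # True: both backdoor paths blocked
print(sufficient({"L"}))       # False: A <- U -> Y remains open
print(sufficient(set()))       # False
```

This brute-force enumeration is exponential in graph size; for realistic DAGs, use dedicated software (see Section 18.7).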

18.3 Confounding Adjustment (pp. 270-273)

When we adjust for a sufficient set, we remove confounding. But be careful about adjusting for too much.

Variables to Include

Confounders: Variables on backdoor paths

  • ✓ Include to block backdoor paths
  • These are causes of both treatment and outcome (or proxies thereof)

Example DAG: \[A \leftarrow L \to Y\]

Adjust for \(L\) to block the backdoor path.

Variables to Exclude

Mediators: Variables on the causal path from \(A\) to \(Y\)

  • ✗ Do NOT adjust (would remove part of the causal effect)

Example DAG: \[A \to M \to Y\]

If we adjust for \(M\), we block the causal path through \(M\).

Descendants of treatment: Variables affected by \(A\)

  • ✗ Usually do NOT adjust (may induce bias)

Colliders

Definition 3 (Collider) A collider on a path is a variable with two arrows pointing into it.

Example: \[A \to C \leftarrow U \to Y\]

\(C\) is a collider on the path \(A \to C \leftarrow U \to Y\).

Property: This path is blocked by default (without conditioning on \(C\)).

Danger: If we condition on \(C\) (or its descendants), we open the path, creating collider bias.

Rule: Do NOT adjust for colliders (unless necessary to block other paths).
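Collider bias is easy to demonstrate by simulation. In this invented example, \(A\) has no effect on \(Y\) at all; adjusting for the collider \(C\) manufactures a spurious negative association.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# A has NO effect on Y; A and U independently cause the collider C
A = rng.normal(size=n)
U = rng.normal(size=n)          # unmeasured common cause of C and Y
C = A + U + rng.normal(size=n)  # A -> C <- U
Y = U + rng.normal(size=n)      # U -> Y, no arrow A -> Y

def coef_of_A(*covs):
    """OLS coefficient of A in a regression of Y on A plus covariates."""
    X = np.column_stack([np.ones(n), A, *covs])
    return np.linalg.lstsq(X, Y, rcond=None)[0][1]

print(f"unadjusted:     {coef_of_A():.3f}")   # ~0: no effect, no bias
print(f"adjusted for C: {coef_of_A(C):.3f}")  # ~-0.5: collider bias
```

Intuitively: among units with the same value of \(C\), a high \(A\) implies a low \(U\), and hence a low \(Y\); conditioning on \(C\) creates the very association we were trying to avoid.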

18.4 Instrumental Variables and M-bias (pp. 273-276)

Some variables should not be adjusted for even if they’re associated with both treatment and outcome.

M-bias (Butterfly Bias)

DAG structure:

    U1 → L ← U2
     ↓         ↓
     A         Y

Properties:

  • \(L\) is associated with both \(A\) and \(Y\) (through \(U1\) and \(U2\))
  • \(L\) is a collider on the path \(A \leftarrow U1 \to L \leftarrow U2 \to Y\)
  • This path is blocked by default
  • But if we adjust for \(L\), we open this path!

Result: Adjusting for \(L\) introduces bias even though \(L\) is associated with both \(A\) and \(Y\).
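The same point can be checked numerically. In this invented M-structure, \(A\) again has no effect on \(Y\), and \(L\) is associated with both; adjusting for \(L\) nonetheless introduces bias.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# M-structure: U1 -> L <- U2, U1 -> A, U2 -> Y; A has NO effect on Y
U1 = rng.normal(size=n)
U2 = rng.normal(size=n)
L = U1 + U2 + rng.normal(size=n)  # collider between U1 and U2
A = U1 + rng.normal(size=n)
Y = U2 + rng.normal(size=n)

def coef_of_A(*covs):
    """OLS coefficient of A in a regression of Y on A plus covariates."""
    X = np.column_stack([np.ones(n), A, *covs])
    return np.linalg.lstsq(X, Y, rcond=None)[0][1]

print(f"unadjusted:     {coef_of_A():.3f}")   # ~0: the path through L is blocked
print(f"adjusted for L: {coef_of_A(L):.3f}")  # ~-0.2: adjusting opened the path
```

Note that \(L\) would pass a naive "associated with treatment and outcome" screen; only the causal structure reveals that it must be left alone.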

Instrumental Variables Revisited

An instrumental variable \(Z\) satisfies:

Z → A → Y

with no backdoor paths from \(Z\) to \(Y\).

Should we adjust for \(Z\)?

  • If using IV methods: NO (use \(Z\) as instrument)
  • If using standard methods and \(Z\) is not a confounder: NO (unnecessary, may hurt efficiency)
  • If \(Z\) confounds some other relationship of interest: MAYBE

18.5 Confounders, Mediators, and Intermediate Confounders (pp. 276-279)

Time-varying treatments create new challenges for variable selection.

Time-Varying Confounding

Setting: Treatment varies over time (\(A_0, A_1, \ldots\)), as do confounders (\(L_0, L_1, \ldots\))

Time-varying confounder: \(L_1\) is a confounder for the effect of \(A_1\) on \(Y\)

Problem: If \(A_0\) affects \(L_1\), then:

  • \(L_1\) is a confounder (need to adjust)
  • \(L_1\) is a mediator (should not adjust in standard regression)

Intermediate Confounder

Definition 4 (Intermediate Confounder (Time-Dependent Confounder Affected by Prior Treatment)) A variable \(L_1\) is an intermediate confounder if:

  1. \(L_1\) is a confounder for the effect of \(A_1\) on \(Y\)
  2. \(L_1\) is affected by prior treatment \(A_0\)

DAG (edges): \(A_0 \to L_1\), \(A_0 \to A_1\), \(L_1 \to A_1\), \(L_1 \to Y\), \(A_1 \to Y\)

Standard regression fails: Cannot correctly adjust for \(L_1\) using standard methods.

Solutions:

  • G-methods: Parametric g-formula, IP weighting, g-estimation (Part III)
  • These methods properly handle time-varying confounders affected by prior treatment
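A toy linear simulation (our own construction, not from the book) shows both the failure and the fix. Here \(A_0\) has no direct arrow into \(Y\), so under a joint intervention setting both treatments, the \(a_0\) contrast is zero; regression adjusted for \(L_1\) gets it wrong, while the parametric g-formula (standardization over \(L_1\)) gets it right.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

# A0 -> L1 -> A1 -> Y, with unmeasured U -> L1 and U -> Y.
# No direct A0 -> Y arrow: under set(a0, a1) the a0 effect is 0.
U = rng.normal(size=n)
A0 = rng.normal(size=n)
L1 = A0 + U + rng.normal(size=n)  # intermediate confounder
A1 = L1 + rng.normal(size=n)
Y = A1 + U + rng.normal(size=n)

def ols(y, *covs):
    """OLS coefficients of y on an intercept plus the given covariates."""
    X = np.column_stack([np.ones(len(y)), *covs])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Standard regression adjusting for L1: the A0 coefficient is biased
b0, bA0, bA1, bL1 = ols(Y, A0, A1, L1)
print(f"regression A0 coefficient: {bA0:.3f}")  # ~-0.5, truth is 0

# Parametric g-formula: model L1 given A0, then standardize
g0, g1 = ols(L1, A0)

def g_formula(a0, a1):
    # E[Y^{a0,a1}] = E_{L1|a0}[ E[Y | a0, L1, a1] ]; all models are
    # linear here, so plugging in E[L1 | a0] suffices
    return b0 + bA0 * a0 + bA1 * a1 + bL1 * (g0 + g1 * a0)

effect_a0 = g_formula(1, 0) - g_formula(0, 0)
print(f"g-formula A0 effect:       {effect_a0:.3f}")  # ~0
```

The regression fails because \(L_1\) is simultaneously a mediator of \(A_0\) and a collider between \(A_0\) and \(U\); the g-formula adjusts for \(L_1\) within the outcome model but integrates it out according to its distribution under the intervention.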

18.6 Selecting Variables for Precision (pp. 279-281)

After ensuring confounding is addressed, can we include additional variables to improve precision?

Precision Variables

Definition: Variables associated with the outcome but not with treatment (after accounting for confounders).

Example DAG:

A → Y ← V

\(V\) is associated with \(Y\) but not with \(A\): there is no arrow between \(V\) and \(A\), and they share no causes.

Effect of adjustment:

  • Does NOT affect bias (no confounding)
  • DOES improve precision (reduces residual variance)

Recommendation: Include precision variables to improve efficiency.

Instruments as Precision Variables?

Question: Should we include instrumental variables in outcome models?

Answer: Generally NO.

  • Instruments predict \(A\) but, by the exclusion restriction, have no direct effect on \(Y\)
  • Adjusting for them removes variation in \(A\) without reducing outcome variance, so precision gets worse, not better
  • If unmeasured confounding remains, adjusting for an instrument can even amplify the residual bias ("Z-bias")
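The contrast between the two kinds of covariates can be seen in the standard error of the treatment coefficient. In this invented setup, \(V\) is a precision variable and \(Z\) an instrument; neither biases the estimate, but their effects on precision point in opposite directions.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20_000

Z = rng.normal(size=n)  # instrument: affects Y only through A
V = rng.normal(size=n)  # precision variable: affects Y, not A
A = Z + rng.normal(size=n)
Y = A + 2.0 * V + rng.normal(size=n)  # true effect of A is 1

def se_of_A(*covs):
    """Standard error of the A coefficient in an OLS fit of Y."""
    X = np.column_stack([np.ones(n), A, *covs])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return np.sqrt(cov[1, 1])

print(f"no extra covariate: {se_of_A():.4f}")
print(f"adjusting for V:    {se_of_A(V):.4f}")  # smallest: V soaks up outcome variance
print(f"adjusting for Z:    {se_of_A(Z):.4f}")  # largest: Z removes variation in A
```

Adjusting for \(V\) shrinks the residual variance of \(Y\); adjusting for \(Z\) shrinks the conditional variance of \(A\), which sits in the denominator of the coefficient's variance.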

Practical Strategy

  1. First priority: Include all variables needed to block backdoor paths (confounders)
  2. Second priority: Exclude colliders, mediators, and descendants of treatment
  3. Third priority: Consider including precision variables if they:
    • Strongly predict the outcome
    • Are not affected by treatment
    • Don’t introduce collinearity issues

18.7 Using Causal Diagrams (pp. 281-282)

Causal DAGs (directed acyclic graphs) are invaluable tools for variable selection.

Steps for Using DAGs

  1. Draw the DAG:

    • Represent your causal assumptions about relationships between variables
    • Include treatment, outcome, all measured covariates, and key unmeasured variables
    • Draw arrows representing direct causal effects
  2. Identify backdoor paths:

    • Find all paths from \(A\) to \(Y\) that start with an arrow into \(A\)
    • These are sources of confounding
  3. Find sufficient adjustment sets:

    • Identify sets of variables that block all backdoor paths
    • Avoid inducing collider bias
    • Use algorithms (e.g., dagitty R package) if DAG is complex
  4. Choose an adjustment set:

    • Select a sufficient set that is measured
    • Prefer simpler sets (fewer variables) when multiple options exist
    • Check for practical considerations (measurement error, missing data, etc.)

Software Tools

R package dagitty:

  • Define DAGs
  • Find adjustment sets automatically
  • Check conditional independencies implied by the DAG
  • Visualize DAGs

Example:

library(dagitty)

# Encode the DAG: L1 and U confound the A-Y relation; L2 only affects Y
dag <- dagitty('dag {
  A -> Y
  L1 -> A
  L1 -> Y
  L2 -> Y
  U -> A
  U -> Y
}')

# For this DAG the minimal sufficient adjustment set is { L1, U }.
# If U is unmeasured, mark it latent in the DAG string (U [latent]);
# adjustmentSets() will then report that no sufficient set exists.
adjustmentSets(dag, exposure = "A", outcome = "Y")

Summary

Key principles for variable selection in causal inference:

  1. Use causal reasoning, not statistical criteria
  2. Adjust for confounders (variables on backdoor paths)
  3. Don’t adjust for mediators (variables on causal paths from treatment)
  4. Don’t adjust for colliders (unless necessary to block other paths)
  5. Don’t adjust for descendants of treatment (generally)
  6. Consider precision variables (if they don’t introduce bias)
  7. Use causal DAGs to guide selection

Variables to include:

  • ✓ Confounders (on backdoor paths from \(A\) to \(Y\))
  • ✓ Precision variables (predict \(Y\), not affected by \(A\), not colliders)
  • ✓ Proxies for unmeasured confounders

Variables to exclude:

  • ✗ Mediators (on causal path from \(A\) to \(Y\))
  • ✗ Colliders (except if needed to block backdoor paths)
  • ✗ Descendants of treatment (except special cases)
  • ✗ Instruments (when using standard methods)
  • ✗ Variables that induce M-bias

Special cases:

  • Intermediate confounders: Time-varying confounders affected by prior treatment
    • Cannot be handled by standard regression
    • Require g-methods (Part III)

Tools:

  • Causal DAGs: Represent causal structure graphically
  • Backdoor criterion: Identify sufficient adjustment sets
  • Software: dagitty, ggdag (R), DAGitty (web interface)

Common mistakes:

  1. Using stepwise selection or AIC/BIC for variable selection
  2. Adjusting for all predictors of the outcome
  3. Adjusting for mediators
  4. Conditioning on colliders
  5. Ignoring time-varying confounding

Practical workflow:

  1. Draw a causal DAG based on subject-matter knowledge
  2. Identify backdoor paths from treatment to outcome
  3. Find sufficient adjustment sets using the backdoor criterion
  4. Choose a set that is measured and practical
  5. Fit causal model adjusting for that set (using appropriate method)
  6. Conduct sensitivity analyses with alternative DAGs/adjustment sets

References

Hernán, Miguel A., and James M. Robins. 2020. Causal Inference: What If. Boca Raton: Chapman & Hall/CRC. https://miguelhernan.org/whatifbook.