Chapter 7: Confounding

In Chapter 3, we introduced exchangeability as a key identifiability condition. In Chapter 6, we learned to represent causal relationships using DAGs and introduced the backdoor criterion for identifying confounding. This chapter provides a detailed examination of confounding—the most common threat to validity in observational studies.

7.1 The Structure of Confounding (pp. 77-80)

Confounding occurs when a common cause of treatment and outcome creates a non-causal association between them.

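Definition 1 (Confounding Structure) A variable \(L\) is a confounder of the effect of \(A\) on \(Y\) if:

  1. \(L\) causes \(A\) (or shares a common cause with \(A\))
  2. \(L\) causes \(Y\) (or shares a common cause with \(Y\))
  3. \(L\) is not affected by \(A\) (not a consequence of treatment)

The parenthetical cases apply only when \(L\) is a noncollider on a single backdoor path between \(A\) and \(Y\). A variable that shares one common cause with \(A\) and a different one with \(Y\), as in \(A \leftarrow U_1 \rightarrow L \leftarrow U_2 \rightarrow Y\), is a collider on that path, not a confounder.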

Causal diagram representation:

L → A → Y
L → Y

The path \(A \leftarrow L \rightarrow Y\) is a backdoor path that creates non-causal association.
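To see that this backdoor path alone produces an association, the following base-R simulation (a minimal sketch; every risk is an arbitrary, made-up value) generates \(L\), \(A\) and \(Y\) from the diagram L → A, L → Y with no arrow from \(A\) to \(Y\). The variable names mirror the diagram; nothing here comes from real data.

```r
set.seed(1)
n <- 200000

# L is the only cause of both A and Y; A has no effect on Y (made-up risks)
L <- rbinom(n, 1, 0.5)
A <- rbinom(n, 1, ifelse(L == 1, 0.7, 0.3))
Y <- rbinom(n, 1, ifelse(L == 1, 0.4, 0.1))

# Crude risk difference: nonzero, produced only by the backdoor path A <- L -> Y
crude_rd <- mean(Y[A == 1]) - mean(Y[A == 0])

# Risk difference within each level of L: close to 0, the true causal effect
rd_within_L <- sapply(0:1, function(l) {
  mean(Y[A == 1 & L == l]) - mean(Y[A == 0 & L == l])
})
```

In expectation the crude risks are 0.31 among the treated and 0.19 among the untreated, so `crude_rd` lands near 0.12 even though the causal effect is zero, while both entries of `rd_within_L` stay near 0.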

Common Confounding Scenarios

Example 1: Healthy worker bias

  • Healthier individuals are more likely to be employed, and hence more likely to receive workplace exposures
  • Healthier individuals have better outcomes
  • Comparing employed (exposed) with unemployed (unexposed) individuals therefore introduces confounding by health status

Example 2: Confounding by indication

  • Sicker patients receive more aggressive treatment
  • Sicker patients have worse outcomes
  • Unadjusted comparisons can make a beneficial treatment appear harmful
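A small simulation makes the sign reversal concrete. The sketch below (base R; the 30% prevalence of severe illness, the treatment probabilities and the risks of death are all made-up assumptions) lets treatment lower the risk of death by 0.05 in every severity stratum, yet the crude comparison makes it look harmful.

```r
set.seed(2)
n <- 200000

# L = 1 marks severely ill patients (hypothetical prevalence 30%)
L <- rbinom(n, 1, 0.3)
# Indication: severely ill patients are far more likely to be treated
A <- rbinom(n, 1, ifelse(L == 1, 0.8, 0.2))
# Severity raises the risk of death; treatment lowers it by 0.05 in both strata
Y <- rbinom(n, 1, ifelse(L == 1, 0.50, 0.10) - 0.05 * A)

# Crude comparison: treated patients die more often, because they are sicker
crude_rd <- mean(Y[A == 1]) - mean(Y[A == 0])

# Within each severity stratum the benefit of treatment reappears
rd_within_L <- sapply(0:1, function(l) {
  mean(Y[A == 1 & L == l]) - mean(Y[A == 0 & L == l])
})
```

In expectation `crude_rd` is about +0.16 here, while both stratum-specific differences are about -0.05.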

7.2 Confounding and Exchangeability (pp. 80-82)

Confounding is equivalent to lack of (conditional) exchangeability.

No Confounding = Exchangeability

No confounding means: \[Y^a \perp\!\!\!\perp A \quad \text{for all } a\]

This is marginal exchangeability: the counterfactual outcomes are independent of treatment.

Confounding means exchangeability does not hold: \[Y^a \not\perp\!\!\!\perp A \quad \text{for some } a\]

The treated and untreated differ with respect to their potential outcomes.

Example 1 (Confounding and Exchangeability) Suppose exercise (\(A\)) affects heart disease (\(Y\)), and both are affected by age (\(L\)):

Without confounding (age does not affect exercise):

  • Young and old people are equally likely to exercise
  • \(E[Y^{a=1} | A = 1] = E[Y^{a=1} | A = 0]\) (exchangeable)

With confounding:

  • Younger people are more likely to exercise
  • Younger people have lower baseline risk
  • \(E[Y^{a=1} | A = 1] \neq E[Y^{a=1} | A = 0]\) (not exchangeable)
  • Those who exercise would have had lower risk than those who do not even if nobody had exercised: \(E[Y^{a=0} | A = 1] < E[Y^{a=0} | A = 0]\)
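Because a simulation can generate both counterfactual outcomes for every individual, it can check these exchangeability statements directly, something real data never allow. In this sketch (base R; the age split, risks and exercise probabilities are hypothetical), \(Y^{a=0}\) and \(Y^{a=1}\) are built from a shared uniform draw so that exercise lowers each person's risk by 0.05.

```r
set.seed(3)
n <- 200000

old <- rbinom(n, 1, 0.5)                 # L: 1 = older (hypothetical 50%)
u <- runif(n)                            # individual frailty shared by both counterfactuals
risk0 <- ifelse(old == 1, 0.30, 0.10)    # made-up risk of heart disease without exercise
Y0 <- as.integer(u < risk0)              # Y^{a=0}
Y1 <- as.integer(u < risk0 - 0.05)       # Y^{a=1}: exercise lowers risk by 0.05

A_random <- rbinom(n, 1, 0.5)                             # exercise unrelated to age
A_confounded <- rbinom(n, 1, ifelse(old == 1, 0.3, 0.7))  # younger people exercise more

# E[Y^{a=1} | A = 1] and E[Y^{a=1} | A = 0] under a given assignment mechanism
mean_y1_by_A <- function(A) c(treated = mean(Y1[A == 1]), untreated = mean(Y1[A == 0]))
without_conf <- mean_y1_by_A(A_random)    # approximately equal
with_conf <- mean_y1_by_A(A_confounded)   # exercisers lower

# E[Y^{a=0} | A = 1] - E[Y^{a=0} | A = 0]: exercisers would have had lower risk anyway
y0_gap <- mean(Y0[A_confounded == 1]) - mean(Y0[A_confounded == 0])
```

With these numbers, `with_conf` is about 0.11 for exercisers versus 0.19 for non-exercisers, and `y0_gap` is about -0.08.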

Conditional Exchangeability

Even when marginal exchangeability fails, we may achieve conditional exchangeability by adjusting for confounders:

\[Y^a \perp\!\!\!\perp A \mid L \quad \text{for all } a\]

Within levels of \(L\), the treated and untreated are exchangeable.
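Re-simulating the same hypothetical exercise setup (so the block runs on its own), the sketch below compares \(E[Y^{a=1} \mid A = 1, L = l]\) with \(E[Y^{a=1} \mid A = 0, L = l]\) within each age level. Conditional exchangeability holds here by construction, because exercise depends on age only; with real data it is an assumption that the data cannot confirm.

```r
set.seed(4)
n <- 200000

old <- rbinom(n, 1, 0.5)                        # L (hypothetical)
risk0 <- ifelse(old == 1, 0.30, 0.10)           # made-up risks without exercise
Y1 <- as.integer(runif(n) < risk0 - 0.05)       # Y^{a=1}
A <- rbinom(n, 1, ifelse(old == 1, 0.3, 0.7))   # exercise depends on age only

# Marginal comparison: clearly nonzero, so marginal exchangeability fails
marginal_gap <- mean(Y1[A == 1]) - mean(Y1[A == 0])

# Comparison within each level of L: approximately zero
gap_within_L <- sapply(0:1, function(l) {
  mean(Y1[A == 1 & old == l]) - mean(Y1[A == 0 & old == l])
})
```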

7.3 Confounding and the Backdoor Criterion (pp. 82-85)

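The backdoor criterion (Chapter 6) provides a graphical method for identifying confounding. A set of variables \(L\) satisfies the backdoor criterion if conditioning on \(L\) blocks every backdoor path from \(A\) to \(Y\) and no variable in \(L\) is a descendant of \(A\); if it does, adjusting for \(L\) is sufficient to remove confounding.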

Backdoor Paths and Confounding

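A backdoor path from \(A\) to \(Y\) is a noncausal path that:

  • Starts with an arrow into \(A\) (i.e., \(\cdot \rightarrow A\))
  • Connects \(A\) to \(Y\) through edges pointing in either direction

Confounding exists if at least one backdoor path is open (unblocked). Without conditioning, a path is open unless it contains a collider, a node with two arrowheads pointing into it (\(\rightarrow C \leftarrow\)).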

Example 2 (Identifying Confounders with the Backdoor Criterion) Diagram 1:

L → A → Y
L → Y

Backdoor path: \(A \leftarrow L \rightarrow Y\)
Confounders: \(L\)
Solution: Adjust for \(L\)

Diagram 2:

U → L → A → Y
      L → Y

Backdoor paths: \(A \leftarrow L \rightarrow Y\), and \(A \leftarrow L \leftarrow U \rightarrow Y\) if \(U\) also causes \(Y\)
Confounders: \(L\) (and \(U\) if it affects \(Y\))
Solution: Adjust for \(L\). Because \(L\) is a noncollider on both paths, adjusting for it blocks them both, so \(U\) need not be measured.

Diagram 3:

A → M → Y
L → A
L → Y

Backdoor path: \(A \leftarrow L \rightarrow Y\)
Confounders: \(L\)
Do NOT adjust for \(M\): \(M\) is a mediator on the causal path \(A \rightarrow M \rightarrow Y\), not a confounder; adjusting for it would remove part of the effect being estimated.
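The path search in these diagrams is mechanical enough to code. The sketch below uses two hypothetical helper functions written in base R for this note: `backdoor_paths()` lists every path between A and Y whose first edge points into A, and `is_open()` checks whether such a path is open when nothing is conditioned on (that is, it contains no collider). It does not implement full d-separation given an adjustment set; for that, a dedicated tool such as the dagitty R package is a better choice.

```r
# edges: two-column character matrix, one row per arrow (from, to)
backdoor_paths <- function(edges, a, y) {
  # neighbours of v, ignoring arrow direction
  neighbours <- function(v) unique(c(edges[edges[, 1] == v, 2], edges[edges[, 2] == v, 1]))
  found <- list()
  # depth-first search over simple paths that have not yet reached y
  extend <- function(path) {
    last <- path[length(path)]
    if (last == y) {
      found[[length(found) + 1]] <<- path
      return(invisible(NULL))
    }
    for (nb in neighbours(last)) {
      if (!(nb %in% path)) extend(c(path, nb))
    }
  }
  # first step must go from a to one of its parents (an arrow into a)
  for (p in edges[edges[, 2] == a, 1]) extend(c(a, p))
  found
}

# Without conditioning, a path is open unless an interior node is a collider (-> v <-)
is_open <- function(edges, path) {
  points_into <- function(from, to) any(edges[, 1] == from & edges[, 2] == to)
  if (length(path) < 3) return(TRUE)
  for (i in 2:(length(path) - 1)) {
    if (points_into(path[i - 1], path[i]) && points_into(path[i + 1], path[i])) return(FALSE)
  }
  TRUE
}

d1 <- rbind(c("L", "A"), c("A", "Y"), c("L", "Y"))
d2 <- rbind(c("U", "L"), c("L", "A"), c("A", "Y"), c("L", "Y"), c("U", "Y"))  # includes U -> Y
d3 <- rbind(c("A", "M"), c("M", "Y"), c("L", "A"), c("L", "Y"))

for (d in list(d1, d2, d3)) {
  for (p in backdoor_paths(d, "A", "Y")) {
    cat(paste(p, collapse = " - "), if (is_open(d, p)) "(open)" else "(blocked)", "\n")
  }
}
```

Diagrams 1 and 3 each yield the single open path A - L - Y (the causal path through M is never listed, since it starts with an arrow out of A), and Diagram 2, drawn with the optional arrow U → Y, yields A - L - Y and A - L - U - Y.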

7.4 Confounding and Confounders (pp. 85-87)

The traditional definition of “confounder” in epidemiology does not always agree with the causal DAG perspective.

Traditional Confounder Definition

Traditionally, a variable \(L\) is considered a confounder if:

  1. \(L\) is associated with treatment \(A\)
  2. \(L\) is associated with outcome \(Y\) (among the untreated)
  3. \(L\) is not affected by treatment \(A\)

DAG-Based Definition

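From the DAG perspective, \(L\) is a confounder if:

  • \(L\) lies on an open backdoor path from \(A\) to \(Y\) and is not a descendant of \(A\), so that adjusting for \(L\) (possibly together with other variables) helps block that path

The two definitions can disagree. In the M-structure \(A \leftarrow U_1 \rightarrow L \leftarrow U_2 \rightarrow Y\), \(L\) is associated with both \(A\) and \(Y\) and is not affected by \(A\), so it meets the traditional criteria; yet \(L\) is a collider, the backdoor path is already blocked, and adjusting for \(L\) opens it (M-bias).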
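A simulation of the M-shaped diagram \(A \leftarrow U_1 \rightarrow L \leftarrow U_2 \rightarrow Y\) shows the two definitions parting ways. In this sketch (base R; a linear model with standard normal variables and arbitrarily chosen unit coefficients), \(A\) has no effect on \(Y\) and \(L\) satisfies the three traditional criteria, yet adjusting for \(L\) introduces bias.

```r
set.seed(6)
n <- 200000

# Made-up linear model; U1 and U2 are unmeasured, A has no effect on Y
U1 <- rnorm(n)
U2 <- rnorm(n)
L <- U1 + U2 + rnorm(n)   # collider: both U1 and U2 point into L
A <- U1 + rnorm(n)
Y <- U2 + rnorm(n)

# L meets the traditional criteria: associated with A and with Y, not affected by A
assoc <- c(L_A = cor(L, A), L_Y = cor(L, Y))   # both about 1/sqrt(6), roughly 0.41

crude <- unname(coef(lm(Y ~ A))["A"])          # about 0: no open backdoor path
adjusted <- unname(coef(lm(Y ~ A + L))["A"])   # about -0.2: conditioning on L opens one
```

For this linear model the population values follow from Var(A) = 2, Var(L) = 3, Cov(A, L) = Cov(L, Y) = 1 and Cov(A, Y) = 0: the crude coefficient is 0 and the coefficient of A given L is -0.2.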

7.5 Single-World Intervention Graphs (pp. 87-89)

Single-World Intervention Graphs (SWIGs) are an extension of DAGs that explicitly represent interventions and counterfactual outcomes.

SWIGs vs. DAGs

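  • Standard DAGs: Represent causal relationships among factual variables (measured or unmeasured)
  • SWIGs: Represent relationships among counterfactual variables under specified interventions. The treatment node is split into its natural value \(A\) and the intervention value \(a\), and the descendants of treatment become counterfactuals such as \(Y^a\), so conditional exchangeability \(Y^a \perp\!\!\!\perp A \mid L\) can be checked by d-separation on the graph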

7.6 Confounding Adjustment (pp. 89-92)

Once confounders are identified, several methods can adjust for them.

Methods for Confounding Adjustment

  1. Stratification: Estimate effects within strata of \(L\), then combine (standardization)

  2. Regression adjustment: Include \(L\) as covariates in a regression model

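  3. Inverse probability weighting: Weight each individual by \(1/\Pr[A \mid L]\), the inverse probability of the treatment they actually received (\(1/\Pr[A = 1 \mid L]\) if treated, \(1/\Pr[A = 0 \mid L]\) if untreated), to create a pseudo-population where \(A\) and \(L\) are independent (Chapter 12)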

  4. Matching: Match treated and untreated individuals on \(L\)

Example 3 (Comparing Adjustment Methods) Data: Effect of smoking (\(A\)) on lung cancer (\(Y\)), adjusting for age (\(L\))

Stratification:

  • Estimate effect separately for age = 40, 50, 60, 70
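  • Combine the stratum-specific estimates in a weighted average; weighting by the age distribution of the whole study population gives the standardized effect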
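The two stratification steps can be sketched in base R on simulated data. Everything below is a hypothetical stand-in for the smoking example (age groups, smoking probabilities and cancer risks are made up, with a true risk difference of 0.05 in every age group); it does not reproduce any real study.

```r
set.seed(7)
n <- 200000

# Simulated stand-in for the smoking example; every number here is made up
age <- sample(c(40, 50, 60, 70), n, replace = TRUE)        # L
A <- rbinom(n, 1, plogis(-2 + 0.03 * age))                 # smoking, more common with age
Y <- rbinom(n, 1, 0.01 + 0.004 * (age - 40) + 0.05 * A)    # true risk difference 0.05
d <- data.frame(age, A, Y)

# Step 1: risk difference within each age stratum
strata <- split(d, d$age)
rd_by_age <- sapply(strata, function(s) mean(s$Y[s$A == 1]) - mean(s$Y[s$A == 0]))

# Step 2: standardize, weighting each stratum by its share of the study population
stratum_share <- sapply(strata, nrow) / nrow(d)
rd_standardized <- sum(stratum_share * rd_by_age)

crude_rd <- mean(d$Y[d$A == 1]) - mean(d$Y[d$A == 0])  # confounded by age
```

`rd_standardized` should land near the true 0.05, while `crude_rd` is pushed up (to about 0.065 in expectation) because older people both smoke more and have higher risk in this setup.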

Regression:

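# d: hypothetical data frame with Y, A, L; coefficient of A = log odds ratio conditional on L (age linear)
glm(Y ~ A + L, family = binomial(), data = d)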

IP weighting (Chapter 12):

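p1 <- predict(glm(A ~ L, family = binomial(), data = d), type = "response")  # Pr[A = 1 | L]; d is a hypothetical data frame
d$weight <- ifelse(d$A == 1, 1 / p1, 1 / (1 - p1))  # inverse probability of the treatment actually received
# quasibinomial(): same point estimate as binomial() without the non-integer-weights warning; use robust SEs
glm(Y ~ A, weights = weight, family = quasibinomial(), data = d)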
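On the same kind of made-up data, a minimal base-R sketch can check the claim that IP weighting removes the dependence of \(A\) on \(L\). The treatment model here enters age as a categorical variable, an assumption chosen so that the weighted treatment prevalence comes out exactly 0.5 in every age group.

```r
set.seed(8)
n <- 200000

# Hypothetical data, same made-up setup as the stratification sketch
age <- sample(c(40, 50, 60, 70), n, replace = TRUE)
A <- rbinom(n, 1, plogis(-2 + 0.03 * age))
Y <- rbinom(n, 1, 0.01 + 0.004 * (age - 40) + 0.05 * A)

# Pr[A = 1 | L] from a saturated logistic model (age as a factor)
p1 <- fitted(glm(A ~ factor(age), family = binomial()))
# Weight = 1 / probability of the treatment each person actually received
w <- ifelse(A == 1, 1 / p1, 1 / (1 - p1))

# In the pseudo-population, treatment prevalence no longer depends on age
weighted_prev <- tapply(w * A, age, sum) / tapply(w, age, sum)

# IP-weighted (marginal) risk difference
ipw_rd <- weighted.mean(Y[A == 1], w[A == 1]) - weighted.mean(Y[A == 0], w[A == 0])
```

With a saturated treatment model the weights sum to the stratum size among both the treated and the untreated, so the pseudo-population contains each age group once as treated and once as untreated; `ipw_rd` should be close to the true marginal risk difference of 0.05.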

Matching:

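  • For each smoker, find a non-smoker of the same age
  • Compare outcomes between the matched groups (matching every smoker in this way targets the effect in the treated)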
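Matching can be sketched as exact 1:1 matching on age with replacement, using base R and the same made-up data-generating setup. Drawing one random same-age non-smoker per smoker is an assumption of this sketch; real analyses often use other matching schemes and must account for reused controls when computing standard errors.

```r
set.seed(9)
n <- 50000

# Hypothetical data, same made-up setup as the sketches above
age <- sample(c(40, 50, 60, 70), n, replace = TRUE)
A <- rbinom(n, 1, plogis(-2 + 0.03 * age))
Y <- rbinom(n, 1, 0.01 + 0.004 * (age - 40) + 0.05 * A)

treated <- which(A == 1)
# Indices of non-smokers, grouped by age ("40", "50", "60", "70")
controls_by_age <- split(which(A == 0), age[A == 0])

# For each smoker, draw one non-smoker of the same age (with replacement)
matched <- vapply(treated, function(i) {
  pool <- controls_by_age[[as.character(age[i])]]
  pool[sample.int(length(pool), 1)]
}, integer(1))

# Compare outcomes in the matched sample
matched_rd <- mean(Y[treated]) - mean(Y[matched])
```

Because every smoker is matched, `matched_rd` targets the effect in the treated, which equals the true 0.05 here since the effect is the same at every age.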

Summary

This chapter provided a detailed examination of confounding.

Key concepts:

  1. Confounding structure: Common causes of treatment and outcome create backdoor paths

  2. Exchangeability: Confounding = lack of exchangeability; conditional exchangeability can be achieved by adjusting for confounders

  3. Backdoor criterion: Provides a graphical method to identify which variables to adjust for

  4. DAG vs. traditional definitions: DAG-based confounding identification is preferred over association-based criteria

  5. Adjustment methods: Stratification, regression, IP weighting, and matching can all adjust for confounding

  6. Critical assumptions:

    • All confounders must be identified (no unmeasured confounding)
    • All confounders must be measured accurately
    • The adjustment model must be correctly specified (e.g., the outcome regression, or the treatment model used for IP weights)

References

Hernán, Miguel A, and James M Robins. 2020. Causal Inference: What If. Boca Raton: Chapman & Hall/CRC. https://miguelhernan.org/whatifbook.