In Chapter 1, we concluded that association is not causation, and we asked under which conditions we could use data to estimate causal effects. This chapter provides one answer: conduct a randomized experiment. We use randomized experiments to illustrate fundamental concepts of causal inference, including exchangeability and the distinction between conditional and unconditional effects.
Suppose we conduct a randomized experiment using Zeus’s family as our study population. To determine who receives a heart transplant, Zeus flips a (fair) coin for each of the 20 individuals. If heads, the individual receives a heart transplant (\(A = 1\)); if tails, they do not (\(A = 0\)).
Table 2.1 shows the observed data from this randomized experiment.
Table 2.1: Observed treatment and outcome in a randomized experiment in Zeus’s family
| Name | \(A\) | \(Y\) | Name | \(A\) | \(Y\) |
|---|---|---|---|---|---|
| Rheia | 0 | 0 | Leto | 0 | 0 |
| Kronos | 0 | 1 | Ares | 1 | 1 |
| Demeter | 0 | 0 | Athena | 1 | 1 |
| Hades | 0 | 0 | Hephaestus | 1 | 1 |
| Hestia | 1 | 0 | Aphrodite | 1 | 1 |
| Poseidon | 1 | 0 | Polyphemus | 1 | 1 |
| Hera | 1 | 0 | Persephone | 1 | 1 |
| Zeus | 1 | 1 | Hermes | 1 | 0 |
| Artemis | 0 | 1 | Hebe | 1 | 0 |
| Apollo | 0 | 1 | Dionysus | 1 | 0 |
From Table 2.1, we can compute associational measures: the risk of death in the treated is \(Pr[Y = 1|A = 1] = 7/13 \approx 0.54\), the risk in the untreated is \(Pr[Y = 1|A = 0] = 3/7 \approx 0.43\), and the associational risk difference is \(7/13 - 3/7 \approx 0.11\).
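These risks can be verified in a few lines of Python; the outcome vectors below are transcribed directly from Table 2.1.

```python
# Outcomes from Table 2.1, grouped by observed treatment.
treated_outcomes = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]  # Y for the 13 individuals with A = 1
untreated_outcomes = [0, 1, 0, 0, 1, 1, 0]                   # Y for the 7 individuals with A = 0

risk_treated = sum(treated_outcomes) / len(treated_outcomes)        # 7/13
risk_untreated = sum(untreated_outcomes) / len(untreated_outcomes)  # 3/7

print(f"Pr[Y=1|A=1] = {risk_treated:.3f}")                       # 0.538
print(f"Pr[Y=1|A=0] = {risk_untreated:.3f}")                     # 0.429
print(f"risk difference = {risk_treated - risk_untreated:.3f}")  # 0.110
```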
The key feature of a randomized experiment is that treatment assignment is independent of an individual’s baseline characteristics, including their potential outcomes. We say the treated and untreated are exchangeable when:
\[Y^a \perp\!\!\!\perp A \quad \text{for all } a\]
This notation means the counterfactual outcome \(Y^a\) is independent of the treatment actually received \(A\). Exchangeability implies:
\[Pr[Y^a = 1|A = 1] = Pr[Y^a = 1|A = 0] = Pr[Y^a = 1]\]
for all values of \(a\).
Definition 1 (Exchangeability) Treatment groups are exchangeable if the distribution of counterfactual outcomes is the same in both groups:
\[Y^a \perp\!\!\!\perp A \quad \text{for all } a\]
Under exchangeability, individuals in the treated group would have experienced the same distribution of outcomes if they had remained untreated (and vice versa).
In our randomized experiment, each individual had a 0.5 probability of receiving treatment, determined by a fair coin flip. This random assignment ensures that treatment is independent of all variables—whether measured or unmeasured, known or unknown.
Specifically, randomization ensures:
\[Pr[A = 1|Y^{a=1} = 1, Y^{a=0} = 1] = Pr[A = 1|Y^{a=1} = 0, Y^{a=0} = 0] = Pr[A = 1] = 0.5\]
The probability of treatment is the same regardless of an individual’s potential outcomes. Therefore, the treated and untreated are exchangeable by design.
When exchangeability holds, we can identify the average causal effect from observed data:
\[E[Y^{a=1}] - E[Y^{a=0}] = E[Y|A = 1] - E[Y|A = 0]\]
The right side is the observed associational difference, which we can compute from Table 2.1. Under the exchangeability guaranteed by randomization, this associational difference equals the causal effect.
In our example, the associational risk difference is \(Pr[Y = 1|A = 1] - Pr[Y = 1|A = 0] = 7/13 - 3/7 \approx 0.11\). In a sample of only 20 individuals, this estimate will generally not coincide exactly with the average causal effect. The discrepancy is due to random variability from the coin flips, not systematic bias.
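To see the role of random variability, we can simulate the experiment many times. The sketch below assumes a hypothetical population of 20 in which the sharp null holds (each person's outcome is the same under either treatment), so the true risk difference is exactly 0; individual experiments still produce nonzero estimates, but they center on 0.

```python
import random

random.seed(0)

# Hypothetical potential outcomes under the sharp null: identical under
# treatment and no treatment, so the true causal risk difference is 0.
y1 = [1] * 10 + [0] * 10  # outcomes if treated
y0 = list(y1)             # outcomes if untreated (same list -> sharp null)

def one_experiment():
    # Fair coin flip per individual, as in Zeus's experiment.
    a = [random.randint(0, 1) for _ in range(20)]
    treated = [y1[i] for i in range(20) if a[i] == 1]
    untreated = [y0[i] for i in range(20) if a[i] == 0]
    if not treated or not untreated:
        return None  # skip the (very rare) all-or-nothing assignments
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)

diffs = [d for d in (one_experiment() for _ in range(10_000)) if d is not None]
mean_diff = sum(diffs) / len(diffs)
print(f"mean risk difference across experiments: {mean_diff:.3f}")  # near 0
```

Any single experiment can easily yield a difference of \(\pm 0.1\) or more, which is exactly the kind of chance discrepancy seen in Table 2.1.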
Not all randomized experiments use simple randomization with equal treatment probabilities. Often, randomization is conditional on measured covariates.
Consider a modified version of our experiment. Suppose Zeus decides to ensure exactly half the women and half the men receive transplants. He creates two strata: women and men. Within each stratum, he randomly assigns half to treatment.
Table 2.2 shows a possible outcome from this stratified randomization.
Table 2.2: Observed treatment and outcome in a conditionally randomized experiment in Zeus’s family
| Name | Sex \(L\) | \(A\) | \(Y\) | Name | Sex \(L\) | \(A\) | \(Y\) |
|---|---|---|---|---|---|---|---|
| Rheia | 0 | 0 | 0 | Leto | 0 | 0 | 0 |
| Demeter | 0 | 0 | 0 | Athena | 0 | 1 | 1 |
| Hestia | 0 | 1 | 0 | Aphrodite | 0 | 1 | 1 |
| Hera | 0 | 1 | 0 | Persephone | 0 | 1 | 1 |
| Artemis | 0 | 0 | 1 | Hebe | 0 | 0 | 1 |
| Kronos | 1 | 0 | 1 | Ares | 1 | 1 | 1 |
| Hades | 1 | 0 | 0 | Hephaestus | 1 | 1 | 1 |
| Poseidon | 1 | 0 | 1 | Polyphemus | 1 | 1 | 1 |
| Zeus | 1 | 1 | 1 | Hermes | 1 | 0 | 0 |
| Apollo | 1 | 0 | 1 | Dionysus | 1 | 1 | 0 |
In a conditionally randomized experiment, exchangeability holds within each stratum but not necessarily overall. We have:
\[Y^a \perp\!\!\!\perp A | L \quad \text{for all } a\]
This is conditional exchangeability (or conditional randomization).
Definition 2 (Conditional Exchangeability) Treatment groups are conditionally exchangeable given covariates \(L\) if:
\[Y^a \perp\!\!\!\perp A | L \quad \text{for all } a\]
Within each stratum of \(L\), the treated and untreated are exchangeable.
Conditional exchangeability means:
\[Pr[Y^a = 1|A = 1, L = l] = Pr[Y^a = 1|A = 0, L = l] = Pr[Y^a = 1|L = l]\]
for all values of \(a\) and \(l\).
Conditional randomization serves several purposes: it guarantees that the covariate \(L\) is balanced between treatment arms in the actual sample (not just in expectation), and it allows investigators to assign treatment with different probabilities in different strata (for example, giving sicker patients a higher chance of receiving the transplant).
Common forms include stratified randomization (randomizing separately within strata of \(L\)) and blocked randomization (randomizing within blocks of fixed size so that the intended treatment proportions are achieved exactly).
When exchangeability is conditional rather than marginal, we cannot simply compare outcomes between treatment groups. We must account for the stratification variable \(L\).
Under conditional exchangeability \(Y^a \perp\!\!\!\perp A | L\), the average causal effect can be computed using standardization:
\[E[Y^a] = \sum_l E[Y|A = a, L = l] \times Pr[L = l]\]
This formula weights the stratum-specific mean outcomes by the population distribution of \(L\).
Definition 3 (Standardization Formula) Under conditional exchangeability \(Y^a \perp\!\!\!\perp A | L\), the mean counterfactual outcome is:
\[E[Y^a] = \sum_l E[Y|A = a, L = l] \times Pr[L = l]\]
where the sum is over all values \(l\) of the covariate \(L\).
Using Table 2.2, let’s compute the standardized causal risk difference.
Step 1: Compute stratum-specific risks
Among women (\(L = 0\)): \(Pr[Y = 1|A = 1, L = 0] = 3/5 = 0.6\) and \(Pr[Y = 1|A = 0, L = 0] = 2/5 = 0.4\).
Among men (\(L = 1\)): \(Pr[Y = 1|A = 1, L = 1] = 4/5 = 0.8\) and \(Pr[Y = 1|A = 0, L = 1] = 3/5 = 0.6\).
Step 2: Compute population distribution of \(L\)
There are 10 women and 10 men, so \(Pr[L = 0] = Pr[L = 1] = 10/20 = 0.5\).
Step 3: Apply standardization formula
\[\begin{align} E[Y^{a=1}] &= 0.6 \times 0.5 + 0.8 \times 0.5 = 0.7 \\ E[Y^{a=0}] &= 0.4 \times 0.5 + 0.6 \times 0.5 = 0.5 \end{align}\]
Step 4: Compute causal effect
\[E[Y^{a=1}] - E[Y^{a=0}] = 0.7 - 0.5 = 0.2\]
The standardized causal risk difference is 0.2. In this particular example, it equals the crude (unstandardized) associational difference \((7/10) - (5/10) = 0.2\) because the treatment probability is 0.5 in both strata; had the strata been randomized with different probabilities, the crude and standardized measures would generally differ.
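The four steps above can be collected into a short Python sketch; the \((L, A, Y)\) rows are transcribed from Table 2.2.

```python
# Standardization over the data in Table 2.2.
data = [  # (L, A, Y) rows transcribed from Table 2.2
    (0, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 0), (0, 0, 1),
    (0, 0, 0), (0, 1, 1), (0, 1, 1), (0, 1, 1), (0, 0, 1),
    (1, 0, 1), (1, 0, 0), (1, 0, 1), (1, 1, 1), (1, 0, 1),
    (1, 1, 1), (1, 1, 1), (1, 1, 1), (1, 0, 0), (1, 1, 0),
]

def mean_y(a, l):
    """Stratum-specific mean outcome E[Y | A = a, L = l]."""
    ys = [y for li, ai, y in data if ai == a and li == l]
    return sum(ys) / len(ys)

n = len(data)
pr_l = {l: sum(1 for li, _, _ in data if li == l) / n for l in (0, 1)}

std_treated = sum(mean_y(1, l) * pr_l[l] for l in (0, 1))    # E[Y^{a=1}] = 0.7
std_untreated = sum(mean_y(0, l) * pr_l[l] for l in (0, 1))  # E[Y^{a=0}] = 0.5
print(f"standardized risk difference = {std_treated - std_untreated:.1f}")  # 0.2
```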
The standardized mean \(E[Y^{a=1}]\) answers: “What would the mean outcome be if everyone in the population received treatment \(a = 1\)?” It’s a weighted average of the stratum-specific mean outcomes, where weights are the proportion of the population in each stratum.
Similarly, \(E[Y^{a=0}]\) answers: “What would the mean outcome be if everyone in the population received treatment \(a = 0\)?”
The difference \(E[Y^{a=1}] - E[Y^{a=0}]\) is the average causal effect in the population.
Standardization is not the only method to adjust for conditional randomization. An alternative approach is inverse probability weighting (IPW).
The idea of IPW is to create a pseudo-population in which treatment is marginally randomized (unconditional). In this pseudo-population, exchangeability holds marginally, so we can estimate the causal effect by a simple comparison of means.
Each individual is weighted by the inverse of their probability of receiving the treatment they actually received:
\[W^A = \frac{1}{Pr[A|L]}\]
Definition 4 (Inverse Probability Weights) For an individual who received treatment \(A\) and has covariates \(L\), the inverse probability weight is:
\[W^A = \frac{1}{Pr[A|L]}\]
where \(Pr[A|L]\) is the probability (propensity) of receiving the treatment actually received, given covariates \(L\).
Under conditional exchangeability \(Y^a \perp\!\!\!\perp A | L\), the mean counterfactual outcome can be estimated as:
\[E[Y^a] = E\left[\frac{I(A = a) \times Y}{Pr[A = a|L]}\right]\]
where \(I(A = a)\) is an indicator function equal to 1 if \(A = a\) and 0 otherwise.
The IPW estimator of the causal risk difference is:
\[\widehat{E[Y^{a=1}] - E[Y^{a=0}]} = \frac{1}{n}\sum_{i=1}^n \frac{I(A_i = 1) \times Y_i}{Pr[A_i = 1|L_i]} - \frac{1}{n}\sum_{i=1}^n \frac{I(A_i = 0) \times Y_i}{Pr[A_i = 0|L_i]}\]
Using Table 2.2, let’s compute the IPW estimate of the causal effect.
Step 1: Compute treatment probabilities
In the stratified randomization, \(Pr[A = 1|L = 0] = Pr[A = 1|L = 1] = 0.5\), and likewise \(Pr[A = 0|L = l] = 0.5\) in both strata.
Step 2: Compute weights
Since all probabilities equal 0.5, all weights equal \(1/0.5 = 2\).
Step 3: Compute weighted means
Treated group (\(A = 1\)):
Based on Table 2.2, there are 10 treated individuals, of whom 7 died (\(Y = 1\)). The weighted mean is:
\[\frac{\sum_{i:A_i=1} W_i Y_i}{\sum_{i:A_i=1} W_i} = \frac{7 \times 2}{10 \times 2} = \frac{14}{20} = 0.7\]
Untreated group (\(A = 0\)):
Based on Table 2.2, there are 10 untreated individuals, of whom 5 died (\(Y = 1\)). The weighted mean is:
\[\frac{\sum_{i:A_i=0} W_i Y_i}{\sum_{i:A_i=0} W_i} = \frac{5 \times 2}{10 \times 2} = \frac{10}{20} = 0.5\]
Step 4: Compute IPW estimate
\[E[Y^{a=1}] - E[Y^{a=0}] = 0.7 - 0.5 = 0.2\]
The IPW estimate of the causal risk difference is 0.2, identical to the standardized estimate. This is not a coincidence: when implemented nonparametrically, as here, standardization and IPW give exactly the same answer.
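The IPW computation, using the same Table 2.2 rows, can be sketched as:

```python
# IPW estimation from Table 2.2; Pr[A = 1|L] = 0.5 in both strata by design.
data = [  # (L, A, Y) rows transcribed from Table 2.2
    (0, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 0), (0, 0, 1),
    (0, 0, 0), (0, 1, 1), (0, 1, 1), (0, 1, 1), (0, 0, 1),
    (1, 0, 1), (1, 0, 0), (1, 0, 1), (1, 1, 1), (1, 0, 1),
    (1, 1, 1), (1, 1, 1), (1, 1, 1), (1, 0, 0), (1, 1, 0),
]
pr_a1 = {0: 0.5, 1: 0.5}  # stratum-specific probability of treatment

n = len(data)
ipw_treated = sum(y / pr_a1[l] for l, a, y in data if a == 1) / n          # E[Y^{a=1}] = 0.7
ipw_untreated = sum(y / (1 - pr_a1[l]) for l, a, y in data if a == 0) / n  # E[Y^{a=0}] = 0.5
print(f"IPW risk difference = {ipw_treated - ipw_untreated:.1f}")          # 0.2
```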
| Feature | Standardization | IPW |
|---|---|---|
| Idea | Weight stratum-specific outcomes by population distribution | Create pseudo-population with marginal randomization |
| Formula | \(E[Y^a] = \sum_l E[Y \vert A=a,L=l] Pr[L=l]\) | \(E[Y^a] = E[Y \cdot I(A=a) / Pr[A=a \vert L]]\) |
| Weights | Stratum probabilities \(Pr[L=l]\) | Inverse treatment probabilities \(1/Pr[A \vert L]\) |
| Model | Outcome model \(E[Y \vert A,L]\) | Treatment model \(Pr[A \vert L]\) |
| Extensions | G-computation, parametric g-formula | Marginal structural models |
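The equivalence of standardization and IPW, and their divergence from the crude contrast, is easiest to see when treatment probabilities differ across strata. The simulation below uses hypothetical parameters (not from the text): men (\(L = 1\)) are both sicker and more likely to be treated, while the true risk difference is 0.2 in every stratum.

```python
import random

random.seed(1)

# Hypothetical parameters: treatment raises the risk of death by exactly 0.2
# in each stratum; men (L = 1) have higher baseline risk and are treated with
# probability 0.75 versus 0.25 for women.
n = 200_000
rows = []
for _ in range(n):
    l = random.randint(0, 1)  # half women, half men on average
    a = 1 if random.random() < (0.75 if l == 1 else 0.25) else 0
    y = 1 if random.random() < 0.2 + 0.4 * l + 0.2 * a else 0
    rows.append((l, a, y))

def mean(xs):
    return sum(xs) / len(xs)

# Crude contrast (confounded by design): the treated are disproportionately men.
crude = (mean([y for l, a, y in rows if a == 1])
         - mean([y for l, a, y in rows if a == 0]))

# Standardization: weight stratum-specific differences by Pr[L = l].
pr_l = {l: sum(1 for li, _, _ in rows if li == l) / n for l in (0, 1)}
std = sum(
    (mean([y for li, a, y in rows if li == l and a == 1])
     - mean([y for li, a, y in rows if li == l and a == 0])) * pr_l[l]
    for l in (0, 1)
)

# IPW with the known design probabilities.
p = {0: 0.25, 1: 0.75}
ipw = (sum(y / p[l] for l, a, y in rows if a == 1)
       - sum(y / (1 - p[l]) for l, a, y in rows if a == 0)) / n

print(f"crude = {crude:.3f}, standardized = {std:.3f}, IPW = {ipw:.3f}")
```

Both adjusted estimates recover the true value 0.2 (up to sampling error), while the crude contrast is roughly double it.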
Both standardization and IPW require the positivity assumption: For all values of \(L\) that occur in the population, there must be a non-zero probability of receiving each treatment level.
Formally: If \(Pr[L = l] > 0\), then \(0 < Pr[A = a|L = l] < 1\) for all \(a\).
Definition 5 (Positivity) The positivity assumption (also called experimental treatment assignment or overlap) requires:
\[0 < Pr[A = a|L = l] < 1\]
for all values \(l\) such that \(Pr[L = l] > 0\) and for all treatment levels \(a\).
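Positivity can be checked empirically by verifying that every observed stratum of \(L\) contains individuals at each treatment level. A minimal sketch, using a hypothetical helper `check_positivity` (not from the text):

```python
def check_positivity(rows, treatment_levels=(0, 1)):
    """Return {stratum: missing treatment levels} for observed violations.

    `rows` is a list of (L, A) pairs.
    """
    violations = {}
    for l in {l for l, _ in rows}:
        seen = {a for li, a in rows if li == l}
        missing = set(treatment_levels) - seen
        if missing:
            violations[l] = missing
    return violations

# Both strata of Table 2.2 contain treated and untreated individuals:
table_2_2 = [(0, 1)] * 5 + [(0, 0)] * 5 + [(1, 1)] * 5 + [(1, 0)] * 5
print(check_positivity(table_2_2))  # {} -> no observed violation

# A stratum with no untreated individuals violates positivity:
bad = [(0, 0), (0, 1), (1, 1), (1, 1)]
print(check_positivity(bad))        # {1: {0}}
```

A check like this detects only violations visible in the sample; distinguishing random from structural nonpositivity requires subject-matter knowledge about how treatment could have been assigned.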
This chapter introduced fundamental concepts using randomized experiments:
Randomization ensures exchangeability \(Y^a \perp\!\!\!\perp A\), allowing causal effects to be identified from associational contrasts
Conditional randomization achieves conditional exchangeability \(Y^a \perp\!\!\!\perp A | L\) within strata, requiring adjustment methods
Standardization computes \(E[Y^a] = \sum_l E[Y|A=a,L=l] Pr[L=l]\) by weighting stratum-specific outcomes
Inverse probability weighting creates a pseudo-population where treatment is marginally randomized by weighting individuals as \(W^A = 1/Pr[A|L]\)
Both standardization and IPW require conditional exchangeability, positivity, and consistency
In randomized experiments, these assumptions are met by design. In observational studies (Chapters 3-7), these assumptions are less plausible and require careful justification.
In randomized experiments, individuals may not receive or adhere to their assigned treatment. For example, someone randomized to receive a transplant may refuse it. The intention-to-treat (ITT) effect is the causal effect of treatment assignment (not treatment received).
Let \(Z\) denote random assignment and \(A\) denote treatment actually received. The ITT effect is \(E[Y^{z=1}] - E[Y^{z=0}]\), while the per-protocol effect is \(E[Y^{a=1}] - E[Y^{a=0}]\).
ITT effects are always identified in randomized experiments (because \(Y^z \perp\!\!\!\perp Z\) by randomization). Per-protocol effects require additional assumptions about non-adherence.
Chapter 1 distinguished the average causal null hypothesis (\(E[Y^{a=1}] = E[Y^{a=0}]\)) from the sharp causal null hypothesis (\(Y^{a=1} = Y^{a=0}\) for all individuals).
In randomized experiments, the sharp null allows for exact inference using permutation tests (Fisher’s randomization test). These tests are valid in finite samples without distributional assumptions.
Under the average null (but not the sharp null), standard large-sample inference (normal approximation, t-tests) is typically used.
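Fisher's randomization test can be approximated by Monte Carlo: under the sharp null every individual's outcome is fixed, so shuffling the treatment labels of Table 2.1 generates the null distribution of the risk difference. A sketch (data transcribed from Table 2.1, left column top to bottom, then right column):

```python
import random

random.seed(2)

# Y and A from Table 2.1, in table order.
y = [0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0]
a = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1]

def risk_diff(assign):
    treated = [yi for yi, ai in zip(y, assign) if ai == 1]
    untreated = [yi for yi, ai in zip(y, assign) if ai == 0]
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)

observed = risk_diff(a)  # 7/13 - 3/7 ≈ 0.110

# Monte Carlo approximation of the one-sided permutation p-value:
# shuffle the 13 treatment labels among the 20 fixed outcomes.
reps = 20_000
count = 0
for _ in range(reps):
    perm = a[:]
    random.shuffle(perm)
    if risk_diff(perm) >= observed - 1e-12:
        count += 1
p_value = count / reps
print(f"observed risk difference = {observed:.3f}, one-sided p ≈ {p_value:.3f}")
```

In this particular dataset, the exact one-sided p-value works out to 0.5 (it can be computed from the hypergeometric distribution of the number of treated deaths), so the observed difference is entirely compatible with the sharp null.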
A causal quantity is identified if it can be expressed as a function of the observed data distribution. Exchangeability is a sufficient condition for identifying average causal effects.
Under \(Y^a \perp\!\!\!\perp A\), we have: \[E[Y^a] = E[Y|A=a]\]
This is because: \[\begin{align} E[Y^a] &= E[Y^a|A=a] \quad \text{(by exchangeability)} \\ &= E[Y|A=a] \quad \text{(by consistency)} \end{align}\]
Similarly, under \(Y^a \perp\!\!\!\perp A | L\): \[E[Y^a|L=l] = E[Y|A=a, L=l]\]
The standardization formula follows by taking expectations over \(L\).
The IPW estimator can be derived from first principles. Under \(Y^a \perp\!\!\!\perp A | L\) and positivity:
\[\begin{align} E[Y^a] &= E[E[Y^a|L]] \quad \text{(law of total expectation)} \\ &= E[E[Y^a|A=a, L]] \quad \text{(by conditional exchangeability)} \\ &= E[E[Y|A=a, L]] \quad \text{(by consistency)} \\ &= E\left[\frac{I(A=a)}{Pr[A=a|L]} E[Y|A=a, L]\right] \quad \text{(inverse weighting)} \\ &= E\left[\frac{I(A=a) \cdot Y}{Pr[A=a|L]}\right] \quad \text{(by law of total expectation)} \end{align}\]
The key step is the fourth equality, which uses the fact that \(E[I(A=a)/Pr[A=a|L] \mid L] = 1\): multiplying a function of \(L\) by this ratio leaves its expectation unchanged.
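The identity \(E[I(A=a)/Pr[A=a|L] \mid L] = 1\) can be checked numerically on the Table 2.2 design, where exactly half of each stratum is treated:

```python
# (L, A) pairs matching the Table 2.2 design: 5 treated, 5 untreated per stratum.
la_pairs = [(0, 1)] * 5 + [(0, 0)] * 5 + [(1, 1)] * 5 + [(1, 0)] * 5
pr_a1 = {0: 0.5, 1: 0.5}  # Pr[A = 1|L] by stratum

means = {}
for l in (0, 1):
    ratios = [(1 if a == 1 else 0) / pr_a1[l] for li, a in la_pairs if li == l]
    means[l] = sum(ratios) / len(ratios)

print(means)  # {0: 1.0, 1: 1.0} -> the ratio averages to 1 within each stratum
```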