As a human being, you are already familiar with causal inference’s fundamental concepts. Through sheer existence, you know what a causal effect is, understand the difference between association and causation, and have used this knowledge consistently throughout your life. Had you not, you’d be dead. Without basic causal concepts, you would not have survived long enough to read this chapter, let alone learn to read. As a toddler, you would have jumped right into the swimming pool after seeing that those who did so were later able to reach the jam jar. As a teenager, you would have skied down the most dangerous slopes after seeing that those who did so won the next ski race. As a parent, you would have refused to give antibiotics to your sick child after observing that children who took their medicines were not at the park the next day.
Since you already understand the definition of causal effect and the difference between association and causation, do not expect to gain deep conceptual insights from this chapter. Rather, the purpose of this chapter is to introduce mathematical notation that formalizes the causal intuition that you already possess. Make sure that you can match your causal intuition with the mathematical notation introduced here. This notation is necessary to precisely define causal concepts, and will be used throughout the book.
We use a hypothetical example to introduce causal effects. Consider Zeus’s extended family as our population of interest. Suppose all 20 family members have a life-threatening disease and we want to study the causal effect of heart transplant (treatment \(A\)) on death (outcome \(Y\)) within 5 years.
We represent treatment and outcome as binary variables:

- \(A\): 1 if the individual received a heart transplant, 0 otherwise
- \(Y\): 1 if the individual died within 5 years, 0 otherwise
To define a causal effect, we need to compare two counterfactual outcomes (also called potential outcomes) for each individual:

- \(Y^{a=1}\): the outcome that would have been observed had the individual received a heart transplant (\(a = 1\))
- \(Y^{a=0}\): the outcome that would have been observed had the individual not received a heart transplant (\(a = 0\))
An individual causal effect exists when \(Y^{a=1} \neq Y^{a=0}\) for an individual. For example, if Zeus would die if transplanted (\(Y^{a=1} = 1\)) but survive if not transplanted (\(Y^{a=0} = 0\)), then heart transplant has a causal effect on Zeus’s outcome.
The fundamental problem of causal inference is that we can only observe one of the two counterfactual outcomes for each individual. If Zeus receives a heart transplant, we observe \(Y^{a=1}\) but not \(Y^{a=0}\). The unobserved counterfactual outcome remains unknown.
Formally, for each individual, the observed outcome \(Y\) equals the counterfactual outcome \(Y^a\) corresponding to the treatment actually received: \(Y = Y^A\). This equality is called consistency.
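Consistency can be sketched in a few lines of Python (the helper `observed_outcome` is illustrative, not from the text): for an individual who received treatment \(A = a\), the observed outcome simply picks out the counterfactual \(Y^a\).

```python
# Consistency: the observed outcome Y equals the counterfactual outcome
# Y^a for the treatment value a actually received.
def observed_outcome(a, y_a0, y_a1):
    """Return the observed outcome given treatment a and the two
    counterfactual outcomes (illustrative helper)."""
    return y_a1 if a == 1 else y_a0

# Zeus received a transplant (A = 1) with Y^{a=1} = 1 and Y^{a=0} = 0,
# so his observed outcome is Y = 1: he died.
print(observed_outcome(a=1, y_a0=0, y_a1=1))  # 1
```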
Since individual causal effects cannot be identified, we focus on average causal effects in a population. Table 1.1 shows the counterfactual outcomes for all 20 members of Zeus’s family.
Table 1.1: Counterfactual 5-year mortality outcomes for Zeus’s family
| Name | \(Y^{a=0}\) | \(Y^{a=1}\) |
|---|---|---|
| Rheia | 0 | 1 |
| Kronos | 1 | 0 |
| Demeter | 0 | 0 |
| Hades | 0 | 0 |
| Hestia | 0 | 0 |
| Poseidon | 1 | 0 |
| Hera | 0 | 0 |
| Zeus | 0 | 1 |
| Artemis | 1 | 1 |
| Apollo | 1 | 0 |
| Leto | 0 | 1 |
| Ares | 1 | 1 |
| Athena | 1 | 1 |
| Hephaestus | 0 | 1 |
| Aphrodite | 0 | 1 |
| Polyphemus | 0 | 1 |
| Persephone | 1 | 1 |
| Hermes | 1 | 0 |
| Hebe | 1 | 0 |
| Dionysus | 1 | 0 |
From Table 1.1, we can compute:

\[Pr[Y^{a=1} = 1] = 10/20 = 0.5\]
\[Pr[Y^{a=0} = 1] = 10/20 = 0.5\]

That is, half the members of the population would die if everyone received a heart transplant, and half would die if no one did.
An average causal effect of treatment \(A\) on outcome \(Y\) is present if:
\[Pr[Y^{a=1} = 1] \neq Pr[Y^{a=0} = 1]\]
or equivalently (using expected values):
\[E[Y^{a=1}] \neq E[Y^{a=0}]\]
In our population, treatment does not have an average causal effect: both counterfactual risks equal 0.5, so the null hypothesis of no average causal effect holds. However, this does not mean there are no individual effects; in fact, Table 1.1 shows that treatment has an individual causal effect on 12 of the 20 family members.
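These risks can be verified with a short Python sketch; the two lists below transcribe the \(Y^{a=0}\) and \(Y^{a=1}\) columns of Table 1.1 in row order (Rheia through Dionysus).

```python
# Counterfactual outcomes from Table 1.1, one entry per family member.
y_a0 = [0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1]
y_a1 = [1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]

risk_untreated = sum(y_a0) / len(y_a0)  # Pr[Y^{a=0} = 1]
risk_treated = sum(y_a1) / len(y_a1)    # Pr[Y^{a=1} = 1]

# Both risks equal 0.5: no average causal effect in this population.
print(risk_untreated, risk_treated)  # 0.5 0.5
```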
When a causal effect exists, we can quantify its magnitude using different effect measures. The three most common for binary outcomes are the causal risk difference, the causal risk ratio, and the causal odds ratio.

The causal risk difference:

\[Pr[Y^{a=1} = 1] - Pr[Y^{a=0} = 1]\]

This additive measure equals zero under the null hypothesis. It measures the absolute difference in risk.

The causal risk ratio:

\[\frac{Pr[Y^{a=1} = 1]}{Pr[Y^{a=0} = 1]}\]

This multiplicative measure equals one under the null hypothesis. It measures how many times treatment increases (or decreases) the risk.

The causal odds ratio:

\[\frac{Pr[Y^{a=1} = 1] / Pr[Y^{a=1} = 0]}{Pr[Y^{a=0} = 1] / Pr[Y^{a=0} = 0]}\]

This multiplicative measure also equals one under the null hypothesis.
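As a quick sketch, all three measures can be computed from the two counterfactual risks (the helper `effect_measures` is illustrative); at the risks from Table 1.1, each measure sits exactly at its null value.

```python
def effect_measures(p1, p0):
    """Risk difference, risk ratio, and odds ratio from the two
    counterfactual risks p1 = Pr[Y^{a=1}=1] and p0 = Pr[Y^{a=0}=1]."""
    risk_difference = p1 - p0                       # additive scale
    risk_ratio = p1 / p0                            # multiplicative scale
    odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))  # odds scale
    return risk_difference, risk_ratio, odds_ratio

# At the risks from Table 1.1, all three measures are at their null values.
print(effect_measures(0.5, 0.5))  # (0.0, 1.0, 1.0)
```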
In practice, we do not observe the counterfactual outcomes in Table 1.1. We only observe data from a sample of individuals. This introduces random variability.
We can view our study population in two ways:

- As the population of interest itself, in which case there is no sampling and the counterfactual risks can be computed without error
- As a sample from a larger, essentially infinite superpopulation, in which case the risks computed from our data are only estimates of the superpopulation risks
Under the superpopulation perspective, even if the true average causal effect is zero, our sample estimate might not be exactly zero due to sampling variability.
It’s critical to distinguish:

- Random variability: error due to chance, which shrinks as the sample size grows
- Systematic bias: error that does not go away no matter how large the sample
With a very large sample, random error becomes negligible, and systematic error dominates. This is why proper study design and appropriate methods are essential for causal inference.
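The behavior of random error is easy to see in a simulation (a sketch, assuming a hypothetical superpopulation in which the true risk is 0.5): an estimate from a small sample can land far from 0.5, while an estimate from a very large sample hugs the true value.

```python
import random

random.seed(1)  # fixed seed so this illustration is reproducible

def estimated_risk(n):
    """Estimate Pr[Y = 1] from a random sample of size n drawn from a
    hypothetical superpopulation in which the true risk is 0.5."""
    outcomes = [random.random() < 0.5 for _ in range(n)]
    return sum(outcomes) / n

print(estimated_risk(20))       # small sample: may deviate noticeably from 0.5
print(estimated_risk(100_000))  # large sample: very close to 0.5
```

Note that no sample size fixes systematic error: if the sampling itself were biased (say, toward sicker individuals), the estimate would converge to the wrong value.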
A key distinction in causal inference is between causation and association.
Unlike causal effects (defined by counterfactuals), associations are defined using observed data. Table 1.2 shows the observed treatment and outcomes for Zeus’s family.
Table 1.2: Observed treatment and outcome for Zeus’s family
| Name | \(A\) | \(Y\) |
|---|---|---|
| Rheia | 0 | 0 |
| Kronos | 0 | 1 |
| Demeter | 0 | 0 |
| Hades | 0 | 0 |
| Hestia | 1 | 0 |
| Poseidon | 1 | 0 |
| Hera | 1 | 0 |
| Zeus | 1 | 1 |
| Artemis | 0 | 1 |
| Apollo | 0 | 1 |
| Leto | 0 | 0 |
| Ares | 1 | 1 |
| Athena | 1 | 1 |
| Hephaestus | 1 | 1 |
| Aphrodite | 1 | 1 |
| Polyphemus | 1 | 1 |
| Persephone | 1 | 1 |
| Hermes | 1 | 0 |
| Hebe | 1 | 0 |
| Dionysus | 1 | 0 |
The associational risk in the treated: \(Pr[Y = 1|A = 1] = 7/13\)
The associational risk in the untreated: \(Pr[Y = 1|A = 0] = 3/7\)
Treatment \(A\) and outcome \(Y\) are independent (not associated) when:
\[Pr[Y = 1|A = 1] = Pr[Y = 1|A = 0]\]
or equivalently: \(E[Y|A = 1] = E[Y|A = 0]\)
This is denoted \(Y \perp\!\!\!\perp A\).
When this equality does not hold, \(A\) and \(Y\) are associated or dependent. Association measures include:

- The associational risk difference: \(Pr[Y = 1|A = 1] - Pr[Y = 1|A = 0]\)
- The associational risk ratio: \(Pr[Y = 1|A = 1] / Pr[Y = 1|A = 0]\)
- The associational odds ratio: \(\frac{Pr[Y = 1|A = 1] / Pr[Y = 0|A = 1]}{Pr[Y = 1|A = 0] / Pr[Y = 0|A = 0]}\)
In our example, the associational risk difference is \(7/13 - 3/7 \approx 0.11\) and the associational risk ratio is \((7/13)/(3/7) \approx 1.26\), so treatment \(A\) and outcome \(Y\) are associated.
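The associational risks can be computed directly from the observed data (a sketch; the `(A, Y)` pairs transcribe Table 1.2 in row order, Rheia through Dionysus).

```python
from fractions import Fraction

# Observed (A, Y) pairs from Table 1.2.
data = [(0, 0), (0, 1), (0, 0), (0, 0), (1, 0), (1, 0), (1, 0), (1, 1),
        (0, 1), (0, 1), (0, 0), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1),
        (1, 1), (1, 0), (1, 0), (1, 0)]

# Pr[Y = 1 | A = 1]: deaths among the treated over number treated.
risk_treated = Fraction(sum(y for a, y in data if a == 1),
                        sum(1 for a, y in data if a == 1))
# Pr[Y = 1 | A = 0]: deaths among the untreated over number untreated.
risk_untreated = Fraction(sum(y for a, y in data if a == 0),
                          sum(1 for a, y in data if a == 0))

print(risk_treated, risk_untreated)  # 7/13 3/7 — A and Y are associated
```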
Causal inference requires data like Table 1.1 (all counterfactual outcomes), but we only have data like Table 1.2 (observed outcomes). The question is: Under which conditions can real-world data be used for causal inference?
Chapter 2 provides one answer: conduct a randomized experiment.
This chapter introduced fundamental concepts:

- Individual causal effects: defined by \(Y^{a=1} \neq Y^{a=0}\), but cannot be identified because of the fundamental problem of causal inference (missing counterfactuals)
- Average causal effects: defined by \(E[Y^{a=1}] \neq E[Y^{a=0}]\), and can sometimes be identified from data
- Effect measures: include the risk difference (additive), risk ratio (multiplicative), and odds ratio
- Random variability: distinguishes sampling variability (reduced by larger samples) from systematic bias (not reduced by larger samples)
- Causation vs. association: causation compares counterfactual risks in the same population; association compares observed risks in different subsets of the population
The key insight: Association does not imply causation. The challenge is to use observed associations to make valid causal inferences.