Selection bias is an association between treatment and outcome that is created by the process by which individuals are selected into the analysis. Unlike confounding, this type of bias is not due to the presence of common causes of treatment and outcome, and it can arise in both randomized experiments and observational studies. Like confounding, selection bias is a form of lack of exchangeability between the treated and the untreated. This chapter provides a definition of selection bias and reviews the methods to adjust for it.
The term “selection bias” encompasses various biases that arise from the procedure by which individuals are selected into the analysis. Here we focus on bias that would arise even if the treatment had a null effect on the outcome, i.e., selection bias under the null.
The structure of selection bias can be represented using causal diagrams. Figure 8.1 depicts a dichotomous treatment \(A\), outcome \(Y\), and their common effect \(C\).
Suppose we study the effect of folic acid supplements \(A\) given to pregnant women shortly after conception on the fetus’s risk of developing a cardiac malformation \(Y\) (1: yes, 0: no) during the first trimester of pregnancy.
Let \(C\) (1: yes, 0: no) indicate whether the pregnancy results in a live birth. Both treatment and outcome affect \(C\):
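Therefore, \(C\) is a common effect (collider) of \(A\) and \(Y\): \(A \rightarrow C \leftarrow Y\).
In the full population (not conditioning on \(C\)), the path \(A \rightarrow C \leftarrow Y\) is blocked at the collider, so under the null \(A\) and \(Y\) are not associated.
After restricting to live births (\(C = 1\)), the analysis conditions on the collider, which opens that path, so \(A\) and \(Y\) will generally be associated even under the null.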
Definition 1 (Selection Bias). Selection bias occurs when the analysis conditions on (or is restricted to) a common effect of treatment and outcome, or a variable affected by such a common effect.
This creates a non-causal association between treatment and outcome, even under the null hypothesis of no treatment effect.
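A small simulation can make the definition concrete. The sketch below is illustrative only: the probabilities and the linear selection model are made-up assumptions, not values taken from the folic acid example. Treatment and outcome are generated independently (the null holds), and the selection indicator is a common effect of both.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Null scenario: A and Y are generated independently, so A has no effect
# on Y and there is no confounding.
A = rng.binomial(1, 0.5, size=n)
Y = rng.binomial(1, 0.3, size=n)

# Hypothetical selection model (assumed numbers): selection C = 1 is more
# likely when A = 1 and less likely when Y = 1, so C is a collider.
p_C = 0.5 + 0.3 * A - 0.4 * Y
C = rng.binomial(1, p_C)

def risk_difference(a, y):
    """Pr[Y = 1 | A = 1] - Pr[Y = 1 | A = 0], estimated from the data."""
    return y[a == 1].mean() - y[a == 0].mean()

rd_full = risk_difference(A, Y)                      # whole population
rd_selected = risk_difference(A[C == 1], Y[C == 1])  # restricted to C = 1

print(f"full population: {rd_full:+.3f}")
print(f"selected only:   {rd_selected:+.3f}")
```

Under these assumed probabilities the risk difference is close to zero in the full population and clearly positive (about 0.1 in expectation) among the selected, although the treatment does nothing.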
Selection bias can arise in many settings. Here we review several common examples.
Scenario: Volunteers for a study may differ systematically from non-volunteers.
Example: A study of the effect of exercise \(A\) on depression \(Y\) recruits volunteers.
If both:
Then among volunteers, exercise and depression may be associated even without a causal effect. A volunteer who does not exercise must have had some other reason to volunteer, so non-exercising volunteers are a selected group whose depression status differs systematically from that of exercising volunteers.
Scenario: Individuals drop out of a study after treatment assignment but before outcome measurement.
Example: Sicker patients are more likely to drop out, and treatment affects sickness.
If the analysis is restricted to those who complete the study, selection bias may occur because completion status \(C\) is affected by both treatment \(A\) and outcome \(Y\) (or their common causes).
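The "common causes" variant can be sketched in the same way. In the hypothetical simulation below, an unmeasured health variable \(U\) (an assumption introduced for illustration) affects both the outcome and the chance of completing the study, while treatment affects completion only, for example through side effects. All numbers are invented.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

A = rng.binomial(1, 0.5, size=n)    # randomized treatment, no effect on Y
U = rng.binomial(1, 0.4, size=n)    # unmeasured poor health (hypothetical)
Y = rng.binomial(1, 0.1 + 0.5 * U)  # outcome depends on U only

# Assumed dropout model: completion C = 1 is less likely for the treated
# and for those in poor health, giving the structure A -> C <- U -> Y.
p_complete = 0.9 - 0.3 * A - 0.5 * U
C = rng.binomial(1, p_complete)

def risk_difference(a, y):
    """Pr[Y = 1 | A = 1] - Pr[Y = 1 | A = 0], estimated from the data."""
    return y[a == 1].mean() - y[a == 0].mean()

rd_all = risk_difference(A, Y)
rd_completers = risk_difference(A[C == 1], Y[C == 1])

print(f"everyone randomized: {rd_all:+.3f}")
print(f"completers only:     {rd_completers:+.3f}")
```

With these assumptions the treatment looks protective among completers even though it has no effect, because treated individuals in poor health are the ones most likely to drop out.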
Scenario: Workers must be healthy enough to remain employed.
Example: Studying occupational exposures \(A\) on mortality \(Y\) among employed workers.
Employment status \(C\) is affected by both:
Restricting to currently employed workers conditions on \(C\), inducing selection bias.
Result: Occupational cohorts often show lower mortality than the general population (the “healthy worker effect”), even for harmful exposures.
Scenario: Hospital-based case-control studies.
Example: Studying the effect of smoking \(A\) on lung cancer \(Y\) using hospitalized patients.
If:
Then among hospitalized individuals, smoking and lung cancer may appear less associated than in the general population.
Selection bias and confounding are both forms of non-exchangeability, but they have different causal structures.
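Confounding: treatment and outcome share a common cause, which creates an association through an open backdoor path.
Selection bias: the analysis conditions on a common effect of treatment and outcome (or on a variable affected by it), which creates an association that is absent in the full population.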
A variable can be both a confounder and a source of selection bias.
Example: Consider a variable \(L\) that:
In some causal diagrams, selection can induce confounding by opening backdoor paths that would otherwise be blocked.
Example: Suppose there is no confounding in the full population, but restricting the analysis to a subset creates confounding.
This occurs when:
Censoring is a specific form of selection where we fail to observe the outcome for some individuals.
Right censoring: Outcome is not observed because follow-up ends before the outcome occurs (e.g., study ends, patient drops out).
Left censoring: Outcome occurred before observation began.
Interval censoring: Outcome time is known only to occur within an interval.
Censoring causes selection bias if:
Example 1 (Censoring Creates Selection Bias). Consider a study of the effect of AZT \(A\) on mortality \(Y\) in HIV-positive individuals.
Suppose:
Let \(C\) indicate whether an individual remains in the study until outcome measurement.
If \(C\) is affected by both \(A\) (through its effect on health/survival) and \(Y\) (sicker people with worse outcomes drop out more), then restricting to \(C = 1\) creates selection bias.
Like confounding, selection bias can sometimes be adjusted for if appropriate data are available.
1. Inverse Probability of Selection Weighting
Create a pseudo-population where selection is independent of treatment and outcome.
For each individual in the selected sample, assign weight:
\[w^S = \frac{1}{\Pr[S = 1 \mid A, Y, L]}\]
where \(S\) indicates selection into the analysis and \(L\) are measured variables.
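The weighting idea can be illustrated with a minimal sketch. It assumes the selection probabilities are known exactly, which is rarely true in practice: they would normally have to be estimated, and that requires information on unselected individuals. The numbers are invented, there are no covariates \(L\), and `weighted_risk` is a helper defined here for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Null scenario: A and Y are independent in the full population.
A = rng.binomial(1, 0.5, size=n)
Y = rng.binomial(1, 0.3, size=n)

# Assumed, known selection probabilities Pr[S = 1 | A, Y].
p_S = 0.1 + 0.4 * A + 0.4 * Y
S = rng.binomial(1, p_S).astype(bool)

# Only selected individuals enter the analysis.
a, y = A[S], Y[S]
w = 1.0 / p_S[S]  # inverse probability of selection weights

def weighted_risk(y, w, mask):
    """Weighted proportion with Y = 1 among the rows picked by mask."""
    return np.sum(w[mask] * y[mask]) / np.sum(w[mask])

rd_naive = y[a == 1].mean() - y[a == 0].mean()
rd_ipw = weighted_risk(y, w, a == 1) - weighted_risk(y, w, a == 0)

print(f"unweighted, selected only: {rd_naive:+.3f}")
print(f"IP weighted:               {rd_ipw:+.3f}")
```

In this setup the unweighted risk difference among the selected is far from zero, while the weighted one is close to the null value. Each selected individual stands in for \(w^S\) individuals in the original population, which is what creates the pseudo-population.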
2. Standardization (Restriction and Conditioning)
If selection depends only on measured variables \(L\):
3. Stratification on Selection
If possible, collect data on both selected and non-selected individuals.
Estimate the effect separately in selected and non-selected subgroups.
If the causal effect is the same in both groups, we can identify the population average causal effect.
Selection bias can be eliminated if:
Not all selection creates bias. Selection is harmless under certain conditions.
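Selection does not create bias under the null when the selection variable is not a common effect of treatment and outcome (or of their causes), that is, when the probability of selection does not depend on both treatment and outcome at once.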
Examples of harmless selection:
Selection based on variables that are not affected by treatment or outcome generally doesn’t create bias.
Example: Selecting only women for a study of a treatment and outcome.
If sex is not affected by treatment or outcome, this selection doesn’t bias the treatment-outcome association within women.
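As a contrast with the biased scenarios, the following sketch simulates two kinds of selection that leave the null association intact: restriction to women, and selection that depends on treatment only. The variable names and probabilities are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

female = rng.binomial(1, 0.5, size=n)    # not affected by A or Y
A = rng.binomial(1, 0.5, size=n)         # randomized, no effect on Y
Y = rng.binomial(1, 0.2 + 0.2 * female)  # sex affects the outcome

def risk_difference(a, y):
    """Pr[Y = 1 | A = 1] - Pr[Y = 1 | A = 0], estimated from the data."""
    return y[a == 1].mean() - y[a == 0].mean()

# Selection on a pre-treatment variable: analyze women only.
rd_women = risk_difference(A[female == 1], Y[female == 1])

# Selection that depends on treatment but not on outcome (assumed model).
S = rng.binomial(1, 0.2 + 0.6 * A)
rd_treatment_selected = risk_difference(A[S == 1], Y[S == 1])

print(f"women only:            {rd_women:+.3f}")
print(f"selected on treatment: {rd_treatment_selected:+.3f}")
```

Both estimates stay near zero, since in neither case is the selection variable a common effect of treatment and outcome.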
This chapter examined selection bias, another threat to exchangeability.
Key concepts:
Structure of selection bias: Arises from conditioning on a common effect (collider) of treatment and outcome
Sources of selection bias:
Selection bias vs. confounding:
Censoring: A specific type of selection where outcomes are unobserved
Adjustment methods:
Selection without bias: Not all selection creates bias (e.g., random sampling, case-control studies with proper analysis)