Chapter 19 introduced the concept of time-varying confounders: covariates that change over time, are causally influenced by past treatment, and simultaneously predict future treatment and the outcome. This chapter shows exactly why this creates a problem for standard analytic methods — and why that problem cannot be fixed by any simple modification of those methods. Understanding this failure is the intellectual motivation for the g-methods presented in Chapter 21.
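Treatment-confounder feedback arises whenever a time-varying covariate \(L_k\) satisfies all of the following:
It predicts future treatment: \(L_k\) affects whether treatment \(A_k\) is received.
It predicts the outcome: \(L_k\) is associated with \(Y\), either through a direct arrow \(L_k \to Y\) or through an unmeasured common cause, so it confounds the effect of \(A_k\) on \(Y\).
It is affected by past treatment: prior treatment \(A_{k-1}\) is a cause of \(L_k\).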
The causal diagram capturing two time points of feedback is:
\[A_0 \to L_1 \to A_1 \to Y, \quad A_0 \to Y, \quad L_0 \to A_0, \quad L_0 \to Y, \quad L_1 \to Y.\]
The arrow \(A_0 \to L_1\) is the feedback arrow: prior treatment changes the confounder.
Example 1 (HIV Treatment and CD4 Counts) In an HIV cohort study, let \(A_k\) be antiretroviral therapy (ART) at time \(k\), \(L_k\) be the CD4 T-cell count at time \(k\), and \(Y\) be death within five years.
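All three conditions are satisfied. Clinicians tend to start ART when the CD4 count is low (\(L_k\) affects \(A_k\)). A low CD4 count predicts death (\(L_k\) is associated with \(Y\)). ART raises subsequent CD4 counts (\(A_{k-1}\) affects \(L_k\)). Thus \(L_k\) is a time-varying confounder with treatment-confounder feedback.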
To see why conditioning on \(L_k\) is problematic, introduce an unmeasured common cause \(U\) of \(L_k\) and \(Y\). For example, \(U\) might be an unmeasured aspect of immune function that affects both the CD4 count \(L_k\) and mortality \(Y\).
The extended DAG then contains the path
\[A_{k-1} \to L_k \leftarrow U \to Y.\]
Here, \(L_k\) is a collider on the path \(A_{k-1} \to L_k \leftarrow U \to Y\). As discussed in Chapter 6, conditioning on a collider opens a previously blocked path, inducing a spurious association between \(A_{k-1}\) and \(Y\) through \(U\).
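The collider mechanism can be checked numerically. The sketch below assumes a hypothetical binary model with illustrative probabilities that are not taken from any study. In this model \(U\) and \(A_0\) are independent fair coin flips, and \(P(L_1 = 1 \mid A_0, U) = 0.2 + 0.4A_0 + 0.3U\). It computes exact conditional probabilities by enumeration rather than by simulation.

```python
from itertools import product

# Hypothetical binary model with illustrative probabilities (not from any study):
#   U  ~ Bernoulli(0.5)   unmeasured common cause
#   A0 ~ Bernoulli(0.5)   baseline treatment, independent of U
#   P(L1 = 1 | A0, U) = 0.2 + 0.4*A0 + 0.3*U, so L1 is a collider on A0 -> L1 <- U


def joint(u, a0, l1):
    """Exact joint probability P(U=u, A0=a0, L1=l1)."""
    p_l1 = 0.2 + 0.4 * a0 + 0.3 * u
    return 0.5 * 0.5 * (p_l1 if l1 == 1 else 1 - p_l1)


def p_u1(a0, l1=None):
    """P(U=1 | A0=a0), or P(U=1 | A0=a0, L1=l1) when l1 is given."""
    l1_values = [0, 1] if l1 is None else [l1]
    num = sum(joint(1, a0, l) for l in l1_values)
    den = sum(joint(u, a0, l) for u, l in product([0, 1], l1_values))
    return num / den


for a0 in (0, 1):
    print(f"A0={a0}: P(U=1) = {p_u1(a0):.3f}, P(U=1 | L1=1) = {p_u1(a0, 1):.3f}")
```

Marginally, \(P(U = 1)\) is 0.5 whether \(A_0 = 0\) or \(A_0 = 1\), so \(A_0\) and \(U\) are independent. Among individuals with \(L_1 = 1\), however, \(P(U = 1)\) is about 0.71 when \(A_0 = 0\) and 0.60 when \(A_0 = 1\). Knowing that an untreated person nonetheless has \(L_1 = 1\) makes a high value of \(U\) more likely.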
Traditional methods for confounding adjustment include stratification, outcome regression, and matching.
All of these approaches condition on the time-varying confounder \(L_k\). We now show that this conditioning introduces bias in the presence of treatment-confounder feedback.
Consider a two-time-point study (\(k = 0, 1\)) with binary treatment and binary confounder, and suppose the true causal effect of the treatment strategy “always treat” (\(\bar{a} = (1,1)\)) versus “never treat” (\(\bar{a} = (0,0)\)) on \(Y\) is zero.
If we stratify on \(L_1\) (the time-varying confounder), we will in general find a non-zero association between \(A_0\) and \(Y\) within strata of \(L_1\). This spurious association arises because conditioning on the collider \(L_1\) opens the non-causal path \(A_0 \to L_1 \leftarrow U \to Y\).
Conversely, if we do not stratify on \(L_1\), we fail to control for the confounding path \(A_1 \leftarrow L_1 \to Y\), also introducing bias. There is no strategy within the traditional regression paradigm that avoids both forms of bias simultaneously.
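The two-time-point argument can be made concrete with exact arithmetic. The sketch below assumes a hypothetical binary model in which neither \(A_0\) nor \(A_1\) affects \(Y\), so the true effect of “always treat” versus “never treat” is zero. All probabilities are illustrative. For simplicity, the model has no arrow \(L_1 \to Y\), so \(L_1\) is associated with \(Y\) only through \(U\). The code compares the unadjusted contrast \(\text{E}[Y \mid A_0=1, A_1=1] - \text{E}[Y \mid A_0=0, A_1=0]\) with the same contrast computed within strata of \(L_1\).

```python
from itertools import product

# Hypothetical binary model under the causal null (illustrative probabilities):
#   U ~ Bernoulli(0.5), A0 ~ Bernoulli(0.5), independent
#   P(L1 = 1 | A0, U) = 0.2 + 0.4*A0 + 0.3*U   (A0 -> L1 <- U)
#   P(A1 = 1 | L1)    = 0.3 + 0.4*L1           (L1 -> A1)
#   P(Y = 1 | U)      = 0.2 + 0.5*U            (no arrow from A0, A1, or L1 into Y)


def bern(p, x):
    """P(X = x) for X ~ Bernoulli(p)."""
    return p if x == 1 else 1 - p


def joint(u, a0, l1, a1):
    """Exact P(U=u, A0=a0, L1=l1, A1=a1)."""
    return 0.5 * 0.5 * bern(0.2 + 0.4 * a0 + 0.3 * u, l1) * bern(0.3 + 0.4 * l1, a1)


def mean_y(a0, a1, l1=None):
    """E[Y | A0=a0, A1=a1], or E[Y | A0=a0, A1=a1, L1=l1] when l1 is given."""
    l1_values = [0, 1] if l1 is None else [l1]
    num = den = 0.0
    for u, l in product([0, 1], l1_values):
        w = joint(u, a0, l, a1)
        num += w * (0.2 + 0.5 * u)  # weight E[Y | U=u] by P(U=u, ...)
        den += w
    return num / den


unadjusted = mean_y(1, 1) - mean_y(0, 0)  # ignores L1
stratified = {l1: mean_y(1, 1, l1) - mean_y(0, 0, l1) for l1 in (0, 1)}  # conditions on L1
print(f"unadjusted contrast: {unadjusted:+.3f}")
for l1, diff in stratified.items():
    print(f"contrast within L1={l1}: {diff:+.3f}")
```

Under this model the unadjusted contrast is about +0.05, reflecting confounding through \(A_1 \leftarrow L_1 \leftarrow U \to Y\). The stratum-specific contrasts are about -0.09 for \(L_1 = 0\) and -0.06 for \(L_1 = 1\), reflecting collider bias through \(A_0 \to L_1 \leftarrow U \to Y\). Neither is zero. In this example the two biases even have opposite signs.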
The direction of the bias introduced by traditional methods depends on the signs of the associations in the feedback loop. Without additional information, the bias can be in either direction — traditional methods can either underestimate or overestimate the treatment effect.
This unpredictability is particularly concerning: a naive analyst who uses traditional regression may not only miss the true effect but may estimate an effect of the wrong sign.
The fundamental reason traditional methods fail is that they cannot simultaneously adjust for time-varying confounding and preserve the causal effect of prior treatment.
To estimate the total effect of the strategy “always treat” versus “never treat”, we need to preserve the causal path \(A_0 \to L_1 \to Y\), that is, the part of the effect of \(A_0\) that works through its effect on \(L_1\). But to control confounding of the \(A_1 \to Y\) relationship, we feel compelled to condition on \(L_1\). Doing so blocks this path. Because \(L_1\) is also a collider on \(A_0 \to L_1 \leftarrow U \to Y\), it also introduces collider bias.
Proposition 1 (Traditional Methods Fail under Treatment-Confounder Feedback) Let the causal DAG contain the path \(A_0 \to L_1 \leftarrow U \to Y\) and the confounding path \(A_1 \leftarrow L_1 \to Y\). Then, in general (that is, barring exact cancellation of associations along different paths), no estimator that conditions on \(L_1\) as a standard covariate (in a regression or stratification) can consistently estimate \(\text{E}{\left[Y^{\bar{a}=\bar{1}}\right]} - \text{E}{\left[Y^{\bar{a}=\bar{0}}\right]}\).
The failure can be visualized on the causal DAG. Suppose the full DAG (including unmeasured \(U\)) is:
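\[L_0 \to A_0 \to L_1 \leftarrow U \to Y, \quad L_1 \to A_1 \to Y, \quad L_1 \to Y, \quad A_0 \to Y, \quad L_0 \to Y.\]
Under the strategies of interest, the effect of \(A_0\) flows along two causal paths: \(A_0 \to Y\) (direct) and \(A_0 \to L_1 \to Y\) (mediated by the effect of \(A_0\) on \(L_1\)). The path \(A_0 \to L_1 \to A_1 \to Y\) does not contribute, because “always treat” and “never treat” set \(A_1\) regardless of \(L_1\).
Now consider what happens when we condition on \(L_1\) in a regression:
Blocking a causal path: \(L_1\) is a non-collider on \(A_0 \to L_1 \to Y\), so conditioning on it removes the part of the effect of \(A_0\) that is mediated by \(L_1\).
Opening a non-causal path: \(L_1\) is a collider on \(A_0 \to L_1 \leftarrow U \to Y\), so conditioning on it creates an association between \(A_0\) and \(Y\) through \(U\).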
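Both effects distort the estimated association between \(A_0\) and \(Y\). The g-methods of Chapter 21 avoid this by not conditioning on \(L_1\) as a standard covariate when estimating the effect of \(A_0\). Instead, they use \(L_1\) only to adjust for confounding of \(A_1\). For example, they may standardize over the distribution of \(L_1\) given prior treatment, or model the probability of \(A_1\) given \(L_1\).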
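The path bookkeeping above can be automated with a per-path d-separation check. The sketch below is a hypothetical encoding of the DAG with the unmeasured \(U\). It assumes the DAG also contains the arrow \(L_1 \to Y\) from the two-time-point diagram at the start of the chapter. The code lists every path between \(A_0\) and \(Y\) and reports whether each is open or blocked given a conditioning set. It uses the usual rule: a path is blocked if it passes through a conditioned-on non-collider, or through a collider that is not conditioned on and has no conditioned-on descendant.

```python
# Hypothetical encoding of the DAG with unmeasured U; edges are (parent, child).
# Assumes the DAG also contains the arrow L1 -> Y.
EDGES = {("L0", "A0"), ("L0", "Y"), ("A0", "L1"), ("A0", "Y"), ("U", "L1"),
         ("U", "Y"), ("L1", "A1"), ("L1", "Y"), ("A1", "Y")}


def paths(start, end):
    """All simple paths from start to end, ignoring edge direction."""
    neighbors = {}
    for a, b in EDGES:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    found, stack = [], [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == end:
            found.append(path)
            continue
        for nxt in sorted(neighbors[path[-1]]):
            if nxt not in path:
                stack.append(path + [nxt])
    return found


def descendants(node):
    """All nodes reachable from node by following arrows forward."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for a, b in EDGES:
            if a == current and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def is_open(path, given):
    """Per-path d-connection check given the conditioning set `given`."""
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        if (prev, node) in EDGES and (nxt, node) in EDGES:  # node is a collider
            if node not in given and not (descendants(node) & given):
                return False
        elif node in given:  # a conditioned-on non-collider blocks the path
            return False
    return True


def render(path):
    """Draw a path with its arrow directions, e.g. 'A0 -> L1 <- U -> Y'."""
    return path[0] + "".join(
        (" -> " if (a, b) in EDGES else " <- ") + b for a, b in zip(path, path[1:]))


for given in (set(), {"L0", "L1"}):
    print("conditioning on", sorted(given) or "nothing")
    for p in paths("A0", "Y"):
        print(f"  {render(p):<22} {'open' if is_open(p, given) else 'blocked'}")
```

With no conditioning, the collider path \(A_0 \to L_1 \leftarrow U \to Y\) is the only blocked path. Conditioning on \(\{L_0, L_1\}\), as a regression with those covariates does, has three effects. It blocks the backdoor path through \(L_0\) and the mediated path \(A_0 \to L_1 \to Y\), and it opens \(A_0 \to L_1 \leftarrow U \to Y\). It also blocks \(A_0 \to L_1 \to A_1 \to Y\), but that path does not contribute to the effect of a strategy that sets \(A_1\) anyway.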
One might hope that a more sophisticated version of traditional regression — perhaps including interaction terms, polynomial terms, or machine learning — could solve the time-varying confounding problem. This hope is misguided.
The failure of traditional methods is not due to model misspecification or insufficient flexibility. It is a structural failure: the data cannot be analyzed correctly using any approach that conditions on \(L_k\) as a standard covariate.
Several “fixes” have been proposed and shown to be inadequate:
Adding lagged treatments: Including \(A_0, A_1\) and \(L_0, L_1\) in the same regression of \(Y\) does not solve the problem. The model still conditions on \(L_1\) while estimating the coefficient of \(A_0\), which blocks the causal path \(A_0 \to L_1 \to Y\) and opens the collider path \(A_0 \to L_1 \leftarrow U \to Y\).
Marginal models without adjustment: Fitting a marginal regression of \(Y\) on \(A_0\) and \(A_1\) without adjusting for \(L_k\) fails to control confounding, since \(L_k\) is a confounder for \(A_k\).
Fixed-effects panel models: These control for time-invariant unmeasured confounders but do not address time-varying confounders that are affected by past treatment.
Instrumental variable methods: Require instruments that are hard to find and do not directly address the structural issue with treatment-confounder feedback.
Fine Point 20.1: The Structural Problem Is Not About Sample Size
A common misconception is that the bias of traditional methods in the presence of treatment-confounder feedback is a finite-sample problem that disappears with large enough datasets. This is incorrect. The bias is present even in infinite samples because it arises from the structural relationship between variables in the DAG, not from estimation error. Doubling the sample size doubles our precision in estimating the wrong quantity. The only remedy is to use an estimation strategy that is structurally correct — the g-methods of Chapter 21.
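The claim in Fine Point 20.1 can be checked by simulation. The sketch below draws samples of increasing size from a hypothetical binary model in which treatment has no effect on \(Y\). Its illustrative probabilities are:

- \(U, A_0 \sim\) Bernoulli(0.5)
- \(P(L_1=1 \mid A_0,U) = 0.2 + 0.4A_0 + 0.3U\)
- \(P(A_1=1 \mid L_1) = 0.3 + 0.4L_1\)
- \(P(Y=1 \mid U) = 0.2 + 0.5U\)

For each sample it computes the “always treat” versus “never treat” contrast within the stratum \(L_1 = 1\). It uses only Python's standard `random` module.

```python
import random


def simulate(n, seed=0):
    """Draw n records (a0, l1, a1, y) from a hypothetical model with no treatment effect."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        u = rng.random() < 0.5                        # unmeasured U
        a0 = rng.random() < 0.5                       # baseline treatment
        l1 = rng.random() < 0.2 + 0.4 * a0 + 0.3 * u  # feedback: A0 -> L1 <- U
        a1 = rng.random() < 0.3 + 0.4 * l1            # L1 -> A1
        y = rng.random() < 0.2 + 0.5 * u              # Y depends on U only
        data.append((a0, l1, a1, y))
    return data


def contrast_within_l1(data, l1=True):
    """Sample E[Y | A0=1, A1=1, L1=l1] - E[Y | A0=0, A1=0, L1=l1]."""
    treated = [y for a0, l, a1, y in data if l == l1 and a0 and a1]
    untreated = [y for a0, l, a1, y in data if l == l1 and not a0 and not a1]
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)


for n in (1_000, 10_000, 100_000):
    print(f"n={n:>7}: {contrast_within_l1(simulate(n)):+.3f}")
```

As \(n\) grows the estimate fluctuates less, but it settles near -0.06, not near the true effect of zero. The value -0.06 is the exact stratum-specific contrast under this model (\(0.5 - 39/70\)).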
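A method that correctly handles treatment-confounder feedback must do three things. It must adjust for \(L_k\) as a confounder of the effect of \(A_k\). It must preserve the causal path \(A_{k-1} \to L_k \to Y\) rather than blocking it. And it must avoid conditioning on \(L_k\) in a way that opens collider paths such as \(A_{k-1} \to L_k \leftarrow U \to Y\).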
The g-formula, IP weighting with marginal structural models, and g-estimation each accomplish these goals through different mechanisms, as described in Chapter 21.
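As a preview of Chapter 21, stated here without derivation, the g-formula for two time points shows one way to use \(L_1\) without conditioning it away. Assume the identifiability conditions for sustained strategies hold (sequential exchangeability, positivity, and consistency) and that the covariates are discrete. Then the mean outcome under the strategy \((a_0, a_1)\) is

\[\text{E}{\left[Y^{a_0, a_1}\right]} = \sum_{l_0, l_1} \text{E}{\left[Y \mid A_0 = a_0, L_0 = l_0, A_1 = a_1, L_1 = l_1\right]} \, f(l_1 \mid a_0, l_0) \, f(l_0).\]

\(L_1\) enters the inner conditional mean, which adjusts for confounding of \(A_1\). It is then averaged over its distribution \(f(l_1 \mid a_0, l_0)\) under the assigned value \(a_0\), so the path \(A_0 \to L_1 \to Y\) is preserved rather than blocked. IP weighting targets the same quantity by weighting each individual by \(1 / \left[f(A_0 \mid L_0)\, f(A_1 \mid A_0, L_0, L_1)\right]\), so that \(L_1\) appears only in the model for treatment.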
A specific manifestation of the problem is the question of whether and how to include past treatment in a regression model for the outcome.
Consider a regression of \(Y\) on \((A_0, A_1, L_0, L_1)\). This model adjusts for both time points of treatment and both time points of the covariate. Is this a valid approach?
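A simulation sketch makes this question concrete. It assumes a hypothetical linear-Gaussian model, chosen so that the answer can also be worked out by hand:

- \(A_0\) is randomized, so \(L_0\) is omitted.
- \(L_1 = A_0 + U + \varepsilon_1\) and \(A_1 = L_1 + \varepsilon_2\).
- \(Y = 0.5A_0 + 0.5L_1 + U + \varepsilon_3\).
- \(U\) is unmeasured, and \(U\), \(A_0\), and all \(\varepsilon\) are independent standard normal.

\(A_1\) has no effect on \(Y\), so moving \((A_0, A_1)\) from \((0, 0)\) to \((1, 1)\) changes the mean of \(Y\) by \(0.5 + 0.5 \times 1 = 1.0\). To keep the example dependency-free, the regression is fit by ordinary least squares with a small hand-written solver.

```python
import random


def simulate(n, seed=0):
    """Rows ([1, a0, a1, l1], y) from a hypothetical linear model; U is unmeasured."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = rng.gauss(0, 1)                            # unmeasured common cause
        a0 = rng.gauss(0, 1)                           # randomized baseline treatment
        l1 = a0 + u + rng.gauss(0, 1)                  # feedback: A0 -> L1 <- U
        a1 = l1 + rng.gauss(0, 1)                      # L1 -> A1; A1 does not affect Y
        y = 0.5 * a0 + 0.5 * l1 + u + rng.gauss(0, 1)  # A0 -> Y and A0 -> L1 -> Y
        rows.append(([1.0, a0, a1, l1], y))
    return rows


def ols(rows):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0][0])
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for x, y in rows:
        for i in range(k):
            xty[i] += x[i] * y
            for j in range(k):
                xtx[i][j] += x[i] * x[j]
    m = [row + [rhs] for row, rhs in zip(xtx, xty)]  # augmented [X'X | X'y]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, k + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][k] / m[i][i] for i in range(k)]


coef = ols(simulate(20_000))
for name, b in zip(["intercept", "A0", "A1", "L1"], coef):
    print(f"{name:>9}: {b:+.2f}")
```

The fitted coefficient of \(A_0\) is close to 0 rather than 1.0. From the model's covariances, conditioning on \(L_1\) removes the mediated 0.5. The collider path \(A_0 \to L_1 \leftarrow U \to Y\) contributes exactly -0.5 in the population, which cancels the direct 0.5. A naive reading of this regression would conclude that \(A_0\) has no effect.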
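Remark 1 (Adjusting for Past Treatment in Regression). Including \(A_0\) in a regression model for \(Y\) that also includes \(L_1\) creates the following problem: the coefficient of \(A_0\) does not estimate the effect of \(A_0\). Conditioning on \(L_1\) blocks the part of that effect mediated through \(L_1\), and the collider path \(A_0 \to L_1 \leftarrow U \to Y\) adds a non-causal association.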
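Adjusting for past treatment is appropriate in specific circumstances. One is when there is no treatment-confounder feedback, that is, when past treatment does not affect \(L_k\). Another is when the question concerns only the effect of the most recent treatment \(A_1\), for which \((L_0, A_0, L_1)\) is pre-treatment history.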
The fundamental lesson of Chapter 20 is that standard analytic tools, designed for cross-sectional or time-fixed settings, are not equipped to handle the recursive causal structure of longitudinal data with treatment-confounder feedback.
The solution is the family of g-methods, which are designed from the ground up with the sequential nature of the data in mind. These methods correctly separate the roles of \(L_k\): they use it to remove confounding of \(A_k\) while preserving the causal path \(A_{k-1} \to L_k \to Y\).