Part I of this book was mostly conceptual, with calculations kept to a minimum. In contrast, Part II requires the use of computers to fit regression models. This chapter describes the differences between the nonparametric estimators used in Part I and the parametric (model-based) estimators used in Part II. It reviews the concept of smoothing and the bias-variance trade-off in modeling decisions, motivating the need for models in data analysis regardless of whether the goal is causal inference or prediction.
Even the simple task of estimating a population mean requires modeling assumptions when data become sparse.
Consider a study of 16 HIV-positive individuals randomly sampled from a super-population. Each receives treatment \(A\) (antiretroviral therapy), and we measure outcome \(Y\) (CD4 cell count, cells/mm³).
Goal: Estimate the population mean \(E[Y|A = a]\) for each treatment level \(a\).
Treatment \(A \in \{0, 1\}\) with 8 individuals in each group.
Estimator: Sample average within each group
This nonparametric estimator (sample mean) is consistent and unbiased.
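A minimal sketch of this estimator in Python; the CD4 values below are invented for illustration (the chapter's actual data are not reproduced here):

```python
import numpy as np

# Treatment indicator for 16 individuals: 8 untreated, 8 treated
A = np.array([0] * 8 + [1] * 8)
# Hypothetical CD4 cell counts (cells/mm^3), invented for illustration
Y = np.array([60, 70, 65, 80, 75, 70, 85, 90,           # A = 0
              140, 150, 160, 155, 170, 145, 165, 175])  # A = 1

# Nonparametric estimator: the sample average within each treatment group
for a in (0, 1):
    print(f"E[Y|A={a}] estimated by {Y[A == a].mean():.1f}")
```

With only two treatment levels and 8 individuals in each, no modeling assumptions are needed to compute these averages.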
Treatment \(A \in \{1, 2, 3, 4\}\) (none, low-dose, medium-dose, high-dose) with 4 individuals per group.
Estimates: 70.0, 80.0, 117.5, 195.0 for \(A = 1, 2, 3, 4\) respectively.
Issue: With only 4 individuals per category, the sample averages remain unbiased but become imprecise: each estimate has high variance, so the confidence intervals around them are wide.
Treatment \(A\) is dose in mg/day, taking integer values from 0 to 100 mg.
Problem: With 16 individuals and 101 possible treatment values, most dose levels have no observations at all, so the nonparametric sample average is undefined for those levels.
Question: How do we estimate \(E[Y|A = 90]\) when no one received dose 90?
Parametric models make assumptions about the functional form relating treatment to outcome, allowing estimation even with sparse data.
Assume the conditional mean follows a linear function:
\[E[Y|A = a] = \beta_0 + \beta_1 a\]
Parameters: \((\beta_0, \beta_1)\) define the line.
Estimation: Fit the model using least squares to estimate \((\hat{\beta}_0, \hat{\beta}_1)\).
Prediction: For any value \(a\), estimate \(\hat{E}[Y|A = a] = \hat{\beta}_0 + \hat{\beta}_1 a\).
Example 1 (Linear Model for Continuous Treatment) With the HIV data and continuous treatment dose, fitting the line yields a prediction at every dose from 0 to 100 mg: even though no one received dose 90, the model provides an estimate of \(E[Y|A = 90]\) by interpolating from the observed doses.
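A sketch of the least-squares fit and the resulting prediction at the unobserved dose 90; the (dose, CD4) pairs are invented for illustration:

```python
import numpy as np

# Hypothetical (dose in mg/day, CD4 count) pairs; no one received dose 90
A = np.array([0, 10, 20, 30, 40, 50, 60, 70, 80, 100], dtype=float)
Y = np.array([70, 75, 82, 90, 99, 110, 118, 130, 142, 160], dtype=float)

# Least-squares estimates of (beta_0, beta_1) in E[Y|A=a] = beta_0 + beta_1 * a
b1, b0 = np.polyfit(A, Y, 1)  # polyfit returns the highest-degree coefficient first

# Model-based prediction at a dose nobody received
print(f"Estimated E[Y|A=90]: {b0 + b1 * 90:.1f}")
```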
Quadratic model: \[E[Y|A = a] = \beta_0 + \beta_1 a + \beta_2 a^2\]
Logarithmic model: \[E[Y|A = a] = \beta_0 + \beta_1 \log(a)\]
Piecewise linear (splines): Separate linear segments over different ranges of \(A\), joined at knots so the fitted curve remains continuous.
Each model makes different assumptions about the shape of the dose-response curve.
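The alternative functional forms can be fit the same way. A sketch using the same invented data; adding 1 inside the logarithm is a workaround introduced here (an assumption, since \(\log 0\) is undefined at dose 0):

```python
import numpy as np

A = np.array([0, 10, 20, 30, 40, 50, 60, 70, 80, 100], dtype=float)
Y = np.array([70, 75, 82, 90, 99, 110, 118, 130, 142, 160], dtype=float)

# Quadratic model: E[Y|A=a] = b0 + b1*a + b2*a^2
b2, b1, b0 = np.polyfit(A, Y, 2)
pred_quad = b0 + b1 * 90 + b2 * 90 ** 2

# Logarithmic model, using log(a + 1) so that dose 0 is defined
c1, c0 = np.polyfit(np.log(A + 1), Y, 1)
pred_log = c0 + c1 * np.log(90 + 1)

print(f"Quadratic: {pred_quad:.1f}   Logarithmic: {pred_log:.1f}")
```

Each model yields a different estimate at dose 90 because each encodes a different assumed shape for the dose-response curve.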
Smoothing refers to techniques that estimate the conditional mean as a smooth function of treatment, balancing nonparametric flexibility against parametric smoothness.
Nonparametric (no smoothing):
- Sample means within groups
- No assumptions about functional form
- High variance when data are sparse

Parametric (maximum smoothing):
- Linear, quadratic, etc. models
- Strong assumptions about functional form
- Low variance but potential bias

Semiparametric (intermediate smoothing):
- Methods that smooth but make weaker assumptions
- Examples: kernel smoothing, local regression, splines
- Balance bias and variance
Idea: Estimate \(E[Y|A = a]\) using a weighted average of nearby observations.
\[\hat{E}[Y|A = a] = \frac{\sum_i K\left(\frac{A_i - a}{h}\right) Y_i}{\sum_i K\left(\frac{A_i - a}{h}\right)}\]
where \(K(\cdot)\) is a kernel function (e.g., the Gaussian density) that gives more weight to observations with \(A_i\) close to \(a\), and \(h > 0\) is the bandwidth.
Bandwidth selection: a small \(h\) averages only the closest observations (low bias, high variance), while a large \(h\) averages over a wide range (high bias, low variance); cross-validation is a common way to choose \(h\).
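A minimal Nadaraya-Watson implementation of this weighted average, assuming a Gaussian kernel and the same invented (dose, CD4) data:

```python
import numpy as np

def kernel_smooth(a0, A, Y, h):
    """Nadaraya-Watson estimate of E[Y|A=a0] with a Gaussian kernel."""
    w = np.exp(-0.5 * ((A - a0) / h) ** 2)  # kernel weights K((A_i - a0) / h)
    return np.sum(w * Y) / np.sum(w)

A = np.array([0, 10, 20, 30, 40, 50, 60, 70, 80, 100], dtype=float)
Y = np.array([70, 75, 82, 90, 99, 110, 118, 130, 142, 160], dtype=float)

# Small h: only the nearest doses matter (low bias, high variance);
# large h: the estimate shrinks toward the overall mean (high bias, low variance)
for h in (5.0, 20.0, 100.0):
    print(f"h = {h:5.1f}: E[Y|A=90] estimated by {kernel_smooth(90.0, A, Y, h):.1f}")
```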
Every statistical estimator involves a trade-off between bias and variance.
Bias: Systematic error, the difference between the expected value of the estimator and the true parameter.
\[\text{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta\]
Variance: Random error, the variability of the estimator across repeated samples.
\[\text{Var}(\hat{\theta}) = E[(\hat{\theta} - E[\hat{\theta}])^2]\]
Mean squared error (MSE): Combines both sources of error.
\[\text{MSE}(\hat{\theta}) = \text{Bias}^2(\hat{\theta}) + \text{Var}(\hat{\theta})\]
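The decomposition follows by adding and subtracting \(E[\hat{\theta}]\) inside the square; the cross term vanishes because \(E[\hat{\theta} - E[\hat{\theta}]] = 0\):

\[
\begin{aligned}
\text{MSE}(\hat{\theta}) &= E[(\hat{\theta} - \theta)^2]
= E\big[\big((\hat{\theta} - E[\hat{\theta}]) + (E[\hat{\theta}] - \theta)\big)^2\big] \\
&= E[(\hat{\theta} - E[\hat{\theta}])^2] + (E[\hat{\theta}] - \theta)^2
= \text{Var}(\hat{\theta}) + \text{Bias}^2(\hat{\theta}).
\end{aligned}
\]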
Nonparametric estimators (e.g., sample means): unbiased, but high variance when few observations fall in each treatment category.
Parametric estimators (e.g., linear regression): low variance, since every observation informs each estimate, but biased if the assumed functional form is misspecified.
Simulation studies can illustrate the bias-variance trade-off across different sample sizes.
Small sample size (\(n = 16\)): the variance term dominates the MSE, so a (possibly misspecified) parametric model can outperform the nonparametric estimator.
Large sample size (\(n = 1000\)): variance shrinks with \(n\), so the bias of a misspecified parametric model dominates and the nonparametric estimator can achieve lower MSE.
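A sketch of such a simulation, with an assumed mildly quadratic true dose-response curve, normal noise, and a deliberately misspecified linear model; all numbers are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def true_mean(a):
    # Assumed true dose-response curve (mildly nonlinear)
    return 70 + 0.8 * a + 0.003 * a ** 2

def simulate_mse(n, reps=2000, target=50.0):
    """MSE at dose `target` for a nonparametric estimator (average of
    observations within 5 mg of the target) vs a misspecified linear fit."""
    truth = true_mean(target)
    np_err, lm_err = [], []
    for _ in range(reps):
        A = rng.integers(0, 101, size=n).astype(float)  # doses 0..100 mg
        Y = true_mean(A) + rng.normal(0.0, 15.0, size=n)
        near = np.abs(A - target) <= 5
        if near.any():                        # nonparametric: local average
            np_err.append(Y[near].mean() - truth)
        b1, b0 = np.polyfit(A, Y, 1)          # parametric: linear model
        lm_err.append(b0 + b1 * target - truth)
    return np.mean(np.square(np_err)), np.mean(np.square(lm_err))

for n in (16, 1000):
    mse_np, mse_lm = simulate_mse(n)
    print(f"n = {n}: nonparametric MSE = {mse_np:.1f}, linear-model MSE = {mse_lm:.1f}")
```

Under these assumptions the biased linear model typically wins at \(n = 16\) (variance dominates), while the nonparametric local average wins at \(n = 1000\) (bias dominates).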
This chapter motivated the need for statistical models in data analysis.
Key concepts:
- Nonparametric estimators (sample means) make no assumptions about functional form, but they fail when data are sparse or treatment is continuous.
- Parametric models impose a functional form, allowing estimation at unobserved treatment values at the risk of misspecification bias.
- Smoothing methods (kernels, local regression, splines) occupy the middle ground between the two.
- The bias-variance trade-off, summarized by \(\text{MSE} = \text{Bias}^2 + \text{Var}\), guides the choice among estimators.