Fits an inverse-probability-weighted pooled logistic regression to estimate the odds ratio (as a hazard ratio approximation) for the STOPBASE arm relative to the CONTINUE arm, with time modeled via restricted cubic splines.
A data frame in long format (one row per participant-arm-month), as produced by expand_to_long().
covariate_cols
Character vector of column names to include as additional baseline adjustment terms. Set to NULL for no adjustment. Default: NULL.
weight_col
Name of the column containing IPW weights. Set to NULL for unweighted estimation. Default: “wp99”.
outcome_col
Name of the binary outcome column. Default: “dead_t1”.
arm_col
Name of the trial arm column. Default: “arm”.
month_col
Name of the 0-indexed month-from-entry column. Default: “month2”.
id_col
Name of the participant identifier column. Default: “id”.
cluster_id_col
Name of the column to use for clustering standard errors. When non-NULL and the sandwich package is available, cluster-robust confidence intervals are returned. Default: id_col.
max_month
Maximum month included in the model. Rows beyond this month are excluded. Default: 95L.
rcs_knots
Numeric vector of at least 3 knots for the restricted cubic spline on time. The first element is the left boundary knot, the last element is the right boundary knot, and the elements in between are interior knots, so at least one interior knot is always required. Default: c(6, 48, 72) (one interior knot at month 48).
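As a reading aid, the pooled logistic model these arguments describe can be sketched as below. This is a reconstruction under assumptions, not the package's stated formula: the symbols Y, A, X and the coefficient names are hypothetical, and the arm-by-time interaction is assumed to act on every spline column.

```latex
\operatorname{logit}\,\Pr(Y_{it} = 1) =
  \beta_0
  + \beta_1 A_i
  + \sum_{k=1}^{K} \gamma_k \,\mathrm{ns}_k(\mathrm{month}_{it})
  + \sum_{k=1}^{K} \delta_k \, A_i \,\mathrm{ns}_k(\mathrm{month}_{it})
  + \eta^{\top} X_i
```

Here Y is the outcome in outcome_col, A is 1 for the STOPBASE arm and 0 for CONTINUE, and X collects the optional baseline terms from covariate_cols. Under this reading, the reported odds ratio is exp(beta_1).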
In the model formula, month is the 0-indexed follow-up month (the column named by month_col), and ns1, …, nsK are the K = length(rcs_knots) - 1 columns of the natural spline basis for time (from splines::ns()).
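The mapping from rcs_knots to the spline basis can be illustrated with a minimal sketch. It assumes the boundary and interior knots are passed to splines::ns() as shown; the variable names (interior, boundary, basis) are illustrative and not taken from the package source.

```r
library(splines)

rcs_knots <- c(6, 48, 72)   # default from the argument description
month <- 0:95               # 0-indexed follow-up months up to max_month

# First and last elements are boundary knots; anything in between is interior.
n_knots  <- length(rcs_knots)
boundary <- rcs_knots[c(1, n_knots)]
interior <- rcs_knots[-c(1, n_knots)]

# Natural spline basis without an intercept column.
basis <- ns(month, knots = interior, Boundary.knots = boundary)
colnames(basis) <- paste0("ns", seq_len(ncol(basis)))

# K = length(rcs_knots) - 1 basis columns.
ncol(basis) == length(rcs_knots) - 1
```

With the default knots this yields two columns, ns1 and ns2. Months outside the boundary knots are handled by ns() through linear extrapolation.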
Participant-level IPW weights from weight_col are passed to stats::glm(). The returned odds ratio is the exponentiated STOPBASE main-effect coefficient. Because the formula includes arm-by-time interaction terms, this coefficient represents the instantaneous log-odds ratio at baseline (month = 0), not an unconditional overall ratio; it is retained as a hazard ratio approximation consistent with the SAS cann20 macro output.
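A minimal sketch of a weighted pooled logistic fit on simulated data follows, to show where the reported odds ratio comes from. The data, the column names (w, dead), the knot values and the quasibinomial family are assumptions made for illustration; the source does not state which family the function uses, and quasibinomial is chosen here only to avoid warnings about non-integer weights.

```r
library(splines)

set.seed(1)
n_id <- 200

# Toy long-format data: one row per participant-month (hypothetical columns).
long <- expand.grid(id = seq_len(n_id), month = 0:23)
long$arm <- factor(ifelse(long$id %% 2 == 0, "STOPBASE", "CONTINUE"),
                   levels = c("CONTINUE", "STOPBASE"))
long$w <- runif(n_id, 0.5, 2)[long$id]      # participant-level weights
long$dead <- rbinom(nrow(long), 1, 0.02)    # binary outcome per month

# Arm main effect, spline on time, and arm-by-time interaction.
fit <- glm(
  dead ~ arm * ns(month, knots = 12, Boundary.knots = c(3, 20)),
  family = quasibinomial(),
  data = long,
  weights = w
)

# Exponentiated STOPBASE main-effect coefficient.
or <- exp(coef(fit)[["armSTOPBASE"]])
or
```

Because CONTINUE is the reference level of the factor, the coefficient named armSTOPBASE is the STOPBASE main effect described above.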
Because the dataset contains repeated person-month observations per participant, the default stats::glm() confidence intervals understate uncertainty. When cluster_id_col is provided and the sandwich package is installed, cluster-robust (HC) variance estimates are used to form the confidence interval, matching the variance estimation in the SAS PROC SURVEYLOGISTIC implementation. Otherwise the model-based variance is used. Both branches return a Wald confidence interval, so the output has the same type either way.
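The two variance branches can be sketched as follows. This is an illustration on simulated data, assuming sandwich::vcovCL() is the cluster-robust estimator; the source names only the sandwich package, so the exact call used by the function may differ. The model is deliberately simplified to a linear time term.

```r
set.seed(2)

# Toy long-format data with repeated rows per participant (hypothetical).
long <- expand.grid(id = 1:200, month = 0:23)
long$arm <- factor(ifelse(long$id %% 2 == 0, "STOPBASE", "CONTINUE"),
                   levels = c("CONTINUE", "STOPBASE"))
long$dead <- rbinom(nrow(long), 1, 0.02)

fit <- glm(dead ~ arm * month, family = binomial(), data = long)

# Cluster-robust covariance if sandwich is installed, model-based otherwise.
vc <- if (requireNamespace("sandwich", quietly = TRUE)) {
  sandwich::vcovCL(fit, cluster = ~ id)
} else {
  vcov(fit)
}

# Wald 95% interval on the log-odds scale, then exponentiated.
term <- "armSTOPBASE"
est <- coef(fit)[[term]]
se <- sqrt(vc[term, term])
or_ci <- exp(est + c(lower = -1, upper = 1) * qnorm(0.975) * se)
or_ci
```

Either branch produces a named numeric vector of length 2, which is why the output type does not depend on whether sandwich is available.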
Value
A named list with three elements:
model: The fitted stats::glm object.
or: Odds ratio for the STOPBASE arm (exp of the STOPBASE main-effect coefficient at baseline month = 0).
or_ci: Named numeric vector of length 2 giving the Wald 95% confidence interval for the odds ratio (cluster-robust if sandwich is available and cluster_id_col is set, otherwise standard).
References
García-Albéniz X, Uno H, Bhatt DL, McArdle PH, Joffe MM, Hernán MA. Continuation of Annual Screening Mammography and Breast Cancer Mortality in Women Older Than 70 Years: A Prospective Observational Study. Ann Intern Med. 2020;172(6):381–389. doi:10.7326/M18-1199