--- title: "Calibration-first pipeline --- toy-cohort illustration" author: "PhD student (lead), Max Moldovan, Usman Iqbal" date: "2026-05-17" output: rmarkdown::html_vignette: toc: true toc_depth: 2 vignette: > %\VignetteIndexEntry{Calibration-first pipeline --- toy-cohort illustration} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 6, fig.height = 4, fig.align = "center" ) set.seed(20260517L) # Stan toolchain detection. brms (used for the Bayesian # regularized-horseshoe fit) requires rstan + BH (Boost C++ headers). # On CI runners and minimal environments these may not be installed; # we detect their availability once and gate the Bayesian chunks # accordingly, so the rest of the vignette still renders. brms_ok <- requireNamespace("brms", quietly = TRUE) && requireNamespace("rstan", quietly = TRUE) && requireNamespace("BH", quietly = TRUE) run_bayes <- brms_ok && identical(Sys.getenv("PICMORT_RUN_VIGNETTE_BAYES"), "true") fit_bh <- NULL ``` ```{r brms-unavailable-notice, eval=!run_bayes, echo = FALSE, results = "asis"} cat( "> **Note.** The Bayesian regularized-horseshoe fit is skipped in ", "the default vignette render to keep `R CMD check` fast and portable. ", "Set `PICMORT_RUN_VIGNETTE_BAYES=true` on a host with `brms`, `rstan`, ", "and `BH` installed to include the Bayesian model.\n", sep = "" ) ``` > **Scope.** This vignette mirrors the calibration-first pediatric ICU > mortality pipeline end-to-end on a small synthetic cohort > (`inst/extdata/toy_cohort.rds`, n = 80, mortality = 10 %). Numbers > shown here are illustrative and do not reproduce the manuscript > results. The full pipeline on PIC v1.1.0 (n = 8,736) is shipped as > Supplementary File S1 (`pipeline/full_pipeline.R`) alongside the > manuscript bundle. # What this vignette demonstrates The calibration-first contract for `picMort` runs through four stages: 1. **Cohort.** First ICU stay per patient, age 0--18 y, in-hospital mortality outcome --- the frozen contract from `vignette("cohort_spec", package = "picMort")`. The toy cohort obeys the same schema and the same invariants, downsized to keep the vignette fast. 2. **Features.** T₀ + 24 h prediction window, demographics plus vitals and labs aggregated by `min` / `max` / `mean` / `last`, no length-of-stay, no discharge-time, no post-window leakage. On PIC v1.1.0 this is `build_features(cohort, pic_paths())`; here, because the toy cohort ships without raw `CHARTEVENTS` / `LABEVENTS`, the vignette synthesizes a small, deterministic feature matrix with the same shape and column conventions so the downstream pipeline is exercised faithfully. The synthesis routine is local to this vignette --- production runs go through `build_features()`. 3. **Models.** Penalized logistic regression (elastic net), XGBoost, and Bayesian logistic regression with a regularized-horseshoe prior. The Bayesian model is the package's headline functional output: every patient carries a posterior 95 % credible interval on predicted mortality. 4. **Evaluation.** Calibration suite (slope, intercept, ICI, calibration-in-the-large) and decision-curve analysis at 5 %, 10 %, 20 % thresholds are the manuscript headline. Discrimination metrics (AUROC, AUPRC, Brier, Brier skill score) are supporting. ```{r libs, message = FALSE, warning = FALSE} library(picMort) library(data.table) ``` # Stage 1 --- load the toy cohort ```{r cohort} toy_cohort_path <- system.file("extdata", "toy_cohort.rds", package = "picMort") cohort <- readRDS(toy_cohort_path) dim(cohort) table(mortality = cohort$hospital_expire_flag) ``` The toy cohort obeys the same invariants as the production cohort, scaled to a smaller envelope (50--150 rows, mortality 5--20 %). ```{r cohort-invariants} toy_expectations <- list( n_min = 50L, n_max = 150L, mortality_rate = c(0.05, 0.20), age_range_years = c(0, 18), sex_levels = c("F", "M"), distinct_subject = TRUE, no_overlap_stays = TRUE ) assert_cohort_invariants(cohort, expected = toy_expectations) ``` # Stage 2 --- build a small illustrative feature matrix On PIC v1.1.0 the call is `features <- build_features(cohort, pic_paths(), window_hours = 24L)`. The toy cohort ships without raw event tables, so this vignette assembles a matrix of the same shape from the cohort columns plus deterministic synthetic vitals and labs. The hard rule is identical to production: **no LOS, no discharge-time, no post-window timestamp ever enters the feature matrix.** ```{r synth-features} make_toy_features <- function(cohort, seed = 20260517L) { set.seed(seed) n <- nrow(cohort) baseline_hr <- 130 - 4 * pmin(cohort$age_years, 15) + stats::rnorm(n, 0, 8) baseline_rr <- 40 - 1.5 * pmin(cohort$age_years, 15) + stats::rnorm(n, 0, 4) baseline_sbp <- 80 + 2.5 * pmin(cohort$age_years, 15) + stats::rnorm(n, 0, 10) baseline_spo2 <- pmin(100, 97 + stats::rnorm(n, 0, 1.5)) baseline_lactate <- pmax(0.5, 1.4 + 0.6 * cohort$hospital_expire_flag + stats::rnorm(n, 0, 0.4)) x <- data.table::data.table( icustay_id = cohort$icustay_id, age_months = cohort$age_months, age_years = cohort$age_years, sex_male = as.integer(cohort$sex == "M"), is_surgical = as.integer(cohort$is_surgical), primary_icd_chapter = cohort$primary_icd_chapter, hr_min = baseline_hr - abs(stats::rnorm(n, 5, 2)), hr_max = baseline_hr + abs(stats::rnorm(n, 8, 3)), hr_mean = baseline_hr, hr_last = baseline_hr + stats::rnorm(n, 0, 4), rr_min = baseline_rr - abs(stats::rnorm(n, 2, 1)), rr_max = baseline_rr + abs(stats::rnorm(n, 3, 1)), rr_mean = baseline_rr, rr_last = baseline_rr + stats::rnorm(n, 0, 2), sbp_min = baseline_sbp - abs(stats::rnorm(n, 6, 2)), sbp_max = baseline_sbp + abs(stats::rnorm(n, 8, 2)), sbp_mean = baseline_sbp, sbp_last = baseline_sbp + stats::rnorm(n, 0, 5), spo2_min = pmin(100, baseline_spo2 - abs(stats::rnorm(n, 2, 1))), spo2_max = pmin(100, baseline_spo2 + abs(stats::rnorm(n, 1, 0.5))), spo2_mean = baseline_spo2, spo2_last = pmin(100, baseline_spo2 + stats::rnorm(n, 0, 1)), lactate_min = baseline_lactate - abs(stats::rnorm(n, 0.2, 0.1)), lactate_max = baseline_lactate + abs(stats::rnorm(n, 0.4, 0.2)), lactate_last = baseline_lactate + stats::rnorm(n, 0, 0.2) ) list( x = x, y = as.integer(cohort$hospital_expire_flag), window_hours = 24L, feature_set = "toy" ) } features <- make_toy_features(cohort) dim(features$x) ``` `features$x` carries the `icustay_id` join key plus demographics, vitals and a single representative lab. Production runs ship dozens more lab panels; the shape is otherwise identical to `build_features()` output. # Stage 3 --- stratified train / test split ```{r split} split <- make_train_test_split(features, prop = 0.7, seed = 20260517L) length(split$train_idx) length(split$test_idx) ``` # Stage 4 --- fit the three models Run times below are intentionally aggressive to keep the vignette under two minutes. Production runs use wider hyperparameter grids, 1,000 bootstrap replicates for CIs, and four MCMC chains with 4,000 iterations for the Bayesian fit. ```{r fit-glmnet} fit_en <- fit_elastic_net(features, split$train_idx, seed = 20260517L) c(alpha = fit_en$best_alpha, lambda = fit_en$best_lambda) ``` ```{r fit-xgb} fit_xgb <- fit_xgboost(features, split$train_idx, seed = 20260517L) fit_xgb$best_nrounds ``` ```{r fit-bayes, message = FALSE, warning = FALSE, results = "hide", eval=run_bayes} fit_bh <- fit_bayes_horseshoe( features, split$train_idx, chains = 2L, iter = 500L, par_ratio = 0.10, adapt_delta = 0.95, seed = 20260517L ) ``` ```{r fit-bayes-summary, eval=run_bayes} fit_bh$chains fit_bh$iter ``` # Stage 5 --- predict on the held-out fold The test fold is touched once. Predictions are aligned by `icustay_id`; the Bayesian model additionally returns a 95 % credible interval on the predicted probability for every test-fold patient. ```{r predict} x_test <- features$x[split$test_idx, ] y_test <- features$y[split$test_idx] pred_en <- predict_mortality(fit_en, x_test) pred_xgb <- predict_mortality(fit_xgb, x_test) if (!is.null(fit_bh)) { pred_bh <- predict_mortality(fit_bh, x_test) head(pred_bh, 3) } else { pred_bh <- NULL } ``` # Stage 6 --- evaluation (the manuscript headline) ```{r eval, warning = FALSE} probs <- list( elastic_net = pred_en$prob_raw, xgboost = pred_xgb$prob_raw ) if (!is.null(pred_bh)) { probs$bayes_horseshoe <- pred_bh$prob_raw } calib <- lapply(probs, calibration_suite, y = y_test, n_boot = 200L, seed = 20260517L) dca <- decision_curve(probs, y_test, thresholds = c(0.05, 0.10, 0.20), plot_grid = FALSE) disc <- discrimination_metrics(probs, y_test, reference = "elastic_net", n_boot = 200L, seed = 20260517L) ``` ## Calibration headline ```{r calib-table} calib_tbl <- data.table::rbindlist(lapply(names(calib), function(nm) { data.table::data.table( model = nm, slope = calib[[nm]]$slope[["estimate"]], intercept = calib[[nm]]$intercept[["estimate"]], ici = calib[[nm]]$ici[["estimate"]], cit_large = calib[[nm]]$cit_large[["estimate"]] ) })) knitr::kable(calib_tbl, digits = 3, caption = "Calibration on the toy-cohort held-out fold.") ``` ## Decision-curve net benefit at clinically relevant thresholds ```{r dca-table} dca_headline <- dca[threshold %in% c(0.05, 0.10, 0.20) & type == "model"] knitr::kable(dca_headline, digits = 4, caption = "Net benefit at 5%, 10%, 20% thresholds.") ``` ## Discrimination (supporting) ```{r disc-table} knitr::kable(disc, digits = 3, caption = "Discrimination metrics with bootstrap 95% CIs.") ``` # How this maps to the manuscript This vignette walks the same four-stage pipeline that drives the calibration-first *Pediatric Critical Care Medicine* manuscript bundle (Supplementary File S1, `pipeline/full_pipeline.R`). The production bundle: - runs on PIC v1.1.0 (n = 8,736, mortality 0.0844) rather than the 80-row toy cohort, - uses the full feature panel from `build_features()` (vitals + labs from `CHARTEVENTS` and `LABEVENTS`, 13 panels) under the same T₀ + 24 h prediction window, - adds the PIM3 baseline reconstructed via `compute_pim3()` (Gate G3), - runs the Bayesian fit with four chains and 4,000 iterations, with patient-level 95 % credible intervals as the clinical-functional headline, - reports calibration and decision-curve analysis as the primary outcome, with discrimination supporting. Numbers in the tables above are not stable across runs of the toy synthesizer and must not be quoted in any paper. The toy cohort exists to let `R CMD check` exercise the package surface end-to-end and to give new contributors a runnable scaffold. ```{r session} sessionInfo() ```