---
title: "Calibration-first pipeline --- toy-cohort illustration"
author: "PhD student (lead), Max Moldovan, Usman Iqbal"
date: "2026-05-17"
output:
rmarkdown::html_vignette:
toc: true
toc_depth: 2
vignette: >
%\VignetteIndexEntry{Calibration-first pipeline --- toy-cohort illustration}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.width = 6,
fig.height = 4,
fig.align = "center"
)
set.seed(20260517L)
# Stan toolchain detection. brms (used for the Bayesian
# regularized-horseshoe fit) requires rstan + BH (Boost C++ headers).
# On CI runners and minimal environments these may not be installed;
# we detect their availability once and gate the Bayesian chunks
# accordingly, so the rest of the vignette still renders.
brms_ok <- requireNamespace("brms", quietly = TRUE) &&
requireNamespace("rstan", quietly = TRUE) &&
requireNamespace("BH", quietly = TRUE)
run_bayes <- brms_ok &&
identical(Sys.getenv("PICMORT_RUN_VIGNETTE_BAYES"), "true")
fit_bh <- NULL
```
```{r brms-unavailable-notice, eval=!run_bayes, echo = FALSE, results = "asis"}
cat(
"> **Note.** The Bayesian regularized-horseshoe fit is skipped in ",
"the default vignette render to keep `R CMD check` fast and portable. ",
"Set `PICMORT_RUN_VIGNETTE_BAYES=true` on a host with `brms`, `rstan`, ",
"and `BH` installed to include the Bayesian model.\n",
sep = ""
)
```
> **Scope.** This vignette mirrors the calibration-first pediatric ICU
> mortality pipeline end-to-end on a small synthetic cohort
> (`inst/extdata/toy_cohort.rds`, n = 80, mortality = 10 %). Numbers
> shown here are illustrative and do not reproduce the manuscript
> results. The full pipeline on PIC v1.1.0 (n = 8,736) is shipped as
> Supplementary File S1 (`pipeline/full_pipeline.R`) alongside the
> manuscript bundle.
# What this vignette demonstrates
The calibration-first contract for `picMort` runs through four stages:
1. **Cohort.** First ICU stay per patient, age 0--18 y, in-hospital
mortality outcome --- the frozen contract from `vignette("cohort_spec",
package = "picMort")`. The toy cohort obeys the same schema and the
same invariants, downsized to keep the vignette fast.
2. **Features.** T0 + 24 h prediction window, demographics
plus vitals and labs aggregated by `min` / `max` / `mean` / `last`,
no length-of-stay, no discharge-time, no post-window leakage. On PIC
v1.1.0 this is `build_features(cohort, pic_paths())`; here, because
the toy cohort ships without raw `CHARTEVENTS` / `LABEVENTS`, the
vignette synthesizes a small, deterministic feature matrix with the
same shape and column conventions so the downstream pipeline is
exercised faithfully. The synthesis routine is local to this vignette
--- production runs go through `build_features()`.
3. **Models.** Penalized logistic regression (elastic net), XGBoost, and
Bayesian logistic regression with a regularized-horseshoe prior. The
Bayesian model is the package's headline functional output: every patient
carries a posterior 95 % credible interval on predicted mortality.
4. **Evaluation.** Calibration suite (slope, intercept, ICI,
calibration-in-the-large) and decision-curve analysis at 5 %, 10 %,
20 % thresholds are the manuscript headline. Discrimination metrics
(AUROC, AUPRC, Brier, Brier skill score) are supporting.
```{r libs, message = FALSE, warning = FALSE}
library(picMort)
library(data.table)
```
# Stage 1 --- load the toy cohort
```{r cohort}
toy_cohort_path <- system.file("extdata", "toy_cohort.rds", package = "picMort")
cohort <- readRDS(toy_cohort_path)
dim(cohort)
table(mortality = cohort$hospital_expire_flag)
```
The toy cohort obeys the same invariants as the production cohort, scaled
to a smaller envelope (50--150 rows, mortality 5--20 %).
```{r cohort-invariants}
toy_expectations <- list(
n_min = 50L,
n_max = 150L,
mortality_rate = c(0.05, 0.20),
age_range_years = c(0, 18),
sex_levels = c("F", "M"),
distinct_subject = TRUE,
no_overlap_stays = TRUE
)
assert_cohort_invariants(cohort, expected = toy_expectations)
```
# Stage 2 --- build a small illustrative feature matrix
On PIC v1.1.0 the call is
`features <- build_features(cohort, pic_paths(), window_hours = 24L)`. The
toy cohort ships without raw event tables, so this vignette assembles a
matrix of the same shape from the cohort columns plus deterministic
synthetic vitals and labs. The hard rule is identical to production:
**no LOS, no discharge-time, no post-window timestamp ever enters the
feature matrix.**
```{r synth-features}
make_toy_features <- function(cohort, seed = 20260517L) {
set.seed(seed)
n <- nrow(cohort)
baseline_hr <- 130 - 4 * pmin(cohort$age_years, 15) + stats::rnorm(n, 0, 8)
baseline_rr <- 40 - 1.5 * pmin(cohort$age_years, 15) + stats::rnorm(n, 0, 4)
baseline_sbp <- 80 + 2.5 * pmin(cohort$age_years, 15) + stats::rnorm(n, 0, 10)
baseline_spo2 <- pmin(100, 97 + stats::rnorm(n, 0, 1.5))
baseline_lactate <- pmax(0.5, 1.4 + 0.6 * cohort$hospital_expire_flag +
stats::rnorm(n, 0, 0.4))
x <- data.table::data.table(
icustay_id = cohort$icustay_id,
age_months = cohort$age_months,
age_years = cohort$age_years,
sex_male = as.integer(cohort$sex == "M"),
is_surgical = as.integer(cohort$is_surgical),
primary_icd_chapter = cohort$primary_icd_chapter,
hr_min = baseline_hr - abs(stats::rnorm(n, 5, 2)),
hr_max = baseline_hr + abs(stats::rnorm(n, 8, 3)),
hr_mean = baseline_hr,
hr_last = baseline_hr + stats::rnorm(n, 0, 4),
rr_min = baseline_rr - abs(stats::rnorm(n, 2, 1)),
rr_max = baseline_rr + abs(stats::rnorm(n, 3, 1)),
rr_mean = baseline_rr,
rr_last = baseline_rr + stats::rnorm(n, 0, 2),
sbp_min = baseline_sbp - abs(stats::rnorm(n, 6, 2)),
sbp_max = baseline_sbp + abs(stats::rnorm(n, 8, 2)),
sbp_mean = baseline_sbp,
sbp_last = baseline_sbp + stats::rnorm(n, 0, 5),
spo2_min = pmin(100, baseline_spo2 - abs(stats::rnorm(n, 2, 1))),
spo2_max = pmin(100, baseline_spo2 + abs(stats::rnorm(n, 1, 0.5))),
spo2_mean = baseline_spo2,
spo2_last = pmin(100, baseline_spo2 + stats::rnorm(n, 0, 1)),
lactate_min = baseline_lactate - abs(stats::rnorm(n, 0.2, 0.1)),
lactate_max = baseline_lactate + abs(stats::rnorm(n, 0.4, 0.2)),
lactate_last = baseline_lactate + stats::rnorm(n, 0, 0.2)
)
list(
x = x,
y = as.integer(cohort$hospital_expire_flag),
window_hours = 24L,
feature_set = "toy"
)
}
features <- make_toy_features(cohort)
dim(features$x)
```
`features$x` carries the `icustay_id` join key plus demographics, vitals
and a single representative lab. Production runs ship dozens more lab
panels; the shape is otherwise identical to `build_features()` output.
# Stage 3 --- stratified train / test split
```{r split}
split <- make_train_test_split(features, prop = 0.7, seed = 20260517L)
length(split$train_idx)
length(split$test_idx)
```
# Stage 4 --- fit the three models
Run times below are intentionally aggressive to keep the vignette under
two minutes. Production runs use wider hyperparameter grids, 1,000
bootstrap replicates for CIs, and four MCMC chains with 4,000
iterations for the Bayesian fit.
```{r fit-glmnet}
fit_en <- fit_elastic_net(features, split$train_idx, seed = 20260517L)
c(alpha = fit_en$best_alpha, lambda = fit_en$best_lambda)
```
```{r fit-xgb}
fit_xgb <- fit_xgboost(features, split$train_idx, seed = 20260517L)
fit_xgb$best_nrounds
```
```{r fit-bayes, message = FALSE, warning = FALSE, results = "hide", eval=run_bayes}
fit_bh <- fit_bayes_horseshoe(
features, split$train_idx,
chains = 2L, iter = 500L,
par_ratio = 0.10, adapt_delta = 0.95,
seed = 20260517L
)
```
```{r fit-bayes-summary, eval=run_bayes}
fit_bh$chains
fit_bh$iter
```
# Stage 5 --- predict on the held-out fold
The test fold is touched once. Predictions are aligned by
`icustay_id`; the Bayesian model additionally returns a 95 % credible
interval on the predicted probability for every test-fold patient.
```{r predict}
x_test <- features$x[split$test_idx, ]
y_test <- features$y[split$test_idx]
pred_en <- predict_mortality(fit_en, x_test)
pred_xgb <- predict_mortality(fit_xgb, x_test)
if (!is.null(fit_bh)) {
pred_bh <- predict_mortality(fit_bh, x_test)
head(pred_bh, 3)
} else {
pred_bh <- NULL
}
```
# Stage 6 --- evaluation (the manuscript headline)
```{r eval, warning = FALSE}
probs <- list(
elastic_net = pred_en$prob_raw,
xgboost = pred_xgb$prob_raw
)
if (!is.null(pred_bh)) {
probs$bayes_horseshoe <- pred_bh$prob_raw
}
calib <- lapply(probs, calibration_suite,
y = y_test, n_boot = 200L, seed = 20260517L)
dca <- decision_curve(probs, y_test,
thresholds = c(0.05, 0.10, 0.20),
plot_grid = FALSE)
disc <- discrimination_metrics(probs, y_test,
reference = "elastic_net",
n_boot = 200L, seed = 20260517L)
```
## Calibration headline
```{r calib-table}
calib_tbl <- data.table::rbindlist(lapply(names(calib), function(nm) {
data.table::data.table(
model = nm,
slope = calib[[nm]]$slope[["estimate"]],
intercept = calib[[nm]]$intercept[["estimate"]],
ici = calib[[nm]]$ici[["estimate"]],
cit_large = calib[[nm]]$cit_large[["estimate"]]
)
}))
knitr::kable(calib_tbl, digits = 3,
caption = "Calibration on the toy-cohort held-out fold.")
```
## Decision-curve net benefit at clinically relevant thresholds
```{r dca-table}
dca_headline <- dca[threshold %in% c(0.05, 0.10, 0.20) & type == "model"]
knitr::kable(dca_headline, digits = 4,
caption = "Net benefit at 5%, 10%, 20% thresholds.")
```
## Discrimination (supporting)
```{r disc-table}
knitr::kable(disc, digits = 3,
caption = "Discrimination metrics with bootstrap 95% CIs.")
```
# How this maps to the manuscript
This vignette walks the same four-stage pipeline that drives the
calibration-first *Pediatric Critical Care Medicine* manuscript bundle
(Supplementary File S1, `pipeline/full_pipeline.R`). The
production bundle:
- runs on PIC v1.1.0 (n = 8,736, mortality 0.0844) rather than the 80-row
toy cohort,
- uses the full feature panel from `build_features()` (vitals + labs from
`CHARTEVENTS` and `LABEVENTS`, 13 panels) under the same T0
+ 24 h prediction window,
- adds the PIM3 baseline reconstructed via `compute_pim3()` (Gate G3),
- runs the Bayesian fit with four chains and 4,000 iterations, with
patient-level 95 % credible intervals as the clinical-functional
headline,
- reports calibration and decision-curve analysis as the primary
outcome, with discrimination supporting.
Numbers in the tables above are not stable across runs of the toy
synthesizer and must not be quoted in any paper. The toy cohort exists
to let `R CMD check` exercise the package surface end-to-end and to give
new contributors a runnable scaffold.
```{r session}
sessionInfo()
```