Package 'picMort' reference manual

Title:	Reproducible Pediatric ICU Mortality Benchmark on PIC V1.1.0
Description:	Tools and pipelines for a calibration-first pediatric ICU mortality benchmark on the open-access PIC v1.1.0 database. Authoritative cohort specification, T+24h prediction-window feature extraction, PIM3 reconstruction, calibration and decision-curve evaluation. The package is the methodological substrate for the companion manuscript, illustrated end to end in a worked vignette.
Authors:	Max Moldovan [aut, cre] (ORCID: <https://orcid.org/0000-0001-9680-8474>), Usman Iqbal [aut, ths]
Maintainer:	Max Moldovan <[email protected]>
License:	MIT + file LICENSE
Version:	0.1.1
Built:	2026-07-03 21:00:14 UTC
Source:	https://github.com/max578/picMort

Cohort invariants (used by tests and as a sanity gate at runtime)

Description

Verifies that a cohort data.table satisfies the structural and epidemiological invariants committed in vignettes/cohort_spec.Rmd. Raises a structured error listing every violation; passes silently (invisibly returning TRUE) when all invariants hold.

Usage

assert_cohort_invariants(cohort, expected = default_cohort_expectations())
assert_cohort_invariants(cohort, expected = default_cohort_expectations())

Arguments

cohort

A cohort data.table from build_cohort().

expected

Named list of expected ranges. Defaults to the values committed in vignettes/cohort_spec.Rmd.

Value

Invisibly returns TRUE if all invariants hold; raises an error listing every violation otherwise.

Examples

toy <- readRDS(system.file("extdata", "toy_cohort.rds",
                           package = "picMort"))
toy_expectations <- list(
  n_min            = 50L,
  n_max            = 150L,
  mortality_rate   = c(0.05, 0.20),
  age_range_years  = c(0, 18),
  sex_levels       = c("F", "M"),
  distinct_subject = TRUE,
  no_overlap_stays = TRUE
)
assert_cohort_invariants(toy, expected = toy_expectations)
toy <- readRDS(system.file("extdata", "toy_cohort.rds",
                           package = "picMort"))
toy_expectations <- list(
  n_min            = 50L,
  n_max            = 150L,
  mortality_rate   = c(0.05, 0.20),
  age_range_years  = c(0, 18),
  sex_levels       = c("F", "M"),
  distinct_subject = TRUE,
  no_overlap_stays = TRUE
)
assert_cohort_invariants(toy, expected = toy_expectations)

Audit feature matrix for prediction-window leakage

Description

Checks (i) no variable name contains a forbidden temporal token (LOS, discharge, deathtime, etc.); (ii) every dictionary entry declares the same window_hours; (iii) optional provenance check on raw events confirms every observation occurred in ⁠[0, window)⁠.

Usage

audit_no_leakage(feature_dict, raw_events = NULL, window_hours = 24L)
audit_no_leakage(feature_dict, raw_events = NULL, window_hours = 24L)

Arguments

feature_dict

Feature dictionary from build_features().

raw_events

Optional list of long-format event tables. Each table must contain either numeric t_hours offsets from ICU admission, or POSIXct columns t and intime from which the offset can be derived. When present, the function verifies that every event is strictly within ⁠[0, window_hours)⁠.

window_hours

Prediction-window length used at extraction.

Value

Invisibly TRUE on pass; raises a structured error on fail.

Examples

# A minimal in-spec dictionary
ok_dict <- data.table::data.table(
  variable       = c("age_years", "hr_min", "spo2_min"),
  source         = c("cohort", "CHARTEVENTS", "CHARTEVENTS"),
  transformation = c("static", "min over [0,24)h", "min over [0,24)h"),
  clinical_group = c("demographics", "vitals", "vitals"),
  window_hours   = 24L
)
audit_no_leakage(ok_dict, window_hours = 24L)

# A leaky dictionary (LOS is forbidden post-window information)
bad_dict <- data.table::copy(ok_dict)
bad_dict <- rbind(bad_dict,
  data.table::data.table(variable = "los_hours", source = "cohort",
                         transformation = "static",
                         clinical_group = "demographics",
                         window_hours = 24L))
tryCatch(audit_no_leakage(bad_dict, window_hours = 24L),
         error = function(e) conditionMessage(e))
# A minimal in-spec dictionary
ok_dict <- data.table::data.table(
  variable       = c("age_years", "hr_min", "spo2_min"),
  source         = c("cohort", "CHARTEVENTS", "CHARTEVENTS"),
  transformation = c("static", "min over [0,24)h", "min over [0,24)h"),
  clinical_group = c("demographics", "vitals", "vitals"),
  window_hours   = 24L
)
audit_no_leakage(ok_dict, window_hours = 24L)

# A leaky dictionary (LOS is forbidden post-window information)
bad_dict <- data.table::copy(ok_dict)
bad_dict <- rbind(bad_dict,
  data.table::data.table(variable = "los_hours", source = "cohort",
                         transformation = "static",
                         clinical_group = "demographics",
                         window_hours = 24L))
tryCatch(audit_no_leakage(bad_dict, window_hours = 24L),
         error = function(e) conditionMessage(e))

Build the canonical study cohort

Description

Constructs the canonical study cohort per vignettes/cohort_spec.Rmd: first ICU stay per patient; age 0-18 y at admission; valid ICU admit/discharge timestamps; in-hospital mortality outcome attached.

Cohort assembly is the single source of truth for the calibration-first analysis. Deviations are forbidden; sensitivity analyses operate on the same base cohort with documented filters.

Usage

build_cohort(
  paths,
  min_los_hours = 24L,
  prediction_window_hours = min_los_hours,
  verbose = TRUE
)
build_cohort(
  paths,
  min_los_hours = 24L,
  prediction_window_hours = min_los_hours,
  verbose = TRUE
)

Arguments

paths

Output of pic_paths().

min_los_hours

Minimum ICU length-of-stay in hours required for the prediction-window framing. Default 24 (T+24h lock). Set to 12 for the T+12h sensitivity arm.

prediction_window_hours

Prediction window in hours. Used by is_surgical to filter SURGERY_VITAL_SIGNS evidence to rows that occur strictly before T0 + prediction_window_hours. Without this filter, surgery evidence appearing only after the prediction-window close leaks future information. Default = min_los_hours (T+24h main analysis); pass 12L for the T+12h sensitivity arm.

verbose

Logical. Emit cohort-attrition log lines.

Value

A data.table keyed by subject_id, hadm_id, icustay_id. The cohort table carries an "attrition" attribute documenting the exclusion cascade, accessed via cohort_attrition().

Examples

# `build_cohort()` reads the registered PIC v1.1.0 CSVs. A pre-built
# synthetic 80-row toy cohort with the same schema ships in
# `inst/extdata/toy_cohort.rds` for examples and tests.
toy <- readRDS(system.file("extdata", "toy_cohort.rds",
                           package = "picMort"))
nrow(toy)
names(toy)

## Not run: 
paths  <- pic_paths()
cohort <- build_cohort(paths, min_los_hours = 24L, verbose = FALSE)
assert_cohort_invariants(cohort)

## End(Not run)
# `build_cohort()` reads the registered PIC v1.1.0 CSVs. A pre-built
# synthetic 80-row toy cohort with the same schema ships in
# `inst/extdata/toy_cohort.rds` for examples and tests.
toy <- readRDS(system.file("extdata", "toy_cohort.rds",
                           package = "picMort"))
nrow(toy)
names(toy)

## Not run: 
paths  <- pic_paths()
cohort <- build_cohort(paths, min_los_hours = 24L, verbose = FALSE)
assert_cohort_invariants(cohort)

## End(Not run)

Build T+window prediction-window feature matrix

Description

Extracts demographics (from cohort), vitals (CHARTEVENTS), and labs (LABEVENTS) within window_hours of ICU admission. No feature uses any timestamp at or after intime + window_hours. audit_no_leakage() runs as a runtime invariant.

Usage

build_features(
  cohort,
  paths,
  window_hours = 24L,
  feature_set = c("simple", "rich")
)
build_features(
  cohort,
  paths,
  window_hours = 24L,
  feature_set = c("simple", "rich")
)

Arguments

cohort

A cohort data.table from build_cohort().

paths

Output of pic_paths().

window_hours

Prediction-window length in hours. Default 24.

feature_set

One of "simple" (default; min/max/mean/last + missingness indicators) or "rich" (adds slopes / count rates; reserved for sensitivity analysis).

Value

A list with elements:

x - feature data.table keyed on icustay_id
y - outcome integer vector aligned to x$icustay_id
dict - feature dictionary (data.table)
audit - result of audit_no_leakage() (TRUE on pass)
window_hours, feature_set

Examples

## Not run: 
paths    <- pic_paths()
cohort   <- build_cohort(paths, min_los_hours = 24L, verbose = FALSE)
features <- build_features(cohort, paths, window_hours = 24L)
dim(features$x)
table(features$y)

## End(Not run)
## Not run: 
paths    <- pic_paths()
cohort   <- build_cohort(paths, min_los_hours = 24L, verbose = FALSE)
features <- build_features(cohort, paths, window_hours = 24L)
dim(features$x)
table(features$y)

## End(Not run)

Calibration suite (the manuscript headline)

Description

Computes calibration slope, intercept, integrated calibration index (ICI), calibration-in-the-large, and a smoothed calibration curve (loess). Bootstrap percentile CIs (1,000 reps default) on every point estimate.

Definitions:

Calibration slope: logistic regression of y ~ logit(prob), slope coefficient. Ideal = 1; <1 indicates over-fitting.
Calibration intercept: logistic regression of y ~ offset(logit(prob)), intercept. Ideal = 0; >0 indicates systematic under-prediction.
Calibration-in-the-large: log(prev_obs / prev_pred). Ideal = 0.
ICI: mean absolute difference between smoothed observed and predicted probabilities.

Usage

calibration_suite(prob, y, n_boot = 1000L, seed = 20260508L)
calibration_suite(prob, y, n_boot = 1000L, seed = 20260508L)

Arguments

prob

Numeric vector of predicted probabilities.

y

Outcome (0/1 integer).

n_boot

Bootstrap replicates; default 1000.

seed

Integer seed; default 20260508.

Value

A list with elements slope, intercept, ici, cit_large (each named c(estimate, lower, upper)), curve (a data.frame for plotting), and n, n_events.

Examples

set.seed(20260517L)
n <- 200L
prob <- stats::plogis(stats::rnorm(n, mean = -2, sd = 1))
y    <- stats::rbinom(n, 1L, prob)
cal  <- calibration_suite(prob, y, n_boot = 50L)
cal$slope
cal$ici
head(cal$curve)
set.seed(20260517L)
n <- 200L
prob <- stats::plogis(stats::rnorm(n, mean = -2, sd = 1))
y    <- stats::rbinom(n, 1L, prob)
cal  <- calibration_suite(prob, y, n_boot = 50L)
cal$slope
cal$ici
head(cal$curve)

Cohort attrition table (for the manuscript Methods figure)

Description

Returns the exclusion cascade documenting how many ICU stays were dropped at each cohort-filter step. Surfaces the same data carried as the "attrition" attribute on build_cohort()'s output.

Usage

cohort_attrition(paths, min_los_hours = 24L)
cohort_attrition(paths, min_los_hours = 24L)

Arguments

paths

Output of pic_paths().

min_los_hours

See build_cohort().

Value

A data.table of (step, n, excluded, reason).

Examples

## Not run: 
paths <- pic_paths()
attrition <- cohort_attrition(paths, min_los_hours = 24L)
print(attrition)

## End(Not run)
## Not run: 
paths <- pic_paths()
attrition <- cohort_attrition(paths, min_los_hours = 24L)
print(attrition)

## End(Not run)

Reconstruct PIM3 from PIC fields

Description

Computes the Pediatric Index of Mortality 3 (Straney et al. 2013) from PIC CHARTEVENTS and the cohort table. Components that cannot be recovered from PIC default to 0 (Straney convention) and are listed in proxy_flags.

Demonstrating the linear-predictor calculation without PIC source files (this is what compute_pim3() produces internally per patient). This is the ANZICS PIM2 & PIM3 booklet's worked example (Jan 2019, p. 11): a 6 y-old girl, leukaemia post-1st-induction, intubated, SBP 70 mmHg, PaO2 65 mmHg, FiO2 0.7, base excess −12 mmol/L, reactive pupils, non-elective. The booklet gives PIM3val = −0.11114 and risk of death = 47.22%.

beta <- picMort:::pim3_coefficients()
logit <-
  beta$intercept +                       # -1.7928
  beta$pupils_fixed          * 0 +        # reactive pupils
  beta$elective              * 0 +        # non-elective
  beta$mech_vent             * 1 +        # intubated
  beta$base_excess_abs       * 12 +       # |-12|
  beta$sbp_linear            * 70 +       # SBP 70 mmHg
  beta$sbp_squared_over_1000 * (70 * 70 / 1000) +
  beta$fio2_pao2             * (100 * 0.7 / 65) +  # 100*FiO2/PaO2
  beta$recov_card_byp        * 0 +
  beta$recov_card_nonbyp     * 0 +
  beta$recov_noncard         * 0 +
  beta$very_high_risk_dx     * 1          # leukaemia after 1st induction
stats::plogis(logit)  # ≈ 0.4722

Usage

compute_pim3(cohort, paths, window_hours = 1L)
compute_pim3(cohort, paths, window_hours = 1L)

Arguments

cohort

A cohort data.table from build_cohort().

paths

Output of pic_paths().

window_hours

Time window in hours for first-hour PIM3 components. Straney uses 1 hour; we follow.

Value

A data.table keyed on icustay_id with columns: pim3_logit (numeric), pim3 (probability), risk_group (factor: low / default / high / very_high), sbp_used (numeric), proxy_flags (list-column; vector of components that defaulted).

References

Straney L et al. (2013). PIM3: an updated Pediatric Index of Mortality. Pediatr Crit Care Med 14(7):673-681.

Examples

## Not run: 
paths    <- pic_paths()
cohort   <- build_cohort(paths, min_los_hours = 24L, verbose = FALSE)
pim3_tbl <- compute_pim3(cohort, paths)
summary(pim3_tbl$pim3)
table(pim3_tbl$risk_group)

## End(Not run)
## Not run: 
paths    <- pic_paths()
cohort   <- build_cohort(paths, min_los_hours = 24L, verbose = FALSE)
pim3_tbl <- compute_pim3(cohort, paths)
summary(pim3_tbl$pim3)
table(pim3_tbl$risk_group)

## End(Not run)

Decision-curve analysis at clinically meaningful thresholds

Description

Computes net benefit per threshold per model alongside the "treat-all" / "treat-none" references. Net benefit is computed analytically (no external package dependency); equivalent to dcurves::dca().

Usage

decision_curve(
  probs,
  y,
  thresholds = c(0.05, 0.1, 0.2),
  plot_grid = TRUE,
  n_boot = 0L,
  seed = 20260508L
)
decision_curve(
  probs,
  y,
  thresholds = c(0.05, 0.1, 0.2),
  plot_grid = TRUE,
  n_boot = 0L,
  seed = 20260508L
)

Arguments

probs

Named list of predicted-probability vectors.

y

Outcome (0/1 integer).

thresholds

Numeric vector of threshold probabilities; default c(0.05, 0.10, 0.20) plus a fine grid for plotting.

plot_grid

If TRUE (default), additionally returns a fine grid of thresholds (every 1 %) for plot rendering.

n_boot

Bootstrap replicates for paired CIs on net benefit at the prespecified thresholds and on the Brier skill score. Default 0 (no CIs); typical production value 1000.

seed

Integer seed for the bootstrap (used only when n_boot > 0). Default 20260508.

Value

A data.table of (model, threshold, net_benefit, type) when n_boot = 0, where type is "model", "all", or "none". When n_boot > 0, additional rows with ⁠(model, metric, estimate, lower, upper)⁠ columns carry paired bootstrap 95 % CIs on net benefit at each prespecified threshold and on the Brier skill score.

Examples

set.seed(20260517L)
n <- 200L
p_a <- stats::plogis(stats::rnorm(n, -2, 1))
p_b <- stats::plogis(stats::rnorm(n, -2, 1.4))
y   <- stats::rbinom(n, 1L, (p_a + p_b) / 2)
dca <- decision_curve(list(model_a = p_a, model_b = p_b), y,
                      thresholds = c(0.05, 0.10, 0.20),
                      plot_grid = FALSE)
dca
set.seed(20260517L)
n <- 200L
p_a <- stats::plogis(stats::rnorm(n, -2, 1))
p_b <- stats::plogis(stats::rnorm(n, -2, 1.4))
y   <- stats::rbinom(n, 1L, (p_a + p_b) / 2)
dca <- decision_curve(list(model_a = p_a, model_b = p_b), y,
                      thresholds = c(0.05, 0.10, 0.20),
                      plot_grid = FALSE)
dca

Recipe for preprocessing

Description

Returns a recipes::recipe encoding canonical preprocessing: median imputation for continuous predictors, mode imputation for categorical predictors, near-zero-variance drop, dummy-encoding, centring + scaling. Fit on training fold; applied to test fold via prep() / bake().

Usage

default_recipe(x, y)
default_recipe(x, y)

Arguments

x

Feature data.table from build_features().

y

Outcome integer vector aligned to x$icustay_id.

Value

A recipes::recipe object.

Examples

if (requireNamespace("recipes", quietly = TRUE)) {
  set.seed(20260517L)
  x <- data.table::data.table(
    icustay_id = 1:20,
    age_years  = runif(20, 0, 18),
    hr_mean    = rnorm(20, 100, 15),
    sex_male   = rbinom(20, 1L, 0.5)
  )
  y <- rbinom(20, 1L, 0.1)
  rec <- default_recipe(x, y)
  class(rec)
}
if (requireNamespace("recipes", quietly = TRUE)) {
  set.seed(20260517L)
  x <- data.table::data.table(
    icustay_id = 1:20,
    age_years  = runif(20, 0, 18),
    hr_mean    = rnorm(20, 100, 15),
    sex_male   = rbinom(20, 1L, 0.5)
  )
  y <- rbinom(20, 1L, 0.1)
  rec <- default_recipe(x, y)
  class(rec)
}

Discrimination metrics with bootstrap CIs (supporting role)

Description

AUROC, AUPRC, Brier score, and Brier skill score relative to a reference model.

Usage

discrimination_metrics(
  probs,
  y,
  reference = NULL,
  n_boot = 1000L,
  seed = 20260508L
)
discrimination_metrics(
  probs,
  y,
  reference = NULL,
  n_boot = 1000L,
  seed = 20260508L
)

Arguments

probs

Named list of predicted-probability vectors aligned to y.

y

Outcome (0/1 integer).

reference

Name in names(probs) to use for the Brier-skill- score reference. Default "pim3" if present, else first model.

n_boot

Bootstrap replicates; default 1000.

seed

Integer seed; default 20260508.

Value

A data.table of (model, metric, estimate, lower, upper).

Examples

set.seed(20260517L)
n <- 200L
p_a <- stats::plogis(stats::rnorm(n, -2, 1))
p_b <- stats::plogis(stats::rnorm(n, -2, 1.4))
y   <- stats::rbinom(n, 1L, (p_a + p_b) / 2)
discrimination_metrics(list(model_a = p_a, model_b = p_b), y,
                       reference = "model_a", n_boot = 50L)
set.seed(20260517L)
n <- 200L
p_a <- stats::plogis(stats::rnorm(n, -2, 1))
p_b <- stats::plogis(stats::rnorm(n, -2, 1.4))
y   <- stats::rbinom(n, 1L, (p_a + p_b) / 2)
discrimination_metrics(list(model_a = p_a, model_b = p_b), y,
                       reference = "model_a", n_boot = 50L)

Fit Bayesian logistic regression with regularized horseshoe prior

Description

Uses brms::brm() with a Bernoulli likelihood and a regularized horseshoe prior on the coefficients (Carvalho 2010 / Piironen & Vehtari 2017). The prior shrinks irrelevant coefficients aggressively while leaving informative ones effectively unregularized; the posterior gives every patient a credible interval on predicted mortality probability – the package's headline functional output.

Usage

fit_bayes_horseshoe(
  features,
  train_idx,
  chains = 2L,
  iter = 1000L,
  par_ratio = 0.1,
  adapt_delta = 0.95,
  cores = chains,
  seed = 20260508L
)
fit_bayes_horseshoe(
  features,
  train_idx,
  chains = 2L,
  iter = 1000L,
  par_ratio = 0.1,
  adapt_delta = 0.95,
  cores = chains,
  seed = 20260508L
)

Arguments

features

Output of build_features().

train_idx

Integer vector of training-row indices.

chains

MCMC chains. Default 2.

iter

Total iterations per chain (warmup + sampling). Default 1000.

par_ratio

Prior on the proportion of non-zero coefficients (regularized-horseshoe par_ratio). Default 0.1 (~10 % expected to be non-zero from ~100 candidate predictors).

adapt_delta

NUTS adaptation. Default 0.95.

cores

Number of cores. Default = chains.

seed

Integer seed; default 20260508.

Value

A list with model (a brmsfit), prep, predictors, chains, iter, seed, and type = "bayes_horseshoe".

Examples

# `fit_bayes_horseshoe()` compiles a Stan model via `brms`. Compilation
# alone runs 30 s - 3 min and a minimal 1-chain / 200-iter sample
# another 1-5 min, so the example is wrapped in `\dontrun{}` rather than
# `\donttest{}` -- `R CMD check --run-donttest` would otherwise need a
# fully provisioned rstan toolchain on every CI worker. The brms smoke
# test in `tests/testthat/test-fits.R` runs only when the environment
# variable `PICMORT_TEST_BAYES=true` is set.
## Not run: 
paths    <- pic_paths()
cohort   <- build_cohort(paths, min_los_hours = 24L, verbose = FALSE)
features <- build_features(cohort, paths, window_hours = 24L)
split    <- make_train_test_split(features)
fit      <- fit_bayes_horseshoe(features, split$train_idx,
                                chains = 1L, iter = 200L)
fit$type

## End(Not run)
# `fit_bayes_horseshoe()` compiles a Stan model via `brms`. Compilation
# alone runs 30 s - 3 min and a minimal 1-chain / 200-iter sample
# another 1-5 min, so the example is wrapped in `\dontrun{}` rather than
# `\donttest{}` -- `R CMD check --run-donttest` would otherwise need a
# fully provisioned rstan toolchain on every CI worker. The brms smoke
# test in `tests/testthat/test-fits.R` runs only when the environment
# variable `PICMORT_TEST_BAYES=true` is set.
## Not run: 
paths    <- pic_paths()
cohort   <- build_cohort(paths, min_los_hours = 24L, verbose = FALSE)
features <- build_features(cohort, paths, window_hours = 24L)
split    <- make_train_test_split(features)
fit      <- fit_bayes_horseshoe(features, split$train_idx,
                                chains = 1L, iter = 200L)
fit$type

## End(Not run)

Fit penalized logistic regression (elastic net)

Description

Inner CV over the (alpha, lambda) grid; class-weighted to handle the ~7-9 % pediatric ICU mortality rate. Selects best alpha by minimum CV deviance and uses lambda.1se for parsimony.

Usage

fit_elastic_net(
  features,
  train_idx,
  alpha_grid = seq(0, 1, by = 0.2),
  nfolds = 5L,
  seed = 20260508L
)
fit_elastic_net(
  features,
  train_idx,
  alpha_grid = seq(0, 1, by = 0.2),
  nfolds = 5L,
  seed = 20260508L
)

Arguments

features

Output of build_features().

train_idx

Integer vector of training-row indices.

alpha_grid

Numeric vector of alpha values; default seq(0, 1, 0.2).

nfolds

Inner CV folds; default 5.

seed

Integer seed; default 20260508.

Value

A list with model, prep, predictors, best_alpha, best_lambda, cv_log, and type = "glmnet".

Examples

## Not run: 
paths    <- pic_paths()
cohort   <- build_cohort(paths, min_los_hours = 24L, verbose = FALSE)
features <- build_features(cohort, paths, window_hours = 24L)
split    <- make_train_test_split(features)
fit      <- fit_elastic_net(features, split$train_idx)
fit$best_alpha
fit$best_lambda

## End(Not run)
## Not run: 
paths    <- pic_paths()
cohort   <- build_cohort(paths, min_los_hours = 24L, verbose = FALSE)
features <- build_features(cohort, paths, window_hours = 24L)
split    <- make_train_test_split(features)
fit      <- fit_elastic_net(features, split$train_idx)
fit$best_alpha
fit$best_lambda

## End(Not run)

Fit XGBoost classifier

Description

Inner CV over a small principled grid. Class imbalance handled via scale_pos_weight = n_neg / n_pos. Returns the best parameter set and the corresponding fitted model.

Usage

fit_xgboost(features, train_idx, grid = NULL, nfolds = 5L, seed = 20260508L)
fit_xgboost(features, train_idx, grid = NULL, nfolds = 5L, seed = 20260508L)

Arguments

features

Output of build_features().

train_idx

Integer vector of training-row indices.

grid

data.frame of hyperparameter combinations. If NULL, uses the canonical 4-row grid documented in vignettes/paper1_baseline.Rmd.

nfolds

Inner CV folds; default 5.

seed

Integer seed; default 20260508.

Value

A list with model, prep, predictors, best_params, best_nrounds, cv_log, scale_pos_weight, and type = "xgboost".

Examples

## Not run: 
paths    <- pic_paths()
cohort   <- build_cohort(paths, min_los_hours = 24L, verbose = FALSE)
features <- build_features(cohort, paths, window_hours = 24L)
split    <- make_train_test_split(features)
fit      <- fit_xgboost(features, split$train_idx)
fit$best_nrounds

## End(Not run)
## Not run: 
paths    <- pic_paths()
cohort   <- build_cohort(paths, min_los_hours = 24L, verbose = FALSE)
features <- build_features(cohort, paths, window_hours = 24L)
split    <- make_train_test_split(features)
fit      <- fit_xgboost(features, split$train_idx)
fit$best_nrounds

## End(Not run)

Map an ICD-10 code to its WHO chapter

Description

Vectorised. Accepts either ICD-10 codes with a decimal point (e.g. "K52.901", "J18.900") or without (e.g. "K52901"). The chapter is determined by the leading letter and the two leading digits per the ICD-10 chapter ranges.

Usage

icd10_to_chapter(code)
icd10_to_chapter(code)

Arguments

code

Character vector of ICD-10 codes. NA codes return NA.

Value

Character vector of chapter names; one of: "infectious", "neoplasms", "blood", "endocrine", "mental", "nervous", "eye", "ear", "circulatory", "respiratory", "digestive", "skin", "musculoskeletal", "genitourinary", "pregnancy", "perinatal", "congenital", "symptoms", "injury", "external", "factors", "special", "unknown".

Examples

icd10_to_chapter(c("J18.900", "K52901", "C91.0", "I46", NA_character_))
# Round-trips with or without the decimal separator
identical(icd10_to_chapter("J18.900"), icd10_to_chapter("J18900"))
icd10_to_chapter(c("J18.900", "K52901", "C91.0", "I46", NA_character_))
# Round-trips with or without the decimal separator
identical(icd10_to_chapter("J18.900"), icd10_to_chapter("J18900"))

Stratified train / test split

Description

Outcome-stratified single split into training and test folds via rsample::initial_split(). Returns 1-based integer indices into features$y (and rows of features$x).

Usage

make_train_test_split(features, prop = 0.7, seed = 20260508L)
make_train_test_split(features, prop = 0.7, seed = 20260508L)

Arguments

features

Output of build_features().

prop

Proportion in the training fold. Default 0.7.

seed

Integer seed; default 20260508.

Value

A list with train_idx, test_idx (integer vectors).

Examples

if (requireNamespace("rsample", quietly = TRUE)) {
  set.seed(20260517L)
  features <- list(y = c(rep(0L, 90), rep(1L, 10)))
  split <- make_train_test_split(features, prop = 0.7, seed = 1L)
  length(split$train_idx) + length(split$test_idx) # 100
  mean(features$y[split$train_idx]) # ~ 0.10
}
if (requireNamespace("rsample", quietly = TRUE)) {
  set.seed(20260517L)
  features <- list(y = c(rep(0L, 90), rep(1L, 10)))
  split <- make_train_test_split(features, prop = 0.7, seed = 1L)
  length(split$train_idx) + length(split$test_idx) # 100
  mean(features$y[split$train_idx]) # ~ 0.10
}

Resolve PIC v1.1.0 source paths

Description

Returns a named list of canonical paths to PIC CSVs. If the PICMORT_DATA_DIR environment variable is set, it is treated as the directory containing the PIC v1.1.0 CSVs. Otherwise paths are resolved through the project-local ⁠data_links/pic_v110/⁠ symlink.

Usage

pic_paths(root = NULL, check = TRUE)
pic_paths(root = NULL, check = TRUE)

Arguments

root

Path to the directory containing ⁠data_links/⁠. Defaults to PICMORT_DATA_DIR when that environment variable is set, or to the project root containing ⁠data_links/⁠ otherwise.

check

Logical. If TRUE (default), verifies that every referenced CSV exists and is readable.

Value

Named list with elements admissions, patients, icustays, chartevents, labevents, diagnoses_icd, d_items, d_icd_diagnoses, d_labitems, prescriptions, inputevents, outputevents, microbiologyevents, surgery_vital_signs, or_exam_reports, emr_symptoms.

Examples

# Requires the registered PIC v1.1.0 source, either via
# `PICMORT_DATA_DIR` or `<project root>/data_links/pic_v110/`.
# Run `check = FALSE` to inspect the expected file names without
# touching disk.
## Not run: 
paths <- pic_paths()
names(paths)

## End(Not run)

# File-name discovery without verifying existence
tmp <- tempfile(); dir.create(file.path(tmp, "data_links", "pic_v110"),
                              recursive = TRUE)
paths <- pic_paths(root = tmp, check = FALSE)
names(paths)
unlink(tmp, recursive = TRUE)
# Requires the registered PIC v1.1.0 source, either via
# `PICMORT_DATA_DIR` or `<project root>/data_links/pic_v110/`.
# Run `check = FALSE` to inspect the expected file names without
# touching disk.
## Not run: 
paths <- pic_paths()
names(paths)

## End(Not run)

# File-name discovery without verifying existence
tmp <- tempfile(); dir.create(file.path(tmp, "data_links", "pic_v110"),
                              recursive = TRUE)
paths <- pic_paths(root = tmp, check = FALSE)
names(paths)
unlink(tmp, recursive = TRUE)

PIM3 face-validity check

Description

Reports the distribution of reconstructed PIM3 probabilities, the observed-vs-expected (O/E) ratio against actual cohort mortality, and a summary of which components defaulted to proxy values. The result is informational, not a hard gate – imperfect PIM3 reconstruction is normal when all components are not recoverable from the source database; the manuscript Limitations records which components were proxied.

Usage

pim3_face_validity(pim3_tbl, cohort)
pim3_face_validity(pim3_tbl, cohort)

Arguments

pim3_tbl

Output of compute_pim3().

cohort

Cohort data.table with hospital_expire_flag.

Value

A list with elements summary (named numeric of pim3 distribution stats), oe_ratio, oe_ci (Wilson 95% CI), proxy_freq (table of which proxies fired and how often), risk_group_counts (table), notes (character).

Examples

toy <- readRDS(system.file("extdata", "toy_cohort.rds",
                           package = "picMort"))
# Build a minimal synthetic pim3 table matching compute_pim3()'s schema.
set.seed(20260517L)
pim3_tbl <- data.table::data.table(
  icustay_id  = toy$icustay_id,
  pim3_logit  = stats::rnorm(nrow(toy), mean = -2.5, sd = 0.5),
  pim3        = stats::plogis(stats::rnorm(nrow(toy), -2.5, 0.5)),
  risk_group  = factor(sample(c("default", "low", "high", "very_high"),
                              nrow(toy), replace = TRUE,
                              prob = c(0.7, 0.2, 0.08, 0.02)),
                       levels = c("default", "low", "high", "very_high")),
  sbp_used    = stats::rnorm(nrow(toy), 110, 20),
  proxy_flags = replicate(nrow(toy),
                          c("fio2_pao2", "base_excess"), simplify = FALSE)
)
fv <- pim3_face_validity(pim3_tbl, toy)
fv$summary
fv$oe_ratio
toy <- readRDS(system.file("extdata", "toy_cohort.rds",
                           package = "picMort"))
# Build a minimal synthetic pim3 table matching compute_pim3()'s schema.
set.seed(20260517L)
pim3_tbl <- data.table::data.table(
  icustay_id  = toy$icustay_id,
  pim3_logit  = stats::rnorm(nrow(toy), mean = -2.5, sd = 0.5),
  pim3        = stats::plogis(stats::rnorm(nrow(toy), -2.5, 0.5)),
  risk_group  = factor(sample(c("default", "low", "high", "very_high"),
                              nrow(toy), replace = TRUE,
                              prob = c(0.7, 0.2, 0.08, 0.02)),
                       levels = c("default", "low", "high", "very_high")),
  sbp_used    = stats::rnorm(nrow(toy), 110, 20),
  proxy_flags = replicate(nrow(toy),
                          c("fio2_pao2", "base_excess"), simplify = FALSE)
)
fv <- pim3_face_validity(pim3_tbl, toy)
fv$summary
fv$oe_ratio

PIM3 risk-group ICD-10 mapping (pediatric, simplified)

Description

Maps an ICD-10 code to one of "low", "high", "very_high", or NA_character_ (no risk-group assignment). Covers the most frequent pediatric admissions per Straney 2013 Tables 2-4. Codes not in the lookup return NA, treated as default-risk for PIM3.

Usage

pim3_risk_group(code)
pim3_risk_group(code)

Arguments

code

Character vector of ICD-10 codes.

Value

Character vector of risk-group labels (or NA).

Examples

pim3_risk_group(c("J45.0",   # asthma -> low
                  "I42.1",   # cardiomyopathy -> high
                  "I46",     # post-cardiac-arrest -> very_high
                  "K52.901", # unmapped -> NA
                  NA_character_))
pim3_risk_group(c("J45.0",   # asthma -> low
                  "I42.1",   # cardiomyopathy -> high
                  "I46",     # post-cardiac-arrest -> very_high
                  "K52.901", # unmapped -> NA
                  NA_character_))

Render the headline calibration plot

Description

Smoothed loess calibration curves per model with the ideal diagonal. Returns a ggplot object; save via ggplot2::ggsave().

Usage

plot_calibration(calib_list)
plot_calibration(calib_list)

Arguments

calib_list

Named list; each element is the output of calibration_suite() for one model.

Value

A ggplot object.

Examples

if (requireNamespace("ggplot2", quietly = TRUE)) {
  set.seed(20260517L)
  n <- 200L
  prob <- stats::plogis(stats::rnorm(n, -2, 1))
  y    <- stats::rbinom(n, 1L, prob)
  cal  <- calibration_suite(prob, y, n_boot = 25L)
  p    <- plot_calibration(list(model_a = cal))
  class(p)
}
if (requireNamespace("ggplot2", quietly = TRUE)) {
  set.seed(20260517L)
  n <- 200L
  prob <- stats::plogis(stats::rnorm(n, -2, 1))
  y    <- stats::rbinom(n, 1L, prob)
  cal  <- calibration_suite(prob, y, n_boot = 25L)
  p    <- plot_calibration(list(model_a = cal))
  class(p)
}

Render the headline decision-curve plot

Description

Net benefit vs threshold across models, with treat-all and treat-none references.

Usage

plot_decision_curve(dca)
plot_decision_curve(dca)

Arguments

dca

A data.table from decision_curve() (with plot_grid = TRUE).

Value

A ggplot object.

Examples

if (requireNamespace("ggplot2", quietly = TRUE)) {
  set.seed(20260517L)
  n <- 200L
  p_a <- stats::plogis(stats::rnorm(n, -2, 1))
  y   <- stats::rbinom(n, 1L, p_a)
  dca <- decision_curve(list(model_a = p_a), y, plot_grid = TRUE)
  p   <- plot_decision_curve(dca)
  class(p)
}
if (requireNamespace("ggplot2", quietly = TRUE)) {
  set.seed(20260517L)
  n <- 200L
  p_a <- stats::plogis(stats::rnorm(n, -2, 1))
  y   <- stats::rbinom(n, 1L, p_a)
  dca <- decision_curve(list(model_a = p_a), y, plot_grid = TRUE)
  p   <- plot_decision_curve(dca)
  class(p)
}

Predict mortality probabilities (unified interface)

Description

Routes to the appropriate predict implementation based on model$type. Returns a data.table with columns:

prob_raw – model output (probability of in-hospital mortality)
prob_calibrated – post-hoc calibrated (only XGBoost; same as raw otherwise)
prob_lower, prob_upper – 95 % credible interval (Bayesian only)

Usage

predict_mortality(model, new_data)
predict_mortality(model, new_data)

Arguments

model

Fit object from one of the ⁠fit_*()⁠ functions, OR a PIM3 data.table from compute_pim3().

new_data

New features$x-style data.table (with icustay_id and the same columns as the training features).

Value

A data.table aligned to nrow(new_data).

Examples

# PIM3 dispatch branch: predict_mortality() also accepts the PIM3
# data.table returned by compute_pim3(), bypassing model fits.
toy <- readRDS(system.file("extdata", "toy_cohort.rds",
                           package = "picMort"))
pim3_tbl <- data.table::data.table(
  icustay_id = toy$icustay_id,
  pim3       = stats::plogis(stats::rnorm(nrow(toy), mean = -2.5, sd = 0.5))
)
preds <- predict_mortality(pim3_tbl, toy[1:5, ])
preds

## Not run: 
# Full model-fit dispatch (requires the registered PIC data)
paths    <- pic_paths()
cohort   <- build_cohort(paths, min_los_hours = 24L, verbose = FALSE)
features <- build_features(cohort, paths, window_hours = 24L)
split    <- make_train_test_split(features)
fit      <- fit_elastic_net(features, split$train_idx)
predict_mortality(fit, features$x[split$test_idx, ])

## End(Not run)
# PIM3 dispatch branch: predict_mortality() also accepts the PIM3
# data.table returned by compute_pim3(), bypassing model fits.
toy <- readRDS(system.file("extdata", "toy_cohort.rds",
                           package = "picMort"))
pim3_tbl <- data.table::data.table(
  icustay_id = toy$icustay_id,
  pim3       = stats::plogis(stats::rnorm(nrow(toy), mean = -2.5, sd = 0.5))
)
preds <- predict_mortality(pim3_tbl, toy[1:5, ])
preds

## Not run: 
# Full model-fit dispatch (requires the registered PIC data)
paths    <- pic_paths()
cohort   <- build_cohort(paths, min_los_hours = 24L, verbose = FALSE)
features <- build_features(cohort, paths, window_hours = 24L)
split    <- make_train_test_split(features)
fit      <- fit_elastic_net(features, split$train_idx)
predict_mortality(fit, features$x[split$test_idx, ])

## End(Not run)

Subgroup performance table

Description

Reports calibration + AUROC per pre-registered subgroup: age strata (<1, 1-5, 6-12, 13-18 y), surgical vs medical, and top-3 ICD chapters by frequency. Cells with fewer than n_min_events events are suppressed.

Usage

subgroup_performance(probs, cohort_test, n_min_events = 5L)
subgroup_performance(probs, cohort_test, n_min_events = 5L)

Arguments

probs

Named list of predicted-probability vectors aligned to cohort_test.

cohort_test

Cohort test fold (a data.table with age_years, is_surgical, primary_icd_chapter, hospital_expire_flag).

n_min_events

Minimum events per cell. Default 5.

Value

A data.table of (model, subgroup_var, level, n, n_events, auroc, ici, cal_slope).

Examples

toy <- readRDS(system.file("extdata", "toy_cohort.rds",
                           package = "picMort"))
set.seed(20260517L)
probs <- list(model_a = stats::runif(nrow(toy)))
# n_min_events = 1L because the toy cohort only has 8 deaths.
subgroup_performance(probs, toy, n_min_events = 1L)
toy <- readRDS(system.file("extdata", "toy_cohort.rds",
                           package = "picMort"))
set.seed(20260517L)
probs <- list(model_a = stats::runif(nrow(toy)))
# n_min_events = 1L because the toy cohort only has 8 deaths.
subgroup_performance(probs, toy, n_min_events = 1L)

Package 'picMort'

Help Index

Cohort invariants (used by tests and as a sanity gate at runtime)

Description

Usage

Arguments

Value

Examples

Audit feature matrix for prediction-window leakage

Description

Usage

Arguments

Value

Examples

Build the canonical study cohort

Description

Usage

Arguments

Value

Examples

Build T+window prediction-window feature matrix

Description

Usage

Arguments

Value

Examples

Calibration suite (the manuscript headline)

Description

Usage

Arguments

Value

Examples

Cohort attrition table (for the manuscript Methods figure)

Description

Usage

Arguments

Value

Examples

Reconstruct PIM3 from PIC fields

Description

Usage

Arguments

Value

References

Examples

Decision-curve analysis at clinically meaningful thresholds

Description

Usage

Arguments

Value

Examples

Recipe for preprocessing

Description

Usage

Arguments

Value

Examples

Discrimination metrics with bootstrap CIs (supporting role)

Description

Usage

Arguments

Value

Examples

Fit Bayesian logistic regression with regularized horseshoe prior

Description

Usage

Arguments

Value

Examples

Fit penalized logistic regression (elastic net)

Description

Usage

Arguments

Value

Examples

Fit XGBoost classifier

Description

Usage

Arguments

Value