--- title: "Posterior-Predictive Check with MMD" author: "Max Moldovan" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Posterior-Predictive Check with MMD} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 6, fig.height = 4, dpi = 150 ) set.seed(1) ``` ## The question After running an iterative ensemble smoother (PESTO, PEST++, EnKF, ...) against historical observations, *does the calibrated model produce a posterior-predictive distribution that is consistent with a held-out year or paddock?* The standard answer in PEST++-style workflows is RMSE on the posterior-predictive mean. That can only catch differences in central tendency. A distribution can match in mean but fail dramatically in variance, skewness, or tail probability -- precisely the failures that matter for climate-risk decisions. `mmd_ppc()` answers the distributional version of the question: are the posterior-predictive draws and the held-out observations samples from the same distribution? It is a model-free, kernel two-sample test (MMD) plus a Bayesian-flavoured *surprise* diagnostic. ## The stub ensemble Until PESTO ships its native manifest emitter, we construct a `pesto_ensemble` object directly. This is the kernR-side schema for the cross-package contract. ```{r ensemble} library(kernR) # True data-generating distribution: bivariate (yield, biomass) n_obs <- 30L # Held-out observations from the "real" world truth_mean <- c(yield = 4.5, biomass = 12.0) truth_sd <- c(yield = 0.6, biomass = 1.5) observed <- cbind( yield = stats::rnorm(n_obs, truth_mean["yield"], truth_sd["yield"]), biomass = stats::rnorm(n_obs, truth_mean["biomass"], truth_sd["biomass"]) ) ``` ## Case A: well-calibrated posterior ```{r calibrated} M <- 300L post_good <- cbind( yield = stats::rnorm(M, truth_mean["yield"], truth_sd["yield"]), biomass = stats::rnorm(M, truth_mean["biomass"], truth_sd["biomass"]) ) ens_good <- pesto_ensemble( posterior = post_good, observed = observed, metadata = list(holdout_year = 2018L, ies_iters = 6L) ) ens_good fit_good <- mmd_ppc(ens_good, n_permutations = 299L, seed = 1L) fit_good ``` A small `surprise_bits` (well below `log2(B + 1)`) and a verdict of *consistent with observations* mean the calibrated model has not been falsified by the held-out data at the distributional level. ## Case B: miscalibrated posterior (variance too narrow) A common ensemble-smoother pathology is over-confidence: posterior draws clustered too tightly around the posterior mean. ```{r miscalibrated-variance} post_narrow <- cbind( yield = stats::rnorm(M, truth_mean["yield"], truth_sd["yield"] / 3), biomass = stats::rnorm(M, truth_mean["biomass"], truth_sd["biomass"] / 3) ) ens_narrow <- pesto_ensemble(post_narrow, observed, metadata = list(scenario = "narrow")) fit_narrow <- mmd_ppc(ens_narrow, n_permutations = 299L, seed = 1L) fit_narrow ``` The verdict here should be REJECT, with surprise well above the threshold of ~4.32 bits (the surprise corresponding to `p = 0.05`). ## Case C: miscalibrated posterior (mean-shifted) ```{r miscalibrated-mean} post_shifted <- cbind( yield = stats::rnorm(M, truth_mean["yield"] + 0.8, truth_sd["yield"]), biomass = stats::rnorm(M, truth_mean["biomass"] - 1.2, truth_sd["biomass"]) ) ens_shifted <- pesto_ensemble(post_shifted, observed) fit_shifted <- mmd_ppc(ens_shifted, n_permutations = 299L, seed = 1L) fit_shifted ``` ## Comparing surprise across scenarios ```{r summary-table} res <- list( calibrated = fit_good, narrow = fit_narrow, mean_shifted = fit_shifted ) data.frame( scenario = names(res), statistic = vapply(res, function(z) z$statistic, numeric(1)), p_value = vapply(res, function(z) z$p_value, numeric(1)), surprise_bits = vapply(res, function(z) z$surprise_bits, numeric(1)), reject = vapply(res, function(z) z$reject, logical(1)), row.names = NULL ) ``` ## Visual diagnostic The inherited `plot()` method shows the permutation null distribution with the observed MMD statistic marked: ```{r plot} plot(fit_narrow) ``` ## Notes on practice - **Ensemble size.** Distributional power is set by both the ensemble size `M` and the held-out sample size `n_obs`. For ag-scale problems `M` in the low hundreds is usually sufficient; `n_obs >= 20` is a practical floor. - **Permutations.** At `B = 299` the minimum achievable p-value is `1 / 300 ~ 0.0033`; correspondingly the maximum surprise is `log2(300) ~ 8.23` bits. Increase `B` when working in the small-p regime. - **Interpreting surprise.** Surprise is Shannon information of the permutation p-value: 0 bits at `p = 1`, 4.32 bits at `p = 0.05`, and capped at `log2(B + 1)`. It is *not* the strict Bayesian-surprise KL divergence; use it as an intuitive scalar verdict, not as a posterior probability. - **Multi-dimensional outputs.** `mmd_ppc()` handles `d > 1` natively via the kernel choice; the default RBF with median-heuristic bandwidth is computed over the pooled posterior + observed sample. ## Cross-package handoff PESTO 0.3.0 ships a native ensemble emitter — the `PESTO::pesto_ensemble_manifest` S7 class — which `mmd_ppc()` consumes via a dedicated method. The legacy lightweight `pesto_ensemble` S3 constructor is unchanged and remains supported. ```{r,eval=requireNamespace("PESTO",quietly=TRUE)} library(PESTO) # Build a tiny synthetic manifest (in real workflows: come from # pesto_ies_callback() + as_manifest()). nreal <- 80L; nobs <- 3L post <- matrix(rnorm(nreal * nobs), nreal, nobs) colnames(post) <- paste0("o", seq_len(nobs)) m <- pesto_ensemble_manifest( run_id = "vignette_demo", params = data.frame(real_name = paste0("r", seq_len(nreal)), p1 = rnorm(nreal), p2 = rnorm(nreal), check.names = FALSE), outputs = data.frame(real_name = paste0("r", seq_len(nreal)), post, check.names = FALSE), weights = setNames(rep(1, nobs), colnames(post)), obs_target = setNames(rnorm(nobs), colnames(post)), data_hash = "sha256:vignette_demo", pesto_version = as.character(packageVersion("PESTO")), timestamp = Sys.time(), method = "ies_callback", noptmax = 1L, lambda_schedule = 1 ) # Out-of-sample PPC: supply held-out observations explicitly. The # manifest's `obs_target` slot is a single nobs-dim point (the data # the posterior was fit to) and is not a valid two-sample comparator # on its own. held_out <- matrix(rnorm(20L * nobs), 20L, nobs) colnames(held_out) <- colnames(post) mmd_ppc(m, observed = held_out, n_permutations = 199L, seed = 1L) ``` The `outputs =` argument focuses the check on specific observation columns (parallel to `dr_date_scenario()`'s convention). ## References - Gelman, A., Meng, X.-L., & Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies. *Statistica Sinica*, 6(4), 733-760. - Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., & Smola, A. (2012). A kernel two-sample test. *JMLR*, 13, 723-773.