--- title: "Calibrating APSIM Wheat with PESTO: a case study" output: rmarkdown::html_vignette: mathjax: "https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-chtml.js" pandoc_args: ["--include-in-header=mathjax-config.html"] vignette: > %\VignetteIndexEntry{Calibrating APSIM Wheat with PESTO: a case study} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 4.2, dpi = 96, message = FALSE, warning = FALSE ) library(PESTO) library(data.table) library(ggplot2) PAL <- c("#0072B2", "#D55E00", "#009E73", "#CC79A7", "#000000") ``` ## Why this study PESTO's flagship partner is **APSIM**, the agricultural-systems simulator (see the *Calibrating APSIM with PESTO* vignette for the coupling mechanics). This case study is the calibration counterpart: a complete, reproducible demonstration that PESTO's iterative ensemble smoother (IES) recovers known APSIM parameters and quantifies their uncertainty. It is deliberately a **synthetic-truth recovery experiment** (an OSSE, or "twin experiment"): we fix known-true APSIM parameters, simulate per-season wheat yields, add measurement noise, and then try to recover the truth from the noisy yields. Because the truth is known *by construction*, correctness is verifiable without any field data -- the cleanest way to show a calibration method works before confronting real observations. The forward model is the **bundled APSIM Wheat example** (a multi-decade run at a single site), driven in-process through `apsim_callback()`. This study has two parts. **Part 1** (the bulk below) is that synthetic-truth recovery. **Part 2** then calibrates physiological parameters to *real* observed wheat biomass and cross-checks PESTO against the established `apsimx` optimiser. ```{r load} rds <- system.file("extdata", "apsim_case_study", "apsim_wheat_calibration_result.rds", package = "PESTO") if (!nzchar(rds)) { rds <- file.path("..", "inst", "extdata", "apsim_case_study", "apsim_wheat_calibration_result.rds") } R <- readRDS(rds) spec <- R$spec; pnm <- R$pnm ``` The results below are **frozen real outputs** of the calibration, shipped with the package and loaded here; APSIM is not run at build time (it is an external simulator, absent on CRAN check farms). They were produced by `r R$apsim_version` via `apsimx` `r R$apsimx_version`, PESTO `r R$pesto_version`, seed `r R$seed`. The runnable driver is `system.file("case_studies/apsim_wheat_calibration.R", package = "PESTO")`. ## The calibration target Three parameters are estimated, chosen from a sensitivity screen of the real model to span the identifiability spectrum: two strong, mechanistically distinct levers (soil water and nitrogen) and one deliberately weak lever. ```{r target-table} tt <- data.frame( Parameter = pnm, Role = vapply(spec, `[[`, "", "role"), Unit = vapply(spec, `[[`, "", "unit"), Truth = vapply(spec, `[[`, 0, "truth"), `Prior range` = sprintf("%g -- %g", vapply(spec, `[[`, 0, "lo"), vapply(spec, `[[`, 0, "hi")), check.names = FALSE, row.names = NULL ) knitr::kable(tt, caption = "Calibration target: known truth and uniform prior.") ``` ## The synthetic observations The truth run produces one wheat yield per season; we add Gaussian measurement noise (SD `r R$obs_sd` kg/ha) and split the seasons into a calibration set (used in the fit) and a held-out validation set (used only to test out-of-sample prediction). ```{r obs-plot, fig.alt = "Per-season wheat yield across the multi-decade run: the noise-free truth and the noisy synthetic observations, with calibration and held-out validation seasons distinguished."} od <- data.table( year = R$season_year, truth = R$y_truth, obs = R$y_obs_all, set = ifelse(seq_along(R$y_truth) %in% R$cal_idx, "calibration", "validation") ) ggplot(od, aes(year, obs, colour = set)) + geom_line(aes(y = truth), colour = "grey60", linewidth = 0.3) + geom_point(size = 1.3) + scale_colour_manual(values = c(calibration = PAL[1], validation = PAL[2])) + labs(x = NULL, y = "Wheat yield (kg/ha)", colour = NULL, title = "Synthetic observations: truth (grey) + measurement noise") + theme_minimal(base_size = 11) ``` ## The calibration The forward model is APSIM itself, wrapped so a parameter matrix maps to a matrix of simulated yields; the IES then conditions the ensemble on the calibration-season observations. This is the entire calibration call (shown for reference; the frozen fit is loaded above): ```{r calib-code, eval = FALSE} fm <- apsim_callback( template = "Wheat.apsimx", param_map = list( CN2Bare = ".Simulations.Simulation.Field.Soil.SoilWater.CN2Bare", FertN = ".Simulations.Simulation.Field.Fertilise at sowing.Amount", SowingDepth = ".Simulations.Simulation.Field.Sow using a variable rule.SowingDepth" ), output_extractor = function(sim) as.numeric(sim$Yield) ) # Run APSIM realisations in parallel; hold out validation seasons via obs_sd. fm <- pesto_forward_model(fm, parallel = "multicore") fit <- pesto_ies_callback(fm, prior_ensemble = prior, obs = obs_all, obs_sd = sd_vec, noptmax = 6) ``` ### Convergence ```{r phi, fig.height = 3.6, fig.alt = "Objective-function (phi) trajectory over IES iterations: the ensemble's misfit drops sharply in the first iterations and then plateaus."} plot_phi(R$fit) + ggtitle("IES convergence: misfit (phi) by iteration") ``` ## Parameter recovery ```{r recovery-table} rec <- rbindlist(lapply(pnm, function(p) { q <- stats::quantile(R$post[, p], c(.05, .5, .95), na.rm = TRUE) data.table(Parameter = p, Truth = spec[[p]]$truth, `Posterior mean` = round(mean(R$post[, p], na.rm = TRUE), 1), `5%` = round(q[1], 1), `95%` = round(q[3], 1), `Truth in 90% CI` = ifelse(spec[[p]]$truth >= q[1] & spec[[p]]$truth <= q[3], "yes", "no")) })) knitr::kable(rec, caption = "Recovery of the known truth from noisy yields.") ``` ```{r recovery-plot, fig.height = 3.4, fig.alt = "Prior versus posterior densities for each parameter with the true value marked: the two strong parameters collapse tightly onto the truth, while the weak parameter's posterior stays broad."} dens <- rbindlist(lapply(pnm, function(p) rbind( data.table(Parameter = p, value = R$prior[, p], dist = "prior"), data.table(Parameter = p, value = R$post[, p], dist = "posterior") ))) truth_dt <- data.table(Parameter = pnm, truth = vapply(spec, `[[`, 0, "truth")) ggplot(dens, aes(value, fill = dist)) + geom_density(alpha = 0.5, colour = NA) + geom_vline(data = truth_dt, aes(xintercept = truth), linetype = "dashed", linewidth = 0.5) + facet_wrap(~ Parameter, scales = "free") + scale_fill_manual(values = c(prior = "grey70", posterior = PAL[1])) + labs(x = NULL, y = NULL, fill = NULL, title = "Prior to posterior; dashed line = truth") + theme_minimal(base_size = 10) ``` ### Identifiability -- what the data can constrain A parameter is well-identified when conditioning on the data collapses its spread far below the prior. The ratio of posterior to prior standard deviation makes this explicit: ```{r ident, fig.height = 3.2, fig.alt = "Posterior-to-prior standard-deviation ratio by parameter: low for the two strong parameters (well constrained), near one for the weak parameter (barely constrained)."} idr <- data.table( Parameter = pnm, ratio = vapply(pnm, function(p) stats::sd(R$post[, p], na.rm = TRUE) / stats::sd(R$prior[, p]), 0) ) ggplot(idr, aes(stats::reorder(Parameter, ratio), ratio)) + geom_col(fill = PAL[1], width = 0.6) + geom_hline(yintercept = 1, linetype = "dotted") + coord_flip() + labs(x = NULL, y = "posterior SD / prior SD (lower = better identified)", title = "Identifiability from the observed yields") + theme_minimal(base_size = 11) ``` The yields strongly constrain soil runoff (`CN2Bare`) and fertiliser nitrogen (`FertN`); sowing depth is barely identifiable from yield alone, and PESTO's diagnostics say so rather than returning a falsely confident estimate. That is the value of ensemble UQ: it reports what the data can and cannot teach you. ## Out-of-sample validation -- and the under-dispersion caveat Using the posterior ensemble to predict the **held-out** seasons tests genuine out-of-sample skill. The key finding is that the *raw* ensemble's predictive band is too narrow: ```{r validation, fig.height = 3.8, fig.alt = "Held-out seasonal yields with the posterior 90 percent predictive band, without and with covariance inflation: the raw band is too narrow and misses several points, the inflated band covers more of them."} oe <- R$obs_ensemble band <- function(mat) { data.table(year = R$season_year[R$val_idx], truth = R$y_truth[R$val_idx], lo = apply(mat[, R$val_idx, drop = FALSE], 2L, stats::quantile, .05, na.rm = TRUE), hi = apply(mat[, R$val_idx, drop = FALSE], 2L, stats::quantile, .95, na.rm = TRUE)) } cov_of <- function(b) mean(b$truth >= b$lo & b$truth <= b$hi) b_base <- band(oe); b_base[, panel := sprintf("raw ensemble (%.0f%% covered)", 100 * cov_of(b_base))] panels <- list(b_base) if (!is.null(R$inflation)) { b_inf <- band(R$inflation$obs_ensemble) b_inf[, panel := sprintf("with RTPS inflation (%.0f%% covered)", 100 * cov_of(b_inf))] panels <- list(b_base, b_inf) } bd <- rbindlist(panels) ggplot(bd, aes(year)) + geom_ribbon(aes(ymin = lo, ymax = hi), fill = PAL[1], alpha = 0.25) + geom_point(aes(y = truth), colour = PAL[2], size = 1.3) + facet_wrap(~ panel) + labs(x = NULL, y = "Held-out yield (kg/ha)", title = "Held-out seasons: 90% predictive band vs truth") + theme_minimal(base_size = 10) ``` The raw 90% band covers well under 90% of the held-out truth -- the finite-ensemble **under-dispersion** an iterative smoother suffers. PESTO ships the remedy: covariance inflation (and, for spatial problems, localisation), documented in the *Inflation and localisation* vignette. Re-running with RTPS inflation widens the posterior and lifts coverage toward nominal. Inflation strength is problem-dependent, so report coverage and tune it rather than trusting a default blindly. ## The ensemble manifest A calibrated run emits a portable, hash-verified `pesto_ensemble_manifest`, so a downstream tool can validate and reuse the posterior without re-running APSIM: ```{r manifest} man <- as_manifest(R$fit) class(man) man@method man@schema_version ``` # Part 2 -- calibration to real observed data Part 1 proved correctness against a *known* truth. Part 2 confronts **real field observations**: the `obsWheat` dataset shipped with the `apsimx` package (Miguez 2025) -- ten dates of measured wheat above-ground biomass over a 2016--2017 season, with `Ames.met` weather. There is no known truth here, so the test is different: does the calibrated model *fit* the observations and *predict* held-out ones, and does PESTO agree with an independent calibrator? We calibrate the two genuinely uncertain physiological parameters of the bundled `Wheat-opt-ex.apsimx` model -- radiation-use efficiency (`RUE`) and the cultivar `BasePhyllochron` (phenology) -- the same parameters, model, and data that the `apsimx` package's own `optim_apsimx()` example optimises. That gives an **independent cross-check**: PESTO's ensemble posterior against `apsimx`'s point optimum. ```{r p2-load} prd <- system.file("extdata", "apsim_case_study", "apsim_realdata_result.rds", package = "PESTO") if (!nzchar(prd)) { prd <- file.path("..", "inst", "extdata", "apsim_case_study", "apsim_realdata_result.rds") } P <- readRDS(prd); P_spec <- P$spec; P_pnm <- P$pnm ``` ```{r p2-obs, fig.height = 3.4, fig.alt = "Observed wheat above-ground biomass at ten dates through the season, with calibration and held-out validation dates distinguished; biomass rises from zero to roughly 1200 g/m2."} po <- data.table(date = P$obs_dates, biomass = P$y_obs, set = ifelse(seq_along(P$y_obs) %in% P$cal_idx, "calibration", "held-out")) ggplot(po, aes(date, biomass, colour = set)) + geom_line(colour = "grey70", linewidth = 0.3) + geom_point(size = 2) + scale_colour_manual(values = c(calibration = PAL[1], `held-out` = PAL[2])) + labs(x = NULL, y = "Observed biomass (g/m2)", colour = NULL, title = "obsWheat: real measured above-ground biomass") + theme_minimal(base_size = 11) ``` ### Estimates, with an independent cross-check ```{r p2-estimates} est <- rbindlist(lapply(P_pnm, function(p) { q <- stats::quantile(P$post[, p], c(.05, .5, .95), na.rm = TRUE) data.table(Parameter = p, Role = P_spec[[p]]$role, `PESTO mean` = round(mean(P$post[, p], na.rm = TRUE), 2), `5%` = round(q[1], 2), `95%` = round(q[3], 2), `apsimx optimum` = P_spec[[p]]$apsimx_opt, Default = P_spec[[p]]$default) })) knitr::kable(est, caption = paste( "PESTO posterior vs apsimx optim_apsimx() optimum (independent calibrator)", "and the model default.")) ``` For both parameters the `apsimx` point optimum lies **inside** PESTO's 90% credible interval, and both calibrators move the same direction away from the default (higher `RUE`, lower `BasePhyllochron`). Two independent calibration engines agree -- and PESTO additionally reports the uncertainty the point optimiser does not. ```{r p2-fit, fig.height = 3.8, fig.alt = "Posterior predictive 90 percent band for biomass at each observation date with the observed points overlaid; the band tracks the observations across the season and brackets the held-out points."} qb <- function(i) stats::quantile(P$obs_ensemble[, i], c(.05, .95), na.rm = TRUE) fitb <- data.table( date = P$obs_dates, obs = P$y_obs, lo = vapply(seq_along(P$obs_dates), function(i) qb(i)[1], 0), hi = vapply(seq_along(P$obs_dates), function(i) qb(i)[2], 0), set = ifelse(seq_along(P$y_obs) %in% P$cal_idx, "calibration", "held-out")) ggplot(fitb, aes(date)) + geom_ribbon(aes(ymin = lo, ymax = hi), fill = PAL[1], alpha = 0.25) + geom_point(aes(y = obs, shape = set), colour = PAL[2], size = 2) + scale_shape_manual(values = c(calibration = 16, `held-out` = 1)) + labs(x = NULL, y = "Biomass (g/m2)", shape = NULL, title = "Posterior predictive band vs observed biomass") + theme_minimal(base_size = 11) ``` ```{r p2-coverage, echo = FALSE} vl <- vapply(P$val_idx, function(i) stats::quantile(P$obs_ensemble[, i], .05, na.rm = TRUE), 0) vh <- vapply(P$val_idx, function(i) stats::quantile(P$obs_ensemble[, i], .95, na.rm = TRUE), 0) p2_cov <- mean(P$y_obs[P$val_idx] >= vl & P$y_obs[P$val_idx] <= vh) ``` The held-out dates (excluded from the fit) fall within the posterior predictive band: out-of-sample coverage is `r sprintf("%.0f%%", 100 * p2_cov)` of the `r length(P$val_idx)` held-out points. (With so few held-out points this is indicative, not a precise coverage estimate.) ## Reproducibility Every number here is reproducible. The driver (`apsim_wheat_calibration.R`, shipped under `case_studies/`) hardcodes no machine paths: point it at an APSIM install with the `PESTO_APSIM_EXE` / `PESTO_APSIM_EXAMPLES` environment variables (and `PESTO_DOTNET_ROOT` if the build needs an explicit runtime), then: ```{r repro, eval = FALSE} # Part 1 (synthetic-truth recovery): source(system.file("case_studies/apsim_wheat_calibration.R", package = "PESTO")) # Part 2 (real obsWheat data): source(system.file("case_studies/apsim_wheat_realdata.R", package = "PESTO")) ``` Provenance: `r R$apsim_version`, `apsimx` `r R$apsimx_version`, PESTO `r R$pesto_version`, seed `r R$seed`, generated `r R$generated_on`. Part 2 uses the `obsWheat` data and `Wheat-opt-ex.apsimx` model shipped with `apsimx`. ## What this study shows - **Strong parameters recovered.** Soil runoff and fertiliser nitrogen are recovered with the truth inside the 90% credible interval. - **Weak parameter flagged, not faked.** Sowing depth is poorly identifiable from yield alone; the posterior stays broad and the identifiability diagnostic says so. - **Raw intervals under-cover.** Out-of-sample coverage is below nominal until inflation is applied -- a real property of finite-ensemble smoothers, surfaced rather than hidden. - **From synthetic to real (Part 2).** On real `obsWheat` biomass, PESTO calibrates the physiological parameters (`RUE`, phenology), fits the observed season, and predicts held-out dates -- and its posterior brackets the independent `apsimx` optimiser's optimum for both parameters, adding the uncertainty the point optimiser does not report. ## References - Miguez, F. (2025). *apsimx: Inspect, Read, Edit and Run APSIM Next Generation and APSIM Classic.* R package version 2.8.235. (source of the `obsWheat` data and `Wheat-opt-ex.apsimx` model used in Part 2). - Holzworth, D. et al. (2014). APSIM -- evolution towards a new generation of agricultural systems simulation. *Environmental Modelling & Software* 62, 327--350. ```