---
title: "In-Process IES via R Callback -- apsimx Adapter"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{In-Process IES via R Callback -- apsimx Adapter}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment  = "#>",
  fig.width = 6, fig.height = 4
)
set.seed(20260516)
```

## Why a callback driver?

`pesto_ies()` shells out to the `pestpp-ies` binary, which writes and
reads `.pst` / ensemble / observation files between every realisation.
For an in-process R forward model (an `apsimx` wrapper, a Python
bridge, or a fast synthetic test problem), that file round-trip is
pure overhead. `pesto_ies_callback()` keeps the same C++ ensemble
kernel (`ensemble_solution()`, Chen & Oliver 2013) but drives the
outer loop in R, calling the forward model in-process.

This document shows:

1. The driver on a synthetic linear-Gaussian problem (recovers truth).
2. The typed forward-model contract (`pesto_forward_model()`) shared by
   both adapter modes, including parallel, fault-tolerant evaluation.
3. The shape of the `apsim_callback()` adapter for `apsimx` (run
   block is disabled -- needs APSIM installed).
4. Multi-fidelity calibration: a cheap surrogate ramped to an expensive
   model through a `fidelity_schedule`.

## A synthetic recovery example

```{r}
library(PESTO)

npar  <- 4L
nobs  <- 8L
nreal <- 120L
sigma <- 0.05

G          <- matrix(rnorm(nobs * npar), nobs, npar)
theta_true <- c(1.0, -0.5, 2.0, 0.25)
y          <- as.numeric(G %*% theta_true) + rnorm(nobs, sd = sigma)
names(y)   <- paste0("o", seq_len(nobs))

# Forward model: any function that takes an (nreal x npar) matrix and
# returns an (nreal x nobs) matrix.
forward <- function(theta) theta %*% t(G)

prior <- matrix(rnorm(nreal * npar), nreal, npar,
                dimnames = list(NULL, paste0("p", seq_len(npar))))
```

Run six IES iterations with a single lambda:

```{r}
fit <- pesto_ies_callback(
  forward_model  = forward,
  prior_ensemble = prior,
  obs            = y,
  obs_sd         = sigma,
  noptmax        = 6L,
  lambda         = 1.0,
  verbose        = FALSE
)
```

Diagnostics:

```{r}
prior_rmse <- sqrt(mean((colMeans(prior) - theta_true)^2))
post_mean  <- colMeans(as.matrix(fit$par_ensemble[, -1L]))
post_rmse  <- sqrt(mean((post_mean - theta_true)^2))

cat(sprintf("prior RMSE: %.4f\n", prior_rmse))
cat(sprintf("posterior RMSE: %.4f\n", post_rmse))

mean_phi <- vapply(fit$iterations, `[[`, numeric(1L), "mean_phi")
print(mean_phi)
```

Posterior RMSE should be a small fraction of prior RMSE, and mean phi
should decrease across iterations.

The full per-realisation phi trajectory:

```{r, fig.alt = "Per-realisation phi decreasing across IES iterations"}
plot_phi(fit)
```

## The forward-model contract

A bare `function(theta) -> obs` is the quickest way in, but it carries
no metadata: the driver cannot know the output dimension ahead of time,
cannot run realisations in parallel, and cannot express a failure
budget. `pesto_forward_model()` wraps the same callable in a typed
contract that does. It is the single object both the native-callback
driver here and the classic `.pst` path (`pesto_ies()`) are built to
honour, so a forward model travels between modes unchanged.

```{r}
fm <- pesto_forward_model(
  fn          = forward,
  n_obs       = nobs,
  param_names = paste0("p", seq_len(npar)),
  on_failure  = "na"
)

# A contract object is evaluated directly with pesto_evaluate(); the
# returned matrix carries the per-realisation failure count.
sim <- pesto_evaluate(fm, prior)
cat(sprintf("evaluated %d x %d; failures: %d\n",
            nrow(sim), ncol(sim), attr(sim, "n_failures")))

# Passing the contract object to the driver is equivalent to passing the
# bare function -- the bare form is auto-wrapped internally.
fit_typed <- pesto_ies_callback(fm, prior, y, sigma,
                                noptmax = 6L, verbose = FALSE)
identical(
  as.matrix(fit_typed$par_ensemble[, -1L]),
  as.matrix(fit$par_ensemble[, -1L])
)
```

### Parallel, fault-tolerant ensembles

For an expensive forward model -- an APSIM ensemble especially -- the
realisations within one iteration are embarrassingly parallel. Build the
contract with `parallel = "multicore"` and the evaluation engine
dispatches rows across forked workers via `parallel::mclapply()`. For
reproducible draws, set an `"L'Ecuyer-CMRG"` RNG first; each realisation
then receives an independent stream.

```{r,eval=FALSE}
RNGkind("L'Ecuyer-CMRG")
set.seed(42L)

fm_par <- pesto_forward_model(
  fn       = forward,
  n_obs    = nobs,
  parallel = "multicore",
  n_cores  = 4L
)
fit_par <- pesto_ies_callback(fm_par, prior, y, sigma, noptmax = 6L)
```

A custom `map_fn` (any `lapply`-shaped function) is the escape hatch for
cross-platform or cluster backends, for example
`future.apply::future_lapply` or a `mirai` map. Failures are governed by
`on_failure` (`"na"` tolerates, `"stop"` aborts) and by `max_fail_frac`,
which aborts once the fraction of failed realisations in any single
evaluation exceeds the budget -- a guardrail against a silently
collapsing ensemble.

## Driving APSIM through `apsim_callback()`

The adapter wraps `apsimx` so each realisation gets its own working
copy of an APSIM template, parameter edits per `param_map`, a run, and
an extraction step. The signature returned matches what
`pesto_ies_callback()` expects.

```{r,eval=FALSE}
library(apsimx)   # >= 2.7.0 from CRAN
# Requires a working APSIM Next Gen installation.

forward <- apsim_callback(
  template  = "wheat_wagga.apsimx",
  param_map = list(
    RUE = "Wheat.Leaf.Photosynthesis.RUE.FixedValue",
    CN2 = "Soil.SoilWater.CN2Bare"
  ),
  output_extractor = function(sim) {
    # sim is the data.frame of report-table variables.
    as.numeric(sim$Wheat.Grain.Total.Wt)
  }
)

prior <- cbind(
  RUE = runif(40, 1.0, 2.0),
  CN2 = runif(40, 60,  90)
)

fit <- pesto_ies_callback(
  forward_model  = forward,
  prior_ensemble = prior,
  obs            = c(y_2018 = 4500, y_2019 = 5200, y_2020 = 4900),
  obs_sd         = 250,
  noptmax        = 4L
)

summary(fit$par_ensemble)
```

### Failure handling

Per-realisation APSIM crashes (corrupt config, solver divergence,
missing report variable) are caught by the adapter and emerge as `NA`
rows. `pesto_ies_callback()`'s `on_failure = "na"` (default) carries
those realisations forward unchanged so the ensemble survives partial
failures. Setting `on_failure = "stop"` aborts as soon as one row is
missing.

### Concurrency

The `apsim_callback()` closure writes each realisation to its own
uniquely-named working file, so it is safe to evaluate in parallel:
wrap it in a `pesto_forward_model(parallel = "multicore")` exactly as
above and the ensemble runs across forked workers, each invoking APSIM
on its own input. Start with a modest `n_cores` -- `apsimx`'s own
thread-safety under heavy ensemble load has not been independently
characterised. For pure-R synthetic models the serial loop is already
fast enough that the overhead is dominated by `ensemble_solution()`
itself.

## Multi-fidelity calibration

Process-based crop models expose a cost/accuracy dial: a daily time-step
with a lite soil profile is cheap; a sub-daily step with the full
profile is expensive. `pesto_multifidelity_model()` makes that dial
first-class. It bundles an ordered stack of fidelity levels -- cheapest
first -- and the IES driver picks a level per iteration through a
`fidelity_schedule`, so early iterations explore cheaply and late
iterations sharpen against the expensive truth. The final ensemble is
always refreshed at the highest fidelity.

```{r}
# Cheap level: the linear model with a small systematic bias.
# Expensive level: the unbiased truth.
cheap_fn     <- function(theta) theta %*% t(G) + 0.30
expensive_fn <- forward

mf <- pesto_multifidelity_model(
  levels = list(
    pesto_forward_model(fn = cheap_fn,     n_obs = nobs, fidelity = 0L),
    pesto_forward_model(fn = expensive_fn, n_obs = nobs, fidelity = 1L)
  ),
  costs = c(1, 25)   # the expensive level is ~25x dearer per run
)

# Ramp: two cheap iterations, then four expensive ones.
fit_mf <- pesto_ies_callback(
  forward_model     = mf,
  prior_ensemble    = prior,
  obs               = y,
  obs_sd            = sigma,
  noptmax           = 6L,
  fidelity_schedule = c(0L, 0L, 1L, 1L, 1L, 1L),
  verbose           = FALSE
)

mf_rmse <- sqrt(mean(
  (colMeans(as.matrix(fit_mf$par_ensemble[, -1L])) - theta_true)^2
))
cat(sprintf("multi-fidelity posterior RMSE: %.4f\n", mf_rmse))
```

When a cheap level is run over the whole ensemble and the expensive
level over only a subset, `mf_control_variate()` lifts the cheap outputs
toward the expensive ones with the variance-minimising affine
correction -- the plug-in primitive for a surrogate cascade:

```{r}
sub      <- 1:20
low_all  <- pesto_evaluate(mf, prior, level = 0L)
low_sub  <- low_all[sub, , drop = FALSE]
high_sub <- pesto_evaluate(mf, prior[sub, , drop = FALSE], level = 1L)

corrected <- mf_control_variate(low_all, high_sub, low_sub)
cat(sprintf("mean |cheap - corrected| over the ensemble: %.4f\n",
            mean(abs(low_all - corrected))))
```

## Sequential (filter-mode) assimilation

`pesto_ies_callback()` is a *smoother*: it assimilates every observation in
one batch. For an in-season setting -- where a growing season's
observations arrive over time and you want an updated parameter posterior
*as the season progresses*, not only at harvest -- `pesto_ies_filter()` is
the *filter*. It assimilates time-ordered observation **windows** one after
another against the same static calibration parameters; each window's
posterior becomes the next window's prior, so information accrues and the
posterior tightens window by window.

```{r}
# Reuse the synthetic problem above; split the nobs observations into
# three temporal windows (e.g. early / mid / late season).
windows <- list(1:3, 4:6, 7:8)

fit_seq <- pesto_ies_filter(
  forward_model  = forward,
  prior_ensemble = prior,
  obs            = y,
  obs_sd         = sigma,
  windows        = windows,
  verbose        = FALSE
)

# The per-window history carries the per-parameter ensemble sd; its mean
# should fall window over window as more of the season is assimilated.
sd_trace <- vapply(fit_seq$windows, function(w) mean(w$par_sd), numeric(1))
data.frame(
  window      = seq_along(sd_trace),
  obs_assim   = vapply(fit_seq$windows, function(w) length(w$obs_indices),
                       integer(1)),
  mean_par_sd = round(sd_trace, 4)
)
```

The posterior at the final window matches what the batch smoother would
recover, but you also have the intermediate posteriors -- the value of the
filter is the *trajectory*, not just the endpoint. A multi-fidelity stack
plugs in through `fidelity_schedule` (one level per window: run cheap early,
sharpen late), and the result wraps into the same manifest contract via
`as_manifest()` (tagged `method = "ies_filter"`).

## When to prefer `pesto_ies()` instead

Use the classic `.pst` path when:

- Your forward model lives behind a non-R executable that PEST++ can
  drive directly via its template/instruction-file mechanism.
- You need full bit-for-bit compatibility with `pestpp-ies` behaviour
  (e.g. for cross-validating against an existing PEST++ workflow).
- You depend on `pestpp-ies`-specific features not yet exposed by the
  R driver (full lambda line-search, regularisation modes,
  Tikhonov priors).

Otherwise the callback driver is faster to iterate, easier to debug,
and avoids per-realisation file I/O entirely.

## Reproducibility

```{r}
sessionInfo()
```