---
title: "Surrogate-Accelerated Iterative Ensemble Smoother"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Surrogate-Accelerated Iterative Ensemble Smoother}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5
)
```

## Overview

The surrogate-accelerated IES is PESTO's approach to reducing the number
of expensive forward model evaluations during iterative parameter
estimation. It embeds an online Gaussian Process (GP) surrogate within
the IES update loop, using uncertainty-driven switching to decide which
ensemble members require full model evaluation and which can use the
cheaper surrogate prediction.

This vignette demonstrates the full workflow on a nonlinear test problem.

### Regime of applicability

The Gaussian-process surrogate inside PESTO is a tool with a clearly
bounded operating envelope, and it is worth stating that envelope before
the demo. As an empirical soft floor we recommend
`n_train >= 5 * n_params` -- below that ratio the GP posterior variance
rarely drops far enough for the uncertainty-driven switching rule to
fire, and the surrogate honestly defers to the full model on most
realisations. The second axis is the smoothness of the forward model:
smooth, near-linear responses are well-approximated by an RBF-kernel GP
and yield large savings, whereas rough or near-discontinuous responses
(sharp wetting fronts, threshold activations, regime switches) defeat
the kernel and savings collapse. The third axis is ensemble size -- larger ensembles improve the GP fit roughly as $\mathcal{O}(n^{-1})$ in
posterior variance, and ensembles of fewer than about twenty
realisations rarely produce a usable surrogate at any dimensionality.
If your problem falls outside this envelope (high `n_params / n_train`
ratio, rough forward model, or tiny ensemble), the right thing to do is
run pure IES via `ensemble_solution()` -- `surrogate_ensemble_update()`
will report near-zero savings rather than degrading the posterior, but
you pay the GP training cost for no return.

## The Test Problem

We define a nonlinear forward model inspired by groundwater flow, where
the response includes both linear sensitivity and a nonlinear
interaction term:

$$y_j = \sum_k G_{jk}\theta_k + \alpha \sin\!\left(\sum_k G_{jk}\theta_k\right) \exp\!\left(-0.1\left|\sum_k G_{jk}\theta_k\right|\right) + \varepsilon_j$$

The parameter $\alpha$ controls nonlinearity: $\alpha = 0$ is linear,
$\alpha = 1$ is strongly nonlinear.

```{r test-problem}
library(PESTO)
library(data.table)
library(ggplot2)

forward_model <- function(theta, G, alpha = 0.3) {
  linear <- as.numeric(G %*% theta)
  nonlinear <- alpha * sin(linear) * exp(-0.1 * abs(linear))
  linear + nonlinear
}

set.seed(42)
n_par <- 20
n_obs <- 50
n_real <- 50

theta_true <- rnorm(n_par)
G <- matrix(rnorm(n_obs * n_par, sd = 1 / sqrt(n_par)), n_obs, n_par)
y_obs <- forward_model(theta_true, G, alpha = 0.3) + rnorm(n_obs, sd = 0.1)
```

## Step 1: Generate Prior Ensemble

```{r prior-ensemble}
par_ens <- matrix(rnorm(n_real * n_par, sd = 1.5), n_real, n_par)

# Run forward model for each realisation
obs_ens <- t(apply(par_ens, 1, function(p) {
  forward_model(p, G, alpha = 0.3) + rnorm(n_obs, sd = 0.1)
}))

cat("Parameter ensemble:", nrow(par_ens), "realisations x", ncol(par_ens), "parameters\n")
cat("Observation ensemble:", nrow(obs_ens), "realisations x", ncol(obs_ens), "observations\n")
```

## Step 2: Train the GP Surrogate

PESTO trains a GP with an RBF kernel directly from the ensemble:

```{r train-gp}
gp <- train_gp_surrogate(par_ens, obs_ens)

cat("Training samples:", gp$n_train, "\n")
cat("Parameters:", gp$n_par, "\n")
cat("Outputs:", gp$n_obs, "\n")
cat("Length scale:", round(gp$length_scale, 4), "\n")
cat("Signal variance:", round(gp$signal_var, 4), "\n")
cat("Log marginal likelihood:", round(gp$log_marginal_likelihood, 1), "\n")
```

## Step 3: Evaluate Surrogate Predictions

```{r predict-gp}
predictions <- predict_gp_surrogate(gp, par_ens)

cat("Prediction dimensions:", dim(predictions$mean), "\n")
cat("Mean prediction uncertainty:", round(mean(predictions$uncertainty), 6), "\n")
cat("Max prediction uncertainty:", round(max(predictions$uncertainty), 6), "\n")

# Check prediction accuracy
pred_error <- sqrt(mean((predictions$mean - obs_ens)^2))
cat("Prediction RMSE:", round(pred_error, 6), "\n")
```

The GP achieves near-zero prediction error because it is interpolating
within its training set -- this is by design, not an artefact.

## Step 4: Surrogate-Assisted IES Update

The `surrogate_ensemble_update()` function performs the full algorithm:

1. Train GP from current ensemble
2. Predict model outputs for all realisations
3. Classify by uncertainty: high → full model, low → surrogate
4. Apply control-variate bias correction
5. Compute IES update on blended ensemble

```{r surrogate-update}
weights <- rep(1 / 0.1, n_obs)    # 1 / obs_noise
parcov_inv <- rep(1 / 1.5^2, n_par)  # 1 / prior_var

result <- surrogate_ensemble_update(
  par_ensemble = par_ens,
  obs_ensemble = obs_ens,
  obs_target = y_obs,
  weights = weights,
  parcov_inv = parcov_inv,
  cur_lam = 1.0,
  uncertainty_threshold = 0.1
)

cat("Full model runs needed:", result$n_model_runs, "\n")
cat("Surrogate predictions:", result$n_surrogate_runs, "\n")
cat("Total realisations:", result$n_total, "\n")
cat("Model savings:", sprintf("%.0f%%", result$savings_pct), "\n")
cat("Mean GP uncertainty:", round(result$mean_uncertainty, 6), "\n")
cat("GP length scale:", round(result$gp_length_scale, 4), "\n")
```

## Comparison: Standard vs Surrogate

```{r compare}
# Standard IES update (for comparison)
par_mean <- colMeans(par_ens)
obs_mean <- colMeans(obs_ens)
par_diff <- t(par_ens) - par_mean
obs_diff <- t(obs_ens) - obs_mean
# Sign: ensemble_solution() expects sim - obs (see ?ensemble_solution).
obs_resid <- t(obs_ens) - matrix(rep(y_obs, n_real), n_obs, n_real)
par_resid <- par_diff
Am <- matrix(rnorm(n_par * (n_real - 1)), n_par, n_real - 1)

standard_upgrade <- ensemble_solution(
  par_diff, obs_diff, obs_resid, par_resid,
  weights, parcov_inv, Am, cur_lam = 1.0
)

surrogate_upgrade <- result$upgrade

# Compare upgrades
upgrade_diff <- sqrt(mean((standard_upgrade - surrogate_upgrade)^2))
upgrade_corr <- cor(as.numeric(standard_upgrade), as.numeric(surrogate_upgrade))

cat("Upgrade RMSE difference:", round(upgrade_diff, 6), "\n")
cat("Upgrade correlation:", round(upgrade_corr, 6), "\n")
```

## Effect of Nonlinearity

How does surrogate accuracy vary with model nonlinearity?

```{r nonlinearity-sweep, fig.cap = "Surrogate RMSE ratio across nonlinearity levels."}
results_dt <- rbindlist(lapply(seq(0, 1, by = 0.1), function(alpha) {
  set.seed(42)
  y_a <- forward_model(theta_true, G, alpha) + rnorm(n_obs, sd = 0.1)
  obs_a <- t(apply(par_ens, 1, function(p) {
    forward_model(p, G, alpha) + rnorm(n_obs, sd = 0.1)
  }))

  pm <- colMeans(par_ens); om <- colMeans(obs_a)
  pd <- t(par_ens) - pm; od <- t(obs_a) - om
  or_ <- t(obs_a) - matrix(rep(y_a, n_real), n_obs, n_real)
  pr <- pd
  Am_a <- matrix(rnorm(n_par * (n_real - 1)), n_par, n_real - 1)

  upg_std <- ensemble_solution(pd, od, or_, pr, weights, parcov_inv, Am_a, 1.0)
  par_std <- par_ens + upg_std
  rmse_std <- sqrt(mean((colMeans(par_std) - theta_true)^2))

  surr <- surrogate_ensemble_update(par_ens, obs_a, y_a, weights, parcov_inv,
                                     uncertainty_threshold = 0.1)
  par_surr <- par_ens + surr$upgrade
  rmse_surr <- sqrt(mean((colMeans(par_surr) - theta_true)^2))

  data.table(alpha = alpha, rmse_std = rmse_std, rmse_surr = rmse_surr,
             ratio = rmse_surr / rmse_std, savings = surr$savings_pct)
}))

ggplot(results_dt, aes(x = alpha)) +
  geom_line(aes(y = ratio), colour = "#009E73", linewidth = 1.2) +
  geom_point(aes(y = ratio), colour = "#009E73", size = 3) +
  geom_hline(yintercept = 1.0, linetype = "dashed", colour = "grey50") +
  scale_y_continuous(limits = c(0.95, 1.05)) +
  labs(title = "Surrogate Accuracy Across Nonlinearity Levels",
       x = expression(paste("Nonlinearity strength ", alpha)),
       y = "RMSE ratio (surrogate / standard)",
       caption = "Ratio = 1.0 means identical posterior quality") +
  theme_minimal(base_size = 14) +
  theme(plot.title = element_text(face = "bold"))
```

## Random Fourier Features for Large Ensembles

The exact GP has $O(n^3)$ training cost. For large ensembles ($n > 300$),
the Random Fourier Feature (RFF) approximation scales as $O(nD^2)$ where
$D$ is the number of random features:

```{r rff}
# Train RFF surrogate (D = 200 features)
rff <- train_rff_surrogate(par_ens, obs_ens, n_features = 200L)
cat("Training MSE:", round(rff$train_mse, 6), "\n")

# Predict
rff_pred <- predict_rff_surrogate(rff, par_ens)
rff_error <- sqrt(mean((rff_pred$mean - obs_ens)^2))
cat("RFF prediction RMSE:", round(rff_error, 4), "\n")

# Compare GP vs RFF accuracy
gp_error <- sqrt(mean((predictions$mean - obs_ens)^2))
cat("Exact GP prediction RMSE:", round(gp_error, 6), "\n")
cat("RFF / GP error ratio:", round(rff_error / max(gp_error, 1e-10), 1), "x\n")
```

## Scaling Comparison

```{r scaling, fig.cap = "GP vs RFF training time scaling."}
scaling_dt <- rbindlist(lapply(c(30, 50, 100, 200, 500), function(nr) {
  X <- matrix(rnorm(nr * 20), nr, 20)
  Y <- matrix(rnorm(nr * 50), nr, 50)

  t_gp <- system.time(train_gp_surrogate(X, Y))[["elapsed"]] * 1000
  t_rff <- system.time(train_rff_surrogate(X, Y, 200L))[["elapsed"]] * 1000

  data.table(n = nr, `Exact GP` = t_gp, `RFF (D=200)` = t_rff)
}))

scaling_long <- melt(scaling_dt, id.vars = "n",
                     variable.name = "method", value.name = "time_ms")

ggplot(scaling_long, aes(x = n, y = pmax(time_ms, 0.01), colour = method)) +
  geom_line(linewidth = 1) + geom_point(size = 2.5) +
  scale_y_log10() +
  scale_colour_manual(values = c("Exact GP" = "#D55E00", "RFF (D=200)" = "#0072B2")) +
  labs(title = "GP Surrogate Training Time",
       x = "Ensemble size", y = "Training time (ms, log scale)",
       colour = "Method") +
  theme_minimal(base_size = 14) +
  theme(plot.title = element_text(face = "bold"), legend.position = "bottom")
```

## When Does the Surrogate Help?

The surrogate is most effective when:

- The ensemble occupies a **compact region** of parameter space
  (GP interpolation is accurate)
- The forward model varies **smoothly** over the parameter range
- The ensemble is **large enough** to train the GP (practically $n \geq 20$)

It degrades gracefully when:

- The parameter space is very high-dimensional ($n_p \gg 100$)
- The model has **sharp discontinuities** or bifurcations
- The ensemble spans a very **wide region** (uninformative prior)

In all cases, the uncertainty threshold $\tau$ ensures the algorithm
falls back to full model evaluations when the surrogate is unreliable.

## References

- Rasmussen, C.E. & Williams, C.K.I. (2006). *Gaussian Processes for
  Machine Learning*. MIT Press.
- Rahimi, A. & Recht, B. (2007). Random features for large-scale
  kernel machines. *NeurIPS*.
- Chen, Y. & Oliver, D.S. (2013). Levenberg-Marquardt forms of the
  iterative ensemble smoother. *Computational Geosciences*, 17(4).
- Glasserman, P. (2003). *Monte Carlo Methods in Financial Engineering*.
  Springer.