---
title: "Three regimes: moment, sample, and KLD-EM"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Three regimes: moment, sample, and KLD-EM}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 6,
  fig.height = 4,
  out.width = "100%"
)
set.seed(20260513)
```

```{r setup}
library(proxymix)
```

This vignette walks through the three KL-optimal regimes of Hoek and Elliott (2024). For each, we show what information the regime needs about the target, the M-step update it implements, and an end-to-end fit on a target whose ground truth we know.

We use the bundled three-component mixture target — its log-density is known exactly, and (because the target is itself a Gaussian mixture) so are its true samples.

```{r target}
tgt <- mixture_target(with_samples = TRUE, n = 1500L, seed = 1L)
tgt
```

```{r target-plot, fig.cap = "Three-mixture target with overlaid samples."}
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  grid_x <- seq(-4.5, 4.5, length.out = 100)
  G <- expand.grid(x1 = grid_x, x2 = grid_x)
  G$f <- exp(tgt@log_density(as.matrix(G)))
  ggplot(G, aes(x1, x2)) +
    geom_contour_filled(aes(z = f), bins = 10L, alpha = 0.9) +
    geom_point(data = as.data.frame(tgt@samples[1:500, ]),
               aes(V1, V2), alpha = 0.15, size = 0.6) +
    scale_fill_viridis_d(option = "mako", guide = "none") +
    coord_equal() +
    labs(x = expression(x[1]), y = expression(x[2])) +
    theme_minimal(base_size = 12)
}
```

## Regime (i): closed-form moment matching

When `N == 1`, the KL-optimal Gaussian proxy has the same mean and covariance as the target. That is regime (i). The function is `fit_moment_match()`, and the dispatcher chooses it under `regime = "auto"` when `N == 1` and samples (or moments) are available.

```{r moment}
m_fit <- fit_proxymix(tgt, N = 1L, regime = "moment")
m_fit
```

Recovered mean and covariance match the target's empirical moments to within sampling error:

```{r moment-recover}
m_fit@means[[1L]]
m_fit@covariances[[1L]]
```

A single Gaussian is, of course, a bad fit to a three-mode mixture — but it is the *KL-optimal* single Gaussian. Regime (i) is most useful when used as a deterministic *seed* for regimes (ii) or (iii) at `N > 1`.

## Regime (ii): classical EM on samples

When the target carries samples, the textbook EM algorithm fits an `N`-component Gaussian mixture by maximum likelihood. That is regime (ii). The algorithm is the standard iterate-until-convergence:

* **E-step** — \(\gamma_{nk} = \pi_k \mathcal{N}(x_n \mid \mu_k, \Sigma_k) / \sum_j \pi_j \mathcal{N}(x_n \mid \mu_j, \Sigma_j)\).
* **M-step** — re-weight \(\pi_k, \mu_k, \Sigma_k\) by the responsibilities, with a diagonal ridge `epsilon_cov` to keep \(\Sigma_k\) positive-definite.
* **Convergence** — relative change in the empirical log-likelihood below `tol`.

```{r em}
s_fit <- fit_proxymix(tgt, N = 3L, regime = "sample",
                      max_iter = 200L, n_starts = 4L)
s_fit
```

The log-likelihood trace is monotone-up:

```{r em-trace, fig.cap = "Sample-EM log-likelihood trace (monotone-up)."}
trace <- s_fit@diagnostics$loglik_trace
if (requireNamespace("ggplot2", quietly = TRUE)) {
  ggplot(data.frame(iter = seq_along(trace), loglik = trace),
         aes(iter, loglik)) +
    geom_line() + geom_point(size = 1) +
    labs(x = "EM iteration", y = "log-likelihood") +
    theme_minimal(base_size = 12)
}
```

Information criteria are populated:

```{r em-bic}
bic_aic(s_fit)
```

## Regime (iii): importance-sampled KLD-EM — the wedge

Regime (iii) is the wedge of `proxymix`: it fits a Gaussian-mixture proxy when the target's `log_density` is evaluable but **no samples are available**. Examples include posterior distributions in Bayesian models, intractable likelihoods, expensive simulators, and downscaling targets in spatial statistics.

The algorithm:

1. Draw `is_size` samples from a user-supplied importance proposal `q`.
2. Compute self-normalised importance weights \(W_n \propto f(x_n) / q(x_n)\).
3. Run EM iterations using the IS weights — the M-step minimises \(\mathrm{KL}(f \Vert g_\theta)\) by re-weighting responsibilities with `W`.

The proposal `q` is the only free knob beyond the usual mixture configuration; `is_mvt()` with `df = 5` is a robust default.

```{r kld}
k_fit <- fit_proxymix(tgt, N = 3L, regime = "kld",
                      proposal = is_mvt(n_dim = 2L,
                                        mean = c(0, 0),
                                        sigma = 6 * diag(2),
                                        df = 5),
                      is_size = 3000L,
                      max_iter = 60L,
                      seed = 1L)
k_fit
```

Diagnostics:

```{r kld-trace, fig.cap = "KLD-EM convergence trace."}
trace <- kld_trace(k_fit)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  ggplot(data.frame(iter = seq_along(trace), kld = trace),
         aes(iter, kld)) +
    geom_line() + geom_point(size = 1) +
    labs(x = "KLD-EM iteration", y = expression(widehat(KL)(f * "||" * g))) +
    theme_minimal(base_size = 12)
}
```

Effective sample size of the importance weights:

```{r kld-ess}
ess_trace(k_fit)
```

## Side-by-side overlay

The three fits — moment (N = 1), classical EM (N = 3), KLD-EM (N = 3) — overlaid on the target:

```{r overlay, fig.height = 4.5, fig.cap = "Target (filled) overlaid with the three regimes' fits (dashed)."}
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("patchwork", quietly = TRUE)) {
  grid_x <- seq(-4.5, 4.5, length.out = 100)
  G <- expand.grid(x1 = grid_x, x2 = grid_x)
  G$f  <- exp(tgt@log_density(as.matrix(G)))
  G$mm <- exp(dgmm(as.matrix(G[, c("x1", "x2")]), m_fit, log = TRUE))
  G$ss <- exp(dgmm(as.matrix(G[, c("x1", "x2")]), s_fit, log = TRUE))
  G$kk <- exp(dgmm(as.matrix(G[, c("x1", "x2")]), k_fit, log = TRUE))

  base <- ggplot(G, aes(x1, x2)) +
    geom_contour_filled(aes(z = f), bins = 10L, alpha = 0.85) +
    scale_fill_viridis_d(option = "mako", guide = "none") +
    coord_equal() + theme_minimal(base_size = 11)

  p1 <- base + geom_contour(aes(z = mm), bins = 5L, colour = "white",
                            linetype = "dashed") +
    labs(title = "Regime (i) moment, N = 1", x = NULL, y = NULL)
  p2 <- base + geom_contour(aes(z = ss), bins = 5L, colour = "white",
                            linetype = "dashed") +
    labs(title = "Regime (ii) sample EM, N = 3", x = NULL, y = NULL)
  p3 <- base + geom_contour(aes(z = kk), bins = 5L, colour = "white",
                            linetype = "dashed") +
    labs(title = "Regime (iii) KLD-EM, N = 3", x = NULL, y = NULL)

  patchwork::wrap_plots(p1, p2, p3, nrow = 1)
}
```

## A sanity check: (i) and (iii) agree at N = 1

When `N = 1`, the KL-optimal Gaussian is the moment-matched one. Regime (iii) — a numerical optimiser — should converge to that same Gaussian (up to Monte Carlo error). Let us check this on the banana target where regime (iii) is the *only* available regime if we ignore samples:

```{r agreement}
banana <- banana_target(with_samples = TRUE, n = 2000L, seed = 1L)
m_b <- fit_proxymix(banana, N = 1L, regime = "moment")
k_b <- fit_proxymix(banana, N = 1L, regime = "kld",
                    proposal = is_mvt(n_dim = 2L,
                                      sigma = 4 * diag(2),
                                      df = 5),
                    is_size = 3000L, max_iter = 50L, seed = 1L)
rbind(
  moment_mean = m_b@means[[1L]],
  kld_mean    = k_b@means[[1L]]
)
rbind(
  moment_tr   = sum(diag(m_b@covariances[[1L]])),
  kld_tr      = sum(diag(k_b@covariances[[1L]]))
)
```

The means and trace-covariances agree to within Monte Carlo error.

## Reference

Hoek, J. van der and Elliott, R. J. (2024). *Mixtures of multivariate Gaussians.* Stochastic Analysis and Applications. <https://doi.org/10.1080/07362994.2024.2372605>.