---
title: "Tier-2 stubs: the research roadmap"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Tier-2 stubs: the research roadmap}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 6,
  fig.height = 4,
  out.width = "100%"
)
set.seed(20260513)
```

```{r setup}
library(proxymix)
```

`proxymix` ships five **Tier-2 stubs**: functions with stable signatures, full roxygen documentation, and signature-stability tests, whose bodies raise a `proxymix_not_yet_implemented` condition. They mark research directions that the package's foundations support but that the package does not yet implement. Each will graduate to Tier 1 in a later release after its own design, validation, and stress-audit pass, as `from_kde()` already did in v0.2.0.

The stubs collectively show *why* the regime-(iii) wedge is load-bearing: each of the five ultimately calls into KLD-EM with a target that cannot be sampled.
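To make the stub contract concrete, the sketch below shows one way a classed condition of this kind can be raised and caught in base R. The helper `signal_not_yet_implemented()` and the stub `demo_stub()` are hypothetical names introduced for illustration; `proxymix`'s internals may construct the condition differently.

```{r stub_condition_sketch}
# Hypothetical helper, not part of proxymix: build and raise a classed error.
signal_not_yet_implemented <- function(fn_name) {
  cond <- structure(
    list(
      message = sprintf("`%s()` is a Tier-2 stub and is not yet implemented.",
                        fn_name),
      call = sys.call(-1)
    ),
    class = c("proxymix_not_yet_implemented", "error", "condition")
  )
  stop(cond)
}

# Hypothetical stub: the signature is fixed, the body only signals.
demo_stub <- function(target, N = 2L, ...) {
  signal_not_yet_implemented("demo_stub")
}

# A signature-stability check compares the formals against the promised ones.
stopifnot(identical(names(formals(demo_stub)), c("target", "N", "...")))

# Callers handle the stub by condition class rather than by message text.
caught <- tryCatch(
  demo_stub(NULL),
  proxymix_not_yet_implemented = function(e) conditionMessage(e)
)
caught
```

Because the class vector also contains `"error"`, an unhandled call still fails loudly, while code that knows about stubs can catch the specific class.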

## What the stubs are for

### `from_kde()` — KDE to Gaussian-mixture proxy *(graduated in v0.2.0)*

Take an `n` by `p` sample matrix, fit a kernel density estimate, then fit a Gaussian-mixture proxy *to that KDE* via KLD-EM. The proxy retains the closed-form mixture operations of `gmm_marginalise()` / `gmm_conditionalise()`, which the KDE lacks. Useful as a smoothing-and-closing step in posterior summarisation. Companion vignette: `vignettes/from_kde.Rmd`.

### `from_aggregate_likelihood()` — kernel-downsizing integration

Aggregate-likelihood downscaling concerns models of the form
\[
g(y) = \int K(y \mid x) \, f(x) \, dx
\]
where `y` is observed at the aggregate scale and `x` lives at the finer scale. If `f` is a Gaussian-mixture proxy fitted via KLD-EM and the kernel `K` is linear-Gaussian in `x` (Gaussian in `y` with mean linear in `x`), the integral is closed-form and the downscaling likelihood becomes a Gaussian mixture in `y`. This stub plugs `proxymix` into Sejdinovic et al.'s kernel-downsizing framework, *as the parametric `f` alternative to their GP `f`*.

```{r from_agg}
tryCatch(
  from_aggregate_likelihood(matrix(0, 1, 1),
                            latent_aggregator = identity,
                            N = 2L),
  proxymix_not_yet_implemented = function(e) message(conditionMessage(e))
)
```
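The closed-form claim can be checked numerically in one dimension. The sketch below assumes a linear-Gaussian kernel $K(y \mid x) = \mathcal{N}(y;\, a x + b,\, \tau^2)$ and a two-component latent mixture; under that assumption each component $\mathcal{N}(\mu_k, s_k^2)$ maps to $\mathcal{N}(a \mu_k + b,\, a^2 s_k^2 + \tau^2)$ with its weight unchanged. All parameter values and helper names are illustrative and use base R only, not the `proxymix` API.

```{r agg_closed_form_sketch}
# Density of a univariate Gaussian mixture, vectorised over x.
mix_density <- function(x, w, mu, sd) {
  dens <- numeric(length(x))
  for (k in seq_along(w)) {
    dens <- dens + w[k] * stats::dnorm(x, mu[k], sd[k])
  }
  dens
}

# Illustrative latent mixture f(x) and assumed kernel parameters.
w_lat  <- c(0.3, 0.7)
mu_lat <- c(-1, 2)
sd_lat <- c(0.5, 1.0)
a <- 2; b <- 0.5; tau <- 0.8

# Closed form: g(y) is again a mixture, with transformed means and variances.
g_closed <- function(y) {
  mix_density(y, w_lat, a * mu_lat + b, sqrt(a^2 * sd_lat^2 + tau^2))
}

# Brute force: integrate K(y | x) f(x) over x for each y.
g_numeric <- function(y) {
  vapply(y, function(yi) {
    integrand <- function(x) {
      stats::dnorm(yi, a * x + b, tau) * mix_density(x, w_lat, mu_lat, sd_lat)
    }
    stats::integrate(integrand, -Inf, Inf)$value
  }, numeric(1))
}

y_grid <- c(-3, 0, 2, 5)
max_abs_diff <- max(abs(g_closed(y_grid) - g_numeric(y_grid)))
max_abs_diff
```

The two evaluations should agree up to quadrature error. For a kernel that is not linear-Gaussian in `x`, no such component-wise shortcut is implied.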

### `fit_kld_em_collider()` — DAG-constrained KLD-EM

Project each KLD-EM iteration onto the manifold of joint densities that satisfy the conditional-independence constraints implied by a user-supplied directed acyclic graph (DAG), following the collider-regularised regression idea of Sejdinovic et al. This is a *novel* methodological extension beyond Hoek and Elliott (2024), useful for testing causal-inference methods on Gaussian-mixture-generated joints with known DAG structure.

```{r kld_collider}
tryCatch(
  fit_kld_em_collider(banana_target(), dag = matrix(0, 2, 2), N = 2L),
  proxymix_not_yet_implemented = function(e) message(conditionMessage(e))
)
```
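For a single Gaussian component such a projection has a closed form: the member of the DAG family closest in $\mathrm{KL}(p \,\|\, q)$ keeps each node's regression on its parents and drops every other dependence. The sketch below illustrates that step only. It is an assumption about what a DAG constraint could look like, not the stub's planned algorithm, and both the function name and the convention `dag[i, j] = 1` for an edge `i -> j` are hypothetical.

```{r dag_projection_sketch}
# Hypothetical helper: project a Gaussian covariance onto a DAG model.
# dag[i, j] != 0 is read as an edge i -> j (an assumed convention).
project_gaussian_onto_dag <- function(Sigma, dag) {
  p <- nrow(Sigma)
  B <- matrix(0, p, p)   # B[j, pa]: regression coefficients of node j
  omega <- numeric(p)    # residual variance of each node given its parents
  for (j in seq_len(p)) {
    pa <- which(dag[, j] != 0)
    if (length(pa) == 0L) {
      omega[j] <- Sigma[j, j]
    } else {
      beta <- solve(Sigma[pa, pa, drop = FALSE], Sigma[pa, j])
      B[j, pa] <- beta
      omega[j] <- Sigma[j, j] - sum(Sigma[pa, j] * beta)
    }
  }
  # Structural form X = B X + e gives Cov(X) = (I - B)^{-1} Omega (I - B)^{-T}.
  A <- solve(diag(p) - B)
  A %*% diag(omega, nrow = p) %*% t(A)
}

# Illustrative covariance in which all three variables are correlated.
Sigma_full <- matrix(c(2.0, 0.8, 1.0,
                       0.8, 1.5, 0.6,
                       1.0, 0.6, 3.0), nrow = 3, byrow = TRUE)

# Collider 1 -> 3 <- 2.
collider <- matrix(0, 3, 3)
collider[1, 3] <- 1
collider[2, 3] <- 1

Sigma_proj <- project_gaussian_onto_dag(Sigma_full, collider)
round(Sigma_proj, 3)
```

In the projected covariance the (1, 2) entry is zero, because a collider leaves its two parents marginally independent, while the variances of the parents are unchanged. A mixture target would need more than a per-component projection, which is part of what the stub leaves open.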

### `to_apsim_scenarios()` — APSIM scenario generation

Convert samples from a `gmm_fit` (typically fitted to a multivariate weather / soil / management distribution) into the tabular format consumed by APSIM scenario runners. Provides a clean bridge from `proxymix` proxies to mechanistic agronomy simulators.

```{r apsim}
x <- matrix(stats::rnorm(200), ncol = 2)
fit <- fit_proxymix(gmm_target_from_samples(x), N = 2L, regime = "sample",
                    max_iter = 10L)
tryCatch(
  to_apsim_scenarios(fit, n = 100L, schema = list()),
  proxymix_not_yet_implemented = function(e) message(conditionMessage(e))
)
```
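The conversion itself is mostly bookkeeping: draw from the mixture, then label the columns. The sketch below works from raw weights, means, and covariances rather than a `gmm_fit`, because the accessors of that class are not described here. The function name, the column names, and the `scenario_id` column are hypothetical, and the table is not claimed to match any real APSIM input format.

```{r scenario_table_sketch}
# Hypothetical helper: sample a Gaussian mixture into a labelled data frame.
# means is a K x p matrix; covs is a list of K covariance matrices (p x p).
sample_gmm_to_table <- function(n, weights, means, covs, col_names) {
  p <- ncol(means)
  comp <- sample.int(length(weights), size = n, replace = TRUE, prob = weights)
  z <- matrix(stats::rnorm(n * p), nrow = n)
  out <- matrix(NA_real_, nrow = n, ncol = p)
  for (k in seq_along(weights)) {
    idx <- which(comp == k)
    if (length(idx) == 0L) next
    R <- chol(covs[[k]])  # upper triangular, t(R) %*% R equals covs[[k]]
    out[idx, ] <- sweep(z[idx, , drop = FALSE] %*% R, 2, means[k, ], "+")
  }
  tab <- as.data.frame(out)
  names(tab) <- col_names
  cbind(scenario_id = seq_len(n), tab)
}

# Illustrative two-component mixture over three made-up drivers.
scenarios <- sample_gmm_to_table(
  n = 50L,
  weights = c(0.4, 0.6),
  means = rbind(c(300, 40, 120), c(550, 60, 150)),
  covs = list(
    diag(c(400, 25, 100)),
    matrix(c(900, 30,  0,
              30, 16,  0,
               0,  0, 64), nrow = 3, byrow = TRUE)
  ),
  col_names = c("rain_mm", "soil_n_kg_ha", "sowing_doy")
)
head(scenarios, 3)
```

A real `schema` argument would presumably also carry units, bounds, and rounding rules for each column; none of that is modelled here.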

### `from_simulator()` — wrap an expensive simulator as a target

Probe an expensive simulator on a designed grid of inputs, build a KDE (or empirical-likelihood surface) on its outputs, and expose the result as a `gmm_target` with an evaluable `log_density`. The simulator is treated as a black-box `f` that can be evaluated but not (cheaply) sampled, which is exactly the regime-(iii) wedge use case.

```{r from_sim}
tryCatch(
  from_simulator(simulator = identity,
                 design = matrix(stats::rnorm(20), ncol = 2)),
  proxymix_not_yet_implemented = function(e) message(conditionMessage(e))
)
```
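The first two steps of that pipeline can be sketched in base R. The example assumes a cheap stand-in simulator, a regular design grid, and a Gaussian product-kernel KDE; the names `toy_simulator()` and `make_kde_log_density()` are hypothetical, and wrapping the result as a `gmm_target` is left out because that constructor's interface is not described here.

```{r sim_kde_sketch}
# Hypothetical helper: Gaussian product-kernel KDE returned as a log-density.
# outputs is an n x p matrix; bandwidth is a scalar or a length-p vector.
make_kde_log_density <- function(outputs, bandwidth) {
  outputs <- as.matrix(outputs)
  n <- nrow(outputs)
  p <- ncol(outputs)
  h <- rep(bandwidth, length.out = p)
  log_norm <- -sum(log(h)) - 0.5 * p * log(2 * pi) - log(n)
  function(y) {
    y <- matrix(y, ncol = p)
    vapply(seq_len(nrow(y)), function(i) {
      d <- sweep(sweep(outputs, 2, y[i, ], "-"), 2, h, "/")
      log_k <- -0.5 * rowSums(d^2)
      m <- max(log_k)                      # log-sum-exp for stability
      m + log(sum(exp(log_k - m))) + log_norm
    }, numeric(1))
  }
}

# Stand-in for an expensive simulator: two inputs, one output.
toy_simulator <- function(theta) sin(theta[, 1]) + 0.5 * theta[, 2]

# Designed grid of inputs (5 x 5 factorial).
design_grid <- as.matrix(expand.grid(
  theta1 = seq(-2, 2, length.out = 5),
  theta2 = seq(-1, 1, length.out = 5)
))
sim_out <- toy_simulator(design_grid)

log_density <- make_kde_log_density(sim_out, stats::bw.nrd0(sim_out))
log_density(c(-1, 0, 1))
```

The returned closure can be evaluated anywhere but offers no sampler of its own, which mirrors the evaluate-but-not-sample setting described above.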

## Why these stubs and not others

The choice of stubs is opinionated. Three guidelines apply:

1. **Each stub must terminate in a regime-(i)–(iii) verb.** If a stub graduates to a Tier-1 implementation, it should be a thin shim around the existing fitters. No new core algorithms are introduced silently.
2. **Each stub must have a sponsor application.** `from_kde()` / `from_simulator()` are general-purpose; `from_aggregate_likelihood()` and `fit_kld_em_collider()` are anchored on Sejdinovic et al.'s recent work; `to_apsim_scenarios()` is anchored on agronomy applications. *Unsponsored stubs do not appear here.*
3. **Each stub must respect the package's Tier-3 deferrals.** No stub introduces normalising flows, automatic differentiation variational inference, or Stan / INLA interop.

## What is *not* coming

The following are explicit **Tier-3 deferrals** and will *not* appear in `proxymix` as stubs:

* **Adaptive importance sampling.** Redrawing the IS sample each KLD-EM iteration. A real win at moderate dimensions; deferred because a clean treatment requires more design than the current scope allows.
* **Variational boosting.** Add components until the IS-estimated KLD plateaus.
* **Normalising-flow proposals.** Out of scope for a Gaussian-mixture package; use `tensorflow` / `keras` or `torch` directly.
* **Stan / INLA interop.** Adjacent but distinct ecosystem.

## Reference

Hoek, J. van der and Elliott, R. J. (2024). *Mixtures of multivariate Gaussians.* Stochastic Analysis and Applications. <https://doi.org/10.1080/07362994.2024.2372605>.