autoplot() method for gmm_fit. Render a fitted proxy with
ggplot2::autoplot(fit) — a marginal density curve in one dimension, or a
viridis density raster with per-component ellipses in two. Any ambient
dimension is supported: the requested coordinates are reduced through the
package's own closed-form gmm_marginalise() before plotting (e.g.
autoplot(fit, dims = c(1L, 3L))). ggplot2 stays an optional dependency
— the method registers only when ggplot2 is installed, so R CMD check
remains clean with no sibling package present.

Second methodological extension. Brings a complete affine-Gaussian operator calculus to Gaussian-mixture proxies — pushforward, Bayesian update on a noisy linear observation, aggregation, missing-data conditioning — each closed-form and component-wise.
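To make the component-wise closed forms concrete, here is a minimal base-R sketch of an affine pushforward and a per-component Kalman-style update. It works on a hypothetical plain-list mixture: make_mix, affine_push, log_dmvn, and observe_linear are illustrative names, not the package's gmm class or its gmm_affine() / gmm_observe() implementations. It also leaves out the input validation and numerical safeguards a production version would need.

```r
# Hypothetical plain-list mixture; NOT the package's gmm class.
make_mix <- function(weights, means, covs) {
  list(weights = weights, means = means, covs = covs)
}

# Pushforward through y = A x + b + eps, eps ~ N(0, noise_cov).
# Each component maps to N(A mu + b, A Sigma A' + noise_cov); weights are kept.
affine_push <- function(mix, A, b, noise_cov) {
  list(
    weights = mix$weights,
    means = lapply(mix$means, function(mu) as.vector(A %*% mu + b)),
    covs = lapply(mix$covs, function(S) A %*% S %*% t(A) + noise_cov)
  )
}

# Log density of N(mean, cov) at y, using a Cholesky factor cov = t(R) %*% R.
log_dmvn <- function(y, mean, cov) {
  R <- chol(cov)
  z <- backsolve(R, y - mean, transpose = TRUE)  # solves t(R) z = y - mean
  -0.5 * sum(z^2) - sum(log(diag(R))) - 0.5 * length(y) * log(2 * pi)
}

# Bayesian update on y = A x + eps: Kalman step per component, then
# reweight each component by its marginal evidence N(y; A mu, A Sigma A' + noise_cov).
observe_linear <- function(mix, A, y, noise_cov) {
  K <- length(mix$weights)
  log_ev <- numeric(K)
  means <- vector("list", K)
  covs <- vector("list", K)
  for (k in seq_len(K)) {
    mu <- mix$means[[k]]
    S <- mix$covs[[k]]
    pred_mean <- as.vector(A %*% mu)
    pred_cov <- A %*% S %*% t(A) + noise_cov     # innovation covariance
    gain <- S %*% t(A) %*% solve(pred_cov)       # Kalman gain
    means[[k]] <- as.vector(mu + gain %*% (y - pred_mean))
    covs[[k]] <- S - gain %*% A %*% S
    log_ev[k] <- log_dmvn(y, pred_mean, pred_cov)
  }
  log_w <- log(mix$weights) + log_ev
  w <- exp(log_w - max(log_w))                   # stabilise before normalising
  list(weights = w / sum(w), means = means, covs = covs)
}

# Two-component prior in R^2; observe the first coordinate with noise variance 0.25.
prior <- make_mix(c(0.5, 0.5), list(c(-2, 0), c(2, 0)), list(diag(2), diag(2)))
A <- matrix(c(1, 0), nrow = 1)
pushed <- affine_push(prior, A, b = 0, noise_cov = matrix(0.25))
post <- observe_linear(prior, A, y = 1.5, noise_cov = matrix(0.25))
```

With a single component the update reduces to the ordinary Kalman step. With several, the observation also shifts weight towards the components that predicted it best.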
gmm_affine(g, A, b, noise_cov) — closed-form pushforward
of a Gaussian mixture through y = A x + b + epsilon,
epsilon ~ N(0, noise_cov). Returns the mixture in R^m with
mu'_k = A mu_k + b, Sigma'_k = A Sigma_k A' + noise_cov,
weights unchanged.

gmm_observe(g, A, y, noise_cov) — Bayesian update on a
noisy linear observation. Applies the Kalman gain per component and
reweights the components by their per-component marginal evidence. The
finite-mixture analogue of a Kalman update.

gmm_aggregate(g, A, noise_cov) — named alias for
gmm_affine() aimed at downscaling / aggregation pipelines.

gmm_missing(g, observed, values) — Schur-complement
conditioning routed through an integer-index API for missing-data
pipelines.

docs/design/operator_calculus_v0.3.md — pre-implementation design
note: maths, honesty constraints (no non-affine fallbacks, no
approximate closed form), public API freeze, test obligations,
performance budget, pre-release gate.

vignettes/operator_calculus.Rmd — educational vignette with
Kalman parity check, sequential vs stacked observations,
aggregation through a coarsening matrix, and a comparison to a
Gaussian-process latent.

inst/validation/operator_calculus_pinned.R — three pinned
reference pipelines (Kalman parity, sequential vs stacked,
aggregate-then-observe) with hand-coded acceptance ranges.

test-operator-calculus.R (12 tests, 46 expectations): A0–A2
(affine of moments), O0–O2 (Kalman parity, vanishing-evidence
guard, Bayes consistency), G0 (aggregate alias), M0 (missing vs
conditionalise), C0 (composition with marginalise), plus full
input-validation coverage.

R/operator_calculus.R consolidates the four operators with
shared validation helpers (.validate_A, .validate_b,
.validate_noise_cov) and a single numerical-hygiene policy
(ridge after each output covariance, symmetrisation, chol-based
inverse with retry on near-singular matrices).

gmm_observe() issues a proxymix_observe_no_update warning when
the marginal evidence is numerically zero at every component and
returns the prior unchanged with a metadata flag.

First Tier-2 graduation. Two methodological extensions that compose cleanly with the regime-(iii) wedge:
from_kde() (Tier-2 graduation). Compiles a kernel density
estimate over an n by p sample matrix into a closed-form
Gaussian-mixture proxy via regime-(iii) KLD-EM. Supports scalar and
diagonal bandwidths ("silverman", "scott", numeric scalar, or
per-coordinate numeric vector). Dimensional guard: p <= 5
recommended, p <= 10 allowed with warning, p > 10 rejected. The
KDE-target is normalised = TRUE by construction, so downstream
KLD and Hellinger diagnostics report absolute values. Companion
vignette: vignettes/from_kde.Rmd.

gmm_target_from_posterior() (Contract A constructor).
S3 generic that compiles an (unnormalised) Bayesian posterior into a
gmm_target. The function method accepts a bare vectorised callable
with required parameter_names; the default method points users at
either a registered Bayesian-package method (flexyBayes, brms,
Stan, ...) or the function-based path. Vectorisation contract is
enforced at construction by a probe call.

DESCRIPTION now ships the canonical GitHub
namespace at github.com/max578/proxymix.

test-from-kde.R (8 tests, 25 expectations): end-to-end recovery,
bandwidth selection branches, dimensional guards, normalisation
declaration, default proposal sanity, metadata pass-through.

test-from-posterior.R (7 tests, 21 expectations): vectorisation
contract enforcement, log-normalizer pass-through, default-method
hinting, name validation, round-trip through fit_proxymix(regime = "kld"), attribute-based parameter-name support.

inst/validation/from_kde_pinned_fits.R: pinned validation across
three reference KDE -> GMM pipelines (bimodal, banana, mixture)
with MC-SE-aware acceptance ranges.

vignettes/from_kde.Rmd: educational walk-through covering scope,
bandwidth sensitivity, recovery on a known mixture, and the
contrast between KDE and proxy log-densities.

gmm_target_from_posterior registers an S3 generic, paving the way
for flexyBayes::gmm_target_from_posterior.flexybayes (and analogous
methods for brms, Stan, pymc-via-reticulate) without coupling
proxymix to any specific Bayesian backend.

Audit-driven scientific hardening pass. No new Tier-2 bodies; the wedge is made harder to misuse.
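Several diagnostics listed for this release (ess_relative, max_weight, support_fraction) are plain functions of the self-normalised importance weights. The base-R sketch below shows one way such numbers could be computed from log-densities evaluated at the importance-sample draws. The helper name is_weight_summary and its arguments are hypothetical. Using the Kish formula for ESS and dropping non-finite draws before normalising are assumptions, not a description of what fit_kld_em() does internally.

```r
# Hypothetical helper; names and conventions are illustrative, not the package API.
is_weight_summary <- function(log_target, log_proposal) {
  stopifnot(length(log_target) == length(log_proposal))
  is_size <- length(log_target)
  # A draw counts as "in support" when both log-densities are finite.
  finite <- is.finite(log_target) & is.finite(log_proposal)
  log_w <- log_target[finite] - log_proposal[finite]
  # Subtract the maximum before exponentiating to avoid overflow.
  w <- exp(log_w - max(log_w))
  w <- w / sum(w)                        # self-normalised weights
  ess <- 1 / sum(w^2)                    # Kish effective sample size (assumed)
  list(
    ess = ess,
    ess_relative = ess / is_size,
    max_weight = max(w),
    support_fraction = mean(finite)
  )
}

# Example: N(1, 1) target evaluated under a wider N(0, 2^2) proposal.
set.seed(1)
x <- rnorm(2000, mean = 0, sd = 2)
summary_ok <- is_weight_summary(
  log_target = dnorm(x, mean = 1, sd = 1, log = TRUE),
  log_proposal = dnorm(x, mean = 0, sd = 2, log = TRUE)
)
```

A low ess_relative together with a max_weight near 1 means a handful of draws dominate the estimate. A support_fraction well below 1 suggests the proposal does not cover the target's mass.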
gmm_target gains two new properties,
normalised (logical or NA) and log_normalizer (numeric or NA),
so that an unnormalised log_density can be supplied without making
downstream KLD or Hellinger diagnostics misleading. All three built-in
targets (banana_target(), donut_target(), mixture_target())
declare normalised = TRUE; the unnormalised case is now explicitly
documented at the target level.

gmm_canonicalise() function
reorders the components of a gmm (or gmm_fit) by descending
weight, then by descending ||mu|| as a tiebreaker. fit_proxymix()
and the regime-specific fitters now canonicalise their outputs by
default (canonicalise = TRUE), making prints, snapshot tests, and
cross-run comparisons reproducible. Set canonicalise = FALSE to
retain the raw EM-order parameters.

fit_kld_em() (and
therefore fit_proxymix(regime = "kld")) accepts validation_size
and validation_proposal. When validation_size > 0, a second
independent IS sample is drawn and the fit's diagnostics list
records validation_kld, validation_ess, and
validation_max_weight. This lets users tell the difference between
in-sample overfit and a fit that generalises across IS draws.

fit_kld_em() now records
ess_relative (ESS / is_size), max_weight (largest
self-normalised weight), support_fraction (fraction of IS draws
with finite log-density under target and proposal), and a
Monte-Carlo standard error for the final KLD estimate
(mc_se_kld). A new ess_summary() helper returns the headline
numbers as a small list.

kld_is_shifted and kld_shift_explanation are set whenever the target is
unnormalised or its normalisation is unknown, so users do not silently
read a shifted MC integral as an absolute divergence.

hellinger_mc() now warns when the target is
not declared normalised = TRUE — the squared Hellinger distance is
not meaningful against an unnormalised target.

fit_kld_em() issues a cli warning
when more than 5% of importance-sample draws fall outside the
proposal's support or carry non-finite weights. The most common
trigger is an is_uniform() proposal whose box does not cover the
target's mass.

inst/validation/regime_iii_pinned_fits.R — a runnable validation
script that fits the three built-in targets with pinned seeds and
records final KLD, ESS, max weight, validation KLD, and runtime;
intended as the seed of a growing inst/validation/ corpus per the
audit's recommendation.

test-canonicalise.R, test-normalisation.R,
test-validation-split.R, test-support-warning.R, and
test-monotone-objective.R. The last asserts monotonicity of the
fixed IS-weighted objective (\sum_n W_n \log g_\theta(x_n)) under
exact KLD-EM updates, which is a tighter check than the previous
generic "trace decreases" test.

critical_review_20260514.md — itemised response to the audit.

plan/proxymix_plan_v0.2_methodological.md — forward methodological
plan: v0.2 (from_kde() graduation guard-railed), v0.3
(affine-Gaussian operator calculus), and the audit-mandated five-phase
protocol for the collider / DAG research branch.

gmm_canonicalise() is the single source of truth for component
ordering — used by all three fitters and the dispatcher.

Initial development release. Local-only; not yet on CRAN.
fit_proxymix() top-level dispatcher with three fitting regimes:
"moment" (closed-form moment matching), "sample" (classical EM on
i.i.d. samples), and "kld" (importance-sampled KLD-EM against an
evaluable-only target density). The "auto" regime picks the cheapest
applicable regime from the structure of the supplied gmm_target.

gmm_target, gmm_fit, is_proposal, with
print() / format() methods and validators.

gmm_ops.R: dgmm(), rgmm(),
gmm_marginalise(), gmm_conditionalise() (Schur complement),
gmm_kld() (Monte Carlo estimator with variational upper / lower
bounds for sanity).

proposals.R: is_uniform(),
is_mvn(), is_mvt(); all wrap an is_proposal instance.

kld_trace(), ess_trace(), hellinger_mc(),
bic_aic().

init.R,
plus init_random(), init_kmeans(), init_moment_seed().

banana_target(),
donut_target(), mixture_target(), plus the from-samples and
from-function constructors gmm_target_from_samples() and
gmm_target().

quickstart, three_regimes, density_shapes
(the wedge demonstration), and roadmap (Tier-2 stubs).

The following functions ship with stable signatures, full documentation,
and signature-stability tests; the body raises a "not yet implemented"
condition with a pointer to vignettes/roadmap.Rmd.
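As an illustration of this stub pattern, the base-R sketch below shows a function whose signature is fixed while its body only signals a classed error. The helper stop_not_implemented, the condition class not_yet_implemented, and the stub from_simulator_stub are all hypothetical names. The package's actual condition class and signatures are not given in this changelog.

```r
# Hypothetical helper: signal a classed "not yet implemented" error.
stop_not_implemented <- function(fn_name, pointer = "vignettes/roadmap.Rmd") {
  cond <- structure(
    list(
      message = sprintf("%s() is not yet implemented; see %s.", fn_name, pointer),
      call = sys.call(-1)
    ),
    class = c("not_yet_implemented", "error", "condition")
  )
  stop(cond)
}

# A stub with a frozen signature; the body only raises the condition.
from_simulator_stub <- function(simulator, ...) {
  stop_not_implemented("from_simulator_stub")
}

# Callers can catch the specific class instead of matching on message text.
result <- tryCatch(
  from_simulator_stub(function(theta) theta),
  not_yet_implemented = function(e) conditionMessage(e)
)

# A signature-stability check can compare the formals against a frozen list.
signature_ok <- identical(
  names(formals(from_simulator_stub)),
  c("simulator", "...")
)
```

Because the error carries its own class, downstream code and tests can tell "not yet implemented" apart from ordinary failures.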
from_kde() — KDE to GMM proxy via KLD-EM.

from_aggregate_likelihood() — aggregate-likelihood downscaling
(Sejdinovic et al. kernel-downsizing framework).

fit_kld_em_collider() — KLD-EM under DAG-implied conditional
independence constraints.

to_apsim_scenarios() — Gaussian-mixture samples to APSIM scenario
tables.

from_simulator() — wrap an expensive simulator as a
gmm_target via kernel-density or empirical-likelihood
bridges.

Adaptive importance sampling, variational boosting, normalising-flow proposals, Stan / INLA inter-operation.