Changes in version 0.3.0

New features

  - New: autoplot() method for gmm_fit. ggplot2::autoplot(fit) renders a
    fitted proxy as a marginal density curve in one dimension, or as a
    viridis density raster with per-component ellipses in two. Fits of
    any ambient dimension are supported: the one or two coordinates
    selected by dims are first reduced through the package's own
    closed-form gmm_marginalise() (e.g. autoplot(fit, dims = c(1L, 3L))).
    ggplot2 stays an optional dependency: the method is registered only
    when ggplot2 is installed, so R CMD check stays clean when ggplot2 is
    absent.
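Marginalising a Gaussian mixture onto a subset of coordinates is closed-form: the weights are kept, and each component contributes the matching sub-vector of its mean and sub-block of its covariance. A minimal base-R sketch, assuming a hypothetical list representation (w, mu, Sigma) rather than the package's S7 gmm_fit class; marginalise_sketch() and dmix1() are illustrative names, not proxymix functions:

```r
# Sketch only: hypothetical list representation of a K-component mixture in R^d,
#   w = weights (length K), mu = K x d matrix of means, Sigma = list of K d x d covariances.
# This is not the package's S7 gmm_fit class.
marginalise_sketch <- function(g, dims) {
  list(
    w = g$w,                                                 # weights are unchanged
    mu = g$mu[, dims, drop = FALSE],                         # keep selected mean coordinates
    Sigma = lapply(g$Sigma, function(S) S[dims, dims, drop = FALSE])  # matching sub-blocks
  )
}

# One-dimensional mixture density on a grid: the curve a 1-D plot would draw.
dmix1 <- function(x, g1) {
  dens <- numeric(length(x))
  for (k in seq_along(g1$w)) {
    dens <- dens + g1$w[k] * dnorm(x, mean = g1$mu[k, 1], sd = sqrt(g1$Sigma[[k]][1, 1]))
  }
  dens
}

g <- list(
  w = c(0.3, 0.7),
  mu = rbind(c(0, 1, 2), c(3, -1, 0)),
  Sigma = list(diag(3), diag(c(2, 1, 0.5)))
)
g13 <- marginalise_sketch(g, dims = c(1L, 3L))  # 2-D marginal on coordinates 1 and 3
g1 <- marginalise_sketch(g, dims = 1L)          # 1-D marginal on coordinate 1
grid <- seq(-4, 8, length.out = 401)
curve_y <- dmix1(grid, g1)
```

Because the reduction happens before any plotting, the plotting layer only ever sees a one- or two-dimensional mixture.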

proxymix 0.3.0 (2026-05-14)

Second methodological extension. Brings a complete affine-Gaussian
operator calculus to Gaussian-mixture proxies — pushforward, Bayesian
update on a noisy linear observation, aggregation, missing-data
conditioning — each closed-form and component-wise.

User-visible changes

  - New: gmm_affine(g, A, b, noise_cov): closed-form pushforward of a
    Gaussian mixture in R^d through y = A x + b + epsilon, with A an m by
    d matrix and epsilon ~ N(0, noise_cov) independent of x. Returns the
    mixture in R^m with mu'_k = A mu_k + b and Sigma'_k = A Sigma_k A' +
    noise_cov; the weights are unchanged.
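  - New: gmm_observe(g, A, y, noise_cov): Bayesian update of a mixture
    on a noisy linear observation y = A x + epsilon, epsilon ~ N(0,
    noise_cov). Each component receives its own Kalman-gain update, and
    the weights are rescaled by each component's marginal evidence
    N(y; A mu_k, A Sigma_k A' + noise_cov) and renormalised. This is the
    finite-mixture analogue of a Kalman update; with a single component
    it reduces to the standard Kalman update.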
  - New: gmm_aggregate(g, A, noise_cov): a named alias for gmm_affine()
    aimed at downscaling and aggregation pipelines.
  - New: gmm_missing(g, observed, values): Schur-complement conditioning
    of the mixture on x[observed] = values, with observed given as
    integer indices, for missing-data pipelines.
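The pushforward and the observation update both reduce to per-component Gaussian algebra. A minimal base-R sketch, assuming the same hypothetical (w, mu, Sigma) list representation; affine_sketch(), observe_sketch() and log_dmvn() are illustrative names, not the package's implementation. The posterior weight of component k is proportional to w_k N(y; A mu_k, A Sigma_k A' + noise_cov), computed in log space for stability:

```r
# Sketch only: hypothetical list representation of a K-component mixture in R^d,
#   w = weights (length K), mu = K x d matrix of means, Sigma = list of K d x d covariances.

# Pushforward through y = A x + b + epsilon, epsilon ~ N(0, noise_cov) independent of x.
affine_sketch <- function(g, A, b, noise_cov) {
  K <- length(g$w)
  list(
    w = g$w,                                                   # weights unchanged
    mu = g$mu %*% t(A) + matrix(b, K, nrow(A), byrow = TRUE),  # mu'_k = A mu_k + b
    Sigma = lapply(g$Sigma, function(S) A %*% S %*% t(A) + noise_cov)
  )
}

# Log density of a multivariate normal via a Cholesky factor.
log_dmvn <- function(y, mean, S) {
  R <- chol(S)                                   # S = t(R) %*% R
  z <- backsolve(R, y - mean, transpose = TRUE)  # solves t(R) z = y - mean
  -0.5 * sum(z^2) - sum(log(diag(R))) - 0.5 * length(y) * log(2 * pi)
}

# Bayesian update on y = A x + epsilon: per-component Kalman step, then reweighting
# by each component's marginal evidence N(y; A mu_k, A Sigma_k A' + noise_cov).
observe_sketch <- function(g, A, y, noise_cov) {
  K <- length(g$w)
  mu_post <- g$mu
  Sigma_post <- g$Sigma
  log_w <- numeric(K)
  for (k in seq_len(K)) {
    P <- g$Sigma[[k]]
    m_y <- drop(A %*% g$mu[k, ])                  # predicted observation mean
    S_y <- A %*% P %*% t(A) + noise_cov           # innovation covariance
    gain <- P %*% t(A) %*% solve(S_y)             # Kalman gain
    mu_post[k, ] <- g$mu[k, ] + drop(gain %*% (y - m_y))
    P_post <- P - gain %*% A %*% P
    Sigma_post[[k]] <- (P_post + t(P_post)) / 2   # symmetrise
    log_w[k] <- log(g$w[k]) + log_dmvn(y, m_y, S_y)
  }
  w_post <- exp(log_w - max(log_w))               # stabilise before normalising
  list(w = w_post / sum(w_post), mu = mu_post, Sigma = Sigma_post)
}

# Single component: prior N(0, 1), y = 2, unit noise gives posterior N(1, 0.5).
g0 <- list(w = 1, mu = matrix(0, 1, 1), Sigma = list(matrix(1)))
post0 <- observe_sketch(g0, A = matrix(1), y = 2, noise_cov = matrix(1))

# Two components: the observation shifts almost all weight to the nearby component.
g2 <- list(w = c(0.5, 0.5), mu = matrix(c(-3, 3), 2, 1), Sigma = list(matrix(1), matrix(1)))
post2 <- observe_sketch(g2, A = matrix(1), y = 3, noise_cov = matrix(1))

# Pushforward of a 2-D mixture through the sum of its coordinates plus an offset.
ga <- list(w = c(0.4, 0.6), mu = rbind(c(1, 2), c(0, 0)), Sigma = list(diag(2), diag(2)))
out <- affine_sketch(ga, A = matrix(c(1, 1), 1, 2), b = 1, noise_cov = matrix(0.5))
```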

Design and validation

  - docs/design/operator_calculus_v0.3.md — pre-implementation design
    note: maths, honesty constraints (no non-affine fallbacks, no
    approximate closed form), public API freeze, test obligations,
    performance budget, pre-release gate.
  - vignettes/operator_calculus.Rmd — educational vignette with Kalman
    parity check, sequential vs stacked observations, aggregation
    through a coarsening matrix, and a comparison to a Gaussian-process
    latent.
  - inst/validation/operator_calculus_pinned.R — three pinned reference
    pipelines (Kalman parity, sequential vs stacked,
    aggregate-then-observe) with hand-coded acceptance ranges.
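For a single Gaussian component (K = 1), the sequential-vs-stacked check rests on a standard identity: with independent observation noises, conditioning on y1 and then on y2 gives the same posterior as one update on the stacked observation with block-diagonal noise. A minimal base-R sketch of that identity, independent of the package's pinned script; kalman_update() is an illustrative helper and the matrices are arbitrary example values:

```r
# Textbook Kalman update for N(m, P) observed as y = H x + e, e ~ N(0, R).
kalman_update <- function(m, P, H, y, R) {
  S <- H %*% P %*% t(H) + R        # innovation covariance
  G <- P %*% t(H) %*% solve(S)     # Kalman gain
  list(m = drop(m + G %*% (y - H %*% m)), P = P - G %*% H %*% P)
}

# Arbitrary example values (not taken from the package's pinned script).
m0 <- c(0, 0)
P0 <- matrix(c(2, 0.5, 0.5, 1), 2)
H1 <- matrix(c(1, 0), 1); y1 <- 1.0;  R1 <- matrix(0.3)
H2 <- matrix(c(1, 1), 1); y2 <- -0.5; R2 <- matrix(0.2)

# Sequential: condition on y1, then on y2.
step1 <- kalman_update(m0, P0, H1, y1, R1)
sequential <- kalman_update(step1$m, step1$P, H2, y2, R2)

# Stacked: one update on (y1, y2) with block-diagonal noise.
stacked <- kalman_update(m0, P0, rbind(H1, H2), c(y1, y2), diag(c(0.3, 0.2)))

all.equal(sequential$m, stacked$m)
all.equal(sequential$P, stacked$P)
```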

Tests

  - test-operator-calculus.R (12 tests, 46 expectations): A0–A2 (affine
    pushforward of moments), O0–O2 (Kalman parity, vanishing-evidence
    guard, Bayes consistency), G0 (aggregate alias), M0 (missing vs
    conditionalise), C0 (composition with marginalise), plus full
    input-validation coverage.

Internal

  - R/operator_calculus.R consolidates the four operators with shared
    validation helpers (.validate_A, .validate_b, .validate_noise_cov)
    and a single numerical-hygiene policy: every output covariance is
    symmetrised and given a small ridge, and inverses are computed via
    Cholesky factorisation, with a retry when the matrix is
    near-singular.
  - gmm_observe() issues a proxymix_observe_no_update warning when the
    marginal evidence is numerically zero at every component and returns
    the prior unchanged with a metadata flag.
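The hygiene policy (symmetrise, add a ridge, Cholesky-based inverse with retry) can be sketched in base R. The starting ridge, the growth factor of 100 and the number of retries below are illustrative assumptions, not the package's actual constants, and safe_chol_inverse() is a hypothetical name:

```r
# Hypothetical helper: symmetrise, add a ridge, invert via Cholesky, and retry with a
# larger ridge when the factorisation fails. Constants are illustrative assumptions.
safe_chol_inverse <- function(S, ridge = 1e-10, growth = 100, max_tries = 6L) {
  S <- (S + t(S)) / 2                                 # remove rounding asymmetry
  for (i in seq_len(max_tries)) {
    R <- tryCatch(chol(S + diag(ridge, nrow(S))), error = function(e) NULL)
    if (!is.null(R)) return(chol2inv(R))              # inverse from the Cholesky factor
    ridge <- ridge * growth                           # retry with a bigger ridge
  }
  stop("matrix is not positive definite after ", max_tries, " ridge retries")
}

S <- matrix(c(2, 1, 1, 2), 2)
S_inv <- safe_chol_inverse(S)
```

The retry only rescues matrices that are positive semi-definite up to rounding; a genuinely indefinite matrix still fails and raises an error rather than returning a misleading inverse.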

proxymix 0.2.0 (2026-05-14)

First Tier-2 graduation. Two methodological extensions that compose
cleanly with the regime-(iii) wedge:

User-visible changes

  - New: from_kde() (Tier-2 graduation). Compiles a kernel density
    estimate over an n-by-p sample matrix into a closed-form
    Gaussian-mixture proxy via regime-(iii) KLD-EM. Supports scalar and
    diagonal bandwidths ("silverman", "scott", a numeric scalar, or a
    per-coordinate numeric vector). Dimensional guard: p <= 5 is
    recommended, 5 < p <= 10 is allowed with a warning, and p > 10 is
    rejected. The KDE target is declared normalised = TRUE by
    construction, so downstream KLD and Hellinger diagnostics report
    absolute values. Companion vignette: vignettes/from_kde.Rmd.
  - New: gmm_target_from_posterior() (Contract A constructor). An S3
    generic that compiles an (unnormalised) Bayesian posterior into a
    gmm_target. The function method accepts a bare vectorised callable
    together with a required parameter_names argument; the default
    method points users at either a registered Bayesian-package method
    (flexyBayes, brms, Stan, ...) or the function-based path. The
    vectorisation contract is enforced at construction time by a probe
    call.
  - URL and BugReports. DESCRIPTION now declares URL and BugReports
    fields pointing at the canonical GitHub repository,
    github.com/max578/proxymix.
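A probe call of this kind can be pictured as evaluating the supplied callable once on a small matrix and checking that it returns one value per row. A minimal sketch, assuming the callable takes an n x p matrix whose columns follow parameter_names; from_posterior_sketch() and the class name posterior_target_sketch are hypothetical stand-ins, not the package's function method:

```r
# Hypothetical constructor illustrating a probe-call check of the vectorisation contract:
# log_density must map an n x p matrix (columns named by parameter_names) to n values.
from_posterior_sketch <- function(log_density, parameter_names) {
  stopifnot(is.function(log_density))
  if (!is.character(parameter_names) || length(parameter_names) == 0L) {
    stop("parameter_names must be a non-empty character vector")
  }
  p <- length(parameter_names)
  # Three distinct probe rows; a non-vectorised function typically returns one value.
  probe <- matrix(rep(c(-0.1, 0, 0.1), times = p), nrow = 3L,
                  dimnames = list(NULL, parameter_names))
  out <- log_density(probe)
  if (!is.numeric(out) || length(out) != nrow(probe)) {
    stop("log_density is not vectorised: a 3-row probe must return 3 values")
  }
  structure(list(log_density = log_density, parameter_names = parameter_names),
            class = "posterior_target_sketch")
}

# A vectorised callable passes; a scalar-returning one is rejected at construction.
ok <- from_posterior_sketch(function(x) -0.5 * rowSums(x^2), c("alpha", "beta"))
bad <- tryCatch(
  from_posterior_sketch(function(x) -0.5 * sum(x^2), c("alpha", "beta")),
  error = function(e) "rejected"
)
```

Failing at construction keeps the error next to the user's own code, rather than surfacing later inside an importance-sampling loop.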

Tests

  - test-from-kde.R (8 tests, 25 expectations): end-to-end recovery,
    bandwidth selection branches, dimensional guards, normalisation
    declaration, default proposal sanity, metadata pass-through.
  - test-from-posterior.R (7 tests, 21 expectations): vectorisation
    contract enforcement, log-normalizer pass-through, default-method
    hinting, name validation, round-trip through fit_proxymix(regime =
    "kld"), attribute-based parameter-name support.
  - inst/validation/from_kde_pinned_fits.R: pinned validation across
    three reference KDE -> GMM pipelines (bimodal, banana, mixture) with
    MC-SE-aware acceptance ranges.

Documentation

  - vignettes/from_kde.Rmd: educational walk-through covering scope,
    bandwidth sensitivity, recovery on a known mixture, and the contrast
    between KDE and proxy log-densities.

Internal

  - gmm_target_from_posterior() is an S3 generic, paving the way for
    flexyBayes::gmm_target_from_posterior.flexybayes (and analogous
    methods for brms, Stan, pymc-via-reticulate) without coupling
    proxymix to any specific Bayesian backend.
  - from_kde() log-density evaluation builds its matrices in chunks, so
    peak memory stays bounded for large IS samples.
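A Gaussian KDE with a diagonal bandwidth is itself an equal-weight n-component Gaussian mixture, and its log-density can be evaluated in query chunks so that only a chunk_size x n distance matrix is held at once. A minimal base-R sketch; the normal-reference rules used for "silverman" (h_j = sigma_j (4 / ((p + 2) n))^(1/(p+4))) and "scott" (h_j = sigma_j n^(-1/(p+4))) are assumptions about what the package computes, and both function names are hypothetical:

```r
# Normal-reference bandwidths per coordinate (assumed forms of "silverman" and "scott").
kde_bandwidth_sketch <- function(X, rule = c("silverman", "scott")) {
  rule <- match.arg(rule)
  n <- nrow(X); p <- ncol(X)
  sigma <- apply(X, 2, sd)
  scale_factor <- if (rule == "silverman") {
    (4 / ((p + 2) * n))^(1 / (p + 4))
  } else {
    n^(-1 / (p + 4))
  }
  sigma * scale_factor
}

# log of (1/n) * sum_i prod_j N(q_j; X[i, j], h_j^2): an equal-weight n-component
# Gaussian mixture. Query rows are processed in chunks, so at most a
# chunk_size x n matrix of squared distances is held in memory at once.
kde_logdens_chunked <- function(Q, X, h, chunk_size = 1000L) {
  n <- nrow(X); m <- nrow(Q)
  const <- -log(n) - sum(log(h)) - 0.5 * ncol(X) * log(2 * pi)
  Xs <- sweep(X, 2, h, "/")                         # scale coordinates by bandwidth
  x_sq <- rowSums(Xs^2)
  out <- numeric(m)
  for (start in seq(1L, m, by = chunk_size)) {
    idx <- start:min(start + chunk_size - 1L, m)
    Qs <- sweep(Q[idx, , drop = FALSE], 2, h, "/")
    D2 <- outer(rowSums(Qs^2), x_sq, "+") - 2 * Qs %*% t(Xs)  # squared scaled distances
    L <- -0.5 * pmax(D2, 0)
    row_max <- apply(L, 1, max)
    out[idx] <- const + row_max + log(rowSums(exp(L - row_max)))  # log-sum-exp per row
  }
  out
}

set.seed(1)
X <- matrix(rnorm(40), 20, 2)
Q <- matrix(rnorm(10), 5, 2)
h <- kde_bandwidth_sketch(X)
ld <- kde_logdens_chunked(Q, X, h, chunk_size = 2L)
```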

proxymix 0.1.1 (2026-05-14)

Audit-driven scientific hardening pass. No new Tier-2 bodies; the wedge
is made harder to misuse.

User-visible changes

  - Normalisation-aware targets. gmm_target gains two new properties,
    normalised (logical or NA) and log_normalizer (numeric or NA), so
    that an unnormalised log_density can be supplied without making
    downstream KLD or Hellinger diagnostics misleading. All three
    built-in targets (banana_target(), donut_target(), mixture_target())
    declare normalised = TRUE; the unnormalised case is now explicitly
    documented at the target level.
  - Canonical component ordering. A new gmm_canonicalise() function
    reorders the components of a gmm (or gmm_fit) by descending weight,
    then by descending ||mu|| as a tiebreaker. fit_proxymix() and the
    regime-specific fitters now canonicalise their outputs by default
    (canonicalise = TRUE), making prints, snapshot tests, and cross-run
    comparisons reproducible. Set canonicalise = FALSE to retain the raw
    EM-order parameters.
  - Held-out importance-sample validation. fit_kld_em() (and therefore
    fit_proxymix(regime = "kld")) accepts validation_size and
    validation_proposal. When validation_size > 0, a second independent
    IS sample is drawn and the fit's diagnostics list records
    validation_kld, validation_ess, and validation_max_weight. This lets
    users tell the difference between in-sample overfit and a fit that
    generalises across IS draws.
  - Richer IS diagnostics. fit_kld_em() now records ess_relative (ESS /
    is_size), max_weight (largest self-normalised weight),
    support_fraction (fraction of IS draws with finite log-density under
    target and proposal), and a Monte-Carlo standard error for the final
    KLD estimate (mc_se_kld). A new ess_summary() helper returns the
    headline numbers as a small list.
  - Shifted-KLD labelling. Diagnostics now record kld_is_shifted and
    kld_shift_explanation whenever the target is unnormalised or its
    normalisation is unknown, so users do not silently read a shifted MC
    integral as an absolute divergence.
  - Hellinger guard. hellinger_mc() now warns when the target is not
    declared normalised = TRUE — the squared Hellinger distance is not
    meaningful against an unnormalised target.
  - Proposal-support warning. fit_kld_em() issues a cli warning when
    more than 5% of importance-sample draws fall outside the proposal's
    support or carry non-finite weights. The most common trigger is an
    is_uniform() proposal whose box does not cover the target's mass.
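Two of the items above reduce to a few lines of base R: the canonical ordering rule and the headline IS diagnostics. A minimal sketch; canonical_order() and is_diagnostics() are hypothetical helpers, and the Kish form ESS = 1 / sum(w_n^2) over self-normalised weights is assumed to be the ESS definition in use:

```r
# Canonical ordering: descending weight, ties broken by descending ||mu||.
canonical_order <- function(w, mu) {
  order(-w, -sqrt(rowSums(mu^2)))
}

# Headline IS diagnostics from log target and log proposal densities at the draws.
is_diagnostics <- function(log_target, log_proposal) {
  finite <- is.finite(log_target) & is.finite(log_proposal)
  lw <- log_target[finite] - log_proposal[finite]
  w <- exp(lw - max(lw))
  w <- w / sum(w)                              # self-normalised importance weights
  ess <- 1 / sum(w^2)                          # Kish effective sample size
  list(
    ess = ess,
    ess_relative = ess / length(log_target),   # ESS / is_size
    max_weight = max(w),
    support_fraction = mean(finite)
  )
}

ord <- canonical_order(c(0.2, 0.5, 0.3), matrix(0, 3, 2))
ord_tie <- canonical_order(c(0.5, 0.5), rbind(c(0, 0), c(3, 4)))
d <- is_diagnostics(rep(0, 100), rep(0, 100))
d2 <- is_diagnostics(c(0, 0, -Inf, 0), rep(0, 4))
```

With equal weights the ESS equals the number of draws; a single dominant weight drives it towards one, which is what ess_relative and max_weight flag.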

Validation corpus

  - inst/validation/regime_iii_pinned_fits.R — a runnable validation
    script that fits the three built-in targets with pinned seeds and
    records final KLD, ESS, max weight, validation KLD, and runtime;
    intended as the seed of a growing inst/validation/ corpus per the
    audit's recommendation.

Tests

  - New: test-canonicalise.R, test-normalisation.R,
    test-validation-split.R, test-support-warning.R, and
    test-monotone-objective.R. The last asserts that the IS-weighted
    objective \sum_n W_n \log g_\theta(x_n), with the weights W_n held
    fixed, is non-decreasing under exact KLD-EM updates, a tighter check
    than the previous generic "trace decreases" test.
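The monotonicity property follows from EM theory, assuming an exact KLD-EM step is the weighted EM update for this objective: with the weights W_n held fixed, each step maximises a weighted lower bound, so the weighted log-likelihood cannot decrease. A minimal 1-D base-R sketch of that check, using random stand-in weights in place of real importance weights; weighted_em_step() and weighted_objective() are hypothetical names:

```r
# Weighted EM step for a 1-D Gaussian mixture with fixed importance weights W.
weighted_em_step <- function(x, W, pi_k, mu, s2) {
  dens <- vapply(seq_along(pi_k), function(k) pi_k[k] * dnorm(x, mu[k], sqrt(s2[k])),
                 numeric(length(x)))
  r <- dens / rowSums(dens)              # responsibilities, n x K
  rw <- r * W                            # scale row i by its fixed weight W[i]
  Nk <- colSums(rw)
  mu_new <- colSums(rw * x) / Nk
  s2_new <- colSums(rw * outer(x, mu_new, "-")^2) / Nk
  list(pi_k = Nk / sum(Nk), mu = mu_new, s2 = s2_new)
}

# The fixed IS-weighted objective: sum_n W_n log g_theta(x_n).
weighted_objective <- function(x, W, pi_k, mu, s2) {
  dens <- vapply(seq_along(pi_k), function(k) pi_k[k] * dnorm(x, mu[k], sqrt(s2[k])),
                 numeric(length(x)))
  sum(W * log(rowSums(dens)))
}

set.seed(42)
x <- c(rnorm(150, -2), rnorm(150, 2, 0.7))
W <- runif(300)
W <- W / sum(W)                          # stand-in for fixed self-normalised IS weights
par <- list(pi_k = c(0.5, 0.5), mu = c(-0.5, 0.5), s2 = c(1, 1))
obj <- numeric(30)
for (it in seq_along(obj)) {
  obj[it] <- weighted_objective(x, W, par$pi_k, par$mu, par$s2)
  par <- weighted_em_step(x, W, par$pi_k, par$mu, par$s2)
}
```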

Documentation

  - critical_review_20260514.md — itemised response to the audit.
  - plan/proxymix_plan_v0.2_methodological.md — forward methodological
    plan: v0.2 (from_kde() graduation guard-railed), v0.3 (affine-
    Gaussian operator calculus), and the audit-mandated five-phase
    protocol for the collider / DAG research branch.

Internal

  - gmm_canonicalise() is the single source of truth for component
    ordering — used by all three fitters and the dispatcher.

proxymix 0.0.1 (2026-05-13)

Initial development release. Local-only; not yet on CRAN.

Tier 1 — implemented

  - fit_proxymix() top-level dispatcher with three fitting regimes:
    "moment" (closed-form moment matching), "sample" (classical EM on
    i.i.d. samples), and "kld" (importance-sampled KLD-EM against an
    evaluable-only target density). The "auto" regime picks the cheapest
    applicable regime from the structure of the supplied gmm_target.
  - S7 class hierarchy: gmm_target, gmm_fit, is_proposal, with print() /
    format() methods and validators.
  - Closed-form GMM operators in gmm_ops.R: dgmm(), rgmm(),
    gmm_marginalise(), gmm_conditionalise() (Schur complement), and
    gmm_kld() (Monte Carlo estimator, with variational upper and lower
    bounds as a sanity check).
  - Importance-sampling proposals in proposals.R: is_uniform(),
    is_mvn(), is_mvt(); each wraps an is_proposal instance.
  - Diagnostics: kld_trace(), ess_trace(), hellinger_mc(), bic_aic().
  - Multi-start best-of (Karlis & Xekalaki) initialisation in init.R,
    plus init_random(), init_kmeans(), init_moment_seed().
  - Built-in target factories used in the vignettes: banana_target(),
    donut_target(), mixture_target(), plus the from-samples and
    from-function constructors gmm_target_from_samples() and
    gmm_target().
  - Four vignettes: quickstart, three_regimes, density_shapes (the wedge
    demonstration), and roadmap (Tier-2 stubs).
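Conditioning a mixture on observed coordinates is closed-form per component: each conditional mean and covariance comes from the Schur complement, and each weight is rescaled by that component's marginal density at the observed values. A minimal base-R sketch, assuming the hypothetical (w, mu, Sigma) list representation; condition_sketch() and log_dmvn() are illustrative names rather than gmm_conditionalise() itself:

```r
# Log density of a multivariate normal via a Cholesky factor.
log_dmvn <- function(y, mean, S) {
  R <- chol(S)                                   # S = t(R) %*% R
  z <- backsolve(R, y - mean, transpose = TRUE)  # solves t(R) z = y - mean
  -0.5 * sum(z^2) - sum(log(diag(R))) - 0.5 * length(y) * log(2 * pi)
}

# Condition a mixture (w, mu = K x d, Sigma = list) on x[observed] = values.
condition_sketch <- function(g, observed, values) {
  K <- length(g$w)
  free <- setdiff(seq_len(ncol(g$mu)), observed)
  log_w <- numeric(K)
  mu_c <- matrix(0, K, length(free))
  Sigma_c <- vector("list", K)
  for (k in seq_len(K)) {
    S <- g$Sigma[[k]]
    S_oo <- S[observed, observed, drop = FALSE]
    S_fo <- S[free, observed, drop = FALSE]
    B <- S_fo %*% solve(S_oo)                           # regression of free on observed
    mu_c[k, ] <- g$mu[k, free] + drop(B %*% (values - g$mu[k, observed]))
    Sc <- S[free, free, drop = FALSE] - B %*% t(S_fo)   # Schur complement
    Sigma_c[[k]] <- (Sc + t(Sc)) / 2
    log_w[k] <- log(g$w[k]) + log_dmvn(values, g$mu[k, observed], S_oo)
  }
  w <- exp(log_w - max(log_w))
  list(w = w / sum(w), mu = mu_c, Sigma = Sigma_c)
}

# Bivariate normal with correlation 0.8: x1 | x2 = 1 is N(0.8, 0.36).
g1 <- list(w = 1, mu = matrix(0, 1, 2), Sigma = list(matrix(c(1, 0.8, 0.8, 1), 2)))
c1 <- condition_sketch(g1, observed = 2L, values = 1)

# Two components separated along x2: observing x2 = 2 favours the second one.
g2 <- list(w = c(0.5, 0.5), mu = rbind(c(0, -2), c(0, 2)), Sigma = list(diag(2), diag(2)))
c2 <- condition_sketch(g2, observed = 2L, values = 2)
```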

Tier 2 — provisioned stubs only

The following functions ship with stable signatures, full documentation,
and signature-stability tests; the body raises a "not yet implemented"
condition with a pointer to vignettes/roadmap.Rmd.

  - from_kde() — KDE to GMM proxy via KLD-EM.
  - from_aggregate_likelihood() — aggregate-likelihood downscaling
    (Sejdinovic et al. kernel-downsizing framework).
  - fit_kld_em_collider() — KLD-EM under DAG-implied conditional
    independence constraints.
  - to_apsim_scenarios() — Gaussian-mixture samples to APSIM scenario
    tables.
  - from_simulator() — wrap an expensive simulator as a gmm_target via
    kernel-density or empirical-likelihood bridges.

Tier 3 — deferred (not in scope)

Adaptive importance sampling, variational boosting, normalising-flow
proposals, Stan / INLA inter-operation.