| Title: | KL-Optimal Gaussian Mixture Proxies for Arbitrary Target Densities |
|---|---|
| Description: | Fits multivariate Gaussian-mixture proxies that are Kullback-Leibler optimal to user-supplied target densities on real Euclidean space. Three fitting regimes are unified under one verb: (i) closed-form moment matching for a single component, (ii) classical expectation-maximisation when independent samples are available, and (iii) importance-sampled KLD-EM when the target can be evaluated point-wise but not (cheaply) sampled. Closed-form Gaussian-mixture operators (density, sampling, marginalisation, conditioning, divergence) round out the toolkit. Implements the regime hierarchy of Hoek and Elliott (2024) <doi:10.1080/07362994.2024.2372605>. |
| Authors: | Max Moldovan [aut, cre] (ORCID: <https://orcid.org/0000-0001-9680-8474>), Johannes van der Hoek [ctb] (Foundational theory in Hoek & Elliott (2024).), Robert J. Elliott [ctb] (Foundational theory in Hoek & Elliott (2024).) |
| Maintainer: | Max Moldovan <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.3.0 |
| Built: | 2026-06-02 10:24:50 UTC |
| Source: | https://github.com/max578/proxymix |
An autoplot() method for gmm_fit objects, rendering the fitted mixture
with ggplot2. The displayed coordinates are reduced to the requested one
or two dimensions through the closed-form marginal gmm_marginalise(), so
the method works for a proxy of any ambient dimension p.
A one-dimensional request draws the marginal mixture density, optionally with the per-component densities underneath and a rug of the target's samples. A two-dimensional request draws the marginal density as a viridis raster with white contour lines, optionally overlaying each component's mean and a probability-contour ellipse.
object |
A gmm_fit, typically from |
dims |
Integer vector of length one or two giving the coordinate(s) to
display, in |
n_grid |
Integer scalar — the number of grid points per axis at which
the density is evaluated. A two-dimensional plot evaluates |
n_sd |
Numeric scalar — how many component standard deviations beyond the extreme component means the plotting window extends. |
level |
Numeric scalar in |
show_components |
Logical scalar — whether to overlay the per-component densities (one dimension) or mean-and-ellipse glyphs (two dimensions). |
show_data |
Logical scalar — whether to overlay the target's samples, when the fitted target carries any. |
... |
Currently ignored, present for generic compatibility. |
The method is registered against the ggplot2::autoplot() generic only
when ggplot2 is installed; call it as ggplot2::autoplot(fit) or load
ggplot2 first. It returns the ggplot object, so the usual + layering
applies for further customisation.
A ggplot object.
Other classes:
gmm(),
gmm_dim(),
gmm_fit(),
gmm_n_components(),
gmm_target(),
is_proposal()
samples <- matrix(stats::rnorm(200), ncol = 2) tgt <- gmm_target_from_samples(samples) fit <- fit_proxymix(tgt, N = 2L, regime = "sample", max_iter = 25L) ggplot2::autoplot(fit) ggplot2::autoplot(fit, dims = 1L)samples <- matrix(stats::rnorm(200), ncol = 2) tgt <- gmm_target_from_samples(samples) fit <- fit_proxymix(tgt, N = 2L, regime = "sample", max_iter = 25L) ggplot2::autoplot(fit) ggplot2::autoplot(fit, dims = 1L)
A 2-D "banana" density obtained by warping an isotropic Gaussian
through the map .
The map has unit Jacobian, so the resulting density is exactly normalised.
banana_target(with_samples = FALSE, n = 2000L, seed = 1L)banana_target(with_samples = FALSE, n = 2000L, seed = 1L)
with_samples |
If |
n |
Number of samples to attach when |
seed |
Optional integer seed used when drawing the samples. |
A gmm_target in dimension 2.
Other targets:
donut_target(),
gmm_target_from_samples(),
mixture_target()
b <- banana_target() b b@log_density(matrix(c(0, 0, 1, 0), ncol = 2, byrow = TRUE))b <- banana_target() b b@log_density(matrix(c(0, 0, 1, 0), ncol = 2, byrow = TRUE))
Returns the BIC and AIC of a regime-(ii) fit. Both criteria are computed
against the empirical log-likelihood of the samples used to fit the
model. They are NA for regimes that do not have an empirical likelihood
("moment", "kld").
bic_aic(fit)bic_aic(fit)
fit |
A gmm_fit. |
A list with bic, aic, and n_params.
Other diagnostics:
ess_summary(),
ess_trace(),
hellinger_mc(),
kld_trace()
x <- matrix(stats::rnorm(200), ncol = 2) tgt <- gmm_target_from_samples(x) fit <- fit_proxymix(tgt, N = 2L, regime = "sample", max_iter = 25L) bic_aic(fit)x <- matrix(stats::rnorm(200), ncol = 2) tgt <- gmm_target_from_samples(x) fit <- fit_proxymix(tgt, N = 2L, regime = "sample", max_iter = 25L) bic_aic(fit)
Evaluates the density (or log-density) of a Gaussian mixture at one or more points.
dgmm(x, g, log = FALSE)dgmm(x, g, log = FALSE)
x |
A numeric matrix with one observation per row, or a length- |
g |
|
log |
Logical. If |
A numeric vector of length nrow(x).
Other ops:
gmm_canonicalise(),
gmm_conditionalise(),
gmm_kld(),
gmm_marginalise(),
rgmm()
g <- gmm(weights = c(0.5, 0.5), means = list(c(-1, 0), c(1, 0)), covariances = list(diag(2), diag(2))) dgmm(c(0, 0), g) dgmm(c(0, 0), g, log = TRUE)g <- gmm(weights = c(0.5, 0.5), means = list(c(-1, 0), c(1, 0)), covariances = list(diag(2), diag(2))) dgmm(c(0, 0), g) dgmm(c(0, 0), g, log = TRUE)
A rotationally symmetric annulus on , with density
Numerical integration in polar coordinates fixes the normaliser; the
returned target exposes a normalised log_density.
donut_target(r0 = 2.5, sigma = 0.5, with_samples = FALSE, n = 2000L, seed = 1L)donut_target(r0 = 2.5, sigma = 0.5, with_samples = FALSE, n = 2000L, seed = 1L)
r0 |
Centre radius of the annulus. |
sigma |
Annulus width. |
with_samples |
If |
n |
Number of samples to attach when |
seed |
Optional integer seed used when drawing the samples. |
A gmm_target in dimension 2.
Other targets:
banana_target(),
gmm_target_from_samples(),
mixture_target()
d <- donut_target() dd <- donut_target() d
Convenience accessor returning the headline IS-quality numbers for a
regime-(iii) fit: effective sample size and its ratio to is_size,
the largest self-normalised weight, the support fraction (proportion
of draws that received a finite weight), and the Monte-Carlo standard
error of the final KLD estimate. Returns NA fields for regimes that
do not use importance sampling.
ess_summary(fit)ess_summary(fit)
fit |
A gmm_fit. |
Validation-side numbers (validation_*) are populated only when the
fit was called with validation_size > 0.
A list of numeric scalars (or NAs where not applicable).
Other diagnostics:
bic_aic(),
ess_trace(),
hellinger_mc(),
kld_trace()
fit <- fit_proxymix(banana_target(), N = 3L, regime = "kld", is_size = 1500L, max_iter = 20L, seed = 1L, validation_size = 1500L) ess_summary(fit)fit <- fit_proxymix(banana_target(), N = 3L, regime = "kld", is_size = 1500L, max_iter = 20L, seed = 1L, validation_size = 1500L) ess_summary(fit)
Returns the effective sample size (1 / sum(W^2)) of the
self-normalised importance weights used by a regime-(iii) fit. NA for
regimes that do not use importance sampling.
ess_trace(fit)ess_trace(fit)
fit |
A gmm_fit. |
Numeric scalar (or NA_real_).
Other diagnostics:
bic_aic(),
ess_summary(),
hellinger_mc(),
kld_trace()
fit <- fit_proxymix(banana_target(), N = 2L, regime = "kld", is_size = 1000L, max_iter = 15L, seed = 1L) ess_trace(fit)fit <- fit_proxymix(banana_target(), N = 2L, regime = "kld", is_size = 1000L, max_iter = 15L, seed = 1L) ess_trace(fit)
Implements regime (ii) of Hoek and Elliott (2024). Runs the textbook expectation-maximisation algorithm for Gaussian mixtures on the supplied samples, with diagonal ridge regularisation for numerical stability, optional multi-start, and monotone-log-likelihood checking.
fit_em_samples( target, N = 2L, init = NULL, max_iter = 100L, tol = 1e-06, ridge_eps = 1e-06, n_starts = 5L, canonicalise = TRUE )fit_em_samples( target, N = 2L, init = NULL, max_iter = 100L, tol = 1e-06, ridge_eps = 1e-06, n_starts = 5L, canonicalise = TRUE )
target |
A gmm_target carrying an |
N |
Number of mixture components. |
init |
A gmm initialisation, or |
max_iter |
Maximum number of EM iterations. |
tol |
Relative-log-likelihood convergence tolerance. |
ridge_eps |
Ridge added to each component covariance at every M-step. |
n_starts |
Number of multi-start initialisations (only when |
canonicalise |
Logical. If |
A gmm_fit with regime = "sample".
Other fitting:
fit_kld_em(),
fit_moment_match(),
from_kde()
x <- matrix(stats::rnorm(200), ncol = 2) tgt <- gmm_target_from_samples(x) fit <- fit_em_samples(tgt, N = 2L, max_iter = 30L, n_starts = 2L) fit@diagnostics$loglik_finalx <- matrix(stats::rnorm(200), ncol = 2) tgt <- gmm_target_from_samples(x) fit <- fit_em_samples(tgt, N = 2L, max_iter = 30L, n_starts = 2L) fit@diagnostics$loglik_final
Implements regime (iii) of Hoek and Elliott (2024). Minimises
KL(f || g_theta) where f is supplied as an evaluable log-density on
the target, via expectation-maximisation against importance-sampled
draws from a user-chosen proposal q.
fit_kld_em( target, N = 3L, proposal = NULL, is_size = 5000L, init = NULL, max_iter = 100L, tol = 1e-05, ridge_eps = 1e-06, min_ess = 50, seed = NULL, validation_size = 0L, validation_proposal = NULL, validation_seed = NULL, support_warn = TRUE, canonicalise = TRUE )fit_kld_em( target, N = 3L, proposal = NULL, is_size = 5000L, init = NULL, max_iter = 100L, tol = 1e-05, ridge_eps = 1e-06, min_ess = 50, seed = NULL, validation_size = 0L, validation_proposal = NULL, validation_seed = NULL, support_warn = TRUE, canonicalise = TRUE )
target |
A gmm_target with a non-NULL |
N |
Number of mixture components. |
proposal |
An is_proposal. Defaults to a multivariate-t with
|
is_size |
Number of importance-sampling draws used for fitting. |
init |
A gmm initialisation, or |
max_iter |
Maximum number of EM iterations. |
tol |
Convergence tolerance on the relative change in the importance-sampled KLD estimate. |
ridge_eps |
Ridge added to each component covariance at every M-step. |
min_ess |
Minimum effective sample size below which a warning is issued. |
seed |
Optional integer seed used when drawing the fitting IS sample. |
validation_size |
Number of independent importance-sampling draws
to use for held-out validation. Default |
validation_proposal |
Optional is_proposal for the validation sample. Defaults to the same proposal used for fitting. |
validation_seed |
Optional integer seed used when drawing the
validation sample. Defaults to |
support_warn |
Logical. If |
canonicalise |
Logical. If |
The Monte Carlo draws from q are computed once at the start; the
resulting self-normalised importance-sampling weights are reused at every
EM iteration. Adaptive importance sampling — redrawing each round — is
a Tier-3 deferral.
Since v0.1.1 the function also draws an independent validation IS
sample when validation_size > 0 and reports its own KLD estimate,
effective sample size, and largest weight share. This lets users tell
the difference between in-sample EM overfit to one particular IS draw
and a fit that generalises across independent IS draws.
When the target's normalised property is FALSE or NA, the
importance-sampled kld_final and kld_trace measure
rather than the absolute
divergence. The fit's diagnostics list records this via
kld_is_shifted = TRUE and a kld_shift_explanation string. When the
target also supplies a finite log_normalizer, a corrected absolute
estimate is reported as kld_final_absolute.
A gmm_fit with regime = "kld". The diagnostics list
contains, among others, kld_trace, kld_final,
kld_is_shifted, kld_final_absolute (when computable), ess,
ess_relative (ess / is_size), max_weight, support_fraction,
mc_se_kld, validation_kld, validation_ess, and
validation_max_weight.
Other fitting:
fit_em_samples(),
fit_moment_match(),
from_kde()
tgt <- banana_target() q <- is_mvt(n_dim = 2L, mean = c(0, 0), sigma = 4 * diag(2), df = 5) fit <- fit_kld_em(tgt, N = 3L, proposal = q, is_size = 1500L, max_iter = 25L, seed = 1L, validation_size = 1500L) fit@diagnostics$kld_final fit@diagnostics$validation_kldtgt <- banana_target() q <- is_mvt(n_dim = 2L, mean = c(0, 0), sigma = 4 * diag(2), df = 5) fit <- fit_kld_em(tgt, N = 3L, proposal = q, is_size = 1500L, max_iter = 25L, seed = 1L, validation_size = 1500L) fit@diagnostics$kld_final fit@diagnostics$validation_kld
Tier-2 stub — signature stable; body not yet implemented.
fit_kld_em_collider(target, dag, N = 3L, ...)fit_kld_em_collider(target, dag, N = 3L, ...)
target |
A gmm_target with a non-NULL |
dag |
A description of the DAG structure (planned: an adjacency
matrix or a |
N |
Number of mixture components. |
... |
Future configuration arguments. |
Plan: a regime-(iii) KLD-EM that projects each iteration onto the manifold of joint densities respecting a DAG-implied set of conditional-independence constraints. Targets the collider-regularised regression direction described by Sejdinovic et al.
A gmm_fit (when implemented).
Other stubs:
from_aggregate_likelihood(),
from_simulator(),
to_apsim_scenarios()
try(fit_kld_em_collider(banana_target(), dag = matrix(0, 2, 2), N = 2L))try(fit_kld_em_collider(banana_target(), dag = matrix(0, 2, 2), N = 2L))
Implements regime (i) of Hoek and Elliott (2024). When N == 1, this is
the exact moment match: mu is the target mean and Sigma is the target
covariance. When N > 1, the function returns the deterministic
moment-seed of init_moment_seed() wrapped as a gmm_fit, without
iterative refinement — useful as a starting point for the iterative
regimes.
fit_moment_match(target, N = 1L, ridge_eps = 1e-06, canonicalise = TRUE)fit_moment_match(target, N = 1L, ridge_eps = 1e-06, canonicalise = TRUE)
target |
A gmm_target. |
N |
Number of components. |
ridge_eps |
Ridge added to the empirical covariance for numerical stability. |
canonicalise |
Logical. If |
Either the target must carry an n by p samples matrix, or its
metadata slot must contain pre-computed moments of the form
list(mean = <p-vec>, cov = <p-by-p>).
A gmm_fit with regime = "moment".
Other fitting:
fit_em_samples(),
fit_kld_em(),
from_kde()
x <- matrix(stats::rnorm(200), ncol = 2) tgt <- gmm_target_from_samples(x) fit_moment_match(tgt, N = 1L)x <- matrix(stats::rnorm(200), ncol = 2) tgt <- gmm_target_from_samples(x) fit_moment_match(tgt, N = 1L)
The unified front door of proxymix. Picks a fitting regime (or honours
an explicit choice) and dispatches to the corresponding regime-specific
fitter:
fit_proxymix( target, N = 1L, regime = c("auto", "moment", "sample", "kld"), ... )fit_proxymix( target, N = 1L, regime = c("auto", "moment", "sample", "kld"), ... )
target |
A gmm_target. |
N |
Number of components. |
regime |
One of |
... |
Additional arguments forwarded to the regime-specific fitter.
The most useful pass-throughs are |
"moment" - closed-form moment matching (fit_moment_match()).
"sample" - classical EM on i.i.d. samples (fit_em_samples()).
"kld" - importance-sampled KLD-EM (fit_kld_em()) — the wedge.
With regime = "auto" the choice is made from the shape of the supplied
target:
N == 1 and the target carries samples or moments: "moment".
N >= 2 and the target carries samples: "sample".
The target carries log_density only (no samples): "kld".
A gmm_fit.
## auto: samples + N=2 -> classical EM. x <- matrix(stats::rnorm(200), ncol = 2) tgt_s <- gmm_target_from_samples(x) fit_proxymix(tgt_s, N = 2L, max_iter = 25L) ## explicit "kld" on a log-density-only target. fit_proxymix(banana_target(), N = 3L, regime = "kld", is_size = 1000L, max_iter = 20L, seed = 1L)## auto: samples + N=2 -> classical EM. x <- matrix(stats::rnorm(200), ncol = 2) tgt_s <- gmm_target_from_samples(x) fit_proxymix(tgt_s, N = 2L, max_iter = 25L) ## explicit "kld" on a log-density-only target. fit_proxymix(banana_target(), N = 3L, regime = "kld", is_size = 1000L, max_iter = 20L, seed = 1L)
Tier-2 stub — signature stable; body not yet implemented.
from_aggregate_likelihood(y, latent_aggregator, N = 3L, ...)from_aggregate_likelihood(y, latent_aggregator, N = 3L, ...)
y |
A numeric matrix or vector of aggregate-level observations. |
latent_aggregator |
A function mapping latent |
N |
Number of Gaussian components in the proxy. |
... |
Future configuration arguments. |
Plan: fit a Gaussian-mixture proxy f_theta(x) so that, when used as
the latent-level density inside an aggregate-likelihood downscaling
framework g(y) = E_{x | y}[f(x)], the resulting downscaling likelihood
has tractable closed-form marginalisation. Targets the
kernel-downsizing application described by Sejdinovic et al.
A gmm_fit (when implemented).
Other stubs:
fit_kld_em_collider(),
from_simulator(),
to_apsim_scenarios()
try(from_aggregate_likelihood(matrix(0, 1, 1), latent_aggregator = identity, N = 2L))try(from_aggregate_likelihood(matrix(0, 1, 1), latent_aggregator = identity, N = 2L))
Fits an N-component Gaussian-mixture proxy to a (Gaussian, diagonal-
bandwidth) kernel-density estimate over samples, via regime (iii)
KLD-EM. The proxy is closed-form marginalisable, conditionable, and
samplable; the KDE is none of those things on its own.
from_kde( samples, N = 3L, bandwidth = "silverman", proposal = NULL, is_size = 5000L, max_iter = 100L, tol = 1e-05, ridge_eps = 1e-06, min_ess = 50L, seed = NULL, validation_size = 0L, validation_proposal = NULL, validation_seed = NULL, support_warn = TRUE, canonicalise = TRUE )from_kde( samples, N = 3L, bandwidth = "silverman", proposal = NULL, is_size = 5000L, max_iter = 100L, tol = 1e-05, ridge_eps = 1e-06, min_ess = 50L, seed = NULL, validation_size = 0L, validation_proposal = NULL, validation_seed = NULL, support_warn = TRUE, canonicalise = TRUE )
samples |
An |
N |
Number of mixture components in the proxy. |
bandwidth |
Either |
proposal |
Optional is_proposal. Default is a multivariate-t
centred at |
is_size |
Importance-sample size for fitting. Default |
max_iter |
Maximum EM iterations. Forwarded to |
tol |
Convergence tolerance. Forwarded to |
ridge_eps |
Ridge added to each component covariance at every
M-step. Forwarded to |
min_ess |
Minimum effective sample size below which a warning is
issued. Forwarded to |
seed |
Optional integer seed for the fitting IS draw. |
validation_size |
Held-out IS sample size. Forwarded to
|
validation_proposal |
Optional is_proposal for the held-out
sample. Forwarded to |
validation_seed |
Optional integer seed for the held-out IS draw.
Forwarded to |
support_warn |
Logical. Forwarded to |
canonicalise |
Logical. If |
This is a compression operation: take an n-sample KDE and replace
it with the closest N-component mixture in the Kullback-Leibler sense
(which is much smaller than n for typical use). Bias inherited from
the KDE is reproduced in the proxy; the bandwidth controls the
bias-variance trade-off.
Dimensional scope. The graduation guard is p <= 5 (recommended),
p <= 10 (allowed with warning), p > 10 (rejected). The wedge of
KLD-EM is regime (iii) IS, whose effective-sample-size collapses
sharply in high dimensions; richer ambient spaces will await the v0.3
affine-Gaussian operator calculus.
A gmm_fit with regime = "kld" and metadata recording the
KDE inputs (kde_samples_n, bandwidth, bandwidth_method).
Other fitting:
fit_em_samples(),
fit_kld_em(),
fit_moment_match()
Other v0_2:
gmm_target_from_posterior()
set.seed(1L) x <- rbind( mvnfast::rmvn(120L, mu = c(-2, 0), sigma = diag(2)), mvnfast::rmvn(120L, mu = c( 2, 0), sigma = diag(2)) ) fit <- from_kde(x, N = 2L, is_size = 2000L, max_iter = 40L, seed = 1L) fit ess_summary(fit)set.seed(1L) x <- rbind( mvnfast::rmvn(120L, mu = c(-2, 0), sigma = diag(2)), mvnfast::rmvn(120L, mu = c( 2, 0), sigma = diag(2)) ) fit <- from_kde(x, N = 2L, is_size = 2000L, max_iter = 40L, seed = 1L) fit ess_summary(fit)
gmm_target (stub)Tier-2 stub — signature stable; body not yet implemented.
from_simulator(simulator, design, ...)from_simulator(simulator, design, ...)
simulator |
A function |
design |
An |
... |
Future configuration arguments. |
Plan: probe an expensive simulator over a Latin-hypercube design,
build a kernel-density estimate (or empirical-likelihood surface) on
the simulator outputs, and expose the result as a gmm_target with
an evaluable log_density.
A gmm_target (when implemented).
Other stubs:
fit_kld_em_collider(),
from_aggregate_likelihood(),
to_apsim_scenarios()
try(from_simulator(simulator = identity, design = matrix(stats::rnorm(20), ncol = 2)))try(from_simulator(simulator = identity, design = matrix(stats::rnorm(20), ncol = 2)))
Lightweight S7 class representing an N-component multivariate Gaussian
mixture on . Use gmm() to construct, dgmm() / rgmm()
to evaluate or sample, and gmm_marginalise() / gmm_conditionalise()
for closed-form operations.
gmm( weights = numeric(0), means = list(), covariances = list(), name = "gmm", metadata = list() )gmm( weights = numeric(0), means = list(), covariances = list(), name = "gmm", metadata = list() )
weights |
Numeric vector of length |
means |
List of length |
covariances |
List of length |
name |
Optional human-readable name. |
metadata |
Optional list of arbitrary metadata (regime tags, diagnostic snapshots, etc.). |
An S7 object inheriting from gmm.
Other classes:
autoplot.gmm_fit,
gmm_dim(),
gmm_fit(),
gmm_n_components(),
gmm_target(),
is_proposal()
g <- gmm( weights = c(0.4, 0.6), means = list(c(-1, 0), c(1, 0)), covariances = list(diag(2), diag(2)) ) gg <- gmm( weights = c(0.4, 0.6), means = list(c(-1, 0), c(1, 0)), covariances = list(diag(2), diag(2)) ) g
Returns the (closed-form) distribution of
when is a Gaussian mixture and is independent additive Gaussian noise.
gmm_affine(g, A, b = 0, noise_cov = NULL, ridge_eps = 1e-06)gmm_affine(g, A, b = 0, noise_cov = NULL, ridge_eps = 1e-06)
g |
|
A |
An |
b |
Numeric scalar or length- |
noise_cov |
|
ridge_eps |
Tiny ridge added to the output covariances for
numerical hygiene. Set to |
For each component k, the pushed-forward parameters are
and the mixture weights are unchanged. This is the finite-mixture analogue of a Kalman-style predict step.
The channel is required to be affine in x and the noise is
required to be Gaussian. Non-linear channels are not closed form and
must not be silently approximated; pass them to a Monte Carlo helper
instead.
A gmm in R^m with the same number of components and the
same weights as g.
Other operators:
gmm_aggregate(),
gmm_missing(),
gmm_observe()
Other v0_3:
gmm_aggregate(),
gmm_missing(),
gmm_observe()
g <- gmm(weights = c(0.5, 0.5), means = list(c(-1, 0), c(1, 0)), covariances = list(diag(2), diag(2))) A <- matrix(c(1, 0, 0, 1, 1, -1), nrow = 3L, byrow = TRUE) gmm_affine(g, A, b = c(0, 0, 0), noise_cov = 0.01 * diag(3))g <- gmm(weights = c(0.5, 0.5), means = list(c(-1, 0), c(1, 0)), covariances = list(diag(2), diag(2))) A <- matrix(c(1, 0, 0, 1, 1, -1), nrow = 3L, byrow = TRUE) gmm_affine(g, A, b = c(0, 0, 0), noise_cov = 0.01 * diag(3))
A named alias for gmm_affine() when A is a (row-wise) aggregation
matrix — e.g. a block-sum, block-average, or unequal-weight aggregation
used in downscaling pipelines. The mathematics is identical to
gmm_affine(); the alias gives the public API a clearer hook for
aggregation-specific diagnostics in later releases.
gmm_aggregate(g, A, noise_cov = NULL, ridge_eps = 1e-06)gmm_aggregate(g, A, noise_cov = NULL, ridge_eps = 1e-06)
g |
|
A |
An |
noise_cov |
Optional |
ridge_eps |
Tiny ridge added to the output covariances for numerical hygiene. |
A gmm in R^m.
Other operators:
gmm_affine(),
gmm_missing(),
gmm_observe()
Other v0_3:
gmm_affine(),
gmm_missing(),
gmm_observe()
g <- gmm(weights = c(0.5, 0.5), means = list(c(-1, 0, 1), c(1, 0, -1)), covariances = list(diag(3), diag(3))) # Sum coordinates 1 and 2 into a single aggregate; pass coord 3 through. A <- matrix(c(1, 1, 0, 0, 0, 1), nrow = 2L, byrow = TRUE) gmm_aggregate(g, A)g <- gmm(weights = c(0.5, 0.5), means = list(c(-1, 0, 1), c(1, 0, -1)), covariances = list(diag(3), diag(3))) # Sum coordinates 1 and 2 into a single aggregate; pass coord 3 through. A <- matrix(c(1, 1, 0, 0, 0, 1), nrow = 2L, byrow = TRUE) gmm_aggregate(g, A)
Returns a new gmm (or gmm_fit) with the components permuted into a
canonical order: weight descending, then ||mu|| descending as a
tiebreaker. The mixture distribution is unchanged — only the bookkeeping
order is — but the canonical ordering removes the EM label-switching
nuisance from snapshot tests, cross-run comparisons, and printed
summaries.
gmm_canonicalise(g)gmm_canonicalise(g)
g |
Applied automatically by the regime-specific fitters
(fit_moment_match(), fit_em_samples(), fit_kld_em()) and by the
top-level dispatcher fit_proxymix() when canonicalise = TRUE
(the default).
A gmm (or gmm_fit) of the same subclass as g, with the
components permuted into canonical order.
Other ops:
dgmm(),
gmm_conditionalise(),
gmm_kld(),
gmm_marginalise(),
rgmm()
g <- gmm(weights = c(0.1, 0.6, 0.3), means = list(c(0, 0), c(3, 0), c(-1, 1)), covariances = list(diag(2), diag(2), diag(2))) gmm_canonicalise(g)g <- gmm(weights = c(0.1, 0.6, 0.3), means = list(c(0, 0), c(3, 0), c(-1, 1)), covariances = list(diag(2), diag(2), diag(2))) gmm_canonicalise(g)
Computes the conditional distribution of a Gaussian mixture given fixed
values of a subset of coordinates, by the Schur-complement formula
applied component-wise and re-weighted by the marginal evidence
of each component.
gmm_conditionalise(g, given)gmm_conditionalise(g, given)
g |
|
given |
A length- |
A gmm object in dimension equal to the number of free coordinates.
Other ops:
dgmm(),
gmm_canonicalise(),
gmm_kld(),
gmm_marginalise(),
rgmm()
g <- gmm(weights = c(0.5, 0.5), means = list(c(-1, 0), c(1, 0)), covariances = list(diag(2), diag(2))) gmm_conditionalise(g, given = c(NA, 0.5))g <- gmm(weights = c(0.5, 0.5), means = list(c(-1, 0), c(1, 0)), covariances = list(diag(2), diag(2))) gmm_conditionalise(g, given = c(NA, 0.5))
Convenience accessor returning the ambient dimension .
gmm_dim(x)gmm_dim(x)
x |
Integer scalar.
Other classes:
autoplot.gmm_fit,
gmm(),
gmm_fit(),
gmm_n_components(),
gmm_target(),
is_proposal()
g <- gmm(weights = 1, means = list(c(0, 0)), covariances = list(diag(2))) gmm_dim(g)g <- gmm(weights = 1, means = list(c(0, 0)), covariances = list(diag(2))) gmm_dim(g)
A gmm_fit is the result of fit_proxymix() (or one of the regime-specific
fitters). It inherits the mixture parameters of gmm and adds a record of
the target it was fitted to, the regime used, and the iteration
diagnostics.
gmm_fit( weights = numeric(0), means = list(), covariances = list(), name = "gmm", metadata = list(), target = NULL, regime = NA_character_, diagnostics = list(), converged = NA, iterations = NA_integer_, call = NULL )gmm_fit( weights = numeric(0), means = list(), covariances = list(), name = "gmm", metadata = list(), target = NULL, regime = NA_character_, diagnostics = list(), converged = NA, iterations = NA_integer_, call = NULL )
weights, means, covariances, name, metadata
|
See gmm. |
target |
The gmm_target the mixture was fitted to. |
regime |
One of |
diagnostics |
A list of regime-specific diagnostics
(see |
converged |
Logical scalar. |
iterations |
Integer scalar. |
call |
The matched call. |
An S7 object inheriting from gmm_fit (and gmm).
Other classes:
autoplot.gmm_fit,
gmm(),
gmm_dim(),
gmm_n_components(),
gmm_target(),
is_proposal()
samples <- matrix(stats::rnorm(200), ncol = 2) tgt <- gmm_target_from_samples(samples) fit <- fit_proxymix(tgt, N = 2L, regime = "sample", max_iter = 25L) inherits(fit, "proxymix::gmm_fit")samples <- matrix(stats::rnorm(200), ncol = 2) tgt <- gmm_target_from_samples(samples) fit <- fit_proxymix(tgt, N = 2L, regime = "sample", max_iter = 25L) inherits(fit, "proxymix::gmm_fit")
Estimates KL(p || q) between two Gaussian mixtures by Monte Carlo and,
optionally, evaluates the Hershey–Olsen variational approximation as a
deterministic sanity check.
gmm_kld(p, q, n_mc = 5000L, variational = TRUE)gmm_kld(p, q, n_mc = 5000L, variational = TRUE)
p, q
|
|
n_mc |
Number of Monte Carlo samples drawn from |
variational |
If |
The Monte Carlo estimator draws n_mc samples from p and returns the
empirical mean of log p(x) - log q(x), together with a Monte Carlo
standard error.
The variational approximation is
which is exact when p == q and tends to be a usable lower bound when
the components of p and q are well-separated. The closed-form
Gaussian–Gaussian KL KL(p_a || q_b) is used internally.
A list with components
mc - the Monte Carlo estimate of KL(p || q),
mc_se - its Monte Carlo standard error,
variational - the variational approximation (NA if
variational = FALSE),
n_mc - the number of Monte Carlo samples used.
Other ops:
dgmm(),
gmm_canonicalise(),
gmm_conditionalise(),
gmm_marginalise(),
rgmm()
p <- gmm(weights = c(0.5, 0.5), means = list(c(-1, 0), c(1, 0)), covariances = list(diag(2), diag(2))) q <- gmm(weights = 1, means = list(c(0, 0)), covariances = list(diag(2) * 2)) gmm_kld(p, q, n_mc = 500L)p <- gmm(weights = c(0.5, 0.5), means = list(c(-1, 0), c(1, 0)), covariances = list(diag(2), diag(2))) q <- gmm(weights = 1, means = list(c(0, 0)), covariances = list(diag(2) * 2)) gmm_kld(p, q, n_mc = 500L)
Computes the marginal distribution of a Gaussian mixture over a subset of coordinates. The marginal of a Gaussian mixture is itself a Gaussian mixture with the same weights.
gmm_marginalise(g, keep)gmm_marginalise(g, keep)
g |
|
keep |
Integer vector of coordinate indices to retain (in |
A gmm object in dimension length(keep).
Other ops:
dgmm(),
gmm_canonicalise(),
gmm_conditionalise(),
gmm_kld(),
rgmm()
g <- gmm(weights = c(0.5, 0.5), means = list(c(-1, 0, 2), c(1, 0, -2)), covariances = list(diag(3), diag(3))) gmm_marginalise(g, keep = c(1L, 3L))g <- gmm(weights = c(0.5, 0.5), means = list(c(-1, 0, 2), c(1, 0, -2)), covariances = list(diag(3), diag(3))) gmm_marginalise(g, keep = c(1L, 3L))
A structured wrapper around gmm_conditionalise() for the common case
where the observed coordinates are specified by integer index rather
than NA-padded vector. Equivalent to gmm_observe() with a
selection matrix A and zero noise covariance, but routed through
the Schur-complement path for efficiency.
gmm_missing(g, observed, values)gmm_missing(g, observed, values)
g |
|
observed |
Integer vector of indices in |
values |
Numeric vector of length |
A gmm in R^(p - length(observed)).
Other operators:
gmm_affine(),
gmm_aggregate(),
gmm_observe()
Other v0_3:
gmm_affine(),
gmm_aggregate(),
gmm_observe()
g <- gmm(weights = c(0.4, 0.6), means = list(c(-1, 0), c(1, 0)), covariances = list(diag(2), diag(2))) # Condition coord 2 on the value 0.5; keep coord 1. gmm_missing(g, observed = 2L, values = 0.5)g <- gmm(weights = c(0.4, 0.6), means = list(c(-1, 0), c(1, 0)), covariances = list(diag(2), diag(2))) # Condition coord 2 on the value 0.5; keep coord 1. gmm_missing(g, observed = 2L, values = 0.5)
Number of components in a Gaussian mixture
gmm_n_components(x)gmm_n_components(x)
x |
Integer scalar.
Other classes:
autoplot.gmm_fit,
gmm(),
gmm_dim(),
gmm_fit(),
gmm_target(),
is_proposal()
g <- gmm(weights = c(0.5, 0.5), means = list(c(0, 0), c(1, 1)), covariances = list(diag(2), diag(2))) gmm_n_components(g)g <- gmm(weights = c(0.5, 0.5), means = list(c(0, 0), c(1, 1)), covariances = list(diag(2), diag(2))) gmm_n_components(g)
Conditions a Gaussian mixture g on a single noisy linear observation
, .
Per component, applies the Kalman gain
and updates
Component weights are multiplied by the marginal evidence
and renormalised. This is
the finite-mixture analogue of a Kalman update step.
gmm_observe(g, A, y, noise_cov, b = 0, ridge_eps = 1e-06)gmm_observe(g, A, y, noise_cov, b = 0, ridge_eps = 1e-06)
g |
|
A |
An |
y |
A length- |
noise_cov |
An |
b |
Numeric scalar or length- |
ridge_eps |
Tiny ridge added to updated covariances for numerical
hygiene. Set to |
If the marginal evidence vanishes at every component (e.g. y is
many standard deviations from every component), the function issues a
warning and returns g unchanged with metadata$gmm_observe_no_update = TRUE.
A gmm in R^p with the same number of components and the
reweighted component weights.
Other operators:
gmm_affine(),
gmm_aggregate(),
gmm_missing()
Other v0_3:
gmm_affine(),
gmm_aggregate(),
gmm_missing()
g <- gmm(weights = c(0.5, 0.5), means = list(c(-1, 0), c(1, 0)), covariances = list(diag(2), diag(2))) A <- matrix(c(1, 0), nrow = 1L) gmm_observe(g, A = A, y = 0.8, noise_cov = matrix(0.25, 1, 1))g <- gmm(weights = c(0.5, 0.5), means = list(c(-1, 0), c(1, 0)), covariances = list(diag(2), diag(2))) A <- matrix(c(1, 0), nrow = 1L) gmm_observe(g, A = A, y = 0.8, noise_cov = matrix(0.25, 1, 1))
An S7 representation of a target density that proxymix is asked to
approximate. A target may carry an evaluable log_density, a matrix of
i.i.d. samples, or both. Each of the three fitting regimes consumes a
different subset:
gmm_target( n_dim = integer(0), log_density = NULL, samples = NULL, normalised = NA, log_normalizer = NA_real_, name = "gmm_target", metadata = list() )gmm_target( n_dim = integer(0), log_density = NULL, samples = NULL, normalised = NA, log_normalizer = NA_real_, name = "gmm_target", metadata = list() )
n_dim |
Integer scalar — the ambient dimension |
log_density |
Optional function: |
samples |
Optional |
normalised |
Logical scalar declaring whether |
log_normalizer |
Numeric scalar |
name |
Human-readable name. |
metadata |
Optional list of additional descriptors. |
regime "moment" needs samples (or both moments via metadata);
regime "sample" needs samples;
regime "kld" needs the log-density.
Use gmm_target() or gmm_target_from_samples() to construct.
Importance-sampled KLD-EM (regime "kld") only requires log_density
to be specified up to an unknown additive constant — the self-normalised
weights are invariant to scaling. The package's diagnostics downstream,
however, do depend on normalisation: an importance-sampled KLD estimate
against an unnormalised log-density measures
rather than
, and a squared-Hellinger Monte Carlo
estimate is only meaningful when both densities integrate to one.
Declare the target's normalisation explicitly via normalised (and,
where possible, supply log_normalizer) so that the package can label
shifted KLDs as shifted and refuse misleading Hellinger reports.
An S7 object of class gmm_target.
Other classes:
autoplot.gmm_fit,
gmm(),
gmm_dim(),
gmm_fit(),
gmm_n_components(),
is_proposal()
tgt <- banana_target() tgttgt <- banana_target() tgt
gmm_target
Generic S3 constructor that turns a Bayesian posterior — represented
either by a fitted model object (e.g. from flexyBayes, brms,
Stan) or by a bare callable — into a gmm_target suitable for the
regime (iii) wedge of fit_proxymix() / fit_kld_em().
gmm_target_from_posterior(model, ...) ## Default S3 method: gmm_target_from_posterior(model, ...) ## S3 method for class ''function'' gmm_target_from_posterior( model, ..., parameter_names = NULL, log_normalizer = NA_real_, name = NULL )gmm_target_from_posterior(model, ...) ## Default S3 method: gmm_target_from_posterior(model, ...) ## S3 method for class ''function'' gmm_target_from_posterior( model, ..., parameter_names = NULL, log_normalizer = NA_real_, name = NULL )
model |
One of:
|
... |
Forwarded to method-specific implementations. |
parameter_names |
Character vector of parameter names. Required
for the |
log_normalizer |
Numeric scalar |
name |
Optional human-readable target name. Defaults to
|
The contract for the underlying callable is:
Vectorised: accepts a numeric matrix with rows indexing
independent parameter draws and columns indexing parameters; returns
a length-nrow(theta) numeric vector of log p(theta | data) + const.
Unnormalised is fine: the marginal likelihood log Z is not
required. Where the source package can supply it, pass
log_normalizer.
Side-effect free: no plotting, no mutable state. Pure function.
Domain-safe: returns -Inf outside support rather than raising
an error.
The default method errors with a hint pointing the user at either
(a) a Bayesian package that registers a method, or (b) the
function method below.
A gmm_target with normalised = FALSE and the user-supplied
log_normalizer (or NA_real_).
Other v0_2:
from_kde()
# A trivial unnormalised log-posterior: a 2D banana centred near (1, 0). log_post <- function(theta) { x <- theta[, 1L] y <- theta[, 2L] -0.5 * (x^2 + (y - 0.1 * x^2 + 1)^2) } tgt <- gmm_target_from_posterior( log_post, parameter_names = c("x", "y") ) tgt# A trivial unnormalised log-posterior: a 2D banana centred near (1, 0). log_post <- function(theta) { x <- theta[, 1L] y <- theta[, 2L] -0.5 * (x^2 + (y - 0.1 * x^2 + 1)^2) } tgt <- gmm_target_from_posterior( log_post, parameter_names = c("x", "y") ) tgt
Wraps a numeric matrix of i.i.d. samples as a gmm_target. The resulting
target carries no log_density, so it can only feed regimes "moment"
(via empirical moments) and "sample" (classical EM).
gmm_target_from_samples(samples, name = "target_from_samples")gmm_target_from_samples(samples, name = "target_from_samples")
samples |
An |
name |
Optional human-readable name. Defaults to
|
A gmm_target object.
Other targets:
banana_target(),
donut_target(),
mixture_target()
x <- matrix(stats::rnorm(200), ncol = 2) tgt <- gmm_target_from_samples(x) tgtx <- matrix(stats::rnorm(200), ncol = 2) tgt <- gmm_target_from_samples(x) tgt
Estimates the squared Hellinger distance H^2(f, g) = 1 - integral sqrt(f(x) g(x)) dx by importance sampling against the proposal stored
in the fit (for regime "kld") or by sampling from the fit itself (for
regime "sample"). The target's log_density must be supplied and
normalised; otherwise the Monte Carlo integral is biased by the
missing . When the target's normalised property is
not TRUE, a warning is issued and the returned value is flagged.
hellinger_mc(fit, n_mc = 5000L, seed = NULL)hellinger_mc(fit, n_mc = 5000L, seed = NULL)
fit |
A gmm_fit whose target carries a |
n_mc |
Number of Monte Carlo samples. |
seed |
Optional integer seed. |
A list with components
h2 - estimate of H^2(f, g),
se - Monte Carlo standard error,
n_mc - sample size used.
Other diagnostics:
bic_aic(),
ess_summary(),
ess_trace(),
kld_trace()
fit <- fit_proxymix(banana_target(), N = 3L, regime = "kld", is_size = 2000L, max_iter = 25L, seed = 1L) hellinger_mc(fit, n_mc = 1000L, seed = 1L)fit <- fit_proxymix(banana_target(), N = 3L, regime = "kld", is_size = 2000L, max_iter = 25L, seed = 1L) hellinger_mc(fit, n_mc = 1000L, seed = 1L)
Runs stats::kmeans() on the supplied samples and uses the resulting
cluster centres and within-cluster covariances to seed an EM-style
fitter.
init_kmeans(samples, N = 2L, ridge_eps = 1e-06, nstart = 10L)init_kmeans(samples, N = 2L, ridge_eps = 1e-06, nstart = 10L)
samples |
An |
N |
Number of components. |
ridge_eps |
Ridge added to each cluster covariance for numerical stability when a cluster has fewer than two points. |
nstart |
|
A gmm of N components in dimension ncol(samples).
Other init:
init_moment_seed(),
init_random(),
init_warm_start(),
multi_start_best_of()
x <- matrix(stats::rnorm(200), ncol = 2) init_kmeans(x, N = 3L)x <- matrix(stats::rnorm(200), ncol = 2) init_kmeans(x, N = 3L)
Computes the global mean and covariance of the supplied samples and
spreads N components along the leading principal direction. Useful as
a deterministic starting point that survives multi-modal targets better
than a single-Gaussian fit.
init_moment_seed(samples, N = 2L, spread = 1.5)init_moment_seed(samples, N = 2L, spread = 1.5)
samples |
An |
N |
Number of components. |
spread |
Multiplier on the principal-direction standard deviation used to place the component means symmetrically about the global mean. |
A gmm of N components in dimension ncol(samples).
Other init:
init_kmeans(),
init_random(),
init_warm_start(),
multi_start_best_of()
x <- matrix(stats::rnorm(200), ncol = 2) init_moment_seed(x, N = 3L)x <- matrix(stats::rnorm(200), ncol = 2) init_moment_seed(x, N = 3L)
Generates an N-component random initialisation by perturbing isotropic
means around a centre. Useful as one starting point in a multi-start
best-of strategy.
init_random( N = 1L, p = 2L, centre = rep(0, p), scale = 1, sigma_diag = 1, seed = NULL )init_random( N = 1L, p = 2L, centre = rep(0, p), scale = 1, sigma_diag = 1, seed = NULL )
N |
Number of components. |
p |
Ambient dimension. |
centre |
Length- |
scale |
Standard deviation of the mean perturbation. |
sigma_diag |
Diagonal value used for the initial component covariances. |
seed |
Optional integer seed. |
A gmm of N components in dimension p.
Other init:
init_kmeans(),
init_moment_seed(),
init_warm_start(),
multi_start_best_of()
init_random(N = 3L, p = 2L, seed = 1L)init_random(N = 3L, p = 2L, seed = 1L)
Returns the input as-is. Provided as a name so that the multi-start driver can include "warm starts" by symbolic name.
init_warm_start(g)init_warm_start(g)
g |
The input g, validated.
Other init:
init_kmeans(),
init_moment_seed(),
init_random(),
multi_start_best_of()
g <- gmm(weights = 1, means = list(c(0, 0)), covariances = list(diag(2))) init_warm_start(g)g <- gmm(weights = 1, means = list(c(0, 0)), covariances = list(diag(2))) init_warm_start(g)
Builds an is_proposal using a multivariate-normal N(mean, cov)
density and sampler.
is_mvn(n_dim, mean = rep(0, n_dim), cov = diag(n_dim))is_mvn(n_dim, mean = rep(0, n_dim), cov = diag(n_dim))
n_dim |
Ambient dimension |
mean |
Length- |
cov |
A |
An is_proposal object.
Other proposals:
is_mvt(),
is_uniform()
q <- is_mvn(n_dim = 2L, mean = c(0, 0), cov = 4 * diag(2)) qq <- is_mvn(n_dim = 2L, mean = c(0, 0), cov = 4 * diag(2)) q
Builds an is_proposal using a multivariate-Student-t density and
sampler with df degrees of freedom, location mean, and scale matrix
sigma. Heavier tails than is_mvn(), so often a safer importance
proposal at moderate dimensions.
is_mvt(n_dim, mean = rep(0, n_dim), sigma = diag(n_dim), df = 5)is_mvt(n_dim, mean = rep(0, n_dim), sigma = diag(n_dim), df = 5)
n_dim |
Ambient dimension |
mean |
Length- |
sigma |
A |
df |
Degrees of freedom ( |
An is_proposal object.
Other proposals:
is_mvn(),
is_uniform()
q <- is_mvt(n_dim = 2L, df = 5) qq <- is_mvt(n_dim = 2L, df = 5) q
An is_proposal packages a sampler with its corresponding log-density.
Pass one to fit_kld_em() (or fit_proxymix() with regime = "kld")
to plug an alternative proposal into the regime-(iii) loop.
is_proposal( n_dim = integer(0), sample = NULL, log_density = NULL, name = "is_proposal", metadata = list() )is_proposal( n_dim = integer(0), sample = NULL, log_density = NULL, name = "is_proposal", metadata = list() )
n_dim |
Integer scalar — the ambient dimension |
sample |
Function: |
log_density |
Function: |
name |
Human-readable name. |
metadata |
Optional list of additional descriptors. |
An S7 object of class is_proposal.
Other classes:
autoplot.gmm_fit,
gmm(),
gmm_dim(),
gmm_fit(),
gmm_n_components(),
gmm_target()
q <- is_mvn(n_dim = 2L, mean = c(0, 0), cov = diag(2)) qq <- is_mvn(n_dim = 2L, mean = c(0, 0), cov = diag(2)) q
Builds an is_proposal that samples uniformly on the hyperrectangle
and reports the (constant) log-density on that box. Outside the box the
log-density is -Inf.
is_uniform(n_dim, lower = -1, upper = 1)is_uniform(n_dim, lower = -1, upper = 1)
n_dim |
Ambient dimension |
lower |
Length- |
upper |
Length- |
An is_proposal object.
Other proposals:
is_mvn(),
is_mvt()
q <- is_uniform(n_dim = 2L, lower = -5, upper = 5) q q@sample(3L)q <- is_uniform(n_dim = 2L, lower = -5, upper = 5) q q@sample(3L)
Returns the per-iteration estimate of KL(f || g_theta) produced during
a regime-(iii) fit, or NA for regimes that do not estimate the KLD
internally.
kld_trace(fit)kld_trace(fit)
fit |
A gmm_fit. |
Numeric vector (or NA_real_).
Other diagnostics:
bic_aic(),
ess_summary(),
ess_trace(),
hellinger_mc()
fit <- fit_proxymix(banana_target(), N = 2L, regime = "kld", is_size = 1000L, max_iter = 15L, seed = 1L) kld_trace(fit)fit <- fit_proxymix(banana_target(), N = 2L, regime = "kld", is_size = 1000L, max_iter = 15L, seed = 1L) kld_trace(fit)
A toy three-component planar mixture target where everything is known
exactly: the log_density matches the GMM density formula and the
attached samples are drawn from that same mixture. Useful for sanity-
checking the three fitting regimes against ground truth.
mixture_target(with_samples = FALSE, n = 2000L, seed = 1L)mixture_target(with_samples = FALSE, n = 2000L, seed = 1L)
with_samples |
If |
n |
Number of samples to attach when |
seed |
Optional integer seed used when drawing the samples. |
A gmm_target in dimension 2.
Other targets:
banana_target(),
donut_target(),
gmm_target_from_samples()
m <- mixture_target(with_samples = TRUE, n = 100L) mm <- mixture_target(with_samples = TRUE, n = 100L) m
Runs the supplied fitter from each of several initialisations and returns the fit with the best score, following Karlis and Xekalaki (2003)'s recommendation.
multi_start_best_of(fit_fn, inits, score_fn, ...)multi_start_best_of(fit_fn, inits, score_fn, ...)
fit_fn |
A function with signature |
inits |
A list of gmm initialisations. |
score_fn |
A function |
... |
Additional arguments forwarded to |
The gmm_fit with the largest score_fn(fit).
Other init:
init_kmeans(),
init_moment_seed(),
init_random(),
init_warm_start()
x <- matrix(stats::rnorm(200), ncol = 2) tgt <- gmm_target_from_samples(x) inits <- list(init_random(2L, 2L, seed = 1L), init_moment_seed(x, N = 2L)) best <- multi_start_best_of( fit_fn = function(init, ...) fit_em_samples(tgt, init = init, ...), inits = inits, score_fn = function(fit) fit@diagnostics$loglik_final, max_iter = 25L ) best@diagnostics$loglik_finalx <- matrix(stats::rnorm(200), ncol = 2) tgt <- gmm_target_from_samples(x) inits <- list(init_random(2L, 2L, seed = 1L), init_moment_seed(x, N = 2L)) best <- multi_start_best_of( fit_fn = function(init, ...) fit_em_samples(tgt, init = init, ...), inits = inits, score_fn = function(fit) fit@diagnostics$loglik_final, max_iter = 25L ) best@diagnostics$loglik_final
Draws n independent samples from a Gaussian mixture.
rgmm(n, g)rgmm(n, g)
n |
Number of samples (positive integer scalar). |
g |
A numeric matrix of dimension n by p.
Other ops:
dgmm(),
gmm_canonicalise(),
gmm_conditionalise(),
gmm_kld(),
gmm_marginalise()
g <- gmm(weights = c(0.5, 0.5), means = list(c(-1, 0), c(1, 0)), covariances = list(diag(2), diag(2))) x <- rgmm(50L, g) dim(x)g <- gmm(weights = c(0.5, 0.5), means = list(c(-1, 0), c(1, 0)), covariances = list(diag(2), diag(2))) x <- rgmm(50L, g) dim(x)
Tier-2 stub — signature stable; body not yet implemented.
to_apsim_scenarios(fit, n = 100L, schema = list(), ...)to_apsim_scenarios(fit, n = 100L, schema = list(), ...)
fit |
A gmm_fit. |
n |
Number of scenarios to generate. |
schema |
A list mapping mixture coordinates to APSIM variable names and units. |
... |
Future configuration arguments. |
Plan: convert samples from a fitted gmm_fit into the tabular format consumed by an APSIM scenario runner, optionally honouring a user-supplied schema mapping mixture coordinates to APSIM variable names and units.
A data frame in APSIM scenario format (when implemented).
Other stubs:
fit_kld_em_collider(),
from_aggregate_likelihood(),
from_simulator()
x <- matrix(stats::rnorm(200), ncol = 2) fit <- fit_proxymix(gmm_target_from_samples(x), N = 2L, max_iter = 10L) try(to_apsim_scenarios(fit, n = 100L, schema = list()))x <- matrix(stats::rnorm(200), ncol = 2) fit <- fit_proxymix(gmm_target_from_samples(x), N = 2L, max_iter = 10L) try(to_apsim_scenarios(fit, n = 100L, schema = list()))