--- title: "Entropy diagnostics with proxymix" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Entropy diagnostics with proxymix} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set(collapse = TRUE, comment = "#>") set.seed(20260618) ``` ```{r setup} library(proxymix) ``` `proxymix` fits Kullback--Leibler-optimal Gaussian-mixture proxies, so relative entropy is already at the core of the package. This vignette surfaces the entropy structure of a *fitted* mixture as closed-form diagnostics: the quadratic (order-2) Renyi entropy of a mixture, the Cauchy--Schwarz divergence between two mixtures, and a Monte-Carlo Shannon entropy bracketed by an analytic bound. ## The closed-form spine The integral of a product of two Gaussian densities is itself a Gaussian density evaluation: writing `N(x; m, S)` for a Gaussian density, the integral over `x` of `N(x; a, A) N(x; b, B)` equals `N(a; b, A + B)`. Any quantity that is *quadratic* in a mixture -- the integral of `g(x)^2`, or of `p(x) q(x)` -- is therefore a finite sum of such evaluations, and closed-form. Shannon entropy carries the logarithm of a sum and has no such closed form, so it is estimated by Monte Carlo. ## Renyi-2 entropy `gmm_entropy()` returns the closed-form order-2 Renyi entropy by default. ```{r} g <- gmm(weights = c(0.5, 0.5), means = list(c(-2, 0), c(2, 0)), covariances = list(diag(2), diag(2))) gmm_entropy(g) ``` For a single Gaussian the value is exact and matches the analytic form `H2(N(mu, S)) = (p / 2) log(4 pi) + (1 / 2) log det(S)`: ```{r} S <- matrix(c(1, 0.3, 0.3, 1), 2, 2) one <- gmm(weights = 1, means = list(c(0, 0)), covariances = list(S)) c(closed_form = gmm_entropy(one), analytic = 0.5 * (2 * log(4 * pi) + as.numeric(determinant(S, logarithm = TRUE)$modulus))) ``` ## Shannon entropy, bracketed Shannon entropy is returned as a Monte-Carlo estimate with its standard error and an analytic upper bound that brackets it from above. ```{r} gmm_entropy(g, order = "shannon", n_mc = 5000L, seed = 1L) ``` The estimate sits at or below the bound up to Monte-Carlo error, which is the coherence check the bound is there to provide. ## Cauchy--Schwarz divergence `gmm_divergence()` returns the closed-form, symmetric Cauchy--Schwarz divergence by default. It is non-negative and zero exactly when the two mixtures are proportional. ```{r} q <- gmm(weights = 1, means = list(c(0, 0)), covariances = list(diag(2) * 2)) gmm_divergence(g, q) # closed-form, symmetric gmm_divergence(g, g) # zero ``` When the asymmetric Kullback--Leibler divergence is wanted instead, `type = "kl"` delegates to the package's existing Monte-Carlo estimator `gmm_kld()`. ```{r} gmm_divergence(g, q, type = "kl", n_mc = 2000L)$mc ``` ## Mutual information and predictive entropy Two further reads come from the same closed-form spine. `gmm_mutual_information()` measures the dependence between two coordinate blocks of a fitted joint as the Cauchy--Schwarz divergence between the joint and the product of the marginals; it is non-negative and zero exactly when the blocks are independent. ```{r} s <- matrix(c(1, 0.7, 0.7, 1), 2, 2) joint <- gmm(weights = 1, means = list(c(0, 0)), covariances = list(s)) gmm_mutual_information(joint, 1L, 2L) ``` `gmm_conditional_entropy()` returns the predictive uncertainty of the target coordinates given the conditioned ones -- the order-2 Renyi entropy of the conditional mixture from `gmm_conditionalise()`, evaluated row by row, with `NA` marking the target coordinate. ```{r} gmm_conditional_entropy(joint, given = rbind(c(NA, 0), c(NA, 1), c(NA, 2))) ``` ## When to use which The Renyi-2 entropy and the Cauchy--Schwarz divergence are exact and cheap (`O(K^2)` Gaussian evaluations), and are the recommended defaults for reporting the spread of a fitted mixture and the discrepancy between two fits. The Shannon entropy and the Kullback--Leibler divergence are the more familiar quantities but are Monte-Carlo estimates; reach for them when a downstream consumer requires those specific definitions. ## Deterministic annealing and phase transitions The same statistical-mechanics framing -- the fitting objective is a variational free energy $F = \langle E \rangle - T H$ -- yields a robustness tool. Softening the E-step responsibilities by a temperature $T$, so that $\gamma_{ik} \propto \pi_k\, \mathcal{N}(x_i;\ \mu_k, \Sigma_k)^{1/T}$, and cooling $T$ from a high value toward one, turns the fit into a homotopy: at high $T$ the objective has a single smooth basin, and as $T$ falls the components bifurcate at critical temperatures. Setting `anneal = TRUE` on `fit_em_samples()` or `fit_kld_em()` uses this annealed path as a warm-start for the unchanged cold EM loop, which is far less sensitive to local optima than a cold multi-start. ```{r} x <- rbind( matrix(rnorm(200), ncol = 2) + matrix(rep(c(-7, -7), each = 100), ncol = 2), matrix(rnorm(200), ncol = 2) + matrix(rep(c( 7, -7), each = 100), ncol = 2), matrix(rnorm(200), ncol = 2) + matrix(rep(c( 0, 8), each = 100), ncol = 2) ) tgt <- gmm_target_from_samples(x) fit <- fit_em_samples(tgt, N = 3L, anneal = TRUE, seed = 1L) fit@diagnostics$annealed ``` Cooling also discovers the component count. `gmm_anneal_path()` tracks the number of distinct centroids as the temperature falls; the count occupying the widest temperature range is the discovered number of components, and the first bifurcation has the closed-form critical temperature $T_c = \lambda_{\max}(\Sigma^{-1} C)$, returned as `t_critical_analytic` for an independent check. ```{r} path <- gmm_anneal_path(x, k_max = 6L, n_steps = 60L, seed = 1L) path$k_selected c(empirical = path$first_critical_temperature, analytic = path$t_critical_analytic) ``` ## Maximum-entropy targets Entropy also points the other way -- from a set of constraints to the least-committal density consistent with them. `maxent_target()` builds that maximum-entropy density: the Gaussian under first- and second-moment constraints on the full space, the uniform under a support constraint alone, and a truncated Gaussian under second moments on a box. The bounded cases declare their support, so regime (iii) fits them under an automatically selected support-matched proposal. ```{r} g <- maxent_target(moments = list(mean = c(0, 0), cov = diag(2))) g@metadata$family u <- maxent_target(support = list(lower = c(0, 0), upper = c(1, 1))) exp(u@log_density(matrix(c(0.5, 0.5), nrow = 1L))) # 1 / area = 1 ``` ## Selecting a component count with ICL `bic_aic()` reports the integrated completed likelihood alongside the BIC and AIC. The ICL adds to the BIC twice the entropy of the fitted classification, $\mathrm{ICL} = \mathrm{BIC} + 2 E_N$ with $E_N = -\sum_{i,k} \gamma_{ik} \log \gamma_{ik} \ge 0$, so it penalises mixtures whose components overlap and favours well-separated solutions. It equals the BIC for a single component and is never smaller. ```{r} xs <- rbind(matrix(rnorm(200, -4), ncol = 2), matrix(rnorm(200, 4), ncol = 2)) fit <- fit_em_samples(gmm_target_from_samples(xs), N = 2L) unlist(bic_aic(fit)[c("bic", "icl", "classification_entropy")]) ``` ## Dependency structure of a mixture `gmm_independence_graph()` reads the undirected second-order conditional-independence (Gaussian graphical model) structure of a fitted mixture: an edge $i - j$ is absent exactly when the partial correlation of coordinates $i$ and $j$, given all the others, is below a threshold -- the Markov statement $x_i \perp x_j \mid x_{\mathrm{rest}}$ at second order. The overall covariance of a mixture is closed-form in its parameters, so no sampling is needed. ```{r} omega <- diag(4) # a tridiagonal precision = a chain for (i in 1:3) omega[i, i + 1L] <- omega[i + 1L, i] <- -0.5 g <- gmm(weights = 1, means = list(rep(0, 4)), covariances = list(solve(omega))) gmm_independence_graph(g) # recovers x1 - x2 - x3 - x4 ``` The distinctive use is **regime (iii)**. Composed with `fit_kld_em()`, the diagnostic recovers the dependency structure of a target you can only *evaluate* -- an unnormalised energy density -- where no sample exists to drive a conventional structure estimator. The continuous field below couples only neighbouring coordinates; the recovered graph is the chain. ```{r} energy <- function(X) { X <- matrix(X, ncol = 3) rowSums((X^2 - 1)^2) - 0.7 * (X[, 1] * X[, 2] + X[, 2] * X[, 3]) } target <- gmm_target(n_dim = 3L, log_density = function(X) -energy(X)) g <- fit_kld_em(target, N = 8L, proposal = is_uniform(3L, -3, 3), is_size = 6000L, anneal = TRUE, seed = 1L, support_warn = FALSE) gmm_independence_graph(g) ``` This is a graphical-model diagnostic, not a causal-discovery method: it returns the undirected Markov skeleton, not edge directions, and being second order it sees only dependence that enters the covariance.