`kernR::estimate_density_ratio()` is the entry point for every kernR estimator that needs to reweight observational samples to an interventional distribution, most notably `bd_hsic_test()`, the backdoor-HSIC causal test. The function offers four density-ratio backends behind a single signature:
| Backend | Family | When to use |
|---|---|---|
| `logistic` | Noise-contrastive classifier | Default; robust on smooth, unimodal densities |
| `ranger` | Random-forest classifier | Flexible non-linear NCE; requires the ranger package |
| `xgboost` | Gradient-boosted classifier | Strong on tabular interactions; requires the xgboost package |
| `proxymix` | Gaussian-mixture density ratio | Multimodal or skewed densities; parametric alternative |
The `proxymix` backend is the cross-package link between kernR (the distributional verdict layer of the UQ ag stack) and proxymix (the Gaussian-mixture proxy / KL-density-ratio bridge, Hoek & Elliott 2024). It fits one Gaussian mixture model (GMM) to the joint sample cloud `(x, z)` and a second GMM to the product-of-marginals cloud `(x_perm, z)`, in which `x` is permuted against `z`, then evaluates the analytic ratio of the two mixture densities at each observation. There is no classifier calibration step: the ratio is closed-form in the fitted parameters.
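To make the closed-form idea concrete, here is a minimal base-R sketch that replaces each GMM with a single multivariate Gaussian (a one-component mixture). It is an illustration under stated assumptions, not the proxymix implementation: the helpers `dmvnorm_log()` and `gaussian_ratio_weights()` are hypothetical, the product-of-marginals cloud is built by shuffling `x` against `z`, and the weight is assumed to be marginal density over joint density, the usual direction when reweighting observational samples toward an intervention in which `x` is independent of `z`.

```r
# Hypothetical helper: row-wise log-density of N(mu, Sigma), base R only.
dmvnorm_log <- function(X, mu, Sigma) {
  X <- as.matrix(X)
  d <- ncol(X)
  R <- chol(Sigma)                              # upper triangular, Sigma = t(R) %*% R
  u <- backsolve(R, t(sweep(X, 2L, mu)), transpose = TRUE)
  maha <- colSums(u^2)                          # squared Mahalanobis distances
  -0.5 * (d * log(2 * pi) + 2 * sum(log(diag(R))) + maha)
}

# Hypothetical stand-in for the proxymix ratio: one Gaussian per cloud
# (a 1-component GMM) instead of a fitted multi-component mixture.
gaussian_ratio_weights <- function(x, z, seed = 1L) {
  set.seed(seed)                                # note: resets the global RNG stream
  joint <- cbind(x, z)                          # observed (x, z) cloud
  marg  <- cbind(sample(x), z)                  # (x_perm, z): x shuffled against z
  log_p_joint <- dmvnorm_log(joint, colMeans(joint), cov(joint))
  log_q_marg  <- dmvnorm_log(joint, colMeans(marg), cov(marg))
  exp(log_q_marg - log_p_joint)                 # closed-form ratio at each observation
}

# Usage on the toy design used in this vignette
set.seed(2026L)
n <- 200L
z <- matrix(rnorm(n * 2L), n, 2L)
x <- z[, 1L] + rnorm(n, sd = 0.5)
w <- gaussian_ratio_weights(x, z)
summary(w)
```

Observations where `x` strays far from `z[, 1]` are unlikely under the joint fit but ordinary under the product of marginals, so they receive the largest weights. A real mixture fit adds components but keeps the same closed-form evaluation.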
A toy confounded design: `z` is a 2-D standard Gaussian confounder; `x` is a linear-Gaussian function of `z` (through its first coordinate); `y` carries a real causal effect from `x` (coefficient 0.7) plus a confounded path through `z`.
```r
suppressPackageStartupMessages(library(kernR))
set.seed(2026L)
n <- 200L
z <- matrix(rnorm(n * 2L), n, 2L)              # 2-D Gaussian confounder
x <- z[, 1L] + rnorm(n, sd = 0.5)              # treatment driven by z[, 1]
y <- 0.7 * x + z[, 2L] + rnorm(n, sd = 0.4)    # causal effect 0.7 plus a z path
```

Each backend produces a `density_ratio_fit` object exposing the ESS and a weight vector. We tabulate them side by side.
```r
# Classifier backends; ranger and xgboost are optional dependencies
dr_logistic <- estimate_density_ratio(x, z, method = "logistic", seed = 1L)
dr_ranger <- if (requireNamespace("ranger", quietly = TRUE)) {
  estimate_density_ratio(x, z, method = "ranger", seed = 1L)
} else NULL
dr_xgb <- if (requireNamespace("xgboost", quietly = TRUE)) {
  estimate_density_ratio(x, z, method = "xgboost", seed = 1L)
} else NULL

# Gaussian-mixture backend; proxymix_components sets the mixture size
dr_proxymix <- estimate_density_ratio(
  x, z,
  method = "proxymix",
  proxymix_components = 2L,
  seed = 1L
)
```

| Backend | ESS | Min weight | Max weight |
|---|---|---|---|
| logistic | 199.4 | 0.8860 | 1.20 |
| ranger | 156.0 | 0.2380 | 3.21 |
| xgboost | 81.3 | 0.0918 | 7.08 |
| proxymix | 14.3 | 0.0171 | 47.00 |
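The ESS column summarises how unevenly each backend spreads its weights. We have not confirmed which ESS definition kernR uses; a common choice, assumed here, is Kish's effective sample size, ESS = (Σ wᵢ)² / Σ wᵢ², which equals n for uniform weights and collapses toward 1 as a single weight dominates. The helper `kish_ess()` below is hypothetical and uses base R only.

```r
# Hypothetical helper: Kish effective sample size of non-negative weights.
kish_ess <- function(w) {
  stopifnot(is.numeric(w), all(w >= 0), sum(w) > 0)
  sum(w)^2 / sum(w^2)
}

n <- 200L
w_flat  <- rep(1, n)                    # uniform weights: ESS equals n
w_spike <- c(rep(1, n - 1L), 47)        # one weight of 47, like the proxymix maximum
kish_ess(w_flat)                        # 200
kish_ess(w_spike)                       # about 25.1
kish_ess(10 * w_spike)                  # unchanged: ESS ignores the overall scale
```

A single weight of 47 among 199 unit weights already cuts the ESS from 200 to about 25, which is why the max-weight column is a useful early warning.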
Passing a backend name through the `density_ratio` argument of `bd_hsic_test()` runs the full test on that backend's weights:

```r
res_logistic <- bd_hsic_test(
  x, y, z, density_ratio = "logistic",
  n_permutations = 199L, seed = 1L
)
res_proxymix <- bd_hsic_test(
  x, y, z, density_ratio = "proxymix",
  n_permutations = 199L, seed = 1L
)
#> Warning: bd_hsic_test(): ESS (3.4) is below 10% of n_test (100). The weighted
#> test statistic is dominated by a small number of high-weight observations; the
#> resulting p-value is not a reliable verdict. Increase n, switch density_ratio
#> backend, or tighten the design.
```

| Backend | Statistic | p-value | ESS |
|---|---|---|---|
| logistic | 0.0211 | 0.155 | 99.9 |
| proxymix | 0.0062 | 0.320 | 3.4 |
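The warning above fires when the ESS drops below 10% of `n_test`. A minimal sketch of that rule follows, assuming Kish's ESS and a hypothetical helper name `check_ess()`; the actual check inside `bd_hsic_test()` may differ in its ESS definition and its wording.

```r
# Hypothetical guard mirroring the 10%-of-n_test rule quoted in the warning.
check_ess <- function(w, threshold = 0.10) {
  ess <- sum(w)^2 / sum(w^2)              # Kish ESS (assumed definition)
  n_test <- length(w)
  ok <- ess >= threshold * n_test
  if (!ok) {
    warning(sprintf(
      "ESS (%.1f) is below %.0f%% of n_test (%d); the weighted statistic is unreliable.",
      ess, 100 * threshold, n_test
    ), call. = FALSE)
  }
  invisible(list(ess = ess, n_test = n_test, ok = ok))
}

# Three observations carry almost all of the weight mass
w_heavy <- c(rep(0.02, 97), 30, 40, 50)
msg <- tryCatch(check_ess(w_heavy), warning = conditionMessage)
print(msg)

# Uniform weights pass the check
check_ess(rep(1, 100))$ok
```

Returning the ESS alongside the verdict lets a caller log it or switch backends programmatically instead of parsing the warning text.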
The classifier-based backends (`logistic`, `ranger`, `xgboost`) are the default for a reason: they tolerate misspecified densities, scale to high-dimensional `z`, and have well-understood calibration. Reach for `method = "proxymix"` when:

- … `proxymix::gmm_target_from_posterior()`), or use as the seed of a KLD-EM refinement on a target you can evaluate but not sample from;
- … `n`, sharp class imbalance after the joint-vs-marginal split, pathological feature scaling).

The proxymix package is GRDC-firewalled (MIT-licensed; no GRDC IP flows in) and ships its full Gaussian-mixture proxy API independently of kernR. kernR consumes it as a soft dependency via `requireNamespace()`: the binding is one-way and can be rebuilt from the local `proxymix_*.tar.gz` source.
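A soft dependency of this kind usually follows the standard `requireNamespace()` idiom. The sketch below shows the general pattern with hypothetical helpers `require_optional()` and `pick_backend()` that fail early with an informative message when an optional package is missing; whether kernR itself errors, warns, or falls back in that case is an assumption here, as is the premise that `logistic` needs no optional package.

```r
# Hypothetical helper: fail early, with a clear message, if an optional package is absent.
require_optional <- function(pkg, method) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(sprintf(
      "method = \"%s\" needs the optional '%s' package; install it or choose another backend.",
      method, pkg
    ), call. = FALSE)
  }
  invisible(TRUE)
}

# Backend -> optional package it needs (logistic assumed to need none).
backend_pkgs <- c(ranger = "ranger", xgboost = "xgboost", proxymix = "proxymix")

# Hypothetical dispatcher: validate the method name, then check its dependency.
pick_backend <- function(method = c("logistic", "ranger", "xgboost", "proxymix")) {
  method <- match.arg(method)
  if (method %in% names(backend_pkgs)) {
    require_optional(backend_pkgs[[method]], method)
  }
  method
}

pick_backend("logistic")   # no optional package required
```

Because the check runs only when a backend is requested, a user who never asks for `proxymix` never needs it installed, which keeps the binding one-way.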
```r
sessionInfo()
#> R version 4.6.0 (2026-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] PESTO_0.6.0.9000 kernR_0.7.0.9000 rmarkdown_2.31
#>
#> loaded via a namespace (and not attached):
#> [1] vctrs_0.7.3 cli_3.6.6 knitr_1.51 rlang_1.2.0
#> [5] xfun_0.58 S7_0.2.2 jsonlite_2.0.0 data.table_1.18.4
#> [9] glue_1.8.1 buildtools_1.0.0 ranger_0.18.0 proxymix_0.3.0
#> [13] htmltools_0.5.9 maketools_1.3.2 sys_3.4.3 sass_0.4.10
#> [17] scales_1.4.0 grid_4.6.0 evaluate_1.0.5 jquerylib_0.1.4
#> [21] fastmap_1.2.0 yaml_2.3.12 lifecycle_1.0.5 compiler_4.6.0
#> [25] mvnfast_0.2.8 RColorBrewer_1.1-3 Rcpp_1.1.1-1.1 lattice_0.22-9
#> [29] farver_2.1.2 digest_0.6.39 xgboost_3.2.1.1 R6_2.6.1
#> [33] Matrix_1.7-5 bslib_0.11.0 withr_3.0.2 tools_4.6.0
#> [37] gtable_0.3.6 ggplot2_4.0.3 cachem_1.1.0
```