| Title: | Kernel-Based Causal Distributional Testing |
|---|---|
| Description: | Kernel-based hypothesis tests for causal inference and distributional treatment effects. Implements backdoor-adjusted HSIC (bd-HSIC) for testing causal association, and doubly robust kernel statistics (DR-DATE, DR-DETT) for testing distributional treatment effects beyond mean shifts. Supports binary, continuous, and mixed treatments, hierarchical/nested data structures, and scales to large datasets via Nystrom approximation and random Fourier features. This package depends on 'PESTO' (GPL (>= 3)); kernR's own sources are released under the MIT licence, but the installed combination with 'PESTO' is a combined work subject to the terms of the GPL (>= 3). |
| Authors: | Max Moldovan [aut, cre, cph] (ORCID: <https://orcid.org/0000-0001-9680-8474>) |
| Maintainer: | Max Moldovan <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.7.0.9000 |
| Built: | 2026-06-03 03:22:51 UTC |
| Source: | https://github.com/max578/kernR |
Inverts a known aggregation operator Y = T(X) + eps to recover the
fine-scale latent X from coarse / aggregate observations Y, using
a Gaussian-mixture prior on the latent. Implements the
aggregate-likelihood / kernel-downsizing framework (Sejdinovic et
al.) as a kernR-side method consuming an optional
proxymix::fit_proxymix() latent prior.
aggregate_downscale( y, aggregator, latent_prior, sigma_y = 0.1, n_samples_per_component = 200L, min_ess_fraction = 0.1, seed = NULL )aggregate_downscale( y, aggregator, latent_prior, sigma_y = 0.1, n_samples_per_component = 200L, min_ess_fraction = 0.1, seed = NULL )
y |
Numeric vector of length |
aggregator |
Either a numeric |
latent_prior |
Either (a) a list with elements |
sigma_y |
Numeric. Observation noise standard deviation
(scalar; |
n_samples_per_component |
Integer. Importance-sampling sample
count per prior component (non-linear path only). Default |
min_ess_fraction |
Numeric in |
seed |
Integer or |
This is the third downscaling method in kernR, structurally different from the existing pair:
kernel_downscale() (CME, Park-Muandet-Fukumizu-Sejdinovic 2013) –
paired (coarse, fine) training data; supervised regression.
dist_regression() (Szabo-Sriperumbudur-Poczos-Gretton 2016) –
distribution-to-distribution mapping from a bag-of-points design.
aggregate_downscale() (this function) – only aggregate
observations + known aggregator + parametric latent prior. Used when
no paired training data exists (only coarse-grid observations) but
the aggregation operator is known (spatial averaging, temporal
averaging, linear or non-linear projection).
Two computational paths, selected on aggregator's class:
Linear-Gaussian closed form (when aggregator is a matrix A):
each prior component's posterior is a Kalman update;
K_k = Sigma_k A^T (A Sigma_k A^T + sigma_y^2 I)^{-1};
mu_k|y = mu_k + K_k (y - A mu_k),
Sigma_k|y = (I - K_k A) Sigma_k;
posterior mixture weights reweight by per-component evidence
N(y | A mu_k, A Sigma_k A^T + sigma_y^2 I).
Non-linear importance sampling (when aggregator is a
function): draw n_samples_per_component samples from each prior
component, evaluate T(.), weight by Gaussian likelihood
N(y | T(x), sigma_y^2 I), recover posterior moments + reweighted
mixture weights from the importance-weighted samples. Reports
per-component effective sample size; warns when below a stated
floor.
The aggregate-likelihood / GMM-proxy direction is shared with the
companion proxymix Tier-2 stub proxymix::from_aggregate_likelihood(),
which targets the same problem from the prior-fitting side; this
function targets it from the consumption side (inversion given a
fitted prior).
An object of class "aggregate_downscale" with components:
posterior_mean – length-dim_x numeric, E[X | y].
posterior_cov – dim_x x dim_x matrix, Cov[X | y]
(law-of-total-covariance over mixture components).
posterior_weights – length-N numeric, posterior mixture
weights (sum to 1).
posterior_components_means – list of N length-dim_x
posterior component means.
posterior_components_covariances – list of N posterior
component covariances.
aggregator_type – "linear" or "nonlinear".
method – "linear_closed_form" or "nonlinear_is".
ess_per_component – length-N per-component ESS (IS path
only; NA for closed form).
ess_warning – TRUE if any per-component ESS fell below the
floor (IS path only).
n_components, sigma_y, n_samples_per_component, call.
kernel_downscale(), dist_regression().
Other downscaling and embeddings:
dist_regression(),
fit_cme(),
kernel_downscale(),
posterior_sample_aggregate()
set.seed(1L) # Linear-Gaussian: spatial averaging of two adjacent cells. A <- matrix(c(0.5, 0.5), nrow = 1L) prior <- list( means = list(c(0, 0), c(2, 2)), covariances = list(diag(2L), diag(2L)), weights = c(0.5, 0.5) ) fit <- aggregate_downscale(y = 1.0, aggregator = A, latent_prior = prior, sigma_y = 0.2) fit$posterior_mean # Non-linear: aggregator is sin of the sum. agg_fn <- function(x) matrix(sin(rowSums(x)), ncol = 1L) fit2 <- aggregate_downscale(y = 0.5, aggregator = agg_fn, latent_prior = prior, sigma_y = 0.1, n_samples_per_component = 300L, seed = 1L) fit2$posterior_meanset.seed(1L) # Linear-Gaussian: spatial averaging of two adjacent cells. A <- matrix(c(0.5, 0.5), nrow = 1L) prior <- list( means = list(c(0, 0), c(2, 2)), covariances = list(diag(2L), diag(2L)), weights = c(0.5, 0.5) ) fit <- aggregate_downscale(y = 1.0, aggregator = A, latent_prior = prior, sigma_y = 0.2) fit$posterior_mean # Non-linear: aggregator is sin of the sum. agg_fn <- function(x) matrix(sin(rowSums(x)), ncol = 1L) fit2 <- aggregate_downscale(y = 0.5, aggregator = agg_fn, latent_prior = prior, sigma_y = 0.1, n_samples_per_component = 300L, seed = 1L) fit2$posterior_mean
Diagnoses overlap (positivity) between treated and control groups by summarising the propensity score distributions.
assess_overlap(propensity, treatment = NULL)assess_overlap(propensity, treatment = NULL)
propensity |
A |
treatment |
Binary treatment vector. Required if |
A list of class "overlap_diagnostic" with:
Summary statistics of propensity scores for treated.
Summary statistics for controls.
Logical. TRUE if overlap is poor.
Other density ratio and propensity:
effective_sample_size(),
estimate_density_ratio(),
estimate_propensity(),
fit_density_ratio(),
plot_weights(),
predict_density_ratio()
set.seed(1L) n <- 200L treatment <- rbinom(n, 1L, 0.5) scores <- plogis(rnorm(n) + 0.6 * treatment) assess_overlap(scores, treatment)set.seed(1L) n <- 200L treatment <- rbinom(n, 1L, 0.5) scores <- plogis(rnorm(n) + 0.6 * treatment) assess_overlap(scores, treatment)
Tests the do-null hypothesis H_0: p(y | do(x)) = p*(y) using a kernel-based test with backdoor adjustment via density ratio estimation. Detects causal associations including non-linear effects that standard linear methods miss.
bd_hsic_test( x, y, z, kernel_x = kernel_spec(), kernel_y = kernel_spec(), density_ratio = c("logistic", "ranger", "xgboost", "proxymix", "rulsif"), n_permutations = 500L, n_clusters = "auto", split_ratio = 0.5, alpha = 0.05, seed = NULL, verbose = FALSE, cluster_id = NULL, permutation = c("auto", "within_cluster", "naive"), min_ess_fraction = 0.1 )bd_hsic_test( x, y, z, kernel_x = kernel_spec(), kernel_y = kernel_spec(), density_ratio = c("logistic", "ranger", "xgboost", "proxymix", "rulsif"), n_permutations = 500L, n_clusters = "auto", split_ratio = 0.5, alpha = 0.05, seed = NULL, verbose = FALSE, cluster_id = NULL, permutation = c("auto", "within_cluster", "naive"), min_ess_fraction = 0.1 )
x |
Numeric vector or matrix. Treatment variable. |
y |
Numeric vector or matrix. Outcome variable. |
z |
Numeric matrix, data.frame, or data.table. Confounders. |
kernel_x |
Kernel specification for treatment space. Default is RBF with median heuristic. |
kernel_y |
Kernel specification for outcome space. Default is RBF with median heuristic. |
density_ratio |
Character. Method for density ratio estimation:
|
n_permutations |
Integer. Number of permutations for the null distribution. Default is 500. |
n_clusters |
Integer or |
split_ratio |
Numeric in (0, 1). Proportion of data for training the density ratio estimator. Default is 0.5. |
alpha |
Numeric. Significance level. Default is 0.05. |
seed |
Integer or |
verbose |
Logical. Print progress. Default is |
cluster_id |
Optional vector of length |
permutation |
Character. Permutation scheme:
|
min_ess_fraction |
Numeric in |
The bd-HSIC test (Hu, Sejdinovic & Evans, 2024) tests whether treatment X has a causal effect on outcome Y after adjusting for confounders Z via the backdoor criterion.
The test works by:
Estimating density ratios w(x, z) = p*(x) / p(x|z) to reweight observational samples to the interventional distribution.
Computing a weighted HSIC statistic between X and Y.
Obtaining p-values via permutation of Y within exchangeability
clusters – propensity-similarity clusters by default, or external
design clusters (site / season / paddock) when cluster_id is
supplied.
Hierarchical extension. When the design is naturally clustered
(multi-site agricultural trials, paddock x season factorial designs,
patient x hospital data), supplying cluster_id activates
within-cluster permutation: indices of y are reshuffled only within
each cluster, preserving cluster-level effects in the null. This is
the safer default for clustered data; naive permutation across
clusters can inflate Type I error when cluster effects exist.
Unlike PDS or Double ML, bd-HSIC can detect non-linear causal effects (e.g., U-shaped relationships) where the treatment affects higher moments of the outcome but not necessarily the mean.
An object of class "kernel_test_result". When cluster_id
is supplied, the result additionally carries:
Character: which scheme was used.
Integer cluster assignment on the test split.
Character cluster labels.
Per-cluster weighted HSIC (stratified
contributions); NA for clusters with < 2 test observations.
The density-ratio estimator is now fit on the train split and
predicted on the held-out test split via fit_density_ratio() +
predict_density_ratio(). The documented split_ratio is honoured
end-to-end for all four classifier / proxymix backends. The 0.0.0.9013
sample-split leak warning is therefore retired. RuLSIF, the
kernel-based closed-form backend, still uses estimate_rulsif() on
the train/test split natively.
The fitted density-ratio model is preserved on
result$density_ratio_fit for callers that want backend
diagnostics (see ?fit_density_ratio Value; proxymix exposes BIC,
AIC, log-likelihood, convergence per GMM).
Hu, R., Sejdinovic, D., & Evans, R. J. (2024). A kernel test for causal association via noise contrastive backdoor adjustment. JMLR, 25(160), 1-56.
Other causal association tests:
hierarchical_test(),
kernel_causal_test()
set.seed(42) n <- 300 z <- matrix(rnorm(n * 2), n, 2) x <- z[, 1] + rnorm(n) y <- 0.5 * x + z[, 2] + rnorm(n, sd = 0.5) result <- bd_hsic_test(x, y, z, n_permutations = 200, seed = 1) print(result)set.seed(42) n <- 300 z <- matrix(rnorm(n * 2), n, 2) x <- z[, 1] + rnorm(n) y <- 0.5 * x + z[, 2] + rnorm(n, sd = 0.5) result <- bd_hsic_test(x, y, z, n_permutations = 200, seed = 1) print(result)
Tests whether two or more samples come from a common distribution, using the
summed pairwise Maximum Mean Discrepancy with a joint-permutation null. The
samples are typically posterior draws from different inference engines, or
scenario ensembles from different simulators; the test asks whether they are
mutually concordant. Unlike repeated two-sample mmd_test() calls, the null
is a single shared relabeling of the pooled sample, so the family-wise error
is controlled and the overall verdict is one calibrated p-value.
concordance_test( x, kernel = kernel_spec(), n_permutations = 500L, alpha = 0.05, seed = NULL, n_exact_max = 5000L )concordance_test( x, kernel = kernel_spec(), n_permutations = 500L, alpha = 0.05, seed = NULL, n_exact_max = 5000L )
x |
A list of two or more samples to compare. Each element is a numeric
vector, matrix, or data.frame with |
kernel |
Kernel specification. Default is RBF with the median heuristic over the pooled sample. |
n_permutations |
Integer. Number of joint permutations for the null.
Default |
alpha |
Numeric in |
seed |
Integer or |
n_exact_max |
Integer or |
The returned object carries the full pairwise MMD discrepancy matrix, so a
rejection can be read down to the offending pair: convergence across sources
is corroborating evidence, and divergence localises which source departs and
on which margin. This is the cross-engine concordance role – a sample-based
complement to the score-based ksd_test().
The exact test materialises the pooled n x n kernel matrix (O(n^2)). To
keep large ensembles tractable without a silent loss of exactness, a pooled
sample with more than n_exact_max rows is delegated to
concordance_test_nystrom() – a low-rank approximation that is announced
by a message and recorded in the returned object's approximation and m
fields. Set n_exact_max = Inf to force the exact test, or call
concordance_test_nystrom() directly to control the approximation rank m.
An object of class c("concordance_test", "kernel_test_result")
carrying the standard kernel_test_result fields plus:
Summed pairwise unbiased MMD-squared.
Upper-tail joint-permutation p-value (with +1
correction).
Number of sources compared.
Named integer vector of per-source sample sizes.
Symmetric K x K matrix of pairwise unbiased
MMD-squared, row/column-named by source.
Verdict level and p_value <= alpha.
Max Moldovan, [email protected]
Gretton, A., Borgwardt, K. M., Rasch, M. J., Scholkopf, B., & Smola, A. (2012). A kernel two-sample test. Journal of Machine Learning Research, 13, 723-773.
mmd_test(), ksd_test(), mmd_ppc()
Other goodness-of-fit tests:
concordance_test_nystrom(),
coverage_test(),
gaussian_score(),
ksd_test(),
ksd_test_nystrom(),
numeric_score()
set.seed(1) # Three concordant sources (same distribution): not rejected draws <- list( engine_a = matrix(stats::rnorm(400L), ncol = 2L), engine_b = matrix(stats::rnorm(400L), ncol = 2L), engine_c = matrix(stats::rnorm(400L), ncol = 2L) ) fit_ok <- concordance_test(draws, n_permutations = 199L, seed = 1L) fit_ok # One source departs (mean-shifted): rejected, and the pairwise matrix # localises engine_c draws$engine_c <- draws$engine_c + 1 fit_bad <- concordance_test(draws, n_permutations = 199L, seed = 1L) fit_bad$pairwiseset.seed(1) # Three concordant sources (same distribution): not rejected draws <- list( engine_a = matrix(stats::rnorm(400L), ncol = 2L), engine_b = matrix(stats::rnorm(400L), ncol = 2L), engine_c = matrix(stats::rnorm(400L), ncol = 2L) ) fit_ok <- concordance_test(draws, n_permutations = 199L, seed = 1L) fit_ok # One source departs (mean-shifted): rejected, and the pairwise matrix # localises engine_c draws$engine_c <- draws$engine_c + 1 fit_bad <- concordance_test(draws, n_permutations = 199L, seed = 1L) fit_bad$pairwise
Low-rank counterpart to concordance_test() for large ensembles. The
pooled sample is factorised once – by the Nystrom method (default) or
random Fourier features – into an n x m factor F with
; the summed pairwise unbiased MMD-squared and its
joint-permutation null are then computed from F in O(n m) per
permutation rather than O(n^2), with m << n controlling the
speed / accuracy trade-off. The verdict object and its interpretation are
identical to concordance_test(); only the cost scales differently.
concordance_test_nystrom( x, kernel = kernel_spec(), method = c("nystrom", "rff"), m = 100L, n_permutations = 500L, alpha = 0.05, seed = NULL, regularise = 1e-06 )concordance_test_nystrom( x, kernel = kernel_spec(), method = c("nystrom", "rff"), m = 100L, n_permutations = 500L, alpha = 0.05, seed = NULL, regularise = 1e-06 )
x |
A list of two or more samples to compare. Each element is a numeric
vector, matrix, or data.frame with |
kernel |
Kernel specification. Default is RBF with the median heuristic over the pooled sample. |
method |
Character. |
m |
Integer. Rank of the approximation: the number of Nystrom
landmarks or RFF features. Larger |
n_permutations |
Integer. Number of joint permutations for the null.
Default |
alpha |
Numeric in |
seed |
Integer or |
regularise |
Small positive numeric. Ridge added before the Nystrom
Cholesky for numerical stability; ignored under |
The factorisation of the pooled sample preserves the per-source mean
embeddings (per-source column sums of F), so the pairwise discrepancy
matrix still localises which source departs. The joint-permutation null is
built by relabelling the rows of F – the low-rank analogue of permuting
the pooled-sample labels in concordance_test().
Use concordance_test() for exact results at moderate n; reach for this
function when the pooled sample is large enough that the O(n^2) kernel
matrix is the bottleneck. RFF (method = "rff") requires an RBF kernel;
Nystrom supports any kernel_spec().
An object of class c("concordance_test", "kernel_test_result")
carrying the same fields as concordance_test() plus:
"nystrom" or "rff".
Effective rank used for the factorisation.
Max Moldovan, [email protected]
Gretton, A., Borgwardt, K. M., Rasch, M. J., Scholkopf, B., & Smola, A. (2012). A kernel two-sample test. Journal of Machine Learning Research, 13, 723-773.
Williams, C. K. I., & Seeger, M. (2001). Using the Nystrom method to speed up kernel machines. NeurIPS, 13.
Rahimi, A., & Recht, B. (2007). Random features for large-scale kernel machines. NeurIPS, 20.
concordance_test(), nystrom_factor(), rff_features(),
hsic_test_nystrom()
Other goodness-of-fit tests:
concordance_test(),
coverage_test(),
gaussian_score(),
ksd_test(),
ksd_test_nystrom(),
numeric_score()
Other low-rank acceleration:
hsic_test_nystrom(),
ksd_test_nystrom(),
nystrom_factor(),
rff_features()
set.seed(1) big <- list( engine_a = matrix(stats::rnorm(4000L), ncol = 2L), engine_b = matrix(stats::rnorm(4000L), ncol = 2L), engine_c = matrix(stats::rnorm(4000L), ncol = 2L) + 0.4 ) fit <- concordance_test_nystrom(big, m = 80L, n_permutations = 199L, seed = 1L) fit fit$pairwiseset.seed(1) big <- list( engine_a = matrix(stats::rnorm(4000L), ncol = 2L), engine_b = matrix(stats::rnorm(4000L), ncol = 2L), engine_c = matrix(stats::rnorm(4000L), ncol = 2L) + 0.4 ) fit <- concordance_test_nystrom(big, m = 80L, n_permutations = 199L, seed = 1L) fit fit$pairwise
Quantifies how a predictive ensemble is calibrated against held-out observations, rather than only testing whether it differs from them. Each observation is mapped to its probability integral transform (PIT) within the ensemble; calibrated draws give uniform PITs. The result reports empirical coverage at nominal interval levels, a signed dispersion ratio, a bias indicator, and a rank-histogram uniformity test, and classifies the ensemble as calibrated, under-dispersed (over-confident), over-dispersed, or biased.
coverage_test(x, ...) ## Default S3 method: coverage_test( x, observed, levels = c(0.5, 0.8, 0.9), n_bins = 10L, alpha = 0.05, ... ) ## S3 method for class 'pesto_ensemble' coverage_test(x, observed = NULL, ...) ## S3 method for class 'pesto_ensemble_manifest' coverage_test(x, observed, outputs = NULL, ...)coverage_test(x, ...) ## Default S3 method: coverage_test( x, observed, levels = c(0.5, 0.8, 0.9), n_bins = 10L, alpha = 0.05, ... ) ## S3 method for class 'pesto_ensemble' coverage_test(x, observed = NULL, ...) ## S3 method for class 'pesto_ensemble_manifest' coverage_test(x, observed, outputs = NULL, ...)
x |
Numeric matrix |
... |
Additional arguments passed between methods (currently unused). |
observed |
Numeric matrix |
levels |
Numeric vector in |
n_bins |
Integer. Number of equal-width bins for the rank-histogram
uniformity test. Default |
alpha |
Numeric in |
outputs |
Optional character vector of manifest output columns to test
(the |
This is the graded complement to the binary kernel verdicts mmd_ppc() and
ksd_test(). Its motivating use is ensemble over-confidence: an
under-dispersed ensemble (for example a collapsed iterative-ensemble-smoother
posterior) gives a U-shaped rank histogram, a dispersion ratio above one, and
empirical coverage below nominal – all of which this function names and
measures.
Calibration is assessed per output dimension and pooled across dimensions
(marginal calibration); it does not test the joint dependence structure, for
which the two-sample mmd_ppc() is the right tool. With few held-out
observations the uniformity test has low power; read the coverage and
dispersion summaries (which are informative at any sample size) alongside it.
An object of class "coverage_test" with components:
Data frame of nominal vs pooled empirical coverage at
each requested level.
Matrix of empirical coverage per dimension x level.
Var(PIT) / (1/12): above one under-dispersed,
below one over-dispersed.
Pooled mean PIT (0.5 under calibration; a bias indicator).
List with the rank-histogram chi-squared statistic,
p_value, and n_bins.
Character: the calibration classification.
Logical: calibration$p_value <= alpha.
The n_obs x d matrix of PIT values.
Inputs / sizes.
Provenance carried from a pesto_ensemble_manifest
(including fidelity), or NULL.
Max Moldovan, [email protected]
Gneiting, T., Balabdaoui, F., & Raftery, A. E. (2007). Probabilistic forecasts, calibration and sharpness. Journal of the Royal Statistical Society B, 69(2), 243-268.
Hamill, T. M. (2001). Interpretation of rank histograms for verifying ensemble forecasts. Monthly Weather Review, 129(3), 550-560.
mmd_ppc(), ksd_test(), pesto_ensemble()
Other goodness-of-fit tests:
concordance_test(),
concordance_test_nystrom(),
gaussian_score(),
ksd_test(),
ksd_test_nystrom(),
numeric_score()
set.seed(1) obs <- matrix(stats::rnorm(120L), ncol = 2L) # Calibrated: predictive ensemble from the same law ens_ok <- matrix(stats::rnorm(2000L), ncol = 2L) coverage_test(ens_ok, obs) # Under-dispersed (over-confident): ensemble too narrow ens_tight <- matrix(stats::rnorm(2000L, sd = 0.4), ncol = 2L) coverage_test(ens_tight, obs)set.seed(1) obs <- matrix(stats::rnorm(120L), ncol = 2L) # Calibrated: predictive ensemble from the same law ens_ok <- matrix(stats::rnorm(2000L), ncol = 2L) coverage_test(ens_ok, obs) # Under-dispersed (over-confident): ensemble too narrow ens_tight <- matrix(stats::rnorm(2000L, sd = 0.4), ncol = 2L) coverage_test(ens_tight, obs)
Regression where each input is a bag of points (a sample from a distribution) rather than a single feature vector. Each bag is implicitly mapped to its empirical mean embedding in the RKHS of an inner kernel; the outer (between-bag) kernel acts on those embeddings, and kernel ridge regression predicts a scalar or multivariate output. This is the Szabó-Sriperumbudur-Póczos-Gretton (2016) "learning theory for distribution regression" setup; kernel-mean-embedding background is from Muandet et al. (2017).
dist_regression( bags, y, inner_kernel = kernel_spec(), outer = c("linear", "rbf"), outer_bandwidth = "median", lambda = "cv" )dist_regression( bags, y, inner_kernel = kernel_spec(), outer = c("linear", "rbf"), outer_bandwidth = "median", lambda = "cv" )
bags |
A list of length |
y |
Numeric vector of length |
inner_kernel |
A |
outer |
Character. Outer-kernel form: |
outer_bandwidth |
Outer-kernel bandwidth (RBF only). |
lambda |
Ridge regularisation for kernel ridge regression. If
|
Outer kernels supported.
"linear": .
"rbf": , where the embedding-space
distance is recovered from the inner Gram via
.
Typical ag-systems use: each paddock contributes a bag of soil-core
measurements (variable depth, multiple cores per paddock); the
regression predicts paddock-level yield from the distributional
shape of the soil profile. Distinct from kernel_downscale() in
that the input is itself a distribution, not a fixed-length vector.
An object of class "dist_regression" with components:
Ridge weights (length M or M x d_y).
Training bag-inner-mean Gram (M x M).
Outer kernel matrix (M x M) actually used.
Bags used (kept for prediction).
Targets used.
Resolved inner kernel.
Outer kernel choice and resolved
bandwidth (NA for linear outer).
Ridge parameter used.
Metadata.
Szabó, Z., Sriperumbudur, B. K., Póczos, B., & Gretton, A. (2016). Learning theory for distribution regression. Journal of Machine Learning Research, 17(152), 1-40.
Muandet, K., Fukumizu, K., Sriperumbudur, B., & Scholkopf, B. (2017). Kernel mean embedding of distributions: A review and beyond. Foundations and Trends in Machine Learning, 10(1-2).
predict.dist_regression(), kernel_downscale()
Other downscaling and embeddings:
aggregate_downscale(),
fit_cme(),
kernel_downscale(),
posterior_sample_aggregate()
set.seed(1) # 30 bags, each a sample from N(mu_i, 1); predict mu_i M <- 30L mu <- stats::rnorm(M) bags <- lapply(mu, function(m) matrix(stats::rnorm(40, mean = m), ncol = 1L)) fit <- dist_regression(bags, y = mu, outer = "linear") fit # Predict at new bags new_mu <- stats::rnorm(5L) new_bags <- lapply(new_mu, function(m) matrix(stats::rnorm(40, mean = m), ncol = 1L)) predict(fit, new_bags)set.seed(1) # 30 bags, each a sample from N(mu_i, 1); predict mu_i M <- 30L mu <- stats::rnorm(M) bags <- lapply(mu, function(m) matrix(stats::rnorm(40, mean = m), ncol = 1L)) fit <- dist_regression(bags, y = mu, outer = "linear") fit # Predict at new bags new_mu <- stats::rnorm(5L) new_bags <- lapply(new_mu, function(m) matrix(stats::rnorm(40, mean = m), ncol = 1L)) predict(fit, new_bags)
Tests whether the distribution of simulated outputs differs between two APSIM (or other) scenario ensembles – e.g. baseline management vs an intervention such as stubble retention – by running the doubly robust DR-DATE statistic of Fawkes, Hu, Evans & Sejdinovic (2024) over the pooled ensembles.
dr_date_scenario( baseline, intervention, output = NULL, propensity_model = c("logistic", "ranger", "xgboost"), outcome_model = c("krr", "zero"), n_permutations = 500L, n_bins = 10L, regularisation = "cv", alpha = 0.05, seed = NULL, verbose = FALSE, strict_fidelity = FALSE, ... )dr_date_scenario( baseline, intervention, output = NULL, propensity_model = c("logistic", "ranger", "xgboost"), outcome_model = c("krr", "zero"), n_permutations = 500L, n_bins = 10L, regularisation = "cv", alpha = 0.05, seed = NULL, verbose = FALSE, strict_fidelity = FALSE, ... )
baseline |
A |
intervention |
A |
output |
Optional character vector of observation column names
to test against. Defaults to all numeric output columns shared by
the two manifests (the |
propensity_model |
Forwarded to |
outcome_model |
Forwarded to |
n_permutations |
Forwarded to |
n_bins |
Forwarded to |
regularisation |
Forwarded to |
alpha |
Forwarded to |
seed |
Integer or |
verbose |
Logical. Default |
strict_fidelity |
Logical. If |
... |
Reserved. |
This is a thin scenario-facing wrapper around dr_date_test():
parameters are treated as covariates (so the test adjusts for any
systematic difference in the parameter posteriors that came from the
two PESTO runs), outputs are the outcome, and the scenario label is
the binary treatment. Sensitive to distributional differences
(variance, shape, tails), not just mean shifts.
The PESTO 0.3.0 PESTO::pesto_ensemble_manifest S7 contract is the
supported input shape; the per-realisation file-I/O for ingestion is
handled by PESTO::read_manifest() upstream of this call.
An object of class
c("dr_date_scenario", "kernel_test_result") with the standard
kernel_test_result fields plus:
Run id from the baseline manifest.
Run id from the intervention manifest.
Realisations in baseline ensemble.
Realisations in intervention ensemble.
Character vector of output columns used.
Named character – baseline / intervention.
List with baseline / intervention fidelity
provenance from the PESTO manifests (a
list(type, schedule, final_level, n_levels, costs) for a
multi-fidelity run, or NULL for a single-fidelity run).
A PESTO multi-fidelity run records, in the manifest fidelity slot,
which fidelity levels produced the ensemble. Comparing a baseline and
an intervention that were calibrated at different fidelities risks
confounding the distributional contrast with fidelity bias. This
wrapper therefore surfaces a provenance mismatch – one scenario
single-fidelity and the other multi-fidelity, or two multi-fidelity
runs with different stack shapes / final levels – as a warning
(default) or, with strict_fidelity = TRUE, a hard error. Matched or
both-single-fidelity provenance passes silently. The check is
forward-compatible: manifests from PESTO versions that do not populate
the slot read as NULL and pass.
Fawkes, J., Hu, R., Evans, R. J., & Sejdinovic, D. (2024). Doubly robust kernel statistics for testing distributional treatment effects. Transactions on Machine Learning Research.
dr_date_test() for the underlying observational-causal
test; PESTO::pesto_ensemble_manifest for the input contract;
PESTO::pesto_ies_callback() for producing the ensembles upstream.
Other distributional treatment effects:
dr_date_test(),
dr_dett_test()
# Requires PESTO (>= 0.4.1) -- wired through Imports. library(PESTO) npar <- 2L; nobs <- 4L; nreal <- 60L G <- matrix(stats::rnorm(nobs * npar), nobs, npar) y0 <- as.numeric(G %*% c(1.0, -0.5)) + stats::rnorm(nobs, sd = 0.05) y1 <- y0 + c(0.6, 0.6, 0.6, 0.6) # intervention shifts outputs names(y0) <- names(y1) <- paste0("o", seq_len(nobs)) prior <- matrix(stats::rnorm(nreal * npar), nreal, npar, dimnames = list(NULL, c("p1", "p2"))) fit0 <- pesto_ies_callback(function(t) t %*% t(G), prior, y0, 0.05, noptmax = 3, verbose = FALSE) fit1 <- pesto_ies_callback(function(t) t %*% t(G), prior, y1, 0.05, noptmax = 3, verbose = FALSE) m_base <- as_manifest(fit0, run_id = "baseline") m_intv <- as_manifest(fit1, run_id = "intervention") res <- dr_date_scenario(m_base, m_intv, n_permutations = 200L, seed = 1L) print(res)# Requires PESTO (>= 0.4.1) -- wired through Imports. library(PESTO) npar <- 2L; nobs <- 4L; nreal <- 60L G <- matrix(stats::rnorm(nobs * npar), nobs, npar) y0 <- as.numeric(G %*% c(1.0, -0.5)) + stats::rnorm(nobs, sd = 0.05) y1 <- y0 + c(0.6, 0.6, 0.6, 0.6) # intervention shifts outputs names(y0) <- names(y1) <- paste0("o", seq_len(nobs)) prior <- matrix(stats::rnorm(nreal * npar), nreal, npar, dimnames = list(NULL, c("p1", "p2"))) fit0 <- pesto_ies_callback(function(t) t %*% t(G), prior, y0, 0.05, noptmax = 3, verbose = FALSE) fit1 <- pesto_ies_callback(function(t) t %*% t(G), prior, y1, 0.05, noptmax = 3, verbose = FALSE) m_base <- as_manifest(fit0, run_id = "baseline") m_intv <- as_manifest(fit1, run_id = "intervention") res <- dr_date_scenario(m_base, m_intv, n_permutations = 200L, seed = 1L) print(res)
Tests whether the distributions of potential outcomes Y(1) and Y(0) differ using a doubly robust kernel MMD statistic. Detects distributional effects (variance, shape) that mean-based tests miss.
dr_date_test( y, treatment, covariates, kernel_y = kernel_spec(), propensity_model = c("logistic", "ranger", "xgboost"), outcome_model = c("krr", "zero"), cross_fit = TRUE, n_folds = 5L, n_permutations = 500L, n_bins = 10L, regularisation = "cv", min_ess_fraction = 0.1, alpha = 0.05, seed = NULL, verbose = FALSE )dr_date_test( y, treatment, covariates, kernel_y = kernel_spec(), propensity_model = c("logistic", "ranger", "xgboost"), outcome_model = c("krr", "zero"), cross_fit = TRUE, n_folds = 5L, n_permutations = 500L, n_bins = 10L, regularisation = "cv", min_ess_fraction = 0.1, alpha = 0.05, seed = NULL, verbose = FALSE )
y |
Numeric vector or matrix. Outcome variable. |
treatment |
Binary vector (0/1). Treatment indicator. |
covariates |
Numeric matrix, data.frame, or data.table. Confounders. |
kernel_y |
Kernel specification for outcome space. Default is RBF. |
propensity_model |
Character. Method for propensity estimation:
|
outcome_model |
Character. |
cross_fit |
Logical. If |
n_folds |
Integer. Number of cross-fitting folds. Default is 5. |
n_permutations |
Integer. Number of permutations. Default is 500. |
n_bins |
Integer. Propensity score bins for permutation. Default is 10. |
regularisation |
Numeric or |
min_ess_fraction |
Numeric in (0, 1). If the effective sample
size of either arm's inverse-probability weights falls below this
fraction of |
alpha |
Numeric. Significance level. Default is 0.05. |
seed |
Integer or |
verbose |
Logical. Print progress. Default is |
The DR-DATE test (Fawkes, Hu, Evans & Sejdinovic, 2024) constructs
doubly robust (augmented inverse-probability-weighted) estimators for
the counterfactual mean embeddings of Y(1) and Y(0) in a reproducing
kernel Hilbert space. For arm the augmented embedding is
where is the conditional mean embedding fitted on arm
and are stabilised inverse-probability
weights. The statistic is in the
RKHS. Setting outcome_model = "zero" sets
and recovers the inverse-probability-weighted statistic.
Double robustness: the test is consistent if either the
propensity model or the outcome (CME) model is correctly specified.
Cross-fitting (cross_fit = TRUE) makes this hold under flexible
machine-learning nuisances by removing own-observation overfitting
bias (Chernozhukov et al., 2018).
Permutation null: the reference distribution permutes treatment labels within propensity-score bins, holding the fitted nuisances fixed. This is valid under within-bin exchangeability of treatment given the (binned) propensity score; calibration degrades as the bins coarsen relative to the propensity variation inside them.
Key advantage: unlike DML or TMLE which test only for mean shifts, DR-DATE detects any distributional difference including changes in variance, skewness, or shape.
An object of class "kernel_test_result". The ess element
holds the smaller of the two per-arm effective sample sizes and
ess_warning records whether the reliability floor was hit.
Fawkes, J., Hu, R., Evans, R. J., & Sejdinovic, D. (2024). Doubly robust kernel statistics for testing distributional treatment effects. Transactions on Machine Learning Research.
Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., & Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21(1), C1-C68.
Other distributional treatment effects:
dr_date_scenario(),
dr_dett_test()
set.seed(42) n <- 300 x <- matrix(rnorm(n * 2), n, 2) logit_p <- 0.5 * x[, 1] t <- rbinom(n, 1, plogis(logit_p)) y <- t * 1.0 + x[, 1] + rnorm(n, sd = 0.5) result <- dr_date_test(y, t, x, n_permutations = 200, seed = 1) print(result)set.seed(42) n <- 300 x <- matrix(rnorm(n * 2), n, 2) logit_p <- 0.5 * x[, 1] t <- rbinom(n, 1, plogis(logit_p)) y <- t * 1.0 + x[, 1] + rnorm(n, sd = 0.5) result <- dr_date_test(y, t, x, n_permutations = 200, seed = 1) print(result)
Tests whether the distribution of the treated potential outcome Y(1) differs from the control potential outcome Y(0) among the treated subpopulation. Requires only one-sided overlap.
dr_dett_test( y, treatment, covariates, kernel_y = kernel_spec(), propensity_model = c("logistic", "ranger", "xgboost"), outcome_model = c("krr", "zero"), cross_fit = TRUE, n_folds = 5L, n_permutations = 500L, n_bins = 10L, regularisation = "cv", min_ess_fraction = 0.1, alpha = 0.05, seed = NULL, verbose = FALSE )dr_dett_test( y, treatment, covariates, kernel_y = kernel_spec(), propensity_model = c("logistic", "ranger", "xgboost"), outcome_model = c("krr", "zero"), cross_fit = TRUE, n_folds = 5L, n_permutations = 500L, n_bins = 10L, regularisation = "cv", min_ess_fraction = 0.1, alpha = 0.05, seed = NULL, verbose = FALSE )
y |
Numeric vector or matrix. Outcome variable. |
treatment |
Binary vector (0/1). Treatment indicator. |
covariates |
Numeric matrix, data.frame, or data.table. Confounders. |
kernel_y |
Kernel specification for outcome space. Default is RBF. |
propensity_model |
Character. Method for propensity estimation:
|
outcome_model |
Character. |
cross_fit |
Logical. If |
n_folds |
Integer. Number of cross-fitting folds. Default is 5. |
n_permutations |
Integer. Number of permutations. Default is 500. |
n_bins |
Integer. Propensity score bins for permutation. Default is 10. |
regularisation |
Numeric or |
min_ess_fraction |
Numeric in (0, 1). If the effective sample
size of either arm's inverse-probability weights falls below this
fraction of |
alpha |
Numeric. Significance level. Default is 0.05. |
seed |
Integer or |
verbose |
Logical. Print progress. Default is |
DR-DETT is analogous to DR-DATE but focuses on the effect on the
treated (ETT) rather than the average treatment effect. The treated
counterfactual Y(1) | T = 1 is observed directly, so only the
control arm needs an outcome model. The control counterfactual
Y(0) | T = 1 is reconstructed by augmented inverse-probability
weighting, reweighting controls by the treatment odds
so that their covariate distribution matches
the treated. With outcome_model = "krr" the control conditional mean
embedding supplies the doubly robust augmentation;
outcome_model = "zero" returns the inverse-probability-weighted
statistic. It requires only one-sided overlap: P(T = 1 | X) bounded
away from 0 (not necessarily from 1), so it applies where positivity
fails for the control group.
An object of class "kernel_test_result". The ess element
holds the effective sample size of the control reconstruction
weights and ess_warning records whether the reliability floor was
hit.
Fawkes, J., Hu, R., Evans, R. J., & Sejdinovic, D. (2024). Doubly robust kernel statistics for testing distributional treatment effects. Transactions on Machine Learning Research.
Other distributional treatment effects:
dr_date_scenario(),
dr_date_test()
set.seed(42) n <- 300 x <- matrix(rnorm(n * 2), n, 2) t <- rbinom(n, 1, plogis(0.5 * x[, 1])) y <- t * rnorm(n, sd = 2) + (1 - t) * rnorm(n, sd = 1) + x[, 1] result <- dr_dett_test(y, t, x, n_permutations = 200, seed = 1) print(result)set.seed(42) n <- 300 x <- matrix(rnorm(n * 2), n, 2) t <- rbinom(n, 1, plogis(0.5 * x[, 1])) y <- t * rnorm(n, sd = 2) + (1 - t) * rnorm(n, sd = 1) + x[, 1] result <- dr_dett_test(y, t, x, n_permutations = 200, seed = 1) print(result)
Computes ESS from importance weights: ESS = (sum(w))^2 / sum(w^2).
effective_sample_size(w)effective_sample_size(w)
w |
Numeric vector of weights. |
Scalar effective sample size.
Other density ratio and propensity:
assess_overlap(),
estimate_density_ratio(),
estimate_propensity(),
fit_density_ratio(),
plot_weights(),
predict_density_ratio()
w <- runif(100, 0.5, 2) effective_sample_size(w)w <- runif(100, 0.5, 2) effective_sample_size(w)
Wraps fit_density_ratio() + predict_density_ratio() on the same
data. Preserved for backwards compatibility with kernR 0.0.0.901x
callers; new code should prefer the explicit fit/predict pair so
train/test splits are honoured.
estimate_density_ratio( x, z, method = c("logistic", "ranger", "xgboost", "proxymix"), n_noise = 1L, proxymix_components = 2L, seed = NULL )estimate_density_ratio( x, z, method = c("logistic", "ranger", "xgboost", "proxymix"), n_noise = 1L, proxymix_components = 2L, seed = NULL )
x |
Numeric vector or matrix. Treatment variable (training). |
z |
Numeric matrix or data.frame. Confounders (training). |
method |
Character. Backend: |
n_noise |
Integer. Noise samples per real sample for
classifier backends. Default |
proxymix_components |
Integer. Mixture components per density
when |
seed |
Integer or |
The return shape (weights, ratios, ess, method, n) is
unchanged from previous versions. Internally, ratios are now
computed in log-space (which fixes pathological tail behaviour
that the classifier-based 0.0.0.9012 implementation occasionally
showed under extreme imbalance).
A list of class density_ratio_fit_estimate with components
weights, ratios, ess, method, n, and fit (the
underlying density_ratio_fit for callers that want diagnostics).
fit_density_ratio() for the fit/predict surface.
Other density ratio and propensity:
assess_overlap(),
effective_sample_size(),
estimate_propensity(),
fit_density_ratio(),
plot_weights(),
predict_density_ratio()
set.seed(42) n <- 200 z <- matrix(rnorm(n * 2), n, 2) x <- z[, 1] + rnorm(n) dr <- estimate_density_ratio(x, z) dr$essset.seed(42) n <- 200 z <- matrix(rnorm(n * 2), n, 2) x <- z[, 1] + rnorm(n) dr <- estimate_density_ratio(x, z) dr$ess
Estimates P(T = 1 | X) using the specified model, with built-in cross-fitting support.
estimate_propensity( treatment, covariates, method = c("logistic", "ranger", "xgboost"), cross_fit = TRUE, n_folds = 5L, trim = 0.01, seed = NULL )estimate_propensity( treatment, covariates, method = c("logistic", "ranger", "xgboost"), cross_fit = TRUE, n_folds = 5L, trim = 0.01, seed = NULL )
treatment |
Binary vector (0/1). Treatment indicator. |
covariates |
Numeric matrix or data.frame. Confounders. |
method |
Character. |
cross_fit |
Logical. If |
n_folds |
Integer. Number of cross-fitting folds. Default is 5. |
trim |
Numeric. Trim extreme propensity scores to |
seed |
Integer or |
A list of class "propensity_fit" with components:
Estimated propensity scores P(T=1|X).
Method used.
Trimming threshold applied.
Number of scores that were trimmed.
Other density ratio and propensity:
assess_overlap(),
effective_sample_size(),
estimate_density_ratio(),
fit_density_ratio(),
plot_weights(),
predict_density_ratio()
set.seed(42) n <- 300 x <- matrix(rnorm(n * 3), n, 3) logit_p <- 0.5 * x[, 1] - 0.3 * x[, 2] t <- rbinom(n, 1, plogis(logit_p)) ps <- estimate_propensity(t, x) summary(ps$scores)set.seed(42) n <- 300 x <- matrix(rnorm(n * 3), n, 3) logit_p <- 0.5 * x[, 1] - 0.3 * x[, 2] t <- rbinom(n, 1, plogis(logit_p)) ps <- estimate_propensity(t, x) summary(ps$scores)
Estimates the conditional mean embedding in the
RKHS using kernel ridge regression (Park, Muandet, Fukumizu &
Sejdinovic, 2013; Muandet et al., 2017).
fit_cme( x, y, kernel_x = kernel_spec(), kernel_y = kernel_spec(), lambda = "cv" )fit_cme( x, y, kernel_x = kernel_spec(), kernel_y = kernel_spec(), lambda = "cv" )
x |
Numeric matrix of conditioning variables (n x d_x). |
y |
Numeric matrix of target variables (n x d_y). |
kernel_x |
Kernel specification for |
kernel_y |
Kernel specification for |
lambda |
Ridge regularisation parameter. If |
This is the lower-level building block used by kernel_downscale().
Most users should call kernel_downscale(); use fit_cme() directly
when you need access to the trained operator (the weight matrix W)
for custom downstream computations – e.g. constructing a kernel
Bayes' rule update, plugging into a manuscript figure pipeline, or
composing with other RKHS operators.
A list of class "cme_fit" with components:
Operator matrix (K_x + n lambda I)^{-1} (n x n).
Kernel matrix of y.
Training x data.
Resolved kernel specifications.
Regularisation parameter used.
Park, J., Muandet, K., Fukumizu, K., & Sejdinovic, D. (2013). Kernel embeddings of conditional distributions: A unified kernel framework for nonparametric inference in graphical models. IEEE Signal Processing Magazine.
Muandet, K., Fukumizu, K., Sriperumbudur, B., & Scholkopf, B. (2017). Kernel mean embedding of distributions: A review and beyond. Foundations and Trends in Machine Learning, 10(1-2).
predict.cme_fit(), kernel_downscale()
Other downscaling and embeddings:
aggregate_downscale(),
dist_regression(),
kernel_downscale(),
posterior_sample_aggregate()
set.seed(1L) x <- matrix(rnorm(60L), ncol = 2L) y <- matrix(x[, 1L] + rnorm(30L, sd = 0.2), ncol = 1L) fit <- fit_cme(x, y, lambda = 1e-2) dim(fit$W)set.seed(1L) x <- matrix(rnorm(60L), ncol = 2L) y <- matrix(x[, 1L] + rnorm(30L, sd = 0.2), ncol = 1L) fit <- fit_cme(x, y, lambda = 1e-2) dim(fit$W)
Trains a density-ratio estimator for the do-null reweighting
w(x, z) = p*(x) / p(x | z) used by bd_hsic_test(). The fitted
model is decoupled from evaluation: predict_density_ratio() applies
it to held-out rows, so train/test splits are honoured cleanly.
fit_density_ratio( x, z, method = c("logistic", "ranger", "xgboost", "proxymix"), n_noise = 1L, proxymix_components = 2L, seed = NULL )fit_density_ratio( x, z, method = c("logistic", "ranger", "xgboost", "proxymix"), n_noise = 1L, proxymix_components = 2L, seed = NULL )
x |
Numeric vector or matrix. Treatment variable (training). |
z |
Numeric matrix or data.frame. Confounders (training). |
method |
Character. Backend: |
n_noise |
Integer. Noise samples per real sample for
classifier backends. Default |
proxymix_components |
Integer. Mixture components per density
when |
seed |
Integer or |
Four backends are supported (the method argument):
"logistic" (default), "ranger", "xgboost" – classifier-based
noise-contrastive estimation. The classifier is trained to
distinguish joint samples (x, z) from product-of-marginals
samples (x_perm, z); the density ratio is recovered from the
calibrated class probabilities. Log-ratios are stored internally
for numerical stability.
"proxymix" – Gaussian-mixture density-ratio. Fits one GMM to the
joint sample cloud (x, z) and one to a permuted
product-of-marginals cloud via proxymix::fit_proxymix(regime = "sample"); ratios are evaluated in log-space from
proxymix::dgmm(). Per-GMM convergence diagnostics (BIC, AIC,
final log-likelihood, iteration count) are surfaced on the
returned fit; query them via fit$diagnostics.
Introduced in kernR 0.0.0.9014 to close the documented-but-
unimplemented sample-split gap in bd_hsic_test() (see NEWS).
estimate_density_ratio() is now a thin backwards-compatible
wrapper that fits and predicts on the same data.
An object of class density_ratio_fit (plus
density_ratio_fit_<method> as the dispatch class). Carries:
method, the backend-specific fit (model for classifiers;
fit_joint + fit_marg for proxymix), diagnostics,
n_train, ncol_x, ncol_z, seed.
predict_density_ratio(), estimate_density_ratio(),
bd_hsic_test().
Other density ratio and propensity:
assess_overlap(),
effective_sample_size(),
estimate_density_ratio(),
estimate_propensity(),
plot_weights(),
predict_density_ratio()
set.seed(1L) n <- 200L z <- matrix(rnorm(n * 2L), n, 2L) x <- z[, 1L] + rnorm(n) fit <- fit_density_ratio(x, z, method = "logistic", seed = 1L) fit$diagnosticsset.seed(1L) n <- 200L z <- matrix(rnorm(n * 2L), n, 2L) x <- z[, 1L] + rnorm(n) fit <- fit_density_ratio(x, z, method = "logistic", seed = 1L) fit$diagnostics
Builds a score function – the gradient of the log density,
– for a multivariate
normal target, suitable for the score argument of ksd_test().
gaussian_score(mean = NULL, sigma = NULL)gaussian_score(mean = NULL, sigma = NULL)
mean |
Numeric vector of length |
sigma |
Numeric |
The returned closure accepts the sample matrix and returns the score
evaluated row-wise. Leaving mean or sigma at NULL defaults them to
the zero vector and the identity matrix of the dimension seen at call
time, so gaussian_score() with no arguments is the standard-normal
score in any dimension.
A function of one argument (an n x d numeric matrix) returning
the n x d matrix of scores. The mean and covariance are captured by
the closure.
Max Moldovan, [email protected]
Liu, Q., Lee, J. D., & Jordan, M. I. (2016). A kernelized Stein discrepancy for goodness-of-fit tests. Proceedings of the 33rd International Conference on Machine Learning, PMLR 48, 276-284.
Other goodness-of-fit tests:
concordance_test(),
concordance_test_nystrom(),
coverage_test(),
ksd_test(),
ksd_test_nystrom(),
numeric_score()
set.seed(1) x <- matrix(stats::rnorm(200L), ncol = 2L) # Standard-normal target in two dimensions s0 <- gaussian_score() str(s0(x)) # Correlated-normal target sig <- matrix(c(1, 0.5, 0.5, 1), nrow = 2L) s1 <- gaussian_score(mean = c(0, 0), sigma = sig)set.seed(1) x <- matrix(stats::rnorm(200L), ncol = 2L) # Standard-normal target in two dimensions s0 <- gaussian_score() str(s0(x)) # Correlated-normal target sig <- matrix(c(1, 0.5, 0.5, 1), nrow = 2L) s1 <- gaussian_score(mean = c(0, 0), sigma = sig)
Extends bd-HSIC and DR-DATE/DR-DETT to hierarchical (nested/clustered) data by decomposing the test statistic into within-cluster and between-cluster components.
hierarchical_test( y, treatment, covariates, cluster_id, method = c("dr-date", "dr-dett", "bd-hsic"), kernel_y = kernel_spec(), n_permutations = 500L, weight_method = c("equal", "icc", "within_only"), seed = NULL, verbose = FALSE, ... )hierarchical_test( y, treatment, covariates, cluster_id, method = c("dr-date", "dr-dett", "bd-hsic"), kernel_y = kernel_spec(), n_permutations = 500L, weight_method = c("equal", "icc", "within_only"), seed = NULL, verbose = FALSE, ... )
y |
Numeric vector or matrix. Outcome. |
treatment |
Treatment variable (binary for DR tests, any for bd-HSIC). |
covariates |
Numeric matrix of confounders. |
cluster_id |
Factor or integer vector identifying clusters. |
method |
Character. |
kernel_y |
Kernel specification for outcomes. |
n_permutations |
Integer. Number of permutations. Default is 500. |
weight_method |
Character. How to weight within/between components:
|
seed |
Integer or |
verbose |
Logical. |
... |
Additional arguments passed to the underlying test. Do not
pass |
For clustered data (e.g., patients within hospitals, plots within farms), standard kernel tests may have inflated type I error because observations within the same cluster are not independent.
This function decomposes the test into:
Within-cluster: Average of within-cluster test statistics (tests for treatment effects within each cluster).
Between-cluster: Test on cluster-level mean embeddings (tests for treatment effects across clusters).
The combined statistic is a weighted sum, with weights determined
by weight_method. Permutation is performed within clusters to
preserve the hierarchical structure.
An object of class "kernel_test_result" with additional
hierarchical component containing within/between statistics.
Other causal association tests:
bd_hsic_test(),
kernel_causal_test()
set.seed(42) n_clusters <- 20 n_per <- 30 n <- n_clusters * n_per cluster_id <- rep(1:n_clusters, each = n_per) # Cluster-level random effects cluster_effect <- rnorm(n_clusters, sd = 1)[cluster_id] x <- matrix(rnorm(n * 2), n, 2) t <- rbinom(n, 1, plogis(0.3 * x[, 1])) y <- 0.5 * t + cluster_effect + x[, 1] + rnorm(n) result <- hierarchical_test(y, t, x, cluster_id, method = "dr-date", n_permutations = 100, seed = 1 ) print(result)set.seed(42) n_clusters <- 20 n_per <- 30 n <- n_clusters * n_per cluster_id <- rep(1:n_clusters, each = n_per) # Cluster-level random effects cluster_effect <- rnorm(n_clusters, sd = 1)[cluster_id] x <- matrix(rnorm(n * 2), n, 2) t <- rbinom(n, 1, plogis(0.3 * x[, 1])) y <- 0.5 * t + cluster_effect + x[, 1] + rnorm(n) result <- hierarchical_test(y, t, x, cluster_id, method = "dr-date", n_permutations = 100, seed = 1 ) print(result)
Pre-PESTO (or pre-IES) screening: for each parameter theta[, j] and
each output y[, k], computes an HSIC permutation test of independence
and flags parameters with no detectable association to any output as
unidentifiable. Useful for trimming the parameter space before
ensemble-smoother calibration of mechanistic ag-system models such as
APSIM.
hsic_identifiability( theta, y, alpha = 0.05, p_adjust = c("BH", "holm", "hochberg", "bonferroni", "BY", "fdr", "none"), n_permutations = 500L, kernel_theta = kernel_spec(), kernel_y = kernel_spec(), seed = NULL )hsic_identifiability( theta, y, alpha = 0.05, p_adjust = c("BH", "holm", "hochberg", "bonferroni", "BY", "fdr", "none"), n_permutations = 500L, kernel_theta = kernel_spec(), kernel_y = kernel_spec(), seed = NULL )
theta |
Numeric matrix |
y |
Numeric matrix |
alpha |
Numeric in |
p_adjust |
Character. Across-grid p-value adjustment method passed
to |
n_permutations |
Integer. Permutations per HSIC test. Default 500. |
kernel_theta |
A |
kernel_y |
A |
seed |
Integer or |
Kernel matrices are computed once per parameter and once per output, so
the total cost is O((p + q) n^2) (kernel construction) plus
O(p q n_permutations n^2) (permutation null), where p is the number
of parameters, q the number of outputs and n the design size.
A parameter is identifiable at level alpha when its smallest
(optionally adjusted) p-value across outputs satisfies
min_p <= alpha. Across-grid p-value adjustment defaults to
Benjamini-Hochberg, which is the natural FDR control for screening
applications.
An object of class "hsic_identifiability" with components:
p x q matrix of HSIC statistics.
p x q matrix of raw permutation p-values.
p x q matrix of adjusted p-values (same as
p_value when p_adjust = "none").
Length-p vector: per-parameter maximum HSIC
across outputs.
Length-p vector: per-parameter minimum adjusted
p-value across outputs.
Length-p logical: min_p_value <= alpha.
Parameter indices ordered by descending max_statistic.
Inputs / metadata.
Character vectors.
The matched call.
Gretton, A., Fukumizu, K., Teo, C. H., Song, L., Scholkopf, B., & Smola, A. J. (2008). A kernel statistical test of independence. NeurIPS, 20.
Da Veiga, S. (2015). Global sensitivity analysis with dependence measures. Journal of Statistical Computation and Simulation, 85(7), 1283-1305.
Other sensitivity and identifiability:
hsic_sensitivity(),
lhs_design()
set.seed(1) n <- 60 # 3 active parameters + 1 inert theta <- matrix(stats::runif(n * 4), nrow = n, dimnames = list(NULL, paste0("p", 1:4))) y1 <- theta[, 1] + 0.5 * theta[, 2]^2 + stats::rnorm(n, sd = 0.1) y2 <- sin(2 * pi * theta[, 3]) + stats::rnorm(n, sd = 0.1) y <- cbind(yield = y1, biomass = y2) fit <- hsic_identifiability(theta, y, n_permutations = 199, seed = 1) print(fit)set.seed(1) n <- 60 # 3 active parameters + 1 inert theta <- matrix(stats::runif(n * 4), nrow = n, dimnames = list(NULL, paste0("p", 1:4))) y1 <- theta[, 1] + 0.5 * theta[, 2]^2 + stats::rnorm(n, sd = 0.1) y2 <- sin(2 * pi * theta[, 3]) + stats::rnorm(n, sd = 0.1) y <- cbind(yield = y1, biomass = y2) fit <- hsic_identifiability(theta, y, n_permutations = 199, seed = 1) print(fit)
Computes HSIC-based sensitivity indices (Da Veiga, 2015) for each
input parameter against each output. By default returns first-order
indices only; total_order = TRUE adds the complementary total-order
decomposition.
hsic_sensitivity( theta, y, total_order = FALSE, p_value = TRUE, n_permutations = 500L, total_order_ci = FALSE, n_bootstrap = 200L, ci_level = 0.95, total_order_test = c("none", "cond_perm"), n_clusters_cp = "auto", p_adjust = c("BH", "holm", "hochberg", "bonferroni", "BY", "fdr", "none"), kernel_theta = kernel_spec(), kernel_y = kernel_spec(), seed = NULL, total_order_p_value = NULL )hsic_sensitivity( theta, y, total_order = FALSE, p_value = TRUE, n_permutations = 500L, total_order_ci = FALSE, n_bootstrap = 200L, ci_level = 0.95, total_order_test = c("none", "cond_perm"), n_clusters_cp = "auto", p_adjust = c("BH", "holm", "hochberg", "bonferroni", "BY", "fdr", "none"), kernel_theta = kernel_spec(), kernel_y = kernel_spec(), seed = NULL, total_order_p_value = NULL )
theta |
Numeric matrix |
y |
Numeric matrix |
total_order |
Logical. If |
p_value |
Logical. If |
n_permutations |
Integer. Permutations per HSIC test. Default
|
total_order_ci |
Logical. If |
n_bootstrap |
Integer. Pair-bootstrap resamples used for the
total-order CI. Default |
ci_level |
Numeric in |
total_order_test |
Character. |
n_clusters_cp |
Integer or |
p_adjust |
Character. Across-grid p-value adjustment method
passed to |
kernel_theta |
A |
kernel_y |
A |
seed |
Integer or |
total_order_p_value |
Defunct as of 0.0.0.9013. Passing any
non- |
First-order index (always computed):
is the direct contribution of X_j to Y – analogous to (but not
equal to) the Sobol first-order index.
Total-order index (when total_order = TRUE):
where X_{~j} is the parameter design with column j removed. By
construction the difference T_j - S_j captures the contribution of
X_j through interactions with other parameters. For purely
additive models T_j = S_j; in the presence of interaction
T_j > S_j.
Unlike variance-based Sobol indices, both versions of the
HSIC-Sensitivity Index capture non-linear and distributional
effects: a parameter that affects the variance, skewness, or tail
of Y without shifting its mean is invisible to Sobol but visible
to HSIC. The normalisation bounds each index to [0, 1].
Optional permutation p-values are computed for first-order indices (Benjamini-Hochberg-adjusted across the grid by default).
Total-order uncertainty quantification (total_order_ci = TRUE,
since 0.0.0.9013). A pair-bootstrap percentile CI on the index
T_j itself: resample (theta, y) pairs with replacement
n_bootstrap times, recompute T_j on each resample, report a
ci_level (default 95%) two-sided percentile interval. This is
uncertainty quantification, not a hypothesis test.
Total-order conditional-permutation significance test
(total_order_test = "cond_perm", since 0.0.0.9014). Tests
H_0: X_j _||_ Y | X_{~j} – under the null, X_j adds nothing
beyond what is already in X_{~j} and T_j is concentrated at
zero. The null is generated by conditional permutation:
k-means-cluster the design points by X_{~j} similarity, then
within each cluster permute Y; recompute T_j on each permuted
design. p-value = (1 + #{T_perm >= T_obs}) / (1 + B). The
p_adjust method applies grid-wide.
The 0.0.0.9012 total_order_p_value pair-bootstrap implementation
was not this – it sampled the empirical joint, not a
null-of-no-effect, and was retracted in 0.0.0.9013. The
0.0.0.9014 conditional-permutation test repopulates
p_value_total_order with a properly-calibrated value; the
total_order_test flag on the result distinguishes the new mode
from the retracted one.
Power caveat (empirical calibration 2026-05-16). Repeated-seed
calibration (orchestra_calibration_20260516.R, B = 100, N = 80)
confirms the cond_perm test is null-calibrated (per-parameter
type-I rates 0.01-0.04 at nominal alpha = 0.05) but conservative
on additive designs (rejection rate ~3% on a strong-additive
design where the first-order test rejects above 90%). The reason
is structural: on additive Y = sum_j f_j(X_j), the conditional
permutation of Y within X_{~j} bins preserves all of
X_{~j}'s contribution and randomises only the X_j component
(which is independent of X_{~j}), so the permuted
HSIC(X_{~j}, Y_perm) is statistically close to the observed
HSIC(X_{~j}, Y). Use cond_perm for interaction detection;
use the first-order permutation p_value (always computed when
p_value = TRUE) for additive contributions. The two are
complementary, not interchangeable.
Total-order CIs cost B * p * q kernel re-evaluations on resampled
designs; set n_bootstrap accordingly. Use the CI for honest
uncertainty bars on the index magnitude; do not interpret it as a
significance verdict.
An object of class "hsic_sensitivity" with components:
p x q matrix of first-order HSIC-Sensitivity Indices
in [0, 1].
p x q matrix of raw first-order HSIC statistics.
p x q matrix of raw permutation p-values
(first-order), or NULL if p_value = FALSE.
p x q matrix of adjusted p-values
(first-order), or NULL.
p x q matrix of total-order indices,
or NULL when total_order = FALSE.
p x q matrix of HSIC(X_{~j},
Y_k) statistics, or NULL.
Logical flag: was total-order requested?
Length-p vector: per-parameter maximum
first-order index across outputs (used for ranking; kept
under this historical name for backwards compatibility).
Parameter indices ordered by descending
total_index.
Metadata.
The matched call.
Da Veiga, S. (2015). Global sensitivity analysis with dependence measures. Journal of Statistical Computation and Simulation, 85(7), 1283-1305.
Gretton, A., Bousquet, O., Smola, A., & Scholkopf, B. (2005). Measuring statistical dependence with Hilbert-Schmidt norms. ALT, 63-77.
hsic_identifiability(), hsic_test()
Other sensitivity and identifiability:
hsic_identifiability(),
lhs_design()
set.seed(1) n <- 80L theta <- matrix(stats::runif(n * 3L), nrow = n, dimnames = list(NULL, c("active", "weak", "inert"))) y <- 2 * theta[, "active"] + 0.5 * theta[, "weak"] + stats::rnorm(n, sd = 0.1) fit <- hsic_sensitivity(theta, y, total_order = TRUE, n_permutations = 199L, seed = 1L) fitset.seed(1) n <- 80L theta <- matrix(stats::runif(n * 3L), nrow = n, dimnames = list(NULL, c("active", "weak", "inert"))) y <- 2 * theta[, "active"] + 0.5 * theta[, "weak"] + stats::rnorm(n, sd = 0.1) fit <- hsic_sensitivity(theta, y, total_order = TRUE, n_permutations = 199L, seed = 1L) fit
Tests whether two variables are independent using the Hilbert-Schmidt Independence Criterion (HSIC). Uses a permutation test for inference.
hsic_test( x, y, kernel_x = kernel_spec(), kernel_y = kernel_spec(), n_permutations = 500L, alpha = 0.05, seed = NULL )hsic_test( x, y, kernel_x = kernel_spec(), kernel_y = kernel_spec(), n_permutations = 500L, alpha = 0.05, seed = NULL )
x |
Numeric vector, matrix, or data.frame. First variable. |
y |
Numeric vector, matrix, or data.frame. Second variable. |
kernel_x |
Kernel specification for |
kernel_y |
Kernel specification for |
n_permutations |
Integer. Number of permutations for the null distribution. Default is 500. |
alpha |
Numeric. Significance level. Default is 0.05. |
seed |
Integer or |
An object of class "kernel_test_result" with components:
The observed HSIC test statistic.
Permutation p-value.
"HSIC".
Sample size.
Number of permutations used.
Vector of permuted HSIC values.
Kernel specifications used (with resolved bandwidths).
The matched call.
Gretton, A., Fukumizu, K., Teo, C. H., Song, L., Scholkopf, B., & Smola, A. J. (2008). A kernel statistical test of independence. NeurIPS, 20.
Other independence and two-sample tests:
mmd_test()
set.seed(42) n <- 200 x <- rnorm(n) # Dependent case y_dep <- x^2 + rnorm(n, sd = 0.5) result <- hsic_test(x, y_dep) print(result) # Independent case y_ind <- rnorm(n) result <- hsic_test(x, y_ind) print(result)set.seed(42) n <- 200 x <- rnorm(n) # Dependent case y_dep <- x^2 + rnorm(n, sd = 0.5) result <- hsic_test(x, y_dep) print(result) # Independent case y_ind <- rnorm(n) result <- hsic_test(x, y_ind) print(result)
Accelerated HSIC test using either Nystrom (default) or random
Fourier features (RFF) factorisations of the input kernel matrices.
For large n this scales as O(n m) per permutation instead of
O(n^2), with m << n controlling the speed / accuracy trade-off.
hsic_test_nystrom( x, y, kernel_x = kernel_spec(), kernel_y = kernel_spec(), method = c("nystrom", "rff"), m = 100L, m_x = NULL, m_y = NULL, n_permutations = 500L, alpha = 0.05, seed = NULL, regularise = 1e-06 )hsic_test_nystrom( x, y, kernel_x = kernel_spec(), kernel_y = kernel_spec(), method = c("nystrom", "rff"), m = 100L, m_x = NULL, m_y = NULL, n_permutations = 500L, alpha = 0.05, seed = NULL, regularise = 1e-06 )
x, y
|
Numeric matrices (or vectors). Same number of rows. |
kernel_x, kernel_y
|
|
method |
Character. |
m |
Integer. Rank used for the approximation (number of Nystrom
landmarks or RFF features). Used for both factors unless |
m_x, m_y
|
Optional integers overriding |
n_permutations |
Integer. Default |
alpha |
Numeric in |
seed |
Integer or |
regularise |
Ridge for Nystrom Cholesky (ignored under
|
The test uses the biased HSIC estimator
, which is the form that
factorises cleanly through low-rank approximations. The bias is
O(1/n) and negligible in the large-n regime where Nystrom / RFF
are useful.
The permutation null is built by row-permuting the (centred) y
factor; per-permutation cost is O(n m_x m_y).
An object of class "kernel_test_result" with the standard
fields (statistic, p_value, method, n, n_permutations,
null_distribution, kernel_x, kernel_y, call) plus:
"nystrom" or "rff".
Effective ranks used.
Williams, C. K. I., & Seeger, M. (2001). Using the Nystrom method to speed up kernel machines. NeurIPS, 13.
Rahimi, A., & Recht, B. (2007). Random features for large-scale kernel machines. NeurIPS, 20.
Gretton, A., Bousquet, O., Smola, A., & Scholkopf, B. (2005). Measuring statistical dependence with Hilbert-Schmidt norms. ALT, 63-77.
hsic_test(), nystrom_factor(), rff_features()
Other low-rank acceleration:
concordance_test_nystrom(),
ksd_test_nystrom(),
nystrom_factor(),
rff_features()
set.seed(1) n <- 1500L x <- stats::rnorm(n) y <- x^2 + stats::rnorm(n, sd = 0.5) fit <- hsic_test_nystrom(x, y, m = 80L, n_permutations = 99L, seed = 1L) fitset.seed(1) n <- 1500L x <- stats::rnorm(n) y <- x^2 + stats::rnorm(n, sd = 0.5) fit <- hsic_test_nystrom(x, y, m = 80L, n_permutations = 99L, seed = 1L) fit
A convenience wrapper that dispatches to the appropriate test function
based on the method argument.
kernel_causal_test( formula, data, method = c("dr-date", "dr-dett", "bd-hsic"), ... )kernel_causal_test( formula, data, method = c("dr-date", "dr-dett", "bd-hsic"), ... )
formula |
A formula of the form |
data |
A data.frame or data.table. |
method |
Character. Test method: |
... |
Additional arguments passed to the specific test function. |
An object of class "kernel_test_result".
Other causal association tests:
bd_hsic_test(),
hierarchical_test()
set.seed(42) n <- 200 dat <- data.frame( y = rnorm(n), treatment = rbinom(n, 1, 0.5), x1 = rnorm(n), x2 = rnorm(n) ) dat$y <- dat$y + 0.5 * dat$treatment + 0.3 * dat$x1 result <- kernel_causal_test(y ~ treatment | x1 + x2, data = dat, method = "dr-date", n_permutations = 100, seed = 1 ) print(result)set.seed(42) n <- 200 dat <- data.frame( y = rnorm(n), treatment = rbinom(n, 1, 0.5), x1 = rnorm(n), x2 = rnorm(n) ) dat$y <- dat$y + 0.5 * dat$treatment + 0.3 * dat$x1 result <- kernel_causal_test(y ~ treatment | x1 + x2, data = dat, method = "dr-date", n_permutations = 100, seed = 1 ) print(result)
Predicts fine-resolution outputs at new coarse-resolution inputs via
conditional mean embedding (CME) regression in an RKHS. Given paired
coarse-fine training data (coarse, fine), fits the operator
in closed form via kernel ridge regression
and returns predictions at new_coarse. This is the
Park-Muandet-Fukumizu-Sejdinovic conditional-mean-embedding scheme,
specialised to the regression form needed for spatial / temporal
downscaling.
kernel_downscale( coarse, fine, new_coarse, kernel_coarse = kernel_spec(), kernel_fine = kernel_spec(), lambda = "cv", return_weights = FALSE )kernel_downscale( coarse, fine, new_coarse, kernel_coarse = kernel_spec(), kernel_fine = kernel_spec(), lambda = "cv", return_weights = FALSE )
coarse |
Numeric matrix |
fine |
Numeric matrix |
new_coarse |
Numeric matrix |
kernel_coarse, kernel_fine
|
|
lambda |
Ridge regularisation parameter for the CME ridge
regression. If |
return_weights |
Logical. If |
Typical ag-systems use: coarse climate-grid inputs (e.g. monthly temperature, rainfall on a 25 km grid) -> fine-resolution outputs (paddock yield, biomass) at the same time index. Train on years where both coarse and fine are observed; predict fine outputs at new coarse inputs.
Compared to a linear regression baseline, the kernel approach captures non-linear coarse-fine relationships without specifying the functional form. Compared to deep-learning downscalers, it has a closed-form solution, uses orders-of-magnitude less data, and carries an interpretable kernel-bandwidth degrees-of-freedom knob.
An object of class "kernel_downscale" with components:
n_new x d_fine matrix of predicted fine
outputs at new_coarse.
Number of training pairs used.
Number of prediction points.
Regularisation used (CV-selected when
lambda = "cv").
Resolved kernel specs.
Optional n_new x n_train weight matrix.
The matched call.
Park, J., Muandet, K., Fukumizu, K., & Sejdinovic, D. (2013). Kernel embeddings of conditional distributions. IEEE Signal Processing Magazine.
Muandet, K., Fukumizu, K., Sriperumbudur, B., & Scholkopf, B. (2017). Kernel mean embedding of distributions: A review and beyond. Foundations and Trends in Machine Learning, 10(1-2).
fit_cme(), dist_regression() (for downscaling when
each coarse "input" is a bag of points rather than a single
vector).
Other downscaling and embeddings:
aggregate_downscale(),
dist_regression(),
fit_cme(),
posterior_sample_aggregate()
set.seed(1) n <- 80L coarse <- matrix(stats::rnorm(n * 2L), n, 2L, dimnames = list(NULL, c("temp", "rainfall"))) fine <- cbind( yield = 2 * coarse[, "rainfall"] - 0.5 * coarse[, "temp"]^2 + stats::rnorm(n, sd = 0.2), biomass = coarse[, "temp"] + coarse[, "rainfall"]^2 + stats::rnorm(n, sd = 0.2) ) # Predict at a held-out grid new_coarse <- matrix(stats::rnorm(20L * 2L), 20L, 2L, dimnames = list(NULL, c("temp", "rainfall"))) fit <- kernel_downscale(coarse, fine, new_coarse) print(fit) head(fit$prediction)set.seed(1) n <- 80L coarse <- matrix(stats::rnorm(n * 2L), n, 2L, dimnames = list(NULL, c("temp", "rainfall"))) fine <- cbind( yield = 2 * coarse[, "rainfall"] - 0.5 * coarse[, "temp"]^2 + stats::rnorm(n, sd = 0.2), biomass = coarse[, "temp"] + coarse[, "rainfall"]^2 + stats::rnorm(n, sd = 0.2) ) # Predict at a held-out grid new_coarse <- matrix(stats::rnorm(20L * 2L), 20L, 2L, dimnames = list(NULL, c("temp", "rainfall"))) fit <- kernel_downscale(coarse, fine, new_coarse) print(fit) head(fit$prediction)
Computes the kernel (Gram) matrix between two sets of observations.
kernel_matrix(x, y = NULL, kernel = kernel_spec())kernel_matrix(x, y = NULL, kernel = kernel_spec())
x |
Numeric matrix (n x d) or vector. |
y |
Numeric matrix (m x d) or vector. If |
kernel |
A |
An n x m numeric matrix.
Other kernel primitives:
kernel_spec(),
select_bandwidth()
x <- matrix(rnorm(100), 50, 2) K <- kernel_matrix(x) dim(K) # 50 x 50x <- matrix(rnorm(100), 50, 2) K <- kernel_matrix(x) dim(K) # 50 x 50
Constructs a kernel specification object used throughout kernR for
computing kernel matrices. Supports RBF (Gaussian), Matern, linear,
and polynomial kernels.
kernel_spec( type = c("rbf", "matern", "linear", "polynomial"), bandwidth = "median", nu = 2.5, degree = 2L, offset = 1 )kernel_spec( type = c("rbf", "matern", "linear", "polynomial"), bandwidth = "median", nu = 2.5, degree = 2L, offset = 1 )
type |
Character. Kernel type: |
bandwidth |
Numeric or |
nu |
Numeric. Smoothness parameter for the Matern kernel. Common choices: 0.5 (Laplace), 1.5, 2.5, Inf (RBF). Default is 2.5. |
degree |
Integer. Degree for polynomial kernel. Default is 2. |
offset |
Numeric. Offset for polynomial kernel. Default is 1. |
An object of class "kernel_spec".
Other kernel primitives:
kernel_matrix(),
select_bandwidth()
# Default RBF kernel with median heuristic bandwidth k <- kernel_spec() # RBF with fixed bandwidth k <- kernel_spec("rbf", bandwidth = 1.0) # Matern kernel k <- kernel_spec("matern", nu = 1.5) # Linear kernel (no bandwidth needed) k <- kernel_spec("linear")# Default RBF kernel with median heuristic bandwidth k <- kernel_spec() # RBF with fixed bandwidth k <- kernel_spec("rbf", bandwidth = 1.0) # Matern kernel k <- kernel_spec("matern", nu = 1.5) # Linear kernel (no bandwidth needed) k <- kernel_spec("linear")
Tests whether a sample is consistent with a target distribution p, where
p is supplied through its score rather than a
reference sample. The statistic is the (unbiased) kernel Stein discrepancy
(KSD); calibration uses a wild bootstrap of the degenerate U-statistic
null. Because only the score enters, the target may be known up to an
unknown normalising constant.
ksd_test( x, score = NULL, kernel = c("imq", "rbf"), beta = -0.5, bandwidth = "median", n_boot = 1000L, alpha = 0.05, seed = NULL, n_exact_max = 5000L )ksd_test( x, score = NULL, kernel = c("imq", "rbf"), beta = -0.5, bandwidth = "median", n_boot = 1000L, alpha = 0.05, seed = NULL, n_exact_max = 5000L )
x |
Numeric vector, matrix, or data.frame. The |
score |
Either |
kernel |
Character. Base kernel for the Stein kernel: |
beta |
Numeric in |
bandwidth |
Numeric or |
n_boot |
Integer. Number of wild-bootstrap replicates for the null.
Default |
alpha |
Numeric in |
seed |
Integer or |
n_exact_max |
Integer or |
KSD is the score-based, one-sample complement to mmd_test(): where MMD
compares two samples, KSD compares a sample against a density. The
calibration framing is direct – given posterior or ensemble draws and the
score of the distribution they claim to represent, KSD asks whether the
draws actually follow that distribution. It is sensitive to mean,
variance, and tail mis-specification.
The default base kernel is the inverse multi-quadric (IMQ),
with
. Gorham & Mackey (2017) show the IMQ Stein
discrepancy detects non-convergence in regimes where the Gaussian (RBF)
Stein discrepancy is blind, particularly as dimension grows; the RBF base
kernel remains available via kernel = "rbf". The offset c (IMQ) and
bandwidth h (RBF) default to the median heuristic over the sample.
Reproducibility: the wild-bootstrap multipliers are drawn through R's RNG,
so a non-NULL seed makes the p-value reproducible under the active RNG
kind (the R default Mersenne-Twister unless changed by the caller).
The exact test materialises the n x n Stein-kernel matrix, so memory and
compute scale as O(n^2). To keep large samples tractable without a silent
loss of exactness, a sample with more than n_exact_max rows is delegated
to ksd_test_nystrom() – a low-rank approximation that is announced by a
message and recorded in the returned object's approximation and m
fields, so the result stays reproducible from the object. Set
n_exact_max = Inf to force the exact O(n^2) test at any size, or call
ksd_test_nystrom() directly to control the approximation rank m.
An object of class c("ksd_test", "kernel_test_result") carrying
the standard kernel_test_result fields plus:
The unbiased KSD U-statistic.
Upper-tail wild-bootstrap p-value (with +1
correction).
Base kernel used ("imq" or "rbf").
IMQ exponent, or NA for the RBF kernel.
Resolved IMQ offset or RBF bandwidth.
Shannon-information surprise -log2(p_value).
Verdict level and p_value <= alpha.
Max Moldovan, [email protected]
Liu, Q., Lee, J. D., & Jordan, M. I. (2016). A kernelized Stein discrepancy for goodness-of-fit tests. Proceedings of the 33rd International Conference on Machine Learning, PMLR 48, 276-284.
Chwialkowski, K., Strathmann, H., & Gretton, A. (2016). A kernel test of goodness of fit. Proceedings of the 33rd International Conference on Machine Learning, PMLR 48, 2606-2615.
Gorham, J., & Mackey, L. (2017). Measuring sample quality with kernels. Proceedings of the 34th International Conference on Machine Learning, PMLR 70, 1292-1301.
mmd_test(), mmd_ppc(), gaussian_score()
Other goodness-of-fit tests:
concordance_test(),
concordance_test_nystrom(),
coverage_test(),
gaussian_score(),
ksd_test_nystrom(),
numeric_score()
set.seed(1) # Well-specified: standard-normal sample against standard-normal target x_ok <- matrix(stats::rnorm(400L), ncol = 2L) fit_ok <- ksd_test(x_ok, n_boot = 199L, seed = 1L) fit_ok # Mis-specified: mean-shifted sample against the same target x_bad <- x_ok + 1 fit_bad <- ksd_test(x_bad, n_boot = 199L, seed = 1L) fit_bad # Explicit non-standard target via the Gaussian score factory sig <- matrix(c(1, 0.6, 0.6, 1), nrow = 2L) x_cor <- x_ok %*% chol(sig) ksd_test(x_cor, score = gaussian_score(sigma = sig), n_boot = 199L, seed = 1L)set.seed(1) # Well-specified: standard-normal sample against standard-normal target x_ok <- matrix(stats::rnorm(400L), ncol = 2L) fit_ok <- ksd_test(x_ok, n_boot = 199L, seed = 1L) fit_ok # Mis-specified: mean-shifted sample against the same target x_bad <- x_ok + 1 fit_bad <- ksd_test(x_bad, n_boot = 199L, seed = 1L) fit_bad # Explicit non-standard target via the Gaussian score factory sig <- matrix(c(1, 0.6, 0.6, 1), nrow = 2L) x_cor <- x_ok %*% chol(sig) ksd_test(x_cor, score = gaussian_score(sigma = sig), n_boot = 199L, seed = 1L)
Low-rank counterpart to ksd_test() for large samples. The n x n
Stein-kernel matrix is never materialised: it is replaced by a Nystrom
factor F (n x m, m << n) with , and both the
KSD U-statistic and its wild-bootstrap null are computed from F in
O(n m) rather than O(n^2). Because the Langevin Stein kernel is itself
positive semi-definite, is a valid Stein kernel in its own
right, so this is exactly the ksd_test() procedure applied to the rank-m
kernel : the wild-bootstrap calibration of the degenerate
U-statistic null is preserved, and the approximation trades statistical
power – not test validity – for speed.
ksd_test_nystrom( x, score = NULL, kernel = c("imq", "rbf"), beta = -0.5, bandwidth = "median", method = c("nystrom"), m = 100L, n_boot = 1000L, alpha = 0.05, seed = NULL, regularise = 1e-06 )ksd_test_nystrom( x, score = NULL, kernel = c("imq", "rbf"), beta = -0.5, bandwidth = "median", method = c("nystrom"), m = 100L, n_boot = 1000L, alpha = 0.05, seed = NULL, regularise = 1e-06 )
x |
Numeric vector, matrix, or data.frame. The |
score |
Either |
kernel |
Character. Base kernel for the Stein kernel: |
beta |
Numeric in |
bandwidth |
Numeric or |
method |
Character. Factorisation method; currently |
m |
Integer. Number of Nystrom landmarks (the approximation rank);
capped at |
n_boot |
Integer. Number of wild-bootstrap replicates for the null.
Default |
alpha |
Numeric in |
seed |
Integer or |
regularise |
Small positive numeric. Ridge added to the landmark
Stein block before its Cholesky, for numerical stability. Default |
Currently Nystrom-only. Random-Fourier-feature factorisation of the Stein
kernel requires the analytic Fourier derivatives of the base kernel and is
deferred; the method argument is reserved for that extension.
Use ksd_test() for exact results at moderate n; reach for this when the
O(n^2) Stein matrix is the bottleneck. The verdict object and its fields
match ksd_test() plus approximation and m.
An object of class c("ksd_test", "kernel_test_result") carrying
the same fields as ksd_test() plus:
"nystrom".
Effective rank used for the factorisation.
Max Moldovan, [email protected]
Liu, Q., Lee, J. D., & Jordan, M. I. (2016). A kernelized Stein discrepancy for goodness-of-fit tests. ICML, PMLR 48, 276-284.
Chwialkowski, K., Strathmann, H., & Gretton, A. (2016). A kernel test of goodness of fit. ICML, PMLR 48, 2606-2615.
Williams, C. K. I., & Seeger, M. (2001). Using the Nystrom method to speed up kernel machines. NeurIPS, 13.
ksd_test(), nystrom_factor(), hsic_test_nystrom(),
concordance_test_nystrom()
Other goodness-of-fit tests:
concordance_test(),
concordance_test_nystrom(),
coverage_test(),
gaussian_score(),
ksd_test(),
numeric_score()
Other low-rank acceleration:
concordance_test_nystrom(),
hsic_test_nystrom(),
nystrom_factor(),
rff_features()
set.seed(1) x_ok <- matrix(stats::rnorm(4000L), ncol = 2L) fit_ok <- ksd_test_nystrom(x_ok, m = 80L, n_boot = 199L, seed = 1L) fit_ok x_bad <- x_ok + 1 ksd_test_nystrom(x_bad, m = 80L, n_boot = 199L, seed = 1L)set.seed(1) x_ok <- matrix(stats::rnorm(4000L), ncol = 2L) fit_ok <- ksd_test_nystrom(x_ok, m = 80L, n_boot = 199L, seed = 1L) fit_ok x_bad <- x_ok + 1 ksd_test_nystrom(x_bad, m = 80L, n_boot = 199L, seed = 1L)
Generates a Latin-hypercube sample of size n over the parameter
bounds in bounds. Each column is a random permutation of stratified
uniform draws (one per equal-width bin in (0, 1]), then scaled to the
supplied parameter range. The result is reproducible when seed is
supplied.
lhs_design(n, bounds, seed = NULL)lhs_design(n, bounds, seed = NULL)
n |
Integer. Number of design points (rows). Must be |
bounds |
Two-column numeric matrix or data.frame of |
seed |
Integer or |
This is a lightweight helper aimed at pre-PESTO screening: produce a
design matrix to feed an APSIM (or any) simulator, then pass the
resulting input/output pairs to hsic_identifiability() to flag
unidentifiable parameters before ensemble-smoother calibration.
A numeric matrix of dimension n x nrow(bounds). Column names
are inherited from rownames(bounds) when available, otherwise
theta1, theta2, ....
McKay, M. D., Beckman, R. J., & Conover, W. J. (1979). A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics, 21(2), 239-245.
Other sensitivity and identifiability:
hsic_identifiability(),
hsic_sensitivity()
bounds <- rbind( slope = c(0.1, 2.0), intercept = c(-1, 1), noise_sd = c(0.05, 0.5) ) design <- lhs_design(50, bounds, seed = 1) head(design)bounds <- rbind( slope = c(0.1, 2.0), intercept = c(-1, 1), noise_sd = c(0.05, 0.5) ) design <- lhs_design(50, bounds, seed = 1) head(design)
Model-free verdict on whether a posterior-predictive ensemble is
consistent with held-out observations, via the Maximum Mean Discrepancy
(MMD) two-sample test. Wraps mmd_test() with a posterior-predictive
framing and adds a Shannon-information surprise diagnostic
(-log2(p)) for intuitive interpretation: 0 bits = no surprise
(p = 1); ~4.32 bits = p = 0.05; the maximum achievable surprise
at n_permutations = B is log2(B + 1).
mmd_ppc(x, ...) ## Default S3 method: mmd_ppc( x, observed, kernel = kernel_spec(), n_permutations = 500L, alpha = 0.05, seed = NULL, ... ) ## S3 method for class 'pesto_ensemble' mmd_ppc(x, observed = NULL, ...) ## S3 method for class 'pesto_ensemble_manifest' mmd_ppc(x, observed, outputs = NULL, ...)mmd_ppc(x, ...) ## Default S3 method: mmd_ppc( x, observed, kernel = kernel_spec(), n_permutations = 500L, alpha = 0.05, seed = NULL, ... ) ## S3 method for class 'pesto_ensemble' mmd_ppc(x, observed = NULL, ...) ## S3 method for class 'pesto_ensemble_manifest' mmd_ppc(x, observed, outputs = NULL, ...)
x |
Either a numeric matrix |
... |
Additional arguments (currently unused; reserved for future Nystrom acceleration). |
observed |
Numeric matrix |
kernel |
Kernel specification. Default is RBF with median heuristic over the pooled posterior + observed sample. |
n_permutations |
Integer. Permutations for the null. Default 500. |
alpha |
Numeric in |
seed |
Integer or |
outputs |
Optional character vector of output column names from
the manifest to test against. Defaults to all numeric output columns
(the |
Use after an ensemble-smoother run (PESTO IES, EnKF, etc.) to ask: does the calibrated model produce predictive draws that match the held-out year / paddock / season at the distributional level? The MMD test is sensitive to mean, variance, and tail differences – strictly more informative than RMSE on the posterior predictive mean.
The pesto_ensemble_manifest method records provenance from the input
manifest in result$pesto_metadata: run_id, pesto_version,
method, outputs_used, and fidelity (the multi-fidelity provenance
record, or NULL for a single-fidelity run), so the PPC verdict traces
back to the producing ensemble and the fidelity it was calibrated at.
An object of class c("mmd_ppc", "kernel_test_result") with
the standard kernel_test_result fields plus:
Number of posterior-predictive draws.
Number of held-out observations.
Shannon-information surprise -log2(p_value).
Verdict significance level.
Logical: p_value <= alpha.
Carried through from pesto_ensemble input,
when provided; otherwise NULL.
Gretton, A., Borgwardt, K. M., Rasch, M. J., Scholkopf, B., & Smola, A. (2012). A kernel two-sample test. JMLR, 13, 723-773.
Gelman, A., Meng, X.-L., & Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 6(4), 733-760.
Other posterior predictive checks:
pesto_ensemble()
set.seed(1) # Calibrated model: posterior matches truth post <- matrix(stats::rnorm(400L), ncol = 2L) obs <- matrix(stats::rnorm(40L), ncol = 2L) fit_ok <- mmd_ppc(post, obs, n_permutations = 199L, seed = 1L) fit_ok # Miscalibrated model: posterior is mean-shifted obs_shift <- obs + 1.5 fit_bad <- mmd_ppc(post, obs_shift, n_permutations = 199L, seed = 1L) fit_badset.seed(1) # Calibrated model: posterior matches truth post <- matrix(stats::rnorm(400L), ncol = 2L) obs <- matrix(stats::rnorm(40L), ncol = 2L) fit_ok <- mmd_ppc(post, obs, n_permutations = 199L, seed = 1L) fit_ok # Miscalibrated model: posterior is mean-shifted obs_shift <- obs + 1.5 fit_bad <- mmd_ppc(post, obs_shift, n_permutations = 199L, seed = 1L) fit_bad
Tests whether two samples come from the same distribution using the Maximum Mean Discrepancy (MMD). Uses a permutation test for inference.
mmd_test( x, y, kernel = kernel_spec(), n_permutations = 500L, alpha = 0.05, seed = NULL )mmd_test( x, y, kernel = kernel_spec(), n_permutations = 500L, alpha = 0.05, seed = NULL )
x |
Numeric vector, matrix, or data.frame. First sample. |
y |
Numeric vector, matrix, or data.frame. Second sample. |
kernel |
Kernel specification. Default is RBF with median heuristic. |
n_permutations |
Integer. Number of permutations. Default is 500. |
alpha |
Numeric. Significance level. Default is 0.05. |
seed |
Integer or |
An object of class "kernel_test_result" with components:
The observed MMD^2 statistic (unbiased).
Permutation p-value.
"MMD".
Total sample size (n_x + n_y).
Number of permutations used.
Vector of permuted MMD^2 values.
Kernel specification used.
The matched call.
Gretton, A., Borgwardt, K. M., Rasch, M. J., Scholkopf, B., & Smola, A. (2012). A kernel two-sample test. JMLR, 13, 723-773.
Other independence and two-sample tests:
hsic_test()
set.seed(42) # Same distribution x <- matrix(rnorm(200), 100, 2) y <- matrix(rnorm(200), 100, 2) result <- mmd_test(x, y) print(result) # Different distributions y_shifted <- matrix(rnorm(200, mean = 1), 100, 2) result <- mmd_test(x, y_shifted) print(result)set.seed(42) # Same distribution x <- matrix(rnorm(200), 100, 2) y <- matrix(rnorm(200), 100, 2) result <- mmd_test(x, y) print(result) # Different distributions y_shifted <- matrix(rnorm(200, mean = 1), 100, 2) result <- mmd_test(x, y_shifted) print(result)
Builds a score function – the gradient of the log density – by central
finite differences, for use as the score argument of ksd_test(). The
target need only be expressible as a function returning a (possibly
unnormalised) log density; any additive normalising constant cancels in the
gradient, so the constant may be omitted.
numeric_score(log_density, h = 1e-04)numeric_score(log_density, h = 1e-04)
log_density |
A function accepting an |
h |
Positive numeric. Finite-difference step. Default |
This makes ksd_test() usable against any target a user can write down as a
log density, not only targets with a hand-derived score. It is also the
kernR-side adapter for a log-posterior contract: wrapping an exported
log-posterior evaluator in numeric_score() yields the score
ksd_test() needs to check whether posterior draws are calibrated against
that posterior, with no dependency added on the producer – the evaluator is
passed in as an ordinary function.
Central differences cost 2 * d evaluations of log_density per call, where
d is the dimension. For a cheap closed-form target prefer gaussian_score()
or a hand-written score; numeric_score() is the general fallback.
A function of one argument (an n x d numeric matrix) returning the
n x d matrix of finite-difference scores, suitable as the score
argument of ksd_test().
Max Moldovan, [email protected]
Other goodness-of-fit tests:
concordance_test(),
concordance_test_nystrom(),
coverage_test(),
gaussian_score(),
ksd_test(),
ksd_test_nystrom()
set.seed(1) x <- matrix(stats::rnorm(200L), ncol = 2L) # Standard-normal log density (unnormalised): -||x||^2 / 2 ld <- function(z) -0.5 * rowSums(z^2) s <- numeric_score(ld) # Matches the closed-form standard-normal score -x to finite-difference order max(abs(s(x) - (-x))) # Use directly in a goodness-of-fit test ksd_test(x, score = numeric_score(ld), n_boot = 199L, seed = 1L)set.seed(1) x <- matrix(stats::rnorm(200L), ncol = 2L) # Standard-normal log density (unnormalised): -||x||^2 / 2 ld <- function(z) -0.5 * rowSums(z^2) s <- numeric_score(ld) # Matches the closed-form standard-normal score -x to finite-difference order max(abs(s(x) - (-x))) # Use directly in a goodness-of-fit test ksd_test(x, score = numeric_score(ld), n_boot = 199L, seed = 1L)
Builds a Nystrom approximation of an
n x n kernel matrix using m << n uniformly-sampled landmarks.
The returned factor F is an n x m matrix; downstream kernel
computations (HSIC, MMD, etc.) that can be expressed in F reduce
from O(n^2) to O(n m) cost.
nystrom_factor( x, kernel = kernel_spec(), m = 100L, regularise = 1e-06, seed = NULL )nystrom_factor( x, kernel = kernel_spec(), m = 100L, regularise = 1e-06, seed = NULL )
x |
Numeric matrix |
kernel |
A |
m |
Integer. Number of landmark points. Capped at |
regularise |
Small positive numeric. Ridge added to |
seed |
Integer or |
The construction is:
Sample m landmark indices uniformly without replacement.
Compute the landmark Gram (m x m) and the
cross-Gram (n x m).
Stabilise: .
Cholesky factor .
Return , so that
.
Any kernel_spec() is supported. Bandwidth selection (median
heuristic) is performed on the full dataset before landmark sampling.
An object of class "kernel_factor" with components:
The n x m_eff factor matrix.
"nystrom".
Effective rank (<= m).
Resolved kernel spec.
Number of rows in the input.
Williams, C. K. I., & Seeger, M. (2001). Using the Nystrom method to speed up kernel machines. NeurIPS, 13.
rff_features(), hsic_test_nystrom()
Other low-rank acceleration:
concordance_test_nystrom(),
hsic_test_nystrom(),
ksd_test_nystrom(),
rff_features()
set.seed(1) x <- matrix(stats::rnorm(2000L), ncol = 2L) f <- nystrom_factor(x, m = 80L, seed = 1L) dim(f$F)set.seed(1) x <- matrix(stats::rnorm(2000L), ncol = 2L) f <- nystrom_factor(x, m = 80L, seed = 1L) dim(f$F)
Lightweight constructor for a posterior-predictive ensemble produced by
PESTO (or any compatible UQ engine). Bundles a posterior-predictive
sample matrix with optional held-out observations and free-form
metadata, providing a stable interface for mmd_ppc() and other
downstream verdict-layer tools.
pesto_ensemble(posterior, observed = NULL, metadata = list())pesto_ensemble(posterior, observed = NULL, metadata = list())
posterior |
Numeric matrix |
observed |
Optional numeric matrix |
metadata |
Optional named list of free-form metadata (run id, ensemble seed, holdout year, etc.). |
This is the kernR-side schema for the cross-package contract; until PESTO ships its native manifest emitter, callers can construct the object directly from in-memory matrices.
An object of class "pesto_ensemble".
Other posterior predictive checks:
mmd_ppc()
set.seed(1) post <- matrix(stats::rnorm(200L), ncol = 2L) obs <- matrix(stats::rnorm(20L), ncol = 2L) ens <- pesto_ensemble(post, obs, metadata = list(holdout_year = 2018)) ensset.seed(1) post <- matrix(stats::rnorm(200L), ncol = 2L) obs <- matrix(stats::rnorm(20L), ncol = 2L) ens <- pesto_ensemble(post, obs, metadata = list(holdout_year = 2018)) ens
Plots the distribution of importance weights with effective sample size annotation.
plot_weights(weights, main = "Weight Distribution")plot_weights(weights, main = "Weight Distribution")
weights |
Numeric vector of importance weights. |
main |
Title. Default is "Weight Distribution". |
Invisibly returns weights.
Other density ratio and propensity:
assess_overlap(),
effective_sample_size(),
estimate_density_ratio(),
estimate_propensity(),
fit_density_ratio(),
predict_density_ratio()
set.seed(1L) weights <- rgamma(200L, shape = 2, rate = 2) plot_weights(weights)set.seed(1L) weights <- rgamma(200L, shape = 2, rate = 2) plot_weights(weights)
Bar plot of per-parameter maximum HSIC across outputs, ordered by magnitude. Bars for identifiable parameters are coloured; non- identifiable parameters are shown in grey.
## S3 method for class 'hsic_identifiability' plot(x, col_yes = "#0072B2", col_no = "grey70", ...)## S3 method for class 'hsic_identifiability' plot(x, col_yes = "#0072B2", col_no = "grey70", ...)
x |
An |
col_yes, col_no
|
Bar colours for identifiable / non-identifiable parameters. |
... |
Additional arguments passed to |
Invisibly returns x. Side effect: produces a base R plot.
set.seed(1) n <- 50 theta <- matrix(stats::runif(n * 3), nrow = n, dimnames = list(NULL, c("active", "active2", "inert"))) y <- theta[, 1] + theta[, 2]^2 + stats::rnorm(n, sd = 0.1) fit <- hsic_identifiability(theta, y, n_permutations = 199, seed = 1) plot(fit)set.seed(1) n <- 50 theta <- matrix(stats::runif(n * 3), nrow = n, dimnames = list(NULL, c("active", "active2", "inert"))) y <- theta[, 1] + theta[, 2]^2 + stats::rnorm(n, sd = 0.1) fit <- hsic_identifiability(theta, y, n_permutations = 199, seed = 1) plot(fit)
Bar plot of per-parameter HSIC-Sensitivity Index, ordered by
first-order index magnitude. When total_order was set on the fit,
which = "total" shows the total-order indices and which = "both"
shows side-by-side bars for S and T.
## S3 method for class 'hsic_sensitivity' plot( x, which = c("first", "total", "both"), alpha = 0.05, col_sig = "#0072B2", col_nonsig = "grey70", col_total = "#D55E00", ... )## S3 method for class 'hsic_sensitivity' plot( x, which = c("first", "total", "both"), alpha = 0.05, col_sig = "#0072B2", col_nonsig = "grey70", col_total = "#D55E00", ... )
x |
An |
which |
Character. |
alpha |
Numeric in |
col_sig, col_nonsig, col_total
|
Bar colours. |
... |
Additional arguments passed to |
Invisibly returns x. Side effect: produces a base R plot.
Plots the permutation null distribution with the observed statistic.
## S3 method for class 'kernel_test_result' plot(x, ...)## S3 method for class 'kernel_test_result' plot(x, ...)
x |
A |
... |
Additional arguments (currently ignored). |
Invisibly returns x. Side effect: produces a base R plot.
set.seed(42) x_data <- rnorm(100) y_data <- x_data + rnorm(100, sd = 0.5) res <- hsic_test(x_data, y_data) plot(res)set.seed(42) x_data <- rnorm(100) y_data <- x_data + rnorm(100, sd = 0.5) res <- hsic_test(x_data, y_data) plot(res)
Draws n samples from the per-component posterior mixture.
Each draw picks a component by posterior_weights, then samples
from N(posterior_components_means[[k]], posterior_components_covariances[[k]]).
posterior_sample_aggregate(object, n = 1000L, seed = NULL)posterior_sample_aggregate(object, n = 1000L, seed = NULL)
object |
An |
n |
Integer. Number of posterior samples. Default |
seed |
Integer or |
An n x dim_x numeric matrix.
Other downscaling and embeddings:
aggregate_downscale(),
dist_regression(),
fit_cme(),
kernel_downscale()
set.seed(1L) A <- matrix(c(0.5, 0.5), nrow = 1L) prior <- list( means = list(c(0, 0), c(2, 2)), covariances = list(diag(2L), diag(2L)), weights = c(0.5, 0.5) ) fit <- aggregate_downscale(y = 1.0, aggregator = A, latent_prior = prior, sigma_y = 0.2) draws <- posterior_sample_aggregate(fit, n = 500L, seed = 1L) colMeans(draws)set.seed(1L) A <- matrix(c(0.5, 0.5), nrow = 1L) prior <- list( means = list(c(0, 0), c(2, 2)), covariances = list(diag(2L), diag(2L)), weights = c(0.5, 0.5) ) fit <- aggregate_downscale(y = 1.0, aggregator = A, latent_prior = prior, sigma_y = 0.2) draws <- posterior_sample_aggregate(fit, n = 500L, seed = 1L) colMeans(draws)
Applies a density_ratio_fit object (from fit_density_ratio())
to new (x, z) rows. All four backends compute ratios in log-space
internally for numerical stability; type controls the returned
representation.
predict_density_ratio( object, new_x, new_z, type = c("log_ratio", "weight", "ratio") )predict_density_ratio( object, new_x, new_z, type = c("log_ratio", "weight", "ratio") )
object |
A |
new_x |
Numeric vector or matrix. Treatment values to evaluate. |
new_z |
Numeric matrix or data.frame. Confounders to evaluate. |
type |
Character. Return type:
|
Numeric vector of length nrow(new_x).
Other density ratio and propensity:
assess_overlap(),
effective_sample_size(),
estimate_density_ratio(),
estimate_propensity(),
fit_density_ratio(),
plot_weights()
set.seed(1L) n <- 200L z <- matrix(rnorm(n * 2L), n, 2L) x <- z[, 1L] + rnorm(n) fit <- fit_density_ratio(x, z, method = "logistic", seed = 1L) weights <- predict_density_ratio(fit, new_x = x, new_z = z, type = "weight") summary(weights)set.seed(1L) n <- 200L z <- matrix(rnorm(n * 2L), n, 2L) x <- z[, 1L] + rnorm(n) fit <- fit_density_ratio(x, z, method = "logistic", seed = 1L) weights <- predict_density_ratio(fit, new_x = x, new_z = z, type = "weight") summary(weights)
Returns the row weights from a fitted fit_cme() object. Each row of the result is the
weight vector for combining training-y quantities (kernel values
or values themselves) to produce a CME prediction at x_new.
## S3 method for class 'cme_fit' predict(object, x_new, ...)## S3 method for class 'cme_fit' predict(object, x_new, ...)
object |
A |
x_new |
Numeric matrix of new conditioning points. |
... |
Currently ignored. |
For a typical "predict Y at new X" workflow use kernel_downscale(),
which combines this with the training Y matrix to return predictions
directly.
An n_new x n_train matrix of embedding weights.
Predict from a Fitted Distribution Regression Model
## S3 method for class 'dist_regression' predict(object, newdata, ...)## S3 method for class 'dist_regression' predict(object, newdata, ...)
object |
A |
newdata |
A list of new bags (matrices with the same number of columns as the training bags). Bag sizes may differ from training. |
... |
Currently ignored. |
Numeric vector (length length(newdata)) when the model was
fitted with a scalar y; numeric matrix (length(newdata) x d_y)
for multivariate y.
Prints a compact summary of a fitted conditional mean embedding: the training-sample size, the input and output dimensions, the resolved kernels, and the ridge regularisation parameter that was used.
## S3 method for class 'cme_fit' print(x, ...)## S3 method for class 'cme_fit' print(x, ...)
x |
A |
... |
Unused; present for S3 generic compatibility. |
The cme_fit object x, invisibly.
set.seed(1L) x <- matrix(rnorm(60L), ncol = 2L) y <- matrix(x[, 1L] + rnorm(30L, sd = 0.2), ncol = 1L) print(fit_cme(x, y, lambda = 1e-2))set.seed(1L) x <- matrix(rnorm(60L), ncol = 2L) y <- matrix(x[, 1L] + rnorm(30L, sd = 0.2), ncol = 1L) print(fit_cme(x, y, lambda = 1e-2))
Builds an n x D feature map such that
for the RBF kernel
, via Rahimi & Recht
(2007): draw and
, then
.
rff_features(x, kernel = kernel_spec("rbf"), D = 200L, seed = NULL)rff_features(x, kernel = kernel_spec("rbf"), D = 200L, seed = NULL)
x |
Numeric matrix |
kernel |
A |
D |
Integer. Number of random features. Larger D -> better
approximation but higher memory / compute. Default |
seed |
Integer or |
Currently RBF-only; other shift-invariant kernels (Matern with
specific nu) require their own Fourier spectra and are not yet
implemented.
An object of class "kernel_factor" with components:
The n x D random-feature matrix.
"rff".
Equal to D.
Resolved kernel spec.
Number of rows in the input.
Random draws (kept for reproducible re-encoding).
Rahimi, A., & Recht, B. (2007). Random features for large-scale kernel machines. NeurIPS, 20.
nystrom_factor(), hsic_test_nystrom()
Other low-rank acceleration:
concordance_test_nystrom(),
hsic_test_nystrom(),
ksd_test_nystrom(),
nystrom_factor()
set.seed(1) x <- matrix(stats::rnorm(2000L), ncol = 2L) f <- rff_features(x, D = 150L, seed = 1L) dim(f$F)set.seed(1) x <- matrix(stats::rnorm(2000L), ncol = 2L) f <- rff_features(x, D = 150L, seed = 1L) dim(f$F)
Computes a bandwidth (lengthscale) for RBF or Matern kernels using the specified method.
select_bandwidth(x, method = "median")select_bandwidth(x, method = "median")
x |
Numeric matrix or vector. |
method |
Character. |
"median": The median heuristic sets bandwidth to the square root
of the median of pairwise squared distances. Robust default for most
kernel tests (Gretton et al., 2012).
"scott": Scott's rule: n^(-1/(d+4)) * sd_pooled. Good for
density estimation but may undersmooth for testing.
A positive numeric scalar.
Other kernel primitives:
kernel_matrix(),
kernel_spec()
x <- matrix(rnorm(200), 100, 2) select_bandwidth(x, "median")x <- matrix(rnorm(200), 100, 2) select_bandwidth(x, "median")