Standard independence tests cannot distinguish causal association from confounded association. If treatment X and outcome Y share a common cause Z, they will appear dependent even if X has no causal effect on Y.
The bd-HSIC test (Hu, Sejdinovic & Evans, 2024) solves this by testing the do-null hypothesis:
H_0: p(y | do(x)) = p*(y) for all x
This uses Pearl’s do-operator: after intervening on X, is Y still associated with X?
library(kernR)
set.seed(42)
n <- 300
z <- matrix(rnorm(n * 2), n, 2)
x <- 0.5 * z[, 1] + rnorm(n) # X depends on confounder Z
y <- 0.8 * x + 0.5 * z[, 2] + rnorm(n) # Y depends causally on X and on Z
result <- bd_hsic_test(x, y, z,
n_permutations = 200,
seed = 1
)
result
#>
#> bd-HSIC Test
#>
#> Statistic: 0.0182464
#> P-value: 0.0050
#> N: 150
#> Perms: 200
#> Kernel X: rbf (bw = 1.062)
#> Kernel Y: rbf (bw = 1.425)
#> ESS: 149.7The test detects the causal association between X and Y.
set.seed(42)
n <- 300
z <- matrix(rnorm(n * 2), n, 2)
x <- 0.5 * z[, 1] + rnorm(n)
y <- 0.5 * z[, 1] + z[, 2] + rnorm(n) # Y depends on Z, not on X
result_null <- bd_hsic_test(x, y, z,
n_permutations = 200,
seed = 1
)
result_null
#>
#> bd-HSIC Test
#>
#> Statistic: 0.0022688
#> P-value: 0.4826
#> N: 150
#> Perms: 200
#> Kernel X: rbf (bw = 1.062)
#> Kernel Y: rbf (bw = 1.553)
#> ESS: 149.7The large p-value correctly indicates no causal effect.
A key advantage of bd-HSIC is detecting non-linear effects that linear methods (PDS, Double ML) completely miss:
set.seed(42)
n <- 400
z <- matrix(rnorm(n * 2), n, 2)
x <- z[, 1] + rnorm(n)
y <- x^2 + z[, 2] + rnorm(n, sd = 0.5) # Quadratic causal effect
result_nl <- bd_hsic_test(x, y, z,
n_permutations = 200,
seed = 1
)
result_nl
#>
#> bd-HSIC Test
#>
#> Statistic: 0.0206771
#> P-value: 0.0050
#> N: 200
#> Perms: 200
#> Kernel X: rbf (bw = 1.299)
#> Kernel Y: rbf (bw = 2.029)
#> ESS: 199.6dat <- data.frame(y = y, x = x, z1 = z[, 1], z2 = z[, 2])
result_f <- kernel_causal_test(
y ~ x | z1 + z2,
data = dat,
method = "bd-hsic",
n_permutations = 100,
seed = 1
)
result_f
#>
#> bd-HSIC Test
#>
#> Statistic: 0.0206771
#> P-value: 0.0099
#> N: 200
#> Perms: 100
#> Kernel X: rbf (bw = 1.299)
#> Kernel Y: rbf (bw = 2.029)
#> ESS: 199.6| Scenario | Use bd-HSIC? |
|---|---|
| Testing if X causally affects Y (adjusting for Z) | Yes |
| Non-linear or non-monotone causal effects | Yes – key advantage |
| Continuous, binary, or mixed treatments | Yes |
| Very high-dimensional confounders | Consider using density_ratio = "ranger" |
| Extremely strong confounding | Caution: density ratio estimation may fail |
sessionInfo()
#> R version 4.6.0 (2026-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] kernR_0.7.0.9000 rmarkdown_2.31
#>
#> loaded via a namespace (and not attached):
#> [1] vctrs_0.7.3 cli_3.6.6 knitr_1.51 rlang_1.2.0
#> [5] xfun_0.58 S7_0.2.2 jsonlite_2.0.0 data.table_1.18.4
#> [9] glue_1.8.1 buildtools_1.0.0 htmltools_0.5.9 maketools_1.3.2
#> [13] PESTO_0.6.0.9000 sys_3.4.3 sass_0.4.10 scales_1.4.0
#> [17] grid_4.6.0 evaluate_1.0.5 jquerylib_0.1.4 fastmap_1.2.0
#> [21] yaml_2.3.12 lifecycle_1.0.5 compiler_4.6.0 RColorBrewer_1.1-3
#> [25] Rcpp_1.1.1-1.1 farver_2.1.2 digest_0.6.39 R6_2.6.1
#> [29] bslib_0.11.0 tools_4.6.0 gtable_0.3.6 ggplot2_4.0.3
#> [33] cachem_1.1.0