--- title: "Hierarchical bd-HSIC for Multi-Site Designs" author: "Max Moldovan" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Hierarchical bd-HSIC for Multi-Site Designs} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 6, fig.height = 4, dpi = 150 ) set.seed(1) ``` ## Why a hierarchical version of bd-HSIC? Agricultural trials almost always carry design clustering: paddocks within sites, sites within regions, plots within farms, seasons within years. Treat such data as i.i.d. and the permutation null breaks: by shuffling treatment labels across clusters, the null distribution absorbs cluster-level variation as if it were causal signal, inflating Type I error. The original bd-HSIC test (Hu, Sejdinovic & Evans, 2024) handles this by clustering on propensity-score similarity. That works when the design has no natural hierarchy. When the *design itself* tells you which observations should be exchangeable -- *site*, *paddock*, *season* -- you should use the design clusters directly. `bd_hsic_test()` accepts an optional `cluster_id` argument exactly for this case. Supplying it activates **within-cluster permutation**: indices of `y` are reshuffled only within each cluster, preserving cluster-level effects in the null. ## A multi-site stub ```{r design} library(kernR) # 8 sites x 30 plots/site = 240 observations. # Each site has its own random effect on yield. n_sites <- 8L n_per <- 30L n <- n_sites * n_per site <- factor(paste0("site", rep(seq_len(n_sites), each = n_per))) site_id <- as.integer(site) set.seed(101L) site_effect <- stats::rnorm(n_sites, sd = 1.0)[site_id] # Confounders: rainfall, soil_n z <- cbind( rainfall = stats::rnorm(n), soil_n = stats::rnorm(n) ) # Management treatment (e.g. residue retention intensity) x <- 0.4 * z[, "rainfall"] + 0.3 * site_effect + stats::rnorm(n) # Yield: causal effect of management + confounders + site random effect y <- 0.6 * x + 0.5 * z[, "soil_n"] + site_effect + stats::rnorm(n, sd = 0.5) ``` ## Naive vs hierarchical permutation The same data, two permutation schemes: ```{r compare-schemes} fit_naive <- bd_hsic_test( x, y, z, permutation = "naive", n_permutations = 199L, seed = 1L ) fit_hier <- bd_hsic_test( x, y, z, cluster_id = site, # activates within_cluster by default n_permutations = 199L, seed = 1L ) data.frame( scheme = c(fit_naive$permutation_scheme, fit_hier$permutation_scheme), stat = c(fit_naive$statistic, fit_hier$statistic), p_value = c(fit_naive$p_value, fit_hier$p_value), ess = c(fit_naive$ess, fit_hier$ess), row.names = NULL ) ``` In the presence of a true causal effect, both schemes typically reject; under the null, the naive scheme is more prone to false positives. ## Stratified per-cluster contributions When `cluster_id` is supplied, the result carries a per-cluster bd-HSIC breakdown: ```{r per-cluster} fit_hier$per_cluster_statistic ``` Plot the per-cluster contributions to look for a single-site dominant signal vs. a uniformly distributed effect: ```{r plot-per-cluster, fig.height = 4} graphics::barplot( fit_hier$per_cluster_statistic, las = 2L, col = "#0072B2", border = NA, ylab = "weighted bd-HSIC (per site)", main = "Stratified bd-HSIC contributions" ) ``` A site whose stratified statistic dwarfs the others is a candidate for follow-up investigation -- either real site-specific effect modification or a data-quality issue. ## Type-I behaviour under a true null with cluster effects To illustrate why the hierarchical scheme matters, here is the null case: cluster effects exist, but `x` has no causal effect on `y`. ```{r null-case} set.seed(202L) n_sites <- 8L; n_per <- 30L; n <- n_sites * n_per site <- factor(paste0("s", rep(seq_len(n_sites), each = n_per))) site_id <- as.integer(site) site_effect <- stats::rnorm(n_sites, sd = 1.0)[site_id] z <- cbind(stats::rnorm(n), stats::rnorm(n)) x <- 0.4 * z[, 1L] + 0.3 * site_effect + stats::rnorm(n) y <- 0.5 * z[, 2L] + site_effect + stats::rnorm(n, sd = 0.5) # no x fit_null <- bd_hsic_test( x, y, z, cluster_id = site, n_permutations = 199L, seed = 202L ) fit_null$p_value ``` Under the null, the hierarchical scheme should not reject (`p > 0.05`); the naive scheme on the same data is more likely to. ## Choosing the scheme | Scenario | Scheme | |---|---| | Truly i.i.d. observations, no design clustering | `permutation = "auto"`, no `cluster_id` | | Multi-site / multi-season ag design | `cluster_id = site` (default within-cluster) | | Within-cluster independence is plausible *and* you want maximum power | `permutation = "naive"` (rare in practice) | ## Notes on practice - **Cluster size.** Within-cluster permutation needs `>= 2` test observations per cluster (test split = `1 - split_ratio` of `n`). Smaller clusters are reported as `NA` in `per_cluster_statistic` and contribute the identity permutation. - **Many small clusters.** When clusters are small (e.g. paddocks of 3-5 plots), within-cluster permutation has limited power; consider pooling at a coarser level (site instead of paddock). - **Imbalanced design.** Per-cluster statistics are reported on the raw scale, not weighted by cluster size; combine externally if you need a size-weighted summary. ## Reference - Hu, R., Sejdinovic, D., & Evans, R. J. (2024). *A kernel test for causal association via noise contrastive backdoor adjustment.* Journal of Machine Learning Research, 25(160), 1-56.