---
title: "Hierarchical and Nested Data"
author: "Max Moldovan"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Hierarchical and Nested Data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 6,
  fig.height = 4,
  dpi = 150
)
```

## Why Hierarchy Matters

Many real-world datasets have nested structure:

- Patients within hospitals
- Students within schools
- Plots within farms
- Repeated measures within subjects

Standard kernel tests assume independent observations. When observations are
clustered, **within-cluster correlation inflates type I error**.
`hierarchical_test()` accounts for this by decomposing the test statistic and
permuting within clusters.

## Example: Agriculture Trial

Imagine a randomised fertiliser trial across 20 farms, each with 30 plots.

```{r agri-example}
library(kernR)

set.seed(42)
n_farms <- 20
n_plots <- 30
n <- n_farms * n_plots

farm_id <- rep(1:n_farms, each = n_plots)

# Farm-level random effects
farm_effect <- rnorm(n_farms, sd = 2)[farm_id]

soil <- matrix(rnorm(n * 2), n, 2)

# Treatment assignment (partially confounded by soil)
treatment <- rbinom(n, 1, plogis(0.3 * soil[, 1]))

# Yield: treatment has a real effect + farm random effect
yield <- 0.8 * treatment + farm_effect + 0.5 * soil[, 1] + rnorm(n)

result <- hierarchical_test(
  y = yield,
  treatment = treatment,
  covariates = soil,
  cluster_id = farm_id,
  method = "dr-date",
  n_permutations = 100,
  weight_method = "icc",
  seed = 1
)
result
```

The test detects the treatment effect while correctly accounting for the
farm-level clustering.

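The idea of permuting within clusters can be sketched in base R. This is a minimal illustration under stated assumptions, not the implementation used by `hierarchical_test()`: it swaps the kernel statistic for a plain difference in means, and `permute_within()` and `diff_means()` are hypothetical helpers introduced only for this sketch.

```{r within-perm-sketch}
# Sketch only: within-cluster permutation test with a toy statistic
# (difference in means), NOT the kernel statistic kernR computes.
set.seed(123)

n_clusters <- 10
n_per      <- 20
cluster    <- rep(seq_len(n_clusters), each = n_per)

# Simulated data: cluster random effect plus a true treatment effect of 1
cluster_effect <- rnorm(n_clusters, sd = 2)[cluster]
trt <- rbinom(n_clusters * n_per, 1, 0.5)
y   <- 1 * trt + cluster_effect + rnorm(n_clusters * n_per)

# Hypothetical helper: shuffle labels separately inside each cluster,
# so every cluster keeps its own number of treated units
permute_within <- function(x, cluster) {
  ave(x, cluster, FUN = function(v) v[sample.int(length(v))])
}

# Toy statistic: absolute difference in mean outcome between arms
diff_means <- function(y, trt) {
  abs(mean(y[trt == 1]) - mean(y[trt == 0]))
}

observed   <- diff_means(y, trt)
n_perm     <- 199
perm_stats <- replicate(n_perm, diff_means(y, permute_within(trt, cluster)))

# Permutation p-value with the usual +1 correction
p_value <- (1 + sum(perm_stats >= observed)) / (n_perm + 1)
p_value
```

Because labels are only exchanged between units that share a cluster, the cluster-level random effects stay attached to the same mix of treated and control units in every permutation. That is what keeps the reference distribution from understating the variability caused by clustering.
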
## Decomposition: Within vs Between

The test provides both components:

```{r decomposition}
cat("Within-cluster average statistic:",
    mean(result$hierarchical$within_stats, na.rm = TRUE), "\n")
cat("Between-cluster statistic:", result$hierarchical$between_stat, "\n")
cat("Combined statistic:", result$statistic, "\n")
cat("Weight method:", result$hierarchical$weight_method, "\n")
```

## Weight Methods

| Method | Behaviour |
|--------|-----------|
| `"equal"` | Equal weight to within and between components |
| `"icc"` | Weight by ICC (more between-weight when clusters differ) |
| `"within_only"` | Ignore between-cluster variation |

## Example: No Treatment Effect

```{r null-hierarchical}
set.seed(42)
yield_null <- farm_effect + 0.5 * soil[, 1] + rnorm(n)  # No treatment effect

result_null <- hierarchical_test(
  y = yield_null,
  treatment = treatment,
  covariates = soil,
  cluster_id = farm_id,
  method = "dr-date",
  n_permutations = 100,
  seed = 1
)
cat("P-value under null:", result_null$p_value, "\n")
```

## When to Use Hierarchical Tests

| Scenario | Recommendation |
|----------|---------------|
| Independent observations | Use standard `dr_date_test()` / `bd_hsic_test()` |
| Clustered data (known groups) | Use `hierarchical_test()` with `cluster_id` |
| Few large clusters | `weight_method = "icc"` |
| Many small clusters | `weight_method = "equal"` |
| Unsure about between-cluster effects | `weight_method = "within_only"` (conservative) |

```{r session-info}
sessionInfo()
```