---
title: "Distributional Treatment Effect Tests (DR-DATE / DR-DETT)"
author: "Max Moldovan"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Distributional Treatment Effect Tests (DR-DATE / DR-DETT)}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 6,
  fig.height = 4,
  dpi = 150
)
```

## Beyond Mean Effects

Standard causal inference methods (Double ML, TMLE) are typically used to test whether treatment shifts the *mean* outcome. But many real treatment effects are **distributional** -- they change variance, shape, or modality without necessarily changing the mean.

**DR-DATE** and **DR-DETT** (Fawkes, Hu, Evans & Sejdinovic, 2024) test for *any* distributional difference between Y(1) and Y(0), using doubly robust kernel embeddings.

## Key Concepts

- **DR-DATE** (Distributional Average Treatment Effect): Tests the null hypothesis P(Y(1)) = P(Y(0)) over the entire population.
- **DR-DETT** (Distributional Effect on the Treated): Tests the null hypothesis P(Y(1)|T=1) = P(Y(0)|T=1), focusing on the treated subgroup. Requires only one-sided overlap.
- **Double robustness**: Consistent if either the propensity model or the outcome model is correctly specified; both need not be.

## Example: Mean Shift (Detectable by All Methods)

```{r mean-shift}
library(kernR)

set.seed(42)
n <- 300
x <- matrix(rnorm(n * 2), n, 2)
logit_p <- 0.3 * x[, 1] - 0.2 * x[, 2]
t <- rbinom(n, 1, plogis(logit_p))
y <- t * 1.0 + 0.5 * x[, 1] + rnorm(n, sd = 0.5) # Mean shift of 1.0

result <- dr_date_test(y, t, x,
  n_permutations = 200, seed = 1
)
result
```

## Example: Variance Effect Only (Invisible to Mean-Based Tests)

This is where DR-DATE shines. The treatment changes the *variance* of the outcome but not the mean, so mean-based methods such as DML and TMLE have no signal to detect and their power stays at the nominal level.

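
Before running DR-DATE, the claim about mean-based tests can be sanity-checked with base R alone. The sketch below is a simplified stand-in, not the DR-DATE procedure: it assumes a randomized treatment with no covariates, uses hypothetical names (`n_sketch`, `trt`, `out`), and contrasts a Welch t-test, which compares means, with a two-sample Kolmogorov-Smirnov test, which compares whole distributions.

```r
# Hypothetical illustration with base R only (not part of kernR).
# Assumption: treatment is randomized, so no confounding adjustment is needed.
set.seed(123)
n_sketch <- 400
trt <- rbinom(n_sketch, 1, 0.5)

# Both arms have mean 0; the treated arm has SD 2.5 instead of 1.
out <- ifelse(trt == 1,
  rnorm(n_sketch, sd = 2.5),
  rnorm(n_sketch, sd = 1)
)

# Welch t-test: compares the two arm means only.
mean_test <- t.test(out[trt == 1], out[trt == 0])

# Two-sample Kolmogorov-Smirnov test: compares the full distributions.
dist_test <- ks.test(out[trt == 1], out[trt == 0])

cat("Welch t-test p-value:", format.pval(mean_test$p.value), "\n")
cat("KS test p-value:     ", format.pval(dist_test$p.value), "\n")
```

With a variance-only difference the t-test has no systematic signal to pick up, whereas the KS test is sensitive to the change in spread. Neither test adjusts for confounding, which is what the doubly robust tests used in the rest of this vignette add.
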
```{r variance-effect}
set.seed(42)
n <- 400
x <- matrix(rnorm(n * 2), n, 2)
t <- rbinom(n, 1, plogis(0.3 * x[, 1]))

# Treatment raises the noise SD from 1 to 2.5 (variance x 6.25)
# but does NOT shift the mean
y <- (1 - t) * rnorm(n, sd = 1) + t * rnorm(n, sd = 2.5) + 0.5 * x[, 1]

# Naive (unadjusted) group summaries; x[, 1] affects both t and y,
# so the raw mean difference need not be exactly zero
cat("Mean difference:", mean(y[t == 1]) - mean(y[t == 0]), "\n")
cat("SD treated:", sd(y[t == 1]), " SD control:", sd(y[t == 0]), "\n")

result_var <- dr_date_test(y, t, x,
  n_permutations = 200,
  outcome_model = "zero", seed = 1
)
result_var
```

## DR-DETT: Effect on the Treated

When overlap is imperfect (some covariate regions contain almost only treated or almost only control units), DR-DETT is more robust because it requires only one-sided overlap.

```{r dett}
set.seed(42)
n <- 300
x <- matrix(rnorm(n * 2), n, 2)
t <- rbinom(n, 1, plogis(0.5 * x[, 1]))
y <- t * rnorm(n, mean = 0.5, sd = 1.5) + (1 - t) * rnorm(n) + x[, 1]

result_dett <- dr_dett_test(y, t, x,
  n_permutations = 200, seed = 1
)
result_dett
```

## Comparing the Tests

The two p-values below come from different simulated datasets (the variance-only example and the DR-DETT example), so they show how to extract results rather than a head-to-head comparison on the same data.

```{r comparison}
cat("DR-DATE p-value:", result_var$p_value, "\n")
cat("DR-DETT p-value:", result_dett$p_value, "\n")
```

## Using the Formula Interface

The data frame below reuses `y`, `t`, and `x` from the DR-DETT example.

```{r formula}
dat <- data.frame(y = y, treatment = t, x1 = x[, 1], x2 = x[, 2])

result_f <- kernel_causal_test(
  y ~ treatment | x1 + x2,
  data = dat,
  method = "dr-date",
  n_permutations = 100,
  seed = 1
)
result_f
```

## When to Use Which Test

| Test | Detects | Overlap Requirement | Best For |
|------|---------|---------------------|----------|
| **DR-DATE** | Any distributional difference | Both sides | Population-level effects |
| **DR-DETT** | Distributional effect on treated | One-sided only | Imperfect overlap; policy questions about treated |
| **DML/TMLE** | Mean shifts only | Both sides | When only mean effects matter |

## References

- Fawkes, J., Hu, R., Evans, R. J., & Sejdinovic, D. (2024). Doubly robust kernel statistics for testing distributional treatment effects. *Transactions on Machine Learning Research*.

```{r session-info}
sessionInfo()
```