---
title: "Roadmap"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Roadmap}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
options(width = 100)
```
# Roadmap
This page tracks features that are **deliberately deferred** from the
current release. Each item has a stable position in the long-term
design; future releases will land them behind explicit opt-in flags
and new exports. Version numbers are intentionally not committed: the
order in which deferred features land depends on user demand and the
surface area each one needs.
## What has already landed
- v0.2.0 — first public surface: `mask()`, `recipe`, `audit_mask()`,
`apply_recipe()` / `unmask()` round-trip.
- v0.3.0 — `detect_design()` + `plot_design_summary()` +
`propose_roles(detect = TRUE)` default.
- v0.4.0 — `synthesise_geospatial()` (preserves site-count and
within-site clustering without publishing real coordinates).
- v0.4.1 — contract sharpening: fail-closed unknown-level handling in
`apply_recipe()` / `unmask()`; integrity-fingerprint check on
`apply_recipe()`; atomic numeric pass-through in `unmask()`;
honest `exact_match_pct` denominator in `audit_mask()`;
`synthesise_geospatial()` NA-mask source authority.
## Deferred — joint structure and shareability
### Design-conditional synthesis
The current Gaussian copula uses a single global Pearson covariance
over all numeric outcomes and covariates. For multi-environment
trials where the genotype-by-environment interaction dominates the
variance structure, this is the wrong unit of analysis.
**Planned API**:
```r
mask(df, roles,
mode = "collaborate",
condition_on = c("rep", "site:year"))
```
The copula would be fit *within* each stratum defined by
`condition_on`, preserving the conditional structure that
mixed-effects models will see. Compute cost scales with the number of
strata.
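To make the idea concrete, here is a minimal base-R sketch of a stratum-wise Gaussian copula. It is not `masque` code. `synth_by_stratum()` is a hypothetical helper. It takes plain column names rather than the `"site:year"` interaction syntax shown above. It also assumes every stratum has more rows than numeric columns, so that each within-stratum correlation matrix is positive definite.

```r
# Hypothetical helper, not part of masque: fit a Gaussian copula
# separately within each stratum and draw the same number of rows.
synth_by_stratum <- function(df, num_cols, strata, seed = 1) {
  set.seed(seed)
  # One factor level per observed combination of the stratum columns
  key <- interaction(df[strata], drop = TRUE)
  pieces <- lapply(split(df, key), function(d) {
    x <- as.matrix(d[num_cols])
    n <- nrow(x)
    # Normal scores: rank-transform each margin onto N(0, 1)
    z <- apply(x, 2, function(v) qnorm(rank(v) / (n + 1)))
    R <- cor(z)  # within-stratum copula correlation
    # Correlated standard normals: rows of Z %*% chol(R) have covariance R
    u <- pnorm(matrix(rnorm(n * ncol(x)), n) %*% chol(R))
    out <- d  # stratum and other non-numeric columns are carried over
    for (j in seq_along(num_cols)) {
      # Map back through this stratum's empirical quantiles
      out[[num_cols[j]]] <- quantile(x[, j], probs = u[, j], type = 8,
                                     names = FALSE)
    }
    out
  })
  do.call(rbind, unname(pieces))
}
```

The per-stratum loop is where the cost scaling comes from. Each stratum needs its own rank transform, correlation estimate and Cholesky factorisation.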
### Mixed-margin copula
The current package deliberately breaks the joint between categorical
covariates and numerics (categoricals are row-permuted independently
of the numeric copula). For datasets where the joint matters — e.g.,
soil type clustering yield outliers — a Gaussian copula with discrete
margins (Smith and Khaled 2012; ordinal probit links) is the natural
extension.
**Planned API**: `roles$kind == "ordinal"` plus a covariate-level
option to opt in.
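The generative side of a discrete-margin Gaussian copula is simple. An ordinal column is a latent normal cut at probit thresholds. The hypothetical `draw_mixed()` below shows only that direction, with an assumed latent correlation `rho`. Estimating `rho` from observed discrete data is the hard part, which the data-augmentation approach of Smith and Khaled (2012) addresses. That estimation step is not shown here.

```r
# Hypothetical sketch, not masque code: a Gaussian copula whose first
# margin is ordinal (a thresholded latent normal, i.e. a probit link).
draw_mixed <- function(n, rho, level_probs, num_values, seed = 1) {
  set.seed(seed)
  R <- matrix(c(1, rho, rho, 1), nrow = 2)
  z <- matrix(rnorm(2 * n), n) %*% chol(R)
  # Probit cut points chosen to reproduce the target level proportions
  inner <- qnorm(cumsum(level_probs))[-length(level_probs)]
  ordinal <- cut(z[, 1], breaks = c(-Inf, inner, Inf),
                 labels = names(level_probs), ordered_result = TRUE)
  # Continuous margin: empirical quantiles of the observed values
  yield <- quantile(num_values, probs = pnorm(z[, 2]), type = 8,
                    names = FALSE)
  data.frame(soil = ordinal, yield = yield)
}
```

The soil category and yield share one latent correlation, so mean yield differs systematically across soil levels. Independent row permutation of categoricals destroys exactly this joint.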
### `draw_new_synthetic(rec, n)`
Today, regenerating synthetic data requires the original. The recipe
holds enough state to re-translate a pipeline but not to re-simulate.
A future release will persist simulator state via
`save_recipe(rec, path, include_simulator = TRUE)` (currently a
no-op) so that new draws need only the saved recipe:
```r
rec <- read_recipe("path/to/rec.rds")
new_synth <- draw_new_synthetic(rec, n = 5000, seed = 99)
```
This is useful for cross-validation folds and bootstrap pipelines
that need many synthetic draws without revealing the original.
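Re-simulation needs a copula correlation matrix plus some description of each margin. The sketch below is hypothetical: `fit_simulator_state()` and `draw_from_state()` are not `masque` exports, and the list is not the recipe's on-disk format. It keeps a coarse quantile grid per column instead of the raw values, and uses base `saveRDS()`/`readRDS()` for the round trip.

```r
# Hypothetical sketch: state a recipe could persist so that new synthetic
# rows can be drawn without the original data. Not masque's format.
fit_simulator_state <- function(df, num_cols, grid = 101) {
  x <- as.matrix(df[num_cols])
  z <- apply(x, 2, function(v) qnorm(rank(v) / (length(v) + 1)))
  probs <- seq(0, 1, length.out = grid)
  list(
    cols = num_cols,
    corr = cor(z),  # copula correlation on the normal-score scale
    probs = probs,
    # A coarse quantile grid per column replaces the raw values
    margins = lapply(df[num_cols], quantile, probs = probs, names = FALSE)
  )
}

draw_from_state <- function(state, n, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  p <- length(state$cols)
  u <- pnorm(matrix(rnorm(n * p), n) %*% chol(state$corr))
  out <- lapply(seq_len(p), function(j) {
    # Linear interpolation inside the stored quantile grid
    approx(state$probs, state$margins[[j]], xout = u[, j])$y
  })
  names(out) <- state$cols
  as.data.frame(out)
}

# Round trip through disk, then a reproducible draw
df <- data.frame(a = rnorm(200), b = rnorm(200))
path <- tempfile(fileext = ".rds")
saveRDS(fit_simulator_state(df, c("a", "b")), path)
state <- readRDS(path)
draw <- draw_from_state(state, n = 5000, seed = 99)
```

In this sketch the grid size is the disclosure knob. A coarser grid stores fewer order statistics of each margin.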
### Joint-treatment masking
`mask()` currently accepts at most one treatment column. A future
release will accept multiple treatment columns and produce joint
aliases for factorial trials and other treatment combinations.
Because the alias namespace grows with the product of the
per-column level counts, `audit_mask()`'s leakage thresholds will
be updated to match.
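A minimal sketch of joint aliasing follows. It assumes one alias per observed treatment combination. `joint_alias()` is hypothetical, and `masque`'s actual alias format may differ. In a 3 x 4 factorial the sketch produces 12 aliases, whereas either factor alone would produce 3 or 4.

```r
# Hypothetical sketch, not masque's aliasing scheme: one opaque alias per
# observed combination of several treatment columns.
joint_alias <- function(df, trt_cols, seed = 1) {
  set.seed(seed)
  combo <- interaction(df[trt_cols], sep = " x ", drop = TRUE)
  lv <- levels(combo)
  # Shuffle so alias numbering does not reveal the original level order
  aliases <- sprintf("T%02d", sample(length(lv)))
  map <- setNames(aliases, lv)
  list(alias = unname(map[as.character(combo)]), map = map)
}
```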
## Deferred — collaboration ergonomics
### Column-name aliasing in collaborate mode
The `column_name_map` slot in the recipe is currently `NULL`. A
future release will support `mask(..., alias_columns = TRUE)` so
that even column identities are hidden behind opaque names
(`x_001`, `x_002`, ...). `apply_recipe()` and `unmask()` already
invert column-name maps if present.
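In the sketch below, the column map and its inverse are plain named character vectors. The helper names and this representation are assumptions, not the format of the recipe's `column_name_map` slot.

```r
# Hypothetical helpers: build an opaque column-name map and invert it.
make_column_map <- function(cols) {
  # Names are the real columns; values are the opaque aliases
  setNames(sprintf("x_%03d", seq_along(cols)), cols)
}

alias_columns <- function(df, map) {
  names(df) <- unname(map[names(df)])
  df
}

restore_columns <- function(df, map) {
  inverse <- setNames(names(map), map)  # alias -> real name
  names(df) <- unname(inverse[names(df)])
  df
}
```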
### Interactive role builder
`propose_roles()` is declarative — the user edits the returned
tibble. A thin interactive wrapper (using `cli` prompts for
ambiguous columns) would lower the barrier for one-off use. The
declarative core stays unchanged so scripts remain reproducible.
### `mask_csv()` convenience verb
A wrapper that reads a CSV, runs `propose_roles()` with sensible
defaults, surfaces the role tibble for the user to confirm, and
returns the masque object. It is aimed at data custodians who are
not fluent in R.
## Out of scope, permanently
- **Differential-privacy guarantees.** A different package, different
algorithms, different threat model. `masque` will not pretend to offer them.
- **Public-release safety claims.** Synthetic data from `masque` is
for controlled sharing only; "safe to publish" is never a `masque`
claim.
- **Pipeline source-code rewriting.** Translation happens through
the data via `apply_recipe()` and `unmask()`, not by mutating R or
Python source code.
## References
- Smith, M. and Khaled, M. (2012). Estimation of copula models with
discrete margins via Bayesian data augmentation. *Journal of the
American Statistical Association* 107: 290-303.
- John, J. A. (1987). *Statistical Analysis of Experiments with
Different Numbers of Replicates per Treatment.* CRC Press.
If you have a use case that does not fit cleanly in the current
release, open an issue with a small reproducible example.