---
title: "Roadmap"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Roadmap}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
options(width = 100)
```

# Roadmap

This page tracks features that are **deliberately deferred** from the
current release. Each item has a stable position in the long-term
design; future releases will land them behind explicit opt-in flags
and new exports. Version numbers are intentionally not committed: the
order in which deferred features land depends on user demand and the
surface area each one needs.

## What has already landed

- v0.2.0 — first public surface: `mask()`, `recipe`, `audit_mask()`,
  `apply_recipe()` / `unmask()` round-trip.
- v0.3.0 — `detect_design()`, `plot_design_summary()`, and
  `detect = TRUE` as the default in `propose_roles()`.
- v0.4.0 — `synthesise_geospatial()` (preserves site-count and
  within-site clustering without publishing real coordinates).
- v0.4.1 — contract sharpening: fail-closed unknown-level handling in
  `apply_recipe()` / `unmask()`; integrity-fingerprint check on
  `apply_recipe()`; atomic numeric pass-through in `unmask()`;
  honest `exact_match_pct` denominator in `audit_mask()`;
  `synthesise_geospatial()` NA-mask source authority.

## Deferred — joint structure and shareability

### Design-conditional synthesis

The current Gaussian copula uses a single global Pearson covariance
over all numeric outcomes and covariates. For multi-environment
trials, where genotype-by-environment interaction dominates the
variance structure, a single global covariance pools correlations
that differ between environments, so it is the wrong unit of analysis.

**Planned API**:

```r
mask(df, roles,
     mode = "collaborate",
     condition_on = c("rep", "site:year"))
```

The copula would be fit *within* each stratum defined by
`condition_on`, preserving the conditional structure that
mixed-effects models will see. Compute cost scales with the number of
strata.
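A minimal base-R sketch of the stratify-then-fit idea follows. It is not `masque` code. `synth_within_strata()` is a hypothetical helper, and it simplifies the copula to a multivariate normal (a Gaussian copula with normal margins) drawn from each stratum's own mean and Pearson covariance. It takes plain column names, so an interaction term such as `"site:year"` would be passed as `c("site", "year")`. It also assumes every stratum has more rows than numeric columns, so that each covariance matrix is positive definite.

```r
# Hypothetical helper, not a masque export. Each stratum gets its own
# mean vector and Pearson covariance, then is replaced by draws from a
# multivariate normal with those moments.
synth_within_strata <- function(df, numeric_cols, condition_on) {
  # One factor level per observed combination of the conditioning columns
  strata <- interaction(df[condition_on], drop = TRUE)
  pieces <- lapply(split(df, strata), function(d) {
    x  <- as.matrix(d[numeric_cols])
    mu <- colMeans(x)
    S  <- cov(x)                       # this stratum's covariance only
    # Rows of Z %*% chol(S) have covariance S when Z is iid N(0, 1)
    z  <- matrix(rnorm(nrow(x) * ncol(x)), nrow(x)) %*% chol(S)
    d[numeric_cols] <- as.data.frame(sweep(z, 2, mu, "+"))
    d                                  # non-numeric columns are untouched
  })
  do.call(rbind, unname(pieces))
}

set.seed(42)
site <- rep(c("A", "B"), each = 200)
x    <- rnorm(400)
# Positive x-y correlation at site A, negative at site B
y    <- ifelse(site == "A", x, -x) + rnorm(400, sd = 0.5)
synth <- synth_within_strata(data.frame(site, x, y), c("x", "y"), "site")
```

A single global covariance fitted to these two sites would pull the x-y correlation toward zero. Fitting per stratum keeps the opposite signs at each site.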

### Mixed-margin copula

The current package deliberately breaks the joint between categorical
covariates and numerics (categoricals are row-permuted independently
of the numeric copula). For datasets where the joint matters — e.g.,
soil type clustering yield outliers — a Gaussian copula with discrete
margins (Smith and Khaled 2012; ordinal probit links) is the natural
extension.

**Planned API**: `roles$kind == "ordinal"` plus a covariate-level
option to opt in.
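The discrete-margin idea can be sketched by treating each ordinal level as an interval of a latent standard normal. The cut points sit at `qnorm()` of the cumulative level proportions, which is the ordinal probit link. In this sketch the latent correlation `rho` is simply supplied. Estimating it from data, which Smith and Khaled (2012) do by Bayesian data augmentation, is not shown. `draw_ordinal_numeric()`, the `soil` and `yield` column names, and the single numeric outcome are illustrative assumptions.

```r
# Hypothetical helper, not masque's planned implementation. An ordinal
# covariate and a numeric outcome share one bivariate normal latent;
# the ordinal is obtained by cutting its latent at probit thresholds.
draw_ordinal_numeric <- function(n, level_names, probs, rho,
                                 mean = 0, sd = 1) {
  stopifnot(length(level_names) == length(probs),
            abs(sum(probs) - 1) < 1e-8, abs(rho) < 1)
  z1 <- rnorm(n)
  z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)   # cor(z1, z2) = rho
  # Thresholds at qnorm(cumulative proportion) reproduce the level frequencies
  cuts <- c(-Inf, qnorm(cumsum(probs)[-length(probs)]), Inf)
  data.frame(
    soil  = cut(z1, breaks = cuts, labels = level_names,
                ordered_result = TRUE),
    yield = mean + sd * z2
  )
}

set.seed(3)
draws <- draw_ordinal_numeric(20000, c("sand", "loam", "clay"),
                              probs = c(0.2, 0.5, 0.3), rho = 0.6, mean = 5)
```

The soil levels and yields come from one correlated latent draw, so with a positive `rho` the mean yield rises across `sand < loam < clay`. Permuting the categorical rows independently of the numeric copula discards exactly this joint.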

### `draw_new_synthetic(rec, n)`

Today, regenerating synthetic data requires the original data. The
recipe holds enough state to re-translate a pipeline but not to
re-simulate. A future release will persist simulator state under
`save_recipe(rec, path, include_simulator = TRUE)` (the
`include_simulator` argument is currently a no-op), so that:

```r
rec       <- read_recipe("path/to/rec.rds")
new_synth <- draw_new_synthetic(rec, n = 5000, seed = 99)
```

This is useful for cross-validation folds and bootstrap pipelines
that need many synthetic draws without revealing the original.
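The contents of the simulator state are not specified here. One plausible shape, sketched below under that assumption, is the copula's latent correlation matrix plus each margin summarised on a fixed probability grid. `fit_simulator_state()` and `draw_from_state()` are hypothetical helpers, and base `saveRDS()`/`readRDS()` stand in for the recipe's own serialisation. Fixing `seed` makes each draw reproducible, which repeated folds or bootstrap replicates need. A coarse quantile grid still describes the original margins, so a real design would have to weigh what the persisted state itself discloses.

```r
# Hypothetical helpers, not masque exports. Assumed state: the Gaussian
# copula's latent correlation plus each margin's quantiles on a fixed grid.
fit_simulator_state <- function(x, grid = seq(0.01, 0.99, by = 0.01)) {
  x <- as.matrix(x)
  # Normal scores from ranks; their correlation is the copula parameter
  z <- qnorm(apply(x, 2, rank, ties.method = "average") / (nrow(x) + 1))
  list(
    corr    = cor(z),
    grid    = grid,
    margins = lapply(seq_len(ncol(x)), function(j)
      quantile(x[, j], probs = grid, type = 8, names = FALSE)),
    names   = colnames(x)
  )
}

draw_from_state <- function(state, n, seed) {
  set.seed(seed)                       # note: resets the global RNG stream
  p <- ncol(state$corr)
  u <- pnorm(matrix(rnorm(n * p), n) %*% chol(state$corr))
  # Map uniforms back through each stored margin (clamped at the grid ends)
  out <- vapply(seq_len(p), function(j)
    approx(state$grid, state$margins[[j]], xout = u[, j], rule = 2)$y,
    numeric(n))
  as.data.frame(matrix(out, nrow = n, dimnames = list(NULL, state$names)))
}

path <- tempfile(fileext = ".rds")
saveRDS(fit_simulator_state(mtcars[c("mpg", "wt")]), path)
state     <- readRDS(path)             # the original rows are not needed below
new_synth <- draw_from_state(state, n = 5000, seed = 99)
```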

### Joint-treatment masking

`mask()` currently accepts at most one treatment column. A future
release will accept multiple treatment columns and produce joint
aliases (factorial trials, treatment combinations). The joint alias
namespace is much larger, because it grows with the product of the
factors' level counts, so `audit_mask()`'s leakage thresholds will
need a corresponding update.
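The following sketch shows one way joint aliases could be formed, assuming they are assigned to observed combinations of the treatment factors. `interaction()` builds one joint level per combination, and a random permutation hides the natural ordering of the levels. `alias_joint_treatment()`, the `T001`-style alias format, and the `treatment_alias` column are hypothetical and do not describe `masque`'s scheme. A 3 x 2 factorial already yields six joint aliases, compared with three and two for its factors alone.

```r
# Hypothetical helper, not masque's aliasing scheme.
alias_joint_treatment <- function(df, treatment_cols, prefix = "T") {
  # One level per observed combination, e.g. "N1 x P0"
  joint <- interaction(df[treatment_cols], drop = TRUE, sep = " x ")
  lv    <- levels(joint)
  # Random permutation so alias numbers do not follow level order
  alias <- sprintf("%s%03d", prefix, sample(length(lv)))
  df$treatment_alias <- alias[as.integer(joint)]
  df[treatment_cols] <- NULL           # drop the real treatment columns
  list(data = df, key = data.frame(joint = lv, alias = alias))
}

set.seed(7)
trial  <- expand.grid(N = c("N0", "N1", "N2"), P = c("P0", "P1"),
                      rep = 1:2, stringsAsFactors = FALSE)
masked <- alias_joint_treatment(trial, c("N", "P"))
```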

## Deferred — collaboration ergonomics

### Column-name aliasing in collaborate mode

The `column_name_map` slot in the recipe is currently `NULL`. A
future release will support `mask(..., alias_columns = TRUE)` so
that even column identities are hidden behind opaque names
(`x_001`, `x_002`, ...). `apply_recipe()` and `unmask()` already
invert column-name maps if present.
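Here is a minimal sketch of a column-name map and its inversion. It assumes the map is stored as a named character vector from alias to original name. The real `column_name_map` slot may use a different structure, and `alias_column_names()` / `restore_column_names()` are hypothetical helpers. Columns that the collaborator's pipeline adds, such as fitted values, have no entry in the map and keep their names on the way back.

```r
# Hypothetical helpers; masque's column_name_map format may differ.
alias_column_names <- function(df) {
  # Named vector: names are the opaque aliases, values the real names
  map <- setNames(names(df), sprintf("x_%03d", seq_along(df)))
  names(df) <- names(map)
  list(data = df, column_name_map = map)
}

restore_column_names <- function(df, map) {
  hit <- names(df) %in% names(map)     # columns added downstream are kept
  names(df)[hit] <- unname(map[names(df)[hit]])
  df
}

shared <- alias_column_names(iris)
names(shared$data)                     # "x_001" "x_002" "x_003" "x_004" "x_005"
```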

### Interactive role builder

`propose_roles()` is declarative — the user edits the returned
tibble. A thin interactive wrapper (using `cli` prompts for
ambiguous columns) would lower the barrier for one-off use. The
declarative core stays unchanged so scripts remain reproducible.
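One way to keep the declarative core untouched is to make the interactive layer only fill gaps in the role table. The prompt function is passed in, so scripts and tests can supply answers without a console. This sketch uses base `utils::menu()` rather than `cli`. It assumes a role table with `column` and `role` columns in which `NA` marks an ambiguous column. `confirm_roles()`, `ask_with_menu()`, and the role vocabulary shown are hypothetical.

```r
# Hypothetical wrapper, not masque's planned implementation.
ask_with_menu <- function(column, choices) {
  k <- utils::menu(choices, title = sprintf("Role for column '%s'?", column))
  if (k == 0) NA_character_ else choices[k]  # 0 means the user cancelled
}

confirm_roles <- function(roles, ask = ask_with_menu,
                          choices = c("outcome", "covariate", "treatment", "id")) {
  # Only ambiguous (NA) rows are prompted; confirmed roles are left alone
  for (i in which(is.na(roles$role))) {
    roles$role[i] <- ask(roles$column[i], choices)
  }
  roles                                # same table shape as the input
}

roles <- data.frame(column = c("plot_id", "yield", "soil"),
                    role   = c("id", "outcome", NA))
# A scripted answer stands in for the console prompt
filled <- confirm_roles(roles, ask = function(column, choices) "covariate")
```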

### `mask_csv()` convenience verb

A wrapper that reads a CSV, runs `propose_roles()` with sensible
defaults, shows the role tibble for the user to confirm, and returns
the masque object. It targets data custodians who are not fluent in R.

## Out of scope, permanently

- **Differential-privacy guarantees.** These need a different package,
  different algorithms, and a different threat model. `masque` will not
  claim to provide them.
- **Public-release safety claims.** Synthetic data from `masque` is for
  controlled sharing only; "safe to publish" is never a `masque`
  claim.
- **Pipeline source-code rewriting.** Translation happens through
  the data via `apply_recipe()` and `unmask()`, not by mutating R or
  Python source code.

## References

- Smith, M. and Khaled, M. (2012). Estimation of copula models with
  discrete margins via Bayesian data augmentation. *Journal of the
  American Statistical Association* 107: 290-303.
- John, J. A. (1987). *Statistical Analysis of Experiments with
  Different Numbers of Replicates per Treatment.* CRC Press.

If you have a use case that does not fit cleanly in the current
release, open an issue with a small reproducible example.