# rlang Patterns for Data-Masking ## Core Concepts **Data-masking** allows R expressions to refer to data frame columns as if they were variables in the environment. rlang provides the metaprogramming framework that powers tidyverse data-masking. ### Key rlang Tools - **Embracing `{{}}`** - Forward function arguments to data-masking functions - **Injection `!!`** - Inject single expressions or values - **Splicing `!!!`** - Inject multiple arguments from a list - **Dynamic dots** - Programmable `...` with injection support - **Pronouns `.data`/`.env`** - Explicit disambiguation between data and environment variables ## Function Argument Patterns ### Forwarding with `{{}}` Use `{{}}` to forward function arguments to data-masking functions: ```r # Single argument forwarding my_summarise <- function(data, var) { data |> dplyr::summarise(mean = mean({{ var }})) } # Works with any data-masking expression mtcars |> my_summarise(cyl) mtcars |> my_summarise(cyl * am) mtcars |> my_summarise(.data$cyl) # pronoun syntax supported ``` ### Forwarding `...` No special syntax needed for dots forwarding: ```r # Simple dots forwarding my_group_by <- function(.data, ...) { .data |> dplyr::group_by(...) } # Works with tidy selections too my_select <- function(.data, ...) { .data |> dplyr::select(...) } # For single-argument tidy selections, wrap in c() my_pivot_longer <- function(.data, ...) { .data |> tidyr::pivot_longer(c(...)) } ``` ### Names Patterns with `.data` Use `.data` pronoun for programmatic column access: ```r # Single column by name my_mean <- function(data, var) { data |> dplyr::summarise(mean = mean(.data[[var]])) } # Usage - completely insulated from data-masking mtcars |> my_mean("cyl") # No ambiguity, works like regular function # Multiple columns with all_of() my_select_vars <- function(data, vars) { data |> dplyr::select(all_of(vars)) } mtcars |> my_select_vars(c("cyl", "am")) ``` ## Injection Operators ### When to Use Each Operator | Operator | Use Case | Example | |----------|----------|---------| | `{{ }}` | Forward function arguments | `summarise(mean = mean({{ var }}))` | | `!!` | Inject single expression/value | `summarise(mean = mean(!!sym(var)))` | | `!!!` | Inject multiple arguments | `group_by(!!!syms(vars))` | | `.data[[]]` | Access columns by name | `mean(.data[[var]])` | ### Advanced Injection with `!!` ```r # Create symbols from strings var <- "cyl" mtcars |> dplyr::summarise(mean = mean(!!sym(var))) # Inject values to avoid name collisions df <- data.frame(x = 1:3) x <- 100 df |> dplyr::mutate(scaled = x / !!x) # Uses both data and env x # Use data_sym() for tidyeval contexts (more robust) mtcars |> dplyr::summarise(mean = mean(!!data_sym(var))) ``` ### Splicing with `!!!` ```r # Multiple symbols from character vector vars <- c("cyl", "am") mtcars |> dplyr::group_by(!!!syms(vars)) # Or use data_syms() for tidy contexts mtcars |> dplyr::group_by(!!!data_syms(vars)) # Splice lists of arguments args <- list(na.rm = TRUE, trim = 0.1) mtcars |> dplyr::summarise(mean = mean(cyl, !!!args)) ``` ## Dynamic Dots Patterns ### Using `list2()` for Dynamic Dots Support ```r my_function <- function(...) { # Collect with list2() instead of list() for dynamic features dots <- list2(...) # Process dots... } # Enables these features: my_function(a = 1, b = 2) # Normal usage my_function(!!!list(a = 1, b = 2)) # Splice a list my_function("{name}" := value) # Name injection my_function(a = 1, ) # Trailing commas OK ``` ### Name Injection with Glue Syntax ```r # Basic name injection name <- "result" list2("{name}" := 1) # Creates list(result = 1) # In function arguments with {{ my_mean <- function(data, var) { data |> dplyr::summarise("mean_{{ var }}" := mean({{ var }})) } mtcars |> my_mean(cyl) # Creates column "mean_cyl" mtcars |> my_mean(cyl * am) # Creates column "mean_cyl * am" # Allow custom names with englue() my_mean <- function(data, var, name = englue("mean_{{ var }}")) { data |> dplyr::summarise("{name}" := mean({{ var }})) } # User can override default mtcars |> my_mean(cyl, name = "cylinder_mean") ``` ## Pronouns for Disambiguation ### `.data` and `.env` Best Practices ```r # Explicit disambiguation prevents masking issues cyl <- 1000 # Environment variable mtcars |> dplyr::summarise( data_cyl = mean(.data$cyl), # Data frame column env_cyl = mean(.env$cyl), # Environment variable ambiguous = mean(cyl) # Could be either (usually data wins) ) # Use in loops and programmatic contexts vars <- c("cyl", "am") for (var in vars) { result <- mtcars |> dplyr::summarise(mean = mean(.data[[var]])) print(result) } ``` ## Programming Patterns ### Bridge Patterns Converting between data-masking and tidy selection behaviors: ```r # across() as selection-to-data-mask bridge my_group_by <- function(data, vars) { data |> dplyr::group_by(across({{ vars }})) } # Works with tidy selection mtcars |> my_group_by(starts_with("c")) # across(all_of()) as names-to-data-mask bridge my_group_by <- function(data, vars) { data |> dplyr::group_by(across(all_of(vars))) } mtcars |> my_group_by(c("cyl", "am")) ``` ### Transformation Patterns ```r # Transform single arguments by wrapping my_mean <- function(data, var) { data |> dplyr::summarise(mean = mean({{ var }}, na.rm = TRUE)) } # Transform dots with across() my_means <- function(data, ...) { data |> dplyr::summarise(across(c(...), ~ mean(.x, na.rm = TRUE))) } # Manual transformation (advanced) my_means_manual <- function(.data, ...) { vars <- enquos(..., .named = TRUE) vars <- purrr::map(vars, ~ expr(mean(!!.x, na.rm = TRUE))) .data |> dplyr::summarise(!!!vars) } ``` ## Common Patterns Summary ### When to Use What **Use `{{}}` when:** - Forwarding user-provided column references - Building wrapper functions around dplyr/tidyr - Need to support both bare names and expressions **Use `.data[[]]` when:** - Working with character vector column names - Iterating over column names programmatically - Need complete insulation from data-masking **Use `!!` when:** - Need to inject computed expressions - Converting strings to symbols with `sym()` - Avoiding variable name collisions **Use `!!!` when:** - Injecting multiple arguments from a list - Working with variable numbers of columns - Splicing named arguments