248 lines
6.3 KiB
Markdown
248 lines
6.3 KiB
Markdown
# rlang Patterns for Data-Masking
|
|
|
|
## Core Concepts
|
|
|
|
**Data-masking** allows R expressions to refer to data frame columns as if they were variables in the environment. rlang provides the metaprogramming framework that powers tidyverse data-masking.
|
|
|
|
### Key rlang Tools
|
|
|
|
- **Embracing `{{}}`** - Forward function arguments to data-masking functions
|
|
- **Injection `!!`** - Inject single expressions or values
|
|
- **Splicing `!!!`** - Inject multiple arguments from a list
|
|
- **Dynamic dots** - Programmable `...` with injection support
|
|
- **Pronouns `.data`/`.env`** - Explicit disambiguation between data and environment variables
|
|
|
|
## Function Argument Patterns
|
|
|
|
### Forwarding with `{{}}`
|
|
|
|
Use `{{}}` to forward function arguments to data-masking functions:
|
|
|
|
```r
|
|
# Single argument forwarding
|
|
my_summarise <- function(data, var) {
|
|
data |> dplyr::summarise(mean = mean({{ var }}))
|
|
}
|
|
|
|
# Works with any data-masking expression
|
|
mtcars |> my_summarise(cyl)
|
|
mtcars |> my_summarise(cyl * am)
|
|
mtcars |> my_summarise(.data$cyl) # pronoun syntax supported
|
|
```
|
|
|
|
### Forwarding `...`
|
|
|
|
No special syntax needed for dots forwarding:
|
|
|
|
```r
|
|
# Simple dots forwarding
|
|
my_group_by <- function(.data, ...) {
|
|
.data |> dplyr::group_by(...)
|
|
}
|
|
|
|
# Works with tidy selections too
|
|
my_select <- function(.data, ...) {
|
|
.data |> dplyr::select(...)
|
|
}
|
|
|
|
# For single-argument tidy selections, wrap in c()
|
|
my_pivot_longer <- function(.data, ...) {
|
|
.data |> tidyr::pivot_longer(c(...))
|
|
}
|
|
```
|
|
|
|
### Names Patterns with `.data`
|
|
|
|
Use `.data` pronoun for programmatic column access:
|
|
|
|
```r
|
|
# Single column by name
|
|
my_mean <- function(data, var) {
|
|
data |> dplyr::summarise(mean = mean(.data[[var]]))
|
|
}
|
|
|
|
# Usage - completely insulated from data-masking
|
|
mtcars |> my_mean("cyl") # No ambiguity, works like regular function
|
|
|
|
# Multiple columns with all_of()
|
|
my_select_vars <- function(data, vars) {
|
|
data |> dplyr::select(all_of(vars))
|
|
}
|
|
|
|
mtcars |> my_select_vars(c("cyl", "am"))
|
|
```
|
|
|
|
## Injection Operators
|
|
|
|
### When to Use Each Operator
|
|
|
|
| Operator | Use Case | Example |
|
|
|----------|----------|---------|
|
|
| `{{ }}` | Forward function arguments | `summarise(mean = mean({{ var }}))` |
|
|
| `!!` | Inject single expression/value | `summarise(mean = mean(!!sym(var)))` |
|
|
| `!!!` | Inject multiple arguments | `group_by(!!!syms(vars))` |
|
|
| `.data[[]]` | Access columns by name | `mean(.data[[var]])` |
|
|
|
|
### Advanced Injection with `!!`
|
|
|
|
```r
|
|
# Create symbols from strings
|
|
var <- "cyl"
|
|
mtcars |> dplyr::summarise(mean = mean(!!sym(var)))
|
|
|
|
# Inject values to avoid name collisions
|
|
df <- data.frame(x = 1:3)
|
|
x <- 100
|
|
df |> dplyr::mutate(scaled = x / !!x) # Uses both data and env x
|
|
|
|
# Use data_sym() for tidyeval contexts (more robust)
|
|
mtcars |> dplyr::summarise(mean = mean(!!data_sym(var)))
|
|
```
|
|
|
|
### Splicing with `!!!`
|
|
|
|
```r
|
|
# Multiple symbols from character vector
|
|
vars <- c("cyl", "am")
|
|
mtcars |> dplyr::group_by(!!!syms(vars))
|
|
|
|
# Or use data_syms() for tidy contexts
|
|
mtcars |> dplyr::group_by(!!!data_syms(vars))
|
|
|
|
# Splice lists of arguments
|
|
args <- list(na.rm = TRUE, trim = 0.1)
|
|
mtcars |> dplyr::summarise(mean = mean(cyl, !!!args))
|
|
```
|
|
|
|
## Dynamic Dots Patterns
|
|
|
|
### Using `list2()` for Dynamic Dots Support
|
|
|
|
```r
|
|
my_function <- function(...) {
|
|
# Collect with list2() instead of list() for dynamic features
|
|
dots <- list2(...)
|
|
# Process dots...
|
|
}
|
|
|
|
# Enables these features:
|
|
my_function(a = 1, b = 2) # Normal usage
|
|
my_function(!!!list(a = 1, b = 2)) # Splice a list
|
|
my_function("{name}" := value) # Name injection
|
|
my_function(a = 1, ) # Trailing commas OK
|
|
```
|
|
|
|
### Name Injection with Glue Syntax
|
|
|
|
```r
|
|
# Basic name injection
|
|
name <- "result"
|
|
list2("{name}" := 1) # Creates list(result = 1)
|
|
|
|
# In function arguments with {{
|
|
my_mean <- function(data, var) {
|
|
data |> dplyr::summarise("mean_{{ var }}" := mean({{ var }}))
|
|
}
|
|
|
|
mtcars |> my_mean(cyl) # Creates column "mean_cyl"
|
|
mtcars |> my_mean(cyl * am) # Creates column "mean_cyl * am"
|
|
|
|
# Allow custom names with englue()
|
|
my_mean <- function(data, var, name = englue("mean_{{ var }}")) {
|
|
data |> dplyr::summarise("{name}" := mean({{ var }}))
|
|
}
|
|
|
|
# User can override default
|
|
mtcars |> my_mean(cyl, name = "cylinder_mean")
|
|
```
|
|
|
|
## Pronouns for Disambiguation
|
|
|
|
### `.data` and `.env` Best Practices
|
|
|
|
```r
|
|
# Explicit disambiguation prevents masking issues
|
|
cyl <- 1000 # Environment variable
|
|
|
|
mtcars |> dplyr::summarise(
|
|
data_cyl = mean(.data$cyl), # Data frame column
|
|
env_cyl = mean(.env$cyl), # Environment variable
|
|
ambiguous = mean(cyl) # Could be either (usually data wins)
|
|
)
|
|
|
|
# Use in loops and programmatic contexts
|
|
vars <- c("cyl", "am")
|
|
for (var in vars) {
|
|
result <- mtcars |> dplyr::summarise(mean = mean(.data[[var]]))
|
|
print(result)
|
|
}
|
|
```
|
|
|
|
## Programming Patterns
|
|
|
|
### Bridge Patterns
|
|
|
|
Converting between data-masking and tidy selection behaviors:
|
|
|
|
```r
|
|
# across() as selection-to-data-mask bridge
|
|
my_group_by <- function(data, vars) {
|
|
data |> dplyr::group_by(across({{ vars }}))
|
|
}
|
|
|
|
# Works with tidy selection
|
|
mtcars |> my_group_by(starts_with("c"))
|
|
|
|
# across(all_of()) as names-to-data-mask bridge
|
|
my_group_by <- function(data, vars) {
|
|
data |> dplyr::group_by(across(all_of(vars)))
|
|
}
|
|
|
|
mtcars |> my_group_by(c("cyl", "am"))
|
|
```
|
|
|
|
### Transformation Patterns
|
|
|
|
```r
|
|
# Transform single arguments by wrapping
|
|
my_mean <- function(data, var) {
|
|
data |> dplyr::summarise(mean = mean({{ var }}, na.rm = TRUE))
|
|
}
|
|
|
|
# Transform dots with across()
|
|
my_means <- function(data, ...) {
|
|
data |> dplyr::summarise(across(c(...), ~ mean(.x, na.rm = TRUE)))
|
|
}
|
|
|
|
# Manual transformation (advanced)
|
|
my_means_manual <- function(.data, ...) {
|
|
vars <- enquos(..., .named = TRUE)
|
|
vars <- purrr::map(vars, ~ expr(mean(!!.x, na.rm = TRUE)))
|
|
.data |> dplyr::summarise(!!!vars)
|
|
}
|
|
```
|
|
|
|
## Common Patterns Summary
|
|
|
|
### When to Use What
|
|
|
|
**Use `{{}}` when:**
|
|
- Forwarding user-provided column references
|
|
- Building wrapper functions around dplyr/tidyr
|
|
- Need to support both bare names and expressions
|
|
|
|
**Use `.data[[]]` when:**
|
|
- Working with character vector column names
|
|
- Iterating over column names programmatically
|
|
- Need complete insulation from data-masking
|
|
|
|
**Use `!!` when:**
|
|
- Need to inject computed expressions
|
|
- Converting strings to symbols with `sym()`
|
|
- Avoiding variable name collisions
|
|
|
|
**Use `!!!` when:**
|
|
- Injecting multiple arguments from a list
|
|
- Working with variable numbers of columns
|
|
- Splicing named arguments
|