Initial commit

This commit is contained in:
Zhongwei Li
2025-11-29 18:15:04 +08:00
commit ec0d1b5905
19 changed files with 5696 additions and 0 deletions

@@ -0,0 +1,247 @@
# rlang Patterns for Data-Masking
## Core Concepts
**Data-masking** allows R expressions to refer to data frame columns as if they were variables in the environment. rlang provides the metaprogramming framework that powers tidyverse data-masking.
### Key rlang Tools
- **Embracing `{{}}`** - Forward function arguments to data-masking functions
- **Injection `!!`** - Inject single expressions or values
- **Splicing `!!!`** - Inject multiple arguments from a list
- **Dynamic dots** - Programmable `...` with injection support
- **Pronouns `.data`/`.env`** - Explicit disambiguation between data and environment variables
## Function Argument Patterns
### Forwarding with `{{}}`
Use `{{}}` to forward function arguments to data-masking functions:
```r
# Single argument forwarding
my_summarise <- function(data, var) {
  data |> dplyr::summarise(mean = mean({{ var }}))
}

# Works with any data-masking expression
mtcars |> my_summarise(cyl)
mtcars |> my_summarise(cyl * am)
mtcars |> my_summarise(.data$cyl) # pronoun syntax supported
```
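As a rough mental model, `{{ var }}` behaves like defusing the argument with `enquo()` and injecting it back with `!!`. The sketch below compares the two spellings; the function names are hypothetical and exist only for this illustration.

```r
library(rlang) # for enquo()

# Embrace operator: defuse and inject in one step
my_summarise_embrace <- function(data, var) {
  data |> dplyr::summarise(mean = mean({{ var }}))
}

# Two-step spelling: defuse with enquo(), inject with !!
my_summarise_defuse <- function(data, var) {
  var <- enquo(var)
  data |> dplyr::summarise(mean = mean(!!var))
}

res_embrace <- my_summarise_embrace(mtcars, cyl * am)
res_defuse <- my_summarise_defuse(mtcars, cyl * am)
```

Both calls should return the same one-row data frame, which is why `{{ }}` is usually the simpler choice when an argument is only forwarded.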
### Forwarding `...`
No special syntax needed for dots forwarding:
```r
# Simple dots forwarding
my_group_by <- function(.data, ...) {
  .data |> dplyr::group_by(...)
}

# Works with tidy selections too
my_select <- function(.data, ...) {
  .data |> dplyr::select(...)
}

# When the wrapped function takes its tidy selection in a single
# argument (here `cols`), wrap the dots in c()
my_pivot_longer <- function(.data, ...) {
  .data |> tidyr::pivot_longer(c(...))
}
```
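Because forwarded dots keep their names and expressions intact, callers can pass named, computed grouping variables through a wrapper. The following sketch assumes a hypothetical helper `my_count_by()` that is not part of the original patterns.

```r
# Hypothetical wrapper: forwards `...` to group_by(), then counts rows
my_count_by <- function(.data, ...) {
  .data |>
    dplyr::group_by(...) |>
    dplyr::summarise(n = dplyr::n(), .groups = "drop")
}

# A bare column name
by_cyl <- mtcars |> my_count_by(cyl)

# A bare column plus a named expression; the name `heavy` is preserved
by_expr <- mtcars |> my_count_by(cyl, heavy = wt > 3)
names(by_expr)
```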
### Names Patterns with `.data`
Use `.data` pronoun for programmatic column access:
```r
# Single column by name
my_mean <- function(data, var) {
  data |> dplyr::summarise(mean = mean(.data[[var]]))
}

# Usage - completely insulated from data-masking
mtcars |> my_mean("cyl") # No ambiguity, works like regular function

# Multiple columns with all_of()
my_select_vars <- function(data, vars) {
  data |> dplyr::select(all_of(vars))
}
mtcars |> my_select_vars(c("cyl", "am"))
```
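One design choice worth noting for the names pattern: `all_of()` is strict and errors when a requested name is missing, whereas tidyselect's `any_of()` silently skips missing names. The sketch below contrasts the two; the wrapper names and the `wanted` vector are illustrative assumptions.

```r
# Strict: every name in `vars` must exist in `data`
my_select_strict <- function(data, vars) {
  data |> dplyr::select(dplyr::all_of(vars))
}

# Lenient: names missing from `data` are ignored
my_select_lenient <- function(data, vars) {
  data |> dplyr::select(dplyr::any_of(vars))
}

wanted <- c("cyl", "not_a_column")
lenient <- my_select_lenient(mtcars, wanted)

# try() captures the error raised by the strict variant
strict <- try(my_select_strict(mtcars, wanted), silent = TRUE)
```

The strict form is usually the safer default in package code, since a typo in a column name surfaces immediately.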
## Injection Operators
### When to Use Each Operator
| Operator | Use Case | Example |
|----------|----------|---------|
| `{{ }}` | Forward function arguments | `summarise(mean = mean({{ var }}))` |
| `!!` | Inject single expression/value | `summarise(mean = mean(!!sym(var)))` |
| `!!!` | Inject multiple arguments | `group_by(!!!syms(vars))` |
| `.data[[]]` | Access columns by name | `mean(.data[[var]])` |
### Advanced Injection with `!!`
```r
library(rlang) # provides sym() and data_sym()

# Create symbols from strings
var <- "cyl"
mtcars |> dplyr::summarise(mean = mean(!!sym(var)))

# Inject values to avoid name collisions
df <- data.frame(x = 1:3)
x <- 100
df |> dplyr::mutate(scaled = x / !!x) # Column x divided by the env value 100

# data_sym() builds `.data$cyl`, so the name can only resolve to a column
mtcars |> dplyr::summarise(mean = mean(!!data_sym(var)))
```
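To see what `!!` actually produces, it can help to build the expressions with `rlang::expr()` and inspect them before handing them to dplyr. This is only an inspection sketch; the variable names `e_sym`, `e_data`, and `e_val` are made up for the example.

```r
library(rlang)

var <- "cyl"
x <- 100

# Injecting a symbol: the string becomes a bare column reference
e_sym <- expr(mean(!!sym(var)))

# Injecting a .data-prefixed reference
e_data <- expr(mean(!!data_sym(var)))

# Injecting a value: the env `x` is replaced by its value,
# while the un-injected `x` stays a symbol
e_val <- expr(x / !!x)

print(e_sym)
print(e_data)
print(e_val)
```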
### Splicing with `!!!`
```r
library(rlang) # provides syms() and data_syms()

# Multiple symbols from character vector
vars <- c("cyl", "am")
mtcars |> dplyr::group_by(!!!syms(vars))

# Or use data_syms() to build `.data$cyl` and `.data$am`
mtcars |> dplyr::group_by(!!!data_syms(vars))

# Splice lists of arguments
args <- list(na.rm = TRUE, trim = 0.1)
mtcars |> dplyr::summarise(mean = mean(cyl, !!!args))
```
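Splicing can be inspected the same way as single injection: `expr()` shows the call that results after `!!!` has expanded a list into separate arguments. The names `e_groups`, `e_args`, and `grouped` below are illustrative only.

```r
library(rlang)

vars <- c("cyl", "am")
args <- list(na.rm = TRUE, trim = 0.1)

# Each symbol becomes its own argument
e_groups <- expr(group_by(!!!syms(vars)))

# Named list elements become named arguments
e_args <- expr(mean(cyl, !!!args))

print(e_groups)
print(e_args)

# The spliced symbols end up as ordinary grouping variables
grouped <- mtcars |> dplyr::group_by(!!!syms(vars))
dplyr::group_vars(grouped)
```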
## Dynamic Dots Patterns
### Using `list2()` for Dynamic Dots Support
```r
library(rlang) # provides list2()

my_function <- function(...) {
  # Collect with list2() instead of list() for dynamic features
  dots <- list2(...)
  # Process dots...
}

# Placeholder inputs for the name-injection call below
name <- "c"
value <- 3

# Enables these features:
my_function(a = 1, b = 2) # Normal usage
my_function(!!!list(a = 1, b = 2)) # Splice a list
my_function("{name}" := value) # Name injection
my_function(a = 1, ) # Trailing commas OK
```
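A minimal sketch of the features above combined in one call, assuming a hypothetical collector `collect_args()` that simply returns what `list2()` captured:

```r
library(rlang)

# Hypothetical helper: returns the dots as a plain named list
collect_args <- function(...) {
  list2(...)
}

name <- "total"

# Ordinary argument, spliced list, injected name, and a trailing comma
out <- collect_args(
  a = 1,
  !!!list(b = 2, c = 3),
  "{name}" := 4,
)
str(out)
```

With base `list(...)` the same call would fail, since `!!!`, `:=`, and the trailing comma are only understood by functions that collect their dots dynamically.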
### Name Injection with Glue Syntax
```r
library(rlang) # provides list2() and englue()

# Basic name injection
name <- "result"
list2("{name}" := 1) # Creates list(result = 1)

# In function arguments with {{
my_mean <- function(data, var) {
  data |> dplyr::summarise("mean_{{ var }}" := mean({{ var }}))
}
mtcars |> my_mean(cyl) # Creates column "mean_cyl"
mtcars |> my_mean(cyl * am) # Creates column "mean_cyl * am"

# Allow custom names with englue()
my_mean <- function(data, var, name = englue("mean_{{ var }}")) {
  data |> dplyr::summarise("{name}" := mean({{ var }}))
}

# User can override default
mtcars |> my_mean(cyl, name = "cylinder_mean")
```
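The effect of the `englue()` default is easiest to verify by looking at the resulting column names. The sketch below re-declares the wrapper under a hypothetical name, `my_named_mean()`, so it runs on its own.

```r
library(rlang)

# Same pattern as above: default name derived from the expression,
# overridable by the caller
my_named_mean <- function(data, var, name = englue("mean_{{ var }}")) {
  data |> dplyr::summarise("{name}" := mean({{ var }}))
}

default_name <- names(my_named_mean(mtcars, cyl))
custom_name <- names(my_named_mean(mtcars, cyl, name = "cylinder_mean"))

default_name
custom_name
```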
## Pronouns for Disambiguation
### `.data` and `.env` Best Practices
```r
# Explicit disambiguation prevents masking issues
cyl <- 1000 # Environment variable
mtcars |> dplyr::summarise(
  data_cyl = mean(.data$cyl), # Data frame column
  env_cyl = mean(.env$cyl), # Environment variable
  ambiguous = mean(cyl) # Column wins if it exists; env is only a fallback
)

# Use in loops and programmatic contexts
vars <- c("cyl", "am")
for (var in vars) {
  result <- mtcars |> dplyr::summarise(mean = mean(.data[[var]]))
  print(result)
}
```
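The lookup order can be checked directly: a bare name resolves to the column when one exists and falls back to the calling environment otherwise. In this sketch, `threshold` is a hypothetical environment variable with no matching column in `mtcars`.

```r
cyl <- 1000     # collides with the mtcars column `cyl`
threshold <- 6  # no column of this name, so the env value is used

res <- mtcars |> dplyr::summarise(
  bare = mean(cyl),               # resolves to the column
  env = mean(.env$cyl),           # forced to the environment value
  fallback = sum(cyl > threshold) # column `cyl`, env `threshold`
)
res
```

Writing `.env$threshold` instead of the bare name would protect the code if a `threshold` column were ever added to the data.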
## Programming Patterns
### Bridge Patterns
Converting between data-masking and tidy selection behaviors:
```r
# across() as selection-to-data-mask bridge
# (across() is namespaced because dplyr is not attached in these examples)
my_group_by <- function(data, vars) {
  data |> dplyr::group_by(dplyr::across({{ vars }}))
}

# Works with tidy selection
mtcars |> my_group_by(starts_with("c"))

# across(all_of()) as names-to-data-mask bridge
my_group_by <- function(data, vars) {
  data |> dplyr::group_by(dplyr::across(all_of(vars)))
}
mtcars |> my_group_by(c("cyl", "am"))
```
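As an alternative bridge, recent dplyr releases provide `pick()`, which selects columns with tidy selection inside a data-masking function without applying any function to them. The sketch below assumes dplyr 1.1.0 or later is installed; the wrapper name `my_group_by_pick()` is hypothetical.

```r
# Assumes dplyr >= 1.1.0, where pick() was introduced
my_group_by_pick <- function(data, vars) {
  data |> dplyr::group_by(dplyr::pick({{ vars }}))
}

grouped <- mtcars |> my_group_by_pick(starts_with("c"))
dplyr::group_vars(grouped)
```

On older dplyr versions the `across()` form remains the way to express the same bridge.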
### Transformation Patterns
```r
library(rlang) # provides enquos() and expr()

# Transform single arguments by wrapping
my_mean <- function(data, var) {
  data |> dplyr::summarise(mean = mean({{ var }}, na.rm = TRUE))
}

# Transform dots with across()
my_means <- function(data, ...) {
  data |> dplyr::summarise(dplyr::across(c(...), ~ mean(.x, na.rm = TRUE)))
}

# Manual transformation (advanced)
my_means_manual <- function(.data, ...) {
  # Defuse the dots; .named = TRUE gives each expression a default name
  vars <- enquos(..., .named = TRUE)
  # Wrap each defused expression in a call to mean()
  vars <- purrr::map(vars, ~ expr(mean(!!.x, na.rm = TRUE)))
  # Inject the wrapped expressions back into summarise()
  .data |> dplyr::summarise(!!!vars)
}
```
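For plain column names, the `across()` version and the manual version should produce the same result. The sketch below checks this on `mtcars`; it uses hypothetical function names and swaps `purrr::map()` for base `lapply()` purely to avoid an extra dependency.

```r
library(rlang)

# across()-based transformation of the dots
means_across <- function(data, ...) {
  data |> dplyr::summarise(dplyr::across(c(...), ~ mean(.x, na.rm = TRUE)))
}

# Manual transformation: defuse, wrap each expression, splice back
means_manual <- function(data, ...) {
  vars <- enquos(..., .named = TRUE)
  vars <- lapply(vars, function(q) expr(mean(!!q, na.rm = TRUE)))
  data |> dplyr::summarise(!!!vars)
}

res_across <- means_across(mtcars, cyl, am)
res_manual <- means_manual(mtcars, cyl, am)
```

The manual version is more flexible, since it also accepts arbitrary expressions such as `cyl * am`, whereas `c(...)` inside `across()` is limited to tidy selections.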
## Common Patterns Summary
### When to Use What
**Use `{{}}` when:**

- Forwarding user-provided column references
- Building wrapper functions around dplyr/tidyr
- Supporting both bare names and expressions

**Use `.data[[]]` when:**

- Working with character vector column names
- Iterating over column names programmatically
- Requiring complete insulation from data-masking

**Use `!!` when:**

- Injecting computed expressions
- Converting strings to symbols with `sym()`
- Avoiding variable name collisions

**Use `!!!` when:**

- Injecting multiple arguments from a list
- Working with variable numbers of columns
- Splicing named arguments