Initial commit
skills/r-development/SKILL.md
@@ -0,0 +1,214 @@
|
||||
---
|
||||
name: r-development
|
||||
description: Modern R development practices emphasizing tidyverse patterns (dplyr 1.1 and later, native pipe, join_by, .by grouping), rlang metaprogramming, performance optimization, and package development. Use when Claude needs to write R code, create R packages, optimize R performance, or provide R programming guidance.
|
||||
---
|
||||
|
||||
# R Development
|
||||
|
||||
This skill provides comprehensive guidance for modern R development, emphasizing current best practices with tidyverse, performance optimization, and professional package development.
|
||||
|
||||
## Core Principles
|
||||
|
||||
1. **Use modern tidyverse patterns** - Prioritize dplyr 1.1+ features, native pipe, and current APIs
|
||||
2. **Profile before optimizing** - Use profvis and bench to identify real bottlenecks
|
||||
3. **Write readable code first** - Optimize only when necessary and after profiling
|
||||
4. **Follow tidyverse style guide** - Consistent naming, spacing, and structure
|
||||
|
||||
## Modern Tidyverse Essentials
|
||||
|
||||
### Native Pipe (`|>` not `%>%`)
|
||||
|
||||
Prefer the native pipe `|>` over magrittr's `%>%` (requires R 4.1+):
|
||||
|
||||
```r
|
||||
# Modern
|
||||
data |>
|
||||
filter(year >= 2020) |>
|
||||
summarise(mean_value = mean(value))
|
||||
|
||||
# Avoid legacy pipe
|
||||
data %>% filter(year >= 2020)
|
||||
```
|
||||
|
||||
### Join Syntax (dplyr 1.1+)
|
||||
|
||||
Use `join_by()` for all joins:
|
||||
|
||||
```r
|
||||
# Modern join syntax with equality
|
||||
transactions |>
|
||||
inner_join(companies, by = join_by(company == id))
|
||||
|
||||
# Inequality joins
|
||||
transactions |>
|
||||
inner_join(companies, join_by(company == id, year >= since))
|
||||
|
||||
# Rolling joins (closest match)
|
||||
transactions |>
|
||||
inner_join(companies, join_by(company == id, closest(year >= since)))
|
||||
```
|
||||
|
||||
Control match behavior:
|
||||
|
||||
```r
|
||||
# Expect 1:1 matches
|
||||
inner_join(x, y, by = join_by(id), relationship = "one-to-one")
|
||||
|
||||
# Ensure all rows match
|
||||
inner_join(x, y, by = join_by(id), unmatched = "error")
|
||||
```
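As a worked illustration of the join options above, the sketch below runs a rolling join and a checked equality join on two small tibbles. The data, the `label` column, and the `lookup` table are made up for this example, and the `relationship` argument assumes dplyr 1.1.1 or later.

```r
library(dplyr)

# Hypothetical example data
transactions <- tibble(
  company = c("A", "A", "B"),
  year = c(2019, 2021, 2020)
)
companies <- tibble(
  id = c("A", "A", "B"),
  since = c(2018, 2020, 2019),
  label = c("A-old", "A-new", "B")
)

# Rolling join: for each transaction keep only the most recent `since`
# that is not later than `year`
rolled <- transactions |>
  inner_join(companies, by = join_by(company == id, closest(year >= since)))

# Checked equality join: error if a transaction matches more than one
# lookup row, or if any key on either side has no match
lookup <- distinct(companies, id)
checked <- transactions |>
  inner_join(
    lookup,
    by = join_by(company == id),
    relationship = "many-to-one",
    unmatched = "error"
  )
```

Stating the expected relationship up front turns silent row duplication into an immediate error, which is usually easier to debug than a wrong row count later in the pipeline.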
|
||||
|
||||
### Per-Operation Grouping with `.by`
|
||||
|
||||
Use `.by` instead of `group_by() |> ... |> ungroup()`:
|
||||
|
||||
```r
|
||||
# Modern approach (always returns ungrouped)
|
||||
data |>
|
||||
summarise(mean_value = mean(value), .by = category)
|
||||
|
||||
# Multiple grouping variables
|
||||
data |>
|
||||
summarise(total = sum(revenue), .by = c(company, year))
|
||||
```
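A minimal sketch comparing `.by` with the older `group_by()` workflow, using made-up data. Besides always returning an ungrouped result, `.by` keeps groups in order of first appearance, whereas `group_by()` sorts by the group key.

```r
library(dplyr)

# Hypothetical data; "b" deliberately appears before "a"
sales <- tibble(
  category = c("b", "a", "a"),
  value = c(10, 1, 3)
)

# Per-operation grouping: result is ungrouped, groups in order of appearance
by_arg <- sales |>
  summarise(mean_value = mean(value), .by = category)

# Persistent grouping: result is sorted by the group key
grouped <- sales |>
  group_by(category) |>
  summarise(mean_value = mean(value)) |>
  ungroup()

by_arg$category   # order of first appearance
grouped$category  # sorted
```

If downstream code relies on sorted output, add an explicit `arrange()` when migrating from `group_by()` to `.by`.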
|
||||
|
||||
### Column Operations
|
||||
|
||||
Use modern column selection and transformation functions:
|
||||
|
||||
```r
|
||||
# pick() for column selection in data-masking contexts
|
||||
data |>
|
||||
summarise(
|
||||
n_x_cols = ncol(pick(starts_with("x"))),
|
||||
n_y_cols = ncol(pick(starts_with("y")))
|
||||
)
|
||||
|
||||
# across() for applying functions to multiple columns
|
||||
data |>
|
||||
summarise(across(where(is.numeric), mean, .names = "mean_{.col}"), .by = group)
|
||||
|
||||
# reframe() for multi-row results per group
|
||||
data |>
|
||||
reframe(quantiles = quantile(x, c(0.25, 0.5, 0.75)), .by = group)
|
||||
```
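To make the `reframe()` case concrete, here is a small sketch with made-up data. It returns three rows per group (one per quantile) and carries the probability alongside each value; the names `probs`, `prob`, and `value` are chosen for this example only.

```r
library(dplyr)

# Hypothetical data
data <- tibble(
  group = c("g1", "g1", "g1", "g2", "g2"),
  x = c(1, 2, 3, 10, 20)
)
probs <- c(0.25, 0.5, 0.75)

# reframe() may return any number of rows per group
quartiles <- data |>
  reframe(
    prob = probs,
    value = quantile(x, probs, names = FALSE),
    .by = group
  )
```

`summarise()` expects exactly one row per group, so multi-row results like these belong in `reframe()`.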
|
||||
|
||||
## rlang Metaprogramming
|
||||
|
||||
For comprehensive rlang patterns, see [references/rlang-patterns.md](references/rlang-patterns.md).
|
||||
|
||||
### Quick Reference
|
||||
|
||||
- **`{{}}`** - Forward function arguments to data-masking functions
|
||||
- **`!!`** - Inject single expressions or values
|
||||
- **`!!!`** - Inject multiple arguments from a list
|
||||
- **`.data[[]]`** - Access columns by name (character vectors)
|
||||
- **`pick()`** - Select columns inside data-masking functions
|
||||
|
||||
Example function with embracing:
|
||||
|
||||
```r
|
||||
my_summary <- function(data, group_var, summary_var) {
|
||||
data |>
|
||||
summarise(mean_val = mean({{ summary_var }}), .by = {{ group_var }})
|
||||
}
|
||||
```
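A short usage sketch for the function above, together with a string-based variant. `my_summary_chr()` is a hypothetical helper added for illustration; it uses `.data[[ ]]` for the summarised column and `all_of()` for the grouping column, since `.by` takes a tidy-select specification.

```r
library(dplyr)

# Embracing: callers pass bare column names
my_summary <- function(data, group_var, summary_var) {
  data |>
    summarise(mean_val = mean({{ summary_var }}), .by = {{ group_var }})
}

# Hypothetical variant: callers pass column names as strings
my_summary_chr <- function(data, group_var, summary_var) {
  data |>
    summarise(mean_val = mean(.data[[summary_var]]), .by = all_of(group_var))
}

bare <- my_summary(mtcars, cyl, mpg)
quoted <- my_summary_chr(mtcars, "cyl", "mpg")
```

Both calls should produce the same table; which interface to expose depends on whether callers typically type column names or hold them in variables.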
|
||||
|
||||
## Performance Optimization
|
||||
|
||||
For detailed performance guidance, see [references/performance.md](references/performance.md).
|
||||
|
||||
### Key Strategies
|
||||
|
||||
1. **Profile first**: Use `profvis::profvis()` and `bench::mark()`
|
||||
2. **Vectorize operations**: Avoid loops when vectorized alternatives exist
|
||||
3. **Use dtplyr**: For large data operations (lazy evaluation with data.table backend)
|
||||
4. **Parallel processing**: Use `furrr::future_map()` for parallelizable work
|
||||
5. **Memory efficiency**: Pre-allocate, use appropriate data types
|
||||
|
||||
Quick example:
|
||||
|
||||
```r
|
||||
# Profile code
|
||||
profvis::profvis({
|
||||
result <- data |>
|
||||
complex_operation() |>
|
||||
another_operation()
|
||||
})
|
||||
|
||||
# Benchmark alternatives
|
||||
bench::mark(
|
||||
approach_1 = method1(data),
|
||||
approach_2 = method2(data),
|
||||
check = FALSE
|
||||
)
|
||||
```
|
||||
|
||||
## Package Development
|
||||
|
||||
For complete package development guidance, see [references/package-development.md](references/package-development.md).
|
||||
|
||||
### Quick Guidelines
|
||||
|
||||
**API Design:**
|
||||
- Use `.by` parameter for per-operation grouping
|
||||
- Use `{{}}` for column arguments
|
||||
- Return tibbles consistently
|
||||
- Validate user-facing function inputs thoroughly
|
||||
|
||||
**Dependencies:**
|
||||
- Add dependencies for significant functionality gains
|
||||
- Core tidyverse packages usually worth including: dplyr, purrr, stringr, tidyr
|
||||
- Minimize dependencies for widely-used packages
|
||||
|
||||
**Testing:**
|
||||
- Unit tests for individual functions
|
||||
- Integration tests for workflows
|
||||
- Test edge cases and error conditions
|
||||
|
||||
**Documentation:**
|
||||
- Document all exported functions
|
||||
- Provide usage examples
|
||||
- Explain non-obvious parameter interactions
|
||||
|
||||
## Common Migration Patterns
|
||||
|
||||
### Base R → Tidyverse
|
||||
|
||||
```r
|
||||
# Data manipulation
|
||||
subset(data, condition) → filter(data, condition)
|
||||
data[order(data$x), ] → arrange(data, x)
|
||||
aggregate(x ~ y, data, mean) → summarise(data, mean(x), .by = y)
|
||||
|
||||
# Functional programming
|
||||
sapply(x, f) → map_dbl(x, f) # type-stable; pick the map_*() variant matching the output type
|
||||
lapply(x, f) → map(x, f)
|
||||
|
||||
# Strings
|
||||
grepl("pattern", text) → str_detect(text, "pattern")
|
||||
gsub("old", "new", text) → str_replace_all(text, "old", "new")
|
||||
```
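The mappings above are shorthand rather than runnable code. As a sanity check, this sketch runs two of them on the built-in `mtcars` data and confirms that base R and tidyverse versions agree; the object names are chosen for this example.

```r
library(dplyr)
library(purrr)

# aggregate() vs summarise(.by = )
base_means <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
tidy_means <- mtcars |>
  summarise(mpg = mean(mpg), .by = cyl) |>
  arrange(cyl)  # aggregate() sorts by group; .by does not

# sapply() vs a type-stable map_*() variant
base_squares <- sapply(1:3, function(i) i^2)
tidy_squares <- map_dbl(1:3, \(i) i^2)
```

Note the explicit `arrange()`: when migrating, ordering differences are a more common source of mismatches than the computed values themselves.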
|
||||
|
||||
### Old → New Tidyverse
|
||||
|
||||
```r
|
||||
# Pipes
|
||||
%>% → |>
|
||||
|
||||
# Grouping
|
||||
group_by() |> ... |> ungroup() → summarise(..., .by = x)
|
||||
|
||||
# Joins
|
||||
by = c("a" = "b") → by = join_by(a == b)
|
||||
|
||||
# Reshaping
|
||||
gather()/spread() → pivot_longer()/pivot_wider()
|
||||
```
|
||||
|
||||
## Additional Resources
|
||||
|
||||
- **rlang patterns**: See [references/rlang-patterns.md](references/rlang-patterns.md) for comprehensive data-masking and metaprogramming guidance
|
||||
- **Performance optimization**: See [references/performance.md](references/performance.md) for profiling, benchmarking, and optimization strategies
|
||||
- **Package development**: See [references/package-development.md](references/package-development.md) for complete package creation guidance
|
||||
- **Object systems**: See [references/object-systems.md](references/object-systems.md) for S3, S4, S7, R6, and vctrs guidance
|
||||
skills/r-development/references/object-systems.md
@@ -0,0 +1,310 @@
|
||||
# Object-Oriented Programming in R
|
||||
|
||||
## S7: Modern OOP for New Projects
|
||||
|
||||
S7 combines S3 simplicity with S4 structure:
|
||||
- Formal class definitions with automatic validation
|
||||
- Compatible with existing S3 code
|
||||
- Better error messages and discoverability
|
||||
|
||||
```r
|
||||
# S7 class definition (assumes the S7 package is installed)
library(S7)
|
||||
Range <- new_class("Range",
|
||||
properties = list(
|
||||
start = class_double,
|
||||
end = class_double
|
||||
),
|
||||
validator = function(self) {
|
||||
if (self@end < self@start) {
|
||||
"@end must be >= @start"
|
||||
}
|
||||
}
|
||||
)
|
||||
|
||||
# Usage - constructor and property access
|
||||
x <- Range(start = 1, end = 10)
|
||||
x@start # 1
|
||||
x@end <- 20 # automatic validation
|
||||
|
||||
# Methods
|
||||
inside <- new_generic("inside", "x")
|
||||
method(inside, Range) <- function(x, y) {
|
||||
y >= x@start & y <= x@end
|
||||
}
|
||||
```
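To show the validator and the method in action, here is a self-contained sketch based on the class above. The extra length check in the validator and the `hits` and `bad` names are additions for this example, not part of the original definition.

```r
library(S7)

Range <- new_class("Range",
  properties = list(start = class_double, end = class_double),
  validator = function(self) {
    if (length(self@start) != 1 || length(self@end) != 1) {
      "@start and @end must each be length 1"
    } else if (self@end < self@start) {
      "@end must be >= @start"
    }
  }
)

inside <- new_generic("inside", "x")
method(inside, Range) <- function(x, y) {
  y >= x@start & y <= x@end
}

x <- Range(start = 1, end = 10)
hits <- inside(x, c(0, 5, 11))

# Assigning an invalid value re-runs the validator and raises an error
bad <- tryCatch(
  {
    x@end <- 0
    "no error"
  },
  error = function(e) "validation error"
)
```

Because validation runs on every property assignment, an object that exists can be assumed valid, which removes defensive checks from the methods.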
|
||||
|
||||
## OOP System Decision Matrix
|
||||
|
||||
### Decision Tree: What Are You Building?
|
||||
|
||||
#### 1. Vector-like Objects
|
||||
|
||||
**Use vctrs when:**
|
||||
- ✓ Need data frame integration (columns/rows)
|
||||
- ✓ Want type-stable vector operations
|
||||
- ✓ Building factor-like, date-like, or numeric-like classes
|
||||
- ✓ Need consistent coercion/casting behavior
|
||||
- ✓ Working with existing tidyverse infrastructure
|
||||
|
||||
**Examples:** custom date classes, units, categorical data
|
||||
|
||||
```r
|
||||
# Vector-like behavior in data frames
|
||||
percent <- function(x = double()) new_vctr(x, class = "percentage")
data.frame(x = 1:3, pct = percent(c(0.1, 0.2, 0.3)))  # works seamlessly

# Type-stable operations
vec_c(percent(0.1), percent(0.2))  # predictable behavior
vec_cast(0.5, percent())           # explicit, safe casting (requires vec_cast methods like those under Coercion Methods)
|
||||
```
|
||||
|
||||
#### 2. General Objects (Complex Data Structures)
|
||||
|
||||
**Use S7 when:**
|
||||
- ✓ NEW projects that need formal classes
|
||||
- ✓ Want property validation and safe property access (@)
|
||||
- ✓ Need multiple dispatch (beyond S3's double dispatch)
|
||||
- ✓ Converting from S3 and want better structure
|
||||
- ✓ Building class hierarchies with inheritance
|
||||
- ✓ Want better error messages and discoverability
|
||||
|
||||
```r
|
||||
# Complex validation needs
|
||||
Range <- new_class("Range",
|
||||
properties = list(start = class_double, end = class_double),
|
||||
validator = function(self) {
|
||||
if (self@end < self@start) "@end must be >= @start"
|
||||
}
|
||||
)
|
||||
|
||||
# Multiple dispatch needs
|
||||
method(generic, list(ClassA, ClassB)) <- function(x, y) ...
|
||||
|
||||
# Class hierarchies with clear inheritance
|
||||
Child <- new_class("Child", parent = Parent)
|
||||
```
|
||||
|
||||
**Use S3 when:**
|
||||
- ✓ Simple classes with minimal structure needs
|
||||
- ✓ Maximum compatibility and minimal dependencies
|
||||
- ✓ Quick prototyping or internal classes
|
||||
- ✓ Contributing to existing S3-based ecosystems
|
||||
- ✓ Performance is absolutely critical (minimal overhead)
|
||||
|
||||
```r
|
||||
# Simple classes without complex needs
|
||||
new_simple <- function(x) structure(x, class = "simple")
|
||||
print.simple <- function(x, ...) cat("Simple:", x)
|
||||
```
|
||||
|
||||
**Use S4 when:**
|
||||
- ✓ Working in Bioconductor ecosystem
|
||||
- ✓ Need complex multiple inheritance (S7 doesn't support this)
|
||||
- ✓ Existing S4 codebase that works well
|
||||
|
||||
**Use R6 when:**
|
||||
- ✓ Need reference semantics (mutable objects)
|
||||
- ✓ Building stateful objects
|
||||
- ✓ Coming from OOP languages like Python/Java
|
||||
- ✓ Need encapsulation and private methods
|
||||
|
||||
## Detailed S7 vs S3 Comparison
|
||||
|
||||
| Feature | S3 | S7 | When S7 wins |
|
||||
|---------|----|----|---------------|
|
||||
| **Class definition** | Informal (convention) | Formal (`new_class()`) | Need guaranteed structure |
|
||||
| **Property access** | `$` or `attr()` (unsafe) | `@` (safe, validated) | Property validation matters |
|
||||
| **Validation** | Manual, inconsistent | Built-in validators | Data integrity important |
|
||||
| **Method discovery** | Hard to find methods | Clear method printing | Developer experience matters |
|
||||
| **Multiple dispatch** | Limited (base generics) | Full multiple dispatch | Complex method dispatch needed |
|
||||
| **Inheritance** | Informal, `NextMethod()` | Explicit `super()` | Predictable inheritance needed |
|
||||
| **Migration cost** | - | Low (1-2 hours) | Want better structure |
|
||||
| **Performance** | Fastest | ~Same as S3 | Performance difference negligible |
|
||||
| **Compatibility** | Full S3 | Full S3 + S7 | Need both old and new patterns |
|
||||
|
||||
## vctrs for Vector Classes
|
||||
|
||||
### Basic Vector Class
|
||||
|
||||
```r
|
||||
# Constructor (low-level)
|
||||
new_percent <- function(x = double()) {
|
||||
vec_assert(x, double())
|
||||
new_vctr(x, class = "pkg_percent")
|
||||
}
|
||||
|
||||
# Helper (user-facing)
|
||||
percent <- function(x = double()) {
|
||||
x <- vec_cast(x, double())
|
||||
new_percent(x)
|
||||
}
|
||||
|
||||
# Format method
|
||||
format.pkg_percent <- function(x, ...) {
|
||||
paste0(vec_data(x) * 100, "%")
|
||||
}
|
||||
```
|
||||
|
||||
### Coercion Methods
|
||||
|
||||
```r
|
||||
# Self-coercion
|
||||
vec_ptype2.pkg_percent.pkg_percent <- function(x, y, ...) {
|
||||
new_percent()
|
||||
}
|
||||
|
||||
# With double
|
||||
vec_ptype2.pkg_percent.double <- function(x, y, ...) double()
|
||||
vec_ptype2.double.pkg_percent <- function(x, y, ...) double()
|
||||
|
||||
# Casting
|
||||
vec_cast.pkg_percent.double <- function(x, to, ...) {
|
||||
new_percent(x)
|
||||
}
|
||||
vec_cast.double.pkg_percent <- function(x, to, ...) {
|
||||
vec_data(x)
|
||||
}
|
||||
```
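Putting the constructor, format method, and coercion methods together, the following sketch shows how the pieces interact. The self-cast method `vec_cast.pkg_percent.pkg_percent` is an addition for this example, and the methods are defined at top level (in a package they would be registered through roxygen2 instead).

```r
library(vctrs)

new_percent <- function(x = double()) {
  vec_assert(x, double())
  new_vctr(x, class = "pkg_percent")
}
percent <- function(x = double()) new_percent(vec_cast(x, double()))

format.pkg_percent <- function(x, ...) paste0(vec_data(x) * 100, "%")

# Common type: percent with percent stays percent, percent with double is double
vec_ptype2.pkg_percent.pkg_percent <- function(x, y, ...) new_percent()
vec_ptype2.pkg_percent.double <- function(x, y, ...) double()
vec_ptype2.double.pkg_percent <- function(x, y, ...) double()

# Casts between the types (self-cast added for this example)
vec_cast.pkg_percent.pkg_percent <- function(x, to, ...) x
vec_cast.pkg_percent.double <- function(x, to, ...) new_percent(x)
vec_cast.double.pkg_percent <- function(x, to, ...) vec_data(x)

p <- vec_c(percent(0.25), percent(0.5))
labels <- format(p)
as_double <- vec_cast(p, double())
mixed <- vec_c(percent(0.25), 1)  # falls back to plain double
```

Choosing `double()` as the common type for percent and double is a design decision: it avoids silently reinterpreting `1` as 100%.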
|
||||
|
||||
## S3 Basics
|
||||
|
||||
### Creating S3 Classes
|
||||
|
||||
```r
|
||||
# Constructor
|
||||
new_myclass <- function(x, y) {
|
||||
structure(
|
||||
list(x = x, y = y),
|
||||
class = "myclass"
|
||||
)
|
||||
}
|
||||
|
||||
# Methods
|
||||
print.myclass <- function(x, ...) {
|
||||
cat("myclass object\n")
|
||||
cat("x:", x$x, "\n")
|
||||
cat("y:", x$y, "\n")
|
||||
}
|
||||
|
||||
summary.myclass <- function(object, ...) {
|
||||
list(x = object$x, y = object$y)
|
||||
}
|
||||
```
|
||||
|
||||
### Generic Functions
|
||||
|
||||
```r
|
||||
# Create generic
|
||||
my_generic <- function(x, ...) {
|
||||
UseMethod("my_generic")
|
||||
}
|
||||
|
||||
# Default method
|
||||
my_generic.default <- function(x, ...) {
|
||||
stop("No method for class ", class(x)[[1]], call. = FALSE)
|
||||
}
|
||||
|
||||
# Specific method
|
||||
my_generic.myclass <- function(x, ...) {
|
||||
# Implementation
|
||||
}
|
||||
```
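A minimal runnable sketch of S3 dispatch, using a hypothetical `area()` generic and `square` class invented for this example. It shows a class-specific method being selected and the default method catching everything else.

```r
# Hypothetical generic and class, for illustration only
area <- function(shape, ...) {
  UseMethod("area")
}

# Fallback for classes without a dedicated method
area.default <- function(shape, ...) {
  stop("No area() method for class ", class(shape)[[1]], call. = FALSE)
}

area.square <- function(shape, ...) {
  shape$side^2
}

new_square <- function(side) {
  structure(list(side = side), class = "square")
}

square_area <- area(new_square(3))

# A character vector has no area() method, so the default is used
msg <- tryCatch(area("circle"), error = function(e) conditionMessage(e))
```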
|
||||
|
||||
## R6 Classes
|
||||
|
||||
### Basic R6 Class
|
||||
|
||||
```r
|
||||
library(R6)
|
||||
|
||||
MyClass <- R6Class("MyClass",
|
||||
public = list(
|
||||
x = NULL,
|
||||
y = NULL,
|
||||
|
||||
initialize = function(x, y) {
|
||||
self$x <- x
|
||||
self$y <- y
|
||||
},
|
||||
|
||||
add = function() {
|
||||
self$x + self$y
|
||||
}
|
||||
),
|
||||
|
||||
private = list(
|
||||
internal_value = NULL
|
||||
)
|
||||
)
|
||||
|
||||
# Usage
|
||||
obj <- MyClass$new(1, 2)
|
||||
obj$add() # 3
|
||||
```
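The main practical difference between R6 and the other systems is reference semantics. The sketch below uses a hypothetical `Counter` class (not part of the original text) to show that assignment shares one object while `$clone()` makes an independent copy.

```r
library(R6)

# Hypothetical class for illustration
Counter <- R6Class("Counter",
  public = list(
    count = 0,
    increment = function() {
      self$count <- self$count + 1
      invisible(self)
    }
  )
)

a <- Counter$new()
b <- a          # b and a refer to the same object
b$increment()   # a$count changes as well

copy <- a$clone()  # independent copy
copy$increment()   # a$count is unaffected
```

This is the behavior to weigh when choosing R6: mutation is visible through every reference, unlike the copy-on-modify behavior of S3, S4, and S7 objects.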
|
||||
|
||||
## Migration Strategy
|
||||
|
||||
### S3 → S7
|
||||
|
||||
Usually 1-2 hours of work; keeps full S3 compatibility:
|
||||
|
||||
```r
|
||||
# S3 version
|
||||
new_range <- function(start, end) {
|
||||
structure(
|
||||
list(start = start, end = end),
|
||||
class = "range"
|
||||
)
|
||||
}
|
||||
|
||||
# S7 version
|
||||
Range <- new_class("Range",
|
||||
properties = list(
|
||||
start = class_double,
|
||||
end = class_double
|
||||
)
|
||||
)
|
||||
```
|
||||
|
||||
### S4 → S7
|
||||
|
||||
More complex, evaluate if S4 features are actually needed.
|
||||
|
||||
### Base R → vctrs
|
||||
|
||||
For vector-like classes, significant benefits in type stability and data frame integration.
|
||||
|
||||
### Combining Approaches
|
||||
|
||||
S7 classes can use vctrs principles internally for vector-like properties.
|
||||
|
||||
## When to Use Each System
|
||||
|
||||
### Use S7 for:
|
||||
- New projects needing formal OOP
|
||||
- Class validation and type safety
|
||||
- Multiple dispatch
|
||||
- Better developer experience
|
||||
|
||||
### Use vctrs for:
|
||||
- Vector-like classes
|
||||
- Data frame columns
|
||||
- Type-stable operations
|
||||
- Tidyverse integration
|
||||
|
||||
### Use S3 for:
|
||||
- Simple classes
|
||||
- Maximum compatibility
|
||||
- Existing S3 ecosystems
|
||||
- Quick prototypes
|
||||
|
||||
### Use S4 for:
|
||||
- Bioconductor packages
|
||||
- Complex multiple inheritance
|
||||
- Existing S4 codebases
|
||||
|
||||
### Use R6 for:
|
||||
- Mutable state
|
||||
- Reference semantics
|
||||
- Encapsulation needs
|
||||
- Coming from OOP languages
|
||||
skills/r-development/references/package-development.md
@@ -0,0 +1,393 @@
|
||||
# Package Development
|
||||
|
||||
## Dependency Strategy
|
||||
|
||||
### When to Add Dependencies vs Base R
|
||||
|
||||
```r
|
||||
# Add dependency when:
|
||||
✓ Significant functionality gain
|
||||
✓ Maintenance burden reduction
|
||||
✓ User experience improvement
|
||||
✓ Complex implementation (regex, dates, web)
|
||||
|
||||
# Use base R when:
|
||||
✓ Simple utility functions
|
||||
✓ Package will be widely used (minimize deps)
|
||||
✓ Dependency is large for small benefit
|
||||
✓ Base R solution is straightforward
|
||||
|
||||
# Example decisions:
|
||||
str_detect(x, "pattern") # Worth stringr dependency
|
||||
length(x) > 0 # Don't need purrr for this
|
||||
ymd(x) # Worth lubridate dependency
|
||||
x + 1 # Don't need dplyr for this
|
||||
```
|
||||
|
||||
### Tidyverse Dependency Guidelines
|
||||
|
||||
```r
|
||||
# Core tidyverse (usually worth it):
|
||||
dplyr # Complex data manipulation
|
||||
purrr # Functional programming, parallel
|
||||
stringr # String manipulation
|
||||
tidyr # Data reshaping
|
||||
|
||||
# Specialized tidyverse (evaluate carefully):
|
||||
lubridate # If heavy date manipulation
|
||||
forcats # If many categorical operations
|
||||
readr # If specific file reading needs
|
||||
ggplot2 # If package creates visualizations
|
||||
|
||||
# Heavy dependencies (use sparingly):
|
||||
tidyverse # Meta-package, very heavy
|
||||
shiny # Only for interactive apps
|
||||
```
|
||||
|
||||
## API Design Patterns
|
||||
|
||||
### Function Design Strategy
|
||||
|
||||
```r
|
||||
# Modern tidyverse API patterns
|
||||
|
||||
# 1. Use .by for per-operation grouping
|
||||
my_summarise <- function(.data, ..., .by = NULL) {
|
||||
# Support modern grouped operations
|
||||
}
|
||||
|
||||
# 2. Use {{ }} for user-provided columns
|
||||
my_select <- function(.data, cols) {
|
||||
.data |> select({{ cols }})
|
||||
}
|
||||
|
||||
# 3. Use ... for flexible arguments
|
||||
my_mutate <- function(.data, ..., .by = NULL) {
|
||||
.data |> mutate(..., .by = {{ .by }})
|
||||
}
|
||||
|
||||
# 4. Return consistent types (tibbles, not data.frames)
|
||||
my_function <- function(.data) {
  result <- .data  # ... transform .data here
  tibble::as_tibble(result)
|
||||
}
|
||||
```
|
||||
|
||||
### Input Validation Strategy
|
||||
|
||||
```r
|
||||
# Validation level by function type:
|
||||
|
||||
# User-facing functions - comprehensive validation
|
||||
user_function <- function(x, threshold = 0.5) {
|
||||
# Check all inputs thoroughly
|
||||
if (!is.numeric(x)) stop("x must be numeric")
|
||||
if (!is.numeric(threshold) || length(threshold) != 1) {
|
||||
stop("threshold must be a single number")
|
||||
}
|
||||
# ... function body
|
||||
}
|
||||
|
||||
# Internal functions - minimal validation
|
||||
.internal_function <- function(x, threshold) {
|
||||
# Assume inputs are valid (document assumptions)
|
||||
# Only check critical invariants
|
||||
# ... function body
|
||||
}
|
||||
|
||||
# Package functions with vctrs - type-stable validation
|
||||
safe_function <- function(x, y) {
|
||||
x <- vec_cast(x, double())
|
||||
y <- vec_cast(y, double())
|
||||
# Automatic type checking and coercion
|
||||
}
|
||||
```
|
||||
|
||||
## Error Handling Patterns
|
||||
|
||||
```r
|
||||
# Good error messages - specific and actionable
|
||||
if (length(x) == 0) {
|
||||
  # Pass the message and its bullets as one character vector
  cli::cli_abort(c(
    "Input {.arg x} cannot be empty.",
    "i" = "Provide a non-empty vector."
  ))
|
||||
}
|
||||
|
||||
# Include function name in errors
|
||||
validate_input <- function(x, call = rlang::caller_env()) {
|
||||
if (!is.numeric(x)) {
|
||||
cli::cli_abort("Input must be numeric", call = call)
|
||||
}
|
||||
}
|
||||
|
||||
# Use consistent error styling
|
||||
# cli package for user-friendly messages
|
||||
# rlang for developer tools
|
||||
```
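The two patterns above can be combined into one reusable check. This is a sketch under assumptions: `check_non_empty()` and `my_mean()` are hypothetical helpers, and `rlang::caller_arg()` is used so the message names the argument as the user wrote it.

```r
# Hypothetical validation helper; reports the error from the caller's frame
check_non_empty <- function(x,
                            arg = rlang::caller_arg(x),
                            call = rlang::caller_env()) {
  if (length(x) == 0) {
    cli::cli_abort(
      c(
        "{.arg {arg}} cannot be empty.",
        "i" = "Provide a non-empty vector."
      ),
      call = call
    )
  }
  invisible(x)
}

# Hypothetical user-facing function
my_mean <- function(values) {
  check_non_empty(values)
  mean(values)
}

ok <- my_mean(1:3)
err <- tryCatch(my_mean(numeric()), error = function(e) e)
```

Passing `call` through means the error is attributed to `my_mean()` rather than to the internal helper, which is what users need to see.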
|
||||
|
||||
## When to Create Internal vs Exported Functions
|
||||
|
||||
### Export Function When:
|
||||
|
||||
```r
|
||||
✓ Users will call it directly
|
||||
✓ Other packages might want to extend it
|
||||
✓ Part of the core package functionality
|
||||
✓ Stable API that won't change often
|
||||
|
||||
# Example: main data processing functions
|
||||
export_these <- function(.data, ...) {
|
||||
# Comprehensive input validation
|
||||
# Full documentation required
|
||||
# Stable API contract
|
||||
}
|
||||
```
|
||||
|
||||
### Keep Function Internal When:
|
||||
|
||||
```r
|
||||
✓ Implementation detail that may change
|
||||
✓ Only used within package
|
||||
✓ Complex implementation helpers
|
||||
✓ Would clutter user-facing API
|
||||
|
||||
# Example: helper functions
|
||||
.internal_helper <- function(x, y) {
|
||||
# Minimal documentation
|
||||
# Can change without breaking users
|
||||
# Assume inputs are pre-validated
|
||||
}
|
||||
```
|
||||
|
||||
## Testing and Documentation Strategy
|
||||
|
||||
### Testing Levels
|
||||
|
||||
```r
|
||||
# Unit tests - individual functions
|
||||
test_that("function handles edge cases", {
|
||||
expect_equal(my_func(c()), expected_empty_result)
|
||||
expect_error(my_func(NULL), class = "my_error_class")
|
||||
})
|
||||
|
||||
# Integration tests - workflow combinations
|
||||
test_that("pipeline works end-to-end", {
|
||||
result <- data |>
|
||||
step1() |>
|
||||
step2() |>
|
||||
step3()
|
||||
expect_s3_class(result, "expected_class")
|
||||
})
|
||||
|
||||
# Property-based tests for package functions
|
||||
test_that("function properties hold", {
|
||||
# Test invariants across many inputs
|
||||
})
|
||||
```
|
||||
|
||||
### Testing rlang Functions
|
||||
|
||||
```r
|
||||
# Test data-masking behavior
|
||||
test_that("function supports data masking", {
|
||||
result <- my_function(mtcars, cyl)
|
||||
expect_equal(names(result), "mean_cyl")
|
||||
|
||||
# Test with expressions
|
||||
result2 <- my_function(mtcars, cyl * 2)
|
||||
expect_true("mean_cyl * 2" %in% names(result2))
|
||||
})
|
||||
|
||||
# Test injection behavior
|
||||
test_that("function supports injection", {
|
||||
var <- "cyl"
|
||||
result <- my_function(mtcars, !!sym(var))
|
||||
expect_true(nrow(result) > 0)
|
||||
})
|
||||
```
|
||||
|
||||
### Documentation Priorities
|
||||
|
||||
```r
|
||||
# Must document:
|
||||
✓ All exported functions
|
||||
✓ Complex algorithms or formulas
|
||||
✓ Non-obvious parameter interactions
|
||||
✓ Examples of typical usage
|
||||
|
||||
# Can skip documentation:
|
||||
✗ Simple internal helpers
|
||||
✗ Obvious parameter meanings
|
||||
✗ Functions that just call other functions
|
||||
```
|
||||
|
||||
### Documentation Tags for rlang
|
||||
|
||||
```r
|
||||
#' @param var <[`data-masking`][rlang::args_data_masking]> Column to summarize
|
||||
#' @param ... <[`dynamic-dots`][rlang::dyn-dots]> Additional grouping variables
|
||||
#' @param cols <[`tidy-select`][dplyr::dplyr_tidy_select]> Columns to select
|
||||
```
|
||||
|
||||
## Package Structure
|
||||
|
||||
### DESCRIPTION File
|
||||
|
||||
```r
|
||||
Package: mypackage
|
||||
Title: What the Package Does (One Line, Title Case)
|
||||
Version: 0.1.0
|
||||
Authors@R: person("First", "Last", email = "email@example.com", role = c("aut", "cre"))
|
||||
Description: What the package does (one paragraph).
|
||||
License: MIT + file LICENSE
|
||||
Encoding: UTF-8
|
||||
Roxygen: list(markdown = TRUE)
|
||||
RoxygenNote: 7.2.3
|
||||
Imports:
|
||||
dplyr (>= 1.1.0),
|
||||
rlang (>= 1.1.0),
|
||||
cli
|
||||
Suggests:
|
||||
testthat (>= 3.0.0)
|
||||
Config/testthat/edition: 3
|
||||
```
|
||||
|
||||
### NAMESPACE Management
|
||||
|
||||
Use roxygen2 for NAMESPACE management:
|
||||
|
||||
```r
|
||||
# Import specific functions
|
||||
#' @importFrom rlang := enquo enquos
|
||||
#' @importFrom dplyr mutate filter
|
||||
|
||||
# Or import entire packages (use sparingly)
|
||||
#' @import dplyr
|
||||
```
|
||||
|
||||
### rlang Import Strategy
|
||||
|
||||
```r
|
||||
# In DESCRIPTION:
|
||||
Imports: rlang
|
||||
|
||||
# In NAMESPACE (generated by roxygen2; `!!` and `!!!` are syntax and need no import):
importFrom(rlang, enquo, enquos, expr, ":=")
|
||||
|
||||
# Or import key functions:
|
||||
#' @importFrom rlang := enquo enquos
|
||||
```
|
||||
|
||||
## Naming Conventions
|
||||
|
||||
```r
|
||||
# Good naming: snake_case for variables/functions
|
||||
calculate_mean_score <- function(data, score_col) {
|
||||
# Function body
|
||||
}
|
||||
|
||||
# Prefix non-standard arguments with .
|
||||
my_function <- function(.data, ...) {
|
||||
# Reduces argument conflicts
|
||||
}
|
||||
|
||||
# Internal functions start with .
|
||||
.internal_helper <- function(x, y) {
|
||||
# Not exported
|
||||
}
|
||||
```
|
||||
|
||||
## Style Guide Essentials
|
||||
|
||||
### Object Names
|
||||
|
||||
- Use snake_case for all names
|
||||
- Variable names = nouns, function names = verbs
|
||||
- Avoid dots except for S3 methods
|
||||
|
||||
```r
|
||||
# Good
|
||||
day_one
|
||||
calculate_mean
|
||||
user_data
|
||||
|
||||
# Avoid
|
||||
DayOne
|
||||
calculate.mean
|
||||
userData
|
||||
```
|
||||
|
||||
### Spacing and Layout
|
||||
|
||||
```r
|
||||
# Good spacing
|
||||
x[, 1]
|
||||
mean(x, na.rm = TRUE)
|
||||
if (condition) {
|
||||
action()
|
||||
}
|
||||
|
||||
# Pipe formatting
|
||||
data |>
|
||||
filter(year >= 2020) |>
|
||||
group_by(category) |>
|
||||
summarise(
|
||||
mean_value = mean(value),
|
||||
count = n()
|
||||
)
|
||||
```
|
||||
|
||||
## Package Development Workflow
|
||||
|
||||
1. **Setup**: Use `usethis::create_package()`
|
||||
2. **Add functions**: Place in `R/` directory
|
||||
3. **Document**: Use roxygen2 comments
|
||||
4. **Test**: Write tests in `tests/testthat/`
|
||||
5. **Check**: Run `devtools::check()`
|
||||
6. **Build**: Use `devtools::build()`
|
||||
7. **Install**: Use `devtools::install()`
|
||||
|
||||
### Key usethis Functions
|
||||
|
||||
```r
|
||||
# Initial setup
|
||||
usethis::create_package("mypackage")
|
||||
usethis::use_git()
|
||||
usethis::use_mit_license()
|
||||
|
||||
# Add dependencies
|
||||
usethis::use_package("dplyr")
|
||||
usethis::use_package("testthat", "Suggests")
|
||||
|
||||
# Add infrastructure
|
||||
usethis::use_readme_md()
|
||||
usethis::use_news_md()
|
||||
usethis::use_testthat()
|
||||
|
||||
# Add files
|
||||
usethis::use_r("my_function")
|
||||
usethis::use_test("my_function")
|
||||
usethis::use_vignette("introduction")
|
||||
```
|
||||
|
||||
## Common Pitfalls
|
||||
|
||||
### What to Avoid
|
||||
|
||||
```r
|
||||
# Don't use library() in packages
|
||||
# Use Imports in DESCRIPTION instead
|
||||
|
||||
# Don't use source()
|
||||
# Use proper function dependencies
|
||||
|
||||
# Don't use attach()
|
||||
# Always use explicit :: notation
|
||||
|
||||
# Don't modify global options without restoring
|
||||
# (inside a function, so on.exit() restores the option on return)
old <- options(digits = 4)
on.exit(options(old), add = TRUE)
|
||||
|
||||
# Don't use setwd()
|
||||
# Use here::here() or relative paths
|
||||
```
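The option-restoring pattern only works inside a function, because `on.exit()` runs when the enclosing function returns. A minimal base R sketch, with `format_short()` as a hypothetical example function:

```r
# Hypothetical function that temporarily changes a global option
format_short <- function(x) {
  old <- options(digits = 3)         # options() returns the previous values
  on.exit(options(old), add = TRUE)  # restore even if an error occurs
  format(x)
}

before <- getOption("digits")
short <- format_short(pi)
after <- getOption("digits")
```

After the call, the caller's `digits` setting is unchanged, so the function leaves no trace in the user's session.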
|
||||
skills/r-development/references/performance.md
@@ -0,0 +1,311 @@
|
||||
# Performance Optimization
|
||||
|
||||
## Performance Tool Selection Guide
|
||||
|
||||
### Profiling Tools Decision Matrix
|
||||
|
||||
| Tool | Use When | Don't Use When | What It Shows |
|
||||
|------|----------|----------------|---------------|
|
||||
| **`profvis`** | Complex code, unknown bottlenecks | Simple functions, known issues | Time per line, call stack |
|
||||
| **`bench::mark()`** | Comparing alternatives | Single approach | Relative performance, memory |
|
||||
| **`system.time()`** | Quick checks | Detailed analysis | Total runtime only |
|
||||
| **`Rprof()`** | Base R only environments | When profvis available | Raw profiling data |
|
||||
|
||||
### Step-by-Step Performance Workflow
|
||||
|
||||
```r
|
||||
# 1. Profile first - find the actual bottlenecks
|
||||
library(profvis)
|
||||
profvis({
|
||||
# Your slow code here
|
||||
})
|
||||
|
||||
# 2. Focus on the slowest parts (80/20 rule)
|
||||
# Don't optimize until you know where time is spent
|
||||
|
||||
# 3. Benchmark alternatives for hot spots
|
||||
library(bench)
|
||||
bench::mark(
|
||||
current = current_approach(data),
|
||||
vectorized = vectorized_approach(data),
|
||||
parallel = map(data, in_parallel(func))
|
||||
)
|
||||
|
||||
# 4. Consider tool trade-offs based on bottleneck type
|
||||
```
|
||||
|
||||
## When Each Tool Helps vs Hurts
|
||||
|
||||
### Parallel Processing (`in_parallel()`)
|
||||
|
||||
```r
|
||||
# Helps when:
|
||||
✓ CPU-intensive computations
|
||||
✓ Embarrassingly parallel problems
|
||||
✓ Large datasets with independent operations
|
||||
✓ I/O bound operations (file reading, API calls)
|
||||
|
||||
# Hurts when:
|
||||
✗ Simple, fast operations (overhead > benefit)
|
||||
✗ Memory-intensive operations (may cause thrashing)
|
||||
✗ Operations requiring shared state
|
||||
✗ Small datasets
|
||||
|
||||
# Example decision point:
|
||||
expensive_func <- function(x) Sys.sleep(0.1) # 100ms per call
|
||||
fast_func <- function(x) x^2 # microseconds per call
|
||||
|
||||
# Good for parallel
|
||||
map(1:100, in_parallel(expensive_func)) # ~10s -> ~2.5s on 4 cores
|
||||
|
||||
# Bad for parallel (overhead > benefit)
|
||||
map(1:100, in_parallel(fast_func)) # 100μs -> 50ms (500x slower!)
|
||||
```
|
||||
|
||||
### vctrs Backend Tools
|
||||
|
||||
```r
|
||||
# Use vctrs when:
|
||||
✓ Type safety matters more than raw speed
|
||||
✓ Building reusable package functions
|
||||
✓ Complex coercion/combination logic
|
||||
✓ Consistent behavior across edge cases
|
||||
|
||||
# Avoid vctrs when:
|
||||
✗ One-off scripts where speed matters most
|
||||
✗ Simple operations where base R is sufficient
|
||||
✗ Memory is extremely constrained
|
||||
|
||||
# Decision point:
|
||||
simple_combine <- function(x, y) c(x, y) # Fast, simple
|
||||
robust_combine <- function(x, y) vec_c(x, y) # Safer, slight overhead
|
||||
|
||||
# Use simple for hot loops, robust for package APIs
|
||||
```
|
||||
|
||||
### Data Backend Selection
|
||||
|
||||
```r
|
||||
# Use data.table when:
|
||||
✓ Very large datasets (>1GB)
|
||||
✓ Complex grouping operations
|
||||
✓ Reference semantics desired
|
||||
✓ Maximum performance critical
|
||||
|
||||
# Use dplyr when:
|
||||
✓ Readability and maintainability priority
|
||||
✓ Complex joins and window functions
|
||||
✓ Team familiarity with tidyverse
|
||||
✓ Moderate sized data (<100MB)
|
||||
|
||||
# Use dtplyr (dplyr with data.table backend) when:
|
||||
✓ Want dplyr syntax with data.table performance
|
||||
✓ Large data but team prefers tidyverse
|
||||
✓ Lazy evaluation desired
|
||||
|
||||
# Use base R when:
|
||||
✓ No dependencies allowed
|
||||
✓ Simple operations
|
||||
✓ Teaching/learning contexts
|
||||
```
|
||||
|
||||
## Profiling Best Practices
|
||||
|
||||
```r
|
||||
# 1. Profile realistic data sizes
|
||||
profvis({
|
||||
# Use actual data size, not toy examples
|
||||
real_data |> your_analysis()
|
||||
})
|
||||
|
||||
# 2. Profile multiple runs for stability
|
||||
bench::mark(
|
||||
your_function(data),
|
||||
min_iterations = 10, # Multiple runs
|
||||
max_iterations = 100
|
||||
)
|
||||
|
||||
# 3. Check memory usage too (see the mem_alloc column of the result)
|
||||
bench::mark(
|
||||
approach1 = method1(data),
|
||||
approach2 = method2(data),
|
||||
check = FALSE, # If outputs differ slightly
|
||||
filter_gc = FALSE # Keep iterations in which garbage collection ran
|
||||
)
|
||||
|
||||
# 4. Profile with realistic usage patterns
|
||||
# Not just isolated function calls
|
||||
```
|
||||
|
||||
## Performance Anti-Patterns to Avoid
|
||||
|
||||
```r
|
||||
# Don't optimize without measuring
|
||||
# ✗ "This looks slow" -> immediately rewrite
|
||||
# ✓ Profile first, optimize bottlenecks
|
||||
|
||||
# Don't over-engineer for performance
|
||||
# ✗ Complex optimizations for 1% gains
|
||||
# ✓ Focus on algorithmic improvements
|
||||
|
||||
# Don't assume - measure
|
||||
# ✗ "for loops are always slow in R"
|
||||
# ✓ Benchmark your specific use case
|
||||
|
||||
# Don't ignore readability costs
|
||||
# ✗ Unreadable code for minor speedups
|
||||
# ✓ Readable code with targeted optimizations
|
||||
|
||||
# Don't grow objects in loops
|
||||
# ✗ result <- c(); for(i in 1:n) result <- c(result, x[i])
|
||||
# ✓ result <- vector("list", n); for(i in 1:n) result[[i]] <- x[i]
|
||||
```
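The last anti-pattern can be checked directly. The sketch below uses only base R and hypothetical helper names (`grow`, `prealloc`, `vectorised`) to compare growing a vector, filling a pre-allocated one, and a vectorised call. Timings depend on the machine, so none are quoted here.

```r
# Three ways to square every element of x; all return the same values
grow <- function(x) {
  out <- c()
  for (i in seq_along(x)) out <- c(out, x[i]^2)  # reallocates on every iteration
  out
}

prealloc <- function(x) {
  out <- numeric(length(x))
  for (i in seq_along(x)) out[i] <- x[i]^2       # fills existing storage
  out
}

vectorised <- function(x) x^2

x <- runif(1e4)
stopifnot(
  isTRUE(all.equal(grow(x), prealloc(x))),
  isTRUE(all.equal(prealloc(x), vectorised(x)))
)

# Measure rather than assume
timings <- c(
  grow       = system.time(grow(x))[["elapsed"]],
  prealloc   = system.time(prealloc(x))[["elapsed"]],
  vectorised = system.time(vectorised(x))[["elapsed"]]
)
print(timings)
```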
|
||||
|
||||
## Modern purrr Patterns for Performance
|
||||
|
||||
Use modern purrr 1.0+ patterns:
|
||||
|
||||
```r
|
||||
# Modern data frame row binding (purrr 1.0+)
|
||||
models <- data_splits |>
|
||||
map(\(split) train_model(split)) |>
|
||||
list_rbind() # Replaces map_dfr()
|
||||
|
||||
# Column binding
|
||||
summaries <- data_list |>
|
||||
map(\(df) get_summary_stats(df)) |>
|
||||
list_cbind() # Replaces map_dfc()
|
||||
|
||||
# Side effects with walk()
|
||||
walk2(data_list, plot_names, \(df, name) { # called for side effects; returns data_list invisibly
|
||||
p <- ggplot(df, aes(x, y)) + geom_point()
|
||||
ggsave(name, p)
|
||||
})
|
||||
|
||||
# Parallel processing (purrr 1.1.0+)
|
||||
library(mirai)
|
||||
daemons(4)
|
||||
results <- large_datasets |>
|
||||
map(in_parallel(\(x) expensive_computation(x), expensive_computation = expensive_computation))
|
||||
daemons(0)
|
||||
```
|
||||
|
||||
## Vectorization
|
||||
|
||||
```r
|
||||
# Good - vectorized operations
|
||||
result <- x + y
|
||||
|
||||
# Good - Type-stable purrr functions
|
||||
map_dbl(data, mean) # always returns double
|
||||
map_chr(data, class) # always returns character
|
||||
|
||||
# Avoid - Type-unstable base functions
|
||||
sapply(data, mean) # might return list or vector
|
||||
|
||||
# Avoid - explicit loops for simple operations
|
||||
result <- numeric(length(x))
|
||||
for(i in seq_along(x)) {
|
||||
result[i] <- x[i] + y[i]
|
||||
}
|
||||
```
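The type instability of `sapply()` is easiest to see with empty input. The following base R sketch contrasts it with `vapply()`, which gives the same kind of guarantee as the `map_*()` variants by declaring the expected result type up front.

```r
empty <- list()

unstable <- sapply(empty, length)              # a list for empty input
stable   <- vapply(empty, length, integer(1))  # always an integer vector

class(unstable)
class(stable)
```

Code that later calls `sum()` or `max()` on the result works with `stable` but fails or misbehaves with `unstable`, and only when the input happens to be empty.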
|
||||
|
||||
## Using dtplyr for Large Data
|
||||
|
||||
For large datasets, use dtplyr to get data.table performance with dplyr syntax:
|
||||
|
||||
```r
|
||||
library(dplyr)
library(dtplyr)
|
||||
|
||||
# Convert to lazy data.table
|
||||
large_data_dt <- lazy_dt(large_data)
|
||||
|
||||
# Use dplyr syntax as normal
|
||||
result <- large_data_dt |>
|
||||
filter(year >= 2020) |>
|
||||
group_by(category) |>
|
||||
summarise(
|
||||
total = sum(value),
|
||||
avg = mean(value)
|
||||
) |>
|
||||
as_tibble() # Convert back to tibble
|
||||
|
||||
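# See the generated data.table code: call show_query() on the lazy
# pipeline, before as_tibble() collects the result
large_data_dt |>
  filter(year >= 2020) |>
  show_query()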
|
||||
```
|
||||
|
||||
## Memory Optimization
|
||||
|
||||
```r
|
||||
# Pre-allocate vectors
|
||||
result <- vector("numeric", n)
|
||||
|
||||
# Use appropriate data types
|
||||
# integer instead of double when possible
|
||||
x <- 1:1000 # integer
|
||||
y <- seq(1, 1000, by = 1) # double
|
||||
|
||||
# Remove large objects when done
|
||||
rm(large_object)
|
||||
gc() # Force garbage collection if needed
|
||||
|
||||
# Use data.table for large data
|
||||
library(data.table)
|
||||
dt <- as.data.table(large_df) # Makes a copy; setDT(large_df) converts in place instead
|
||||
dt[, new_col := old_col * 2] # Modifies in place
|
||||
```
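As a rough check on the integer-versus-double advice, this base R sketch compares the memory footprint of the two storage types. The ratio is computed rather than quoted, since the exact byte counts include a small per-object header.

```r
n <- 1e6
ints <- integer(n)  # 4 bytes per element
dbls <- numeric(n)  # 8 bytes per element

# Doubles take roughly twice the space of integers
size_ratio <- as.numeric(object.size(dbls)) / as.numeric(object.size(ints))
round(size_ratio, 1)

# The two ways of building 1..1000 shown above differ in storage type
typeof(1:1000)
typeof(seq(1, 1000, by = 1))
```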
|
||||
|
||||
## String Manipulation Performance
|
||||
|
||||
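Prefer stringr over base R for consistent, pipe-friendly string handling. It is not always faster, so benchmark when string operations are a real bottleneck: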
|
||||
|
||||
```r
|
||||
# Good - stringr (consistent, pipe-friendly)
|
||||
text |>
|
||||
str_to_lower() |>
|
||||
str_trim() |>
|
||||
str_replace_all("pattern", "replacement") |>
|
||||
str_extract("\\d+")
|
||||
|
||||
# Common patterns
|
||||
str_detect(text, "pattern") # vs grepl("pattern", text)
|
||||
str_extract(text, "pattern") # vs complex regmatches()
|
||||
str_replace_all(text, "a", "b") # vs gsub("a", "b", text)
|
||||
str_split(text, ",") # vs strsplit(text, ",")
|
||||
str_length(text) # vs nchar(text)
|
||||
str_sub(text, 1, 5) # vs substr(text, 1, 5)
|
||||
```
|
||||
|
||||
## When to Use vctrs
|
||||
|
||||
### Core Benefits
|
||||
- **Type stability** - Predictable output types regardless of input values
|
||||
- **Size stability** - Predictable output sizes from input sizes
|
||||
- **Consistent coercion rules** - Single set of rules applied everywhere
|
||||
- **Robust class design** - Proper S3 vector infrastructure
|
||||
|
||||
### Use vctrs when:
|
||||
|
||||
```r
|
||||
# Type-Stable Functions in Packages
|
||||
my_function <- function(x, y) {
  result <- x + y  # placeholder computation for illustration
  # Always returns double, regardless of the input types
  vec_cast(result, double())
}
|
||||
|
||||
# Consistent Coercion/Casting
|
||||
vec_cast(x, double()) # Clear intent, predictable behavior
|
||||
vec_ptype_common(x, y, z) # Finds richest compatible type
|
||||
|
||||
# Size/Length Stability
|
||||
vec_c(x, y) # size = vec_size(x) + vec_size(y)
|
||||
vec_rbind(df1, df2) # size = sum of input sizes
|
||||
```
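A small sketch of these guarantees in practice, assuming the vctrs package is installed. It shows a lossless cast, a lossy cast that is refused rather than silently truncated, and the size rule for `vec_c()`.

```r
library(vctrs)

# Common type of integer and double is double
common <- vec_ptype_common(1L, 2.5)

# Lossless cast: integer -> double
widened <- vec_cast(1:3, double())

# Lossy casts signal an error, whereas as.integer(1.5) silently gives 1L
lossy <- tryCatch(
  vec_cast(1.5, integer()),
  error = function(e) "lossy cast refused"
)

# Size stability: output size is the sum of the input sizes
combined_size <- vec_size(vec_c(1:3, 4:5))
```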
|
||||
|
||||
### Don't Use vctrs When:
|
||||
- Simple one-off analyses - Base R is sufficient
|
||||
- No custom classes needed - Standard types work fine
|
||||
- Performance critical + simple operations - Base R may be faster
|
||||
- External API constraints - Must return base R types
|
||||
|
||||
The key insight: **vctrs is most valuable in package development where type safety, consistency, and extensibility matter more than raw speed for simple operations.**
|
||||
247
skills/r-development/references/rlang-patterns.md
Normal file
@@ -0,0 +1,247 @@
|
||||
# rlang Patterns for Data-Masking
|
||||
|
||||
## Core Concepts
|
||||
|
||||
**Data-masking** allows R expressions to refer to data frame columns as if they were variables in the environment. rlang provides the metaprogramming framework that powers tidyverse data-masking.
|
||||
|
||||
### Key rlang Tools
|
||||
|
||||
- **Embracing `{{}}`** - Forward function arguments to data-masking functions
|
||||
- **Injection `!!`** - Inject single expressions or values
|
||||
- **Splicing `!!!`** - Inject multiple arguments from a list
|
||||
- **Dynamic dots** - Programmable `...` with injection support
|
||||
- **Pronouns `.data`/`.env`** - Explicit disambiguation between data and environment variables
|
||||
|
||||
## Function Argument Patterns
|
||||
|
||||
### Forwarding with `{{}}`
|
||||
|
||||
Use `{{}}` to forward function arguments to data-masking functions:
|
||||
|
||||
```r
|
||||
# Single argument forwarding
|
||||
my_summarise <- function(data, var) {
|
||||
data |> dplyr::summarise(mean = mean({{ var }}))
|
||||
}
|
||||
|
||||
# Works with any data-masking expression
|
||||
mtcars |> my_summarise(cyl)
|
||||
mtcars |> my_summarise(cyl * am)
|
||||
mtcars |> my_summarise(.data$cyl) # pronoun syntax supported
|
||||
```
|
||||
|
||||
### Forwarding `...`
|
||||
|
||||
No special syntax needed for dots forwarding:
|
||||
|
||||
```r
|
||||
# Simple dots forwarding
|
||||
my_group_by <- function(.data, ...) {
|
||||
.data |> dplyr::group_by(...)
|
||||
}
|
||||
|
||||
# Works with tidy selections too
|
||||
my_select <- function(.data, ...) {
|
||||
.data |> dplyr::select(...)
|
||||
}
|
||||
|
||||
# When the wrapped function takes its tidy selection in a single argument, wrap the dots in c()
|
||||
my_pivot_longer <- function(.data, ...) {
|
||||
.data |> tidyr::pivot_longer(c(...))
|
||||
}
|
||||
```
|
||||
|
||||
### Names Patterns with `.data`
|
||||
|
||||
Use `.data` pronoun for programmatic column access:
|
||||
|
||||
```r
|
||||
# Single column by name
|
||||
my_mean <- function(data, var) {
|
||||
data |> dplyr::summarise(mean = mean(.data[[var]]))
|
||||
}
|
||||
|
||||
# Usage - completely insulated from data-masking
|
||||
mtcars |> my_mean("cyl") # No ambiguity, works like regular function
|
||||
|
||||
# Multiple columns with all_of()
|
||||
my_select_vars <- function(data, vars) {
|
||||
data |> dplyr::select(all_of(vars))
|
||||
}
|
||||
|
||||
mtcars |> my_select_vars(c("cyl", "am"))
|
||||
```
|
||||
|
||||
## Injection Operators
|
||||
|
||||
### When to Use Each Operator
|
||||
|
||||
| Operator | Use Case | Example |
|
||||
|----------|----------|---------|
|
||||
| `{{ }}` | Forward function arguments | `summarise(mean = mean({{ var }}))` |
|
||||
| `!!` | Inject single expression/value | `summarise(mean = mean(!!sym(var)))` |
|
||||
| `!!!` | Inject multiple arguments | `group_by(!!!syms(vars))` |
|
||||
| `.data[[]]` | Access columns by name | `mean(.data[[var]])` |
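The three single-column rows of the table are interchangeable for a plain column reference. The sketch below, which assumes dplyr and rlang are installed and uses hypothetical wrapper names, shows them producing the same result on `mtcars`.

```r
library(dplyr)
library(rlang)

# Bare column name, forwarded with {{ }}
mean_embrace <- function(data, var) {
  summarise(data, mean = mean({{ var }}))
}

# Column name as a string, converted to a symbol and injected with !!
mean_sym <- function(data, var) {
  summarise(data, mean = mean(!!sym(var)))
}

# Column name as a string, looked up through the .data pronoun
mean_pronoun <- function(data, var) {
  summarise(data, mean = mean(.data[[var]]))
}

res_embrace <- mean_embrace(mtcars, cyl)
res_sym     <- mean_sym(mtcars, "cyl")
res_pronoun <- mean_pronoun(mtcars, "cyl")
```

The choice between them is mostly about the interface you want: `{{ }}` for callers who write bare column names, `.data[[ ]]` or `!!sym()` for callers who pass strings.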
|
||||
|
||||
### Advanced Injection with `!!`
|
||||
|
||||
```r
|
||||
library(rlang) # provides sym(), syms(), data_sym(), data_syms()
# Create symbols from strings
|
||||
var <- "cyl"
|
||||
mtcars |> dplyr::summarise(mean = mean(!!sym(var)))
|
||||
|
||||
# Inject values to avoid name collisions
|
||||
df <- data.frame(x = 1:3)
|
||||
x <- 100
|
||||
df |> dplyr::mutate(scaled = x / !!x) # Uses both data and env x
|
||||
|
||||
# Use data_sym() for tidyeval contexts (more robust)
|
||||
mtcars |> dplyr::summarise(mean = mean(!!data_sym(var)))
|
||||
```
|
||||
|
||||
### Splicing with `!!!`
|
||||
|
||||
```r
|
||||
# Multiple symbols from character vector
|
||||
vars <- c("cyl", "am")
|
||||
mtcars |> dplyr::group_by(!!!syms(vars))
|
||||
|
||||
# Or use data_syms() for tidy contexts
|
||||
mtcars |> dplyr::group_by(!!!data_syms(vars))
|
||||
|
||||
# Splice lists of arguments
|
||||
args <- list(na.rm = TRUE, trim = 0.1)
|
||||
mtcars |> dplyr::summarise(mean = mean(cyl, !!!args))
|
||||
```
|
||||
|
||||
## Dynamic Dots Patterns
|
||||
|
||||
### Using `list2()` for Dynamic Dots Support
|
||||
|
||||
```r
|
||||
my_function <- function(...) {
|
||||
# Collect with list2() instead of list() for dynamic features
|
||||
dots <- list2(...)
|
||||
# Process dots...
|
||||
}
|
||||
|
||||
# Enables these features:
|
||||
my_function(a = 1, b = 2) # Normal usage
|
||||
my_function(!!!list(a = 1, b = 2)) # Splice a list
|
||||
my_function("{name}" := value) # Name injection
|
||||
my_function(a = 1, ) # Trailing commas OK
|
||||
```
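Here is a runnable version of the features listed above, as a sketch that assumes rlang and glue are installed. The `collect()` helper and the `name` and `value` variables are illustrative only.

```r
library(rlang)

# A function that simply returns its dynamic dots as a list
collect <- function(...) list2(...)

name <- "result"
value <- 42

# Name injection, splicing, and a trailing comma in one call
dots <- collect("{name}" := value, !!!list(a = 1, b = 2), )
str(dots)
```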
|
||||
|
||||
### Name Injection with Glue Syntax
|
||||
|
||||
```r
|
||||
# Basic name injection
|
||||
name <- "result"
|
||||
list2("{name}" := 1) # Creates list(result = 1)
|
||||
|
||||
# In function arguments with {{
|
||||
my_mean <- function(data, var) {
|
||||
data |> dplyr::summarise("mean_{{ var }}" := mean({{ var }}))
|
||||
}
|
||||
|
||||
mtcars |> my_mean(cyl) # Creates column "mean_cyl"
|
||||
mtcars |> my_mean(cyl * am) # Creates column "mean_cyl * am"
|
||||
|
||||
# Allow custom names with englue()
|
||||
my_mean <- function(data, var, name = englue("mean_{{ var }}")) {
|
||||
data |> dplyr::summarise("{name}" := mean({{ var }}))
|
||||
}
|
||||
|
||||
# User can override default
|
||||
mtcars |> my_mean(cyl, name = "cylinder_mean")
|
||||
```
|
||||
|
||||
## Pronouns for Disambiguation
|
||||
|
||||
### `.data` and `.env` Best Practices
|
||||
|
||||
```r
|
||||
# Explicit disambiguation prevents masking issues
|
||||
cyl <- 1000 # Environment variable
|
||||
|
||||
mtcars |> dplyr::summarise(
|
||||
data_cyl = mean(.data$cyl), # Data frame column
|
||||
env_cyl = mean(.env$cyl), # Environment variable
|
||||
ambiguous = mean(cyl) # Column wins when it exists; the env variable is only a fallback
|
||||
)
|
||||
|
||||
# Use in loops and programmatic contexts
|
||||
vars <- c("cyl", "am")
|
||||
for (var in vars) {
|
||||
result <- mtcars |> dplyr::summarise(mean = mean(.data[[var]]))
|
||||
print(result)
|
||||
}
|
||||
```
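Running the disambiguation example makes the lookup order visible. This sketch assumes dplyr is installed; the output column names are illustrative.

```r
library(dplyr)

cyl <- 1000  # environment variable with the same name as a column

out <- mtcars |>
  summarise(
    data_cyl = mean(.data$cyl),  # always the column
    env_cyl  = mean(.env$cyl),   # always the environment variable
    bare_cyl = mean(cyl)         # the column masks the environment variable
  )

out
```

`bare_cyl` matches `data_cyl` here, but it would silently switch to the environment value if the column were ever renamed or dropped, which is the case the pronouns protect against.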
|
||||
|
||||
## Programming Patterns
|
||||
|
||||
### Bridge Patterns
|
||||
|
||||
Converting between data-masking and tidy selection behaviors:
|
||||
|
||||
```r
|
||||
# across() as selection-to-data-mask bridge
|
||||
my_group_by <- function(data, vars) {
|
||||
data |> dplyr::group_by(dplyr::across({{ vars }}))
|
||||
}
|
||||
|
||||
# Works with tidy selection
|
||||
mtcars |> my_group_by(starts_with("c"))
|
||||
|
||||
# across(all_of()) as names-to-data-mask bridge
|
||||
my_group_by <- function(data, vars) {
|
||||
data |> dplyr::group_by(dplyr::across(all_of(vars)))
|
||||
}
|
||||
|
||||
mtcars |> my_group_by(c("cyl", "am"))
|
||||
```
|
||||
|
||||
### Transformation Patterns
|
||||
|
||||
```r
|
||||
# Transform single arguments by wrapping
|
||||
my_mean <- function(data, var) {
|
||||
data |> dplyr::summarise(mean = mean({{ var }}, na.rm = TRUE))
|
||||
}
|
||||
|
||||
# Transform dots with across()
|
||||
my_means <- function(data, ...) {
|
||||
data |> dplyr::summarise(dplyr::across(c(...), ~ mean(.x, na.rm = TRUE)))
|
||||
}
|
||||
|
||||
# Manual transformation (advanced)
|
||||
my_means_manual <- function(.data, ...) {
|
||||
vars <- enquos(..., .named = TRUE)
|
||||
vars <- purrr::map(vars, ~ expr(mean(!!.x, na.rm = TRUE)))
|
||||
.data |> dplyr::summarise(!!!vars)
|
||||
}
|
||||
```
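The `across()` and manual approaches should agree. The sketch below assumes dplyr and rlang are installed. It restates both functions so that it runs on its own, and it swaps `purrr::map()` for base `lapply()` purely to reduce dependencies.

```r
library(dplyr)
library(rlang)

my_means <- function(data, ...) {
  data |> summarise(across(c(...), ~ mean(.x, na.rm = TRUE)))
}

my_means_manual <- function(.data, ...) {
  # Capture each argument as a named quosure
  vars <- enquos(..., .named = TRUE)
  # Wrap each captured expression in mean(<expr>, na.rm = TRUE)
  vars <- lapply(vars, function(q) expr(mean(!!q, na.rm = TRUE)))
  # Splice the transformed expressions back into summarise()
  .data |> summarise(!!!vars)
}

via_across <- my_means(mtcars, cyl, am)
via_manual <- my_means_manual(mtcars, cyl, am)
```

The manual version is more verbose but accepts arbitrary expressions such as `cyl * am`, whereas `across()` is limited to column selections.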
|
||||
|
||||
## Common Patterns Summary
|
||||
|
||||
### When to Use What
|
||||
|
||||
**Use `{{}}` when:**
|
||||
- Forwarding user-provided column references
|
||||
- Building wrapper functions around dplyr/tidyr
|
||||
- Need to support both bare names and expressions
|
||||
|
||||
**Use `.data[[]]` when:**
|
||||
- Working with character vector column names
|
||||
- Iterating over column names programmatically
|
||||
- Need complete insulation from data-masking
|
||||
|
||||
**Use `!!` when:**
|
||||
- Need to inject computed expressions
|
||||
- Converting strings to symbols with `sym()`
|
||||
- Avoiding variable name collisions
|
||||
|
||||
**Use `!!!` when:**
|
||||
- Injecting multiple arguments from a list
|
||||
- Working with variable numbers of columns
|
||||
- Splicing named arguments
|
||||