Initial commit
This commit is contained in:
214
skills/r-development/SKILL.md
Normal file
214
skills/r-development/SKILL.md
Normal file
@@ -0,0 +1,214 @@
|
||||
---
|
||||
name: r-development
|
||||
description: Modern R development practices emphasizing tidyverse patterns (dplyr 1.1 and later, native pipe, join_by, .by grouping), rlang metaprogramming, performance optimization, and package development. Use when Claude needs to write R code, create R packages, optimize R performance, or provide R programming guidance.
|
||||
---
|
||||
|
||||
# R Development
|
||||
|
||||
This skill provides comprehensive guidance for modern R development, emphasizing current best practices with tidyverse, performance optimization, and professional package development.
|
||||
|
||||
## Core Principles
|
||||
|
||||
1. **Use modern tidyverse patterns** - Prioritize dplyr 1.1+ features, native pipe, and current APIs
|
||||
2. **Profile before optimizing** - Use profvis and bench to identify real bottlenecks
|
||||
3. **Write readable code first** - Optimize only when necessary and after profiling
|
||||
4. **Follow tidyverse style guide** - Consistent naming, spacing, and structure
|
||||
|
||||
## Modern Tidyverse Essentials
|
||||
|
||||
### Native Pipe (`|>` not `%>%`)
|
||||
|
||||
Always use native pipe `|>` instead of magrittr `%>%` (R 4.1+):
|
||||
|
||||
```r
|
||||
# Modern
|
||||
data |>
|
||||
filter(year >= 2020) |>
|
||||
summarise(mean_value = mean(value))
|
||||
|
||||
# Avoid legacy pipe
|
||||
data %>% filter(year >= 2020)
|
||||
```
|
||||
|
||||
### Join Syntax (dplyr 1.1+)
|
||||
|
||||
Use `join_by()` for all joins:
|
||||
|
||||
```r
|
||||
# Modern join syntax with equality
|
||||
transactions |>
|
||||
inner_join(companies, by = join_by(company == id))
|
||||
|
||||
# Inequality joins
|
||||
transactions |>
|
||||
inner_join(companies, join_by(company == id, year >= since))
|
||||
|
||||
# Rolling joins (closest match)
|
||||
transactions |>
|
||||
inner_join(companies, join_by(company == id, closest(year >= since)))
|
||||
```
|
||||
|
||||
Control match behavior:
|
||||
|
||||
```r
|
||||
# Expect 1:1 matches
|
||||
inner_join(x, y, by = join_by(id), multiple = "error")
|
||||
|
||||
# Ensure all rows match
|
||||
inner_join(x, y, by = join_by(id), unmatched = "error")
|
||||
```
|
||||
|
||||
### Per-Operation Grouping with `.by`
|
||||
|
||||
Use `.by` instead of `group_by() |> ... |> ungroup()`:
|
||||
|
||||
```r
|
||||
# Modern approach (always returns ungrouped)
|
||||
data |>
|
||||
summarise(mean_value = mean(value), .by = category)
|
||||
|
||||
# Multiple grouping variables
|
||||
data |>
|
||||
summarise(total = sum(revenue), .by = c(company, year))
|
||||
```
|
||||
|
||||
### Column Operations
|
||||
|
||||
Use modern column selection and transformation functions:
|
||||
|
||||
```r
|
||||
# pick() for column selection in data-masking contexts
|
||||
data |>
|
||||
summarise(
|
||||
n_x_cols = ncol(pick(starts_with("x"))),
|
||||
n_y_cols = ncol(pick(starts_with("y")))
|
||||
)
|
||||
|
||||
# across() for applying functions to multiple columns
|
||||
data |>
|
||||
summarise(across(where(is.numeric), mean, .names = "mean_{.col}"), .by = group)
|
||||
|
||||
# reframe() for multi-row results per group
|
||||
data |>
|
||||
reframe(quantiles = quantile(x, c(0.25, 0.5, 0.75)), .by = group)
|
||||
```
|
||||
|
||||
## rlang Metaprogramming
|
||||
|
||||
For comprehensive rlang patterns, see [references/rlang-patterns.md](references/rlang-patterns.md).
|
||||
|
||||
### Quick Reference
|
||||
|
||||
- **`{{}}`** - Forward function arguments to data-masking functions
|
||||
- **`!!`** - Inject single expressions or values
|
||||
- **`!!!`** - Inject multiple arguments from a list
|
||||
- **`.data[[]]`** - Access columns by name (character vectors)
|
||||
- **`pick()`** - Select columns inside data-masking functions
|
||||
|
||||
Example function with embracing:
|
||||
|
||||
```r
|
||||
my_summary <- function(data, group_var, summary_var) {
|
||||
data |>
|
||||
summarise(mean_val = mean({{ summary_var }}), .by = {{ group_var }})
|
||||
}
|
||||
```
|
||||
|
||||
## Performance Optimization
|
||||
|
||||
For detailed performance guidance, see [references/performance.md](references/performance.md).
|
||||
|
||||
### Key Strategies
|
||||
|
||||
1. **Profile first**: Use `profvis::profvis()` and `bench::mark()`
|
||||
2. **Vectorize operations**: Avoid loops when vectorized alternatives exist
|
||||
3. **Use dtplyr**: For large data operations (lazy evaluation with data.table backend)
|
||||
4. **Parallel processing**: Use `furrr::future_map()` for parallelizable work
|
||||
5. **Memory efficiency**: Pre-allocate, use appropriate data types
|
||||
|
||||
Quick example:
|
||||
|
||||
```r
|
||||
# Profile code
|
||||
profvis::profvis({
|
||||
result <- data |>
|
||||
complex_operation() |>
|
||||
another_operation()
|
||||
})
|
||||
|
||||
# Benchmark alternatives
|
||||
bench::mark(
|
||||
approach_1 = method1(data),
|
||||
approach_2 = method2(data),
|
||||
check = FALSE
|
||||
)
|
||||
```
|
||||
|
||||
## Package Development
|
||||
|
||||
For complete package development guidance, see [references/package-development.md](references/package-development.md).
|
||||
|
||||
### Quick Guidelines
|
||||
|
||||
**API Design:**
|
||||
- Use `.by` parameter for per-operation grouping
|
||||
- Use `{{}}` for column arguments
|
||||
- Return tibbles consistently
|
||||
- Validate user-facing function inputs thoroughly
|
||||
|
||||
**Dependencies:**
|
||||
- Add dependencies for significant functionality gains
|
||||
- Core tidyverse packages usually worth including: dplyr, purrr, stringr, tidyr
|
||||
- Minimize dependencies for widely-used packages
|
||||
|
||||
**Testing:**
|
||||
- Unit tests for individual functions
|
||||
- Integration tests for workflows
|
||||
- Test edge cases and error conditions
|
||||
|
||||
**Documentation:**
|
||||
- Document all exported functions
|
||||
- Provide usage examples
|
||||
- Explain non-obvious parameter interactions
|
||||
|
||||
## Common Migration Patterns
|
||||
|
||||
### Base R → Tidyverse
|
||||
|
||||
```r
|
||||
# Data manipulation
|
||||
subset(data, condition) → filter(data, condition)
|
||||
data[order(data$x), ] → arrange(data, x)
|
||||
aggregate(x ~ y, data, mean) → summarise(data, mean(x), .by = y)
|
||||
|
||||
# Functional programming
|
||||
sapply(x, f) → map(x, f) # type-stable
|
||||
lapply(x, f) → map(x, f)
|
||||
|
||||
# Strings
|
||||
grepl("pattern", text) → str_detect(text, "pattern")
|
||||
gsub("old", "new", text) → str_replace_all(text, "old", "new")
|
||||
```
|
||||
|
||||
### Old → New Tidyverse
|
||||
|
||||
```r
|
||||
# Pipes
|
||||
%>% → |>
|
||||
|
||||
# Grouping
|
||||
group_by() |> ... |> ungroup() → summarise(..., .by = x)
|
||||
|
||||
# Joins
|
||||
by = c("a" = "b") → by = join_by(a == b)
|
||||
|
||||
# Reshaping
|
||||
gather()/spread() → pivot_longer()/pivot_wider()
|
||||
```
|
||||
|
||||
## Additional Resources
|
||||
|
||||
- **rlang patterns**: See [references/rlang-patterns.md](references/rlang-patterns.md) for comprehensive data-masking and metaprogramming guidance
|
||||
- **Performance optimization**: See [references/performance.md](references/performance.md) for profiling, benchmarking, and optimization strategies
|
||||
- **Package development**: See [references/package-development.md](references/package-development.md) for complete package creation guidance
|
||||
- **Object systems**: See [references/object-systems.md](references/object-systems.md) for S3, S4, S7, R6, and vctrs guidance
|
||||
Reference in New Issue
Block a user