2025-11-29 18:15:04 +08:00


Object-Oriented Programming in R

S7: Modern OOP for New Projects

S7 combines S3 simplicity with S4 structure:

  • Formal class definitions with automatic validation
  • Compatible with existing S3 code
  • Better error messages and discoverability

library(S7)

# S7 class definition
Range <- new_class("Range",
  properties = list(
    start = class_double,
    end = class_double
  ),
  validator = function(self) {
    if (self@end < self@start) {
      "@end must be >= @start"
    }
  }
)

# Usage - constructor and property access
x <- Range(start = 1, end = 10)
x@start  # 1
x@end <- 20  # automatic validation

# Methods
inside <- new_generic("inside", "x")
method(inside, Range) <- function(x, y) {
  y >= x@start & y <= x@end
}
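The sketch below exercises that definition: calling the generic and watching the validator reject bad data. It repeats the Range class and inside() generic so it runs on its own, and it assumes the S7 package is installed.

```r
library(S7)

# Same Range class and inside() generic as above, repeated so this block runs alone
Range <- new_class("Range",
  properties = list(start = class_double, end = class_double),
  validator = function(self) {
    if (self@end < self@start) "@end must be >= @start"
  }
)
inside <- new_generic("inside", "x")
method(inside, Range) <- function(x, y) y >= x@start & y <= x@end

x <- Range(start = 1, end = 10)
inside(x, c(0, 5, 15))  # FALSE TRUE FALSE

# Assigning an invalid value re-runs the validator; x keeps its old value
res <- tryCatch({ x@end <- 0; "accepted" }, error = function(e) "rejected")

# Property types are checked too: a character start is refused
res2 <- tryCatch(Range(start = "a", end = 10), error = function(e) "rejected")
```

Because validation runs both at construction and on every property assignment, an S7 object cannot be left in an invalid state by ordinary @ assignment.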

OOP System Decision Matrix

Decision Tree: What Are You Building?

1. Vector-like Objects

Use vctrs when:

  • ✓ Need data frame integration (columns/rows)
  • ✓ Want type-stable vector operations
  • ✓ Building factor-like, date-like, or numeric-like classes
  • ✓ Need consistent coercion/casting behavior
  • ✓ Working with existing tidyverse infrastructure

Examples: custom date classes, units, categorical data

# Vector-like behavior in data frames
# (assumes library(vctrs) plus the percent() helper and coercion methods
# defined under "vctrs for Vector Classes" below)
data.frame(x = 1:3, pct = percent(c(0.1, 0.2, 0.3)))  # works as a column

# Type-stable operations
vec_c(percent(0.1), percent(0.2))  # always returns a percent vector
vec_cast(0.5, percent())           # explicit, safe double -> percent cast

2. General Objects (Complex Data Structures)

Use S7 when:

  • ✓ NEW projects that need formal classes
  • ✓ Want property validation and safe property access (@)
  • ✓ Need multiple dispatch (S3 dispatches on a single argument)
  • ✓ Converting from S3 and want better structure
  • ✓ Building class hierarchies with inheritance
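  • ✓ Want better error messages and discoverability

library(S7)

# Complex validation needs
Range <- new_class("Range",
  properties = list(start = class_double, end = class_double),
  validator = function(self) {
    if (self@end < self@start) "@end must be >= @start"
  }
)

# Multiple dispatch needs: the method is chosen on both x and y
# (combine() is an illustrative generic name)
combine <- new_generic("combine", c("x", "y"))
method(combine, list(Range, class_double)) <- function(x, y) {
  Range(start = min(x@start, y), end = max(x@end, y))
}

# Class hierarchies with clear inheritance (illustrative classes)
Parent <- new_class("Parent", properties = list(name = class_character))
Child <- new_class("Child", parent = Parent)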

Use S3 when:

  • ✓ Simple classes with minimal structure needs
  • ✓ Maximum compatibility and minimal dependencies
  • ✓ Quick prototyping or internal classes
  • ✓ Contributing to existing S3-based ecosystems
  • ✓ Performance is absolutely critical (minimal overhead)

# Simple classes without complex needs
new_simple <- function(x) structure(x, class = "simple")
print.simple <- function(x, ...) {
  cat("Simple:", unclass(x), "\n")
  invisible(x)  # print methods should return their input invisibly
}

Use S4 when:

  • ✓ Working in Bioconductor ecosystem
  • ✓ Need complex multiple inheritance (S7 doesn't support this)
  • ✓ Existing S4 codebase that works well

Use R6 when:

  • ✓ Need reference semantics (mutable objects)
  • ✓ Building stateful objects
  • ✓ Coming from OOP languages like Python/Java
  • ✓ Need encapsulation and private methods

Detailed S7 vs S3 Comparison

| Feature | S3 | S7 | When S7 wins |
|---|---|---|---|
| Class definition | Informal (convention) | Formal (new_class()) | You need a guaranteed structure |
| Property access | $ or attr() (unchecked) | @ (type-checked, validated) | Property validation matters |
| Validation | Manual, inconsistent | Built-in validators | Data integrity is important |
| Method discovery | Methods are hard to find | Printing a generic lists its methods | Developer experience matters |
| Multiple dispatch | Single dispatch (only Ops group generics consider both arguments) | Full multiple dispatch | Complex method dispatch is needed |
| Inheritance | Informal, NextMethod() | Explicit super() | Predictable inheritance is needed |
| Migration cost | n/a | Low (about 1-2 hours for a simple class) | You want better structure |
| Performance | Fastest | About the same as S3 | The difference is negligible for your workload |
| Compatibility | Full S3 | Full S3 + S7 | You need both old and new patterns |
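The Inheritance row compares two mechanisms. In the sketch below, the S7 method names its parent with super(), while the S3 version relies on NextMethod() walking the class vector. The Animal and Dog classes and the describe()/describe3() generics are hypothetical names chosen for illustration.

```r
library(S7)

# S7: hypothetical parent/child classes and a generic
Animal <- new_class("Animal", properties = list(name = class_character))
Dog <- new_class("Dog", parent = Animal)

describe <- new_generic("describe", "x")
method(describe, Animal) <- function(x) paste0("animal called ", x@name)
# The parent method is requested explicitly via super()
method(describe, Dog) <- function(x) {
  paste("dog;", describe(super(x, to = Animal)))
}
describe(Dog(name = "Rex"))  # "dog; animal called Rex"

# S3 equivalent: NextMethod() follows the class vector implicitly
describe3 <- function(x, ...) UseMethod("describe3")
describe3.animal <- function(x, ...) paste0("animal called ", x$name)
describe3.dog <- function(x, ...) paste("dog;", NextMethod())
d <- structure(list(name = "Rex"), class = c("dog", "animal"))
describe3(d)  # "dog; animal called Rex"
```

Both produce the same result. With super(), the target class is visible in the method body, whereas NextMethod() depends on whatever class vector the object happens to carry.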

vctrs for Vector Classes

Basic Vector Class

library(vctrs)

# Constructor (low-level): checks the type, then wraps the data
new_percent <- function(x = double()) {
  vec_assert(x, double())
  new_vctr(x, class = "pkg_percent")
}

# Helper (user-facing): casts input to double first
percent <- function(x = double()) {
  x <- vec_cast(x, double())
  new_percent(x)
}

# Format method: shows 0.1 as "10%" and keeps NA as NA
format.pkg_percent <- function(x, ...) {
  out <- paste0(vec_data(x) * 100, "%")
  out[is.na(x)] <- NA
  out
}

Coercion Methods

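# Self-coercion: combining two percents gives a percent
vec_ptype2.pkg_percent.pkg_percent <- function(x, y, ...) {
  new_percent()
}

# With double: mixing a percent and a double gives a double
vec_ptype2.pkg_percent.double <- function(x, y, ...) double()
vec_ptype2.double.pkg_percent <- function(x, y, ...) double()

# Casting (including the percent -> percent identity cast)
vec_cast.pkg_percent.pkg_percent <- function(x, to, ...) x
vec_cast.pkg_percent.double <- function(x, to, ...) {
  new_percent(x)
}
vec_cast.double.pkg_percent <- function(x, to, ...) {
  vec_data(x)
}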
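To see what these coercion rules buy in practice, the sketch below combines and casts pkg_percent vectors. It repeats compact copies of the definitions above so it runs on its own, and it assumes the vctrs package is installed.

```r
library(vctrs)

# Compact copies of the pkg_percent definitions above
new_percent <- function(x = double()) {
  vec_assert(x, double())
  new_vctr(x, class = "pkg_percent")
}
percent <- function(x = double()) new_percent(vec_cast(x, double()))
format.pkg_percent <- function(x, ...) paste0(vec_data(x) * 100, "%")

vec_ptype2.pkg_percent.pkg_percent <- function(x, y, ...) new_percent()
vec_ptype2.pkg_percent.double <- function(x, y, ...) double()
vec_ptype2.double.pkg_percent <- function(x, y, ...) double()
vec_cast.pkg_percent.pkg_percent <- function(x, to, ...) x
vec_cast.pkg_percent.double <- function(x, to, ...) new_percent(x)
vec_cast.double.pkg_percent <- function(x, to, ...) vec_data(x)

both <- vec_c(percent(0.1), percent(0.2))  # stays a pkg_percent
mixed <- vec_c(percent(0.1), 0.5)          # common type is double
cast <- vec_cast(0.25, percent())          # double -> pkg_percent
format(both)                               # "10%" "20%"
```

Because vec_ptype2() declares double as the common type, mixing a percent with a plain number never silently produces a percent.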

S3 Basics

Creating S3 Classes

# Constructor
new_myclass <- function(x, y) {
  structure(
    list(x = x, y = y),
    class = "myclass"
  )
}

# Methods
print.myclass <- function(x, ...) {
  cat("myclass object\n")
  cat("x:", x$x, "\n")
  cat("y:", x$y, "\n")
  invisible(x)  # print methods should return their input invisibly
}

summary.myclass <- function(object, ...) {
  list(x = object$x, y = object$y)
}
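Validation in S3 is manual, as the comparison table notes. One common convention pairs the constructor with a validator and a user-facing helper. In the sketch below, validate_myclass(), myclass(), and the numeric/equal-length rule are hypothetical choices made for illustration.

```r
# Constructor (as above): fast, performs no checks
new_myclass <- function(x, y) {
  structure(list(x = x, y = y), class = "myclass")
}

# Validator: hypothetical rule that x and y are numeric and equally long
validate_myclass <- function(obj) {
  if (!is.numeric(obj$x) || !is.numeric(obj$y)) {
    stop("`x` and `y` must be numeric", call. = FALSE)
  }
  if (length(obj$x) != length(obj$y)) {
    stop("`x` and `y` must have the same length", call. = FALSE)
  }
  obj
}

# Helper: the user-facing entry point runs the validator
myclass <- function(x, y) validate_myclass(new_myclass(x, y))

obj <- myclass(1:2, 3:4)
err <- tryCatch(myclass(1:2, 3), error = conditionMessage)

# Nothing re-validates later edits: S3 accepts this silently
obj$x <- "oops"
```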

Generic Functions

# Create generic
my_generic <- function(x, ...) {
  UseMethod("my_generic")
}

# Default method (class(x) can have several elements, so join them)
my_generic.default <- function(x, ...) {
  stop("No method for class ", paste(class(x), collapse = "/"), call. = FALSE)
}

# Specific method
my_generic.myclass <- function(x, ...) {
  # Implementation
}
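UseMethod() tries each element of class(x) in order before falling back to the default method. The sketch below shows that lookup; the "special" class is a hypothetical subclass with no method of its own.

```r
my_generic <- function(x, ...) UseMethod("my_generic")
my_generic.default <- function(x, ...) {
  stop("No method for class ", paste(class(x), collapse = "/"), call. = FALSE)
}
my_generic.myclass <- function(x, ...) "myclass method"

# "special" has no method, so dispatch moves on to "myclass"
obj <- structure(list(), class = c("special", "myclass"))
my_generic(obj)  # "myclass method"

# A matrix has class c("matrix", "array"), so it reaches the default method
msg <- tryCatch(my_generic(matrix(1:4, 2)), error = conditionMessage)
msg  # "No method for class matrix/array"
```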

R6 Classes

Basic R6 Class

library(R6)

MyClass <- R6Class("MyClass",
  public = list(
    x = NULL,
    y = NULL,
    
    initialize = function(x, y) {
      self$x <- x
      self$y <- y
    },
    
    add = function() {
      self$x + self$y
    }
  ),
  
  private = list(
    internal_value = NULL
  )
)

# Usage
obj <- MyClass$new(1, 2)
obj$add()  # 3
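R6 objects have reference semantics, which is the main reason to choose R6. The sketch below contrasts plain assignment, which shares the object, with $clone(), which copies it, and shows that private fields are hidden from outside. The Counter class is a hypothetical example.

```r
library(R6)

# Hypothetical Counter class with private state
Counter <- R6Class("Counter",
  public = list(
    increment = function(by = 1) {
      private$count <- private$count + by
      invisible(self)  # returning self allows method chaining
    },
    get = function() private$count
  ),
  private = list(
    count = 0
  )
)

a <- Counter$new()
b <- a            # b refers to the same object, not a copy
b$increment()
a$get()           # 1: the change made through b is visible through a

c2 <- a$clone()   # an explicit, independent copy
c2$increment(5)
a$get()           # still 1
c2$get()          # 6
a$count           # NULL: private fields are not reachable from outside
```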

Migration Strategy

S3 → S7

Usually 1-2 hours of work for a simple class, and compatibility is preserved because S7 objects are also S3 objects:

# S3 version
new_range <- function(start, end) {
  structure(
    list(start = start, end = end),
    class = "range"
  )
}

# S7 version (requires library(S7))
Range <- new_class("Range",
  properties = list(
    start = class_double,
    end = class_double
  )
)
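The compatibility claim can be checked directly. An S7 object's class attribute includes the class name and "S7_object", so S3 generics defined with UseMethod() still dispatch on it. The describe() generic below is hypothetical. Two caveats apply: to keep existing methods for the old lowercase "range" class, the S7 class would need the same name, and methods that read fields with $ need to switch to @.

```r
library(S7)

Range <- new_class("Range",
  properties = list(start = class_double, end = class_double)
)
r <- Range(start = 1, end = 10)
class(r)  # "Range" "S7_object"

# A pre-existing S3 generic and method keep working on the S7 object
describe <- function(x, ...) UseMethod("describe")
describe.Range <- function(x, ...) paste0("[", x@start, ", ", x@end, "]")
describe(r)  # "[1, 10]"
```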

S4 → S7

More involved: first check whether the S4 features in use (for example, multiple inheritance, which S7 does not support) are actually needed.
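One behavioural difference worth checking during such a migration is when validity rules run. In S4, a validity function runs in new() and validObject() but not on slot assignment. S7 re-runs its validator on every property assignment. The RangeS4 and RangeS7 classes below are illustrative and assume the S7 package is installed.

```r
library(methods)
library(S7)

# S4: validity is checked by new() and validObject(), not by @<-
setClass("RangeS4",
  slots = c(start = "numeric", end = "numeric"),
  validity = function(object) {
    if (object@end < object@start) "@end must be >= @start" else TRUE
  }
)
r4 <- new("RangeS4", start = 1, end = 10)
r4@end <- 0  # accepted silently
s4_check <- tryCatch(validObject(r4), error = function(e) "invalid")

# S7: the validator re-runs on every property assignment
RangeS7 <- new_class("RangeS7",
  properties = list(start = class_double, end = class_double),
  validator = function(self) {
    if (self@end < self@start) "@end must be >= @start"
  }
)
r7 <- RangeS7(start = 1, end = 10)
s7_check <- tryCatch({ r7@end <- 0; "accepted" }, error = function(e) "rejected")
```

Code that relied on temporarily invalid S4 objects (for example, updating start and end in two steps) may need restructuring, such as setting both properties at once with props<-.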

Base R → vctrs

For vector-like classes, moving to vctrs brings significant benefits in type stability and data frame integration.

Combining Approaches

S7 classes can apply vctrs principles (consistent sizes, recycling, explicit casting) to their vector-valued properties.
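One way this could look is sketched below. It assumes a hypothetical Ranges class in which each element is one interval. The validator uses vec_size(), and the helper casts with vec_cast() and applies tidyverse recycling with vec_size_common() and vec_recycle(). The class and helper names are illustrative, not an established API.

```r
library(S7)
library(vctrs)

# Hypothetical vectorised range class: element i is the interval [start[i], end[i]]
Ranges <- new_class("Ranges",
  properties = list(start = class_double, end = class_double),
  validator = function(self) {
    if (vec_size(self@start) != vec_size(self@end)) {
      "@start and @end must have the same size"
    } else if (any(self@end < self@start)) {
      "every @end must be >= its @start"
    }
  }
)

# User-facing helper: cast to double, then recycle (size 1 or equal sizes only)
ranges <- function(start = double(), end = double()) {
  start <- vec_cast(start, double())
  end <- vec_cast(end, double())
  size <- vec_size_common(start, end)
  Ranges(start = vec_recycle(start, size), end = vec_recycle(end, size))
}

r <- ranges(c(1, 5), 10)  # end is recycled to c(10, 10)
r@end
```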

When to Use Each System

Use S7 for:

  • New projects needing formal OOP
  • Class validation and type safety
  • Multiple dispatch
  • Better developer experience

Use vctrs for:

  • Vector-like classes
  • Data frame columns
  • Type-stable operations
  • Tidyverse integration

Use S3 for:

  • Simple classes
  • Maximum compatibility
  • Existing S3 ecosystems
  • Quick prototypes

Use S4 for:

  • Bioconductor packages
  • Complex multiple inheritance
  • Existing S4 codebases

Use R6 for:

  • Mutable state
  • Reference semantics
  • Encapsulation needs
  • Coming from OOP languages