# Compute Unit Optimization Guide
This guide provides comprehensive techniques for optimizing compute unit (CU) usage in Solana native Rust programs, compiled from official Solana documentation, community repositories, and expert resources.
## Understanding Compute Units
### Compute Limits
Solana enforces strict compute budgets to ensure network performance:
- **Max CU per block**: 60 million CU
- **Max CU per writable account per block**: 12 million CU
- **Max CU per transaction**: 1.4 million CU
- **Default CU limit**: 200,000 CU per instruction (when no limit is explicitly requested)
Programs can request higher compute budgets using the Compute Budget program, up to the 1.4M hard limit.
### Transaction Fees
Transaction fees consist of two components:
1. **Base fee**: 5,000 lamports per signature (fixed, independent of CU usage)
2. **Priority fee**: Optional additional fee to prioritize transaction inclusion
Priority fees are calculated as:
```
priority_fee = microLamports_per_CU × requested_compute_units
```
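The result is in micro-lamports (1,000,000 micro-lamports = 1 lamport). On the client side, both values are set through the Compute Budget program; the sketch below uses the `solana-sdk` crate, and the 400,000 CU limit and 1 micro-lamport price are arbitrary example values:
```rust
use solana_sdk::compute_budget::ComputeBudgetInstruction;

// Request a 400k CU budget and bid 1 micro-lamport per CU.
// Priority fee = 1 × 400_000 = 400_000 micro-lamports = 0.4 lamports,
// paid on top of the 5,000-lamport base fee per signature.
let limit_ix = ComputeBudgetInstruction::set_compute_unit_limit(400_000);
let price_ix = ComputeBudgetInstruction::set_compute_unit_price(1);
// Prepend both instructions to the transaction ahead of the program instruction.
```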
### Why Optimize CU Usage?
Even though current fees don't scale with CU usage within the budget, optimization matters:
1. **Block inclusion probability**: Smaller transactions are more likely to fit in congested blocks
2. **Composability**: When your program is called via CPI, it shares the caller's CU budget
3. **Efficient resource usage**: Better utilization of limited block space
4. **Future-proofing**: Fee structures may change to account for actual CU consumption
5. **User experience**: Faster transaction execution and lower rejection rates
## Common Optimization Techniques
### 1. Logging Optimization (Highest Impact)
Logging is one of the most expensive operations in Solana programs.
**Anti-patterns:**
```rust
// EXPENSIVE: 11,962 CU
// Base58 encoding + string concatenation
msg!("A string {0}", ctx.accounts.counter.to_account_info().key());
// EXPENSIVE: 357 CU
// String concatenation
msg!("A string {0}", "5w6z5PWvtkCd4PaAV7avxE6Fy5brhZsFdbRLMt8UefRQ");
```
**Best practices:**
```rust
// EFFICIENT: 262 CU
// Use .key().log() directly
ctx.accounts.counter.to_account_info().key().log();
// BETTER: 206 CU
// Store in variable first
let pubkey = ctx.accounts.counter.to_account_info().key();
pubkey.log();
// CHEAPEST: 204 CU
// Simple string logging
msg!("Compute units");
```
**Recommendation**: Avoid logging in production unless absolutely necessary for debugging. Remove or conditionally compile logging for mainnet deployments.
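A common pattern is to gate logs behind a Cargo feature so they never ship in mainnet builds (a minimal sketch; the `debug` feature name is only a convention, matching the anti-patterns section below):
```toml
# Cargo.toml of the program crate
[features]
debug = []
```
```rust
// Compiled in only when building with `--features debug`
#[cfg(feature = "debug")]
msg!("processing instruction");
```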
### 2. Data Type Optimization
Smaller data types consume fewer compute units.
**Comparison:**
```rust
// 618 CU - u64
let mut a: Vec<u64> = Vec::new();
for _ in 0..6 {
a.push(1);
}
// 600 CU - i32 (default integer type)
let mut a = Vec::new();
for _ in 0..6 {
a.push(1);
}
// 459 CU - u8 (best for small values)
let mut a: Vec<u8> = Vec::new();
for _ in 0..6 {
a.push(1);
}
```
**Initialization vs pushing:**
```rust
// 357 CU - Pushing elements one by one
let mut a: Vec<u64> = Vec::new();
for _ in 0..6 {
a.push(1);
}
// 125 CU - Direct initialization (65% savings!)
let _a: Vec<u64> = vec![1, 1, 1, 1, 1, 1];
```
**Best practice**: Use the smallest data type that fits your requirements (u8 > u16 > u32 > u64), and prefer `vec![]` initialization over repeated `push()` calls.
### 3. Serialization: Zero-Copy vs Borsh
Zero-copy deserialization can provide massive CU savings for account operations.
**Standard Borsh serialization:**
```rust
// 6,302 CU - Standard account initialization
pub fn initialize(_ctx: Context<InitializeCounter>) -> Result<()> {
Ok(())
}
// 2,600 CU total for increment (including serialization overhead)
pub fn increment(ctx: Context<Increment>) -> Result<()> {
let counter = &mut ctx.accounts.counter;
counter.count = counter.count.checked_add(1).unwrap(); // 108 CU for operation
Ok(())
}
```
**Zero-copy optimization:**
```rust
// 5,020 CU - Zero-copy initialization (20% savings)
pub fn initialize_zero_copy(_ctx: Context<InitializeCounterZeroCopy>) -> Result<()> {
Ok(())
}
// 1,254 CU total for increment (52% savings!)
pub fn increment_zero_copy(ctx: Context<IncrementZeroCopy>) -> Result<()> {
let counter = &mut ctx.accounts.counter_zero_copy.load_mut()?;
counter.count = counter.count.checked_add(1).unwrap(); // 151 CU for operation
Ok(())
}
```
**Zero-copy account definition:**
```rust
#[account(zero_copy)]
#[repr(C)]
#[derive(InitSpace)]
pub struct CounterZeroCopy {
count: u64,
authority: Pubkey,
big_struct: BigStruct, // Can include large structs without stack overflow
}
```
**Benefits of zero-copy:**
- 50%+ CU savings on serialization/deserialization
- Avoids stack frame violations with large account structures
- Direct memory access without intermediate copying
- Particularly valuable for frequently updated accounts
**Trade-off**: Slightly more complex API (`load()`, `load_mut()`) and requires `#[repr(C)]` for memory layout guarantees.
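For reference, a zero-copy account is accessed through `AccountLoader` rather than `Account` in the Accounts struct; a minimal sketch matching the `increment_zero_copy` handler above (constraints other than `mut` omitted):
```rust
#[derive(Accounts)]
pub struct IncrementZeroCopy<'info> {
    // AccountLoader defers deserialization; the handler calls load_mut()
    // for direct, mutable access to the account's bytes.
    #[account(mut)]
    pub counter_zero_copy: AccountLoader<'info, CounterZeroCopy>,
}
```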
### 4. Program Derived Addresses (PDAs)
PDA operations vary significantly in cost depending on the method used.
**Finding PDAs:**
```rust
// EXPENSIVE: 12,136 CU
// Iterates through nonces to find valid bump seed
let (pda, bump) = Pubkey::find_program_address(&[b"counter"], ctx.program_id);
// EFFICIENT: 1,651 CU (87% savings!)
// Uses known bump seed directly
let pda = Pubkey::create_program_address(&[b"counter", &[248_u8]], ctx.program_id).unwrap();
```
**Optimization strategy:**
1. Use `find_program_address()` **once** during account initialization
2. Save the bump seed in the account data
3. Use `create_program_address()` with the saved bump for all subsequent operations
**Anchor implementation:**
```rust
// Account structure - save the bump
#[account]
pub struct CounterData {
pub count: u64,
pub bump: u8, // Store the bump seed here
}
// EXPENSIVE: 12,136 CU - Without saved bump
#[account(
seeds = [b"counter"],
bump // Anchor finds it every time
)]
pub counter_checked: Account<'info, CounterData>,
// EFFICIENT: 1,600 CU - With saved bump (87% savings!)
#[account(
seeds = [b"counter"],
bump = counter_checked.bump // Use the saved bump
)]
pub counter_checked: Account<'info, CounterData>,
```
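The same pattern applies in native Rust: call `find_program_address` once when the account is created, persist the bump in the account data, and verify with `create_program_address` on later instructions. A minimal sketch (the seed and function name are illustrative):
```rust
use solana_program::{program_error::ProgramError, pubkey::Pubkey};

/// `stored_bump` was written into the account when the PDA was initialized.
fn verify_counter_pda(
    program_id: &Pubkey,
    counter_key: &Pubkey,
    stored_bump: u8,
) -> Result<(), ProgramError> {
    // Cheap: no nonce search, just one derivation with the known bump.
    let expected = Pubkey::create_program_address(&[b"counter", &[stored_bump]], program_id)
        .map_err(|_| ProgramError::InvalidSeeds)?;
    if expected != *counter_key {
        return Err(ProgramError::InvalidSeeds);
    }
    Ok(())
}
```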
### 5. Cross-Program Invocations (CPIs)
CPIs add significant overhead compared to direct operations.
**CPI to System Program:**
```rust
// 2,215 CU - CPI for SOL transfer
let cpi_context = CpiContext::new(
ctx.accounts.system_program.to_account_info(),
system_program::Transfer {
from: ctx.accounts.payer.to_account_info().clone(),
to: ctx.accounts.counter.to_account_info().clone(),
},
);
system_program::transfer(cpi_context, 1_000_000)?;
```
**Direct lamport manipulation:**
```rust
// 251 CU - Direct operation (90% savings!)
let counter_account_info = ctx.accounts.counter.to_account_info();
let mut counter_lamports = counter_account_info.try_borrow_mut_lamports()?;
**counter_lamports += 1_000_000;
let payer_account_info = ctx.accounts.payer.to_account_info();
let mut payer_lamports = payer_account_info.try_borrow_mut_lamports()?;
**payer_lamports -= 1_000_000;
```
**Important caveats:**
1. **Error handling overhead**: Error paths add ~1,199 CU if triggered
2. **Safety**: Direct manipulation bypasses safety checks in the System Program
3. **Ownership**: Only safe when you control both accounts
4. **Rent exemption**: You're responsible for maintaining rent exemption
**Best practice**: Use CPIs for safety and correctness by default. Only optimize to direct manipulation when:
- You have tight CU constraints
- You fully understand the safety implications
- Both accounts are controlled by your program
### 6. Pass by Reference vs Clone
Solana's bump allocator doesn't free memory, making unnecessary cloning particularly problematic.
**Comparison:**
```rust
let balances = vec![10_u64; 100];
let mut sum_reference: u64 = 0;
let mut sum_clone: u64 = 0;

// EFFICIENT: 47,683 CU - Pass by reference
fn sum_by_reference(data: &Vec<u64>) -> u64 {
    data.iter().sum()
}
for _ in 0..39 {
    sum_reference += sum_by_reference(&balances);
}

// INEFFICIENT: 49,322 CU - Clone data (3.5% more expensive)
// WARNING: Runs out of memory at 40+ iterations!
fn sum_by_value(data: Vec<u64>) -> u64 {
    data.iter().sum()
}
for _ in 0..39 {
    sum_clone += sum_by_value(balances.clone());
}
```
**Memory concern**: Solana programs have a 32KB heap using a bump allocator that **never frees memory** during transaction execution. Excessive cloning leads to out-of-memory errors.
**Best practice**: Always pass by reference (`&T`) unless you explicitly need ownership transfer. Use `Copy` types for small data.
### 7. Checked Math vs Unchecked Operations
Checked arithmetic adds safety at the cost of compute units.
**Comparison:**
```rust
let mut count: u64 = 1;

// 97,314 CU - Checked multiplication with overflow protection
// (note: starting from 1, the value overflows u64 after 64 doublings,
// so a real benchmark must keep the operand in range)
for _ in 0..12000 {
    count = count.checked_mul(2).expect("overflow");
}

// 85,113 CU - Bit shift operation (12% savings)
// Equivalent to multiply by 2, but unchecked
for _ in 0..12000 {
    count = count << 1;
}
```
**Trade-off**: Unchecked operations are faster but risk overflow bugs that can lead to serious security vulnerabilities.
**Best practice**:
- Use checked math by default for safety
- Profile your program to identify hot paths
- Only switch to unchecked math when:
- You've proven overflow is impossible
- CU savings are critical
- You've added overflow tests
**Compiler configuration** (in Cargo.toml):
```toml
[profile.release]
overflow-checks = true # Keep overflow checks even in release mode
```
## Framework Comparison
Different implementation approaches offer varying trade-offs between developer experience, safety, and performance.
| Implementation | Binary Size | Deploy Cost | Init CU | Increment CU |
|---------------|-------------|-------------|---------|--------------|
| **Anchor** | 265,677 bytes | 1.85 SOL | 6,302 | 946 |
| **Anchor Zero-Copy** | Same | 1.85 SOL | 5,020 | ~1,254 |
| **Native Rust** | 48,573 bytes | 0.34 SOL | - | 843 |
| **Unsafe Rust** | 973 bytes | 0.008 SOL | - | 5 |
| **Assembly (SBPF)** | 1,389 bytes | 0.01 SOL | - | 4 |
| **C** | 1,333 bytes | 0.01 SOL | - | 5 |
**Key insights:**
- **Anchor**: Best developer experience, automatic account validation, but highest CU and deployment costs
- **Anchor Zero-Copy**: Significant CU improvement over standard Anchor with minimal code changes
- **Native Rust**: 11% CU savings over Anchor, 82% smaller deployment size, moderate complexity
- **Unsafe Rust**: 99% CU savings, minimal size, but requires extreme care and deep expertise
- **Assembly/C**: Maximum optimization possible, but very difficult to develop and maintain
**Recommendation**: Start with Anchor or native Rust. Optimize hot paths with zero-copy. Only consider unsafe Rust or lower-level languages for critical performance bottlenecks after profiling.
## Advanced Optimization Techniques
### 1. Compiler Flags
Configure optimization in `Cargo.toml`:
```toml
[profile.release]
opt-level = 3 # Maximum optimization
lto = "fat" # Full link-time optimization
codegen-units = 1 # Single codegen unit for better optimization
overflow-checks = true # Keep safety checks despite performance cost
```
**Trade-offs**:
- `overflow-checks = false`: Saves CU but removes critical safety checks
- Higher `opt-level`: Better performance but slower compilation
- `lto = "fat"`: Maximum optimization but much slower builds
### 2. Function Inlining
Control function inlining to balance CU usage and stack space:
```rust
// Force inlining - saves CU by eliminating function call overhead
#[inline(always)]
fn add(a: u64, b: u64) -> u64 {
a + b
}
// Prevent inlining - saves stack space at the cost of CU
#[inline(never)]
pub fn complex_operation() {
// Large function body
}
```
**Trade-off**: Inlining saves CU but increases stack usage. Solana has a 4KB stack limit, so excessive inlining can cause stack overflow.
### 3. Alternative Entry Points
The standard Solana entry point adds overhead. Alternatives:
**Standard entry point:**
```rust
use solana_program::entrypoint;
entrypoint!(process_instruction);
```
**Minimal entry points:**
- [solana-nostd-entrypoint](https://github.com/cavemanloverboy/solana-nostd-entrypoint): Ultra-minimal entry using unsafe Rust
- [eisodos](https://github.com/anza-xyz/eisodos): Alternative minimal entry point
**Warning**: These require deep understanding of Solana internals and unsafe Rust. Only use for extreme optimization needs.
### 4. Custom Heap Allocators
Solana's default bump allocator never frees memory during transaction execution.
**Problem:**
```rust
// This will eventually run out of heap space (32KB limit)
for _ in 0..1000 {
let v = vec![0u8; 1024]; // Each iteration uses more heap
// Memory is never freed!
}
```
**Solution - Custom allocators:**
- **smalloc**: Used by Metaplex programs, provides better memory management
- Prevents out-of-memory errors in memory-intensive operations
**Implementation** (advanced):
```rust
// Illustrative only: `custom_allocator::CustomAllocator` is a placeholder for
// whichever allocator implementation you adopt (e.g. smalloc).
#[global_allocator]
static ALLOCATOR: custom_allocator::CustomAllocator = custom_allocator::CustomAllocator;
```
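For orientation, a custom allocator is simply a type implementing `GlobalAlloc` registered with `#[global_allocator]`. The sketch below re-creates the same bump strategy as the default allocator purely to show the wiring (it adds no reclamation, unlike smalloc) and assumes the `custom-heap` feature is enabled on `solana-program` so that `entrypoint!` does not register its own allocator:
```rust
use std::alloc::{GlobalAlloc, Layout};
use solana_program::entrypoint::{HEAP_LENGTH, HEAP_START_ADDRESS};

pub struct MyBumpAllocator;

unsafe impl GlobalAlloc for MyBumpAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // The first word of the heap region tracks the current bump position.
        let pos_ptr = HEAP_START_ADDRESS as usize as *mut usize;
        let mut pos = *pos_ptr;
        if pos == 0 {
            // First allocation: start at the top of the 32KB heap and grow downward.
            pos = HEAP_START_ADDRESS as usize + HEAP_LENGTH;
        }
        pos = pos.saturating_sub(layout.size());
        pos &= !(layout.align().wrapping_sub(1));
        if pos < HEAP_START_ADDRESS as usize + core::mem::size_of::<usize>() {
            return core::ptr::null_mut(); // out of heap space
        }
        *pos_ptr = pos;
        pos as *mut u8
    }

    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
        // Bump strategy: nothing is ever freed during the transaction.
    }
}

#[cfg(target_os = "solana")]
#[global_allocator]
static ALLOCATOR: MyBumpAllocator = MyBumpAllocator;
```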
### 5. Boxing and Heap Allocation
Heap operations cost more CU than stack operations.
```rust
// Stack allocation - faster
let data = [0u8; 100];
// Heap allocation - slower, uses more CU
let data = Box::new([0u8; 100]);
```
**Best practice**: Avoid `Box`, `Vec`, and other heap allocations when stack allocation is possible and doesn't risk overflow.
## Measuring Compute Units
### Using sol_log_compute_units()
Built-in logging function to track CU consumption:
```rust
use solana_program::log::sol_log_compute_units;
pub fn my_instruction(ctx: Context<MyContext>) -> Result<()> {
sol_log_compute_units(); // Log remaining CU
// ... do some work ...
sol_log_compute_units(); // Log remaining CU again
Ok(())
}
```
**Output in transaction logs:**
```
Program consumption: 200000 units remaining
Program consumption: 195432 units remaining
```
**CU used = 200000 - 195432 = 4,568 CU**
### compute_fn! Macro
Convenient macro for measuring specific code blocks (costs 409 CU overhead):
```rust
#[macro_export]
macro_rules! compute_fn {
    // The doubled braces make the expansion a single block expression,
    // so the macro can be used on the right-hand side of a `let`.
    ($msg:expr=> $($tt:tt)*) => {{
        ::solana_program::msg!(concat!($msg, " {"));
        ::solana_program::log::sol_log_compute_units();
        let res = { $($tt)* };
        ::solana_program::log::sol_log_compute_units();
        ::solana_program::msg!(concat!(" } // ", $msg));
        res
    }};
}
```
**Usage:**
```rust
let result = compute_fn! { "My expensive operation" =>
expensive_computation()
};
```
**Output:**
```
Program log: My expensive operation {
Program consumption: 195432 units remaining
Program consumption: 180123 units remaining
Program log: } // My expensive operation
```
**Actual CU = (195432 - 180123) - 409 (macro overhead) = 14,900 CU**
### Using Mollusk Bencher
For native Rust programs, use Mollusk's built-in benchmarking (see main SKILL.md for details).
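For orientation, a Mollusk compute-unit bench looks roughly like the sketch below; `program_id`, `instruction`, and `accounts` are placeholders built elsewhere, and the exact builder API should be checked against the Mollusk README:
```rust
use mollusk_svm::Mollusk;
use mollusk_svm_bencher::MolluskComputeUnitBencher;

// Load the compiled program (target/deploy/my_program.so) into the test harness.
let mollusk = Mollusk::new(&program_id, "my_program");

// Each bench entry executes the instruction and records its CU cost in a report.
MolluskComputeUnitBencher::new(mollusk)
    .bench(("increment", &instruction, &accounts))
    .must_pass(true)
    .out_dir("../target/benches")
    .execute();
```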
## Anti-Patterns to Avoid
### 1. Excessive Logging
```rust
// BAD: Logging in production
msg!("Processing user {}", user_pubkey);
msg!("Amount: {}", amount);
msg!("Timestamp: {}", Clock::get()?.unix_timestamp);
```
**Solution**: Remove logging or use conditional compilation:
```rust
#[cfg(feature = "debug")]
msg!("Processing user {}", user_pubkey);
```
### 2. Large Data Types for Small Values
```rust
// BAD: Using u64 when u8 suffices
pub struct Config {
pub fee_percentage: u64, // Only 0-100
pub max_items: u64, // Only 0-255
}
// GOOD: Use smallest type
pub struct Config {
pub fee_percentage: u8, // 0-100
pub max_items: u8, // 0-255
}
```
### 3. Cloning Large Structures
```rust
// BAD: Unnecessary clone
fn process_data(data: Vec<u8>) -> Result<()> {
let copy = data.clone(); // Wastes CU and heap
// ...
}
// GOOD: Pass by reference
fn process_data(data: &[u8]) -> Result<()> {
// Work directly with reference
}
```
### 4. Repeated PDA Derivation
```rust
// BAD: Finding bump every time
#[account(
seeds = [b"vault"],
bump // Finds bump on every call!
)]
pub vault: Account<'info, Vault>,
// GOOD: Use saved bump
#[account(
seeds = [b"vault"],
bump = vault.bump // Uses saved bump
)]
pub vault: Account<'info, Vault>,
```
### 5. Unnecessary Boxing
```rust
// BAD: Boxing adds heap overhead
let value = Box::new(calculate_value());
// GOOD: Keep on stack
let value = calculate_value();
```
### 6. String Operations
```rust
// BAD: String concatenation and formatting
let message = format!("User {} sent {} tokens", user, amount);
msg!(&message);

// GOOD: Use cheaper log primitives or remove entirely
user.log(); // Pubkey::log()
solana_program::log::sol_log_64(amount, 0, 0, 0, 0); // raw u64 logging
```
### 7. Deep CPI Chains
Each CPI adds significant overhead. Avoid unnecessary indirection:
```rust
// BAD: Unnecessary CPI
invoke(
&my_helper_program::process(),
&accounts,
)?;
// GOOD: Direct implementation
process_directly(&accounts)?;
```
### 8. Not Using Zero-Copy for Large Accounts
```rust
// BAD: Large account with standard serialization
#[account]
pub struct LargeData {
pub items: [u64; 1000], // Expensive to serialize/deserialize
}
// GOOD: Use zero-copy
#[account(zero_copy)]
#[repr(C)]
pub struct LargeData {
pub items: [u64; 1000], // Direct memory access
}
```
## Best Practices Summary
1. **Minimize or eliminate logging** in production code
2. **Use zero-copy** for accounts with large data structures
3. **Cache PDA bumps** - derive once, store in account, reuse
4. **Choose smallest data types** that meet your requirements
5. **Pass by reference** instead of cloning data
6. **Profile before optimizing** - measure CU usage to identify bottlenecks
7. **Consider native Rust** over Anchor for performance-critical programs
8. **Use `vec![]` initialization** instead of repeated `push()` calls
9. **Avoid unnecessary CPIs** - use direct operations when safe
10. **Balance safety vs performance** - don't sacrifice security without careful analysis
11. **Test CU usage** regularly - include benchmarks in your test suite
12. **Use checked math by default** - only optimize to unchecked when proven safe
13. **Minimize heap allocations** - prefer stack when possible
14. **Remove or conditionally compile debug code** for production builds
15. **Consider zero-copy for frequently updated accounts** - 50%+ CU savings
## Additional Resources
### Official Documentation
- [How to Optimize Compute](https://solana.com/developers/guides/advanced/how-to-optimize-compute)
- [Solana Compute Budget Documentation](https://github.com/solana-labs/solana/blob/090e11210aa7222d8295610a6ccac4acda711bb9/program-runtime/src/compute_budget.rs#L26-L87)
### Code Examples and Tools
- [solana-developers/cu_optimizations](https://github.com/solana-developers/cu_optimizations) - Official examples with benchmarks
- [hetdagli234/optimising-solana-programs](https://github.com/hetdagli234/optimising-solana-programs) - Community optimization examples
### Video Guides
- [How to optimize CU in programs](https://www.youtube.com/watch?v=7CbAK7Oq_o4)
- [Program optimization Part 1](https://www.youtube.com/watch?v=xoJ-3NkYXfY)
- [Program optimization Part 2 - Advanced](https://www.youtube.com/watch?v=Pwly1cOa2hg)
- [Writing Solana programs in Assembly](https://www.youtube.com/watch?v=eacDC0VgyxI)
### Technical Articles
- [RareSkills: Solana Compute Unit Price](https://rareskills.io/post/solana-compute-unit-price)
- [Understanding Solana Compute Units](https://www.helius.dev/blog/priority-fees-understanding-solanas-transaction-fee-mechanics)
### Advanced Tools
- [solana-nostd-entrypoint](https://github.com/cavemanloverboy/solana-nostd-entrypoint) - Minimal entry point
- [Mollusk](https://github.com/anza-xyz/mollusk) - Fast testing with CU benchmarking