# Compute Unit Optimization Guide
This guide provides comprehensive techniques for optimizing compute unit (CU) usage in Solana native Rust programs, compiled from official Solana documentation, community repositories, and expert resources.
## Understanding Compute Units
### Compute Limits
Solana enforces strict compute budgets to ensure network performance:
- **Max CU per block**: 60 million CU
- **Max CU per writable account per block**: 12 million CU
- **Max CU per transaction**: 1.4 million CU
- **Default CU limit**: 200,000 CU per instruction (when no limit is explicitly requested)
Programs can request higher compute budgets using the Compute Budget program, up to the 1.4M hard limit.
### Transaction Fees
Transaction fees consist of two components:
1. **Base fee**: 5,000 lamports per signature (fixed, independent of CU usage)
2. **Priority fee**: Optional additional fee to prioritize transaction inclusion
Priority fees are calculated as:
```
priority_fee = microLamports_per_CU × requested_compute_units
```
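The result is in micro-lamports (1,000,000 micro-lamports = 1 lamport). On the client side, both values are set through the Compute Budget program; the sketch below uses the `solana-sdk` crate, and the 400,000 CU limit and 1 micro-lamport price are arbitrary example values:
```rust
use solana_sdk::compute_budget::ComputeBudgetInstruction;

// Request a 400k CU budget and bid 1 micro-lamport per CU.
// Priority fee = 1 × 400_000 = 400_000 micro-lamports = 0.4 lamports,
// paid on top of the 5,000-lamport base fee per signature.
let limit_ix = ComputeBudgetInstruction::set_compute_unit_limit(400_000);
let price_ix = ComputeBudgetInstruction::set_compute_unit_price(1);
// Prepend both instructions to the transaction ahead of the program instruction.
```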
### Why Optimize CU Usage?
Even though current fees don't scale with CU usage within the budget, optimization matters:
1. **Block inclusion probability**: Smaller transactions are more likely to fit in congested blocks
2. **Composability**: When your program is called via CPI, it shares the caller's CU budget
3. **Efficient resource usage**: Better utilization of limited block space
4. **Future-proofing**: Fee structures may change to account for actual CU consumption
5. **User experience**: Faster transaction execution and lower rejection rates
## Common Optimization Techniques
### 1. Logging Optimization (Highest Impact)
Logging is one of the most expensive operations in Solana programs.
**Anti-patterns:**
```rust
// EXPENSIVE: 11,962 CU
// Base58 encoding + string concatenation
msg!("A string {0}", ctx.accounts.counter.to_account_info().key());
// EXPENSIVE: 357 CU
// String concatenation
msg!("A string {0}", "5w6z5PWvtkCd4PaAV7avxE6Fy5brhZsFdbRLMt8UefRQ");
```
**Best practices:**
```rust
// EFFICIENT: 262 CU
// Use .key().log() directly
ctx.accounts.counter.to_account_info().key().log();
// BETTER: 206 CU
// Store in variable first
let pubkey = ctx.accounts.counter.to_account_info().key();
pubkey.log();
// CHEAPEST: 204 CU
// Simple string logging
msg!("Compute units");
```
**Recommendation**: Avoid logging in production unless absolutely necessary for debugging. Remove or conditionally compile logging for mainnet deployments.
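A common pattern is to gate logs behind a Cargo feature so they never ship in mainnet builds (a minimal sketch; the `debug` feature name is only a convention, matching the anti-patterns section below):
```toml
# Cargo.toml of the program crate
[features]
debug = []
```
```rust
// Compiled in only when building with `--features debug`
#[cfg(feature = "debug")]
msg!("processing instruction");
```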
### 2. Data Type Optimization
Smaller data types consume fewer compute units.
**Comparison:**
```rust
// 618 CU - u64
let mut a: Vec<u64> = Vec::new();
for _ in 0..6 {
a.push(1);
}
// 600 CU - i32 (default integer type)
let mut a = Vec::new();
for _ in 0..6 {
a.push(1);
}
// 459 CU - u8 (best for small values)
let mut a: Vec<u8> = Vec::new();
for _ in 0..6 {
a.push(1);
}
```
**Initialization vs pushing:**
```rust
// 357 CU - Pushing elements one by one
let mut a: Vec<u64> = Vec::new();
for _ in 0..6 {
a.push(1);
}
// 125 CU - Direct initialization (65% savings!)
let _a: Vec<u64> = vec![1, 1, 1, 1, 1, 1];
```
**Best practice**: Use the smallest data type that fits your requirements (u8 > u16 > u32 > u64), and prefer `vec![]` initialization over repeated `push()` calls.
### 3. Serialization: Zero-Copy vs Borsh
Zero-copy deserialization can provide massive CU savings for account operations.
**Standard Borsh serialization:**
```rust
// 6,302 CU - Standard account initialization
pub fn initialize(_ctx: Context<InitializeCounter>) -> Result<()> {
Ok(())
}
// 2,600 CU total for increment (including serialization overhead)
pub fn increment(ctx: Context<Increment>) -> Result<()> {
let counter = &mut ctx.accounts.counter;
counter.count = counter.count.checked_add(1).unwrap(); // 108 CU for operation
Ok(())
}
```
**Zero-copy optimization:**
```rust
// 5,020 CU - Zero-copy initialization (20% savings)
pub fn initialize_zero_copy(_ctx: Context<InitializeCounterZeroCopy>) -> Result<()> {
Ok(())
}
// 1,254 CU total for increment (52% savings!)
pub fn increment_zero_copy(ctx: Context<IncrementZeroCopy>) -> Result<()> {
let counter = &mut ctx.accounts.counter_zero_copy.load_mut()?;
counter.count = counter.count.checked_add(1).unwrap(); // 151 CU for operation
Ok(())
}
```
**Zero-copy account definition:**
```rust
#[account(zero_copy)]
#[repr(C)]
#[derive(InitSpace)]
pub struct CounterZeroCopy {
count: u64,
authority: Pubkey,
big_struct: BigStruct, // Can include large structs without stack overflow
}
```
**Benefits of zero-copy:**
- 50%+ CU savings on serialization/deserialization
- Avoids stack frame violations with large account structures
- Direct memory access without intermediate copying
- Particularly valuable for frequently updated accounts
**Trade-off**: Slightly more complex API (`load()`, `load_mut()`) and requires `#[repr(C)]` for memory layout guarantees.
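For reference, a zero-copy account is accessed through `AccountLoader` rather than `Account` in the Accounts struct; a minimal sketch matching the `increment_zero_copy` handler above (constraints other than `mut` omitted):
```rust
#[derive(Accounts)]
pub struct IncrementZeroCopy<'info> {
    // AccountLoader defers deserialization; the handler calls load_mut()
    // for direct, mutable access to the account's bytes.
    #[account(mut)]
    pub counter_zero_copy: AccountLoader<'info, CounterZeroCopy>,
}
```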
### 4. Program Derived Addresses (PDAs)
PDA operations vary significantly in cost depending on the method used.
**Finding PDAs:**
```rust
// EXPENSIVE: 12,136 CU
// Iterates through nonces to find valid bump seed
let (pda, bump) = Pubkey::find_program_address(&[b"counter"], ctx.program_id);
// EFFICIENT: 1,651 CU (87% savings!)
// Uses known bump seed directly
let pda = Pubkey::create_program_address(&[b"counter", &[248_u8]], ctx.program_id).unwrap();
```
**Optimization strategy:**
1. Use `find_program_address()` **once** during account initialization
2. Save the bump seed in the account data
3. Use `create_program_address()` with the saved bump for all subsequent operations
**Anchor implementation:**
```rust
// Account structure - save the bump
#[account]
pub struct CounterData {
pub count: u64,
pub bump: u8, // Store the bump seed here
}
// EXPENSIVE: 12,136 CU - Without saved bump
#[account(
seeds = [b"counter"],
bump // Anchor finds it every time
)]
pub counter_checked: Account<'info, CounterData>,
// EFFICIENT: 1,600 CU - With saved bump (87% savings!)
#[account(
seeds = [b"counter"],
bump = counter_checked.bump // Use the saved bump
)]
pub counter_checked: Account<'info, CounterData>,
```
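The same pattern applies in native Rust: call `find_program_address` once when the account is created, persist the bump in the account data, and verify with `create_program_address` on later instructions. A minimal sketch (the seed and function name are illustrative):
```rust
use solana_program::{program_error::ProgramError, pubkey::Pubkey};

/// `stored_bump` was written into the account when the PDA was initialized.
fn verify_counter_pda(
    program_id: &Pubkey,
    counter_key: &Pubkey,
    stored_bump: u8,
) -> Result<(), ProgramError> {
    // Cheap: no nonce search, just one derivation with the known bump.
    let expected = Pubkey::create_program_address(&[b"counter", &[stored_bump]], program_id)
        .map_err(|_| ProgramError::InvalidSeeds)?;
    if expected != *counter_key {
        return Err(ProgramError::InvalidSeeds);
    }
    Ok(())
}
```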
### 5. Cross-Program Invocations (CPIs)
CPIs add significant overhead compared to direct operations.
**CPI to System Program:**
```rust
// 2,215 CU - CPI for SOL transfer
let cpi_context = CpiContext::new(
ctx.accounts.system_program.to_account_info(),
system_program::Transfer {
from: ctx.accounts.payer.to_account_info().clone(),
to: ctx.accounts.counter.to_account_info().clone(),
},
);
system_program::transfer(cpi_context, 1_000_000)?;
```
**Direct lamport manipulation:**
```rust
// 251 CU - Direct operation (90% savings!)
let counter_account_info = ctx.accounts.counter.to_account_info();
let mut counter_lamports = counter_account_info.try_borrow_mut_lamports()?;
**counter_lamports += 1_000_000;
let payer_account_info = ctx.accounts.payer.to_account_info();
let mut payer_lamports = payer_account_info.try_borrow_mut_lamports()?;
**payer_lamports -= 1_000_000;
```
**Important caveats:**
1. **Error handling overhead**: Error paths add ~1,199 CU if triggered
2. **Safety**: Direct manipulation bypasses safety checks in the System Program
3. **Ownership**: Only safe when you control both accounts
4. **Rent exemption**: You're responsible for maintaining rent exemption
**Best practice**: Use CPIs for safety and correctness by default. Only optimize to direct manipulation when:
- You have tight CU constraints
- You fully understand the safety implications
- Both accounts are controlled by your program
### 6. Pass by Reference vs Clone
Solana's bump allocator doesn't free memory, making unnecessary cloning particularly problematic.
**Comparison:**
```rust
let balances = vec![10_u64; 100];
let mut sum_reference: u64 = 0;
let mut sum_clone: u64 = 0;

// EFFICIENT: 47,683 CU - Pass by reference
fn sum_by_reference(data: &Vec<u64>) -> u64 {
    data.iter().sum()
}
for _ in 0..39 {
    sum_reference += sum_by_reference(&balances);
}

// INEFFICIENT: 49,322 CU - Clone data (3.5% more expensive)
// WARNING: Runs out of memory at 40+ iterations!
fn sum_by_value(data: Vec<u64>) -> u64 {
    data.iter().sum()
}
for _ in 0..39 {
    sum_clone += sum_by_value(balances.clone());
}
```
**Memory concern**: Solana programs have a 32KB heap using a bump allocator that **never frees memory** during transaction execution. Excessive cloning leads to out-of-memory errors.
**Best practice**: Always pass by reference (`&T`) unless you explicitly need ownership transfer. Use `Copy` types for small data.
### 7. Checked Math vs Unchecked Operations
Checked arithmetic adds safety at the cost of compute units.
**Comparison:**
```rust
let mut count: u64 = 1;

// 97,314 CU - Checked multiplication with overflow protection
// (note: starting from 1, the value overflows u64 after 64 doublings,
// so a real benchmark must keep the operand in range)
for _ in 0..12000 {
    count = count.checked_mul(2).expect("overflow");
}

// 85,113 CU - Bit shift operation (12% savings)
// Equivalent to multiply by 2, but unchecked
for _ in 0..12000 {
    count = count << 1;
}
```
**Trade-off**: Unchecked operations are faster but risk overflow bugs that can lead to serious security vulnerabilities.
**Best practice**:
- Use checked math by default for safety
- Profile your program to identify hot paths
- Only switch to unchecked math when:
- You've proven overflow is impossible
- CU savings are critical
- You've added overflow tests
**Compiler configuration** (in Cargo.toml):
```toml
[profile.release]
overflow-checks = true # Keep overflow checks even in release mode
```
## Framework Comparison
Different implementation approaches offer varying trade-offs between developer experience, safety, and performance.
| Implementation | Binary Size | Deploy Cost | Init CU | Increment CU |
|---------------|-------------|-------------|---------|--------------|
| **Anchor** | 265,677 bytes | 1.85 SOL | 6,302 | 946 |
| **Anchor Zero-Copy** | Same | 1.85 SOL | 5,020 | ~1,254 |
| **Native Rust** | 48,573 bytes | 0.34 SOL | - | 843 |
| **Unsafe Rust** | 973 bytes | 0.008 SOL | - | 5 |
| **Assembly (SBPF)** | 1,389 bytes | 0.01 SOL | - | 4 |
| **C** | 1,333 bytes | 0.01 SOL | - | 5 |
**Key insights:**
- **Anchor**: Best developer experience, automatic account validation, but highest CU and deployment costs
- **Anchor Zero-Copy**: Significant CU improvement over standard Anchor with minimal code changes
- **Native Rust**: 11% CU savings over Anchor, 82% smaller deployment size, moderate complexity
- **Unsafe Rust**: 99% CU savings, minimal size, but requires extreme care and deep expertise
- **Assembly/C**: Maximum optimization possible, but very difficult to develop and maintain
**Recommendation**: Start with Anchor or native Rust. Optimize hot paths with zero-copy. Only consider unsafe Rust or lower-level languages for critical performance bottlenecks after profiling.
## Advanced Optimization Techniques
### 1. Compiler Flags
Configure optimization in `Cargo.toml`:
```toml
[profile.release]
opt-level = 3 # Maximum optimization
lto = "fat" # Full link-time optimization
codegen-units = 1 # Single codegen unit for better optimization
overflow-checks = true # Keep safety checks despite performance cost
```
**Trade-offs**:
- `overflow-checks = false`: Saves CU but removes critical safety checks
- Higher `opt-level`: Better performance but slower compilation
- `lto = "fat"`: Maximum optimization but much slower builds
### 2. Function Inlining
Control function inlining to balance CU usage and stack space:
```rust
// Force inlining - saves CU by eliminating function call overhead
#[inline(always)]
fn add(a: u64, b: u64) -> u64 {
a + b
}
// Prevent inlining - saves stack space at the cost of CU
#[inline(never)]
pub fn complex_operation() {
// Large function body
}
```
**Trade-off**: Inlining saves CU but increases stack usage. Solana has a 4KB stack limit, so excessive inlining can cause stack overflow.
### 3. Alternative Entry Points
The standard Solana entry point adds overhead. Alternatives:
**Standard entry point:**
```rust
use solana_program::entrypoint;
entrypoint!(process_instruction);
```
**Minimal entry points:**
- [solana-nostd-entrypoint](https://github.com/cavemanloverboy/solana-nostd-entrypoint): Ultra-minimal entry using unsafe Rust
- [eisodos](https://github.com/anza-xyz/eisodos): Alternative minimal entry point
**Warning**: These require deep understanding of Solana internals and unsafe Rust. Only use for extreme optimization needs.
### 4. Custom Heap Allocators
Solana's default bump allocator never frees memory during transaction execution.
**Problem:**
```rust
// This will eventually run out of heap space (32KB limit)
for _ in 0..1000 {
let v = vec![0u8; 1024]; // Each iteration uses more heap
// Memory is never freed!
}
```
**Solution - Custom allocators:**
- **smalloc**: Used by Metaplex programs, provides better memory management
- Prevents out-of-memory errors in memory-intensive operations
**Implementation** (advanced):
```rust
// Illustrative only: `custom_allocator::CustomAllocator` is a placeholder for
// whichever allocator implementation you adopt (e.g. smalloc).
#[global_allocator]
static ALLOCATOR: custom_allocator::CustomAllocator = custom_allocator::CustomAllocator;
```
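For orientation, a custom allocator is simply a type implementing `GlobalAlloc` registered with `#[global_allocator]`. The sketch below re-creates the same bump strategy as the default allocator purely to show the wiring (it adds no reclamation, unlike smalloc) and assumes the `custom-heap` feature is enabled on `solana-program` so that `entrypoint!` does not register its own allocator:
```rust
use std::alloc::{GlobalAlloc, Layout};
use solana_program::entrypoint::{HEAP_LENGTH, HEAP_START_ADDRESS};

pub struct MyBumpAllocator;

unsafe impl GlobalAlloc for MyBumpAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // The first word of the heap region tracks the current bump position.
        let pos_ptr = HEAP_START_ADDRESS as usize as *mut usize;
        let mut pos = *pos_ptr;
        if pos == 0 {
            // First allocation: start at the top of the 32KB heap and grow downward.
            pos = HEAP_START_ADDRESS as usize + HEAP_LENGTH;
        }
        pos = pos.saturating_sub(layout.size());
        pos &= !(layout.align().wrapping_sub(1));
        if pos < HEAP_START_ADDRESS as usize + core::mem::size_of::<usize>() {
            return core::ptr::null_mut(); // out of heap space
        }
        *pos_ptr = pos;
        pos as *mut u8
    }

    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
        // Bump strategy: nothing is ever freed during the transaction.
    }
}

#[cfg(target_os = "solana")]
#[global_allocator]
static ALLOCATOR: MyBumpAllocator = MyBumpAllocator;
```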
### 5. Boxing and Heap Allocation
Heap operations cost more CU than stack operations.
```rust
// Stack allocation - faster
let data = [0u8; 100];
// Heap allocation - slower, uses more CU
let data = Box::new([0u8; 100]);
```
**Best practice**: Avoid `Box`, `Vec`, and other heap allocations when stack allocation is possible and doesn't risk overflow.
## Measuring Compute Units
### Using sol_log_compute_units()
Built-in logging function to track CU consumption:
```rust
use solana_program::log::sol_log_compute_units;
pub fn my_instruction(ctx: Context<MyContext>) -> Result<()> {
sol_log_compute_units(); // Log remaining CU
// ... do some work ...
sol_log_compute_units(); // Log remaining CU again
Ok(())
}
```
**Output in transaction logs:**
```
Program consumption: 200000 units remaining
Program consumption: 195432 units remaining
```
**CU used = 200000 - 195432 = 4,568 CU**
### compute_fn! Macro
Convenient macro for measuring specific code blocks (costs 409 CU overhead):
```rust
#[macro_export]
macro_rules! compute_fn {
    // The doubled braces make the expansion a single block expression,
    // so the macro can be used on the right-hand side of a `let`.
    ($msg:expr=> $($tt:tt)*) => {{
        ::solana_program::msg!(concat!($msg, " {"));
        ::solana_program::log::sol_log_compute_units();
        let res = { $($tt)* };
        ::solana_program::log::sol_log_compute_units();
        ::solana_program::msg!(concat!(" } // ", $msg));
        res
    }};
}
```
**Usage:**
```rust
let result = compute_fn! { "My expensive operation" =>
expensive_computation()
};
```
**Output:**
```
Program log: My expensive operation {
Program consumption: 195432 units remaining
Program consumption: 180123 units remaining
Program log: } // My expensive operation
```
**Actual CU = (195432 - 180123) - 409 (macro overhead) = 14,900 CU**
### Using Mollusk Bencher
For native Rust programs, use Mollusk's built-in benchmarking (see main SKILL.md for details).
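For orientation, a Mollusk compute-unit bench looks roughly like the sketch below; `program_id`, `instruction`, and `accounts` are placeholders built elsewhere, and the exact builder API should be checked against the Mollusk README:
```rust
use mollusk_svm::Mollusk;
use mollusk_svm_bencher::MolluskComputeUnitBencher;

// Load the compiled program (target/deploy/my_program.so) into the test harness.
let mollusk = Mollusk::new(&program_id, "my_program");

// Each bench entry executes the instruction and records its CU cost in a report.
MolluskComputeUnitBencher::new(mollusk)
    .bench(("increment", &instruction, &accounts))
    .must_pass(true)
    .out_dir("../target/benches")
    .execute();
```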
## Anti-Patterns to Avoid
### 1. Excessive Logging
```rust
// BAD: Logging in production
msg!("Processing user {}", user_pubkey);
msg!("Amount: {}", amount);
msg!("Timestamp: {}", Clock::get()?.unix_timestamp);
```
**Solution**: Remove logging or use conditional compilation:
```rust
#[cfg(feature = "debug")]
msg!("Processing user {}", user_pubkey);
```
### 2. Large Data Types for Small Values
```rust
// BAD: Using u64 when u8 suffices
pub struct Config {
pub fee_percentage: u64, // Only 0-100
pub max_items: u64, // Only 0-255
}
// GOOD: Use smallest type
pub struct Config {
pub fee_percentage: u8, // 0-100
pub max_items: u8, // 0-255
}
```
### 3. Cloning Large Structures
```rust
// BAD: Unnecessary clone
fn process_data(data: Vec<u8>) -> Result<()> {
let copy = data.clone(); // Wastes CU and heap
// ...
}
// GOOD: Pass by reference
fn process_data(data: &[u8]) -> Result<()> {
// Work directly with reference
}
```
### 4. Repeated PDA Derivation
```rust
// BAD: Finding bump every time
#[account(
seeds = [b"vault"],
bump // Finds bump on every call!
)]
pub vault: Account<'info, Vault>,
// GOOD: Use saved bump
#[account(
seeds = [b"vault"],
bump = vault.bump // Uses saved bump
)]
pub vault: Account<'info, Vault>,
```
### 5. Unnecessary Boxing
```rust
// BAD: Boxing adds heap overhead
let value = Box::new(calculate_value());
// GOOD: Keep on stack
let value = calculate_value();
```
### 6. String Operations
```rust
// BAD: String concatenation and formatting
let message = format!("User {} sent {} tokens", user, amount);
msg!(&message);

// GOOD: Use cheaper log primitives or remove entirely
user.log(); // Pubkey::log()
solana_program::log::sol_log_64(amount, 0, 0, 0, 0); // raw u64 logging
```
### 7. Deep CPI Chains
Each CPI adds significant overhead. Avoid unnecessary indirection:
```rust
// BAD: Unnecessary CPI
invoke(
&my_helper_program::process(),
&accounts,
)?;
// GOOD: Direct implementation
process_directly(&accounts)?;
```
### 8. Not Using Zero-Copy for Large Accounts
```rust
// BAD: Large account with standard serialization
#[account]
pub struct LargeData {
pub items: [u64; 1000], // Expensive to serialize/deserialize
}
// GOOD: Use zero-copy
#[account(zero_copy)]
#[repr(C)]
pub struct LargeData {
pub items: [u64; 1000], // Direct memory access
}
```
## Best Practices Summary
1. **Minimize or eliminate logging** in production code
2. **Use zero-copy** for accounts with large data structures
3. **Cache PDA bumps** - derive once, store in account, reuse
4. **Choose smallest data types** that meet your requirements
5. **Pass by reference** instead of cloning data
6. **Profile before optimizing** - measure CU usage to identify bottlenecks
7. **Consider native Rust** over Anchor for performance-critical programs
8. **Use `vec![]` initialization** instead of repeated `push()` calls
9. **Avoid unnecessary CPIs** - use direct operations when safe
10. **Balance safety vs performance** - don't sacrifice security without careful analysis
11. **Test CU usage** regularly - include benchmarks in your test suite
12. **Use checked math by default** - only optimize to unchecked when proven safe
13. **Minimize heap allocations** - prefer stack when possible
14. **Remove or conditionally compile debug code** for production builds
15. **Consider zero-copy for frequently updated accounts** - 50%+ CU savings
## Additional Resources
### Official Documentation
- [How to Optimize Compute](https://solana.com/developers/guides/advanced/how-to-optimize-compute)
- [Solana Compute Budget Documentation](https://github.com/solana-labs/solana/blob/090e11210aa7222d8295610a6ccac4acda711bb9/program-runtime/src/compute_budget.rs#L26-L87)
### Code Examples and Tools
- [solana-developers/cu_optimizations](https://github.com/solana-developers/cu_optimizations) - Official examples with benchmarks
- [hetdagli234/optimising-solana-programs](https://github.com/hetdagli234/optimising-solana-programs) - Community optimization examples
### Video Guides
- [How to optimize CU in programs](https://www.youtube.com/watch?v=7CbAK7Oq_o4)
- [Program optimization Part 1](https://www.youtube.com/watch?v=xoJ-3NkYXfY)
- [Program optimization Part 2 - Advanced](https://www.youtube.com/watch?v=Pwly1cOa2hg)
- [Writing Solana programs in Assembly](https://www.youtube.com/watch?v=eacDC0VgyxI)
### Technical Articles
- [RareSkills: Solana Compute Unit Price](https://rareskills.io/post/solana-compute-unit-price)
- [Understanding Solana Compute Units](https://www.helius.dev/blog/priority-fees-understanding-solanas-transaction-fee-mechanics)
### Advanced Tools
- [solana-nostd-entrypoint](https://github.com/cavemanloverboy/solana-nostd-entrypoint) - Minimal entry point
- [Mollusk](https://github.com/anza-xyz/mollusk) - Fast testing with CU benchmarking