2025-11-30 09:01:25 +08:00

Compute Unit Optimization Guide

This guide provides comprehensive techniques for optimizing compute unit (CU) usage in Solana native Rust programs, compiled from official Solana documentation, community repositories, and expert resources.

Understanding Compute Units

Compute Limits

Solana enforces strict compute budgets to ensure network performance:

  • Max CU per block: 60 million CU
  • Max CU per account per block: 12 million CU
  • Max CU per transaction: 1.4 million CU
  • Default limit when no budget is requested: 200,000 CU per instruction, capped at the 1.4 million CU transaction maximum

Transactions can request a higher compute budget by including a Compute Budget program instruction, up to the 1.4M hard limit.
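As a rough illustration of how these limits interact, the sketch below clamps a requested budget to the transaction maximum and estimates how many transactions of a given budget could fit in one block. The constant and function names are hypothetical (they do not come from any Solana crate), and the default is modeled for a single-instruction transaction; the values are the ones listed above.

```rust
// Limits taken from the list above. Names are illustrative only.
const MAX_CU_PER_BLOCK: u64 = 60_000_000;
const MAX_CU_PER_ACCOUNT_PER_BLOCK: u64 = 12_000_000;
const MAX_CU_PER_TRANSACTION: u64 = 1_400_000;
const DEFAULT_CU_LIMIT: u64 = 200_000;

/// Budget a transaction ends up with: the default when nothing is requested,
/// otherwise the request clamped to the hard per-transaction limit.
fn effective_cu_limit(requested: Option<u64>) -> u64 {
    requested
        .unwrap_or(DEFAULT_CU_LIMIT)
        .min(MAX_CU_PER_TRANSACTION)
}

/// Upper bound on how many transactions with this budget fit in one block.
/// Returns None for a zero budget.
fn max_transactions_per_block(cu_per_transaction: u64) -> Option<u64> {
    MAX_CU_PER_BLOCK.checked_div(cu_per_transaction)
}

/// Same bound for transactions that all write to one account.
fn max_transactions_per_account(cu_per_transaction: u64) -> Option<u64> {
    MAX_CU_PER_ACCOUNT_PER_BLOCK.checked_div(cu_per_transaction)
}
```

Under these numbers, a block holds at most 42 transactions that each reserve the full 1.4M CU, but up to 300 that reserve 200,000 CU, which is why a smaller budget improves the odds of inclusion.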

Transaction Fees

Transaction fees consist of two components:

  1. Base fee: 5,000 lamports per signature (fixed, independent of CU usage)
  2. Priority fee: Optional additional fee to prioritize transaction inclusion

Priority fees are calculated as:

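priority_fee = microLamports_per_CU × requested_compute_units

The result is in micro-lamports; divide by 1,000,000 to get lamports. The fee is based on the requested CU limit, not on the CU actually consumed.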
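The formula can be sketched as a small helper. This is a minimal illustration, not code from the Solana SDK: the function names are hypothetical, and rounding the priority fee up to a whole lamport is an assumption of this sketch.

```rust
const LAMPORTS_PER_SIGNATURE: u64 = 5_000;
const MICRO_LAMPORTS_PER_LAMPORT: u128 = 1_000_000;

/// Priority fee in lamports for a price (micro-lamports per CU) and a
/// requested CU limit. Rounds up to a whole lamport (assumption).
fn priority_fee_lamports(micro_lamports_per_cu: u64, requested_cu: u32) -> u64 {
    // u128 arithmetic so the product cannot overflow.
    let micro = micro_lamports_per_cu as u128 * requested_cu as u128;
    let lamports = (micro + MICRO_LAMPORTS_PER_LAMPORT - 1) / MICRO_LAMPORTS_PER_LAMPORT;
    // Saturate instead of truncating if the result does not fit in u64.
    lamports.min(u64::MAX as u128) as u64
}

/// Base fee plus priority fee, in lamports.
fn total_fee_lamports(signatures: u64, micro_lamports_per_cu: u64, requested_cu: u32) -> u64 {
    signatures
        .saturating_mul(LAMPORTS_PER_SIGNATURE)
        .saturating_add(priority_fee_lamports(micro_lamports_per_cu, requested_cu))
}
```

For example, at a price of 10,000 micro-lamports per CU, requesting 200,000 CU costs 2,000 lamports of priority fee, while requesting 50,000 CU costs 500.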

Why Optimize CU Usage?

The base fee does not depend on CU usage, and the priority fee depends on the requested limit rather than on actual consumption. Optimization still matters:

  1. Block inclusion probability: Smaller transactions are more likely to fit in congested blocks
  2. Composability: When your program is called via CPI, it shares the caller's CU budget
  3. Efficient resource usage: Better utilization of limited block space
  4. Future-proofing: Fee structures may change to account for actual CU consumption
  5. User experience: Faster transaction execution and lower rejection rates

Common Optimization Techniques

1. Logging Optimization (Highest Impact)

Logging is one of the most expensive operations in Solana programs.

Anti-patterns:

// EXPENSIVE: 11,962 CU
// Base58 encoding + string concatenation
msg!("A string {0}", ctx.accounts.counter.to_account_info().key());

// EXPENSIVE: 357 CU
// String concatenation
msg!("A string {0}", "5w6z5PWvtkCd4PaAV7avxE6Fy5brhZsFdbRLMt8UefRQ");

Best practices:

// EFFICIENT: 262 CU
// Use .key().log() directly
ctx.accounts.counter.to_account_info().key().log();

// BETTER: 206 CU
// Store in variable first
let pubkey = ctx.accounts.counter.to_account_info().key();
pubkey.log();

// CHEAPEST: 204 CU
// Simple string logging
msg!("Compute units");

Recommendation: Avoid logging in production unless absolutely necessary for debugging. Remove or conditionally compile logging for mainnet deployments.

2. Data Type Optimization

Smaller data types consume fewer compute units.

Comparison:

// 618 CU - u64
let mut a: Vec<u64> = Vec::new();
for _ in 0..6 {
    a.push(1);
}

// 600 CU - i32 (default integer type)
let mut a = Vec::new();
for _ in 0..6 {
    a.push(1);
}

// 459 CU - u8 (best for small values)
let mut a: Vec<u8> = Vec::new();
for _ in 0..6 {
    a.push(1);
}

Initialization vs pushing:

// 357 CU - Pushing elements one by one
let mut a: Vec<u64> = Vec::new();
for _ in 0..6 {
    a.push(1);
}

// 125 CU - Direct initialization (65% savings!)
let _a: Vec<u64> = vec![1, 1, 1, 1, 1, 1];

Best practice: Use the smallest data type that fits your requirements (prefer u8, then u16, then u32, then u64), and prefer vec![] initialization over repeated push() calls.
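The three construction styles can be compared side by side in plain Rust. This is a sketch with hypothetical function names; the `Vec::with_capacity` variant is not part of the measurements above and is included only as a middle ground for values that must be computed one at a time.

```rust
/// Builds the vector one element at a time; the vector may reallocate as it grows.
fn build_by_push(len: usize) -> Vec<u8> {
    let mut values = Vec::new();
    for _ in 0..len {
        values.push(1);
    }
    values
}

/// Builds the same vector with a single allocation.
fn build_direct(len: usize) -> Vec<u8> {
    vec![1; len]
}

/// Reserves the full capacity once, then pushes computed values.
fn build_with_capacity(len: usize) -> Vec<u8> {
    let mut values = Vec::with_capacity(len);
    for i in 0..len {
        values.push((i % 256) as u8);
    }
    values
}
```

All three produce ordinary vectors, so switching between them does not change calling code.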

3. Serialization: Zero-Copy vs Borsh

Zero-copy deserialization can provide massive CU savings for account operations.

Standard Borsh serialization:

// 6,302 CU - Standard account initialization
pub fn initialize(_ctx: Context<InitializeCounter>) -> Result<()> {
    Ok(())
}

// 2,600 CU total for increment (including serialization overhead)
pub fn increment(ctx: Context<Increment>) -> Result<()> {
    let counter = &mut ctx.accounts.counter;
    counter.count = counter.count.checked_add(1).unwrap(); // 108 CU for operation
    Ok(())
}

Zero-copy optimization:

// 5,020 CU - Zero-copy initialization (20% savings)
pub fn initialize_zero_copy(_ctx: Context<InitializeCounterZeroCopy>) -> Result<()> {
    Ok(())
}

// 1,254 CU total for increment (52% savings!)
pub fn increment_zero_copy(ctx: Context<IncrementZeroCopy>) -> Result<()> {
    let counter = &mut ctx.accounts.counter_zero_copy.load_mut()?;
    counter.count = counter.count.checked_add(1).unwrap(); // 151 CU for operation
    Ok(())
}

Zero-copy account definition:

#[account(zero_copy)]
#[repr(C)]
#[derive(InitSpace)]
pub struct CounterZeroCopy {
    count: u64,
    authority: Pubkey,
    big_struct: BigStruct,  // Can include large structs without stack overflow
}

Benefits of zero-copy:

  • 50%+ CU savings on serialization/deserialization
  • Avoids stack frame violations with large account structures
  • Direct memory access without intermediate copying
  • Particularly valuable for frequently updated accounts

Trade-off: Slightly more complex API (load(), load_mut()) and requires #[repr(C)] for memory layout guarantees.
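To make the difference concrete without Anchor, the sketch below contrasts a copy-based update (decode the whole account, modify, encode again) with an in-place update that touches only the bytes of one field. The 40-byte layout, the `Counter` struct, and all function names are hypothetical, and this is not how Anchor implements `load_mut()`; it only illustrates the idea of avoiding the intermediate copy.

```rust
use std::convert::TryInto;

// Hypothetical layout: 8-byte little-endian count, then a 32-byte authority.
const ACCOUNT_LEN: usize = 8 + 32;

#[derive(Debug, PartialEq)]
struct Counter {
    count: u64,
    authority: [u8; 32],
}

/// Copy-based path: decode the whole account into an owned struct.
fn deserialize(data: &[u8]) -> Option<Counter> {
    if data.len() != ACCOUNT_LEN {
        return None;
    }
    let count = u64::from_le_bytes(data[..8].try_into().ok()?);
    let authority: [u8; 32] = data[8..].try_into().ok()?;
    Some(Counter { count, authority })
}

/// Copy-based path: write every field back into the buffer.
fn serialize(counter: &Counter, data: &mut [u8]) -> Option<()> {
    if data.len() != ACCOUNT_LEN {
        return None;
    }
    data[..8].copy_from_slice(&counter.count.to_le_bytes());
    data[8..].copy_from_slice(&counter.authority);
    Some(())
}

fn increment_with_copy(data: &mut [u8]) -> Option<()> {
    let mut counter = deserialize(data)?;
    counter.count = counter.count.checked_add(1)?;
    serialize(&counter, data)
}

/// In-place path: only the 8 bytes of `count` are read and written.
fn increment_in_place(data: &mut [u8]) -> Option<()> {
    let count_bytes: &mut [u8; 8] = data.get_mut(..8)?.try_into().ok()?;
    let next = u64::from_le_bytes(*count_bytes).checked_add(1)?;
    *count_bytes = next.to_le_bytes();
    Some(())
}
```

Both paths leave the buffer in the same state; the in-place version simply skips copying the fields it does not change, which matters more as the account grows.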

4. Program Derived Addresses (PDAs)

PDA operations vary significantly in cost depending on the method used.

Finding PDAs:

// EXPENSIVE: 12,136 CU
// Iterates through nonces to find valid bump seed
let (pda, bump) = Pubkey::find_program_address(&[b"counter"], ctx.program_id);

// EFFICIENT: 1,651 CU (87% savings!)
// Uses known bump seed directly
let pda = Pubkey::create_program_address(&[b"counter", &[248_u8]], ctx.program_id).unwrap();

Optimization strategy:

  1. Use find_program_address() once during account initialization
  2. Save the bump seed in the account data
  3. Use create_program_address() with the saved bump for all subsequent operations
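The cost difference comes from the search: finding a bump may take several derivation attempts, while a saved bump needs exactly one. The toy model below shows only that control flow. It is not the real derivation: actual PDAs use SHA-256 and an ed25519 off-curve check, whereas here a standard-library hash stands in and "even hash" plays the role of "valid address". All names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy stand-in for a single derivation with a known bump.
/// Returns None when the candidate is "invalid" (odd hash in this model).
fn toy_create_address(seed: &[u8], bump: u8) -> Option<u64> {
    let mut hasher = DefaultHasher::new();
    seed.hash(&mut hasher);
    bump.hash(&mut hasher);
    let candidate = hasher.finish();
    if candidate % 2 == 0 {
        Some(candidate)
    } else {
        None
    }
}

/// Toy stand-in for the bump search: try bumps from 255 downward until one
/// is valid. Returns the address, the bump, and the number of attempts.
fn toy_find_address(seed: &[u8]) -> Option<(u64, u8, u32)> {
    let mut attempts = 0;
    for bump in (0..=u8::MAX).rev() {
        attempts += 1;
        if let Some(address) = toy_create_address(seed, bump) {
            return Some((address, bump, attempts));
        }
    }
    None
}
```

Storing the bump returned by the search and passing it to the single-derivation function later reproduces the same address with one attempt, which is the pattern the three steps above describe.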

Anchor implementation:

// Account structure - save the bump
#[account]
pub struct CounterData {
    pub count: u64,
    pub bump: u8,  // Store the bump seed here
}

// EXPENSIVE: 12,136 CU - Without saved bump
#[account(
    seeds = [b"counter"],
    bump  // Anchor finds it every time
)]
pub counter_checked: Account<'info, CounterData>,

// EFFICIENT: 1,600 CU - With saved bump (87% savings!)
#[account(
    seeds = [b"counter"],
    bump = counter_checked.bump  // Use the saved bump
)]
pub counter_checked: Account<'info, CounterData>,

5. Cross-Program Invocations (CPIs)

CPIs add significant overhead compared to direct operations.

CPI to System Program:

// 2,215 CU - CPI for SOL transfer
let cpi_context = CpiContext::new(
    ctx.accounts.system_program.to_account_info(),
    system_program::Transfer {
        from: ctx.accounts.payer.to_account_info().clone(),
        to: ctx.accounts.counter.to_account_info().clone(),
    },
);
system_program::transfer(cpi_context, 1_000_000)?;

Direct lamport manipulation:

// 251 CU - Direct operation (90% savings!)
let counter_account_info = ctx.accounts.counter.to_account_info();
let mut counter_lamports = counter_account_info.try_borrow_mut_lamports()?;
**counter_lamports += 1_000_000;

// Debiting only succeeds if the debited account is owned by this program
let payer_account_info = ctx.accounts.payer.to_account_info();
let mut payer_lamports = payer_account_info.try_borrow_mut_lamports()?;
**payer_lamports -= 1_000_000;

Important caveats:

  1. Error handling overhead: Error paths add ~1,199 CU if triggered
  2. Safety: Direct manipulation bypasses safety checks in the System Program
  3. Ownership: The runtime only lets a program debit lamports from accounts it owns; crediting another writable account is allowed
  4. Rent exemption: You're responsible for maintaining rent exemption

Best practice: Use CPIs for safety and correctness by default. Only optimize to direct manipulation when:

  • You have tight CU constraints
  • You fully understand the safety implications
  • The account being debited is owned by your program
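When moving lamports directly, the checks the System Program would have performed become the program's responsibility. The sketch below models that with plain structs: it refuses to debit an account the program does not own (the runtime would reject such a debit anyway) and uses checked arithmetic so neither balance can wrap. `ToyAccount`, `TransferError`, and `transfer_lamports` are hypothetical names, and rent exemption is deliberately not modeled.

```rust
#[derive(Debug, PartialEq)]
enum TransferError {
    NotOwner,
    InsufficientFunds,
    Overflow,
}

/// Hypothetical stand-in for the account fields that matter here.
struct ToyAccount {
    owner: [u8; 32],
    lamports: u64,
}

/// Moves `amount` lamports from `from` to `to`, leaving both balances
/// untouched if any check fails.
fn transfer_lamports(
    program_id: &[u8; 32],
    from: &mut ToyAccount,
    to: &mut ToyAccount,
    amount: u64,
) -> Result<(), TransferError> {
    // A program may only debit accounts it owns.
    if from.owner != *program_id {
        return Err(TransferError::NotOwner);
    }
    let debited = from
        .lamports
        .checked_sub(amount)
        .ok_or(TransferError::InsufficientFunds)?;
    let credited = to
        .lamports
        .checked_add(amount)
        .ok_or(TransferError::Overflow)?;
    // Commit only after every check has passed.
    from.lamports = debited;
    to.lamports = credited;
    Ok(())
}
```

Computing both new balances before writing either keeps the pair consistent on the error paths.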

6. Pass by Reference vs Clone

Solana's bump allocator doesn't free memory, making unnecessary cloning particularly problematic.

Comparison:

let balances = vec![10_u64; 100];
let mut sum_reference = 0_u64;
let mut sum_clone = 0_u64;

// EFFICIENT: 47,683 CU - Pass by reference
fn sum_by_reference(data: &Vec<u64>) -> u64 {
    data.iter().sum()
}

for _ in 0..39 {
    sum_reference += sum_by_reference(&balances);
}

// INEFFICIENT: 49,322 CU - Clone data (about 3.4% more expensive)
// WARNING: Runs out of memory at 40+ iterations!
fn sum_by_value(data: Vec<u64>) -> u64 {
    data.iter().sum()
}

for _ in 0..39 {
    sum_clone += sum_by_value(balances.clone());
}

Memory concern: Solana programs have a 32KB heap using a bump allocator that never frees memory during transaction execution. Excessive cloning leads to out-of-memory errors.

Best practice: Always pass by reference (&T) unless you explicitly need ownership transfer. Use Copy types for small data.
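The 40-iteration limit follows from simple arithmetic: each clone of 100 u64 values needs 800 bytes, and a 32KB heap that never reuses memory has room for 40 of them. The toy allocator below models that behavior. It is a simulation with hypothetical names, not Solana's allocator, and it ignores alignment and any other allocations a real program makes (which would lower the count).

```rust
/// Toy model of a bump allocator: allocation moves a cursor forward,
/// deallocation does nothing.
struct ToyBumpHeap {
    capacity: usize,
    used: usize,
}

impl ToyBumpHeap {
    fn new(capacity: usize) -> Self {
        ToyBumpHeap { capacity, used: 0 }
    }

    /// Returns the offset of the new block, or None when the heap is full.
    fn alloc(&mut self, size: usize) -> Option<usize> {
        let end = self.used.checked_add(size)?;
        if end > self.capacity {
            return None;
        }
        let offset = self.used;
        self.used = end;
        Some(offset)
    }

    /// Freed memory is never reclaimed in this model.
    fn dealloc(&mut self, _offset: usize) {}
}

/// Counts how many allocate-then-free cycles of `clone_bytes` succeed.
fn clones_until_oom(heap_bytes: usize, clone_bytes: usize) -> usize {
    assert!(clone_bytes > 0, "zero-sized clones never exhaust the heap");
    let mut heap = ToyBumpHeap::new(heap_bytes);
    let mut clones = 0;
    while let Some(offset) = heap.alloc(clone_bytes) {
        heap.dealloc(offset); // dropping the clone does not give the space back
        clones += 1;
    }
    clones
}
```

With a 32 * 1024 byte heap and 800-byte clones the loop stops after 40 successful allocations, even though every clone is dropped immediately.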

7. Checked Math vs Unchecked Operations

Checked arithmetic adds safety at the cost of compute units.

Comparison:

let mut count: u64 = 1;

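// 97,314 CU - Checked multiplication with overflow protection
// Note: starting from 1, the 64th doubling exceeds u64::MAX, so this loop
// panics with "overflow" as written unless count is reset or kept small.
for _ in 0..12000 {
    count = count.checked_mul(2).expect("overflow");
}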

// 85,113 CU - Bit shift operation (12% savings)
// Equivalent to multiply by 2, but unchecked
for _ in 0..12000 {
    count = count << 1;
}

Trade-off: Unchecked operations are faster but risk overflow bugs that can lead to serious security vulnerabilities.
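The behavioral difference is easy to see in plain Rust. In the sketch below (function names are hypothetical), the checked version reports overflow, the shift silently drops the top bit, and a guarded shift shows one way to keep the cheaper operation once the input range has been verified.

```rust
/// Doubles with overflow detection.
fn double_checked(value: u64) -> Option<u64> {
    value.checked_mul(2)
}

/// Doubles by shifting; the highest bit is discarded without any error.
fn double_shift(value: u64) -> u64 {
    value << 1
}

/// Shift that first verifies the top bit is clear, so no data is lost.
fn double_shift_guarded(value: u64) -> Option<u64> {
    if value >> 63 == 0 {
        Some(value << 1)
    } else {
        None
    }
}
```

For inputs below 2^63 all three agree; at 2^63 the checked and guarded versions return None while the plain shift returns 0, which is the kind of silent wraparound that leads to incorrect balances.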

Best practice:

  • Use checked math by default for safety
  • Profile your program to identify hot paths
  • Only switch to unchecked math when:
    • You've proven overflow is impossible
    • CU savings are critical
    • You've added overflow tests

Compiler configuration (in Cargo.toml):

[profile.release]
overflow-checks = true  # Keep overflow checks even in release mode

Framework Comparison

Different implementation approaches offer varying trade-offs between developer experience, safety, and performance.

Implementation     Binary Size     Deploy Cost   Init CU   Increment CU
Anchor             265,677 bytes   1.85 SOL      6,302     946
Anchor Zero-Copy   Same            1.85 SOL      5,020     ~1,254
Native Rust        48,573 bytes    0.34 SOL      -         843
Unsafe Rust        973 bytes       0.008 SOL     -         5
Assembly (SBPF)    1,389 bytes     0.01 SOL      -         4
C                  1,333 bytes     0.01 SOL      -         5

Key insights:

  • Anchor: Best developer experience, automatic account validation, but highest CU and deployment costs
  • Anchor Zero-Copy: Significant CU improvement over standard Anchor with minimal code changes
  • Native Rust: 11% CU savings over Anchor, 82% smaller deployment size, moderate complexity
  • Unsafe Rust: 99% CU savings, minimal size, but requires extreme care and deep expertise
  • Assembly/C: Maximum optimization possible, but very difficult to develop and maintain

Recommendation: Start with Anchor or native Rust. Optimize hot paths with zero-copy. Only consider unsafe Rust or lower-level languages for critical performance bottlenecks after profiling.

Advanced Optimization Techniques

1. Compiler Flags

Configure optimization in Cargo.toml:

[profile.release]
opt-level = 3           # Maximum optimization
lto = "fat"             # Full link-time optimization
codegen-units = 1       # Single codegen unit for better optimization
overflow-checks = true  # Keep safety checks despite performance cost

Trade-offs:

  • overflow-checks = false: Saves CU but removes critical safety checks
  • Higher opt-level: Better performance but slower compilation
  • lto = "fat": Maximum optimization but much slower builds

2. Function Inlining

Control function inlining to balance CU usage and stack space:

// Force inlining - saves CU by eliminating function call overhead
#[inline(always)]
fn add(a: u64, b: u64) -> u64 {
    a + b
}

// Prevent inlining - saves stack space at the cost of CU
#[inline(never)]
pub fn complex_operation() {
    // Large function body
}

Trade-off: Inlining saves CU but increases stack usage. Solana limits each stack frame to 4KB, so excessive inlining into a single function can overflow its frame.

3. Alternative Entry Points

The standard Solana entry point adds overhead. Alternatives:

Standard entry point:

use solana_program::entrypoint;
entrypoint!(process_instruction);

Minimal entry points:

Warning: These require deep understanding of Solana internals and unsafe Rust. Only use for extreme optimization needs.

4. Custom Heap Allocators

Solana's default bump allocator never frees memory during transaction execution.

Problem:

// This will eventually run out of heap space (32KB limit)
for _ in 0..1000 {
    let v = vec![0u8; 1024];  // Each iteration uses more heap
    // Memory is never freed!
}

Solution - Custom allocators:

  • smalloc: Used by Metaplex programs, provides better memory management
  • Prevents out-of-memory errors in memory-intensive operations

Implementation (advanced):

#[global_allocator]
static ALLOCATOR: custom_allocator::CustomAllocator = custom_allocator::CustomAllocator;

5. Boxing and Heap Allocation

Heap operations cost more CU than stack operations.

// Stack allocation - faster
let data = [0u8; 100];

// Heap allocation - slower, uses more CU
let data = Box::new([0u8; 100]);

Best practice: Avoid Box, Vec, and other heap allocations when stack allocation is possible and doesn't risk overflow.

Measuring Compute Units

Using sol_log_compute_units()

Built-in logging function to track CU consumption:

use solana_program::log::sol_log_compute_units;

pub fn my_instruction(ctx: Context<MyContext>) -> Result<()> {
    sol_log_compute_units(); // Log remaining CU

    // ... do some work ...

    sol_log_compute_units(); // Log remaining CU again
    Ok(())
}

Output in transaction logs:

Program consumption: 200000 units remaining
Program consumption: 195432 units remaining

CU used = 200000 - 195432 = 4,568 CU
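When a transaction emits many of these lines, the subtraction can be automated. The helper below is a small sketch (the function names are hypothetical) that extracts the remaining-CU values from log lines in the format shown above and returns the CU consumed between consecutive measurements.

```rust
/// Parses "Program consumption: N units remaining" and returns N.
/// Any other line yields None.
fn parse_remaining(line: &str) -> Option<u64> {
    line.strip_prefix("Program consumption: ")?
        .strip_suffix(" units remaining")?
        .parse()
        .ok()
}

/// CU consumed between each pair of consecutive measurements in the logs.
fn consumed_between(logs: &[&str]) -> Vec<u64> {
    let remaining: Vec<u64> = logs
        .iter()
        .filter_map(|line| parse_remaining(line))
        .collect();
    remaining
        .windows(2)
        .map(|pair| pair[0].saturating_sub(pair[1]))
        .collect()
}
```

Lines that are not consumption reports, such as `Program log:` output, are skipped, so the full log of a transaction can be passed in unchanged.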

compute_fn! Macro

Convenient macro for measuring specific code blocks (costs 409 CU overhead):

#[macro_export]
macro_rules! compute_fn {
    ($msg:expr=> $($tt:tt)*) => {
        ::solana_program::msg!(concat!($msg, " {"));
        ::solana_program::log::sol_log_compute_units();
        let res = { $($tt)* };
        ::solana_program::log::sol_log_compute_units();
        ::solana_program::msg!(concat!(" } // ", $msg));
        res
    };
}

Usage:

let result = compute_fn! { "My expensive operation" =>
    expensive_computation()
};

Output:

Program log: My expensive operation {
Program consumption: 195432 units remaining
Program consumption: 180123 units remaining
Program log: } // My expensive operation

Actual CU = (195432 - 180123) - 409 (macro overhead) = 14,900 CU

Using Mollusk Bencher

For native Rust programs, use Mollusk's built-in benchmarking (see main SKILL.md for details).

Anti-Patterns to Avoid

1. Excessive Logging

// BAD: Logging in production
msg!("Processing user {}", user_pubkey);
msg!("Amount: {}", amount);
msg!("Timestamp: {}", Clock::get()?.unix_timestamp);

Solution: Remove logging or use conditional compilation:

#[cfg(feature = "debug")]
msg!("Processing user {}", user_pubkey);

2. Large Data Types for Small Values

// BAD: Using u64 when u8 suffices
pub struct Config {
    pub fee_percentage: u64,  // Only 0-100
    pub max_items: u64,       // Only 0-255
}

// GOOD: Use smallest type
pub struct Config {
    pub fee_percentage: u8,   // 0-100
    pub max_items: u8,        // 0-255
}
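The size difference between the two layouts can be checked with `std::mem::size_of`. In this sketch the structs are renamed `ConfigWide` and `ConfigNarrow` so both can live in one file, and `fee_from_u64` is a hypothetical helper showing the range check that narrowing a value requires.

```rust
use std::mem::size_of;

#[allow(dead_code)]
struct ConfigWide {
    fee_percentage: u64,
    max_items: u64,
}

#[allow(dead_code)]
struct ConfigNarrow {
    fee_percentage: u8,
    max_items: u8,
}

/// In-memory sizes of the wide and narrow layouts, in bytes.
fn in_memory_sizes() -> (usize, usize) {
    (size_of::<ConfigWide>(), size_of::<ConfigNarrow>())
}

/// Narrows a fee to u8, rejecting anything outside 0-100.
fn fee_from_u64(value: u64) -> Option<u8> {
    if value <= 100 {
        Some(value as u8)
    } else {
        None
    }
}
```

The narrow struct takes 2 bytes instead of 16, which also shrinks the account data that has to be read and written on every instruction.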

3. Cloning Large Structures

// BAD: Unnecessary clone
fn process_data(data: Vec<u8>) -> Result<()> {
    let copy = data.clone();  // Wastes CU and heap
    // ...
    Ok(())
}

// GOOD: Pass by reference
fn process_data(data: &[u8]) -> Result<()> {
    // Work directly with reference
    Ok(())
}

4. Repeated PDA Derivation

// BAD: Finding bump every time
#[account(
    seeds = [b"vault"],
    bump  // Finds bump on every call!
)]
pub vault: Account<'info, Vault>,

// GOOD: Use saved bump
#[account(
    seeds = [b"vault"],
    bump = vault.bump  // Uses saved bump
)]
pub vault: Account<'info, Vault>,

5. Unnecessary Boxing

// BAD: Boxing adds heap overhead
let value = Box::new(calculate_value());

// GOOD: Keep on stack
let value = calculate_value();

6. String Operations

// BAD: String concatenation and formatting
let message = format!("User {} sent {} tokens", user, amount);
msg!(&message);

// GOOD: Use separate logs or remove entirely
user.log();  // Pubkey::log
solana_program::log::sol_log_64(amount, 0, 0, 0, 0);  // assumes amount is a u64

7. Deep CPI Chains

Each CPI adds significant overhead. Avoid unnecessary indirection:

// BAD: Unnecessary CPI
invoke(
    &my_helper_program::process(),
    &accounts,
)?;

// GOOD: Direct implementation
process_directly(&accounts)?;

8. Not Using Zero-Copy for Large Accounts

// BAD: Large account with standard serialization
#[account]
pub struct LargeData {
    pub items: [u64; 1000],  // Expensive to serialize/deserialize
}

// GOOD: Use zero-copy
#[account(zero_copy)]
#[repr(C)]
pub struct LargeData {
    pub items: [u64; 1000],  // Direct memory access
}

Best Practices Summary

  1. Minimize or eliminate logging in production code
  2. Use zero-copy for accounts with large data structures
  3. Cache PDA bumps - derive once, store in account, reuse
  4. Choose smallest data types that meet your requirements
  5. Pass by reference instead of cloning data
  6. Profile before optimizing - measure CU usage to identify bottlenecks
  7. Consider native Rust over Anchor for performance-critical programs
  8. Use vec![] initialization instead of repeated push() calls
  9. Avoid unnecessary CPIs - use direct operations when safe
  10. Balance safety vs performance - don't sacrifice security without careful analysis
  11. Test CU usage regularly - include benchmarks in your test suite
  12. Use checked math by default - only optimize to unchecked when proven safe
  13. Minimize heap allocations - prefer stack when possible
  14. Remove or conditionally compile debug code for production builds
  15. Consider zero-copy for frequently updated accounts - 50%+ CU savings
