Files
2025-11-30 09:01:25 +08:00

14 KiB

Serialization and Data Handling

This reference provides comprehensive coverage of data serialization and deserialization patterns for native Rust Solana program development, focusing on Borsh and account data layout best practices.

Table of Contents

  1. Why Borsh for Solana
  2. Basic Borsh Usage
  3. Account Data Layout Design
  4. Serialization Patterns
  5. Zero-Copy Deserialization
  6. Data Versioning
  7. Performance Considerations
  8. Common Pitfalls

Why Borsh for Solana

Borsh (Binary Object Representation Serializer for Hashing) is the recommended serialization format for Solana programs.

Advantages

  1. Deterministic: Same data always produces same bytes
  2. Compact: Efficient binary encoding
  3. Fast: Lower compute unit cost than alternatives
  4. Strict Schema: Type-safe serialization/deserialization
  5. No Metadata: Unlike JSON, no field names in output

vs Alternatives

Format CU Cost Size Type Safety Deterministic
Borsh Low Compact Yes Yes
bincode High Compact Yes ⚠️ Config-dependent
JSON Very High Large No No
MessagePack ⚠️ Medium Compact ⚠️ Partial ⚠️ Mostly

Recommendation: Use Borsh for all program account data.


Basic Borsh Usage

Dependencies

[dependencies]
borsh = { version = "1.5", features = ["derive"] }

Deriving Borsh Traits

use borsh::{BorshDeserialize, BorshSerialize};

#[derive(BorshSerialize, BorshDeserialize, Debug, Clone)]
pub struct UserAccount {
    pub user: Pubkey,
    pub balance: u64,
    pub created_at: i64,
}

Serialization

To bytes:

let account_data = UserAccount {
    user: Pubkey::new_unique(),
    balance: 1000,
    created_at: 1234567890,
};

// Serialize to Vec<u8>
let bytes = account_data.try_to_vec()?;

// Serialize to existing buffer
let mut buffer = vec![0u8; 100];
account_data.serialize(&mut buffer.as_mut_slice())?;

Deserialization

From bytes:

// Deserialize from slice
let account_data = UserAccount::try_from_slice(&bytes)?;

// Deserialize with BorshDeserialize
let mut cursor = &bytes[..];
let account_data = UserAccount::deserialize(&mut cursor)?;

Account Data Layout Design

Basic Structure

#[derive(BorshSerialize, BorshDeserialize)]
pub struct AccountData {
    // 1. Discriminator / Type Field (1 byte)
    pub account_type: u8,

    // 2. Flags / State (1 byte)
    pub is_initialized: bool,

    // 3. Fixed-size fields (predictable layout)
    pub owner: Pubkey,           // 32 bytes
    pub created_at: i64,         // 8 bytes
    pub counter: u64,            // 8 bytes

    // 4. Variable-size fields (at end)
    pub name: String,            // 4 + length
    pub metadata: Vec<u8>,       // 4 + length
}

Size calculation:

1 (type) + 1 (flag) + 32 (pubkey) + 8 (i64) + 8 (u64) + 4 (string len) + N (string) + 4 (vec len) + M (vec)
= 58 + N + M bytes

Size Calculation Helper

impl AccountData {
    pub const FIXED_SIZE: usize = 58;  // All fixed fields

    pub fn calculate_size(name_len: usize, metadata_len: usize) -> usize {
        Self::FIXED_SIZE + name_len + metadata_len
    }

    pub fn max_size(max_name: usize, max_metadata: usize) -> usize {
        Self::calculate_size(max_name, max_metadata)
    }
}

// Usage
let account_size = AccountData::max_size(32, 256);  // 346 bytes

Fixed-Size Accounts

Best for performance:

#[derive(BorshSerialize, BorshDeserialize)]
pub struct FixedAccount {
    pub is_initialized: bool,
    pub owner: Pubkey,
    pub balance: u64,
    pub last_updated: i64,
    // Fixed-size array instead of Vec
    pub data: [u8; 256],
}

impl FixedAccount {
    pub const SIZE: usize = 1 + 32 + 8 + 8 + 256;  // 305 bytes
}

Serialization Patterns

Most common pattern for account deserialization:

use borsh::BorshDeserialize;

pub fn load_account_data(
    account_info: &AccountInfo,
) -> Result<UserAccount, ProgramError> {
    let data = UserAccount::try_from_slice(&account_info.data.borrow())?;
    Ok(data)
}

Error handling:

let data = UserAccount::try_from_slice(&account_info.data.borrow())
    .map_err(|e| {
        msg!("Failed to deserialize account: {}", e);
        ProgramError::InvalidAccountData
    })?;

Pattern 2: Unchecked Deserialization

Use when you've already validated the account:

use borsh::try_from_slice_unchecked;

// After validation checks
let mut data = try_from_slice_unchecked::<UserAccount>(&account_info.data.borrow())
    .unwrap();  // Safe because we validated

⚠️ Warning: Only use after thorough validation. Skips some safety checks.

Pattern 3: Partial Deserialization

Read only what you need:

#[derive(BorshDeserialize)]
pub struct AccountHeader {
    pub account_type: u8,
    pub is_initialized: bool,
    pub owner: Pubkey,
}

// Deserialize just the header
let header = AccountHeader::try_from_slice(&account_info.data.borrow()[..42])?;

if !header.is_initialized {
    return Err(ProgramError::UninitializedAccount);
}

Pattern 4: In-Place Modification

Efficient for large accounts:

pub fn update_balance(
    account_info: &AccountInfo,
    new_balance: u64,
) -> ProgramResult {
    let mut data = account_info.data.borrow_mut();

    // Deserialize
    let mut account = UserAccount::try_from_slice(&data)?;

    // Modify
    account.balance = new_balance;
    account.last_updated = Clock::get()?.unix_timestamp;

    // Serialize back
    account.serialize(&mut &mut data[..])?;

    Ok(())
}

Pattern 5: Bulk Operations

Processing multiple accounts:

pub fn process_accounts(
    accounts: &[AccountInfo],
) -> ProgramResult {
    let account_data: Vec<UserAccount> = accounts
        .iter()
        .map(|acc| UserAccount::try_from_slice(&acc.data.borrow()))
        .collect::<Result<Vec<_>, _>>()?;

    // Process all accounts
    for (i, data) in account_data.iter().enumerate() {
        msg!("Account {}: balance = {}", i, data.balance);
    }

    Ok(())
}

Zero-Copy Deserialization

When to Use Zero-Copy

Benefits:

  • Avoids memory allocation
  • Reduces compute units (50%+ savings for large structs)
  • Direct access to account data

Use when:

  • Account data is large (> 100 bytes)
  • Frequent reads
  • Performance-critical paths

Bytemuck Pattern

[dependencies]
bytemuck = { version = "1.14", features = ["derive"] }
use bytemuck::{Pod, Zeroable};

#[repr(C)]
#[derive(Copy, Clone, Pod, Zeroable)]
pub struct ZeroCopyAccount {
    pub is_initialized: u8,      // bool as u8
    pub owner: [u8; 32],         // Pubkey as bytes
    pub balance: u64,
    pub counter: u64,
}

impl ZeroCopyAccount {
    pub const SIZE: usize = std::mem::size_of::<Self>();

    pub fn from_account_info(account_info: &AccountInfo) -> Result<&Self, ProgramError> {
        let data = account_info.data.borrow();
        bytemuck::try_from_bytes(&data)
            .map_err(|_| ProgramError::InvalidAccountData)
    }

    pub fn from_account_info_mut(
        account_info: &AccountInfo,
    ) -> Result<&mut Self, ProgramError> {
        let data = account_info.data.borrow_mut();
        bytemuck::try_from_bytes_mut(&mut data)
            .map_err(|_| ProgramError::InvalidAccountData)
    }
}

// Usage
let account = ZeroCopyAccount::from_account_info(account_info)?;
msg!("Balance: {}", account.balance);

// Mutable access
let account = ZeroCopyAccount::from_account_info_mut(account_info)?;
account.balance += 100;

⚠️ Limitations:

  • Only works with types that are Pod (Plain Old Data)
  • No String, Vec, or other heap-allocated types
  • Must be #[repr(C)] for stable layout

Data Versioning

Pattern 1: Version Field

#[derive(BorshSerialize, BorshDeserialize)]
pub struct VersionedAccount {
    pub version: u8,
    pub data: AccountDataEnum,
}

#[derive(BorshSerialize, BorshDeserialize)]
pub enum AccountDataEnum {
    V1(AccountDataV1),
    V2(AccountDataV2),
}

#[derive(BorshSerialize, BorshDeserialize)]
pub struct AccountDataV1 {
    pub balance: u64,
}

#[derive(BorshSerialize, BorshDeserialize)]
pub struct AccountDataV2 {
    pub balance: u64,
    pub last_updated: i64,  // New field
}

// Deserialization with version handling
pub fn load_versioned_account(
    account_info: &AccountInfo,
) -> ProgramResult {
    let versioned = VersionedAccount::try_from_slice(&account_info.data.borrow())?;

    match versioned.data {
        AccountDataEnum::V1(data_v1) => {
            msg!("V1 account: balance = {}", data_v1.balance);
        }
        AccountDataEnum::V2(data_v2) => {
            msg!("V2 account: balance = {}, updated = {}",
                data_v2.balance, data_v2.last_updated);
        }
    }

    Ok(())
}

Pattern 2: Optional Fields

#[derive(BorshSerialize, BorshDeserialize)]
pub struct Account {
    pub balance: u64,

    // V2: Added optional field
    pub metadata: Option<Metadata>,
}

#[derive(BorshSerialize, BorshDeserialize)]
pub struct Metadata {
    pub name: String,
    pub url: String,
}

// Old accounts: metadata = None
// New accounts: metadata = Some(Metadata { ... })

Pattern 3: Migration Function

pub fn migrate_account_v1_to_v2(
    account_info: &AccountInfo,
) -> ProgramResult {
    // Load V1
    let data_v1 = AccountDataV1::try_from_slice(&account_info.data.borrow())?;

    // Convert to V2
    let data_v2 = AccountDataV2 {
        balance: data_v1.balance,
        last_updated: Clock::get()?.unix_timestamp,
    };

    // Reallocate if needed
    let new_size = data_v2.try_to_vec()?.len();
    account_info.realloc(new_size, false)?;

    // Serialize V2
    data_v2.serialize(&mut &mut account_info.data.borrow_mut()[..])?;

    Ok(())
}

Performance Considerations

Compute Unit Costs

Serialization costs (approximate):

Operation CU Cost
Serialize small struct (< 100 bytes) ~500 CU
Serialize large struct (> 1KB) ~2,000 CU
Deserialize small struct ~800 CU
Deserialize large struct ~3,000 CU
Zero-copy access ~100 CU

Optimization Tips

1. Minimize serialization frequency:

// ❌ Wasteful - serializes twice
let mut data = load_data(account)?;
data.field1 = value1;
save_data(account, &data)?;

data.field2 = value2;
save_data(account, &data)?;  // Serialize again!

// ✅ Efficient - serialize once
let mut data = load_data(account)?;
data.field1 = value1;
data.field2 = value2;
save_data(account, &data)?;

2. Use fixed-size fields:

// ❌ Variable size - more expensive
pub struct Account {
    pub name: String,        // 4 + N bytes
}

// ✅ Fixed size - cheaper
pub struct Account {
    pub name: [u8; 32],      // Exactly 32 bytes
}

3. Order fields by size:

// ✅ Optimized layout (largest first)
#[derive(BorshSerialize, BorshDeserialize)]
#[repr(C)]
pub struct OptimizedAccount {
    pub pubkey1: Pubkey,     // 32 bytes
    pub pubkey2: Pubkey,     // 32 bytes
    pub amount: u64,         // 8 bytes
    pub timestamp: i64,      // 8 bytes
    pub flags: u8,           // 1 byte
}

Common Pitfalls

1. Buffer Too Small

// ❌ Error: buffer too small
let mut buffer = vec![0u8; 10];
large_struct.serialize(&mut buffer.as_mut_slice())?;  // Fails!

// ✅ Correct: proper size
let size = large_struct.try_to_vec()?.len();
let mut buffer = vec![0u8; size];
large_struct.serialize(&mut buffer.as_mut_slice())?;

2. Forgetting to Borrow

// ❌ Error: data moved
let data = account_info.data;
UserAccount::try_from_slice(&data)?;  // Fails!

// ✅ Correct: borrow data
let data = account_info.data.borrow();
UserAccount::try_from_slice(&data)?;

3. Mismatched Schema

// Account created with V1
#[derive(BorshSerialize)]
pub struct AccountV1 {
    pub balance: u64,
}

// Later, trying to deserialize as V2
#[derive(BorshDeserialize)]
pub struct AccountV2 {
    pub balance: u64,
    pub timestamp: i64,  // New field!
}

// ❌ Fails: not enough bytes
let data = AccountV2::try_from_slice(&bytes)?;  // Error!

Solution: Use versioning or optional fields.

4. String/Vec Limits

// ❌ No validation
#[derive(BorshSerialize, BorshDeserialize)]
pub struct Account {
    pub name: String,  // Could be 10MB!
}

// ✅ Validate before deserializing
pub fn validate_name(name: &str) -> ProgramResult {
    if name.len() > 32 {
        return Err(ProgramError::InvalidArgument);
    }
    Ok(())
}

5. Incorrect Size Calculation

// ❌ Wrong: ignores vector length prefix
let size = my_vec.len();

// ✅ Correct: includes 4-byte length prefix
let size = 4 + my_vec.len();

Summary

Key Takeaways:

  1. Use Borsh for all Solana program serialization
  2. Design fixed-size layouts when possible for predictability
  3. Validate before deserializing to prevent errors
  4. Use zero-copy for large, frequently-accessed data
  5. Plan for versioning from the start
  6. Minimize serialization frequency to save compute units

Common Patterns:

// Deserialize
let data = AccountData::try_from_slice(&account_info.data.borrow())?;

// Modify
let mut data = data;
data.field = new_value;

// Serialize
data.serialize(&mut &mut account_info.data.borrow_mut()[..])?;

Size Calculation:

// Fixed fields
const FIXED_SIZE: usize = 1 + 32 + 8;

// Variable fields
let total_size = FIXED_SIZE + 4 + string.len() + 4 + vec.len();

Proper serialization patterns are fundamental to efficient and correct Solana programs. Master Borsh for production-ready data handling.