# Serialization and Data Handling

This reference covers data serialization and deserialization patterns for native Rust Solana program development, focusing on Borsh and account data layout best practices.

## Table of Contents

1. [Why Borsh for Solana](#why-borsh-for-solana)
2. [Basic Borsh Usage](#basic-borsh-usage)
3. [Account Data Layout Design](#account-data-layout-design)
4. [Serialization Patterns](#serialization-patterns)
5. [Zero-Copy Deserialization](#zero-copy-deserialization)
6. [Data Versioning](#data-versioning)
7. [Performance Considerations](#performance-considerations)
8. [Common Pitfalls](#common-pitfalls)

---

## Why Borsh for Solana

**Borsh (Binary Object Representation Serializer for Hashing)** is the recommended serialization format for Solana programs.

### Advantages

1. **Deterministic:** The same data always produces the same bytes
2. **Compact:** Efficient binary encoding with no padding
3. **Fast:** Typically lower compute unit cost than the alternatives below
4. **Strict Schema:** Type-safe serialization/deserialization
5. **No Metadata:** Unlike JSON, no field names appear in the output

### vs Alternatives

| Format | Relative CU Cost | Size | Type Safety | Deterministic |
|--------|------------------|------|-------------|---------------|
| **Borsh** | ✅ Low | ✅ Compact | ✅ Yes | ✅ Yes |
| bincode | ❌ High | ✅ Compact | ✅ Yes | ⚠️ Config-dependent |
| JSON | ❌ Very High | ❌ Large | ❌ No | ❌ No |
| MessagePack | ⚠️ Medium | ✅ Compact | ⚠️ Partial | ⚠️ Mostly |

**Recommendation:** Use Borsh for all program account data.
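To make the "deterministic" and "no metadata" properties concrete, the sketch below hand-encodes a small struct following the Borsh rules for `bool`, `u64`, and `String`. The functions (`encode_bool`, `encode_u64`, `encode_string`, `encode_example`) are hypothetical and use only the standard library for illustration; in a program you would derive `BorshSerialize` instead.

```rust
/// Hypothetical std-only encoder mirroring the Borsh rules for a few types.
/// It only illustrates the wire format; real programs should use the `borsh` crate.
pub fn encode_u64(out: &mut Vec<u8>, v: u64) {
    // Integers: fixed width, little-endian
    out.extend_from_slice(&v.to_le_bytes());
}

pub fn encode_bool(out: &mut Vec<u8>, v: bool) {
    // bool: a single byte, 0 or 1
    out.push(v as u8);
}

pub fn encode_string(out: &mut Vec<u8>, s: &str) {
    // String: u32 little-endian length prefix (in bytes), then raw UTF-8
    out.extend_from_slice(&(s.len() as u32).to_le_bytes());
    out.extend_from_slice(s.as_bytes());
}

/// Encodes a struct shaped like `{ is_initialized: bool, balance: u64, name: String }`
/// by concatenating its fields in declaration order: no field names, no padding.
pub fn encode_example(is_initialized: bool, balance: u64, name: &str) -> Vec<u8> {
    let mut out = Vec::new();
    encode_bool(&mut out, is_initialized);
    encode_u64(&mut out, balance);
    encode_string(&mut out, name);
    out
}
```

`encode_example(true, 1000, "abc")` produces 16 bytes: `1`, then `1000` as `[232, 3, 0, 0, 0, 0, 0, 0]`, then the length `3` as a `u32` and the bytes of `"abc"`. Nothing in the output names the fields, which is why the writer and reader must agree on the schema.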
---

## Basic Borsh Usage

### Dependencies

```toml
[dependencies]
borsh = { version = "1.5", features = ["derive"] }
```

Pick a `borsh` release whose traits your `solana-program` version implements for `Pubkey`; if they differ, deriving on structs with `Pubkey` fields fails to compile.

### Deriving Borsh Traits

```rust
use borsh::{BorshDeserialize, BorshSerialize};
use solana_program::pubkey::Pubkey;

#[derive(BorshSerialize, BorshDeserialize, Debug, Clone)]
pub struct UserAccount {
    pub user: Pubkey,    // 32 bytes
    pub balance: u64,    // 8 bytes
    pub created_at: i64, // 8 bytes
}
```

### Serialization

**To bytes:**

```rust
let account_data = UserAccount {
    user: Pubkey::new_unique(), // test helper: a distinct key per call
    balance: 1000,
    created_at: 1234567890,
};

// Serialize to a new Vec<u8> (borsh 1.x: the old `try_to_vec()` method was removed)
let bytes = borsh::to_vec(&account_data)?;

// Serialize into an existing buffer (must hold at least 32 + 8 + 8 = 48 bytes)
let mut buffer = vec![0u8; 100];
account_data.serialize(&mut buffer.as_mut_slice())?;
```

### Deserialization

**From bytes:**

```rust
// Deserialize from a slice; errors if any bytes are left unread
let account_data = UserAccount::try_from_slice(&bytes)?;

// Deserialize from a cursor; reads only what it needs and advances `cursor`
let mut cursor = &bytes[..];
let account_data = UserAccount::deserialize(&mut cursor)?;
```

---

## Account Data Layout Design

### Basic Structure

```rust
use borsh::{BorshDeserialize, BorshSerialize};
use solana_program::pubkey::Pubkey;

#[derive(BorshSerialize, BorshDeserialize)]
pub struct AccountData {
    // 1. Discriminator / type field (1 byte)
    pub account_type: u8,

    // 2. Flags / state (1 byte)
    pub is_initialized: bool,

    // 3. Fixed-size fields (predictable offsets)
    pub owner: Pubkey,   // 32 bytes
    pub created_at: i64, // 8 bytes
    pub counter: u64,    // 8 bytes

    // 4. Variable-size fields (at the end)
    pub name: String,      // 4-byte length prefix + UTF-8 bytes
    pub metadata: Vec<u8>, // 4-byte length prefix + elements
}
```

**Size calculation:**

```
1 (type) + 1 (flag) + 32 (pubkey) + 8 (i64) + 8 (u64)
+ 4 (string len) + N (string bytes)
+ 4 (vec len) + M (vec bytes)
= 58 + N + M bytes
```

N is the string's length in bytes (`name.len()`), not its number of characters.

### Size Calculation Helper

```rust
impl AccountData {
    /// All fixed fields, including the two 4-byte length prefixes.
    pub const FIXED_SIZE: usize = 58;

    pub fn calculate_size(name_len: usize, metadata_len: usize) -> usize {
        Self::FIXED_SIZE + name_len + metadata_len
    }

    pub fn max_size(max_name: usize, max_metadata: usize) -> usize {
        Self::calculate_size(max_name, max_metadata)
    }
}

// Usage
let account_size = AccountData::max_size(32, 256); // 58 + 32 + 256 = 346 bytes
```

### Fixed-Size Accounts

**Best for performance:**

```rust
#[derive(BorshSerialize, BorshDeserialize)]
pub struct FixedAccount {
    pub is_initialized: bool,
    pub owner: Pubkey,
    pub balance: u64,
    pub last_updated: i64,
    // Fixed-size array instead of Vec: no length prefix, constant size
    pub data: [u8; 256],
}

impl FixedAccount {
    pub const SIZE: usize = 1 + 32 + 8 + 8 + 256; // 305 bytes (Borsh adds no padding)
}
```

---

## Serialization Patterns

### Pattern 1: try_from_slice (Recommended)

**Most common pattern for account deserialization:**

```rust
use borsh::BorshDeserialize;
use solana_program::{account_info::AccountInfo, program_error::ProgramError};

pub fn load_account_data(
    account_info: &AccountInfo,
) -> Result<UserAccount, ProgramError> {
    let data = UserAccount::try_from_slice(&account_info.data.borrow())?;
    Ok(data)
}
```

`try_from_slice` requires the value to consume the entire slice. If the account was allocated larger than the serialized data (for example, sized with `max_size` but holding a shorter `name`), it fails with a "Not all bytes read" error.

**Error handling:**

```rust
let data = UserAccount::try_from_slice(&account_info.data.borrow())
    .map_err(|e| {
        msg!("Failed to deserialize account: {}", e);
        ProgramError::InvalidAccountData
    })?;
```

### Pattern 2: Unchecked Deserialization

**Use when you've already validated the account:**

```rust
use solana_program::borsh1::try_from_slice_unchecked;

// After owner and account-type checks
let data = try_from_slice_unchecked::<UserAccount>(&account_info.data.borrow())
    .map_err(|_| ProgramError::InvalidAccountData)?;
```

**⚠️ Warning:** `try_from_slice_unchecked` does not require the whole slice to be consumed; trailing bytes are ignored. That makes it suitable for accounts with unused space, but it also means a buffer of the wrong type or length is not rejected. Only use it after validating the account's owner and type.
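The practical difference between Patterns 1 and 2 is how trailing bytes are treated. As a minimal std-only sketch, assuming a record laid out as `{ balance: u64, name: String }`, the hypothetical `decode_strict` below rejects leftover bytes the way `try_from_slice` does, while `decode_unchecked` ignores them the way `try_from_slice_unchecked` does. These functions illustrate the behavior; they are not the Borsh implementation.

```rust
/// Hypothetical decoder for a record shaped like `{ balance: u64, name: String }`,
/// following the Borsh rules: u64 little-endian, String = u32 LE length + UTF-8 bytes.
/// Returns the decoded fields plus the number of bytes consumed.
fn decode_prefix(data: &[u8]) -> Result<(u64, String, usize), String> {
    if data.len() < 12 {
        return Err("unexpected end of input".to_string());
    }
    let mut balance_bytes = [0u8; 8];
    balance_bytes.copy_from_slice(&data[0..8]);
    let balance = u64::from_le_bytes(balance_bytes);

    let mut len_bytes = [0u8; 4];
    len_bytes.copy_from_slice(&data[8..12]);
    let len = u32::from_le_bytes(len_bytes) as usize;

    // The length prefix must not point past the end of the buffer.
    if data.len() - 12 < len {
        return Err("unexpected end of input".to_string());
    }
    let name = String::from_utf8(data[12..12 + len].to_vec())
        .map_err(|_| "invalid UTF-8".to_string())?;
    Ok((balance, name, 12 + len))
}

/// Strict, like `try_from_slice`: every byte must be consumed.
pub fn decode_strict(data: &[u8]) -> Result<(u64, String), String> {
    let (balance, name, used) = decode_prefix(data)?;
    if used != data.len() {
        return Err("not all bytes read".to_string());
    }
    Ok((balance, name))
}

/// Lenient, like `try_from_slice_unchecked`: trailing bytes are ignored.
pub fn decode_unchecked(data: &[u8]) -> Result<(u64, String), String> {
    let (balance, name, _used) = decode_prefix(data)?;
    Ok((balance, name))
}
```

A 14-byte record (`7u64`, then `"hi"`) decodes with either function, but once the same bytes are zero-padded to 32 bytes, as in an account allocated with room to grow, only `decode_unchecked` succeeds.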
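### Pattern 3: Partial Deserialization

**Read only what you need:**

```rust
use borsh::BorshDeserialize;
use solana_program::{program_error::ProgramError, pubkey::Pubkey};

/// Must mirror the leading fields of the full layout (`AccountData` above).
#[derive(BorshDeserialize)]
pub struct AccountHeader {
    pub account_type: u8,     // 1 byte
    pub is_initialized: bool, // 1 byte
    pub owner: Pubkey,        // 32 bytes
}

// Deserialize just the header: `deserialize` reads the first 34 bytes
// (1 + 1 + 32) and leaves the rest of the buffer untouched.
let data = account_info.data.borrow();
let header = AccountHeader::deserialize(&mut &data[..])?;

if !header.is_initialized {
    return Err(ProgramError::UninitializedAccount);
}
```

### Pattern 4: In-Place Modification

**Read, modify, and write back within a single mutable borrow:**

```rust
use borsh::{BorshDeserialize, BorshSerialize};
use solana_program::{account_info::AccountInfo, entrypoint::ProgramResult};

pub fn update_balance(
    account_info: &AccountInfo,
    new_balance: u64,
) -> ProgramResult {
    let mut data = account_info.data.borrow_mut();

    // Deserialize (assumes the account is exactly the size of a UserAccount)
    let mut account = UserAccount::try_from_slice(&data)?;

    // Modify
    account.balance = new_balance;

    // Serialize back into the same buffer
    account.serialize(&mut &mut data[..])?;

    Ok(())
}
```

This still decodes and re-encodes the whole struct; for large accounts, see [Zero-Copy Deserialization](#zero-copy-deserialization).

### Pattern 5: Bulk Operations

**Processing multiple accounts:**

```rust
use borsh::BorshDeserialize;
use solana_program::{account_info::AccountInfo, entrypoint::ProgramResult, msg};

pub fn process_accounts(
    accounts: &[AccountInfo],
) -> ProgramResult {
    // Deserialize every account, stopping at the first failure
    // (owner checks omitted for brevity)
    let account_data: Vec<UserAccount> = accounts
        .iter()
        .map(|acc| UserAccount::try_from_slice(&acc.data.borrow()))
        .collect::<Result<Vec<_>, _>>()?;

    // Process all accounts
    for (i, data) in account_data.iter().enumerate() {
        msg!("Account {}: balance = {}", i, data.balance);
    }

    Ok(())
}
```

---

## Zero-Copy Deserialization

### When to Use Zero-Copy

**Benefits:**
- Avoids memory allocation and copying
- Reduces compute units, especially for large structs, because fields are read in place instead of decoded
- Direct access to account data

**Use when:**
- Account data is large (> 100 bytes)
- Fields are read frequently
- The code path is performance-critical

### Bytemuck Pattern

```toml
[dependencies]
bytemuck = { version = "1.14", features = ["derive"] }
```

```rust
use std::cell::{Ref, RefMut};

use bytemuck::{Pod, Zeroable};
use solana_program::{account_info::AccountInfo, msg, program_error::ProgramError};

#[repr(C)]
#[derive(Copy, Clone, Pod, Zeroable)]
pub struct ZeroCopyAccount {
    pub is_initialized: u8, // bool stored as u8 (bool is not Pod)
    pub _padding: [u8; 7],  // explicit padding: Pod forbids implicit padding
    pub owner: [u8; 32],    // Pubkey stored as raw bytes
    pub balance: u64,
    pub counter: u64,
}

impl ZeroCopyAccount {
    pub const SIZE: usize = std::mem::size_of::<ZeroCopyAccount>(); // 56 bytes

    /// Borrows the account data as a `ZeroCopyAccount`. Fails unless the data
    /// is exactly `SIZE` bytes and suitably aligned.
    pub fn from_account_info<'a>(
        account_info: &'a AccountInfo,
    ) -> Result<Ref<'a, Self>, ProgramError> {
        Ref::filter_map(account_info.data.borrow(), |data| {
            bytemuck::try_from_bytes(data).ok()
        })
        .map_err(|_| ProgramError::InvalidAccountData)
    }

    /// Mutable variant; the returned guard keeps the data mutably borrowed.
    pub fn from_account_info_mut<'a>(
        account_info: &'a AccountInfo,
    ) -> Result<RefMut<'a, Self>, ProgramError> {
        RefMut::filter_map(account_info.data.borrow_mut(), |data| {
            bytemuck::try_from_bytes_mut(data).ok()
        })
        .map_err(|_| ProgramError::InvalidAccountData)
    }
}

// Usage: read access (the guard is dropped at the end of the block)
{
    let account = ZeroCopyAccount::from_account_info(account_info)?;
    msg!("Balance: {}", account.balance);
}

// Mutable access (borrow_mut would panic if a read guard were still alive)
let mut account = ZeroCopyAccount::from_account_info_mut(account_info)?;
account.balance += 100;
```

**⚠️ Limitations:**
- Only works with `Pod` (Plain Old Data) types: no `bool`, `String`, `Vec`, or other heap-allocated types
- Must be `#[repr(C)]` for a stable layout, with no implicit padding (add explicit padding fields or order fields by alignment)
- The account data must be exactly `SIZE` bytes and suitably aligned for `try_from_bytes` to succeed
- The bytes are the raw in-memory layout, not Borsh; do not mix the two encodings on one account

---

## Data Versioning

### Pattern 1: Version Field

```rust
use borsh::{BorshDeserialize, BorshSerialize};
use solana_program::{account_info::AccountInfo, entrypoint::ProgramResult, msg};

#[derive(BorshSerialize, BorshDeserialize)]
pub struct VersionedAccount {
    pub version: u8,
    pub data: AccountDataEnum,
}

#[derive(BorshSerialize, BorshDeserialize)]
pub enum AccountDataEnum {
    V1(AccountDataV1), // encoded with variant index 0
    V2(AccountDataV2), // encoded with variant index 1
}

#[derive(BorshSerialize, BorshDeserialize)]
pub struct AccountDataV1 {
    pub balance: u64,
}

#[derive(BorshSerialize, BorshDeserialize)]
pub struct AccountDataV2 {
    pub balance: u64,
    pub last_updated: i64, // New field
}

// Deserialization with version handling
pub fn load_versioned_account(
    account_info: &AccountInfo,
) -> ProgramResult {
    let versioned = VersionedAccount::try_from_slice(&account_info.data.borrow())?;

    match versioned.data {
        AccountDataEnum::V1(data_v1) => {
            msg!("V1 account: balance = {}", data_v1.balance);
        }
        AccountDataEnum::V2(data_v2) => {
            msg!(
                "V2 account: balance = {}, updated = {}",
                data_v2.balance,
                data_v2.last_updated
            );
        }
    }

    Ok(())
}
```

Borsh encodes an enum as a 1-byte variant index followed by the variant's fields, so the enum tag already identifies the version; the separate `version` byte is only useful if you want a version number independent of variant order. Only ever append new variants: reordering them changes the meaning of the index stored in existing accounts.

### Pattern 2: Optional Fields

```rust
#[derive(BorshSerialize, BorshDeserialize)]
pub struct Account {
    pub balance: u64,
    // V2: added optional field
    pub metadata: Option<Metadata>,
}

#[derive(BorshSerialize, BorshDeserialize)]
pub struct Metadata {
    pub name: String,
    pub url: String,
}

// Old accounts: metadata = None (only if a zero byte follows `balance`)
// New accounts: metadata = Some(Metadata { ... })
```

Borsh encodes `None` as a single `0` byte and `Some(x)` as `1` followed by `x`. An account written before the field existed has no byte there, so it reads back as `None` only if at least one zero-initialized byte follows `balance`; otherwise reallocate it (zero-filled) before reading it as the new type.

### Pattern 3: Migration Function

```rust
use borsh::{BorshDeserialize, BorshSerialize};
use solana_program::{
    account_info::AccountInfo, clock::Clock, entrypoint::ProgramResult, sysvar::Sysvar,
};

pub fn migrate_account_v1_to_v2(
    account_info: &AccountInfo,
) -> ProgramResult {
    // Load V1 (the temporary borrow ends with this statement)
    let data_v1 = AccountDataV1::try_from_slice(&account_info.data.borrow())?;

    // Convert to V2
    let data_v2 = AccountDataV2 {
        balance: data_v1.balance,
        last_updated: Clock::get()?.unix_timestamp,
    };

    // Reallocate to the new size. The account must be writable, owned by this
    // program, and already hold enough lamports to stay rent-exempt at that size.
    let new_size = borsh::to_vec(&data_v2)?.len();
    account_info.realloc(new_size, false)?;

    // Serialize V2
    data_v2.serialize(&mut &mut account_info.data.borrow_mut()[..])?;

    Ok(())
}
```

Without a version tag, migrated and unmigrated accounts can only be told apart by their length (8 vs. 16 bytes here), so this pattern pairs well with Pattern 1.

---

## Performance Considerations

### Compute Unit Costs

**Serialization costs (rough estimates; actual costs depend on the struct and toolchain):**

| Operation | Approx. CU Cost |
|-----------|-----------------|
| Serialize small struct (< 100 bytes) | ~500 CU |
| Serialize large struct (> 1KB) | ~2,000 CU |
| Deserialize small struct | ~800 CU |
| Deserialize large struct | ~3,000 CU |
| Zero-copy access | ~100 CU |

To measure your own structs, call `solana_program::log::sol_log_compute_units()` before and after the operation and compare the logged remaining budget.

### Optimization Tips

**1. Minimize serialization frequency:**

```rust
// `load_data` / `save_data` are placeholder helpers (deserialize / serialize)

// ❌ Wasteful - serializes twice
let mut data = load_data(account)?;
data.field1 = value1;
save_data(account, &data)?;
data.field2 = value2;
save_data(account, &data)?; // Serialize again!

// ✅ Efficient - serialize once
let mut data = load_data(account)?;
data.field1 = value1;
data.field2 = value2;
save_data(account, &data)?;
```

**2. Use fixed-size fields:**

```rust
// ❌ Variable size: length prefix to parse, size depends on content
pub struct Account {
    pub name: String, // 4 + N bytes
}

// ✅ Fixed size: constant offsets and account size
pub struct Account {
    pub name: [u8; 32], // Exactly 32 bytes
}
```

**3. Order fields by alignment (for `#[repr(C)]` / zero-copy layouts):**

Borsh writes fields back to back with no padding, so field order does not change the serialized size. Order matters for `#[repr(C)]` structs used with zero-copy, where the compiler inserts padding to align each field; placing the most strictly aligned fields first minimizes it.

```rust
// ✅ Largest alignment first: no internal padding
// (81 bytes of fields, rounded up to 88 in memory for 8-byte alignment)
#[derive(BorshSerialize, BorshDeserialize)]
#[repr(C)]
pub struct OptimizedAccount {
    pub pubkey1: Pubkey, // 32 bytes
    pub pubkey2: Pubkey, // 32 bytes
    pub amount: u64,     // 8 bytes
    pub timestamp: i64,  // 8 bytes
    pub flags: u8,       // 1 byte
}
```

---

## Common Pitfalls

### 1. Buffer Too Small

```rust
// ❌ Error: buffer too small (the write fails once the slice is full)
let mut buffer = vec![0u8; 10];
large_struct.serialize(&mut buffer.as_mut_slice())?; // Fails!

// ✅ Correct: let Borsh allocate a buffer of the right size
let buffer = borsh::to_vec(&large_struct)?;
```

### 2. Forgetting to Borrow

```rust
// ❌ Compile error: cannot move `data` out of `account_info`
let data = account_info.data;
UserAccount::try_from_slice(&data)?;

// ✅ Correct: borrow the RefCell
let data = account_info.data.borrow();
UserAccount::try_from_slice(&data)?;
```

### 3. Mismatched Schema

```rust
// Account created with V1
#[derive(BorshSerialize)]
pub struct AccountV1 {
    pub balance: u64,
}

// Later, trying to deserialize as V2
#[derive(BorshDeserialize)]
pub struct AccountV2 {
    pub balance: u64,
    pub timestamp: i64, // New field!
}

// ❌ Fails: not enough bytes
let data = AccountV2::try_from_slice(&bytes)?; // Error!
```

**Solution:** Use versioning or optional fields.

### 4. String/Vec Limits

```rust
// ❌ No validation: nothing bounds the length a caller can supply
#[derive(BorshSerialize, BorshDeserialize)]
pub struct Account {
    pub name: String, // Unbounded
}

// ✅ Validate lengths (in bytes) before writing them into the account
pub fn validate_name(name: &str) -> ProgramResult {
    if name.len() > 32 {
        return Err(ProgramError::InvalidArgument);
    }
    Ok(())
}
```

### 5. Incorrect Size Calculation

```rust
// ❌ Wrong: ignores the vector length prefix
let size = my_vec.len();

// ✅ Correct: includes the 4-byte length prefix (for Vec<u8>; for other
// element types, multiply `len()` by the element's serialized size)
let size = 4 + my_vec.len();
```

---

## Summary

**Key Takeaways:**

1. **Use Borsh** for all Solana program serialization
2. **Design fixed-size layouts** when possible for predictability
3. **Validate accounts and lengths** (owner, type, size, string/vector bounds) before trusting or storing data
4. **Use zero-copy** for large, frequently accessed data
5. **Plan for versioning** from the start
6. **Minimize serialization frequency** to save compute units

**Common Patterns:**

```rust
// Deserialize
let data = AccountData::try_from_slice(&account_info.data.borrow())?;

// Modify
let mut data = data;
data.field = new_value;

// Serialize
data.serialize(&mut &mut account_info.data.borrow_mut()[..])?;
```

**Size Calculation:**

```rust
// Fixed fields
const FIXED_SIZE: usize = 1 + 32 + 8;

// Variable fields (4-byte length prefix each)
let total_size = FIXED_SIZE + 4 + string.len() + 4 + vec.len();
```

Proper serialization patterns are fundamental to efficient and correct Solana programs. Master Borsh for production-ready data handling.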
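The size formulas used throughout this reference can be cross-checked mechanically. The sketch below records the Borsh-encoded size of each building block as a constant and recomputes the `AccountData` layout from the Account Data Layout Design section; the constant and function names are hypothetical, and only the standard library is used.

```rust
// Borsh-encoded sizes of the building blocks used in this reference.
const U8: usize = 1;
const BOOL: usize = 1;
const PUBKEY: usize = 32;
const I64: usize = 8;
const U64: usize = 8;
const LEN_PREFIX: usize = 4; // u32 length prefix on String and Vec

/// Fixed part of `AccountData`: every field except the String/Vec payloads,
/// but including their two 4-byte length prefixes.
pub const ACCOUNT_DATA_FIXED: usize =
    U8 + BOOL + PUBKEY + I64 + U64 + LEN_PREFIX + LEN_PREFIX;

/// Bytes needed for an `AccountData` whose `name` holds `name_bytes` UTF-8 bytes
/// and whose `metadata: Vec<u8>` holds `metadata_len` elements.
pub fn account_data_size(name_bytes: usize, metadata_len: usize) -> usize {
    ACCOUNT_DATA_FIXED + name_bytes + metadata_len
}

/// Size of a `Vec<T>` where each element serializes to `elem_size` bytes.
pub fn vec_size(len: usize, elem_size: usize) -> usize {
    LEN_PREFIX + len * elem_size
}
```

`account_data_size(32, 256)` reproduces the 346 bytes from `AccountData::max_size(32, 256)`, and passing `"héllo".len()` (6 bytes for 5 characters) shows why string sizes must be counted in bytes. `vec_size(3, PUBKEY)` gives 100 bytes for three public keys.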