Files
gh-tenequm-claude-plugins-s…/skills/solana-development/references/serialization.md
2025-11-30 09:01:25 +08:00

621 lines
14 KiB
Markdown

# Serialization and Data Handling
This reference provides comprehensive coverage of data serialization and deserialization patterns for native Rust Solana program development, focusing on Borsh and account data layout best practices.
## Table of Contents
1. [Why Borsh for Solana](#why-borsh-for-solana)
2. [Basic Borsh Usage](#basic-borsh-usage)
3. [Account Data Layout Design](#account-data-layout-design)
4. [Serialization Patterns](#serialization-patterns)
5. [Zero-Copy Deserialization](#zero-copy-deserialization)
6. [Data Versioning](#data-versioning)
7. [Performance Considerations](#performance-considerations)
8. [Common Pitfalls](#common-pitfalls)
---
## Why Borsh for Solana
**Borsh (Binary Object Representation Serializer for Hashing)** is the recommended serialization format for Solana programs.
### Advantages
1. **Deterministic:** Same data always produces same bytes
2. **Compact:** Efficient binary encoding
3. **Fast:** Lower compute unit cost than alternatives
4. **Strict Schema:** Type-safe serialization/deserialization
5. **No Metadata:** Unlike JSON, no field names in output
### vs Alternatives
| Format | CU Cost | Size | Type Safety | Deterministic |
|--------|---------|------|-------------|---------------|
| **Borsh** | ✅ Low | ✅ Compact | ✅ Yes | ✅ Yes |
| bincode | ❌ High | ✅ Compact | ✅ Yes | ⚠️ Config-dependent |
| JSON | ❌ Very High | ❌ Large | ❌ No | ❌ No |
| MessagePack | ⚠️ Medium | ✅ Compact | ⚠️ Partial | ⚠️ Mostly |
**Recommendation:** Use Borsh for all program account data.
---
## Basic Borsh Usage
### Dependencies
```toml
[dependencies]
borsh = { version = "1.5", features = ["derive"] }
```
### Deriving Borsh Traits
```rust
use borsh::{BorshDeserialize, BorshSerialize};
#[derive(BorshSerialize, BorshDeserialize, Debug, Clone)]
pub struct UserAccount {
pub user: Pubkey,
pub balance: u64,
pub created_at: i64,
}
```
### Serialization
**To bytes:**
```rust
let account_data = UserAccount {
user: Pubkey::new_unique(),
balance: 1000,
created_at: 1234567890,
};
// Serialize to Vec<u8>
let bytes = account_data.try_to_vec()?;
// Serialize to existing buffer
let mut buffer = vec![0u8; 100];
account_data.serialize(&mut buffer.as_mut_slice())?;
```
### Deserialization
**From bytes:**
```rust
// Deserialize from slice
let account_data = UserAccount::try_from_slice(&bytes)?;
// Deserialize with BorshDeserialize
let mut cursor = &bytes[..];
let account_data = UserAccount::deserialize(&mut cursor)?;
```
---
## Account Data Layout Design
### Basic Structure
```rust
#[derive(BorshSerialize, BorshDeserialize)]
pub struct AccountData {
// 1. Discriminator / Type Field (1 byte)
pub account_type: u8,
// 2. Flags / State (1 byte)
pub is_initialized: bool,
// 3. Fixed-size fields (predictable layout)
pub owner: Pubkey, // 32 bytes
pub created_at: i64, // 8 bytes
pub counter: u64, // 8 bytes
// 4. Variable-size fields (at end)
pub name: String, // 4 + length
pub metadata: Vec<u8>, // 4 + length
}
```
**Size calculation:**
```
1 (type) + 1 (flag) + 32 (pubkey) + 8 (i64) + 8 (u64) + 4 (string len) + N (string) + 4 (vec len) + M (vec)
= 58 + N + M bytes
```
### Size Calculation Helper
```rust
impl AccountData {
pub const FIXED_SIZE: usize = 58; // All fixed fields
pub fn calculate_size(name_len: usize, metadata_len: usize) -> usize {
Self::FIXED_SIZE + name_len + metadata_len
}
pub fn max_size(max_name: usize, max_metadata: usize) -> usize {
Self::calculate_size(max_name, max_metadata)
}
}
// Usage
let account_size = AccountData::max_size(32, 256); // 346 bytes
```
### Fixed-Size Accounts
**Best for performance:**
```rust
#[derive(BorshSerialize, BorshDeserialize)]
pub struct FixedAccount {
pub is_initialized: bool,
pub owner: Pubkey,
pub balance: u64,
pub last_updated: i64,
// Fixed-size array instead of Vec
pub data: [u8; 256],
}
impl FixedAccount {
pub const SIZE: usize = 1 + 32 + 8 + 8 + 256; // 305 bytes
}
```
---
## Serialization Patterns
### Pattern 1: try_from_slice (Recommended)
**Most common pattern for account deserialization:**
```rust
use borsh::BorshDeserialize;
pub fn load_account_data(
account_info: &AccountInfo,
) -> Result<UserAccount, ProgramError> {
let data = UserAccount::try_from_slice(&account_info.data.borrow())?;
Ok(data)
}
```
**Error handling:**
```rust
let data = UserAccount::try_from_slice(&account_info.data.borrow())
.map_err(|e| {
msg!("Failed to deserialize account: {}", e);
ProgramError::InvalidAccountData
})?;
```
### Pattern 2: Unchecked Deserialization
**Use when you've already validated the account:**
```rust
use borsh::try_from_slice_unchecked;
// After validation checks
let mut data = try_from_slice_unchecked::<UserAccount>(&account_info.data.borrow())
.unwrap(); // Safe because we validated
```
**⚠️ Warning:** Only use after thorough validation. Skips some safety checks.
### Pattern 3: Partial Deserialization
**Read only what you need:**
```rust
#[derive(BorshDeserialize)]
pub struct AccountHeader {
pub account_type: u8,
pub is_initialized: bool,
pub owner: Pubkey,
}
// Deserialize just the header
let header = AccountHeader::try_from_slice(&account_info.data.borrow()[..42])?;
if !header.is_initialized {
return Err(ProgramError::UninitializedAccount);
}
```
### Pattern 4: In-Place Modification
**Efficient for large accounts:**
```rust
pub fn update_balance(
account_info: &AccountInfo,
new_balance: u64,
) -> ProgramResult {
let mut data = account_info.data.borrow_mut();
// Deserialize
let mut account = UserAccount::try_from_slice(&data)?;
// Modify
account.balance = new_balance;
account.last_updated = Clock::get()?.unix_timestamp;
// Serialize back
account.serialize(&mut &mut data[..])?;
Ok(())
}
```
### Pattern 5: Bulk Operations
**Processing multiple accounts:**
```rust
pub fn process_accounts(
accounts: &[AccountInfo],
) -> ProgramResult {
let account_data: Vec<UserAccount> = accounts
.iter()
.map(|acc| UserAccount::try_from_slice(&acc.data.borrow()))
.collect::<Result<Vec<_>, _>>()?;
// Process all accounts
for (i, data) in account_data.iter().enumerate() {
msg!("Account {}: balance = {}", i, data.balance);
}
Ok(())
}
```
---
## Zero-Copy Deserialization
### When to Use Zero-Copy
**Benefits:**
- Avoids memory allocation
- Reduces compute units (50%+ savings for large structs)
- Direct access to account data
**Use when:**
- Account data is large (> 100 bytes)
- Frequent reads
- Performance-critical paths
### Bytemuck Pattern
```toml
[dependencies]
bytemuck = { version = "1.14", features = ["derive"] }
```
```rust
use bytemuck::{Pod, Zeroable};
#[repr(C)]
#[derive(Copy, Clone, Pod, Zeroable)]
pub struct ZeroCopyAccount {
pub is_initialized: u8, // bool as u8
pub owner: [u8; 32], // Pubkey as bytes
pub balance: u64,
pub counter: u64,
}
impl ZeroCopyAccount {
pub const SIZE: usize = std::mem::size_of::<Self>();
pub fn from_account_info(account_info: &AccountInfo) -> Result<&Self, ProgramError> {
let data = account_info.data.borrow();
bytemuck::try_from_bytes(&data)
.map_err(|_| ProgramError::InvalidAccountData)
}
pub fn from_account_info_mut(
account_info: &AccountInfo,
) -> Result<&mut Self, ProgramError> {
let data = account_info.data.borrow_mut();
bytemuck::try_from_bytes_mut(&mut data)
.map_err(|_| ProgramError::InvalidAccountData)
}
}
// Usage
let account = ZeroCopyAccount::from_account_info(account_info)?;
msg!("Balance: {}", account.balance);
// Mutable access
let account = ZeroCopyAccount::from_account_info_mut(account_info)?;
account.balance += 100;
```
**⚠️ Limitations:**
- Only works with types that are `Pod` (Plain Old Data)
- No `String`, `Vec`, or other heap-allocated types
- Must be `#[repr(C)]` for stable layout
---
## Data Versioning
### Pattern 1: Version Field
```rust
#[derive(BorshSerialize, BorshDeserialize)]
pub struct VersionedAccount {
pub version: u8,
pub data: AccountDataEnum,
}
#[derive(BorshSerialize, BorshDeserialize)]
pub enum AccountDataEnum {
V1(AccountDataV1),
V2(AccountDataV2),
}
#[derive(BorshSerialize, BorshDeserialize)]
pub struct AccountDataV1 {
pub balance: u64,
}
#[derive(BorshSerialize, BorshDeserialize)]
pub struct AccountDataV2 {
pub balance: u64,
pub last_updated: i64, // New field
}
// Deserialization with version handling
pub fn load_versioned_account(
account_info: &AccountInfo,
) -> ProgramResult {
let versioned = VersionedAccount::try_from_slice(&account_info.data.borrow())?;
match versioned.data {
AccountDataEnum::V1(data_v1) => {
msg!("V1 account: balance = {}", data_v1.balance);
}
AccountDataEnum::V2(data_v2) => {
msg!("V2 account: balance = {}, updated = {}",
data_v2.balance, data_v2.last_updated);
}
}
Ok(())
}
```
### Pattern 2: Optional Fields
```rust
#[derive(BorshSerialize, BorshDeserialize)]
pub struct Account {
pub balance: u64,
// V2: Added optional field
pub metadata: Option<Metadata>,
}
#[derive(BorshSerialize, BorshDeserialize)]
pub struct Metadata {
pub name: String,
pub url: String,
}
// Old accounts: metadata = None
// New accounts: metadata = Some(Metadata { ... })
```
### Pattern 3: Migration Function
```rust
pub fn migrate_account_v1_to_v2(
account_info: &AccountInfo,
) -> ProgramResult {
// Load V1
let data_v1 = AccountDataV1::try_from_slice(&account_info.data.borrow())?;
// Convert to V2
let data_v2 = AccountDataV2 {
balance: data_v1.balance,
last_updated: Clock::get()?.unix_timestamp,
};
// Reallocate if needed
let new_size = data_v2.try_to_vec()?.len();
account_info.realloc(new_size, false)?;
// Serialize V2
data_v2.serialize(&mut &mut account_info.data.borrow_mut()[..])?;
Ok(())
}
```
---
## Performance Considerations
### Compute Unit Costs
**Serialization costs (approximate):**
| Operation | CU Cost |
|-----------|---------|
| Serialize small struct (< 100 bytes) | ~500 CU |
| Serialize large struct (> 1KB) | ~2,000 CU |
| Deserialize small struct | ~800 CU |
| Deserialize large struct | ~3,000 CU |
| Zero-copy access | ~100 CU |
### Optimization Tips
**1. Minimize serialization frequency:**
```rust
// ❌ Wasteful - serializes twice
let mut data = load_data(account)?;
data.field1 = value1;
save_data(account, &data)?;
data.field2 = value2;
save_data(account, &data)?; // Serialize again!
// ✅ Efficient - serialize once
let mut data = load_data(account)?;
data.field1 = value1;
data.field2 = value2;
save_data(account, &data)?;
```
**2. Use fixed-size fields:**
```rust
// ❌ Variable size - more expensive
pub struct Account {
pub name: String, // 4 + N bytes
}
// ✅ Fixed size - cheaper
pub struct Account {
pub name: [u8; 32], // Exactly 32 bytes
}
```
**3. Order fields by size:**
```rust
// ✅ Optimized layout (largest first)
#[derive(BorshSerialize, BorshDeserialize)]
#[repr(C)]
pub struct OptimizedAccount {
pub pubkey1: Pubkey, // 32 bytes
pub pubkey2: Pubkey, // 32 bytes
pub amount: u64, // 8 bytes
pub timestamp: i64, // 8 bytes
pub flags: u8, // 1 byte
}
```
---
## Common Pitfalls
### 1. Buffer Too Small
```rust
// ❌ Error: buffer too small
let mut buffer = vec![0u8; 10];
large_struct.serialize(&mut buffer.as_mut_slice())?; // Fails!
// ✅ Correct: proper size
let size = large_struct.try_to_vec()?.len();
let mut buffer = vec![0u8; size];
large_struct.serialize(&mut buffer.as_mut_slice())?;
```
### 2. Forgetting to Borrow
```rust
// ❌ Error: data moved
let data = account_info.data;
UserAccount::try_from_slice(&data)?; // Fails!
// ✅ Correct: borrow data
let data = account_info.data.borrow();
UserAccount::try_from_slice(&data)?;
```
### 3. Mismatched Schema
```rust
// Account created with V1
#[derive(BorshSerialize)]
pub struct AccountV1 {
pub balance: u64,
}
// Later, trying to deserialize as V2
#[derive(BorshDeserialize)]
pub struct AccountV2 {
pub balance: u64,
pub timestamp: i64, // New field!
}
// ❌ Fails: not enough bytes
let data = AccountV2::try_from_slice(&bytes)?; // Error!
```
**Solution:** Use versioning or optional fields.
### 4. String/Vec Limits
```rust
// ❌ No validation
#[derive(BorshSerialize, BorshDeserialize)]
pub struct Account {
pub name: String, // Could be 10MB!
}
// ✅ Validate before deserializing
pub fn validate_name(name: &str) -> ProgramResult {
if name.len() > 32 {
return Err(ProgramError::InvalidArgument);
}
Ok(())
}
```
### 5. Incorrect Size Calculation
```rust
// ❌ Wrong: ignores vector length prefix
let size = my_vec.len();
// ✅ Correct: includes 4-byte length prefix
let size = 4 + my_vec.len();
```
---
## Summary
**Key Takeaways:**
1. **Use Borsh** for all Solana program serialization
2. **Design fixed-size layouts** when possible for predictability
3. **Validate before deserializing** to prevent errors
4. **Use zero-copy** for large, frequently-accessed data
5. **Plan for versioning** from the start
6. **Minimize serialization frequency** to save compute units
**Common Patterns:**
```rust
// Deserialize
let data = AccountData::try_from_slice(&account_info.data.borrow())?;
// Modify
let mut data = data;
data.field = new_value;
// Serialize
data.serialize(&mut &mut account_info.data.borrow_mut()[..])?;
```
**Size Calculation:**
```rust
// Fixed fields
const FIXED_SIZE: usize = 1 + 32 + 8;
// Variable fields
let total_size = FIXED_SIZE + 4 + string.len() + 4 + vec.len();
```
Proper serialization patterns are fundamental to efficient and correct Solana programs. Master Borsh for production-ready data handling.