---
name: data-cloud-2025
description: Salesforce Data Cloud integration patterns and architecture (2025)
---

## 🚨 CRITICAL GUIDELINES

### Windows File Path Requirements

**MANDATORY: Always Use Backslashes on Windows for File Paths**

When using Edit or Write tools on Windows, you MUST use backslashes (`\`) in file paths, NOT forward slashes (`/`).

**Examples:**
- ❌ WRONG: `D:/repos/project/file.tsx`
- ✅ CORRECT: `D:\repos\project\file.tsx`

This applies to:
- Edit tool file_path parameter
- Write tool file_path parameter
- All file operations on Windows systems


### Documentation Guidelines

**NEVER create new documentation files unless explicitly requested by the user.**

- **Priority**: Update existing README.md files rather than creating new documentation
- **Repository cleanliness**: Keep repository root clean - only README.md unless user requests otherwise
- **Style**: Documentation should be concise, direct, and professional - avoid AI-generated tone
- **User preference**: Only create additional .md files when user specifically asks for documentation


---

# Salesforce Data Cloud Integration Patterns (2025)

## What is Salesforce Data Cloud?

Salesforce Data Cloud is a customer data platform (CDP) that ingests data from Salesforce and external sources, harmonizes it into a common data model, and resolves identities into unified customer profiles. Those profiles feed AI, automation, and analytics across the Customer 360 platform.

**Key Capabilities**:
- **Data Ingestion**: Connect 200+ sources (Salesforce, external systems, data lakes)
- **Data Harmonization**: Map disparate data to unified data model
- **Identity Resolution**: Match and merge customer records across sources
- **Real-Time Activation**: Trigger actions based on streaming data
- **Zero Copy Architecture**: Query data in place without moving it
- **AI/ML Ready**: Powers Einstein, Agentforce, and predictive models
- **Vector Database** (GA March 2025): Store and query unstructured data with semantic search
- **Hybrid Search** (Pilot 2025): Combine semantic and keyword search for accuracy

## Data Cloud Architecture

```
┌──────────────────────────────────────────────────────────┐
│                    Data Sources                          │
├──────────────────────────────────────────────────────────┤
│  Salesforce CRM │ External Apps │ Data Warehouses │ APIs │
└────────┬─────────────────┬──────────────┬───────────┬────┘
         │                 │              │           │
    ┌────▼─────────────────▼──────────────▼───────────▼────┐
    │         Data Cloud Connectors & Ingestion            │
    │  ├─ Real-time Streaming (Change Data Capture)        │
    │  ├─ Batch Import (scheduled/on-demand)               │
    │  └─ Zero Copy (Snowflake, Databricks, BigQuery)      │
    └────────────────────────┬─────────────────────────────┘
                             │
    ┌────────────────────────▼─────────────────────────────┐
    │            Data Model & Harmonization                │
    │  ├─ Map to Common Data Model (DMO objects)           │
    │  ├─ Identity Resolution (match & merge)              │
    │  └─ Data Transformation (calculated insights)        │
    └────────────────────────┬─────────────────────────────┘
                             │
    ┌────────────────────────▼─────────────────────────────┐
    │         Unified Customer Profile (360° View)         │
    │  ├─ Demographics, Transactions, Behavior, Events     │
    │  └─ Real-time Profile API for instant access         │
    └────────────────────────┬─────────────────────────────┘
                             │
    ┌────────────────────────▼─────────────────────────────┐
    │              Activation & Actions                    │
    │  ├─ Salesforce Flow (real-time automation)           │
    │  ├─ Marketing Cloud (segmentation/journeys)          │
    │  ├─ Agentforce (AI agents)                           │
    │  ├─ Einstein AI (predictions/recommendations)        │
    │  └─ External Systems (reverse ETL)                   │
    └──────────────────────────────────────────────────────┘
```

## Data Ingestion Patterns

### Pattern 1: Real-Time Streaming with Change Data Capture

**Use Case**: Keep Data Cloud synchronized with Salesforce objects in real time

```apex
// Enable Change Data Capture for objects
// Setup → Change Data Capture → Select: Account, Contact, Opportunity

// Data Cloud automatically subscribes to CDC channels
// No code needed - configure in Data Cloud UI

// Optional: Custom streaming logic
public class DataCloudStreamHandler {
    public static void publishCustomEvent(Id recordId, String changeType) {
        // Publish custom platform event for Data Cloud
        DataCloudChangeEvent__e event = new DataCloudChangeEvent__e(
            RecordId__c = recordId,
            ObjectType__c = 'Custom_Object__c',
            ChangeType__c = changeType,
            Timestamp__c = System.now(),
            PayloadJson__c = JSON.serialize(getRecordData(recordId))
        );

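        // Publishing is asynchronous; a failed SaveResult means the event was not queued
        Database.SaveResult result = EventBus.publish(event);
        if (!result.isSuccess()) {
            System.debug(LoggingLevel.ERROR, 'Event publish failed: ' + result.getErrors());
        }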
    }

    private static Map<String, Object> getRecordData(Id recordId) {
        // Retrieve and return record data
        String objectType = recordId.getSObjectType().getDescribe().getName();
        String query = 'SELECT FIELDS(ALL) FROM ' + objectType +
                      ' WHERE Id = :recordId LIMIT 1';
        SObject record = Database.query(query);
        return (Map<String, Object>)JSON.deserializeUntyped(JSON.serialize(record));
    }
}
```

### Pattern 2: Batch Import from External Systems

**Use Case**: Import data from ERP, e-commerce, or other business systems

**Data Cloud Configuration**:
```
1. Create Data Source (Setup → Data Cloud → Data Sources)
   - Type: Amazon S3, SFTP, Azure Blob, Google Cloud Storage
   - Authentication: API key, OAuth, IAM role
   - Schedule: Hourly, Daily, Weekly

2. Map to Data Model Objects (DMO)
   - Source Field → DMO Field mapping
   - Data type conversions
   - Formula fields and transformations

3. Configure Identity Resolution
   - Match rules (email, customer ID, phone)
   - Reconciliation rules (which source wins)
```
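The mapping step (item 2 above) can be pictured as a small transformation applied to each source row before it lands in a DMO. The sketch below is only an illustration of that idea, not Data Cloud's mapping engine: the source columns, the DMO field names, and the `FIELD_MAP` structure are all hypothetical.

```python
from datetime import datetime
from decimal import Decimal

# Hypothetical mapping: source column -> (DMO field, type converter)
FIELD_MAP = {
    "cust_id": ("ExternalId__c", str),
    "cust_email": ("Email__c", lambda v: v.strip().lower()),
    "order_total": ("TotalAmount__c", Decimal),
    "order_date": ("OrderDate__c", lambda v: datetime.strptime(v, "%Y-%m-%d").date()),
}


def map_row(source_row: dict) -> dict:
    """Rename source fields to DMO fields and convert their types."""
    mapped = {}
    for source_field, (dmo_field, convert) in FIELD_MAP.items():
        raw = source_row.get(source_field)
        # Keep missing or empty values as None instead of guessing a default
        mapped[dmo_field] = None if raw in (None, "") else convert(raw)
    return mapped


row = {
    "cust_id": "1001",
    "cust_email": " Ada@Example.com ",
    "order_total": "19.90",
    "order_date": "2025-03-01",
}
print(map_row(row))
```

Running the same function over a sample export before configuring the real mapping is a cheap way to catch type mismatches early.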

**API-Based Batch Import**:
```python
# Python example: Push data to Data Cloud via API
import requests

def upload_to_data_cloud(csv_file, object_name, access_token, instance_url):
    """Upload a CSV file through a Bulk API 2.0-style ingest job"""

    # NOTE: /services/data/vXX.0/jobs/ingest is the Salesforce Platform Bulk API 2.0 path.
    # Data Cloud's own Ingestion API is served from a tenant-specific endpoint, so confirm
    # the base URL and job payload for your org before pointing this at Data Cloud.
    job_url = f"{instance_url}/services/data/v62.0/jobs/ingest"
    json_headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json"
    }

    # Step 1: Create ingestion job
    response = requests.post(
        job_url,
        headers=json_headers,
        json={
            "object": object_name,
            "operation": "upsert",
            "externalIdFieldName": "ExternalId__c"
        }
    )
    response.raise_for_status()  # fail fast instead of reading "id" from an error body
    job_id = response.json()["id"]

    # Step 2: Upload CSV data (streamed from disk rather than read into memory)
    upload_url = f"{job_url}/{job_id}/batches"
    with open(csv_file, 'rb') as f:
        upload = requests.put(
            upload_url,
            headers={
                "Authorization": f"Bearer {access_token}",
                "Content-Type": "text/csv"
            },
            data=f
        )
    upload.raise_for_status()

    # Step 3: Close job so processing can start
    closed = requests.patch(
        f"{job_url}/{job_id}",
        headers=json_headers,
        json={"state": "UploadComplete"}
    )
    closed.raise_for_status()

    return job_id
```
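Bulk ingestion endpoints generally cap the size of each uploaded batch, so large exports are usually split before upload. A minimal sketch of that splitting step follows. `split_csv` is a hypothetical helper (not part of any Salesforce SDK), and `max_bytes` is a placeholder to be set from the limit documented for the API version in use.

```python
import csv
import io


def split_csv(csv_text: str, max_bytes: int) -> list:
    """Split CSV text into chunks of roughly max_bytes, repeating the header in each.

    A single row larger than max_bytes still becomes its own (oversized) chunk.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)

    def render(rows):
        buf = io.StringIO()
        csv.writer(buf, lineterminator="\n").writerows([header, *rows])
        return buf.getvalue()

    chunks, current = [], []
    for row in reader:
        # Start a new chunk when adding this row would exceed the limit
        if current and len(render(current + [row]).encode("utf-8")) > max_bytes:
            chunks.append(render(current))
            current = []
        current.append(row)
    if current:
        chunks.append(render(current))
    return chunks


data = "id,name\n1,a\n2,b\n3,c\n"
for chunk in split_csv(data, max_bytes=16):
    print(repr(chunk))
```

Re-rendering the chunk for every row keeps the sketch short but is quadratic in chunk size; a production version would track the running byte count instead.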

### Pattern 3: Zero Copy Integration (Snowflake, Databricks)

**Use Case**: Access data warehouse data without copying to Salesforce

**Benefits**:
- No data duplication (single source of truth)
- No data transfer costs
- Real-time access to warehouse data
- Maintain data governance in warehouse

**Snowflake Zero Copy Setup**:
```sql
-- In Snowflake: Grant access to Salesforce
GRANT USAGE ON DATABASE customer_data TO ROLE salesforce_role;
GRANT USAGE ON SCHEMA customer_data.public TO ROLE salesforce_role;
GRANT SELECT ON TABLE customer_data.public.orders TO ROLE salesforce_role;

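-- Create secure share (a share needs grants on the database, the schema, and each table)
CREATE SHARE salesforce_data_share;
GRANT USAGE ON DATABASE customer_data TO SHARE salesforce_data_share;
GRANT USAGE ON SCHEMA customer_data.public TO SHARE salesforce_data_share;
GRANT SELECT ON TABLE customer_data.public.orders TO SHARE salesforce_data_share;

-- Replace the placeholder with the consumer's Snowflake account identifier
-- (an unquoted identifier, not a Salesforce org ID)
ALTER SHARE salesforce_data_share ADD ACCOUNTS = <consumer_account_identifier>;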
```

**Data Cloud Configuration**:
```
1. Add Zero Copy Connector (Data Cloud → Data Sources)
   - Type: Snowflake Zero Copy
   - Connection: Account URL, username, private key
   - Database/Schema selection

2. Create Data Stream (virtual tables)
   - Select Snowflake tables to expose
   - Map to DMO or keep as is
   - Configure refresh (real-time or scheduled)

3. Query in Salesforce
   - Use SOQL-like syntax to query Snowflake data
   - Join with Salesforce data
   - No data movement required
```

**Query Zero Copy Data**:
```apex
// Query Snowflake data from Apex (via Data Cloud)
public class DataCloudZeroCopyQuery {
    public static List<Map<String, Object>> querySnowflakeOrders(String customerId) {
        // Double any single quotes so the caller-supplied value cannot break out of the SQL literal
        String safeCustomerId = customerId.replace('\'', '\'\'');

        // Data Cloud Query API
        String query = 'SELECT order_id, total_amount, order_date ' +
                      'FROM snowflake_orders ' +
                      'WHERE customer_id = \'' + safeCustomerId + '\' ' +
                      'ORDER BY order_date DESC LIMIT 10';

        HttpRequest req = new HttpRequest();
        req.setEndpoint('callout:DataCloud/v1/query');
        req.setMethod('POST');
        req.setHeader('Content-Type', 'application/json');
        req.setBody(JSON.serialize(new Map<String, String>{'query' => query}));

        Http http = new Http();
        HttpResponse res = http.send(req);

        List<Map<String, Object>> rows = new List<Map<String, Object>>();
        if (res.getStatusCode() == 200) {
            Map<String, Object> result = (Map<String, Object>)JSON.deserializeUntyped(res.getBody());
            // deserializeUntyped returns List<Object>, so each row is cast individually
            List<Object> data = (List<Object>)result.get('data');
            if (data != null) {
                for (Object row : data) {
                    rows.add((Map<String, Object>)row);
                }
            }
        }

        return rows;
    }
}
```
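Because the query text is assembled from caller input, the value should be escaped (or bound, where the API supports parameters) before it reaches the SQL string. For clients that build the same query outside Salesforce, a small Python sketch of the escaping step is shown here. `sql_literal` and `build_orders_query` are hypothetical helpers, the table and column names are taken from the example above, and quote doubling is the standard SQL convention, which may need adjusting for a particular dialect.

```python
def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_orders_query(customer_id: str, limit: int = 10) -> str:
    """Build the orders query with the customer ID safely quoted."""
    if not isinstance(limit, int) or isinstance(limit, bool) or limit < 1:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT order_id, total_amount, order_date "
        "FROM snowflake_orders "
        f"WHERE customer_id = {sql_literal(customer_id)} "
        f"ORDER BY order_date DESC LIMIT {limit}"
    )


print(build_orders_query("C-1001"))
print(build_orders_query("x' OR '1'='1"))
```

In the second call the injected quotes are doubled, so the whole input stays inside one string literal instead of changing the `WHERE` clause.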

## Identity Resolution

### Matching Rules

**Configure identity resolution to create unified profiles**:

```
Match Rules Configuration:
├─ Primary Match (exact match on email)
│  └─ IF email matches THEN merge profiles
├─ Secondary Match (fuzzy match on name + phone)
│  └─ IF firstName + lastName similar AND phone matches THEN merge
└─ Tertiary Match (external ID)
   └─ IF ExternalCustomerId matches THEN merge

Reconciliation Rules (conflict resolution):
├─ Most Recent: Use most recently updated value
├─ Source Priority: Salesforce > ERP > Website
└─ Field-Level Rules: Email from Salesforce, Revenue from ERP
```
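The reconciliation rules above decide which value survives when matched records disagree. The following sketch shows two of them, source priority and most recent, as plain functions. It is an illustration of the rule semantics only: the record shape (`source`, `value`, `updated`) is an assumption, and Data Cloud applies these rules through configuration rather than code.

```python
from datetime import datetime

# Highest-priority source first (order taken from the rules above)
SOURCE_PRIORITY = ["Salesforce", "ERP", "Website"]


def by_source_priority(candidates):
    """Return the value from the highest-priority source that has one."""
    ranked = sorted(candidates, key=lambda c: SOURCE_PRIORITY.index(c["source"]))
    return next((c["value"] for c in ranked if c["value"] is not None), None)


def by_most_recent(candidates):
    """Return the most recently updated non-null value."""
    populated = [c for c in candidates if c["value"] is not None]
    if not populated:
        return None
    return max(populated, key=lambda c: c["updated"])["value"]


emails = [
    {"source": "Website", "value": "ada@new.example", "updated": datetime(2025, 5, 1)},
    {"source": "Salesforce", "value": "ada@corp.example", "updated": datetime(2025, 1, 15)},
    {"source": "ERP", "value": None, "updated": datetime(2025, 6, 1)},
]
print(by_source_priority(emails))  # ada@corp.example
print(by_most_recent(emails))      # ada@new.example
```

The two rules disagree on this data, which is why field-level rules (email from one source, revenue from another) are often needed on top of a global default.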

**Custom Matching Logic**:
```apex
// Custom matching for complex scenarios
public class DataCloudMatchingService {
    public static Boolean shouldMatch(Map<String, Object> profile1,
                                     Map<String, Object> profile2) {
        // Custom matching logic beyond standard rules

        String email1 = (String)profile1.get('email');
        String email2 = (String)profile2.get('email');

        // Exact email match
        if (email1 != null && email1.equalsIgnoreCase(email2)) {
            return true;
        }

        // Fuzzy name + address match
        String name1 = (String)profile1.get('fullName');
        String name2 = (String)profile2.get('fullName');
        String address1 = (String)profile1.get('address');
        String address2 = (String)profile2.get('address');

        if (isNameSimilar(name1, name2) && isSameAddress(address1, address2)) {
            return true;
        }

        return false;
    }

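    private static Boolean isNameSimilar(String name1, String name2) {
        // Levenshtein-based similarity; swap in phonetic matching if needed
        return calculateSimilarity(name1, name2) > 0.85;
    }

    // Returns 1.0 for identical strings and approaches 0 as they diverge
    private static Decimal calculateSimilarity(String a, String b) {
        if (String.isBlank(a) || String.isBlank(b)) {
            return 0;
        }
        String left = a.trim().toLowerCase();
        String right = b.trim().toLowerCase();
        Decimal longest = Math.max(left.length(), right.length());
        return 1 - (left.getLevenshteinDistance(right) / longest);
    }

    // Simplified comparison: case-insensitive match after collapsing whitespace
    private static Boolean isSameAddress(String address1, String address2) {
        if (String.isBlank(address1) || String.isBlank(address2)) {
            return false;
        }
        return address1.normalizeSpace().equalsIgnoreCase(address2.normalizeSpace());
    }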
}
```

## Real-Time Activation Patterns

### Pattern 1: Flow Automation Based on Data Cloud Events

**Use Case**: Trigger a Flow when Data Cloud detects a change in customer behavior

```
Data Cloud Calculated Insight: "High-Value Customer at Risk"
- Logic: Purchase frequency decreased by 50% in last 30 days
- Trigger: When insight calculated
↓
Platform Event: HighValueCustomerRisk__e
↓
Salesforce Flow: "Retain High-Value Customer"
- Create Task for Account Manager
- Send personalized offer via Marketing Cloud
- Add to "At-Risk" campaign
- Log activity timeline
```
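The "at risk" rule in the insight above can be prototyped on sample data before it is configured. The sketch below assumes one particular reading of "decreased by 50% in last 30 days": the order count in the latest 30-day window is compared with the count in the 30 days before it. The function names and that interpretation are assumptions, not Data Cloud behavior.

```python
from datetime import date, timedelta


def purchase_frequency_drop(order_dates, today, window_days=30):
    """Fractional drop in order count between the previous window and the latest one."""
    recent_start = today - timedelta(days=window_days)
    previous_start = recent_start - timedelta(days=window_days)
    recent = sum(1 for d in order_dates if recent_start < d <= today)
    previous = sum(1 for d in order_dates if previous_start < d <= recent_start)
    if previous == 0:
        return 0.0  # no baseline to compare against
    return (previous - recent) / previous


def is_at_risk(order_dates, today, threshold=0.5):
    """True when purchase frequency fell by at least the threshold."""
    return purchase_frequency_drop(order_dates, today) >= threshold


today = date(2025, 6, 30)
orders = [
    date(2025, 5, 5), date(2025, 5, 10), date(2025, 5, 20), date(2025, 5, 25),  # previous window
    date(2025, 6, 10), date(2025, 6, 20),                                        # latest window
]
print(purchase_frequency_drop(orders, today))  # 0.5
print(is_at_risk(orders, today))               # True
```

Customers with no orders in the baseline window return a drop of 0.0 here, so new customers are never flagged; whether that is the right default depends on the retention program.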

**Apex Implementation**:
```apex
// Subscribe to Data Cloud insights
trigger DataCloudInsightTrigger on HighValueCustomerRisk__e (after insert) {
    List<Task> tasks = new List<Task>();

    for (HighValueCustomerRisk__e event : Trigger.new) {
        // Create retention task
        Task task = new Task(
            Subject = 'Urgent: High-value customer at risk',
            Description = 'Customer ' + event.CustomerName__c +
                         ' shows declining engagement. Take action.',
            WhatId = event.AccountId__c,
            Priority = 'High',
            Status = 'Not Started',  // default Task status; 'Open' is not a standard picklist value
            ActivityDate = Date.today().addDays(1)
        );
        tasks.add(task);

        // Trigger retention campaign
        RetentionCampaignService.addToRetentionCampaign(
            event.CustomerId__c,
            event.RiskScore__c
        );
    }

    if (!tasks.isEmpty()) {
        insert tasks;
    }
}
```

### Pattern 2: Agentforce with Data Cloud

**Use Case**: AI agent uses Data Cloud for complete customer context

```apex
// Agentforce action: Get unified customer view
public class AgentforceDataCloudActions {
    @InvocableMethod(label='Get Customer 360 Profile')
    public static List<CustomerProfile> getCustomer360(List<String> customerIds) {
        List<CustomerProfile> profiles = new List<CustomerProfile>();

        for (String customerId : customerIds) {
            // Add one profile per input, even on failure, so results stay aligned with the request list
            CustomerProfile profile = new CustomerProfile();
            profile.customerId = customerId;
            profiles.add(profile);

            // Query Data Cloud unified profile (one callout per ID; Apex allows 100 callouts per transaction)
            HttpRequest req = new HttpRequest();
            req.setEndpoint('callout:DataCloud/v1/profile/' +
                EncodingUtil.urlEncode(customerId, 'UTF-8'));
            req.setMethod('GET');

            Http http = new Http();
            HttpResponse res = http.send(req);

            if (res.getStatusCode() != 200) {
                continue;
            }

            Map<String, Object> data = (Map<String, Object>)
                JSON.deserializeUntyped(res.getBody());

            // Demographics
            profile.name = (String)data.get('name');
            profile.email = (String)data.get('email');
            profile.segment = (String)data.get('segment');

            // Behavioral
            profile.totalPurchases = (Decimal)data.get('total_purchases');
            profile.avgOrderValue = (Decimal)data.get('avg_order_value');
            // Date.valueOf throws on null, so guard missing dates
            String lastPurchase = (String)data.get('last_purchase_date');
            if (String.isNotBlank(lastPurchase)) {
                profile.lastPurchaseDate = Date.valueOf(lastPurchase);
            }
            profile.preferredChannel = (String)data.get('preferred_channel');

            // Engagement
            profile.emailEngagement = (Decimal)data.get('email_engagement_score');
            profile.websiteVisits = (Integer)data.get('website_visits_30d');
            profile.supportCases = (Integer)data.get('support_cases_90d');

            // Predictive
            profile.churnRisk = (Decimal)data.get('churn_risk_score');
            profile.lifetimeValue = (Decimal)data.get('predicted_lifetime_value');
            profile.nextBestAction = (String)data.get('next_best_action');
        }

        return profiles;
    }

    public class CustomerProfile {
        @InvocableVariable public String customerId;
        @InvocableVariable public String name;
        @InvocableVariable public String email;
        @InvocableVariable public String segment;
        @InvocableVariable public Decimal totalPurchases;
        @InvocableVariable public Decimal avgOrderValue;
        @InvocableVariable public Date lastPurchaseDate;
        @InvocableVariable public String preferredChannel;
        @InvocableVariable public Decimal emailEngagement;
        @InvocableVariable public Integer websiteVisits;
        @InvocableVariable public Integer supportCases;
        @InvocableVariable public Decimal churnRisk;
        @InvocableVariable public Decimal lifetimeValue;
        @InvocableVariable public String nextBestAction;
    }
}
```

### Pattern 3: Reverse ETL (Data Cloud → External Systems)

**Use Case**: Push enriched Data Cloud data back to external systems

**Configuration**:
```
Data Cloud → Data Actions → Create Data Action
- Target: External API endpoint
- Trigger: Segment membership change, insight calculated
- Payload: Customer profile fields
- Authentication: Named Credential
- Schedule: Real-time or batch
```

**Apex Outbound Sync**:
```apex
public class DataCloudReverseETL {
    @InvocableMethod(label='Sync Enriched Profile to External System')
    public static void syncToExternalSystem(List<String> customerIds) {
        // Logs are collected and inserted once after the loop: DML followed by a callout
        // in the same transaction fails with "uncommitted work pending"
        List<DataCloudSyncLog__c> logs = new List<DataCloudSyncLog__c>();

        for (String customerId : customerIds) {
            // Get enriched profile from Data Cloud (DataCloudService is an assumed helper class)
            Map<String, Object> profile = DataCloudService.getProfile(customerId);

            // Transform for external system
            Map<String, Object> payload = new Map<String, Object>{
                'customer_id' => customerId,
                'segment' => profile.get('segment'),
                'lifetime_value' => profile.get('ltv'),
                'churn_risk' => profile.get('churn_risk'),
                'next_best_product' => profile.get('next_best_product')
            };

            // Send to external system
            HttpRequest req = new HttpRequest();
            req.setEndpoint('callout:ExternalCRM/api/customers/' + customerId);
            req.setMethod('PUT');
            req.setHeader('Content-Type', 'application/json');
            req.setBody(JSON.serialize(payload));

            Http http = new Http();
            HttpResponse res = http.send(req);

            // Record result; any 2xx status counts as success
            Integer status = res.getStatusCode();
            logs.add(new DataCloudSyncLog__c(
                CustomerId__c = customerId,
                Direction__c = 'Outbound',
                Success__c = status >= 200 && status < 300,
                Timestamp__c = System.now()
            ));
        }

        if (!logs.isEmpty()) {
            insert logs;
        }
    }
}
```

## Calculated Insights and Segmentation

### Create Calculated Insights

**Use Case**: Define metrics and KPIs on unified data

```sql
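-- Example: Customer Lifetime Value
-- Illustrative syntax: in Data Cloud the SELECT is defined in the Calculated Insights
-- builder against DMO API names, so treat the CREATE wrapper and table names as placeholders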
CREATE CALCULATED INSIGHT customer_lifetime_value AS
SELECT
    customer_id,
    SUM(order_total) as total_revenue,
    COUNT(order_id) as total_orders,
    AVG(order_total) as avg_order_value,
    DATEDIFF(day, first_order_date, CURRENT_DATE) as customer_age_days,
    SUM(order_total) / NULLIF(DATEDIFF(day, first_order_date, CURRENT_DATE), 0) * 365 as annual_revenue,
    (SUM(order_total) / NULLIF(DATEDIFF(day, first_order_date, CURRENT_DATE), 0) * 365) * 5 as predicted_ltv_5yr
FROM unified_orders
GROUP BY customer_id, first_order_date
```
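The lifetime value formula above is a straight-line extrapolation: revenue per day of customer age, scaled to a year, then multiplied by five. A worked version of the arithmetic, written as a small hypothetical helper, makes the behavior at the edges easier to see.

```python
def predicted_ltv(total_revenue: float, customer_age_days: int, years: int = 5):
    """Annualize revenue over the customer's age and project it forward."""
    if customer_age_days <= 0:
        return None  # mirrors NULLIF(..., 0) in the SQL: no age, no projection
    annual_revenue = total_revenue / customer_age_days * 365
    return annual_revenue * years


# Two-year-old customer with 2,000 in revenue: about 1,000 per year, 5,000 over five years
print(predicted_ltv(2000.0, 730))

# One 100-unit order placed yesterday annualizes to 36,500 per year
print(predicted_ltv(100.0, 1, years=1))
```

The second call shows why very young accounts inflate the projection; a minimum customer age or a cap is a common guard when this value feeds a segment threshold.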

### Dynamic Segmentation

**Use Case**: Create segments that update in real time

```sql
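-- Segment: High-Value Active Customers
-- Illustrative syntax: segments are normally built in the Segment Builder; the filter logic is what matters here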
CREATE SEGMENT high_value_active_customers AS
SELECT customer_id
FROM customer_360_profile
WHERE
    predicted_ltv_5yr > 10000
    AND last_purchase_date >= CURRENT_DATE - INTERVAL '30' DAY
    AND email_engagement_score > 0.7
    AND churn_risk_score < 0.3
```
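The segment filter above is a conjunction of four thresholds, which makes it easy to check against sample profiles before building the real segment. A minimal sketch, assuming profiles are available as dictionaries keyed by the same column names:

```python
from datetime import date, timedelta


def in_high_value_active_segment(profile: dict, today: date) -> bool:
    """Apply the same four conditions as the segment definition."""
    return (
        profile["predicted_ltv_5yr"] > 10000
        and profile["last_purchase_date"] >= today - timedelta(days=30)
        and profile["email_engagement_score"] > 0.7
        and profile["churn_risk_score"] < 0.3
    )


today = date(2025, 6, 30)
active = {
    "predicted_ltv_5yr": 15000,
    "last_purchase_date": date(2025, 6, 15),
    "email_engagement_score": 0.8,
    "churn_risk_score": 0.1,
}
lapsed = dict(active, last_purchase_date=date(2025, 4, 1))

print(in_high_value_active_segment(active, today))  # True
print(in_high_value_active_segment(lapsed, today))  # False
```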

**Use in Salesforce**:
```apex
// Query segment membership
List<Contact> highValueContacts = [
    SELECT Id, Name, Email
    FROM Contact
    WHERE Id IN (
        SELECT ContactId__c
        FROM DataCloudSegmentMember__c
        WHERE SegmentName__c = 'high_value_active_customers'
    )
];
```

## Data Cloud Vector Database (GA March 2025)

### What is Vector Database?

Data Cloud Vector Database ingests, stores, and indexes unstructured data and supports semantic queries over it. It converts content into embeddings, which makes semantic search possible and lets unstructured content be combined with structured data in the Einstein platform.

**Supported Unstructured Data**:
- Emails and email threads
- Text documents (PDFs, Word, etc.)
- Social media content
- Web content and chat transcripts
- Call transcripts and recordings
- Knowledge base articles
- Customer reviews and feedback

### How Vector Database Works

```
┌──────────────────────────────────────────────────────────┐
│            Unstructured Data Sources                     │
│  Emails │ Documents │ Transcripts │ Social │ Knowledge  │
└─────────────────────┬────────────────────────────────────┘
                      │
    ┌─────────────────▼────────────────────────────────────┐
    │          Text Embedding Generation                   │
    │  Uses LLM to convert text → vector embeddings        │
    │  (768-dimensional numeric representations)           │
    └─────────────────┬────────────────────────────────────┘
                      │
    ┌─────────────────▼────────────────────────────────────┐
    │        Vector Database Storage & Indexing            │
    │  Stores embeddings with metadata                     │
    │  Creates high-performance vector index               │
    └─────────────────┬────────────────────────────────────┘
                      │
    ┌─────────────────▼────────────────────────────────────┐
    │           Semantic Search Queries                    │
    │  Natural language query → embedding → similarity     │
    │  Returns most semantically similar content           │
    └──────────────────────────────────────────────────────┘
```
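The last stage of the pipeline above, query embedding to similarity ranking, comes down to comparing vectors. The sketch below uses cosine similarity over tiny hand-written vectors to show the mechanics. The three-dimensional vectors and document IDs are made up for illustration; real embeddings have hundreds of dimensions and are searched through an approximate index rather than a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, index, k=2):
    """Rank every stored vector against the query and keep the k best."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "embeddings": similar topics point in similar directions
INDEX = {
    "return-policy": [0.9, 0.1, 0.0],
    "warranty-claims": [0.8, 0.3, 0.1],
    "shipping-rates": [0.1, 0.9, 0.2],
}
query = [1.0, 0.2, 0.0]  # stands in for the embedding of "How do I return a defective product?"

for doc_id, score in top_k(query, INDEX):
    print(doc_id, round(score, 3))
```

The return and warranty documents rank first even though no keywords are compared, which is the property semantic search relies on.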

### Semantic Search with Einstein Copilot Search

Semantic search understands the meaning and intent of queries, going beyond keyword matching:

**Example**:
- **Query**: "How do I return a defective product?"
- **Traditional Keyword Search**: Matches documents containing exact words "return", "defective", "product"
- **Semantic Search**: Finds documents about:
  - Return policies
  - Warranty claims
  - Product exchanges
  - Refund procedures
  - RMA processes
  - *Even if they use different wording*

### Implementing Vector Database

**Step 1: Configure Unstructured Data Sources**

```
Setup → Data Cloud → Data Sources → Create
- Source Type: Unstructured Data
- Options:
  ├─ Salesforce Knowledge
  ├─ EmailMessage object
  ├─ External documents (S3, Azure Blob, Google Drive)
  ├─ API-based ingestion
  └─ ContentDocument/File objects
```

**Step 2: Enable Vector Indexing**

```apex
// API to index unstructured content
public class VectorDatabaseService {
    // Triggers cannot make synchronous callouts, so they call this @future wrapper.
    // @future methods accept only primitive parameters, hence the JSON string.
    @future(callout=true)
    public static void indexDocumentAsync(String documentId, String content, String metadataJson) {
        indexDocument(documentId, content,
            (Map<String, Object>)JSON.deserializeUntyped(metadataJson));
    }

    public static void indexDocument(String documentId, String content, Map<String, Object> metadata) {
        // Create vector embedding request
        HttpRequest req = new HttpRequest();
        req.setEndpoint('callout:DataCloud/v1/vector/index');
        req.setMethod('POST');
        req.setHeader('Content-Type', 'application/json');

        Map<String, Object> payload = new Map<String, Object>{
            'documentId' => documentId,
            'content' => content,
            'metadata' => metadata,
            'source' => 'Salesforce',
            'timestamp' => System.now().getTime()
        };

        req.setBody(JSON.serialize(payload));

        Http http = new Http();
        HttpResponse res = http.send(req);

        if (res.getStatusCode() == 200) {
            System.debug('Document indexed: ' + documentId);
        } else {
            System.debug('Indexing failed: ' + res.getBody());
        }
    }
}

// Trigger to auto-index Knowledge articles
// Each article enqueues one @future call (limit: 50 per transaction); use a Queueable for bulk loads
trigger KnowledgeArticleTrigger on Knowledge__kav (after insert, after update) {
    for (Knowledge__kav article : Trigger.new) {
        if (article.PublishStatus == 'Online') {
            Map<String, Object> metadata = new Map<String, Object>{
                'articleNumber' => article.ArticleNumber,
                'title' => article.Title,
                'category' => article.Category__c,
                'language' => article.Language
            };

            VectorDatabaseService.indexDocumentAsync(
                article.Id,
                article.Body__c,
                JSON.serialize(metadata)
            );
        }
    }
}
```

**Step 3: Perform Semantic Search**

```apex
public class SemanticSearchService {
    @InvocableMethod(label='Semantic Search' description='Search unstructured data semantically')
    public static List<SearchResult> semanticSearch(List<SearchRequest> requests) {
        List<SearchResult> results = new List<SearchResult>();

        for (SearchRequest req : requests) {
            // Add one result per request, even on failure, so output stays aligned with input
            SearchResult result = new SearchResult();
            result.query = req.query;
            result.matches = new List<String>();
            results.add(result);

            // Filters arrive as a JSON object string and are parsed before sending
            Object filters = null;
            if (String.isNotBlank(req.filtersJson)) {
                filters = JSON.deserializeUntyped(req.filtersJson);
            }

            HttpRequest httpReq = new HttpRequest();
            httpReq.setEndpoint('callout:DataCloud/v1/vector/search');
            httpReq.setMethod('POST');
            httpReq.setHeader('Content-Type', 'application/json');

            Map<String, Object> payload = new Map<String, Object>{
                'query' => req.query,
                'topK' => req.maxResults,
                'filters' => filters,
                'includeMetadata' => true
            };

            httpReq.setBody(JSON.serialize(payload));

            Http http = new Http();
            HttpResponse httpRes = http.send(httpReq);

            if (httpRes.getStatusCode() == 200) {
                Map<String, Object> response = (Map<String, Object>)
                    JSON.deserializeUntyped(httpRes.getBody());

                List<Object> hits = (List<Object>)response.get('results');
                if (hits != null) {
                    for (Object hit : hits) {
                        Map<String, Object> doc = (Map<String, Object>)hit;
                        result.matches.add((String)doc.get('content'));
                    }
                }
            }
        }

        return results;
    }

    public class SearchRequest {
        @InvocableVariable(required=true)
        public String query;
        @InvocableVariable
        public Integer maxResults = 10;
        // Map is not a supported @InvocableVariable type, so filters are passed as a JSON object string
        @InvocableVariable
        public String filtersJson;
    }

    public class SearchResult {
        @InvocableVariable
        public String query;
        @InvocableVariable
        public List<String> matches;
    }
}
```

### Hybrid Search (Pilot 2025)

Hybrid search combines semantic search with traditional keyword search for improved accuracy:

**Benefits**:
- Understands semantic similarities and context (semantic search)
- Recognizes company-specific words and concepts (keyword search)
- Higher accuracy than either method alone
- Handles acronyms, product codes, and technical terms better

**Use Case Example**:
```
Service agent searches: "customer wants refund for SKU-12345"

Semantic Search finds:
- Return policy documents
- Refund procedures
- Customer satisfaction articles

Keyword Search finds:
- Specific SKU-12345 product documentation
- Previous cases mentioning SKU-12345
- Product-specific return windows

Hybrid Search combines both:
- Return procedures specifically for SKU-12345
- Previous refund cases for this product
- Product warranty terms
```

**Implementation**:
```apex
public class HybridSearchService {
    public static List<Map<String, Object>> hybridSearch(String query, Map<String, Object> filters) {
        HttpRequest req = new HttpRequest();
        req.setEndpoint('callout:DataCloud/v1/search/hybrid');
        req.setMethod('POST');
        req.setHeader('Content-Type', 'application/json');

        Map<String, Object> payload = new Map<String, Object>{
            'query' => query,
            'semantic' => new Map<String, Object>{
                'enabled' => true,
                'weight' => 0.6  // 60% semantic
            },
            'keyword' => new Map<String, Object>{
                'enabled' => true,
                'weight' => 0.4  // 40% keyword
            },
            'filters' => filters,
            'topK' => 20
        };

        req.setBody(JSON.serialize(payload));

        Http http = new Http();
        HttpResponse res = http.send(req);

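        List<Map<String, Object>> results = new List<Map<String, Object>>();
        if (res.getStatusCode() == 200) {
            Map<String, Object> response = (Map<String, Object>)JSON.deserializeUntyped(res.getBody());
            // deserializeUntyped yields List<Object>, so each hit is cast individually
            List<Object> hits = (List<Object>)response.get('results');
            if (hits != null) {
                for (Object hit : hits) {
                    results.add((Map<String, Object>)hit);
                }
            }
        }

        return results;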
    }
}
```
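The 60/40 weights in the payload above imply some form of score fusion. How Data Cloud combines the two result sets is not specified here, so the sketch below shows one common approach, a weighted sum of normalized scores, purely as an assumption. The document IDs and scores are invented for the SKU-12345 example.

```python
def hybrid_scores(semantic, keyword, semantic_weight=0.6, keyword_weight=0.4):
    """Combine per-document scores (each assumed normalized to 0..1) with fixed weights."""
    doc_ids = set(semantic) | set(keyword)
    combined = {
        doc_id: semantic_weight * semantic.get(doc_id, 0.0)
        + keyword_weight * keyword.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    # Highest score first; ties broken by document ID for stable output
    return sorted(combined.items(), key=lambda pair: (-pair[1], pair[0]))


semantic = {"return-policy": 0.9, "sku-12345-returns": 0.7}
keyword = {"sku-12345-returns": 1.0, "sku-12345-manual": 0.8}

for doc_id, score in hybrid_scores(semantic, keyword):
    print(doc_id, round(score, 2))
```

The product-specific return document wins because it scores well on both signals, while documents found by only one method are kept but ranked lower.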

### Multi-Language Semantic Search

Vector database supports cross-language semantic search:

**Example**:
- Service agent types case subject in French: "Problème de connexion"
- Semantic search finds similar cases in English:
  - "Login issues"
  - "Connection problems"
  - "Unable to access account"
- Returns relevant solutions regardless of language

**Configuration**:
```
Data Cloud → Vector Database → Settings
- Enable multi-language support
- Supported languages: 100+ languages via multilingual embeddings
- Automatic language detection
- Cross-language similarity matching
```

### Use Cases for Vector Database

**1. Customer Service Knowledge Retrieval**
```apex
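// Agentforce action: Find relevant knowledge articles
// (declare inside a class; Apex allows one @InvocableMethod per class)
@InvocableMethod(label='Find Relevant Articles')
public static List<String> findRelevantArticles(List<String> customerQueries) {
    // Build all requests first so the search service is invoked once
    List<SemanticSearchService.SearchRequest> requests =
        new List<SemanticSearchService.SearchRequest>();
    for (String query : customerQueries) {
        SemanticSearchService.SearchRequest request = new SemanticSearchService.SearchRequest();
        request.query = query;
        request.maxResults = 5;
        requests.add(request);
    }

    // Semantic search finds conceptually similar articles
    List<String> articles = new List<String>();
    for (SemanticSearchService.SearchResult result : SemanticSearchService.semanticSearch(requests)) {
        articles.addAll(result.matches);
    }

    return articles;
}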
```

**2. Case Similarity Detection**
```apex
// Find similar past cases to suggest solutions
public class CaseSimilarityService {
    public static List<Case> findSimilarCases(String caseDescription) {
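        // Semantic search in past cases
        SemanticSearchService.SearchRequest request = new SemanticSearchService.SearchRequest();
        request.query = caseDescription;
        request.maxResults = 10;
        List<SemanticSearchService.SearchResult> results = SemanticSearchService.semanticSearch(
            new List<SemanticSearchService.SearchRequest>{ request }
        );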

        // Extract case IDs from metadata
        Set<Id> caseIds = new Set<Id>();
        // ... extract IDs from results

        return [SELECT Id, Subject, Description, Status, Resolution__c
                FROM Case
                WHERE Id IN :caseIds
                AND Status = 'Closed'
                ORDER BY ClosedDate DESC];
    }
}
```

**3. Lead Scoring from Unstructured Data**
```apex
// Analyze email content and web behavior for lead scoring
public class LeadScoringService {
    public static Decimal scoreLeadFromContent(Id leadId) {
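        // Get all email interactions. EmailMessage.RelatedToId points at non-person records,
        // so emails for a lead are looked up through EmailMessageRelation.
        List<EmailMessage> emails = [SELECT Id, TextBody
                                      FROM EmailMessage
                                      WHERE Id IN (SELECT EmailMessageId
                                                   FROM EmailMessageRelation
                                                   WHERE RelationId = :leadId)];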

        Decimal score = 0;

        // Semantic search for buying intent keywords
        String allContent = '';
        for (EmailMessage email : emails) {
            allContent += email.TextBody + ' ';
        }

        // Check semantic similarity to high-intent phrases
        List<String> intentPhrases = new List<String>{
            'ready to purchase',
            'need pricing quote',
            'schedule demo',
            'implementation timeline'
        };

        for (String phrase : intentPhrases) {
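            // Semantic similarity score (calculateSemanticSimilarity is a placeholder to implement,
            // for example by calling the semantic search service)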
            Decimal similarity = calculateSemanticSimilarity(allContent, phrase);
            score += similarity * 10;
        }

        return score;
    }
}
```

## Data Cloud SQL (ANSI SQL Support)

Query Data Cloud using standard SQL:

```sql
-- Complex analytical query across multiple sources
-- Orders and support interactions are aggregated separately before joining, so a customer
-- with several support rows does not multiply order revenue
SELECT
    c.customer_id,
    c.name,
    c.segment,
    o.total_orders,
    o.revenue,
    s.avg_satisfaction,
    o.last_order_date
FROM
    unified_customer c
    INNER JOIN (
        SELECT
            customer_id,
            COUNT(DISTINCT order_id) as total_orders,
            SUM(order_total) as revenue,
            MAX(order_date) as last_order_date
        FROM unified_orders
        WHERE order_date >= CURRENT_DATE - INTERVAL '90' DAY
        GROUP BY customer_id
        HAVING COUNT(DISTINCT order_id) >= 3
    ) o ON c.customer_id = o.customer_id
    LEFT JOIN (
        SELECT customer_id, AVG(satisfaction_score) as avg_satisfaction
        FROM support_interactions
        GROUP BY customer_id
    ) s ON c.customer_id = s.customer_id
ORDER BY
    o.revenue DESC
LIMIT 100
```
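When a customer has several orders and several support interactions, joining both tables before aggregating repeats each order once per interaction, which inflates `SUM`. The sketch below reproduces the effect with Python's built-in `sqlite3` module and shows the pre-aggregated form that avoids it. The table names and rows are stand-ins for illustration, not Data Cloud objects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer_id TEXT, order_id TEXT, order_total REAL);
    CREATE TABLE support (customer_id TEXT, satisfaction_score REAL);
    INSERT INTO orders VALUES ('c1', 'o1', 100), ('c1', 'o2', 50);
    INSERT INTO support VALUES ('c1', 4), ('c1', 5), ('c1', 3);
""")

# Joining both one-to-many tables first repeats every order once per support row
naive = conn.execute("""
    SELECT SUM(o.order_total)
    FROM orders o
    LEFT JOIN support s ON o.customer_id = s.customer_id
""").fetchone()[0]

# Aggregating each table on its own before joining keeps the totals correct
safe = conn.execute("""
    SELECT o.revenue, s.avg_satisfaction
    FROM (SELECT customer_id, SUM(order_total) AS revenue
          FROM orders GROUP BY customer_id) o
    LEFT JOIN (SELECT customer_id, AVG(satisfaction_score) AS avg_satisfaction
               FROM support GROUP BY customer_id) s
      ON o.customer_id = s.customer_id
""").fetchone()

print(naive)  # 450.0: each order counted three times
print(safe)   # (150.0, 4.0)
```

`COUNT(DISTINCT ...)` hides the duplication for counts, but sums need the pre-aggregation.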

## Authentication Patterns

### OAuth 2.0 JWT Bearer Flow (Server-to-Server)

```python
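# External system → Data Cloud authentication
import jwt
import time
import requests

def get_data_cloud_access_token(client_id, private_key, username, instance_url,
                                login_url="https://login.salesforce.com"):
    """Get a Salesforce access token via the OAuth 2.0 JWT bearer flow"""

    # Create JWT. 'aud' is the authorization server (login.salesforce.com for production,
    # test.salesforce.com for sandboxes), not the org's instance URL.
    payload = {
        'iss': client_id,
        'sub': username,
        'aud': login_url,
        'exp': int(time.time()) + 180  # 3 minutes
    }

    encoded_jwt = jwt.encode(payload, private_key, algorithm='RS256')

    # Exchange JWT for access token
    token_url = f"{instance_url}/services/oauth2/token"
    response = requests.post(token_url, data={
        'grant_type': 'urn:ietf:params:oauth:grant-type:jwt-bearer',
        'assertion': encoded_jwt
    })
    response.raise_for_status()  # surface OAuth errors instead of a KeyError below

    # Data Cloud APIs may additionally require exchanging this token for a
    # tenant-specific Data Cloud token; check the Data Cloud API docs for your org.
    return response.json()['access_token']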
```

## Best Practices

### Performance
- **Use Zero Copy** for large datasets (>10M records)
- **Batch imports** outside business hours
- **Index frequently queried fields** in Data Cloud
- **Limit real-time triggers** to critical events
- **Cache unified profiles** when possible
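
Caching unified profiles, the last item above, can be as simple as a time-to-live cache in front of the profile lookup. A minimal sketch for an external client follows. `ProfileCache` and its `fetch` callback are hypothetical, and the right TTL depends on how fresh activation decisions need to be.

```python
import time


class ProfileCache:
    """Time-to-live cache wrapped around a profile lookup function."""

    def __init__(self, fetch, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch        # function: customer_id -> profile dict
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested without sleeping
        self._entries = {}         # customer_id -> (stored_at, profile)

    def get(self, customer_id):
        now = self._clock()
        entry = self._entries.get(customer_id)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]        # still fresh: skip the API call
        profile = self._fetch(customer_id)
        self._entries[customer_id] = (now, profile)
        return profile


# Usage with a stand-in fetch function that records each lookup
lookups = []

def fake_fetch(customer_id):
    lookups.append(customer_id)
    return {"customer_id": customer_id, "segment": "gold"}

cache = ProfileCache(fake_fetch, ttl_seconds=60)
cache.get("c1")
cache.get("c1")
print(lookups)  # ['c1']: the second call was served from the cache
```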

### Security
- **Field-level security** applies to Data Cloud queries from Salesforce
- **Data masking** for PII in non-production environments
- **Encryption at rest** and in transit (TLS 1.2+)
- **Audit logging** for all data access
- **Role-based access control** (RBAC) for Data Cloud users

### Data Quality
- **Data validation** before ingestion
- **Deduplication rules** at source and in Data Cloud
- **Data lineage tracking** (know source of each field)
- **Quality scores** for unified profiles
- **Regular data audits** and cleansing
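
The first two items above, validation before ingestion and deduplication at source, can be prototyped with a few lines run over an export before it is uploaded. The sketch below is one possible shape: the required fields, the email pattern, and the keep-last rule are assumptions to adapt, not Data Cloud requirements.

```python
import re

# Deliberately loose pattern: one "@", no whitespace, a dot in the domain
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_record(record: dict, required=("customer_id", "email")) -> list:
    """Return a list of problems found in one record (empty if it is clean)."""
    problems = [
        f"missing {field}"
        for field in required
        if not str(record.get(field) or "").strip()
    ]
    email = str(record.get("email") or "").strip()
    if email and not EMAIL_RE.match(email):
        problems.append("invalid email")
    return problems


def deduplicate(records: list, key: str = "customer_id") -> list:
    """Keep the last record seen for each key, preserving first-seen order."""
    latest = {}
    for record in records:
        latest[record[key]] = record
    return list(latest.values())


rows = [
    {"customer_id": "1", "email": "ada@example.com"},
    {"customer_id": "2", "email": "not-an-email"},
    {"customer_id": "1", "email": "ada@corp.example"},
]
clean = [r for r in deduplicate(rows) if not validate_record(r)]
print(clean)  # [{'customer_id': '1', 'email': 'ada@corp.example'}]
```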

## Resources

- **Data Cloud Documentation**: https://developer.salesforce.com/docs/data/data-cloud-int/guide
- **Zero Copy Partner Network**: https://www.salesforce.com/data/zero-copy/
- **Data Cloud Pricing**: Part of Customer 360 platform, usage-based pricing
- **Trailhead**: "Data Cloud Basics" and "Data Cloud for Developers"