Initial commit

2025-11-29 18:30:23 +08:00
commit d765cdd7eb
13 changed files with 1286 additions and 0 deletions
--- a/commands/waterfall-enrichment.md
+++ b/commands/waterfall-enrichment.md
@@ -0,0 +1,335 @@
+---
+name: waterfall-enrichment
+description: Execute multi-provider enrichment waterfalls with credit-aware routing, validation, and export options.
+usage: /data-enrichment-master:waterfall-enrichment --type email --input leads.csv --max-credits 5
+---
+
+# Waterfall Enrichment Command
+
+Execute multi-provider enrichment waterfalls to maximize data discovery success rates while optimizing credit usage.
+
+## Command Syntax
+
+```bash
+/data-enrichment-master:waterfall-enrichment --type <email|phone|company|full> --input <data> --max-credits <limit>
+```
+
+## Parameters
+
+- `--type`: Type of waterfall (email, phone, company, full)
+- `--input`: Input data: a "name, company" pair, an email address, a domain, or a path to a CSV file
+- `--max-credits`: Maximum credits to spend per record (default: 10)
+- `--providers`: Specific provider sequence (optional, uses optimized defaults)
+- `--validate`: Validate discovered data (default: true)
+- `--cache`: Use cached results (default: true; TTL depends on data type, 30 days for email)
+- `--parallel`: Process multiple records in parallel (default: true)
+- `--output`: Output format (json|csv|salesforce|hubspot)
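As an illustration of how the flags above could be interpreted, here is a minimal Python sketch using `argparse`. The parser itself, the `str_to_bool` helper, and the `json` default for `--output` are assumptions made for this example; the command file does not say how arguments are actually parsed.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Parse 'true'/'false' style flag values such as `--validate true`."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected true or false, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="waterfall")
    parser.add_argument("--type", required=True,
                        choices=["email", "phone", "company", "full"])
    parser.add_argument("--input", required=True)
    parser.add_argument("--max-credits", type=float, default=10)
    # Comma-separated provider list; None means "use the default sequence".
    parser.add_argument("--providers",
                        type=lambda s: [p.strip() for p in s.split(",") if p.strip()],
                        default=None)
    parser.add_argument("--validate", type=str_to_bool, default=True)
    parser.add_argument("--cache", type=str_to_bool, default=True)
    parser.add_argument("--parallel", type=str_to_bool, default=True)
    # Defaulting to JSON is an assumption; the parameter list states no default.
    parser.add_argument("--output", default="json",
                        choices=["json", "csv", "salesforce", "hubspot"])
    return parser


args = build_parser().parse_args(
    ["--type", "email", "--input", "prospects.csv", "--max-credits", "5",
     "--providers", "clearbit,apollo,hunter", "--validate", "true"]
)
```

Flags that are omitted fall back to the documented defaults, so `args.cache` and `args.parallel` are both `True` here.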
+
+## Waterfall Sequences
+
+### Email Discovery Waterfall
+```yaml
+Default Sequence:
+  1. Cache Check (0 credits)
+  2. Apollo.io (1-2 credits)
+  3. Hunter (1-2 credits)
+  4. RocketReach (1-2 credits)
+  5. People Data Labs (1-2 credits)
+  6. ContactOut (1-2 credits)
+  7. Findymail (1-2 credits)
+  8. BetterContact (2-5 credits)
+  9. AI Web Research (2-5 credits)
+  
+Validation:
+  - ZeroBounce (0.5 credits)
+  - NeverBounce backup (0.5 credits)
+```
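The email sequence above amounts to a loop that tries providers in order and stops at the first hit or when the per-record budget runs out. A minimal sketch, assuming each provider is represented as a hypothetical `(name, credit_cost, lookup)` tuple and that credits are charged per attempt (some providers may bill only on success):

```python
def run_waterfall(record, providers, max_credits):
    """Try providers in order; stop at the first hit or when the budget is spent.

    `providers` is a list of (name, credit_cost, lookup) tuples, where
    `lookup(record)` returns the discovered value or None.
    """
    credits_used = 0.0
    tried = []
    for name, cost, lookup in providers:
        if credits_used + cost > max_credits:
            continue  # cannot afford this provider; a cheaper one may still fit
        tried.append(name)
        credits_used += cost  # assumption: credits are charged per attempt
        value = lookup(record)
        if value is not None:
            return {"value": value, "providers_used": tried,
                    "credits_used": credits_used}
    return {"value": None, "providers_used": tried, "credits_used": credits_used}


# Stub lookups stand in for real provider API calls.
providers = [
    ("cache", 0, lambda r: None),
    ("apollo", 1, lambda r: None),
    ("hunter", 1, lambda r: "john.smith@acme.com"),
    ("rocketreach", 1, lambda r: "unused@example.com"),
]
record = {"name": "John Smith", "company": "Acme Corp"}
result = run_waterfall(record, providers, max_credits=5)
```

Because the loop returns on the first non-empty value, later and typically more expensive providers are never called once an earlier one succeeds.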
+
+### Phone Discovery Waterfall
+```yaml
+Default Sequence:
+  1. Cache Check (0 credits)
+  2. Apollo.io (1-2 credits)
+  3. RocketReach (1-2 credits)
+  4. LeadMagic (1-2 credits)
+  5. SignalHire (1-2 credits)
+  6. BetterContact Phone (2-5 credits)
+  7. People Data Labs (1-2 credits)
+  
+Validation:
+  - ClearoutPhone (0.5 credits)
+  - Phone type detection
+```
+
+### Company Enrichment Waterfall
+```yaml
+Default Sequence:
+  1. Clearbit (1-2 credits)
+  2. Ocean.io (2-3 credits)
+  3. ZoomInfo (2-3 credits) [if enterprise]
+  4. Crunchbase (1-2 credits) [if funded]
+  5. BuiltWith (1-2 credits) [technographics]
+  6. HG Insights (2-3 credits) [tech spend]
+  7. Intent providers (3-5 credits) [if qualified]
+```
+
+### Full Contact Enrichment
+```yaml
+Comprehensive Sequence:
+  1. Email discovery waterfall
+  2. Phone discovery waterfall
+  3. Social profile discovery
+  4. Company enrichment
+  5. Technographics
+  6. Intent signals
+  7. Validation & scoring
+```
+
+## Examples
+
+### Basic Email Discovery
+```bash
+/data-enrichment-master:waterfall-enrichment \
+  --type email \
+  --input "John Smith, Acme Corp"
+```
+
+### Bulk Email Enrichment with Validation
+```bash
+/data-enrichment-master:waterfall-enrichment \
+  --type email \
+  --input "prospects.csv" \
+  --validate true \
+  --max-credits 5
+```
+
+### Custom Provider Sequence
+```bash
+/data-enrichment-master:waterfall-enrichment \
+  --type email \
+  --input "jane.doe@example.com" \
+  --providers "clearbit,apollo,hunter" \
+  --validate true
+```
+
+### Enterprise Full Enrichment
+```bash
+/data-enrichment-master:waterfall-enrichment \
+  --type full \
+  --input "target_accounts.csv" \
+  --max-credits 20 \
+  --output salesforce
+```
+
+## Provider Selection Logic
+
+```python
+def select_providers(input_type, data_available, target_quality):
+    providers = []
+    
+    # Email discovery logic
+    if input_type == "email":
+        if has_linkedin_url(data_available):
+            providers = ["contactout", "rocketreach", "apollo"]
+        elif has_full_name_and_company(data_available):
+            providers = ["apollo", "hunter", "rocketreach"]
+        elif has_domain_only(data_available):
+            providers = ["hunter", "apollo", "clearbit"]
+        else:
+            providers = ["people_data_labs", "bettercontact", "ai_research"]
+    
+    # Phone discovery logic
+    elif input_type == "phone":
+        if has_email(data_available):
+            providers = ["apollo", "rocketreach", "leadmagic"]
+        else:
+            providers = ["bettercontact_phone", "signalhire", "lusha"]
+    
+    # Quality-based filtering
+    if target_quality == "high":
+        providers = filter_high_accuracy_providers(providers)
+    
+    return providers
+```
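`select_providers` relies on several predicates that the file does not define. One possible shape for them, assuming `data_available` is a plain dict with the hypothetical keys `linkedin_url`, `first_name`, `last_name`, `company`, `domain`, and `email`:

```python
def has_linkedin_url(data: dict) -> bool:
    # Treat only personal profile URLs as usable LinkedIn input.
    return "linkedin.com/in/" in (data.get("linkedin_url") or "")


def has_full_name_and_company(data: dict) -> bool:
    return all(data.get(key) for key in ("first_name", "last_name", "company"))


def has_domain_only(data: dict) -> bool:
    # A domain is present but there is not enough to identify a person.
    return bool(data.get("domain")) and not has_full_name_and_company(data)


def has_email(data: dict) -> bool:
    # Cheap presence check only; real validation belongs to the validation step.
    return "@" in (data.get("email") or "")


lead = {"first_name": "John", "last_name": "Smith", "company": "Acme Corp"}
domain_lead = {"domain": "acme.com"}
```

The key names are illustrative; any input schema works as long as the predicates stay cheap and free of side effects.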
+
+## Credit Optimization
+
+### Smart Routing Algorithm
+```python
+def optimize_provider_sequence(providers, max_credits, historical_success):
+    # Score each provider by success rate, credit cost, and data quality
+    scored_providers = []
+    
+    for provider in providers:
+        score = calculate_efficiency_score(
+            success_rate=historical_success[provider],
+            credit_cost=PROVIDER_COSTS[provider],
+            data_quality=PROVIDER_QUALITY[provider]
+        )
+        scored_providers.append((provider, score))
+    
+    # Sort by efficiency score
+    scored_providers.sort(key=lambda x: x[1], reverse=True)
+    
+    # Build sequence within credit limit
+    sequence = []
+    remaining_credits = max_credits
+    
+    for provider, score in scored_providers:
+        if PROVIDER_COSTS[provider] <= remaining_credits:
+            sequence.append(provider)
+            remaining_credits -= PROVIDER_COSTS[provider]
+    
+    return sequence
+```
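The routing function above leaves `calculate_efficiency_score` undefined. A minimal sketch, assuming a hypothetical formula of quality-weighted success rate per credit; the success rates and average credit costs are the per-provider figures from the Success Metrics section of this file, and the quality weights are neutral placeholders:

```python
PROVIDER_COSTS = {"apollo": 1.5, "hunter": 1.2, "zoominfo": 2.5}
PROVIDER_QUALITY = {"apollo": 1.0, "hunter": 1.0, "zoominfo": 1.0}  # placeholder weights


def calculate_efficiency_score(success_rate, credit_cost, data_quality):
    """Expected quality-weighted hits per credit (hypothetical formula)."""
    # Free steps such as a cache check would divide by zero, so floor the cost.
    return success_rate * data_quality / max(credit_cost, 0.01)


def optimize_provider_sequence(providers, max_credits, historical_success):
    # Rank providers from most to least efficient.
    ranked = sorted(
        providers,
        key=lambda p: calculate_efficiency_score(
            historical_success[p], PROVIDER_COSTS[p], PROVIDER_QUALITY[p]),
        reverse=True,
    )
    # Greedily keep every provider that still fits in the remaining budget.
    sequence, remaining = [], max_credits
    for provider in ranked:
        if PROVIDER_COSTS[provider] <= remaining:
            sequence.append(provider)
            remaining -= PROVIDER_COSTS[provider]
    return sequence


history = {"apollo": 0.75, "hunter": 0.70, "zoominfo": 0.90}
sequence = optimize_provider_sequence(list(history), 3, history)
```

With a budget of 3 credits this yields `["hunter", "apollo"]`: ZoomInfo has the highest raw success rate but the lowest score per credit, and it no longer fits once the two cheaper providers are reserved. The budget is a worst case, since it assumes every provider in the sequence is actually called.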
+
+## Success Metrics
+
+### Tracking Performance
+```yaml
+Metrics:
+  success_rate:
+    email_found: 85%
+    phone_found: 65%
+    company_enriched: 95%
+  
+  average_credits:
+    email: 2.3 credits
+    phone: 3.1 credits
+    company: 4.5 credits
+    full_contact: 8.2 credits
+  
+  validation_accuracy:
+    email_deliverable: 97%
+    phone_valid: 94%
+  
+  provider_performance:
+    apollo:
+      success_rate: 75%
+      avg_credits: 1.5
+    hunter:
+      success_rate: 70%
+      avg_credits: 1.2
+    zoominfo:
+      success_rate: 90%
+      avg_credits: 2.5
+```
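Per-provider figures such as `success_rate` and `avg_credits` can be derived from a log of individual attempts. A minimal sketch, assuming a hypothetical log format in which each attempt records `provider`, `found`, and `credits`:

```python
from collections import defaultdict


def summarize_attempts(attempts):
    """Aggregate per-provider success rate and average credits per call."""
    totals = defaultdict(lambda: {"calls": 0, "hits": 0, "credits": 0.0})
    for attempt in attempts:
        stats = totals[attempt["provider"]]
        stats["calls"] += 1
        stats["hits"] += 1 if attempt["found"] else 0
        stats["credits"] += attempt["credits"]
    return {
        provider: {
            "success_rate": stats["hits"] / stats["calls"],
            "avg_credits": stats["credits"] / stats["calls"],
        }
        for provider, stats in totals.items()
    }


# Made-up log entries, for illustration only.
log = [
    {"provider": "apollo", "found": True, "credits": 1},
    {"provider": "apollo", "found": False, "credits": 2},
    {"provider": "hunter", "found": True, "credits": 1},
]
summary = summarize_attempts(log)
```

The resulting dict has the same shape as the `historical_success` input used by the routing step, so measured rates can replace static estimates.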
+
+## Error Handling
+
+### Provider Failures
+```python
+def handle_provider_failure(provider, error, context):
+    # Log failure
+    log_provider_error(provider, error)
+    
+    # Determine action
+    if is_rate_limit(error):
+        # Exponential backoff, then retry the same provider
+        wait_time = calculate_backoff(provider)
+        return schedule_retry(provider, context, wait_time)
+
+    elif is_auth_error(error):
+        # Alert and skip provider
+        alert_admin(f"Auth failed for {provider}")
+        return next_provider()
+
+    elif is_data_not_found(error):
+        # Continue to next provider
+        return next_provider()
+
+    else:
+        # Generic error: retry once, then skip
+        if not has_retried(provider, context):
+            return retry_provider(provider, context)
+        else:
+            return next_provider()
+```
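The rate-limit branch calls `calculate_backoff` without defining it. One common choice is exponential backoff with full jitter, sketched below. Note that this version takes a retry `attempt` counter rather than the `provider` argument used above, and the base and cap values are assumptions, not values taken from this command.

```python
import random


def calculate_backoff(attempt, base_seconds=1.0, cap_seconds=60.0, rng=random):
    """Return a wait time in seconds for the given zero-based retry attempt.

    The upper bound doubles with every attempt up to `cap_seconds`; the
    actual wait is drawn uniformly from [0, bound] so that many clients
    hitting the same rate limit do not all retry at the same moment.
    """
    ceiling = min(cap_seconds, base_seconds * (2 ** attempt))
    return rng.uniform(0, ceiling)


# A seeded generator makes the schedule reproducible, e.g. in tests.
waits = [calculate_backoff(attempt, rng=random.Random(0)) for attempt in range(8)]
```

Passing the random source in as a parameter keeps the function deterministic under test while still defaulting to the module-level generator in normal use.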
+
+## Output Formats
+
+### JSON Output
+```json
+{
+  "input": {
+    "name": "John Smith",
+    "company": "Acme Corp"
+  },
+  "results": {
+    "email": "john.smith@acme.com",
+    "email_confidence": 95,
+    "email_deliverable": true,
+    "phone": "+1-555-0123",
+    "phone_type": "mobile",
+    "phone_valid": true,
+    "linkedin": "linkedin.com/in/johnsmith",
+    "providers_used": ["apollo", "zerobounce"],
+    "credits_used": 2.5
+  },
+  "metadata": {
+    "enriched_at": "2024-01-20T10:30:00Z",
+    "cache_hit": false,
+    "processing_time": 1.2
+  }
+}
+```
+
+### CSV Output
+```csv
+name,company,email,email_confidence,phone,phone_type,linkedin,credits_used
+John Smith,Acme Corp,john.smith@acme.com,95,+1-555-0123,mobile,linkedin.com/in/johnsmith,2.5
+```
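The CSV row above is a flattened subset of the JSON output. A minimal sketch of that flattening with the standard `csv` module; the `to_csv` helper and the `CSV_COLUMNS` constant are hypothetical names, and the column order is copied from the CSV example:

```python
import csv
import io

CSV_COLUMNS = ["name", "company", "email", "email_confidence",
               "phone", "phone_type", "linkedin", "credits_used"]


def to_csv(records):
    """Flatten enrichment records (JSON output shape) into CSV text."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops fields such as providers_used that have
    # no CSV column; keys missing from a record are written as empty cells.
    writer = csv.DictWriter(buffer, fieldnames=CSV_COLUMNS,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow({**record["input"], **record["results"]})
    return buffer.getvalue()


record = {
    "input": {"name": "John Smith", "company": "Acme Corp"},
    "results": {
        "email": "john.smith@acme.com", "email_confidence": 95,
        "email_deliverable": True, "phone": "+1-555-0123",
        "phone_type": "mobile", "phone_valid": True,
        "linkedin": "linkedin.com/in/johnsmith",
        "providers_used": ["apollo", "zerobounce"], "credits_used": 2.5,
    },
}
csv_text = to_csv([record])
```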
+
+### Salesforce Format
+```json
+{
+  "Lead": {
+    "FirstName": "John",
+    "LastName": "Smith",
+    "Company": "Acme Corp",
+    "Email": "john.smith@acme.com",
+    "Phone": "+1-555-0123",
+    "LinkedIn__c": "linkedin.com/in/johnsmith",
+    "Enrichment_Score__c": 95,
+    "Last_Enriched__c": "2024-01-20T10:30:00Z"
+  }
+}
+```
+
+## Caching Strategy
+
+### Cache Management
+```python
+CACHE_CONFIG = {
+    "email": {
+        "ttl_days": 30,
+        "refresh_if_bounced": True
+    },
+    "phone": {
+        "ttl_days": 60,
+        "refresh_if_invalid": True
+    },
+    "company": {
+        "ttl_days": 90,
+        "refresh_on_trigger": ["funding", "acquisition", "ipo"]
+    },
+    "intent": {
+        "ttl_days": 7,
+        "always_refresh": True
+    }
+}
+```
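A TTL check against this configuration might look like the sketch below. It models only `ttl_days`; the refresh flags (`refresh_if_bounced`, `refresh_on_trigger`, `always_refresh`) are left out, and the `is_cache_fresh` name and signature are assumptions.

```python
from datetime import datetime, timedelta, timezone

# TTLs copied from CACHE_CONFIG above.
CACHE_TTL_DAYS = {"email": 30, "phone": 60, "company": 90, "intent": 7}


def is_cache_fresh(data_type, cached_at, now=None):
    """Return True if an entry cached at `cached_at` is still within its TTL."""
    now = now or datetime.now(timezone.utc)
    return now - cached_at < timedelta(days=CACHE_TTL_DAYS[data_type])


cached_at = datetime(2024, 1, 20, 10, 30, tzinfo=timezone.utc)
now = datetime(2024, 2, 20, tzinfo=timezone.utc)  # a little over 30 days later
email_fresh = is_cache_fresh("email", cached_at, now)
phone_fresh = is_cache_fresh("phone", cached_at, now)
```

Here the email entry has expired while the phone entry is still usable, which is why a single cache TTL per record is not enough when fields age at different rates.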
+
+## Best Practices
+
+1. **Start with cached data** - Always check cache first
+2. **Set appropriate credit limits** - Balance cost vs. data quality
+3. **Use parallel processing** - For bulk enrichments
+4. **Validate critical data** - Especially emails before outreach
+5. **Monitor provider performance** - Adjust sequences based on success rates
+6. **Handle failures gracefully** - Automatic fallback to next provider
+7. **Track ROI** - Measure enrichment value vs. credit cost
+
+---
+
+*Execution model: claude-haiku-4-5 for provider routing, parallel processing for bulk operations*