---
name: waterfall-enrichment
description: Execute multi-provider enrichment waterfalls with credit-aware routing, validation, and export options.
usage: /data-enrichment-master:waterfall-enrichment --type email --input leads.csv --max-credits 5
---

# Waterfall Enrichment Command

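Execute multi-provider enrichment waterfalls: query providers in sequence, moving to the next one only when the previous one returns nothing, to maximize data discovery success rates while capping credit spend per record.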

## Command Syntax

```bash
/data-enrichment:waterfall --type <email|phone|company|full> --input <data> --max-credits <limit>
```

## Parameters

- `--type`: Waterfall type: `email`, `phone`, `company`, or `full`
- `--input`: Input data: a "Name, Company" string, an email address, a domain, or a path to a CSV file
- `--max-credits`: Maximum credits to spend per record (default: 10)
- `--providers`: Comma-separated provider sequence (optional; defaults to the optimized sequences below)
- `--validate`: Validate discovered data (default: true)
- `--cache`: Use cached results (default: true; TTL varies by data type, from 7 days for intent to 90 days for company data)
- `--parallel`: Process multiple records in parallel (default: true)
- `--output`: Output format: `json`, `csv`, `salesforce`, or `hubspot`
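As an illustration of how these flags and their defaults fit together, here is a minimal sketch using Python's `argparse`. It is not the command's actual parser: the `json` default for `--output` is an assumption (the list above gives none), and `str_to_bool` / `comma_list` are hypothetical helpers.

```python
import argparse


def str_to_bool(value: str) -> bool:
    # Accept the true/false spellings used in the examples
    lowered = value.lower()
    if lowered in ("true", "yes", "1"):
        return True
    if lowered in ("false", "no", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected true/false, got {value!r}")


def comma_list(value: str) -> list[str]:
    # "--providers clearbit,apollo,hunter" -> ["clearbit", "apollo", "hunter"]
    return [item.strip() for item in value.split(",") if item.strip()]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="waterfall-enrichment")
    parser.add_argument("--type", required=True, choices=["email", "phone", "company", "full"])
    parser.add_argument("--input", required=True, help="Name+company, email, domain, or CSV path")
    parser.add_argument("--max-credits", type=float, default=10.0, help="Per-record credit cap")
    parser.add_argument("--providers", type=comma_list, default=None)  # None = optimized defaults
    parser.add_argument("--validate", type=str_to_bool, default=True)
    parser.add_argument("--cache", type=str_to_bool, default=True)
    parser.add_argument("--parallel", type=str_to_bool, default=True)
    # Assumed default; the parameter list does not specify one
    parser.add_argument("--output", choices=["json", "csv", "salesforce", "hubspot"], default="json")
    return parser
```

For example, `build_parser().parse_args(["--type", "email", "--input", "leads.csv", "--max-credits", "5"])` yields `max_credits == 5.0` with validation and caching left on.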

## Waterfall Sequences

### Email Discovery Waterfall
```yaml
Default Sequence:
  1. Cache Check (0 credits)
  2. Apollo.io (1-2 credits)
  3. Hunter (1-2 credits)
  4. RocketReach (1-2 credits)
  5. People Data Labs (1-2 credits)
  6. ContactOut (1-2 credits)
  7. Findymail (1-2 credits)
  8. BetterContact (2-5 credits)
  9. AI Web Research (2-5 credits)

Validation:
  - ZeroBounce (0.5 credits)
  - NeverBounce backup (0.5 credits)
```
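The core loop behind a sequence like this can be sketched as follows: check the cache for free, then call providers in order, stop at the first hit, and skip any provider whose cost would push the record past `--max-credits`. This is a minimal sketch under stated assumptions: the `Provider` type and `lookup` callables are hypothetical stand-ins for real provider clients, every lookup is assumed to be billed whether or not it finds data (real billing models vary), and the validation step is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical interface: a lookup takes a record and returns an email or None
Lookup = Callable[[dict], Optional[str]]


@dataclass
class Provider:
    name: str
    cost: float  # credits per lookup (e.g. the upper end of the range above)
    lookup: Lookup


@dataclass
class WaterfallResult:
    value: Optional[str] = None
    providers_used: list = field(default_factory=list)
    credits_used: float = 0.0


def run_waterfall(record: dict, providers: list[Provider], max_credits: float, cache: dict) -> WaterfallResult:
    result = WaterfallResult()
    key = (record.get("name"), record.get("company"))

    # Step 1: cache check costs 0 credits
    if key in cache:
        result.value = cache[key]
        return result

    for provider in providers:
        # Skip a provider that would exceed this record's credit cap
        if result.credits_used + provider.cost > max_credits:
            continue
        result.credits_used += provider.cost
        result.providers_used.append(provider.name)
        value = provider.lookup(record)
        if value:
            # First hit wins: stop the waterfall and cache the result
            result.value = value
            cache[key] = value
            break
    return result
```

Using `continue` rather than `break` on a budget miss lets a cheaper provider later in the sequence still run.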

### Phone Discovery Waterfall
```yaml
Default Sequence:
  1. Cache Check (0 credits)
  2. Apollo.io (1-2 credits)
  3. RocketReach (1-2 credits)
  4. LeadMagic (1-2 credits)
  5. SignalHire (1-2 credits)
  6. BetterContact Phone (2-5 credits)
  7. People Data Labs (1-2 credits)

Validation:
  - ClearoutPhone (0.5 credits)
  - Phone type detection
```

### Company Enrichment Waterfall
```yaml
Default Sequence:
  1. Clearbit (1-2 credits)
  2. Ocean.io (2-3 credits)
  3. ZoomInfo (2-3 credits) [if enterprise]
  4. Crunchbase (1-2 credits) [if funded]
  5. BuiltWith (1-2 credits) [technographics]
  6. HG Insights (2-3 credits) [tech spend]
  7. Intent providers (3-5 credits) [if qualified]
```
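The bracketed conditions above can be turned into a concrete step list per account. A minimal sketch, assuming hypothetical `is_enterprise` / `is_funded` / `is_qualified` flags computed elsewhere; it treats `[technographics]` and `[tech spend]` as labels rather than conditions, and reads the company sequence as cumulative (each step adds different fields) rather than stop-at-first-hit, which is an interpretation.

```python
from dataclasses import dataclass


@dataclass
class CompanyContext:
    # Hypothetical flags; how each is decided (headcount, funding data, scoring) is up to you
    is_enterprise: bool = False
    is_funded: bool = False
    is_qualified: bool = False


def company_sequence(ctx: CompanyContext) -> list[str]:
    steps = ["clearbit", "ocean_io"]
    if ctx.is_enterprise:
        steps.append("zoominfo")
    if ctx.is_funded:
        steps.append("crunchbase")
    # Technographics and tech-spend steps are unconditional in the default sequence
    steps += ["builtwith", "hg_insights"]
    if ctx.is_qualified:
        steps.append("intent")
    return steps
```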

### Full Contact Enrichment
```yaml
Comprehensive Sequence:
  1. Email discovery waterfall
  2. Phone discovery waterfall
  3. Social profile discovery
  4. Company enrichment
  5. Technographics
  6. Intent signals
  7. Validation & scoring
```
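Because `--max-credits` is a per-record cap, the full sequence has to share one budget across its stages. A minimal sketch, assuming each stage is a hypothetical callable that receives the record plus the remaining credits, stays within that amount, and returns the fields it found and the credits it spent:

```python
from typing import Callable

# Hypothetical stage signature: (record, remaining_credits) -> (fields_found, credits_spent)
Stage = Callable[[dict, float], tuple[dict, float]]


def run_full_enrichment(record: dict, stages: list[tuple[str, Stage]], max_credits: float) -> dict:
    enriched = dict(record)
    remaining = max_credits
    spent_by_stage = {}
    for name, stage in stages:
        if remaining <= 0:
            break  # budget exhausted: later stages are skipped
        found, spent = stage(enriched, remaining)
        # Merge results so later stages can use them (e.g. phone lookup by email)
        enriched.update(found)
        remaining -= spent
        spent_by_stage[name] = spent
    enriched["credits_used"] = max_credits - remaining
    enriched["credits_by_stage"] = spent_by_stage
    return enriched
```

Ordering matters here: running email discovery first gives the phone stage an email to work from, which matches the phone-provider choice in the selection logic below.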

## Examples

### Basic Email Discovery
```bash
/data-enrichment:waterfall \
  --type email \
  --input "John Smith, Acme Corp"
```

### Bulk Email Enrichment with Validation
```bash
/data-enrichment:waterfall \
  --type email \
  --input "prospects.csv" \
  --validate true \
  --max-credits 5
```

### Custom Provider Sequence
```bash
/data-enrichment:waterfall \
  --type email \
  --input "jane.doe@example.com" \
  --providers "clearbit,apollo,hunter" \
  --validate true
```

### Enterprise Full Enrichment
```bash
/data-enrichment:waterfall \
  --type full \
  --input "target_accounts.csv" \
  --max-credits 20 \
  --output salesforce
```
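The examples pass four different shapes to `--input`. One way to tell them apart is a simple heuristic classifier; this is a sketch with hypothetical category names, not the command's documented detection logic.

```python
import re


def classify_input(value: str) -> str:
    """Guess which --input form a value uses (heuristic sketch)."""
    value = value.strip()
    if value.lower().endswith(".csv"):
        return "csv_file"
    if re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", value):
        return "email"
    if "," in value:
        return "name_and_company"  # e.g. "John Smith, Acme Corp"
    if re.fullmatch(r"(?:[a-z0-9-]+\.)+[a-z]{2,}", value.lower()):
        return "domain"
    return "unknown"
```

The email check runs before the comma check so that an address is never mistaken for a "Name, Company" pair.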

## Provider Selection Logic

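```python
from typing import Any

# Hypothetical allow-list used when target_quality="high"; build it from your own provider metrics
HIGH_ACCURACY_PROVIDERS = {"apollo", "contactout", "hunter", "leadmagic", "rocketreach"}


def select_providers(input_type: str, data_available: dict[str, Any], target_quality: str = "standard") -> list[str]:
    """Pick an ordered provider list from the fields present on the record.

    Field names (linkedin_url, full_name, company, domain, email) are assumed.
    """

    def has(field: str) -> bool:
        return bool(data_available.get(field))

    providers: list[str] = []

    # Email discovery logic: the strongest identifier available decides the order
    if input_type == "email":
        if has("linkedin_url"):
            providers = ["contactout", "rocketreach", "apollo"]
        elif has("full_name") and has("company"):
            providers = ["apollo", "hunter", "rocketreach"]
        elif has("domain"):
            providers = ["hunter", "apollo", "clearbit"]
        else:
            providers = ["people_data_labs", "bettercontact", "ai_research"]

    # Phone discovery logic
    elif input_type == "phone":
        if has("email"):
            providers = ["apollo", "rocketreach", "leadmagic"]
        else:
            providers = ["bettercontact_phone", "signalhire", "lusha"]

    # Quality-based filtering keeps the order but drops lower-accuracy providers
    if target_quality == "high":
        providers = [p for p in providers if p in HIGH_ACCURACY_PROVIDERS]

    return providers
```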

## Credit Optimization

### Smart Routing Algorithm
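```python
# Assumed per-lookup costs: the upper end of each provider's range in the sequences above
PROVIDER_COSTS = {"apollo": 2, "hunter": 2, "rocketreach": 2, "bettercontact": 5}
# Placeholder quality weights (0-1); replace with your own measurements
PROVIDER_QUALITY = {"apollo": 0.9, "hunter": 0.85, "rocketreach": 0.85, "bettercontact": 0.8}


def calculate_efficiency_score(success_rate, credit_cost, data_quality):
    # Assumed scoring: quality-weighted success per credit spent
    return success_rate * data_quality / credit_cost


def optimize_provider_sequence(providers, max_credits, historical_success):
    # Score each provider on success rate, cost, and quality
    scored_providers = []
    for provider in providers:
        score = calculate_efficiency_score(
            success_rate=historical_success.get(provider, 0.0),  # no history scores 0
            credit_cost=PROVIDER_COSTS[provider],
            data_quality=PROVIDER_QUALITY.get(provider, 0.5),
        )
        scored_providers.append((provider, score))

    # Most efficient providers run first
    scored_providers.sort(key=lambda item: item[1], reverse=True)

    # Greedily keep providers while the worst case (every lookup misses) fits the budget
    sequence = []
    remaining_credits = max_credits
    for provider, _score in scored_providers:
        cost = PROVIDER_COSTS[provider]
        if cost <= remaining_credits:
            sequence.append(provider)
            remaining_credits -= cost

    return sequence
```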
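The routing above budgets for the worst case, where every provider is called. Because a waterfall stops at the first hit, the expected spend is lower: each provider's cost is weighted by the probability that all earlier providers missed. A sketch, assuming independent success rates and per-lookup billing (neither is guaranteed in practice):

```python
def expected_credits(sequence: list[tuple[float, float]]) -> float:
    """sequence: (credit_cost, success_rate) pairs in waterfall order."""
    expected = 0.0
    reach_probability = 1.0  # chance the waterfall gets this far (all earlier providers missed)
    for cost, success_rate in sequence:
        expected += reach_probability * cost
        reach_probability *= 1.0 - success_rate
    return expected
```

With the Apollo and Hunter figures from the metrics below (75% at 1.5 credits, then 70% at 1.2 credits), the expected cost is 1.5 + 0.25 × 1.2 = 1.8 credits, against a worst case of 2.7.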

## Success Metrics

### Tracking Performance
```yaml
Metrics:
  success_rate:
    email_found: 85%
    phone_found: 65%
    company_enriched: 95%

  average_credits:
    email: 2.3 credits
    phone: 3.1 credits
    company: 4.5 credits
    full_contact: 8.2 credits

  validation_accuracy:
    email_deliverable: 97%
    phone_valid: 94%

  provider_performance:
    apollo:
      success_rate: 75%
      avg_credits: 1.5
    hunter:
      success_rate: 70%
      avg_credits: 1.2
    zoominfo:
      success_rate: 90%
      avg_credits: 2.5
```
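The `provider_performance` figures can be derived from a per-call attempt log. A minimal sketch, assuming a hypothetical log format with one dict per provider call:

```python
from collections import defaultdict


def provider_performance(attempts: list[dict]) -> dict:
    """attempts: e.g. [{"provider": "apollo", "found": True, "credits": 1.5}, ...]."""
    totals = defaultdict(lambda: {"calls": 0, "hits": 0, "credits": 0.0})
    for attempt in attempts:
        stats = totals[attempt["provider"]]
        stats["calls"] += 1
        stats["hits"] += int(attempt["found"])
        stats["credits"] += attempt["credits"]
    # success_rate is hits per call; avg_credits is credits per call
    return {
        provider: {
            "success_rate": stats["hits"] / stats["calls"],
            "avg_credits": stats["credits"] / stats["calls"],
        }
        for provider, stats in totals.items()
    }
```

Feeding these per-provider rates back into `historical_success` closes the loop with the routing algorithm.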

## Error Handling

### Provider Failures
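```python
import logging
from enum import Enum

logger = logging.getLogger("waterfall")

MAX_RATE_LIMIT_RETRIES = 3  # hypothetical cap so a throttled provider cannot stall a record


class Action(Enum):
    RETRY_LATER = "retry_later"      # re-queue after context["retry_after_s"] seconds
    RETRY_NOW = "retry_now"          # one immediate retry
    NEXT_PROVIDER = "next_provider"  # fall through to the next provider in the sequence


def handle_provider_failure(provider, error, context):
    """Decide what the waterfall does after `provider` fails for one record.

    Assumes `error.kind` is "rate_limit", "auth", or "not_found" (anything else is
    treated as generic) and that `context` is a per-record dict the caller keeps.
    """
    logger.warning("Provider %s failed: %s", provider, error)
    kind = getattr(error, "kind", "unknown")

    if kind == "rate_limit":
        # Exponential backoff: 2, 4, 8 seconds, then skip this provider
        attempts = context.setdefault("rate_limit_attempts", {})
        attempts[provider] = attempts.get(provider, 0) + 1
        if attempts[provider] > MAX_RATE_LIMIT_RETRIES:
            return Action.NEXT_PROVIDER
        context["retry_after_s"] = 2 ** attempts[provider]
        return Action.RETRY_LATER

    if kind == "auth":
        # Credentials are broken: alert (wire your alerting hook in here) and skip
        logger.error("Auth failed for %s", provider)
        return Action.NEXT_PROVIDER

    if kind == "not_found":
        # No data at this provider: continue down the waterfall
        return Action.NEXT_PROVIDER

    # Generic error: retry once, then skip
    retried = context.setdefault("retried", set())
    if provider not in retried:
        retried.add(provider)
        return Action.RETRY_NOW
    return Action.NEXT_PROVIDER
```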

## Output Formats

### JSON Output
```json
{
  "input": {
    "name": "John Smith",
    "company": "Acme Corp"
  },
  "results": {
    "email": "john.smith@acme.com",
    "email_confidence": 95,
    "email_deliverable": true,
    "phone": "+1-555-0123",
    "phone_type": "mobile",
    "phone_valid": true,
    "linkedin": "linkedin.com/in/johnsmith",
    "providers_used": ["apollo", "zerobounce"],
    "credits_used": 2.5
  },
  "metadata": {
    "enriched_at": "2024-01-20T10:30:00Z",
    "cache_hit": false,
    "processing_time": 1.2
  }
}
```

### CSV Output
```csv
name,company,email,email_confidence,phone,phone_type,linkedin,credits_used
John Smith,Acme Corp,john.smith@acme.com,95,+1-555-0123,mobile,linkedin.com/in/johnsmith,2.5
```
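The CSV layout is a flattening of the JSON output's `input` and `results` objects. A minimal sketch using the standard `csv` module; fields not in the header (such as `providers_used` or `email_deliverable`) are dropped, and missing fields become empty cells.

```python
import csv
import io

CSV_COLUMNS = ["name", "company", "email", "email_confidence", "phone",
               "phone_type", "linkedin", "credits_used"]


def to_csv(records: list[dict]) -> str:
    """Flatten JSON-format results (input + results) into the CSV layout above."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CSV_COLUMNS,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Input fields first, then results; results win on any key collision
        row = {**record.get("input", {}), **record.get("results", {})}
        writer.writerow(row)
    return buffer.getvalue()
```

Run on the JSON example above, this produces exactly the two CSV lines shown.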

### Salesforce Format
```json
{
  "Lead": {
    "FirstName": "John",
    "LastName": "Smith",
    "Company": "Acme Corp",
    "Email": "john.smith@acme.com",
    "Phone": "+1-555-0123",
    "LinkedIn__c": "linkedin.com/in/johnsmith",
    "Enrichment_Score__c": 95,
    "Last_Enriched__c": "2024-01-20T10:30:00Z"
  }
}
```
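The Salesforce payload can be built from the same JSON result. A minimal sketch: mapping `Enrichment_Score__c` to `email_confidence` is inferred from the matching example values, and splitting the name on its last space is a simplification that will mishandle some multi-part names. Empty values are dropped so existing CRM data is not overwritten with nulls.

```python
def to_salesforce_lead(record: dict) -> dict:
    """Map a JSON-format result onto the Lead payload shown above."""
    inp = record["input"]
    res = record.get("results", {})
    meta = record.get("metadata", {})
    # "John Smith" -> ("John", "Smith"); a single token lands in LastName,
    # which (with Company) is required on the standard Lead object
    first, _, last = inp["name"].rpartition(" ")
    lead = {
        "FirstName": first,
        "LastName": last,
        "Company": inp["company"],
        "Email": res.get("email"),
        "Phone": res.get("phone"),
        "LinkedIn__c": res.get("linkedin"),
        "Enrichment_Score__c": res.get("email_confidence"),
        "Last_Enriched__c": meta.get("enriched_at"),
    }
    # Drop empty values so blanks never overwrite existing CRM data
    return {"Lead": {k: v for k, v in lead.items() if v not in (None, "")}}
```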

## Caching Strategy

### Cache Management
```python
CACHE_CONFIG = {
    "email": {
        "ttl_days": 30,
        "refresh_if_bounced": True
    },
    "phone": {
        "ttl_days": 60,
        "refresh_if_invalid": True
    },
    "company": {
        "ttl_days": 90,
        "refresh_on_trigger": ["funding", "acquisition", "ipo"]
    },
    "intent": {
        "ttl_days": 7,
        "always_refresh": True
    }
}
```
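A cache lookup can apply this config as a freshness check before any provider is called. A minimal sketch that repeats `CACHE_CONFIG` so it runs on its own; the `bounced`, `invalid`, and `events` arguments are hypothetical inputs from your bounce tracking, phone validation, and company-event feeds. Reading `always_refresh` as "bypass the cache on every run" is an interpretation; under it, the 7-day intent TTL matters only if you fall back to cached data when a refresh fails.

```python
from datetime import datetime, timedelta

CACHE_CONFIG = {
    "email": {"ttl_days": 30, "refresh_if_bounced": True},
    "phone": {"ttl_days": 60, "refresh_if_invalid": True},
    "company": {"ttl_days": 90, "refresh_on_trigger": ["funding", "acquisition", "ipo"]},
    "intent": {"ttl_days": 7, "always_refresh": True},
}


def needs_refresh(data_type: str, cached_at: datetime, now: datetime, *,
                  bounced: bool = False, invalid: bool = False, events=()) -> bool:
    """Return True if a cached value should be re-fetched instead of served."""
    config = CACHE_CONFIG[data_type]
    if config.get("always_refresh"):
        return True
    if now - cached_at > timedelta(days=config["ttl_days"]):
        return True  # past its TTL
    if bounced and config.get("refresh_if_bounced"):
        return True
    if invalid and config.get("refresh_if_invalid"):
        return True
    # A matching company event (funding, acquisition, IPO) invalidates the entry early
    return any(event in config.get("refresh_on_trigger", []) for event in events)
```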

## Best Practices

1. **Start with cached data** - Always check cache first
2. **Set appropriate credit limits** - Balance cost vs. data quality
3. **Use parallel processing** - For bulk enrichments
4. **Validate critical data** - Especially emails before outreach
5. **Monitor provider performance** - Adjust sequences based on success rates
6. **Handle failures gracefully** - Automatic fallback to next provider
7. **Track ROI** - Measure enrichment value vs. credit cost

---

*Execution model: claude-haiku-4-5 for provider routing, parallel processing for bulk operations*