| name | description | usage |
|---|---|---|
| waterfall-enrichment | Execute multi-provider enrichment waterfalls with credit-aware routing, validation, and export options. | /data-enrichment-master:waterfall-enrichment --type email --input leads.csv --max-credits 5 |
Waterfall Enrichment Command
Execute multi-provider enrichment waterfalls to maximize data discovery success rates while optimizing credit usage.
Command Syntax
/data-enrichment:waterfall --type <email|phone|company|full> --input <data> --max-credits <limit>
Parameters
- --type: Type of waterfall (email, phone, company, full)
- --input: Input data (name+company, email, domain, CSV file)
- --max-credits: Maximum credits to spend per record (default: 10)
- --providers: Specific provider sequence (optional, uses optimized defaults)
- --validate: Validate discovered data (default: true)
- --cache: Use cached results (default: true, 30-day TTL)
- --parallel: Process multiple records in parallel (default: true)
- --output: Output format (json|csv|salesforce|hubspot)
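The flags and defaults above could be modelled roughly as follows. This is only a sketch using Python's `argparse`; the command itself is a slash command, so `build_parser` and `str2bool` are hypothetical names, and the `json` default for `--output` is an assumption, since the source does not state one.

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse 'true'/'false' style flag values such as '--validate true'."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected true or false, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="waterfall")
    parser.add_argument("--type", dest="waterfall_type", required=True,
                        choices=["email", "phone", "company", "full"])
    parser.add_argument("--input", required=True)
    parser.add_argument("--max-credits", type=float, default=10)
    # Comma-separated provider names; None means "use the default sequence".
    parser.add_argument("--providers", type=lambda s: s.split(","), default=None)
    parser.add_argument("--validate", type=str2bool, default=True)
    parser.add_argument("--cache", type=str2bool, default=True)
    parser.add_argument("--parallel", type=str2bool, default=True)
    # Assumed default; the documented options are json|csv|salesforce|hubspot.
    parser.add_argument("--output", default="json",
                        choices=["json", "csv", "salesforce", "hubspot"])
    return parser


args = build_parser().parse_args(
    ["--type", "email", "--input", "prospects.csv", "--max-credits", "5"]
)
print(args.waterfall_type, args.max_credits, args.validate)
```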
Waterfall Sequences
Email Discovery Waterfall
Default Sequence:
1. Cache Check (0 credits)
2. Apollo.io (1-2 credits)
3. Hunter (1-2 credits)
4. RocketReach (1-2 credits)
5. People Data Labs (1-2 credits)
6. ContactOut (1-2 credits)
7. Findymail (1-2 credits)
8. BetterContact (2-5 credits)
9. AI Web Research (2-5 credits)
Validation:
- ZeroBounce (0.5 credits)
- NeverBounce backup (0.5 credits)
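The sequence above amounts to trying providers in order and stopping at the first hit. A minimal sketch of that loop, assuming each provider is charged whether or not it finds a value (the source does not say how misses are billed); `run_waterfall` and the stub lookups are hypothetical:

```python
from typing import Callable, Optional

# (provider name, credit cost, lookup function); all values are illustrative.
Provider = tuple[str, float, Callable[[dict], Optional[str]]]


def run_waterfall(record: dict, providers: list[Provider], max_credits: float) -> dict:
    """Try providers in order; stop at the first hit or when nothing else fits the budget."""
    spent = 0.0
    tried = []
    for name, cost, lookup in providers:
        if spent + cost > max_credits:
            continue  # this provider no longer fits the remaining budget
        spent += cost
        tried.append(name)
        value = lookup(record)
        if value is not None:
            return {"value": value, "providers_used": tried, "credits_used": spent}
    return {"value": None, "providers_used": tried, "credits_used": spent}


# Stub lookups standing in for real provider calls.
providers = [
    ("cache", 0.0, lambda r: None),
    ("apollo", 1.5, lambda r: None),
    ("hunter", 1.5, lambda r: "john.smith@acme.com"),
]
result = run_waterfall({"name": "John Smith", "company": "Acme Corp"}, providers, max_credits=5)
print(result)
```

Validation (ZeroBounce, with NeverBounce as backup) would run as a separate step on the returned value and add its own 0.5 credits.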
Phone Discovery Waterfall
Default Sequence:
1. Cache Check (0 credits)
2. Apollo.io (1-2 credits)
3. RocketReach (1-2 credits)
4. LeadMagic (1-2 credits)
5. SignalHire (1-2 credits)
6. BetterContact Phone (2-5 credits)
7. People Data Labs (1-2 credits)
Validation:
- ClearoutPhone (0.5 credits)
- Phone type detection
Company Enrichment Waterfall
Default Sequence:
1. Clearbit (1-2 credits)
2. Ocean.io (2-3 credits)
3. ZoomInfo (2-3 credits) [if enterprise]
4. Crunchbase (1-2 credits) [if funded]
5. BuiltWith (1-2 credits) [technographics]
6. HG Insights (2-3 credits) [tech spend]
7. Intent providers (3-5 credits) [if qualified]
Full Contact Enrichment
Comprehensive Sequence:
1. Email discovery waterfall
2. Phone discovery waterfall
3. Social profile discovery
4. Company enrichment
5. Technographics
6. Intent signals
7. Validation & scoring
Examples
Basic Email Discovery
/data-enrichment:waterfall \
--type email \
--input "John Smith, Acme Corp"
Bulk Email Enrichment with Validation
/data-enrichment:waterfall \
--type email \
--input "prospects.csv" \
--validate true \
--max-credits 5
Custom Provider Sequence
/data-enrichment:waterfall \
--type email \
--input "jane.doe@example.com" \
--providers "clearbit,apollo,hunter" \
--validate true
Enterprise Full Enrichment
/data-enrichment:waterfall \
--type full \
--input "target_accounts.csv" \
--max-credits 20 \
--output salesforce
Provider Selection Logic
def select_providers(input_type, data_available, target_quality):
    providers = []

    # Email discovery logic
    if input_type == "email":
        if has_linkedin_url(data_available):
            providers = ["contactout", "rocketreach", "apollo"]
        elif has_full_name_and_company(data_available):
            providers = ["apollo", "hunter", "rocketreach"]
        elif has_domain_only(data_available):
            providers = ["hunter", "apollo", "clearbit"]
        else:
            providers = ["people_data_labs", "bettercontact", "ai_research"]

    # Phone discovery logic
    elif input_type == "phone":
        if has_email(data_available):
            providers = ["apollo", "rocketreach", "leadmagic"]
        else:
            providers = ["bettercontact_phone", "signalhire", "lusha"]

    # Quality-based filtering
    if target_quality == "high":
        providers = filter_high_accuracy_providers(providers)

    return providers
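The predicates used in the selection logic are not defined in this document. One way they might look, assuming `data_available` is a dict with hypothetical keys `linkedin_url`, `name`, `company`, `domain` and `email`:

```python
def has_linkedin_url(data: dict) -> bool:
    """True when a LinkedIn profile URL is present."""
    return "linkedin.com/in/" in (data.get("linkedin_url") or "")


def has_full_name_and_company(data: dict) -> bool:
    """True when there is a name of at least two words plus a company."""
    name = (data.get("name") or "").split()
    return len(name) >= 2 and bool(data.get("company"))


def has_domain_only(data: dict) -> bool:
    """True when a domain is known but no person name is."""
    return bool(data.get("domain")) and not data.get("name")


def has_email(data: dict) -> bool:
    """Cheap presence check; not a deliverability check."""
    return "@" in (data.get("email") or "")


print(has_full_name_and_company({"name": "John Smith", "company": "Acme Corp"}))
```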
Credit Optimization
Smart Routing Algorithm
def optimize_provider_sequence(providers, max_credits, historical_success):
    # Score each provider by success rate and cost efficiency
    scored_providers = []
    for provider in providers:
        score = calculate_efficiency_score(
            success_rate=historical_success[provider],
            credit_cost=PROVIDER_COSTS[provider],
            data_quality=PROVIDER_QUALITY[provider]
        )
        scored_providers.append((provider, score))

    # Sort by efficiency score, best first
    scored_providers.sort(key=lambda x: x[1], reverse=True)

    # Build sequence within credit limit
    sequence = []
    remaining_credits = max_credits
    for provider, score in scored_providers:
        if PROVIDER_COSTS[provider] <= remaining_credits:
            sequence.append(provider)
            remaining_credits -= PROVIDER_COSTS[provider]

    return sequence
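`calculate_efficiency_score` is not defined in this document. A plausible sketch, assuming the score is "quality-weighted hits per credit"; the formula and the `PROVIDER_QUALITY` values below are illustrative assumptions, while the costs and success rates reuse the figures from the Success Metrics section:

```python
def calculate_efficiency_score(success_rate: float, credit_cost: float,
                               data_quality: float) -> float:
    """Hypothetical score: expected quality-weighted hits per credit spent."""
    if credit_cost <= 0:
        return float("inf")  # free sources such as the cache always rank first
    return success_rate * data_quality / credit_cost


PROVIDER_COSTS = {"apollo": 1.5, "hunter": 1.2, "zoominfo": 2.5}
PROVIDER_QUALITY = {"apollo": 0.90, "hunter": 0.85, "zoominfo": 0.95}  # assumed values
historical_success = {"apollo": 0.75, "hunter": 0.70, "zoominfo": 0.90}

ranked = sorted(
    PROVIDER_COSTS,
    key=lambda p: calculate_efficiency_score(
        historical_success[p], PROVIDER_COSTS[p], PROVIDER_QUALITY[p]
    ),
    reverse=True,
)
print(ranked)
```

Under this formula a cheap provider with a moderate hit rate can outrank an expensive one with a high hit rate, which is the trade-off the routing step is meant to capture.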
Success Metrics
Tracking Performance
Metrics:
  success_rate:
    email_found: 85%
    phone_found: 65%
    company_enriched: 95%
  average_credits:
    email: 2.3 credits
    phone: 3.1 credits
    company: 4.5 credits
    full_contact: 8.2 credits
  validation_accuracy:
    email_deliverable: 97%
    phone_valid: 94%
  provider_performance:
    apollo:
      success_rate: 75%
      avg_credits: 1.5
    hunter:
      success_rate: 70%
      avg_credits: 1.2
    zoominfo:
      success_rate: 90%
      avg_credits: 2.5
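Per-provider figures like these can be combined into rough estimates for a whole sequence. The sketch below assumes providers succeed independently of each other, which is optimistic because real providers overlap heavily in coverage, and that a provider is charged on every attempt; both function names are hypothetical.

```python
from math import prod


def cumulative_success(rates: list[float]) -> float:
    """Probability that at least one provider in the sequence finds the data."""
    return 1 - prod(1 - r for r in rates)


def expected_credits(steps: list[tuple[float, float]]) -> float:
    """Expected spend for (success_rate, cost) steps tried in order until the first hit."""
    total = 0.0
    reach_probability = 1.0  # chance that every earlier provider missed
    for rate, cost in steps:
        total += reach_probability * cost
        reach_probability *= 1 - rate
    return total


# Apollo (75%, 1.5 credits) followed by Hunter (70%, 1.2 credits)
print(round(cumulative_success([0.75, 0.70]), 3))
print(round(expected_credits([(0.75, 1.5), (0.70, 1.2)]), 2))
```

For Apollo followed by Hunter this gives 1 - 0.25 × 0.30 = 0.925 and 1.5 + 0.25 × 1.2 = 1.8 credits, both upper-bound style estimates under the independence assumption.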
Error Handling
Provider Failures
def handle_provider_failure(provider, error, context):
    # Log failure
    log_provider_error(provider, error)

    # Determine action
    if is_rate_limit(error):
        # Exponential backoff
        wait_time = calculate_backoff(provider)
        schedule_retry(provider, context, wait_time)
    elif is_auth_error(error):
        # Alert and skip provider
        alert_admin(f"Auth failed for {provider}")
        return next_provider()
    elif is_data_not_found(error):
        # Continue to next provider
        return next_provider()
    else:
        # Generic error - retry once then skip
        if not has_retried(provider, context):
            retry_provider(provider, context)
        else:
            return next_provider()
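The rate-limit branch relies on a backoff calculation that is not shown. A common choice is exponential backoff with full jitter; the sketch below is one such variant, not necessarily what `calculate_backoff` does. It takes an attempt counter rather than a provider name, and `backoff_delay`, `base` and `cap` are hypothetical.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Full-jitter backoff: a random delay in [0, min(cap, base * 2**attempt)] seconds."""
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0, ceiling)


for attempt in range(5):
    print(attempt, round(backoff_delay(attempt), 2))
```

The jitter spreads retries out so that many records hitting the same rate limit do not all retry at the same instant.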
Output Formats
JSON Output
{
  "input": {
    "name": "John Smith",
    "company": "Acme Corp"
  },
  "results": {
    "email": "john.smith@acme.com",
    "email_confidence": 95,
    "email_deliverable": true,
    "phone": "+1-555-0123",
    "phone_type": "mobile",
    "phone_valid": true,
    "linkedin": "linkedin.com/in/johnsmith",
    "providers_used": ["apollo", "zerobounce"],
    "credits_used": 2.5
  },
  "metadata": {
    "enriched_at": "2024-01-20T10:30:00Z",
    "cache_hit": false,
    "processing_time": 1.2
  }
}
CSV Output
name,company,email,email_confidence,phone,phone_type,linkedin,credits_used
John Smith,Acme Corp,john.smith@acme.com,95,+1-555-0123,mobile,linkedin.com/in/johnsmith,2.5
Salesforce Format
{
  "Lead": {
    "FirstName": "John",
    "LastName": "Smith",
    "Company": "Acme Corp",
    "Email": "john.smith@acme.com",
    "Phone": "+1-555-0123",
    "LinkedIn__c": "linkedin.com/in/johnsmith",
    "Enrichment_Score__c": 95,
    "Last_Enriched__c": "2024-01-20T10:30:00Z"
  }
}
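The Salesforce payload can be derived from the JSON output. A rough sketch of that mapping, assuming `Enrichment_Score__c` is taken from `email_confidence` (the sample values match, but the source does not state the rule) and that names split cleanly on the first space; `to_salesforce_lead` is a hypothetical helper:

```python
def to_salesforce_lead(enriched: dict) -> dict:
    """Map the JSON output shape onto the Salesforce Lead shape shown above."""
    # Naive split: everything after the first space is treated as the last name.
    first_name, _, last_name = enriched["input"]["name"].partition(" ")
    results = enriched["results"]
    return {
        "Lead": {
            "FirstName": first_name,
            "LastName": last_name,
            "Company": enriched["input"]["company"],
            "Email": results.get("email"),
            "Phone": results.get("phone"),
            "LinkedIn__c": results.get("linkedin"),
            "Enrichment_Score__c": results.get("email_confidence"),  # assumed mapping
            "Last_Enriched__c": enriched["metadata"]["enriched_at"],
        }
    }


sample = {
    "input": {"name": "John Smith", "company": "Acme Corp"},
    "results": {
        "email": "john.smith@acme.com",
        "email_confidence": 95,
        "phone": "+1-555-0123",
        "linkedin": "linkedin.com/in/johnsmith",
    },
    "metadata": {"enriched_at": "2024-01-20T10:30:00Z"},
}
print(to_salesforce_lead(sample))
```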
Caching Strategy
Cache Management
CACHE_CONFIG = {
    "email": {
        "ttl_days": 30,
        "refresh_if_bounced": True
    },
    "phone": {
        "ttl_days": 60,
        "refresh_if_invalid": True
    },
    "company": {
        "ttl_days": 90,
        "refresh_on_trigger": ["funding", "acquisition", "ipo"]
    },
    "intent": {
        "ttl_days": 7,
        "always_refresh": True
    }
}
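A freshness check against this configuration might look like the sketch below. It handles only `ttl_days` and `always_refresh`, and it reads `always_refresh` as "never serve from cache", which is an interpretation rather than something the source states; `is_fresh` is a hypothetical helper, and the event-driven refresh flags are left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# TTL-related subset of the configuration above.
CACHE_CONFIG = {
    "email": {"ttl_days": 30},
    "phone": {"ttl_days": 60},
    "company": {"ttl_days": 90},
    "intent": {"ttl_days": 7, "always_refresh": True},
}


def is_fresh(data_type: str, cached_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True when a cached entry can be served without calling a provider."""
    config = CACHE_CONFIG[data_type]
    if config.get("always_refresh"):
        return False  # assumed meaning: always go back to the provider
    now = now or datetime.now(timezone.utc)
    return now - cached_at <= timedelta(days=config["ttl_days"])


cached_at = datetime(2024, 1, 20, 10, 30, tzinfo=timezone.utc)
now = datetime(2024, 2, 20, tzinfo=timezone.utc)
print(is_fresh("email", cached_at, now), is_fresh("phone", cached_at, now))
```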
Best Practices
- Start with cached data - Always check cache first
- Set appropriate credit limits - Balance cost vs. data quality
- Use parallel processing - For bulk enrichments
- Validate critical data - Especially emails before outreach
- Monitor provider performance - Adjust sequences based on success rates
- Handle failures gracefully - Automatic fallback to next provider
- Track ROI - Measure enrichment value vs. credit cost
Execution model: claude-haiku-4-5 for provider routing, parallel processing for bulk operations