Initial commit
335
commands/waterfall-enrichment.md
Normal file
@@ -0,0 +1,335 @@
---
name: waterfall-enrichment
description: Execute multi-provider enrichment waterfalls with credit-aware routing, validation, and export options.
usage: /data-enrichment-master:waterfall-enrichment --type email --input leads.csv --max-credits 5
---

# Waterfall Enrichment Command

Run multi-provider enrichment waterfalls: providers are queried in sequence, falling back to the next one when a lookup fails, to maximize data discovery success rates while keeping credit usage within a per-record limit.

## Command Syntax

```bash
/data-enrichment:waterfall --type <email|phone|company|full> --input <data> --max-credits <limit>
```

## Parameters

- `--type`: Type of waterfall (`email`, `phone`, `company`, `full`)
- `--input`: Input data (name and company, email, domain, or CSV file)
- `--max-credits`: Maximum credits to spend per record (default: 10)
- `--providers`: Comma-separated provider sequence (optional; defaults to the optimized sequence for the chosen type)
- `--validate`: Validate discovered data (default: true)
- `--cache`: Use cached results (default: true; 30-day TTL for email, see Caching Strategy for other data types)
- `--parallel`: Process multiple records in parallel (default: true)
- `--output`: Output format (json|csv|salesforce|hubspot)
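
The parameters above can be modeled with Python's standard `argparse` module. This is only a sketch of how the flags and their documented defaults might be parsed; the command itself is not implemented this way as far as this document shows. The `str_to_bool` helper, the `waterfall` program name, and the `json` default for `--output` are assumptions made for illustration.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Parse 'true'/'false' style values such as `--validate true`."""
    lowered = value.lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected true or false, got {value!r}")


def split_providers(value: str) -> list:
    """Turn 'clearbit,apollo,hunter' into a list of provider names."""
    return [name.strip() for name in value.split(",") if name.strip()]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="waterfall")  # hypothetical name
    parser.add_argument("--type", required=True,
                        choices=["email", "phone", "company", "full"])
    parser.add_argument("--input", required=True)
    parser.add_argument("--max-credits", type=float, default=10)
    parser.add_argument("--providers", type=split_providers, default=None)
    parser.add_argument("--validate", type=str_to_bool, default=True)
    parser.add_argument("--cache", type=str_to_bool, default=True)
    parser.add_argument("--parallel", type=str_to_bool, default=True)
    # Assumption: the document states no default for --output.
    parser.add_argument("--output", default="json",
                        choices=["json", "csv", "salesforce", "hubspot"])
    return parser


args = build_parser().parse_args([
    "--type", "email",
    "--input", "prospects.csv",
    "--max-credits", "5",
    "--providers", "clearbit,apollo,hunter",
])
print(args.type, args.max_credits, args.providers, args.validate)
```

Omitted flags fall back to the documented defaults, so `--validate`, `--cache`, and `--parallel` are all true in this run.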

## Waterfall Sequences

### Email Discovery Waterfall
```yaml
Default Sequence:
  1. Cache Check (0 credits)
  2. Apollo.io (1-2 credits)
  3. Hunter (1-2 credits)
  4. RocketReach (1-2 credits)
  5. People Data Labs (1-2 credits)
  6. ContactOut (1-2 credits)
  7. Findymail (1-2 credits)
  8. BetterContact (2-5 credits)
  9. AI Web Research (2-5 credits)

Validation:
  - ZeroBounce (0.5 credits)
  - NeverBounce backup (0.5 credits)
```
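
The control flow behind a sequence like this can be sketched in a few lines of Python. The sketch below is an assumption about how a waterfall runner might work, not the command's actual implementation: the lookup functions are stubs rather than real provider API calls, and it assumes credits are charged on every attempt, whether or not the provider finds a match.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical step: (provider name, credit cost, lookup function).
Lookup = Callable[[dict], Optional[str]]
Step = Tuple[str, float, Lookup]


def run_waterfall(record: dict, steps: List[Step], max_credits: float) -> dict:
    """Try providers in order; stop at the first hit or when credits run out."""
    spent = 0.0
    tried = []
    for name, cost, lookup in steps:
        if spent + cost > max_credits:
            continue  # cannot afford this step; a cheaper later step may still fit
        spent += cost
        tried.append(name)
        value = lookup(record)
        if value is not None:
            return {"value": value, "providers_used": tried, "credits_used": spent}
    return {"value": None, "providers_used": tried, "credits_used": spent}


# Stub lookups standing in for real provider calls.
steps = [
    ("cache", 0.0, lambda r: None),                        # cache miss
    ("apollo", 1.0, lambda r: None),                       # no match
    ("hunter", 1.0, lambda r: "john.smith@acme.com"),      # hit
    ("rocketreach", 1.0, lambda r: "unused@example.com"),  # never reached
]
result = run_waterfall({"name": "John Smith", "company": "Acme Corp"}, steps, 5)
print(result)
```

Because the loop returns on the first non-empty result, providers later in the sequence cost nothing for records that an earlier provider resolves.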

### Phone Discovery Waterfall
```yaml
Default Sequence:
  1. Cache Check (0 credits)
  2. Apollo.io (1-2 credits)
  3. RocketReach (1-2 credits)
  4. LeadMagic (1-2 credits)
  5. SignalHire (1-2 credits)
  6. BetterContact Phone (2-5 credits)
  7. People Data Labs (1-2 credits)

Validation:
  - ClearoutPhone (0.5 credits)
  - Phone type detection
```

### Company Enrichment Waterfall
```yaml
Default Sequence:
  1. Clearbit (1-2 credits)
  2. Ocean.io (2-3 credits)
  3. ZoomInfo (2-3 credits) [if enterprise]
  4. Crunchbase (1-2 credits) [if funded]
  5. BuiltWith (1-2 credits) [technographics]
  6. HG Insights (2-3 credits) [tech spend]
  7. Intent providers (3-5 credits) [if qualified]
```
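
The bracketed conditions in the company sequence suggest that some steps run only for certain accounts. A minimal sketch of that gating follows; the `Account` fields and the provider identifiers are hypothetical names chosen for illustration, since the document does not say how "enterprise", "funded", or "qualified" are determined.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Account:
    domain: str
    is_enterprise: bool = False  # hypothetical flag
    is_funded: bool = False      # hypothetical flag
    is_qualified: bool = False   # hypothetical flag


def company_sequence(account: Account) -> List[str]:
    """Return the company waterfall steps that apply to this account."""
    steps = ["clearbit", "ocean_io"]
    if account.is_enterprise:
        steps.append("zoominfo")
    if account.is_funded:
        steps.append("crunchbase")
    steps += ["builtwith", "hg_insights"]  # technographics and tech spend
    if account.is_qualified:
        steps.append("intent_providers")
    return steps


print(company_sequence(Account("acme.com")))
print(company_sequence(Account("acme.com", is_enterprise=True, is_qualified=True)))
```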

### Full Contact Enrichment
```yaml
Comprehensive Sequence:
  1. Email discovery waterfall
  2. Phone discovery waterfall
  3. Social profile discovery
  4. Company enrichment
  5. Technographics
  6. Intent signals
  7. Validation & scoring
```

## Examples

### Basic Email Discovery
```bash
/data-enrichment:waterfall \
  --type email \
  --input "John Smith, Acme Corp"
```

### Bulk Email Enrichment with Validation
```bash
/data-enrichment:waterfall \
  --type email \
  --input "prospects.csv" \
  --validate true \
  --max-credits 5
```

### Custom Provider Sequence
```bash
/data-enrichment:waterfall \
  --type email \
  --input "jane.doe@example.com" \
  --providers "clearbit,apollo,hunter" \
  --validate true
```

### Enterprise Full Enrichment
```bash
/data-enrichment:waterfall \
  --type full \
  --input "target_accounts.csv" \
  --max-credits 20 \
  --output salesforce
```

## Provider Selection Logic

```python
# The helper predicates (has_linkedin_url, has_email, ...) and
# filter_high_accuracy_providers are assumed to be defined elsewhere.
def select_providers(input_type, data_available, target_quality):
    """Pick a provider sequence based on what the input record already contains."""
    providers = []

    # Email discovery logic
    if input_type == "email":
        if has_linkedin_url(data_available):
            providers = ["contactout", "rocketreach", "apollo"]
        elif has_full_name_and_company(data_available):
            providers = ["apollo", "hunter", "rocketreach"]
        elif has_domain_only(data_available):
            providers = ["hunter", "apollo", "clearbit"]
        else:
            providers = ["people_data_labs", "bettercontact", "ai_research"]

    # Phone discovery logic
    elif input_type == "phone":
        if has_email(data_available):
            providers = ["apollo", "rocketreach", "leadmagic"]
        else:
            providers = ["bettercontact_phone", "signalhire", "lusha"]

    # Quality-based filtering
    if target_quality == "high":
        providers = filter_high_accuracy_providers(providers)

    return providers
```
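
The selection logic relies on several helper predicates that this document does not define. One possible shape for them is sketched below, assuming the input record is a plain dict with optional `linkedin_url`, `name`, `company`, `domain`, and `email` keys. Both the key names and the `HIGH_ACCURACY` set are assumptions made for illustration.

```python
from typing import List


def has_linkedin_url(data: dict) -> bool:
    return bool(data.get("linkedin_url"))


def has_full_name_and_company(data: dict) -> bool:
    return bool(data.get("name")) and bool(data.get("company"))


def has_domain_only(data: dict) -> bool:
    # A domain with no person attached to it.
    return bool(data.get("domain")) and not data.get("name")


def has_email(data: dict) -> bool:
    return bool(data.get("email"))


# Hypothetical ratings; real values would come from monitored provider accuracy.
HIGH_ACCURACY = {"apollo", "rocketreach", "contactout"}


def filter_high_accuracy_providers(providers: List[str]) -> List[str]:
    """Keep only providers rated high-accuracy, preserving their order."""
    return [p for p in providers if p in HIGH_ACCURACY]


record = {"name": "John Smith", "company": "Acme Corp"}
print(has_linkedin_url(record), has_full_name_and_company(record))
print(filter_high_accuracy_providers(["apollo", "hunter", "rocketreach"]))
```

With these helpers in scope, a record holding only a name and a company would take the second email branch of `select_providers`.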

## Credit Optimization

### Smart Routing Algorithm
```python
# PROVIDER_COSTS, PROVIDER_QUALITY, and calculate_efficiency_score are
# assumed to be defined elsewhere.
def optimize_provider_sequence(providers, max_credits, historical_success):
    """Order providers by efficiency and keep those that fit the credit limit."""
    # Score each provider on success rate, credit cost, and data quality
    scored_providers = []

    for provider in providers:
        score = calculate_efficiency_score(
            success_rate=historical_success[provider],
            credit_cost=PROVIDER_COSTS[provider],
            data_quality=PROVIDER_QUALITY[provider],
        )
        scored_providers.append((provider, score))

    # Sort by efficiency score, best first
    scored_providers.sort(key=lambda x: x[1], reverse=True)

    # Build sequence within credit limit (worst case: every provider is called)
    sequence = []
    remaining_credits = max_credits

    for provider, _score in scored_providers:
        if PROVIDER_COSTS[provider] <= remaining_credits:
            sequence.append(provider)
            remaining_credits -= PROVIDER_COSTS[provider]

    return sequence
```
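
The routing function above leaves `calculate_efficiency_score` and the cost and quality tables undefined. The self-contained sketch below fills them in under stated assumptions: the score formula (success rate times quality, divided by cost) is one plausible choice rather than the documented one, the quality ratings are invented, and the costs and success rates reuse the illustrative apollo, hunter, and zoominfo figures from the Success Metrics section.

```python
# Hypothetical tables; only the shape matters for this sketch.
PROVIDER_COSTS = {"apollo": 1.5, "hunter": 1.2, "zoominfo": 2.5}
PROVIDER_QUALITY = {"apollo": 0.90, "hunter": 0.85, "zoominfo": 0.95}
HISTORICAL_SUCCESS = {"apollo": 0.75, "hunter": 0.70, "zoominfo": 0.90}


def calculate_efficiency_score(success_rate, credit_cost, data_quality):
    """Assumed formula: quality-weighted hits expected per credit spent."""
    return success_rate * data_quality / credit_cost


def optimize_provider_sequence(providers, max_credits, historical_success):
    """Rank providers by score, then greedily keep those that fit the budget."""
    ranked = sorted(
        providers,
        key=lambda p: calculate_efficiency_score(
            historical_success[p], PROVIDER_COSTS[p], PROVIDER_QUALITY[p]
        ),
        reverse=True,
    )
    sequence, remaining = [], max_credits
    for provider in ranked:
        if PROVIDER_COSTS[provider] <= remaining:
            sequence.append(provider)
            remaining -= PROVIDER_COSTS[provider]
    return sequence


tight = optimize_provider_sequence(list(PROVIDER_COSTS), 3, HISTORICAL_SUCCESS)
loose = optimize_provider_sequence(list(PROVIDER_COSTS), 6, HISTORICAL_SUCCESS)
print(tight)
print(loose)
```

Under this formula hunter ranks first because it is the cheapest, even though zoominfo has the highest success rate. A zero-cost step such as a cache check would need special handling, since the formula divides by the credit cost.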

## Success Metrics

### Tracking Performance
```yaml
Metrics:
  success_rate:
    email_found: 85%
    phone_found: 65%
    company_enriched: 95%

  average_credits:
    email: 2.3 credits
    phone: 3.1 credits
    company: 4.5 credits
    full_contact: 8.2 credits

  validation_accuracy:
    email_deliverable: 97%
    phone_valid: 94%

  provider_performance:
    apollo:
      success_rate: 75%
      avg_credits: 1.5
    hunter:
      success_rate: 70%
      avg_credits: 1.2
    zoominfo:
      success_rate: 90%
      avg_credits: 2.5
```
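
One way to read these numbers together is as a cost per successful record. The short calculation below assumes that `average_credits` is measured per record attempted, hits and misses alike; the document does not say so, and if the averages are already per success the division is unnecessary.

```python
# Figures copied from the metrics block above.
success_rate = {"email": 0.85, "phone": 0.65, "company": 0.95}
average_credits = {"email": 2.3, "phone": 3.1, "company": 4.5}


def credits_per_success(kind: str) -> float:
    """Average credits spent for each record that was actually enriched."""
    return average_credits[kind] / success_rate[kind]


for kind in success_rate:
    print(f"{kind}: {credits_per_success(kind):.2f} credits per successful record")
```

Under that assumption an email costs about 2.71 credits per success, a phone number about 4.77, and a company profile about 4.74.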

## Error Handling

### Provider Failures
```python
# The helpers called below (log_provider_error, is_rate_limit, next_provider,
# and so on) are assumed to be defined elsewhere.
def handle_provider_failure(provider, error, context):
    """Decide whether to retry the failed provider or fall back to the next one."""
    # Log failure
    log_provider_error(provider, error)

    # Determine action
    if is_rate_limit(error):
        # Exponential backoff; return the retry so every branch yields a result
        wait_time = calculate_backoff(provider)
        return schedule_retry(provider, context, wait_time)

    elif is_auth_error(error):
        # Alert and skip provider
        alert_admin(f"Auth failed for {provider}")
        return next_provider()

    elif is_data_not_found(error):
        # Continue to next provider
        return next_provider()

    else:
        # Generic error - retry once then skip
        if not has_retried(provider, context):
            return retry_provider(provider, context)
        else:
            return next_provider()
```
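
The handler calls `calculate_backoff(provider)` without showing how the wait time is derived. A common approach is capped exponential backoff with jitter, sketched below. The base delay, the cap, the per-provider attempt counter, and the `reset_backoff` helper are all assumptions for illustration, not values taken from this document.

```python
import random
from collections import defaultdict

BASE_DELAY_SECONDS = 1.0  # hypothetical starting delay
MAX_DELAY_SECONDS = 60.0  # hypothetical cap

_attempts = defaultdict(int)  # consecutive rate-limit hits per provider


def calculate_backoff(provider: str, jitter: bool = True) -> float:
    """Return the wait before the next retry: base * 2**attempt, capped."""
    attempt = _attempts[provider]
    _attempts[provider] += 1
    delay = min(BASE_DELAY_SECONDS * (2 ** attempt), MAX_DELAY_SECONDS)
    if jitter:
        # Full jitter: pick uniformly in [0, delay] so that parallel workers
        # do not all retry at the same moment.
        delay = random.uniform(0, delay)
    return delay


def reset_backoff(provider: str) -> None:
    """Call after a successful request so the next failure starts from the base."""
    _attempts.pop(provider, None)


print([calculate_backoff("apollo", jitter=False) for _ in range(8)])
reset_backoff("apollo")
```

With jitter disabled the delays double from the base until they reach the cap, which makes the schedule easy to inspect. Jitter matters most when `--parallel` is on, because many records may hit the same provider's rate limit at once.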

## Output Formats

### JSON Output
```json
{
  "input": {
    "name": "John Smith",
    "company": "Acme Corp"
  },
  "results": {
    "email": "john.smith@acme.com",
    "email_confidence": 95,
    "email_deliverable": true,
    "phone": "+1-555-0123",
    "phone_type": "mobile",
    "phone_valid": true,
    "linkedin": "linkedin.com/in/johnsmith",
    "providers_used": ["apollo", "zerobounce"],
    "credits_used": 2.5
  },
  "metadata": {
    "enriched_at": "2024-01-20T10:30:00Z",
    "cache_hit": false,
    "processing_time": 1.2
  }
}
```

### CSV Output
```csv
name,company,email,email_confidence,phone,phone_type,linkedin,credits_used
John Smith,Acme Corp,john.smith@acme.com,95,+1-555-0123,mobile,linkedin.com/in/johnsmith,2.5
```
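
The CSV columns are a flattened subset of the JSON result. A minimal sketch of that conversion with the standard `csv` module follows. The function name `to_csv` is hypothetical, and dropping the JSON fields that have no CSV column (such as `providers_used`) is an assumption based on the header shown above.

```python
import csv
import io
import json

CSV_COLUMNS = ["name", "company", "email", "email_confidence",
               "phone", "phone_type", "linkedin", "credits_used"]


def to_csv(records: list) -> str:
    """Flatten JSON results into CSV text with the column order shown above."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CSV_COLUMNS,
                            extrasaction="ignore",  # skip fields without a column
                            lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Merge the nested "input" and "results" objects into one flat row.
        writer.writerow({**record["input"], **record["results"]})
    return buffer.getvalue()


record = json.loads("""
{
  "input": {"name": "John Smith", "company": "Acme Corp"},
  "results": {
    "email": "john.smith@acme.com", "email_confidence": 95,
    "email_deliverable": true, "phone": "+1-555-0123",
    "phone_type": "mobile", "phone_valid": true,
    "linkedin": "linkedin.com/in/johnsmith",
    "providers_used": ["apollo", "zerobounce"], "credits_used": 2.5
  },
  "metadata": {"enriched_at": "2024-01-20T10:30:00Z", "cache_hit": false,
               "processing_time": 1.2}
}
""")
csv_text = to_csv([record])
print(csv_text)
```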

### Salesforce Format
```json
{
  "Lead": {
    "FirstName": "John",
    "LastName": "Smith",
    "Company": "Acme Corp",
    "Email": "john.smith@acme.com",
    "Phone": "+1-555-0123",
    "LinkedIn__c": "linkedin.com/in/johnsmith",
    "Enrichment_Score__c": 95,
    "Last_Enriched__c": "2024-01-20T10:30:00Z"
  }
}
```
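
The Lead payload can be derived from the JSON result with a small mapping function. The sketch below is illustrative only: `to_salesforce_lead` is a hypothetical name, the name is split on its last space, and `Enrichment_Score__c` is assumed to mirror `email_confidence` because both are 95 in the examples here.

```python
def to_salesforce_lead(record: dict) -> dict:
    """Map a waterfall JSON result onto the Lead fields shown above."""
    results = record["results"]
    # Split on the last space; a single-word name becomes the LastName.
    first_name, _, last_name = record["input"]["name"].rpartition(" ")
    return {
        "Lead": {
            "FirstName": first_name,
            "LastName": last_name,
            "Company": record["input"]["company"],
            "Email": results.get("email"),
            "Phone": results.get("phone"),
            "LinkedIn__c": results.get("linkedin"),
            # Assumption: the enrichment score mirrors email_confidence.
            "Enrichment_Score__c": results.get("email_confidence"),
            "Last_Enriched__c": record["metadata"]["enriched_at"],
        }
    }


sample = {
    "input": {"name": "John Smith", "company": "Acme Corp"},
    "results": {
        "email": "john.smith@acme.com",
        "email_confidence": 95,
        "phone": "+1-555-0123",
        "linkedin": "linkedin.com/in/johnsmith",
    },
    "metadata": {"enriched_at": "2024-01-20T10:30:00Z"},
}
lead = to_salesforce_lead(sample)
print(lead)
```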

## Caching Strategy

### Cache Management
```python
CACHE_CONFIG = {
    "email": {
        "ttl_days": 30,
        "refresh_if_bounced": True,
    },
    "phone": {
        "ttl_days": 60,
        "refresh_if_invalid": True,
    },
    "company": {
        "ttl_days": 90,
        "refresh_on_trigger": ["funding", "acquisition", "ipo"],
    },
    "intent": {
        "ttl_days": 7,
        "always_refresh": True,
    },
}
```
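
A TTL check against this configuration might look like the sketch below. It covers only the `ttl_days` values; the refresh flags (`refresh_if_bounced`, `refresh_on_trigger`, and so on) are left out because the document does not describe how they are evaluated. The function name `is_cache_fresh` is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# TTLs copied from CACHE_CONFIG above.
TTL_DAYS = {"email": 30, "phone": 60, "company": 90, "intent": 7}


def is_cache_fresh(data_type: str, cached_at: datetime,
                   now: Optional[datetime] = None) -> bool:
    """Return True while a cached entry is still within its TTL."""
    now = now or datetime.now(timezone.utc)
    return now - cached_at <= timedelta(days=TTL_DAYS[data_type])


cached_at = datetime(2024, 1, 20, 10, 30, tzinfo=timezone.utc)
later = datetime(2024, 2, 25, 10, 30, tzinfo=timezone.utc)  # 36 days on

print(is_cache_fresh("email", cached_at, now=later))  # past the 30-day TTL
print(is_cache_fresh("phone", cached_at, now=later))  # within the 60-day TTL
```

The same timestamp can therefore be stale for one data type and fresh for another, which is why the TTL is looked up per type rather than applied globally.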

## Best Practices

1. **Start with cached data** - Always check cache first
2. **Set appropriate credit limits** - Balance cost vs. data quality
3. **Use parallel processing** - For bulk enrichments
4. **Validate critical data** - Especially emails before outreach
5. **Monitor provider performance** - Adjust sequences based on success rates
6. **Handle failures gracefully** - Automatic fallback to next provider
7. **Track ROI** - Measure enrichment value vs. credit cost

---

*Execution model: claude-haiku-4-5 for provider routing, parallel processing for bulk operations*