zhongwei/gh-gtmagents-gtm-agents-plugins-data-enrichment-master

Zhongwei Li d765cdd7eb Initial commit

2025-11-29 18:30:23 +08:00

8.3 KiB

name: waterfall-enrichment
description: Execute multi-provider enrichment waterfalls with credit-aware routing, validation, and export options.
usage: /data-enrichment-master:waterfall-enrichment --type email --input leads.csv --max-credits 5

Waterfall Enrichment Command

Execute multi-provider enrichment waterfalls to maximize data discovery success rates while optimizing credit usage.

Command Syntax

/data-enrichment-master:waterfall-enrichment --type <email|phone|company|full> --input <data> --max-credits <limit>

Parameters

--type: Type of waterfall (email, phone, company, full)
--input: Input data (name+company, email, domain, CSV file)
--max-credits: Maximum credits to spend per record (default: 10)
--providers: Specific provider sequence (optional, uses optimized defaults)
--validate: Validate discovered data (default: true)
--cache: Use cached results (default: true; TTL depends on data type, 30 days for email)
--parallel: Process multiple records in parallel (default: true)
--output: Output format (json|csv|salesforce|hubspot)
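
As a rough illustration of how the parameters above fit together, the sketch below mirrors them with Python's argparse. It is not the command's actual parser: the function names are hypothetical, and because the source gives no default for --output, it is left unset here.

```python
import argparse


def str_bool(value: str) -> bool:
    """Parse the "--validate true" / "--validate false" form used in the examples."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected true or false, got {value!r}")


def provider_list(value: str) -> list:
    """Split a comma-separated provider sequence such as "clearbit,apollo,hunter"."""
    return [name.strip() for name in value.split(",") if name.strip()]


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical mirror of the documented flags and their documented defaults.
    parser = argparse.ArgumentParser(prog="waterfall-enrichment")
    parser.add_argument("--type", required=True,
                        choices=["email", "phone", "company", "full"])
    parser.add_argument("--input", required=True)
    parser.add_argument("--max-credits", type=float, default=10)
    parser.add_argument("--providers", type=provider_list, default=None)
    parser.add_argument("--validate", type=str_bool, default=True)
    parser.add_argument("--cache", type=str_bool, default=True)
    parser.add_argument("--parallel", type=str_bool, default=True)
    # No default is documented for --output, so none is assumed here.
    parser.add_argument("--output",
                        choices=["json", "csv", "salesforce", "hubspot"])
    return parser


args = build_parser().parse_args(
    ["--type", "email", "--input", "leads.csv", "--max-credits", "5"]
)
print(args.type, args.input, args.max_credits, args.validate)
```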

Waterfall Sequences

Email Discovery Waterfall

Default Sequence:
  1. Cache Check (0 credits)
  2. Apollo.io (1-2 credits)
  3. Hunter (1-2 credits)
  4. RocketReach (1-2 credits)
  5. People Data Labs (1-2 credits)
  6. ContactOut (1-2 credits)
  7. Findymail (1-2 credits)
  8. BetterContact (2-5 credits)
  9. AI Web Research (2-5 credits)
  
Validation:
  - ZeroBounce (0.5 credits)
  - NeverBounce backup (0.5 credits)
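
The sequence above can be read as a loop that tries providers in order and stops at the first hit. The following is a minimal sketch of that idea, not the plugin's implementation. It assumes each attempt is charged whether or not it finds data, and the lookup functions and credit costs in the usage example are stand-ins.

```python
from typing import Callable, Optional

# One waterfall step: (provider name, credit cost, lookup function).
Step = tuple[str, float, Callable[[dict], Optional[str]]]


def run_waterfall(record: dict, steps: list[Step], max_credits: float) -> dict:
    """Try each provider in order until one returns a value or credits run out."""
    spent = 0.0
    tried = []
    for name, cost, lookup in steps:
        if spent + cost > max_credits:
            continue  # this provider would exceed the per-record budget
        tried.append(name)
        spent += cost  # assumption: every attempt is charged, hit or miss
        value = lookup(record)
        if value is not None:
            return {"value": value, "providers_used": tried, "credits_used": spent}
    return {"value": None, "providers_used": tried, "credits_used": spent}


# Stand-in lookups: the first two miss, the third finds an address.
steps = [
    ("cache", 0.0, lambda r: None),
    ("apollo", 1.5, lambda r: None),
    ("hunter", 1.5, lambda r: "john.smith@example.com"),
    ("rocketreach", 1.5, lambda r: "unused@example.com"),
]
result = run_waterfall({"name": "John Smith", "company": "Acme Corp"}, steps, max_credits=5)
print(result)
```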

Phone Discovery Waterfall

Default Sequence:
  1. Cache Check (0 credits)
  2. Apollo.io (1-2 credits)
  3. RocketReach (1-2 credits)
  4. LeadMagic (1-2 credits)
  5. SignalHire (1-2 credits)
  6. BetterContact Phone (2-5 credits)
  7. People Data Labs (1-2 credits)
  
Validation:
  - ClearoutPhone (0.5 credits)
  - Phone type detection

Company Enrichment Waterfall

Default Sequence:
  1. Clearbit (1-2 credits)
  2. Ocean.io (2-3 credits)
  3. ZoomInfo (2-3 credits) [if enterprise]
  4. Crunchbase (1-2 credits) [if funded]
  5. BuiltWith (1-2 credits) [technographics]
  6. HG Insights (2-3 credits) [tech spend]
  7. Intent providers (3-5 credits) [if qualified]

Full Contact Enrichment

Comprehensive Sequence:
  1. Email discovery waterfall
  2. Phone discovery waterfall
  3. Social profile discovery
  4. Company enrichment
  5. Technographics
  6. Intent signals
  7. Validation & scoring

Examples

Basic Email Discovery

/data-enrichment-master:waterfall-enrichment \
  --type email \
  --input "John Smith, Acme Corp"

Bulk Email Enrichment with Validation

/data-enrichment-master:waterfall-enrichment \
  --type email \
  --input "prospects.csv" \
  --validate true \
  --max-credits 5

Custom Provider Sequence

/data-enrichment-master:waterfall-enrichment \
  --type email \
  --input "jane.doe@example.com" \
  --providers "clearbit,apollo,hunter" \
  --validate true

Enterprise Full Enrichment

/data-enrichment-master:waterfall-enrichment \
  --type full \
  --input "target_accounts.csv" \
  --max-credits 20 \
  --output salesforce

Provider Selection Logic

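def select_providers(input_type, data_available, target_quality):
    """Pick a provider sequence from the input type and the fields available.

    Returns an empty list for input types other than "email" and "phone".
    """
    providers = []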
    
    # Email discovery logic
    if input_type == "email":
        if has_linkedin_url(data_available):
            providers = ["contactout", "rocketreach", "apollo"]
        elif has_full_name_and_company(data_available):
            providers = ["apollo", "hunter", "rocketreach"]
        elif has_domain_only(data_available):
            providers = ["hunter", "apollo", "clearbit"]
        else:
            providers = ["people_data_labs", "bettercontact", "ai_research"]
    
    # Phone discovery logic
    elif input_type == "phone":
        if has_email(data_available):
            providers = ["apollo", "rocketreach", "leadmagic"]
        else:
            providers = ["bettercontact_phone", "signalhire", "lusha"]
    
    # Quality-based filtering
    if target_quality == "high":
        providers = filter_high_accuracy_providers(providers)
    
    return providers
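
The selection logic relies on predicates such as has_linkedin_url that are not shown. One possible shape for them is sketched below. It assumes data_available is a dict, and the field names (linkedin_url, name, company, domain, email) are hypothetical.

```python
def has_linkedin_url(data: dict) -> bool:
    """True when the record carries a LinkedIn profile URL."""
    return "linkedin.com/in/" in (data.get("linkedin_url") or "")


def has_full_name_and_company(data: dict) -> bool:
    """True when there are at least two name parts and a company."""
    name_parts = (data.get("name") or "").split()
    return len(name_parts) >= 2 and bool(data.get("company"))


def has_domain_only(data: dict) -> bool:
    """True when a domain is known but no person name is."""
    return bool(data.get("domain")) and not data.get("name")


def has_email(data: dict) -> bool:
    """Cheap presence check only; this is not email validation."""
    return "@" in (data.get("email") or "")


record = {"name": "John Smith", "company": "Acme Corp"}
print(has_full_name_and_company(record), has_linkedin_url(record))
```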

Credit Optimization

Smart Routing Algorithm

def optimize_provider_sequence(providers, max_credits, historical_success):
    # Sort by success rate and cost efficiency
    scored_providers = []
    
    for provider in providers:
        score = calculate_efficiency_score(
            success_rate=historical_success[provider],
            credit_cost=PROVIDER_COSTS[provider],
            data_quality=PROVIDER_QUALITY[provider]
        )
        scored_providers.append((provider, score))
    
    # Sort by efficiency score
    scored_providers.sort(key=lambda x: x[1], reverse=True)
    
    # Build sequence within credit limit
    sequence = []
    remaining_credits = max_credits
    
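    # Greedy fill: a provider that does not fit is skipped, and cheaper
    # lower-ranked providers may still be added after it
    for provider, _score in scored_providers: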
        if PROVIDER_COSTS[provider] <= remaining_credits:
            sequence.append(provider)
            remaining_credits -= PROVIDER_COSTS[provider]
    
    return sequence
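
The routing function calls calculate_efficiency_score without defining it. A minimal sketch follows, assuming the score is expected quality-weighted hits per credit; the real formula is not given in the source. The costs and success rates reuse the per-provider figures from the Success Metrics section, while the quality values are placeholders.

```python
# Costs and success rates taken from the provider_performance metrics;
# the quality values are placeholders for illustration only.
PROVIDER_COSTS = {"apollo": 1.5, "hunter": 1.2, "zoominfo": 2.5}
HISTORICAL_SUCCESS = {"apollo": 0.75, "hunter": 0.70, "zoominfo": 0.90}
PROVIDER_QUALITY = {"apollo": 0.90, "hunter": 0.85, "zoominfo": 0.95}


def calculate_efficiency_score(success_rate, credit_cost, data_quality):
    """Assumed formula: quality-weighted success rate per credit spent."""
    if credit_cost <= 0:
        return float("inf")  # free sources, such as a cache, always rank first
    return success_rate * data_quality / credit_cost


ranked = sorted(
    PROVIDER_COSTS,
    key=lambda p: calculate_efficiency_score(
        HISTORICAL_SUCCESS[p], PROVIDER_COSTS[p], PROVIDER_QUALITY[p]
    ),
    reverse=True,
)
print(ranked)  # ['hunter', 'apollo', 'zoominfo']
```

Under this formula a cheap provider with a moderate hit rate outranks an expensive one with a high hit rate, which matches the stated goal of optimizing credit usage.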

Success Metrics

Tracking Performance

Metrics:
  success_rate:
    email_found: 85%
    phone_found: 65%
    company_enriched: 95%
  
  average_credits:
    email: 2.3 credits
    phone: 3.1 credits
    company: 4.5 credits
    full_contact: 8.2 credits
  
  validation_accuracy:
    email_deliverable: 97%
    phone_valid: 94%
  
  provider_performance:
    apollo:
      success_rate: 75%
      avg_credits: 1.5
    hunter:
      success_rate: 70%
      avg_credits: 1.2
    zoominfo:
      success_rate: 90%
      avg_credits: 2.5
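
Figures like the per-provider ones above could be derived from a log of individual provider calls. The sketch below assumes a hypothetical log format of one dict per call with provider, found, and credits keys.

```python
from collections import defaultdict


def summarize(runs: list) -> dict:
    """Aggregate per-provider success rate and average credits per attempt."""
    totals = defaultdict(lambda: {"attempts": 0, "hits": 0, "credits": 0.0})
    for run in runs:
        stats = totals[run["provider"]]
        stats["attempts"] += 1
        stats["hits"] += 1 if run["found"] else 0
        stats["credits"] += run["credits"]
    return {
        provider: {
            "success_rate": stats["hits"] / stats["attempts"],
            "avg_credits": stats["credits"] / stats["attempts"],
        }
        for provider, stats in totals.items()
    }


# Made-up log entries, for illustration only.
runs = [
    {"provider": "apollo", "found": True, "credits": 1},
    {"provider": "apollo", "found": True, "credits": 2},
    {"provider": "apollo", "found": False, "credits": 1},
    {"provider": "apollo", "found": True, "credits": 2},
]
summary = summarize(runs)
print(summary["apollo"])
```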

Error Handling

Provider Failures

def handle_provider_failure(provider, error, context):
    """Decide what to do after a provider call fails.

    Returns the next provider to try, or None when a retry of the same
    provider has been scheduled or started. The helper functions called
    here are assumed to be defined elsewhere.
    """
    # Log failure
    log_provider_error(provider, error)

    # Determine action
    if is_rate_limit(error):
        # Exponential backoff, then retry the same provider
        wait_time = calculate_backoff(provider)
        schedule_retry(provider, context, wait_time)
        return None

    if is_auth_error(error):
        # Alert and skip provider
        alert_admin(f"Auth failed for {provider}")
        return next_provider()

    if is_data_not_found(error):
        # Continue to next provider
        return next_provider()

    # Generic error - retry once then skip
    if not has_retried(provider, context):
        retry_provider(provider, context)
        return None
    return next_provider()
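
The rate-limit branch mentions exponential backoff without showing the calculation. One common form, capped and with optional full jitter, is sketched below. The function name backoff_delay and its attempt-based signature are hypothetical and differ from the calculate_backoff(provider) call in the handler, which presumably tracks the attempt count per provider.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  jitter: bool = True) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    The delay doubles with each attempt up to `cap`. With jitter enabled,
    a random value between 0 and that delay is returned so that many
    clients do not retry at the same instant.
    """
    delay = min(cap, base * (2 ** attempt))
    return random.uniform(0, delay) if jitter else delay


for attempt in range(4):
    print(attempt, backoff_delay(attempt, jitter=False))
```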

Output Formats

JSON Output

{
  "input": {
    "name": "John Smith",
    "company": "Acme Corp"
  },
  "results": {
    "email": "john.smith@acme.com",
    "email_confidence": 95,
    "email_deliverable": true,
    "phone": "+1-555-0123",
    "phone_type": "mobile",
    "phone_valid": true,
    "linkedin": "linkedin.com/in/johnsmith",
    "providers_used": ["apollo", "zerobounce"],
    "credits_used": 2.5
  },
  "metadata": {
    "enriched_at": "2024-01-20T10:30:00Z",
    "cache_hit": false,
    "processing_time": 1.2
  }
}

CSV Output

name,company,email,email_confidence,phone,phone_type,linkedin,credits_used
John Smith,Acme Corp,john.smith@acme.com,95,+1-555-0123,mobile,linkedin.com/in/johnsmith,2.5
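
For illustration, a JSON result shaped like the one above can be flattened into this CSV layout with the standard csv module. This is a sketch rather than the plugin's exporter; the function name to_csv is hypothetical.

```python
import csv
import io

CSV_FIELDS = ["name", "company", "email", "email_confidence",
              "phone", "phone_type", "linkedin", "credits_used"]


def to_csv(records: list) -> str:
    """Flatten the "input" and "results" objects into one CSV row per record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CSV_FIELDS,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Keys not listed in CSV_FIELDS (e.g. providers_used) are dropped.
        writer.writerow({**record["input"], **record["results"]})
    return buffer.getvalue()


record = {
    "input": {"name": "John Smith", "company": "Acme Corp"},
    "results": {
        "email": "john.smith@acme.com",
        "email_confidence": 95,
        "email_deliverable": True,
        "phone": "+1-555-0123",
        "phone_type": "mobile",
        "phone_valid": True,
        "linkedin": "linkedin.com/in/johnsmith",
        "providers_used": ["apollo", "zerobounce"],
        "credits_used": 2.5,
    },
}
print(to_csv([record]))
```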

Salesforce Format

{
  "Lead": {
    "FirstName": "John",
    "LastName": "Smith",
    "Company": "Acme Corp",
    "Email": "john.smith@acme.com",
    "Phone": "+1-555-0123",
    "LinkedIn__c": "linkedin.com/in/johnsmith",
    "Enrichment_Score__c": 95,
    "Last_Enriched__c": "2024-01-20T10:30:00Z"
  }
}
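
A mapping from the JSON result to this Lead payload might look like the sketch below. It makes two assumptions that the source does not confirm: Enrichment_Score__c is filled from email_confidence (both are 95 in the examples), and the name is split naively on the first space.

```python
def to_salesforce_lead(record: dict) -> dict:
    """Build a Lead payload from an enrichment result (illustrative mapping)."""
    # Naive split: everything after the first space becomes LastName.
    first, _, last = record["input"]["name"].partition(" ")
    results = record["results"]
    return {
        "Lead": {
            "FirstName": first,
            "LastName": last,
            "Company": record["input"]["company"],
            "Email": results.get("email"),
            "Phone": results.get("phone"),
            "LinkedIn__c": results.get("linkedin"),
            # Assumption: the score is taken from the email confidence.
            "Enrichment_Score__c": results.get("email_confidence"),
            "Last_Enriched__c": record["metadata"]["enriched_at"],
        }
    }


record = {
    "input": {"name": "John Smith", "company": "Acme Corp"},
    "results": {
        "email": "john.smith@acme.com",
        "email_confidence": 95,
        "phone": "+1-555-0123",
        "linkedin": "linkedin.com/in/johnsmith",
    },
    "metadata": {"enriched_at": "2024-01-20T10:30:00Z"},
}
lead = to_salesforce_lead(record)
print(lead["Lead"]["FirstName"], lead["Lead"]["LastName"])
```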

Caching Strategy

Cache Management

CACHE_CONFIG = {
    "email": {
        "ttl_days": 30,
        "refresh_if_bounced": True
    },
    "phone": {
        "ttl_days": 60,
        "refresh_if_invalid": True
    },
    "company": {
        "ttl_days": 90,
        "refresh_on_trigger": ["funding", "acquisition", "ipo"]
    },
    "intent": {
        "ttl_days": 7,
        "always_refresh": True
    }
}
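
A freshness check against these TTLs could be sketched as follows. The is_fresh function is hypothetical, the config is trimmed to the keys it uses, and always_refresh is interpreted here as "never serve from cache", which is one reading of the flag rather than a documented behavior.

```python
from datetime import datetime, timedelta, timezone

# Trimmed copy of the TTL settings, limited to the keys used below.
CACHE_CONFIG = {
    "email": {"ttl_days": 30},
    "phone": {"ttl_days": 60},
    "company": {"ttl_days": 90},
    "intent": {"ttl_days": 7, "always_refresh": True},
}


def is_fresh(data_type: str, cached_at: datetime, now: datetime = None) -> bool:
    """Return True if a cached entry of this type may still be served."""
    config = CACHE_CONFIG[data_type]
    if config.get("always_refresh"):
        return False  # assumed meaning: always fetch again
    now = now or datetime.now(timezone.utc)
    return now - cached_at < timedelta(days=config["ttl_days"])


cached_at = datetime(2024, 1, 1, tzinfo=timezone.utc)
check_time = datetime(2024, 2, 15, tzinfo=timezone.utc)  # 45 days later
print(is_fresh("email", cached_at, check_time))  # False
print(is_fresh("phone", cached_at, check_time))  # True
```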

Best Practices

  1. Start with cached data: always check the cache first.
  2. Set appropriate credit limits: balance cost against data quality.
  3. Use parallel processing: for bulk enrichments.
  4. Validate critical data: especially emails before outreach.
  5. Monitor provider performance: adjust sequences based on success rates.
  6. Handle failures gracefully: fall back automatically to the next provider.
  7. Track ROI: measure enrichment value against credit cost.
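
For the parallel-processing recommendation, a thread pool is one plausible approach because provider calls are network-bound. The sketch below assumes a hypothetical enrich_one function that enriches a single record; the worker count is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def enrich_all(records: list, enrich_one, max_workers: int = 8) -> list:
    """Enrich records concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(enrich_one, records))


def fake_enrich(record: dict) -> dict:
    # Stand-in for a real waterfall call.
    return {**record, "enriched": True}


rows = [{"name": "John Smith"}, {"name": "Jane Doe"}]
enriched = enrich_all(rows, fake_enrich)
print(enriched)
```

An exception raised inside enrich_one surfaces when the results are collected, so a real pipeline would likely catch errors per record rather than fail the whole batch.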

Execution model: claude-haiku-4-5 for provider routing, parallel processing for bulk operations
