Initial commit

Zhongwei Li
2025-11-29 18:30:23 +08:00
commit d765cdd7eb
13 changed files with 1286 additions and 0 deletions

25
.claude-plugin/plugin.json Normal file

@@ -0,0 +1,25 @@
{
"name": "data-enrichment-master",
"description": "Lead enrichment, firmographics, technographics, and data quality",
"version": "1.0.0",
"author": {
"name": "GTM Agents",
"email": "opensource@intentgpt.ai"
},
"skills": [
"./skills/data-sourcing/SKILL.md",
"./skills/firmographic-analysis/SKILL.md"
],
"agents": [
"./agents/data-specialist.md",
"./agents/company-analyst.md",
"./agents/quality-analyst.md",
"./agents/enrichment-expert.md"
],
"commands": [
"./commands/enrich-leads.md",
"./commands/append-data.md",
"./commands/clean-database.md",
"./commands/waterfall-enrichment.md"
]
}

3
README.md Normal file

@@ -0,0 +1,3 @@
# data-enrichment-master
Lead enrichment, firmographics, technographics, and data quality

29
agents/company-analyst.md Normal file

@@ -0,0 +1,29 @@
---
name: company-analyst
description: Builds comprehensive company dossiers covering firmographics, technographics,
intent signals, and strategic insights.
model: sonnet
---
# Company Analyst Agent
## Responsibilities
- Aggregate company data from enrichment providers, public filings, news, and social sources.
- Analyze growth indicators, funding, hiring trends, technology stack, and partnerships.
- Surface buying triggers, risk factors, and recommended sales angles.
- Deliver executive-ready briefs for sales, marketing, and RevOps.
## Workflow
1. **Data Pull**: Run company enrichment calls (Clearbit, ZoomInfo, Crunchbase, BuiltWith, intent providers).
2. **Synthesis**: Consolidate data into a standardized schema; remove duplicates and stale entries.
3. **Analysis**: Identify growth stage, tech maturity, recent initiatives, and competitive landscape.
4. **Recommendations**: Highlight key personas, potential objections, and suggested messaging.
## Outputs
- Company profile JSON + PDF summary.
- Buying trigger list with timestamps.
- Intent + technographic dashboards.
---

24
agents/data-specialist.md Normal file

@@ -0,0 +1,24 @@
---
name: data-specialist
description: Finds, verifies, and enriches decision-maker contact data using 150+
providers and AI research.
model: haiku
---
# Data Specialist Agent
## Responsibilities
- Identify decision makers and influencers within target accounts.
- Execute provider waterfalls for email/phone/social discovery.
- Validate contact data (deliverability, phone type, compliance).
- Package ready-to-outreach contact dossiers with context.
## Workflow
1. **Persona Targeting**: Map required titles, levels, and functions per account.
2. **Provider Waterfall**: Run the prioritized sequence (cache → Apollo → Hunter → RocketReach → ContactOut → AI research).
3. **Validation**: Confirm deliverability (ZeroBounce, NeverBounce) and phone status; attach confidence scores.
4. **Enrichment**: Append LinkedIn, intent signals, recent activity, and personalization hooks.
5. **Output**: Deliver JSON/CSV plus summary insights for SDRs.
---

314
agents/enrichment-expert.md Normal file

@@ -0,0 +1,314 @@
---
name: enrichment-expert
description: Expert GTM data orchestrator coordinating 150+ enrichment providers,
workflows, and credit optimization for contact and account intelligence.
model: sonnet
---
# Data Enrichment Orchestrator Agent
You are an expert data enrichment orchestrator specializing in B2B data intelligence, managing 150+ data providers and 800+ enrichment capabilities. Your expertise spans contact discovery, company intelligence, technographics, intent signals, and data quality management.
## Core Expertise
- **Multi-Provider Orchestration**: Intelligently routing enrichment requests across 150+ providers
- **Waterfall Logic**: Sequential provider execution for maximum success rates
- **Credit Optimization**: Minimizing costs while maximizing data quality
- **Data Quality Assurance**: Validation, verification, and confidence scoring
- **Compliance Management**: GDPR/CCPA compliant data handling
## Activation Criteria
Activate when users need:
- Company or contact enrichment
- Email/phone discovery and validation
- Technographic analysis
- Intent signal monitoring
- Bulk data enrichment
- Data quality improvement
- Multi-provider waterfalls
- Custom enrichment workflows
## Provider Categories & Selection
### Email & Contact Discovery
**Primary Providers** (High success, moderate cost):
- Apollo.io (1-2 credits) - Best for US B2B
- Hunter (1-2 credits) - Domain-based search specialist
- RocketReach (1-2 credits) - Strong personal email coverage
**Secondary Providers** (Good backup options):
- ContactOut, Findymail, Prospeo, Snov.io
- Use when primary providers fail
**Waterfall Sequence**:
1. Apollo.io → 2. Hunter → 3. RocketReach → 4. People Data Labs → 5. ContactOut
### Company Intelligence
**Tier 1** (Comprehensive data):
- Clearbit (1-2 credits) - Best overall coverage
- ZoomInfo (2-3 credits) - Enterprise depth
- Ocean.io (2-3 credits) - Strong technographics
**Financial Data**:
- Crunchbase (1-2 credits) - Funding and investors
- PitchBook (3-5 credits) - Private market intelligence
- dealroom.co (2-3 credits) - European startups
### Technology Intelligence
**Primary**:
- BuiltWith (1-2 credits) - Website technology
- HG Insights (2-3 credits) - Enterprise tech spend
- Mixrank (2-3 credits) - Marketing technology
### Intent Signals
**Best Providers**:
- B2D AI (3-5 credits) - AI-powered intent
- ZoomInfo Intent (3-5 credits) - Topic-based signals
- 6sense (via integration) - Account-based intent
## Enrichment Workflows
### Standard Contact Enrichment
```python
def enrich_contact(name, company):
# Step 1: Try email discovery
email = None
for provider in ["apollo", "hunter", "rocketreach"]:
email = try_provider(provider, name, company)
if email and validate_email(email):
break
# Step 2: Phone discovery
phone = None
if email:
for provider in ["apollo", "rocketreach", "lusha"]:
phone = try_provider(provider, email=email)
if phone and validate_phone(phone):
break
# Step 3: Social profiles
profiles = get_social_profiles(email or f"{name} {company}")
# Step 4: Validation
email_valid = verify_email(email) if email else False
phone_valid = verify_phone(phone) if phone else False
return {
"email": email,
"email_valid": email_valid,
"phone": phone,
"phone_valid": phone_valid,
"linkedin": profiles.get("linkedin"),
"confidence_score": calculate_confidence(email_valid, phone_valid)
}
```
### Company Intelligence Workflow
```python
def enrich_company(domain):
# Base enrichment
company = clearbit_enrich(domain)
# Financial data
if company.get("raised_funding"):
funding = crunchbase_lookup(company["name"])
company.update(funding)
# Technology stack
tech_stack = builtwith_lookup(domain)
company["technologies"] = tech_stack
# Intent signals
if is_target_account(company):
intent = get_intent_signals(domain)
company["intent_score"] = intent["score"]
company["buying_signals"] = intent["signals"]
# News and social
company["recent_news"] = get_news_mentions(company["name"])
company["social_presence"] = get_social_metrics(domain)
return company
```
## Credit Optimization Strategies
### Cost-Effective Routing
```
Priority 1 (Cheapest): Native operations (0 credits)
- Formatting, validation, deduplication
Priority 2 (Low cost): Basic lookups (0.5-1 credit)
- Email validation, phone verification
Priority 3 (Standard): Primary enrichments (1-2 credits)
- Apollo, Hunter, Clearbit
Priority 4 (Premium): Deep intelligence (2-5 credits)
- ZoomInfo, PitchBook, AI research
Priority 5 (Enterprise): Specialized data (5-10 credits)
- Custom AI research, video generation
```
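A minimal sketch of this tiered routing follows; the operation names and per-operation credit costs in the tier map are illustrative assumptions, not actual provider pricing.
```python
# Illustrative cost map; real credit costs vary by provider and plan.
OPERATION_TIERS = {
    "dedupe": 0.0,            # native operation
    "email_validation": 0.5,  # basic lookup
    "apollo_enrich": 1.5,     # primary enrichment
    "zoominfo_enrich": 3.0,   # deep intelligence
    "ai_research": 7.0,       # specialized research
}

def route_operations(requested_ops, budget):
    """Run cheaper operations first and skip any step the remaining budget cannot cover."""
    plan, remaining = [], budget
    for op in sorted(requested_ops, key=lambda o: OPERATION_TIERS.get(o, float("inf"))):
        cost = OPERATION_TIERS.get(op)
        if cost is not None and cost <= remaining:
            plan.append(op)
            remaining -= cost
    return plan, remaining

# With a 5-credit budget the free and low-cost steps run; AI research is dropped.
print(route_operations(["ai_research", "apollo_enrich", "dedupe", "email_validation"], budget=5))
```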
### Caching Strategy
- Cache all successful enrichments for 30 days
- Re-validate emails monthly
- Update company data quarterly
- Refresh intent signals weekly
## Quality Assurance Framework
### Validation Pipeline
1. **Format Validation**: Check email/phone/URL formats
2. **Deliverability Check**: Verify email deliverability
3. **Cross-Reference**: Validate across multiple providers
4. **Confidence Scoring**: Calculate reliability score
5. **Human Review**: Flag low-confidence results
### Confidence Scoring Algorithm
```python
confidence_score = (
(email_found * 0.3) +
(email_deliverable * 0.2) +
(phone_found * 0.2) +
(multiple_sources * 0.2) +
(recent_activity * 0.1)
)
```
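The same weighting as a small runnable function (inputs are booleans; the weights are the ones listed above):
```python
def calculate_confidence(email_found, email_deliverable, phone_found,
                         multiple_sources, recent_activity):
    """Weighted confidence score in [0, 1] using the weights above."""
    return (
        0.3 * bool(email_found)
        + 0.2 * bool(email_deliverable)
        + 0.2 * bool(phone_found)
        + 0.2 * bool(multiple_sources)
        + 0.1 * bool(recent_activity)
    )

# Email found and deliverable, no phone, single source, recent activity -> 0.6
assert abs(calculate_confidence(True, True, False, False, True) - 0.6) < 1e-9
```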
## Provider-Specific Optimizations
### Apollo.io
- Best for: US B2B contacts
- Batch processing available
- Strong LinkedIn data
- Use for initial attempts
### ZoomInfo
- Best for: Enterprise accounts
- Comprehensive org charts
- Premium but accurate
- Reserve for high-value targets
### Hunter
- Best for: Domain searches
- Email pattern detection
- Author finding
- Use for content creators
### BuiltWith
- Best for: Technology detection
- Historical tech data
- E-commerce identification
- Use for technographic segmentation
## Advanced Capabilities
### AI-Powered Research
When standard providers fail:
```python
def ai_research(company):
# Use GPT-4 for web research
prompt = f"Research {company} and find key contacts, technology stack, recent news"
results = gpt4_research(prompt)
# Validate with traditional providers
validated = cross_validate(results)
return validated
```
### Intent Signal Aggregation
```python
def aggregate_intent_signals(company):
signals = {
"web_activity": get_web_visits(company),
"content_engagement": get_content_downloads(company),
"search_intent": get_search_queries(company),
"social_signals": get_social_mentions(company),
"hiring_signals": get_job_postings(company),
"tech_changes": get_tech_adoptions(company)
}
intent_score = calculate_composite_score(signals)
return {
"score": intent_score,
"signals": signals,
"recommendation": get_outreach_recommendation(intent_score)
}
```
## Integration Patterns
### CRM Sync
```python
# Salesforce integration
def sync_to_salesforce(enriched_data):
# Map fields
sf_record = map_to_salesforce_fields(enriched_data)
# Check for duplicates
existing = check_duplicates(sf_record["email"])
# Update or create
if existing:
update_record(existing["id"], sf_record)
else:
create_record(sf_record)
```
### Marketing Automation
```python
# HubSpot workflow
def trigger_hubspot_workflow(contact):
if contact["intent_score"] > 80:
add_to_workflow("high_intent_nurture")
elif contact["job_title_score"] > 70:
add_to_workflow("decision_maker_sequence")
else:
add_to_workflow("standard_nurture")
```
## Error Handling
### Provider Failures
- Automatic failover to next provider
- Exponential backoff for rate limits
- Circuit breaker for repeated failures (see the sketch below)
- Notification for persistent issues
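A minimal sketch of the failover, backoff, and circuit-breaker behaviour; the threshold, base delay, and `call_provider` interface are assumptions.
```python
import time

FAILURE_THRESHOLD = 3       # assumed: open the circuit after 3 consecutive failed requests
BASE_DELAY_SECONDS = 2      # assumed base for exponential backoff between retries
_consecutive_failures = {}  # provider -> consecutive failure count

def call_with_failover(providers, request, call_provider, retries=2):
    """Try providers in order; retry with exponential backoff and skip providers whose circuit is open."""
    for provider in providers:
        if _consecutive_failures.get(provider, 0) >= FAILURE_THRESHOLD:
            continue  # circuit open: skip this provider until it recovers
        for attempt in range(retries + 1):
            try:
                result = call_provider(provider, request)  # assumed provider call
                _consecutive_failures[provider] = 0        # success resets the breaker
                return result
            except Exception:
                time.sleep(BASE_DELAY_SECONDS * (2 ** attempt))  # back off before retrying
        _consecutive_failures[provider] = _consecutive_failures.get(provider, 0) + 1
    return None  # every provider failed; the caller should flag the record for review
```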
### Data Quality Issues
- Flag incomplete records
- Queue for manual review
- Attempt alternative providers
- Log quality metrics
## Compliance & Security
### GDPR/CCPA Compliance
- Only process with lawful basis
- Respect opt-outs and deletions
- Maintain audit logs
- Encrypt sensitive data
### Data Governance
- Regular data audits
- Provider compliance verification
- Access control enforcement
- Data retention policies
## Performance Metrics
Track and optimize:
- **Success Rate**: % of successful enrichments
- **Cost Per Lead**: Average credits used
- **Data Quality**: Validation pass rate
- **Provider Performance**: Success by provider
- **Time to Enrich**: Processing speed
---

22
agents/quality-analyst.md Normal file

@@ -0,0 +1,22 @@
---
name: quality-analyst
description: Ensures enriched data meets accuracy, compliance, and freshness standards across all providers.
model: haiku
---
# Quality Analyst Agent
## Responsibilities
- Define validation rules for email/phone/company data.
- Run QA pipelines (format checks, deliverability, dedupe, timestamp freshness).
- Score provider outputs and recommend optimizations.
- Manage GDPR/CCPA compliance logs and data retention policies.
## Workflow
1. **Schema Validation**: Confirm required fields, formats, and country codes.
2. **Verification**: Run email/phone verification services and cross-reference multiple sources.
3. **Confidence Scoring**: Compute a composite accuracy score per record.
4. **Exception Handling**: Flag low-confidence data for re-run or manual review.
5. **Reporting**: Produce quality dashboards, trend analysis, and provider feedback.
---

37
commands/append-data.md Normal file

@@ -0,0 +1,37 @@
---
name: append-data
description: Append missing attributes to bulk lead lists using configurable provider waterfalls and mapping rules.
usage: /data-enrichment:append-data --input leads.csv --fields "title,phone,linkedin"
---
# Append Data Command
## Purpose
Bulk-enrich a CSV/JSON dataset by filling specified fields (titles, phones, LinkedIn URLs, firmographics) while respecting credit budgets and compliance rules.
## Syntax
```bash
/data-enrichment:append-data \
--input leads.csv \
--fields "title,phone,linkedin" \
--priority "apollo,hunter,rocketreach" \
--max-credits 5 \
--output enriched.csv
```
### Parameters
- `--input`: Path to CSV/JSON file with seed data.
- `--fields`: Comma-separated field names to append.
- `--priority`: Ordered provider sequence (defaults to recommended waterfall per field).
- `--max-credits`: Credit ceiling per record.
- `--parallel`: Number of concurrent requests.
- `--output`: Destination file.
- `--cache-ttl`: Override default caching window.
## Features
- Automatic batching for provider rate limits.
- Field-level confidence scoring and provider attribution.
- Retry + fallback strategy when providers fail (see the sketch below).
- Progress reporting (records completed, credits consumed, ETA).
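Conceptually, the per-record credit ceiling and fallback work as in this sketch; the provider costs and the `lookup` callable are assumptions, not the command's actual internals.
```python
# Assumed per-call credit costs; real costs depend on provider and plan.
PROVIDER_COSTS = {"apollo": 1.5, "hunter": 1.2, "rocketreach": 1.5}

def append_fields(record, fields, priority, lookup, max_credits=5):
    """Fill missing fields from providers in priority order without exceeding max_credits."""
    spent = 0.0
    for field in fields:
        if record.get(field):
            continue  # field already populated; no credits spent
        for provider in priority:
            cost = PROVIDER_COSTS.get(provider, 1.0)
            if spent + cost > max_credits:
                break  # stop trying providers for this field once the ceiling would be exceeded
            value = lookup(provider, record, field)  # assumed provider call
            spent += cost
            if value:
                record[field] = value
                record[f"{field}_source"] = provider  # field-level provider attribution
                break
    record["credits_used"] = spent
    return record
```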
---

35
commands/clean-database.md Normal file

@@ -0,0 +1,35 @@
---
name: clean-database
description: Normalize, deduplicate, and validate enriched datasets to maintain accuracy and compliance.
usage: /data-enrichment:clean-database --input enriched.csv --rules rules.yaml
---
# Clean Database Command
## Purpose
Run data quality workflows (formatting, deduplication, validation, suppression) before syncing enriched records into downstream systems.
## Syntax
```bash
/data-enrichment:clean-database \
--input enriched.csv \
--rules rules.yaml \
--output clean.csv \
--gdpr true
```
### Parameters
- `--input`: Source CSV/JSON/Parquet file.
- `--rules`: YAML/JSON config defining normalization rules, required fields, dedupe logic.
- `--output`: File path or system destination (Salesforce, HubSpot, Snowflake).
- `--gdpr`: Apply regional compliance filters (default true).
- `--suppress-list`: Path to opt-out or customer suppression list.
- `--format`: Output format (csv, json, parquet, api-sync).
## Features
- Email/phone format correction, country normalization, timezone calculation.
- Deduplication via fuzzy matching and configurable keys (sketched below).
- Confidence scoring and rejection report for records failing validation.
- Audit log of transformations for compliance.
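A minimal illustration of the fuzzy-match dedupe; the 0.9 threshold and the company+domain key are assumptions that would normally come from `rules.yaml`.
```python
from difflib import SequenceMatcher

def dedupe_records(records, threshold=0.9):
    """Keep the first occurrence of records whose company+domain key is a near-duplicate."""
    kept = []
    for record in records:
        key = f"{record.get('company', '').lower()} {record.get('domain', '').lower()}"
        duplicate = any(
            SequenceMatcher(
                None, key,
                f"{k.get('company', '').lower()} {k.get('domain', '').lower()}"
            ).ratio() >= threshold
            for k in kept
        )
        if not duplicate:
            kept.append(record)
    return kept

# "Acme Corp"/"acme.com" and "Acme Corp."/"acme.com" collapse to a single record.
rows = [{"company": "Acme Corp", "domain": "acme.com"},
        {"company": "Acme Corp.", "domain": "acme.com"}]
assert len(dedupe_records(rows)) == 1
```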
---

35
commands/enrich-leads.md Normal file

@@ -0,0 +1,35 @@
---
name: enrich-leads
description: Enrich a single company or person record with firmographics, technographics,
and contact intelligence.
usage: /data-enrichment:enrich --type company --domain "acme.com" --depth comprehensive
---
# Enrich Command
## Purpose
Run targeted enrichment for a specific company or contact, orchestrating provider waterfalls and AI research to fill required data fields.
## Syntax
```bash
/data-enrichment:enrich \
--type <company|person> \
--domain "acme.com" \
--email "ceo@acme.com" \
--depth <basic|standard|comprehensive>
```
### Parameters
- `--type`: Company or person.
- `--domain`: Company domain.
- `--email` / `--name` / `--company`: Person identifiers.
- `--depth`: Determines provider sequence and credit budget (see the sketch below).
- `--providers`: Optional custom provider order (comma-delimited).
- `--include-intent`: Attach intent data (default true).
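The depth levels might map to provider sequences and credit budgets roughly as follows; the sequences and budgets shown here are illustrative assumptions.
```python
# Illustrative mapping of --depth to provider sequence and credit budget.
DEPTH_PROFILES = {
    "basic": {"providers": ["cache", "apollo"], "max_credits": 2},
    "standard": {"providers": ["cache", "apollo", "hunter", "clearbit"], "max_credits": 5},
    "comprehensive": {"providers": ["cache", "apollo", "hunter", "clearbit",
                                    "zoominfo", "builtwith", "intent"], "max_credits": 15},
}

def resolve_depth(depth):
    """Return the provider sequence and credit budget for the requested depth."""
    if depth not in DEPTH_PROFILES:
        raise ValueError(f"unknown depth: {depth}")
    return DEPTH_PROFILES[depth]

print(resolve_depth("standard"))  # {'providers': [...], 'max_credits': 5}
```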
## Output
- JSON record with firmographics, technographics, contacts, intent signals, and confidence scores.
- Provider log + credit usage summary.
---

335
commands/waterfall-enrichment.md Normal file

@@ -0,0 +1,335 @@
---
name: waterfall-enrichment
description: Execute multi-provider enrichment waterfalls with credit-aware routing, validation, and export options.
usage: /data-enrichment-master:waterfall-enrichment --type email --input leads.csv --max-credits 5
---
# Waterfall Enrichment Command
Execute multi-provider enrichment waterfalls to maximize data discovery success rates while optimizing credit usage.
## Command Syntax
```bash
/data-enrichment-master:waterfall-enrichment --type <email|phone|company|full> --input <data> --max-credits <limit>
```
## Parameters
- `--type`: Type of waterfall (email, phone, company, full)
- `--input`: Input data (name+company, email, domain, CSV file)
- `--max-credits`: Maximum credits to spend per record (default: 10)
- `--providers`: Specific provider sequence (optional, uses optimized defaults)
- `--validate`: Validate discovered data (default: true)
- `--cache`: Use cached results (default: true, 30-day TTL)
- `--parallel`: Process multiple records in parallel (default: true)
- `--output`: Output format (json|csv|salesforce|hubspot)
## Waterfall Sequences
### Email Discovery Waterfall
```yaml
Default Sequence:
1. Cache Check (0 credits)
2. Apollo.io (1-2 credits)
3. Hunter (1-2 credits)
4. RocketReach (1-2 credits)
5. People Data Labs (1-2 credits)
6. ContactOut (1-2 credits)
7. Findymail (1-2 credits)
8. BetterContact (2-5 credits)
9. AI Web Research (2-5 credits)
Validation:
- ZeroBounce (0.5 credits)
- NeverBounce backup (0.5 credits)
```
### Phone Discovery Waterfall
```yaml
Default Sequence:
1. Cache Check (0 credits)
2. Apollo.io (1-2 credits)
3. RocketReach (1-2 credits)
4. LeadMagic (1-2 credits)
5. SignalHire (1-2 credits)
6. BetterContact Phone (2-5 credits)
7. People Data Labs (1-2 credits)
Validation:
- ClearoutPhone (0.5 credits)
- Phone type detection
```
### Company Enrichment Waterfall
```yaml
Default Sequence:
1. Clearbit (1-2 credits)
2. Ocean.io (2-3 credits)
3. ZoomInfo (2-3 credits) [if enterprise]
4. Crunchbase (1-2 credits) [if funded]
5. BuiltWith (1-2 credits) [technographics]
6. HG Insights (2-3 credits) [tech spend]
7. Intent providers (3-5 credits) [if qualified]
```
### Full Contact Enrichment
```yaml
Comprehensive Sequence:
1. Email discovery waterfall
2. Phone discovery waterfall
3. Social profile discovery
4. Company enrichment
5. Technographics
6. Intent signals
7. Validation & scoring
```
## Examples
### Basic Email Discovery
```bash
/data-enrichment-master:waterfall-enrichment \
--type email \
--input "John Smith, Acme Corp"
```
### Bulk Email Enrichment with Validation
```bash
/data-enrichment-master:waterfall-enrichment \
--type email \
--input "prospects.csv" \
--validate true \
--max-credits 5
```
### Custom Provider Sequence
```bash
/data-enrichment-master:waterfall-enrichment \
--type email \
--input "jane.doe@example.com" \
--providers "clearbit,apollo,hunter" \
--validate true
```
### Enterprise Full Enrichment
```bash
/data-enrichment-master:waterfall-enrichment \
--type full \
--input "target_accounts.csv" \
--max-credits 20 \
--output salesforce
```
## Provider Selection Logic
```python
def select_providers(input_type, data_available, target_quality):
providers = []
# Email discovery logic
if input_type == "email":
if has_linkedin_url(data_available):
providers = ["contactout", "rocketreach", "apollo"]
elif has_full_name_and_company(data_available):
providers = ["apollo", "hunter", "rocketreach"]
elif has_domain_only(data_available):
providers = ["hunter", "apollo", "clearbit"]
else:
providers = ["people_data_labs", "bettercontact", "ai_research"]
# Phone discovery logic
elif input_type == "phone":
if has_email(data_available):
providers = ["apollo", "rocketreach", "leadmagic"]
else:
providers = ["bettercontact_phone", "signalhire", "lusha"]
# Quality-based filtering
if target_quality == "high":
providers = filter_high_accuracy_providers(providers)
return providers
```
## Credit Optimization
### Smart Routing Algorithm
```python
def optimize_provider_sequence(providers, max_credits, historical_success):
# Sort by success rate and cost efficiency
scored_providers = []
for provider in providers:
score = calculate_efficiency_score(
success_rate=historical_success[provider],
credit_cost=PROVIDER_COSTS[provider],
data_quality=PROVIDER_QUALITY[provider]
)
scored_providers.append((provider, score))
# Sort by efficiency score
scored_providers.sort(key=lambda x: x[1], reverse=True)
# Build sequence within credit limit
sequence = []
remaining_credits = max_credits
for provider, score in scored_providers:
if PROVIDER_COSTS[provider] <= remaining_credits:
sequence.append(provider)
remaining_credits -= PROVIDER_COSTS[provider]
return sequence
```
## Success Metrics
### Tracking Performance
```yaml
Metrics:
success_rate:
email_found: 85%
phone_found: 65%
company_enriched: 95%
average_credits:
email: 2.3 credits
phone: 3.1 credits
company: 4.5 credits
full_contact: 8.2 credits
validation_accuracy:
email_deliverable: 97%
phone_valid: 94%
provider_performance:
apollo:
success_rate: 75%
avg_credits: 1.5
hunter:
success_rate: 70%
avg_credits: 1.2
zoominfo:
success_rate: 90%
avg_credits: 2.5
```
## Error Handling
### Provider Failures
```python
def handle_provider_failure(provider, error, context):
# Log failure
log_provider_error(provider, error)
# Determine action
if is_rate_limit(error):
# Exponential backoff
wait_time = calculate_backoff(provider)
schedule_retry(provider, context, wait_time)
elif is_auth_error(error):
# Alert and skip provider
alert_admin(f"Auth failed for {provider}")
return next_provider()
elif is_data_not_found(error):
# Continue to next provider
return next_provider()
else:
# Generic error - retry once then skip
if not has_retried(provider, context):
retry_provider(provider, context)
else:
return next_provider()
```
## Output Formats
### JSON Output
```json
{
"input": {
"name": "John Smith",
"company": "Acme Corp"
},
"results": {
"email": "john.smith@acme.com",
"email_confidence": 95,
"email_deliverable": true,
"phone": "+1-555-0123",
"phone_type": "mobile",
"phone_valid": true,
"linkedin": "linkedin.com/in/johnsmith",
"providers_used": ["apollo", "zerobounce"],
"credits_used": 2.5
},
"metadata": {
"enriched_at": "2024-01-20T10:30:00Z",
"cache_hit": false,
"processing_time": 1.2
}
}
```
### CSV Output
```csv
name,company,email,email_confidence,phone,phone_type,linkedin,credits_used
John Smith,Acme Corp,john.smith@acme.com,95,+1-555-0123,mobile,linkedin.com/in/johnsmith,2.5
```
### Salesforce Format
```json
{
"Lead": {
"FirstName": "John",
"LastName": "Smith",
"Company": "Acme Corp",
"Email": "john.smith@acme.com",
"Phone": "+1-555-0123",
"LinkedIn__c": "linkedin.com/in/johnsmith",
"Enrichment_Score__c": 95,
"Last_Enriched__c": "2024-01-20T10:30:00Z"
}
}
```
## Caching Strategy
### Cache Management
```python
CACHE_CONFIG = {
"email": {
"ttl_days": 30,
"refresh_if_bounced": True
},
"phone": {
"ttl_days": 60,
"refresh_if_invalid": True
},
"company": {
"ttl_days": 90,
"refresh_on_trigger": ["funding", "acquisition", "ipo"]
},
"intent": {
"ttl_days": 7,
"always_refresh": True
}
}
```
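A minimal helper that enforces these TTLs might look like the sketch below; the dict-like cache store and timezone-aware `cached_at` timestamps are assumptions.
```python
from datetime import datetime, timedelta, timezone

def get_cached(store, record_type, key):
    """Return a cached entry only if CACHE_CONFIG says it is still fresh."""
    entry = store.get((record_type, key))  # assumed store: (type, key) -> {"data", "cached_at"}
    if not entry:
        return None
    config = CACHE_CONFIG[record_type]
    if config.get("always_refresh"):
        return None  # e.g. intent data is always re-fetched
    age = datetime.now(timezone.utc) - entry["cached_at"]
    if age > timedelta(days=config["ttl_days"]):
        return None  # stale: force a fresh enrichment
    return entry["data"]
```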
## Best Practices
1. **Start with cached data** - Always check cache first
2. **Set appropriate credit limits** - Balance cost vs. data quality
3. **Use parallel processing** - For bulk enrichments
4. **Validate critical data** - Especially emails before outreach
5. **Monitor provider performance** - Adjust sequences based on success rates
6. **Handle failures gracefully** - Automatic fallback to next provider
7. **Track ROI** - Measure enrichment value vs. credit cost
---
*Execution model: claude-haiku-4-5 for provider routing, parallel processing for bulk operations*

81
plugin.lock.json Normal file

@@ -0,0 +1,81 @@
{
"$schema": "internal://schemas/plugin.lock.v1.json",
"pluginId": "gh:gtmagents/gtm-agents:plugins/data-enrichment-master",
"normalized": {
"repo": null,
"ref": "refs/tags/v20251128.0",
"commit": "46106e64a2b3a4f2a8a2926477f830886523471f",
"treeHash": "e2c4b96adfb0e9b253ed6f1b16cd707a03b49e293324feaf38c73e69cd2f517c",
"generatedAt": "2025-11-28T10:17:08.087484Z",
"toolVersion": "publish_plugins.py@0.2.0"
},
"origin": {
"remote": "git@github.com:zhongweili/42plugin-data.git",
"branch": "master",
"commit": "aa1497ed0949fd50e99e70d6324a29c5b34f9390",
"repoRoot": "/Users/zhongweili/projects/openmind/42plugin-data"
},
"manifest": {
"name": "data-enrichment-master",
"description": "Lead enrichment, firmographics, technographics, and data quality",
"version": "1.0.0"
},
"content": {
"files": [
{
"path": "README.md",
"sha256": "b1d8da1e1513410572e5f37c6946694f01cdb77142506934315448dcf81394b5"
},
{
"path": "agents/enrichment-expert.md",
"sha256": "4bbe5d32b4642cd6ea437d8120a4ebb138b52cfbaa985911ff745661082640fc"
},
{
"path": "agents/data-specialist.md",
"sha256": "5c8a7b3d649d8712934c8916529d4754ce66f93a9b0434f8eee57de61ec974ef"
},
{
"path": "agents/quality-analyst.md",
"sha256": "f9f8b4019709902d995162ed7607cb69953986fb05092c72072e6655d958a837"
},
{
"path": "agents/company-analyst.md",
"sha256": "aa92fdca8ac3c9be598cf1c2b9cfb0f882bcb48c7ec188f55f702aeb4c7209a5"
},
{
"path": ".claude-plugin/plugin.json",
"sha256": "4c30b6f5549e90d864a8695745873ee5a075aba3e9e1c016c8d3294317dbb415"
},
{
"path": "commands/clean-database.md",
"sha256": "b1a3140ed4e198d5fd9ef3175a7181a22eb12e608190f798ec8f10f77792071a"
},
{
"path": "commands/enrich-leads.md",
"sha256": "f071c3d89f550e69bd7a8a594ba3034081d3d33e732d4a6e7a98895aed5d3b57"
},
{
"path": "commands/waterfall-enrichment.md",
"sha256": "d87f8eba1eeab3b886f687a323f4be4ccf4d8ce1335c34b2464df07fd5069cc8"
},
{
"path": "commands/append-data.md",
"sha256": "64e75d5d78081f1a0bf0a967fe131481673566a536bccd0a788edb9189885ca9"
},
{
"path": "skills/data-sourcing/SKILL.md",
"sha256": "684a475b37c8e0c4b74874c900b56bd20c5605948f5395555b4821901ea1a12e"
},
{
"path": "skills/firmographic-analysis/SKILL.md",
"sha256": "e0c352e72eb5e15ecfb681d332aee542c7452016c82f4dc246b17406bf070d07"
}
],
"dirSha256": "e2c4b96adfb0e9b253ed6f1b16cd707a03b49e293324feaf38c73e69cd2f517c"
},
"security": {
"scannedAt": null,
"scannerVersion": null,
"flags": []
}
}

316
skills/data-sourcing/SKILL.md Normal file

@@ -0,0 +1,316 @@
---
name: data-sourcing
description: Optimize provider selection, routing, and credit usage across 150+ enrichment sources for company/contact intelligence.
---
# Data Sourcing & Provider Optimization Skill
## When to Use
- Selecting provider stacks for email, phone, company, or intent enrichment
- Building or tuning waterfall sequences to improve success rates
- Auditing credit consumption or provider performance
- Designing enrichment logic for GTM ops, RevOps, or data engineering teams
## Framework
You are an expert at selecting and optimizing data providers from 150+ available options to maximize data quality while minimizing credit costs. Use this layered framework to keep enrichment predictable and efficient.
### Core Principles
1. **Quality-Cost Balance**: Optimize for highest data quality within budget constraints
2. **Smart Routing**: Route requests to providers based on input type and success probability
3. **Waterfall Logic**: Use sequential provider attempts for maximum success
4. **Caching Strategy**: Leverage cached data to reduce redundant API calls
5. **Bulk Optimization**: Process similar requests together for volume discounts
### Provider Selection Matrix
#### For Email Discovery
**Best Input Scenarios:**
- **Have LinkedIn URL**: ContactOut → RocketReach → Apollo
- **Have Name + Company**: Apollo → Hunter → RocketReach → FindyMail
- **Have Domain Only**: Hunter → Apollo → Clearbit
- **Have Email (need validation)**: ZeroBounce → NeverBounce → Debounce
**Quality Tiers:**
- **Premium** (90%+ success): ZoomInfo, BetterContact waterfall
- **Standard** (75%+ success): Apollo, Hunter, RocketReach
- **Budget** (60%+ success): Snov.io, Prospeo, ContactOut
#### For Company Intelligence
**Data Type Priority:**
- **Basic Firmographics**: Clearbit (fastest) → Ocean.io → Apollo
- **Financial Data**: Crunchbase → PitchBook → Dealroom
- **Technology Stack**: BuiltWith → HG Insights → Clearbit
- **Intent Signals**: B2D AI → ZoomInfo Intent → 6sense
- **News & Social**: Google News → Social platforms → Owler
**Industry Specialization:**
- **Startups**: Crunchbase, Dealroom, AngelList
- **Enterprise**: ZoomInfo, D&B, HG Insights
- **E-commerce**: Store Leads, BuiltWith, Shopify data
- **Healthcare**: Definitive Healthcare + compliance providers
- **Financial Services**: PitchBook, S&P Capital IQ
### Credit Optimization Strategies
#### Cost Tiers
```
Tier 0 (Free): Native operations, cached data, manual inputs
Tier 1 (0.5 credits): Validation, verification, basic lookups
Tier 2 (1-2 credits): Standard enrichments (Apollo, Hunter, Clearbit)
Tier 3 (2-3 credits): Premium data (ZoomInfo, technographics, intent)
Tier 4 (3-5 credits): Enterprise intelligence (PitchBook, custom AI)
Tier 5 (5-10 credits): Specialized services (video generation, deep AI research)
```
#### Optimization Tactics
**1. Cache Everything**
- Email: 30-day cache
- Company: 90-day cache
- Intent: 7-day cache
- Static data: Indefinite cache
**2. Batch Processing**
```python
# Process in batches for volume discounts
if record_count > 1000:
use_provider("apollo_bulk") # 10-30% discount
elif record_count > 100:
use_parallel_processing()
else:
use_standard_processing()
```
**3. Smart Waterfalls**
```python
waterfall_sequence = [
{"provider": "cache", "credits": 0},
{"provider": "apollo", "credits": 1.5, "stop_if_success": True},
{"provider": "hunter", "credits": 1.2, "stop_if_success": True},
{"provider": "bettercontact", "credits": 3, "stop_if_success": True},
{"provider": "ai_research", "credits": 5, "last_resort": True}
]
```
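A small executor for that sequence might look like this; `run_provider` is an assumed interface returning data or `None`.
```python
def run_waterfall(waterfall_sequence, request, run_provider, budget=10):
    """Walk the sequence in order, stopping at the first success or when the budget is exhausted."""
    spent = 0.0
    for step in waterfall_sequence:
        if spent + step["credits"] > budget:
            continue  # skip steps that would exceed the budget; cheaper later steps may still fit
        result = run_provider(step["provider"], request)  # assumed: returns data or None
        spent += step["credits"]
        if result and step.get("stop_if_success", True):
            return {"data": result, "provider": step["provider"], "credits_used": spent}
    return {"data": None, "provider": None, "credits_used": spent}
```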
### Provider-Specific Optimizations
#### Apollo.io
- **Strengths**: US B2B, LinkedIn data, phone numbers
- **Weaknesses**: International coverage, personal emails
- **Tips**: Use bulk API for 10%+ discount, batch similar companies
#### ZoomInfo
- **Strengths**: Enterprise data, org charts, intent signals
- **Weaknesses**: Expensive, SMB coverage
- **Tips**: Reserve for high-value accounts, negotiate enterprise deals
#### Hunter
- **Strengths**: Domain searches, email patterns, API reliability
- **Weaknesses**: Phone numbers, detailed contact info
- **Tips**: Best for initial domain exploration, use pattern detection
#### Clearbit
- **Strengths**: Real-time API, company data, speed
- **Weaknesses**: Email discovery rates, phone numbers
- **Tips**: Great for instant enrichment, combine with others for contacts
#### BuiltWith
- **Strengths**: Technology detection, historical data, e-commerce
- **Weaknesses**: Contact information, company financials
- **Tips**: Filter accounts by technology before enrichment
### Waterfall Strategies
#### Maximum Success Waterfall
```yaml
Priority: Success rate over cost
Sequence:
1. BetterContact (aggregates 10+ sources)
2. ZoomInfo (if enterprise)
3. Apollo + Hunter + RocketReach
4. AI web research
Expected Success: 95%+
Average Cost: 8-12 credits
```
#### Balanced Waterfall
```yaml
Priority: Good success with reasonable cost
Sequence:
1. Apollo.io
2. Hunter (if domain match)
3. RocketReach (if name match)
4. Stop or continue based on confidence
Expected Success: 80%
Average Cost: 3-5 credits
```
#### Budget Waterfall
```yaml
Priority: Minimize cost
Sequence:
1. Cache check
2. Hunter (domain only)
3. Free sources (Google, LinkedIn public)
4. Stop at first result
Expected Success: 60%
Average Cost: 1-2 credits
```
### Quality Scoring Framework
```python
def calculate_data_quality_score(data, sources):
score = 0
# Multi-source validation (30 points)
if len(sources) > 1:
score += min(len(sources) * 10, 30)
# Data completeness (30 points)
    required_fields = ["email", "phone", "title", "company"]
    score += min(sum(10 for field in required_fields if data.get(field)), 30)  # cap at the 30-point budget
# Verification status (20 points)
if data.get("email_verified"):
score += 10
if data.get("phone_verified"):
score += 10
# Recency (20 points)
days_old = get_data_age(data)
if days_old < 30:
score += 20
elif days_old < 90:
score += 10
return score
```
### Industry-Specific Provider Selection
#### SaaS/Technology
- Primary: Apollo, Clearbit, BuiltWith
- Secondary: ZoomInfo, HG Insights
- Intent: G2, TrustRadius, 6sense
#### Financial Services
- Primary: PitchBook, ZoomInfo
- Compliance: LexisNexis, D&B
- News: Bloomberg, Reuters
#### Healthcare
- Primary: Definitive Healthcare
- Compliance: NPPES, state boards
- Standard: ZoomInfo with healthcare filters
#### E-commerce
- Primary: Store Leads, BuiltWith
- Platform-specific: Shopify, Amazon seller data
- Standard: Clearbit with e-commerce signals
### Troubleshooting Common Issues
#### Low Email Discovery Rate
- Check email patterns with Hunter
- Try personal email providers
- Use AI research for executives
- Consider LinkedIn outreach instead
#### High Credit Usage
- Audit waterfall sequences
- Increase cache TTL
- Negotiate volume deals
- Use native operations first
#### Poor Data Quality
- Add verification steps
- Cross-reference multiple sources
- Set minimum confidence thresholds
- Implement human review for critical data
### Advanced Techniques
#### Hybrid Enrichment
```python
# Combine AI and traditional providers
def hybrid_enrichment(company):
# Fast, cheap base data
base = clearbit_lookup(company)
# AI for missing pieces
if not base.get("description"):
base["description"] = ai_generate_description(company)
# Premium for high-value
if is_enterprise_account(base):
base.update(zoominfo_enrich(company))
return base
```
#### Progressive Enrichment
```python
# Enrich in stages based on engagement
def progressive_enrichment(lead):
# Stage 1: Basic (on import)
if lead.stage == "new":
return basic_enrichment(lead) # 1-2 credits
# Stage 2: Engaged (opened email)
elif lead.stage == "engaged":
return standard_enrichment(lead) # 3-5 credits
# Stage 3: Qualified (booked meeting)
elif lead.stage == "qualified":
return comprehensive_enrichment(lead) # 10+ credits
```
## Templates
- **Provider Cheat Sheet**: See `references/provider_cheat_sheet.md` for provider selection.
- **Cost Calculator**: See `scripts/cost_calculator.py` for estimating credit usage.
- **Integration Code Templates**:
```javascript
// JavaScript/Node.js template
const enrichContact = async (name, company) => {
// Check cache first
const cached = await checkCache(name, company);
if (cached) return cached;
// Try providers in sequence
const providers = ['apollo', 'hunter', 'rocketreach'];
for (const provider of providers) {
try {
const result = await callProvider(provider, {name, company});
if (result.email) {
await saveToCache(result);
return result;
}
} catch (error) {
console.log(`${provider} failed, trying next...`);
}
}
// Fallback to AI research
return await aiResearch(name, company);
};
```
---
## Tips
- **Pre-build waterfalls per motion** so GTM teams can call a single orchestration command rather than juggling providers.
- **Instrument cache hit rates**; alert RevOps when cache effectiveness drops below target to avoid a spike in credit usage.
- **Rotate premium providers** each quarter to negotiate better volume discounts and diversify coverage gaps.
- **Pair enrichment with QA hooks** (e.g., verification APIs, sampling) before syncing into CRM to prevent bad data cascades.
---
*Progressive disclosure: Load full provider details and code examples only when actively optimizing enrichment workflows*

30
skills/firmographic-analysis/SKILL.md Normal file

@@ -0,0 +1,30 @@
---
name: firmographic-analysis
description: Use when interpreting company-level enrichment data to segment accounts, spot buying triggers, and tailor outreach.
---
# Firmographic Analysis Skill
## When to Use
- Prioritizing enriched accounts for GTM plays.
- Building segments for ABM, territory planning, or personalized campaigns.
- Validating enriched firmographic data quality.
## Framework
1. **Normalize Fields**: Ensure industry, size, revenue, region, and funding fields use consistent taxonomies.
2. **Scoring Matrix**: Apply ICP scoring (industry fit, employee band, revenue, growth rate), as sketched below.
3. **Trigger Detection**: Highlight events such as funding, IPO prep, hiring spikes, and geographic expansion.
4. **Segment Mapping**: Assign each company to journey stages or playbooks (e.g., "High-growth SaaS 200-500").
5. **Recommendation Output**: Produce persona targets, value props, and urgency level per segment.
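A simple scoring sketch for step 2; the weights, employee band, revenue floor, and growth threshold are assumptions to adapt to the ICP definition.
```python
# Assumed weights and target criteria; tune per ICP definition.
WEIGHTS = {"industry": 0.35, "employees": 0.25, "revenue": 0.25, "growth": 0.15}
TARGET_INDUSTRIES = {"saas", "fintech"}

def icp_score(company):
    """Return a 0-100 fit score from normalized firmographic fields."""
    score = 0.0
    if company.get("industry", "").lower() in TARGET_INDUSTRIES:
        score += WEIGHTS["industry"]
    if 200 <= company.get("employees", 0) <= 500:     # assumed employee band
        score += WEIGHTS["employees"]
    if company.get("revenue_usd", 0) >= 10_000_000:   # assumed revenue floor
        score += WEIGHTS["revenue"]
    if company.get("yoy_growth", 0) >= 0.3:           # assumed growth threshold
        score += WEIGHTS["growth"]
    return round(score * 100)

# A 300-person SaaS company at $25M revenue growing 40% YoY scores 100.
print(icp_score({"industry": "SaaS", "employees": 300,
                 "revenue_usd": 25_000_000, "yoy_growth": 0.4}))
```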
## Templates
- Segment summary table (columns: segment, criteria, TAM, coverage owner, next action).
- Trigger event log with timestamps/source, impact rating, and follow-up play.
- Messaging workbook mapping persona × segment × proof points for instant enablement pulls.
## Tips
- Keep taxonomy dictionaries centrally managed so enrichment jobs and analytics share the same lookups.
- Re-score accounts quarterly or after major firmographic events (funding, layoffs) to keep priorities fresh.
- Pair quant scores with qualitative notes from AEs/CSMs to avoid over-rotating on enrichment data alone.
---