Initial commit
316
skills/data-sourcing/SKILL.md
Normal file
@@ -0,0 +1,316 @@
---
name: data-sourcing
description: Optimize provider selection, routing, and credit usage across 150+ enrichment sources for company/contact intelligence.
---

# Data Sourcing & Provider Optimization Skill

## When to Use

- Selecting provider stacks for email, phone, company, or intent enrichment
- Building or tuning waterfall sequences to improve success rates
- Auditing credit consumption or provider performance
- Designing enrichment logic for GTM ops, RevOps, or data engineering teams

## Framework

You are an expert at selecting and optimizing data providers from 150+ available options to maximize data quality while minimizing credit costs. Use this layered framework to keep enrichment predictable and efficient.

### Core Principles

1. **Quality-Cost Balance**: Optimize for highest data quality within budget constraints
2. **Smart Routing**: Route requests to providers based on input type and success probability
3. **Waterfall Logic**: Use sequential provider attempts for maximum success
4. **Caching Strategy**: Leverage cached data to reduce redundant API calls
5. **Bulk Optimization**: Process similar requests together for volume discounts

### Provider Selection Matrix

#### For Email Discovery

**Best Input Scenarios:**
- **Have LinkedIn URL**: ContactOut → RocketReach → Apollo
- **Have Name + Company**: Apollo → Hunter → RocketReach → FindyMail
- **Have Domain Only**: Hunter → Apollo → Clearbit
- **Have Email (need validation)**: ZeroBounce → NeverBounce → Debounce
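
The input scenarios above can be read as a routing table. The following is a minimal sketch, assuming records arrive as plain dictionaries; the key names (`linkedin_url`, `name`, `company`, `domain`, `email`), the `pick_email_route` helper, and the precedence order when several inputs are present are all illustrative assumptions, not part of any provider's API.

```python
from typing import Dict, List

# Provider sequences copied from the "Best Input Scenarios" list.
EMAIL_ROUTES: Dict[str, List[str]] = {
    "linkedin_url": ["ContactOut", "RocketReach", "Apollo"],
    "name_company": ["Apollo", "Hunter", "RocketReach", "FindyMail"],
    "domain": ["Hunter", "Apollo", "Clearbit"],
    "email": ["ZeroBounce", "NeverBounce", "Debounce"],
}


def pick_email_route(record: dict) -> List[str]:
    """Return the provider sequence for the most specific input available.

    Precedence (an assumption): existing email, then LinkedIn URL,
    then name + company, then bare domain.
    """
    if record.get("email"):
        return EMAIL_ROUTES["email"]  # already have an email: validate only
    if record.get("linkedin_url"):
        return EMAIL_ROUTES["linkedin_url"]
    if record.get("name") and record.get("company"):
        return EMAIL_ROUTES["name_company"]
    if record.get("domain"):
        return EMAIL_ROUTES["domain"]
    return []  # nothing to route on


route = pick_email_route({"name": "Ada Lovelace", "company": "Example Corp"})
```

Keeping the table as data rather than branching logic makes it easy to reorder providers when success rates change.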

**Quality Tiers:**
- **Premium** (90%+ success): ZoomInfo, BetterContact waterfall
- **Standard** (75%+ success): Apollo, Hunter, RocketReach
- **Budget** (60%+ success): Snov.io, Prospeo, ContactOut

#### For Company Intelligence

**Data Type Priority:**
- **Basic Firmographics**: Clearbit (fastest) → Ocean.io → Apollo
- **Financial Data**: Crunchbase → PitchBook → Dealroom
- **Technology Stack**: BuiltWith → HG Insights → Clearbit
- **Intent Signals**: B2D AI → ZoomInfo Intent → 6sense
- **News & Social**: Google News → Social platforms → Owler

**Industry Specialization:**
- **Startups**: Crunchbase, Dealroom, AngelList
- **Enterprise**: ZoomInfo, D&B, HG Insights
- **E-commerce**: Store Leads, BuiltWith, Shopify data
- **Healthcare**: Definitive Healthcare + compliance providers
- **Financial Services**: PitchBook, S&P Capital IQ

### Credit Optimization Strategies

#### Cost Tiers
```
Tier 0 (Free): Native operations, cached data, manual inputs
Tier 1 (0.5 credits): Validation, verification, basic lookups
Tier 2 (1-2 credits): Standard enrichments (Apollo, Hunter, Clearbit)
Tier 3 (2-3 credits): Premium data (ZoomInfo, technographics, intent)
Tier 4 (3-5 credits): Enterprise intelligence (PitchBook, custom AI)
Tier 5 (5-10 credits): Specialized services (video generation, deep AI research)
```

#### Optimization Tactics

**1. Cache Everything**
- Email: 30-day cache
- Company: 90-day cache
- Intent: 7-day cache
- Static data: Indefinite cache
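
The cache windows above translate into a simple freshness check. This is a sketch under stated assumptions: the `CACHE_TTL_DAYS` table and `is_cache_fresh` function are hypothetical names, and each cached record is assumed to carry a timezone-aware fetch timestamp.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# TTLs from the list above; None means the entry never expires (static data).
CACHE_TTL_DAYS = {"email": 30, "company": 90, "intent": 7, "static": None}


def is_cache_fresh(data_type: str, fetched_at: datetime,
                   now: Optional[datetime] = None) -> bool:
    """Return True if a cached record of this type is still within its TTL."""
    ttl_days = CACHE_TTL_DAYS[data_type]
    if ttl_days is None:
        return True  # indefinite cache
    now = now or datetime.now(timezone.utc)
    return now - fetched_at < timedelta(days=ttl_days)
```

Passing `now` explicitly keeps the check deterministic in tests; in production the default of the current UTC time applies.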

**2. Batch Processing**
```python
# Process in batches for volume discounts
if record_count > 1000:
    use_provider("apollo_bulk")  # 10-30% discount
elif record_count > 100:
    use_parallel_processing()
else:
    use_standard_processing()
```

**3. Smart Waterfalls**
```python
waterfall_sequence = [
    {"provider": "cache", "credits": 0},
    {"provider": "apollo", "credits": 1.5, "stop_if_success": True},
    {"provider": "hunter", "credits": 1.2, "stop_if_success": True},
    {"provider": "bettercontact", "credits": 3, "stop_if_success": True},
    {"provider": "ai_research", "credits": 5, "last_resort": True}
]
```
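
A sequence like the one above still needs something to execute it. The sketch below is one possible runner, not a definitive implementation: `run_waterfall` and the `lookups` mapping are hypothetical, credits are assumed to be charged per attempt, and the runner stops at the first non-empty result regardless of the `stop_if_success` and `last_resort` flags.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A lookup takes the input record and returns enriched data, or None on a miss.
Lookup = Callable[[dict], Optional[dict]]


def run_waterfall(record: dict, sequence: List[dict],
                  lookups: Dict[str, Lookup]) -> Tuple[Optional[dict], float]:
    """Try each step in order and return (result, credits_spent)."""
    spent = 0.0
    for step in sequence:
        lookup = lookups.get(step["provider"])
        if lookup is None:
            continue  # provider not configured: skip without charging
        spent += step["credits"]  # assumption: charged per attempt
        result = lookup(record)
        if result:
            return result, spent  # first hit ends the waterfall
    return None, spent


# Stub lookups stand in for real provider clients.
demo_sequence = [
    {"provider": "cache", "credits": 0},
    {"provider": "apollo", "credits": 1.5},
    {"provider": "hunter", "credits": 1.2},
]
demo_lookups = {
    "cache": lambda r: None,
    "apollo": lambda r: None,
    "hunter": lambda r: {"email": "ada@example.com"},
}
demo_result, demo_spent = run_waterfall({"name": "Ada"}, demo_sequence, demo_lookups)
```

If a provider only bills on success, move the `spent +=` line inside the `if result:` branch.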

### Provider-Specific Optimizations

#### Apollo.io
- **Strengths**: US B2B, LinkedIn data, phone numbers
- **Weaknesses**: International coverage, personal emails
- **Tips**: Use bulk API for 10%+ discount, batch similar companies

#### ZoomInfo
- **Strengths**: Enterprise data, org charts, intent signals
- **Weaknesses**: Expensive, SMB coverage
- **Tips**: Reserve for high-value accounts, negotiate enterprise deals

#### Hunter
- **Strengths**: Domain searches, email patterns, API reliability
- **Weaknesses**: Phone numbers, detailed contact info
- **Tips**: Best for initial domain exploration, use pattern detection

#### Clearbit
- **Strengths**: Real-time API, company data, speed
- **Weaknesses**: Email discovery rates, phone numbers
- **Tips**: Great for instant enrichment, combine with others for contacts

#### BuiltWith
- **Strengths**: Technology detection, historical data, e-commerce
- **Weaknesses**: Contact information, company financials
- **Tips**: Filter accounts by technology before enrichment

### Waterfall Strategies

#### Maximum Success Waterfall
```yaml
Priority: Success rate over cost
Sequence:
  1. BetterContact (aggregates 10+ sources)
  2. ZoomInfo (if enterprise)
  3. Apollo + Hunter + RocketReach
  4. AI web research
Expected Success: 95%+
Average Cost: 8-12 credits
```

#### Balanced Waterfall
```yaml
Priority: Good success with reasonable cost
Sequence:
  1. Apollo.io
  2. Hunter (if domain match)
  3. RocketReach (if name match)
  4. Stop or continue based on confidence
Expected Success: 80%
Average Cost: 3-5 credits
```

#### Budget Waterfall
```yaml
Priority: Minimize cost
Sequence:
  1. Cache check
  2. Hunter (domain only)
  3. Free sources (Google, LinkedIn public)
  4. Stop at first result
Expected Success: 60%
Average Cost: 1-2 credits
```
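
The "Expected Success" and "Average Cost" figures for a waterfall can be sanity-checked from per-step numbers. With step costs c_i and hit rates p_i, the chance of reaching step i is the product of (1 - p_j) over earlier steps, so expected cost is the sum of c_i times that product. The sketch below assumes every attempted step is charged and that hit rates are independent, which real providers with overlapping data will not fully satisfy; `waterfall_expectation` is a hypothetical helper, and any hit rates you feed it should come from your own measurements.

```python
from typing import List, Tuple


def waterfall_expectation(steps: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Estimate a waterfall's expected credits and overall success rate.

    steps: (credits, hit_rate) per provider, in execution order.
    Assumes each attempted step is charged and hit rates are independent.
    """
    expected_credits = 0.0
    p_reach = 1.0  # probability that every earlier step missed
    for credits, hit_rate in steps:
        expected_credits += p_reach * credits
        p_reach *= 1.0 - hit_rate
    return expected_credits, 1.0 - p_reach


# Illustrative numbers only: a 1.5-credit step with a 50% hit rate,
# followed by a 1.2-credit step with a 40% hit rate.
demo_cost, demo_success = waterfall_expectation([(1.5, 0.5), (1.2, 0.4)])
```

For the illustrative input this works out to roughly 2.1 credits per record and a 70% overall success rate, which shows why putting a cheap, high-hit-rate provider first lowers the average cost.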

### Quality Scoring Framework

```python
def calculate_data_quality_score(data, sources):
    """Score a record from 0 to 100; higher is better."""
    score = 0

    # Multi-source validation (30 points): 10 per source, capped at 30,
    # awarded only when more than one source contributed
    if len(sources) > 1:
        score += min(len(sources) * 10, 30)

    # Data completeness (30 points), split evenly across the required fields
    required_fields = ["email", "phone", "title", "company"]
    filled = sum(1 for field in required_fields if data.get(field))
    score += 30 * filled / len(required_fields)

    # Verification status (20 points)
    if data.get("email_verified"):
        score += 10
    if data.get("phone_verified"):
        score += 10

    # Recency (20 points)
    # get_data_age is a helper assumed to return the record's age in days
    days_old = get_data_age(data)
    if days_old < 30:
        score += 20
    elif days_old < 90:
        score += 10

    return score
```

### Industry-Specific Provider Selection

#### SaaS/Technology
- Primary: Apollo, Clearbit, BuiltWith
- Secondary: ZoomInfo, HG Insights
- Intent: G2, TrustRadius, 6sense

#### Financial Services
- Primary: PitchBook, ZoomInfo
- Compliance: LexisNexis, D&B
- News: Bloomberg, Reuters

#### Healthcare
- Primary: Definitive Healthcare
- Compliance: NPPES, state boards
- Standard: ZoomInfo with healthcare filters

#### E-commerce
- Primary: Store Leads, BuiltWith
- Platform-specific: Shopify, Amazon seller data
- Standard: Clearbit with e-commerce signals

### Troubleshooting Common Issues

#### Low Email Discovery Rate
- Check email patterns with Hunter
- Try personal email providers
- Use AI research for executives
- Consider LinkedIn outreach instead

#### High Credit Usage
- Audit waterfall sequences
- Increase cache TTL
- Negotiate volume deals
- Use native operations first

#### Poor Data Quality
- Add verification steps
- Cross-reference multiple sources
- Set minimum confidence thresholds
- Implement human review for critical data
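
Cross-referencing sources and applying a minimum confidence threshold can be as simple as a vote across providers. The following is a rough sketch, assuming each provider returns a plain string for the field in question; `consensus_value` and the `min_agreement` default of 2 are illustrative choices, not a recommended setting.

```python
from collections import Counter
from typing import Dict, Optional


def consensus_value(values_by_provider: Dict[str, Optional[str]],
                    min_agreement: int = 2) -> Optional[str]:
    """Return the value most providers agree on, or None below the threshold.

    Values are compared case-insensitively with surrounding whitespace removed.
    """
    normalized = [v.strip().lower() for v in values_by_provider.values() if v]
    if not normalized:
        return None
    value, count = Counter(normalized).most_common(1)[0]
    return value if count >= min_agreement else None


demo_email = consensus_value({
    "apollo": "Ada@Example.com",
    "hunter": "ada@example.com ",
    "rocketreach": "a.lovelace@example.com",
})
```

Records that come back as `None` are natural candidates for the human review step.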

### Advanced Techniques

#### Hybrid Enrichment
```python
# Combine AI and traditional providers
def hybrid_enrichment(company):
    # Fast, cheap base data
    base = clearbit_lookup(company)

    # AI for missing pieces
    if not base.get("description"):
        base["description"] = ai_generate_description(company)

    # Premium for high-value
    if is_enterprise_account(base):
        base.update(zoominfo_enrich(company))

    return base
```

#### Progressive Enrichment
```python
# Enrich in stages based on engagement
def progressive_enrichment(lead):
    # Stage 1: Basic (on import)
    if lead.stage == "new":
        return basic_enrichment(lead)  # 1-2 credits

    # Stage 2: Engaged (opened email)
    elif lead.stage == "engaged":
        return standard_enrichment(lead)  # 3-5 credits

    # Stage 3: Qualified (booked meeting)
    elif lead.stage == "qualified":
        return comprehensive_enrichment(lead)  # 10+ credits

    # Any other stage: spend no credits
    return None
```

## Templates
- **Provider Cheat Sheet**: See `references/provider_cheat_sheet.md` for provider selection.
- **Cost Calculator**: See `scripts/cost_calculator.py` for estimating credit usage.
- **Integration Code Templates**:
```javascript
// JavaScript/Node.js template
const enrichContact = async (name, company) => {
  // Check cache first
  const cached = await checkCache(name, company);
  if (cached) return cached;

  // Try providers in sequence
  const providers = ['apollo', 'hunter', 'rocketreach'];

  for (const provider of providers) {
    try {
      const result = await callProvider(provider, {name, company});
      if (result.email) {
        await saveToCache(result);
        return result;
      }
    } catch (error) {
      console.log(`${provider} failed, trying next...`);
    }
  }

  // Fallback to AI research
  return await aiResearch(name, company);
};
```

---

## Tips

- **Pre-build waterfalls per motion** so GTM teams can call a single orchestration command rather than juggling providers.
- **Instrument cache hit rates**; alert RevOps when the hit rate drops below target, so a spike in credit usage is caught early.
- **Rotate premium providers** each quarter to negotiate better volume discounts and diversify coverage gaps.
- **Pair enrichment with QA hooks** (e.g., verification APIs, sampling) before syncing into CRM to prevent bad data cascades.

---

*Progressive disclosure: Load full provider details and code examples only when actively optimizing enrichment workflows*
30
skills/firmographic-analysis/SKILL.md
Normal file
@@ -0,0 +1,30 @@
---
name: firmographic-analysis
description: Use when interpreting company-level enrichment data to segment accounts, spot buying triggers, and tailor outreach.
---

# Firmographic Analysis Skill

## When to Use
- Prioritizing enriched accounts for GTM plays.
- Building segments for ABM, territory planning, or personalized campaigns.
- Validating enriched firmographic data quality.

## Framework
1. **Normalize Fields** – ensure industry, size, revenue, region, and funding fields use consistent taxonomies.
2. **Scoring Matrix** – apply ICP scoring (industry fit, employee band, revenue, growth rate).
3. **Trigger Detection** – highlight events like funding, IPO prep, hiring spikes, geographic expansion.
4. **Segment Mapping** – assign each company to journey stages or playbooks (e.g., "High-growth SaaS 200-500").
5. **Recommendation Output** – produce persona targets, value props, and urgency level per segment.
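
The scoring matrix in step 2 can be sketched as a weighted sum over the four criteria it names. The weights below are purely hypothetical placeholders that sum to 100, and `icp_score` is an illustrative helper; each criterion is assumed to have already been normalized to a fit value between 0 and 1.

```python
from typing import Dict

# Hypothetical weights for the four ICP criteria; they sum to 100.
ICP_WEIGHTS: Dict[str, int] = {
    "industry_fit": 40,
    "employee_band": 25,
    "revenue": 20,
    "growth_rate": 15,
}


def icp_score(fit: Dict[str, float]) -> float:
    """Return a 0-100 ICP score from per-criterion fit values in [0, 1].

    Missing criteria count as 0; out-of-range values are clamped.
    """
    return sum(
        weight * min(max(fit.get(name, 0.0), 0.0), 1.0)
        for name, weight in ICP_WEIGHTS.items()
    )


demo_score = icp_score({"industry_fit": 1.0, "employee_band": 0.5})
```

Keeping the weights in one dictionary makes quarterly re-scoring a data change rather than a code change.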

## Templates
- Segment summary table (columns: segment, criteria, TAM, coverage owner, next action).
- Trigger event log with timestamps/source, impact rating, and follow-up play.
- Messaging workbook mapping persona × segment × proof points for instant enablement pulls.

## Tips
- Keep taxonomy dictionaries centrally managed so enrichment jobs and analytics share the same lookups.
- Re-score accounts quarterly or after major firmographic events (funding, layoffs) to keep priorities fresh.
- Pair quant scores with qualitative notes from AEs/CSMs to avoid over-rotating on enrichment data alone.

---