# Common Pitfalls in Reference Class Forecasting

## The Traps That Even Experts Fall Into

---
## 1. Base Rate Neglect

### What It Is
Ignoring statistical baselines in favor of the specific details of the case at hand.

### Example: The Taxi Problem

**Scenario:** A taxi was involved in a hit-and-run. A witness says it was Blue.
- 85% of taxis in the city are Green
- 15% of taxis are Blue
- The witness is 80% reliable at identifying colors

**Most people say:** 80% chance it was Blue (trusting the witness)

**Correct answer:**
```
P(Blue | Witness says Blue) ≈ 41%
```

Why? The **base rate** (15% Blue taxis) dominates the witness's reliability.
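To see where the 41% comes from, here is the Bayes' rule arithmetic as a minimal sketch (Python; the probabilities are the ones given in the scenario, and the variable names are just for illustration):

```python
# Priors and witness reliability from the taxi scenario
p_blue = 0.15                  # 15% of taxis are Blue
p_green = 0.85                 # 85% of taxis are Green
p_say_blue_given_blue = 0.80   # witness is right 80% of the time
p_say_blue_given_green = 0.20  # witness is wrong 20% of the time

# Bayes' rule: P(Blue | witness says Blue)
numerator = p_blue * p_say_blue_given_blue
evidence = numerator + p_green * p_say_blue_given_green
posterior = numerator / evidence

print(f"P(Blue | witness says Blue) = {posterior:.2f}")  # 0.41
```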
### In Forecasting

**The Trap:**
- Focusing on the compelling story about this particular startup
- Ignoring that roughly 90% of startups fail

**The Fix:**
- Start with the ~90% failure rate
- Then adjust based on the specific evidence
- Weight the base rate heavily (especially when it is extreme)

---
## 2. "This Time Is Different" Bias
|
||||
|
||||
### What It Is
|
||||
Belief that current situation is unique and historical patterns don't apply.
|
||||
|
||||
### Classic Examples
|
||||
|
||||
**Financial Markets:**
|
||||
- "This bubble is different" (said before every bubble burst)
|
||||
- "This time housing prices won't fall"
|
||||
- "This cryptocurrency is different from previous scams"
|
||||
|
||||
**Technology:**
|
||||
- "This social network will overtake Facebook" (dozens have failed)
|
||||
- "This time AI will achieve AGI" (predicted for 60 years)
|
||||
|
||||
**Startups:**
|
||||
- "Our team is better than average" (90% of founders say this)
|
||||
- "Our market timing is perfect" (survivorship bias)
|
||||
|
||||
### Why It Happens
|
||||
1. **Narrative fallacy:** Humans construct unique stories for everything
|
||||
2. **Overconfidence:** We overestimate our ability to judge uniqueness
|
||||
3. **Availability bias:** Recent cases feel more salient than statistical patterns
|
||||
4. **Ego:** Admitting you're "average" feels bad
|
||||
|
||||
### The Fix: The Reversal Test
|
||||
|
||||
**Question:** "If someone else claimed their case was unique for the same reasons I'm claiming, would I believe them?"
|
||||
|
||||
**If NO → You're applying special pleading to yourself**
|
||||
|
||||
### The Reality
|
||||
- ~95% of cases that feel unique are actually normal
|
||||
- The 5% that ARE unique still have some reference class (just more abstract)
|
||||
- Uniqueness is a matter of degree, not binary
|
||||
|
||||
---
|
||||
|
||||
## 3. Overfitting to Small Samples

### What It Is
Drawing strong conclusions from limited data points.

### Example: The Hot Hand Fallacy

**Basketball:** Player makes 3 shots in a row → "Hot hand!"
**Reality:** Random sequences produce streaks. Three in a row is not evidence that the player's skill has changed.

### In Reference Classes

**The Trap:**
- Finding N = 5 companies similar to yours
- All 5 succeeded
- Concluding a 100% success rate

**Why It's Wrong:**
- Small samples have high variance
- Sampling error dominates the signal
- Regression to the mean will occur

### The Fix: Minimum Sample Size Rule

**Minimum:** N ≥ 30 for meaningful statistics
- N < 10: No statistical power
- N = 10-29: Weak signal, wide confidence intervals
- N ≥ 30: Acceptable
- N ≥ 100: Good
- N ≥ 1000: Excellent

**If N < 30:**
1. Widen the reference class to get more data
2. Acknowledge the high uncertainty (wide CI; see the sketch below)
3. Don't trust extreme base rates from small samples
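To make the small-sample warning concrete, here is a minimal sketch of how wide the uncertainty on a base rate really is (Python, assuming a simple Beta posterior with a uniform prior; the sample sizes are illustrative):

```python
from scipy import stats

def base_rate_interval(successes, n, level=0.95):
    """Credible interval for a base rate under a Beta(1, 1) (uniform) prior."""
    posterior = stats.beta(1 + successes, 1 + n - successes)
    return posterior.interval(level)

# N = 5 similar companies, all succeeded: the "100% success rate" trap
print(base_rate_interval(5, 5))      # roughly (0.54, 1.00): almost no information
# N = 200 cases, 120 successes
print(base_rate_interval(120, 200))  # roughly (0.53, 0.67): a usable base rate
```

The point estimate from five cases tells you very little; the interval spans half the probability scale.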
---

## 4. Survivorship Bias

### What It Is
Only looking at cases that "survived" and ignoring failures.

### Classic Example: WW2 Bomber Armor

**Observation:** Returning bombers had bullet holes in wings and tail
**Naive conclusion:** Reinforce wings and tail
**Truth:** Planes shot in the engine didn't return (survivorship bias)
**Correct action:** Reinforce engines

### In Reference Classes

**The Trap:**
- "Reference class = Successful tech companies" (Apple, Google, Microsoft)
- Base rate = 100% success
- Ignoring all the failed companies

**Why It's Wrong:**
You're only looking at winners, which biases the base rate upward.

### The Fix: Include All Attempts

**Correct reference class:**
- "All companies that TRIED to do X"
- Not "All companies that SUCCEEDED at X"

**Example:**
- Wrong: "Companies like Facebook" → 100% success
- Right: "Social network startups 2004-2010" → 5% success

---
## 5. Regression to the Mean Neglect

### What It Is
Failing to expect extreme outcomes to be followed by more average ones.

### The Sports Illustrated Jinx

**Observation:** Athletes on the SI cover often perform worse afterward
**Naive explanation:** Curse, pressure, jinx
**Truth:** They were on the cover BECAUSE of an extreme (partly lucky) performance → regression to the mean was inevitable

### In Forecasting

**The Trap:**
- Company has an amazing quarter → predict continued amazing performance
- Ignoring that some of that was luck, which won't repeat

**The Fix: Regression Formula**
```
Predicted = Mean + r × (Observed - Mean)
```

Where `r` = skill / (skill + luck)

**If 50% luck, 50% skill:**
```
Predicted = halfway between observed and mean
```

### Examples

**Startup with $10M ARR in Year 1:**
- Mean for seed startups: $500K ARR
- Observed: $10M (extreme!)
- Likely some luck (viral moment, lucky timing)
- Predicted Year 2: somewhere between $500K and $10M, not $20M (worked through in the sketch below)
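Here is that startup example run through the regression formula as a minimal sketch (Python; the value of `r` is an assumption for illustration, not something the data gives you for free):

```python
def regress_to_mean(observed, mean, r):
    """Predicted = Mean + r × (Observed - Mean), where r is the persistent (skill) share."""
    return mean + r * (observed - mean)

mean_arr = 500_000         # typical seed-stage ARR (from the example)
observed_arr = 10_000_000  # this startup's Year 1 ARR

# Assumed: roughly 30% of the outperformance reflects durable skill
prediction = regress_to_mean(observed_arr, mean_arr, r=0.3)
print(f"Predicted Year 2 ARR: ${prediction:,.0f}")  # $3,350,000
```

With r = 0.5 the prediction lands halfway, at $5.25M; the larger the luck component, the stronger the pull back toward the mean.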
**Student who aces one test:**
- Don't predict straight A's (it could have been an easy test, or lucky guessing)
- Predict performance closer to the class average
---

## 6. Availability Bias in Class Selection

### What It Is
Choosing a reference class based on what's memorable, not what's statistically valid.

### Example: Terrorism Risk

**After 9/11:**
- Availability: Terrorism is extremely salient
- Many people chose the reference class: "Major terrorist attacks on the US"
- The base rate felt high (because it was memorable)

**Reality:**
- Correct reference class: "Risk of death from various causes"
- Terrorism: ~0.0001% annual risk
- Car accidents: ~0.01% annual risk (100× higher)

### The Fix
1. **Don't use "what comes to mind" as the reference class**
2. **Use systematic search** for historical data
3. **Weight by frequency, not memorability**

---
## 7. Confirmation Bias in Reference Class Selection

### What It Is
Choosing a reference class that supports your pre-existing belief.

### Example: Political Predictions

**If you want candidate X to win:**
- Choose the reference class: "Elections where the candidate had high favorability"
- Ignore the reference class: "Elections in this demographic"

**If you want candidate X to lose:**
- Choose the reference class: "Candidates with this type of scandal"
- Ignore the reference class: "Incumbents in a strong economy"

### The Fix: Blind Selection

**Process:**
1. Choose the reference class BEFORE you know the base rate
2. Write down the selection criteria
3. Only then look up the base rate
4. Stick with it even if you don't like the answer

---
## 8. Ignoring Time Decay

### What It Is
Using old data when conditions have changed.

### Example: Newspaper Industry

**Reference class:** "Newspaper companies, 1950-2000"
**Base rate:** ~90% profitability

**Problem:** The internet fundamentally changed the industry
**Reality in the 2010s:** ~10% profitability

### When Time Decay Matters

**High decay (don't use old data):**
- Technology industries (5-year half-life)
- Regulatory changes (e.g., crypto)
- Structural market shifts (e.g., remote work post-COVID)

**Low decay (old data OK):**
- Human behavior (still the same evolutionary psychology)
- Physical laws (still the same physics)
- Basic business economics (margins, etc.)

### The Fix
1. **Use the last 5-10 years** as the default
2. **Segment by era** if a structural change occurred
3. **Weight recent data more heavily** (see the sketch below)
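One simple way to "weight recent data more heavily" is an exponentially decaying weight by age. A minimal sketch (Python; the half-life and the yearly rates are made up for illustration):

```python
def decayed_base_rate(yearly_rates, current_year, half_life=5.0):
    """Average yearly base rates with exponentially decaying weights by age."""
    weights = {year: 0.5 ** ((current_year - year) / half_life) for year in yearly_rates}
    total = sum(weights.values())
    return sum(yearly_rates[y] * weights[y] / total for y in yearly_rates)

# Hypothetical success rates by cohort year
yearly_rates = {2010: 0.25, 2014: 0.20, 2018: 0.15, 2022: 0.10}
print(round(decayed_base_rate(yearly_rates, current_year=2024), 3))  # ≈ 0.14
```

The recent, lower rates dominate the estimate, which is the intended behavior when the industry has structurally changed.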
---

## 9. Causation Confusion

### What It Is
Including features in the reference class that correlate with outcomes but don't cause them.

### Example: Ice Cream and Drowning

**Observation:** Ice cream sales correlate with drowning deaths
**Naive reference class:** "Days with high ice cream sales"
**Base rate:** Higher drowning rate

**Problem:** Ice cream doesn't cause drowning
**Confound:** Summer weather causes both

### In Forecasting

**The Trap:**
Adding irrelevant details to the reference class:
- "Startups founded by left-handed CEOs"
- "Projects started on Tuesdays"
- "People born under Scorpio"

**Why It's Wrong:**
These features don't causally affect outcomes; they just add noise.

### The Fix: Causal Screening

**Ask:** "Does this feature **cause** different outcomes, or just **correlate** with them?"

**Include if causal:**
- Business model (causes different unit economics)
- Market size (bounds the addressable opportunity)
- Technology maturity (causes different risk)

**Exclude if just correlation:**
- Founder's birthday
- Office location aesthetics
- Random demographic details

---
## 10. Narrow Framing

### What It Is
Defining the reference class too narrowly around the specific case.

### Example

**You're forecasting:** "Will this specific bill pass Congress?"

**Too narrow:** "Bills with this exact text in this specific session"
→ N = 1, no data

**Better:** "Bills of this type in a similar political climate"
→ N = 50, usable data

### The Fix
If you can't find data, your reference class is too narrow. Go up one level of abstraction.

---
## 11. Extremeness Aversion

### What It Is
Reluctance to use extreme base rates (close to 0% or 100%).

### Psychological Bias

**People feel uncomfortable saying:**
- 5% chance
- 95% chance

**They retreat to:**
- 20% ("to be safe")
- 80% ("to be safe")

**This is wrong.** If the base rate is 5%, START at 5%.

### When Extreme Base Rates Are Correct

- Drug development: 5-10% reach market
- Lottery tickets: 0.0001% chance
- Sun rising tomorrow: 99.9999% chance

### The Fix
- **Trust the base rate** even if it feels extreme
- Extreme base rates are still better than gut feelings
- Use inside view to adjust, but don't automatically moderate

---
## 12. Scope Insensitivity

### What It Is
Not adjusting forecasts proportionally to scale.

### Example: Charity Donations

**Study:** People were willing to pay about the same amount to save:
- 2,000 birds
- 20,000 birds
- 200,000 birds

**Problem:** The scope doesn't change the emotional response

### In Reference Classes

**The Trap:**
- "Startup raising $1M" feels the same as "Startup raising $10M"
- The reference class doesn't distinguish by scale
- The resulting base rate is wrong

**The Fix:**
- Segment the reference class by scale
- "$1M raises" have a different success rate than "$10M raises"
- Make sure the scope is specified

---
## Avoiding Pitfalls: Quick Checklist

Before finalizing your reference class, check:

- [ ] **Not neglecting the base rate** - Started with statistics, not story
- [ ] **Not claiming uniqueness** - Passed the reversal test
- [ ] **Sample size ≥ 30** - Not overfitting to a small sample
- [ ] **Included failures** - No survivorship bias
- [ ] **Expected regression** - Extreme outcomes won't persist
- [ ] **Systematic selection** - Not relying on availability
- [ ] **Chose the class first** - No confirmation bias
- [ ] **Recent data** - Not ignoring time decay
- [ ] **Causal features only** - Not including random correlations
- [ ] **Right level of abstraction** - Not too narrow
- [ ] **OK with extremes** - Not moderating extreme base rates
- [ ] **Scope-specific** - Scale is accounted for

---

**Return to:** [Main Skill](../SKILL.md#interactive-menu)

---
# Outside View Principles

## Theory and Foundation

### What is the Outside View?

The **Outside View** is a forecasting method that relies on statistical baselines from similar historical cases rather than detailed analysis of the specific case at hand.

**Coined by:** Daniel Kahneman and Amos Tversky
**Alternative names:** Reference class forecasting, actuarial prediction, statistical prediction

---
## The Two Views Framework

### Inside View (The Trap)
- Focuses on unique details of the specific case
- Constructs causal narratives
- Emphasizes what makes "this time different"
- Relies on expert intuition and judgment
- Feels more satisfying and controllable

**Example:** "Our startup will succeed because we have a great team, unique technology, strong market timing, and passionate founders."

### Outside View (The Discipline)
- Focuses on statistical patterns from similar cases
- Ignores unique narratives initially
- Emphasizes what usually happens to things like this
- Relies on base rates and frequencies
- Feels cold and impersonal

**Example:** "Seed-stage B2B SaaS startups have a 10% success rate. We start at 10%."

---
## Why the Outside View Wins

### Research Evidence

**The Planning Fallacy Study (Kahneman)**
- Students were asked to predict their thesis completion time
- Inside view: average prediction = 33 days
- Actual average: 55 days
- Outside view (based on past students): 48 days
- **Result:** The outside-view estimate was roughly 3× closer to the actual outcome than the inside-view prediction

**Expert Predictions vs Base Rates**
- Expert forecasters using the inside view: ~60% accuracy
- Simple base rate models: ~70% accuracy
- **Result:** Simple statistical baselines often beat expert judgment

**Why Experts Fail:**
1. **Overweight unique details** (availability bias)
2. **Construct plausible narratives** (hindsight bias)
3. **Underweight statistical patterns** (base rate neglect)
4. **Overconfident in causal understanding** (illusion of control)

---
## When Outside View Works Best

### High-Signal Situations
✓ Large historical datasets exist
✓ Cases are reasonably similar
✓ Outcomes are measurable
✓ No major structural changes
✓ Randomness plays a significant role

**Examples:**
- Startup success rates
- Construction project delays
- Drug approval timelines
- Movie box office performance
- Sports team performance

---
## When Outside View Fails

### Low-Signal Situations
✗ Truly novel events (no reference class)
✗ Structural regime changes (e.g., new technology disrupts all patterns)
✗ Extremely heterogeneous reference class
✗ Small sample sizes (N < 20)
✗ Deterministic physics-based systems

**Examples:**
- First moon landing (no reference class)
- Pandemic with a novel pathogen (limited reference class)
- Cryptocurrency regulation (regime change)
- Your friend's personality (N = 1)

**What to do:** Use the outside view as a starting point, then weight the specific evidence heavily

---
## Statistical Thinking vs Narrative Thinking

### Narrative Thinking (Human Default)
- Brain constructs causal stories
- Connects dots into coherent explanations
- Feels satisfying and convincing
- **Problem:** Narratives are selected for coherence, not accuracy

**Narrative example:** "Startup X will fail because the CEO is inexperienced, the market is crowded, and they're burning cash."

This might be true, but:
- Experienced CEOs also fail
- Crowded markets have winners
- Cash burn is normal for startups

The narrative cherry-picks evidence.

### Statistical Thinking (Discipline Required)
- Brain resists cold numbers
- Requires active effort to override intuition
- Feels unsatisfying and reductive
- **Benefit:** Statistics aggregate all past evidence, not just confirming cases

**Statistical example:** "80% of startups with this profile fail within 3 years. Start at 80% failure probability."

---
## The Planning Fallacy in Depth

### What It Is
Systematic tendency to underestimate time, costs, and risks while overestimating benefits.

### Why It Happens
1. **Focus on the success plan:** Failure modes are ignored
2. **Best-case scenario bias:** Assume things go smoothly
3. **Neglect of base rates:** "Our project is different"
4. **Anchoring on ideal conditions:** Forgetting that reality intrudes

### The Fix: Outside View
Instead of asking "How long will our project take?" ask:
- "How long did similar projects take?"
- "What was the distribution of outcomes?"
- "What percentage ran late? By how much?"

**Rule:** Assume your project is **average** for its class until proven otherwise.

---
## Regression to the Mean

### The Phenomenon
Extreme outcomes tend to be followed by more average outcomes.

**Examples:**
- Hot hand in basketball → Returns to average
- Stellar quarterly earnings → Next quarter closer to mean
- Brilliant startup idea → Execution regresses to mean

### Implication for Forecasting
If you're predicting based on an extreme observation:
- **Adjust toward the mean** unless you have evidence the extreme is sustainable
- Extreme outcomes are often luck + skill; luck doesn't persist

**Formula:**
```
Predicted = Mean + r × (Observed - Mean)
```
Where `r` = correlation (skill component)

If 50% skill, 50% luck → r = 0.5 → Expect halfway between observed and mean

---
## Integration with Inside View

### The Proper Sequence

**Phase 1: Outside View (Base Rate)**
1. Identify the reference class
2. Find the base rate
3. Set the starting probability = base rate

**Phase 2: Inside View (Adjustment)**
4. Identify specific evidence
5. Calculate how much the evidence shifts the probability
6. Apply a Bayesian update

**Phase 3: Calibration**
7. Check confidence intervals
8. Stress test with a premortem
9. Remove biases

**Never skip Phase 1.** Even if you plan to adjust heavily, the base rate is your anchor.
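A minimal sketch of Phases 1-2 (Python; the base rate and likelihood ratios are placeholder numbers, not values taken from this guide):

```python
def bayes_update(prior, likelihood_ratio):
    """Update a probability given evidence with the stated likelihood ratio
    (P(evidence | outcome) / P(evidence | no outcome)), via the odds form of Bayes' rule."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Phase 1: outside view, start at the reference-class base rate
p = 0.10  # e.g., 10% success rate for the chosen reference class

# Phase 2: inside view, each piece of specific evidence is one update
p = bayes_update(p, 2.0)   # evidence judged twice as likely given success
p = bayes_update(p, 1.5)   # weaker supporting evidence

print(round(p, 3))  # 0.25
```

The anchor stays visible in the arithmetic: strong specific evidence moves you from 10% to 25%, not to 80%.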
---

## Common Objections (And Rebuttals)

### "But my case really IS different!"
**Response:** Maybe. But 90% of people say this, and 90% are wrong. Prove it with evidence, not narrative.

### "Base rates are too pessimistic!"
**Response:** Optimism doesn't change reality. If the base rate is 10%, being optimistic doesn't make it 50%.

### "I have insider knowledge!"
**Response:** Great! Use Bayesian updating to adjust from the base rate. But start with the base rate.

### "This feels too mechanical!"
**Response:** Good forecasting should feel mechanical. Intuition is for generating hypotheses, not estimating probabilities.

---
## Practical Takeaways

1. **Always start with the base rate** - Non-negotiable
2. **Resist narrative seduction** - Stories feel true but aren't predictive
3. **Expect regression to the mean** - Extreme outcomes are temporary
4. **Use the inside view as an update** - Not a replacement for the outside view
5. **Trust frequencies over judgment** - Especially when N is large

---

**Return to:** [Main Skill](../SKILL.md#interactive-menu)

---
# Reference Class Selection Guide

## The Art and Science of Choosing Comparison Sets

Selecting the right reference class is the most critical judgment call in forecasting. Too broad and the base rate is meaningless. Too narrow and you have no data.

---
## The Goldilocks Principle

### Too Broad
**Problem:** High variance, low signal
**Example:** "Companies" as a reference class for a fintech startup
**Base rate:** ~50% fail? 90% fail? Meaningless.
**Why it fails:** Includes everything from lemonade stands to Apple

### Too Narrow
**Problem:** No data, overfitting
**Example:** "Fintech startups founded in Q2 2024 by Stanford CS grads in SF"
**Base rate:** N = 3 companies, no outcomes yet
**Why it fails:** So specific there's no statistical pattern

### Just Right
**Sweet spot:** Specific enough to be homogeneous, broad enough to have data
**Example:** "Seed-stage B2B SaaS startups in financial services"
**Base rate:** Can find N = 200+ companies with 5-year outcomes
**Why it works:** Specific enough to be meaningful, broad enough for statistics

---
## Systematic Selection Method

### Step 1: Define the Core Entity Type

**Question:** What is the fundamental category?

**Examples:**
- Company (startup, public company, nonprofit)
- Project (software, construction, research)
- Person (athlete, politician, scientist)
- Event (election, war, natural disaster)
- Product (consumer, enterprise, service)

**Output:** "This is a [TYPE]"

---
### Step 2: Add Specificity Layers

Work through these dimensions **in order of importance:**

#### Layer 1: Stage/Size
- Startups: Pre-seed, Seed, Series A, B, C, Growth
- Projects: Small (<$1M), Medium ($1-10M), Large (>$10M)
- People: Beginner, Intermediate, Expert
- Products: MVP, Version 1.0, Mature

#### Layer 2: Category/Domain
- Startups: B2B, B2C, B2B2C
- Industry: Fintech, Healthcare, SaaS, Hardware
- Projects: Software, Construction, Pharmaceutical
- People: Role (CEO, Engineer, Designer)

#### Layer 3: Geography/Market
- US, Europe, Global
- Urban, Rural, Suburban
- Developed, Emerging markets

#### Layer 4: Time Period
- Current decade (2020s)
- Previous decade (2010s)
- Historical (pre-2010)

**Output:** "This is a [Stage] [Category] [Geography] [Type] from [Time Period]"

**Example:** "This is a Seed-stage B2B SaaS startup in the US from 2020-2024"

---
### Step 3: Test for Data Availability

**Search queries:**
```
"[Reference Class] success rate"
"[Reference Class] statistics"
"[Reference Class] survival rate"
"How many [Reference Class] succeed"
```

**Data availability check:**
- ✓ Found published studies/reports → Good reference class
- ⚠ Found anecdotal data → Usable but weak
- ✗ No data found → Reference class too narrow

**If no data:** Remove the least important specificity layer and retry

---
### Step 4: Validate Homogeneity

**Question:** Are members of this class similar enough that averaging makes sense?

**Test: Variance Check**
If you have outcome data, calculate variance:
- Low variance → Good reference class (outcomes cluster)
- High variance → Bad reference class (outcomes all over the place)

**Heuristic: The Substitution Test**
Pick any two members of the reference class at random.

**Ask:** "If I swapped one for the other, would the prediction change dramatically?"
- No → Good homogeneity
- Yes → Too broad, needs subdivision

**Example:**
- "Tech startups" → Swap consumer mobile app for enterprise database company → Prediction changes drastically → **Too broad**
- "Seed-stage B2B SaaS" → Swap CRM tool for analytics platform → Prediction mostly same → **Good homogeneity**

---
## Similarity Metrics

### When You Can't Find an Exact Match

If no perfect reference class exists, use **similarity matching** to find the nearest neighbors.

### Dimensions of Similarity

**For Startups:**
1. Business model (B2B, B2C, marketplace, SaaS)
2. Revenue model (subscription, transaction, ads)
3. Stage/funding (seed, Series A, etc.)
4. Team size
5. Market size
6. Technology complexity

**For Projects:**
1. Size (budget, team size, duration)
2. Complexity (simple, moderate, complex)
3. Technology maturity (proven, emerging, experimental)
4. Team experience
5. Dependencies (few, many)

**For People:**
1. Experience level
2. Domain expertise
3. Resources available
4. Historical track record
5. Contextual factors (support, environment)
### Similarity Scoring

**Method: Nearest Neighbors**

1. List all dimensions of similarity (5-7 dimensions)
2. For each dimension, score how similar the case is to the reference class (0-10)
3. Average the scores
4. Threshold: if similarity < 7/10, the reference class may not apply

**Example:**
Comparing an "AI startup in 2024" to the "Software startups 2010-2020" reference class:
- Business model: 9/10 (same)
- Revenue model: 8/10 (mostly SaaS)
- Technology maturity: 4/10 (AI is newer)
- Market size: 7/10 (comparable)
- Team size: 8/10 (similar)
- Funding environment: 5/10 (tighter in 2024)

**Average: 6.8/10** → Marginal reference class; use with caution
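A small sketch of this scoring step (Python; the dimensions and scores are the ones from the example above, equally weighted):

```python
# Similarity of "AI startup in 2024" to the "Software startups 2010-2020" class,
# scored 0-10 on each dimension (numbers from the example above)
scores = {
    "business model": 9,
    "revenue model": 8,
    "technology maturity": 4,
    "market size": 7,
    "team size": 8,
    "funding environment": 5,
}

similarity = sum(scores.values()) / len(scores)
print(f"Average similarity: {similarity:.1f}/10")  # 6.8/10

if similarity < 7:
    print("Marginal reference class: use with caution")
```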
---

## Edge Cases and Judgment Calls

### Case 1: Structural Regime Change

**Problem:** Conditions have changed fundamentally since the historical data was generated

**Examples:**
- Pre-internet vs post-internet business
- Pre-COVID vs post-COVID work patterns
- Pre-AI vs post-AI software development

**Solution:**
1. Segment the data by era if possible
2. Use only the most recent data
3. Adjust the base rate for known structural differences
4. Increase the uncertainty bounds

---
### Case 2: The N=1 Problem

**Problem:** The case is literally unique (first of its kind)

**Examples:**
- First moon landing
- First pandemic of a novel pathogen
- First AGI system

**Solution:**
1. **Widen the class** - Go up one abstraction level
   - "First moon landing" → "First major engineering projects"
   - "Novel pandemic" → "Past pandemics of any type"
2. **Component decomposition** - Break into parts that have reference classes
   - "Moon landing" → Rocket success rate × Navigation accuracy × Life support reliability
3. **Expert aggregation** - When no data, aggregate expert predictions (but with humility)

---
### Case 3: Multiple Plausible Reference Classes

**Problem:** The event could belong to multiple classes with different base rates

**Example:** "Elon Musk starting a brain-computer interface company"

Possible reference classes:
- "Startups by serial entrepreneurs" → 40% success
- "Medical device startups" → 10% success
- "Moonshot technology ventures" → 5% success
- "Companies founded by Elon Musk" → 80% success

**Solution: Ensemble Averaging**
1. Identify all plausible reference classes
2. Find the base rate for each
3. Weight by relevance/similarity
4. Calculate the weighted average

**Example weights:**
- Medical device (40%): 10% × 0.4 = 4%
- Moonshot tech (30%): 5% × 0.3 = 1.5%
- Serial entrepreneur (20%): 40% × 0.2 = 8%
- Elon track record (10%): 80% × 0.1 = 8%

**Weighted base rate: 21.5%**
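The same ensemble calculation as a minimal sketch (Python; the classes, base rates, and weights are the ones from the example above):

```python
# (base rate, weight) for each plausible reference class, from the example above
reference_classes = {
    "Medical device startups":          (0.10, 0.40),
    "Moonshot technology ventures":     (0.05, 0.30),
    "Startups by serial entrepreneurs": (0.40, 0.20),
    "Companies founded by Elon Musk":   (0.80, 0.10),
}

weighted_base_rate = sum(rate * weight for rate, weight in reference_classes.values())
print(f"Weighted base rate: {weighted_base_rate:.1%}")  # 21.5%
```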
---

## Common Selection Mistakes

### Mistake 1: Cherry-Picking Success Examples
**What it looks like:** "Reference class = Companies like Apple, Google, Facebook"
**Why it's wrong:** Survivorship bias - only looking at winners
**Fix:** Include all attempts, not just successes

### Mistake 2: Availability Bias
**What it looks like:** Reference class = Recent, memorable cases
**Why it's wrong:** Recent events are overweighted in memory
**Fix:** Use systematic data collection, not what comes to mind

### Mistake 3: Confirmation Bias
**What it looks like:** Choosing a reference class that supports your prior belief
**Why it's wrong:** You're reverse-engineering the answer
**Fix:** Choose the reference class BEFORE looking at the base rate

### Mistake 4: Overfitting to Irrelevant Details
**What it looks like:** "Female, left-handed CEOs who went to Ivy League schools"
**Why it's wrong:** Most details don't matter; you're adding noise
**Fix:** Only include features that causally affect outcomes

### Mistake 5: Ignoring Time Decay
**What it looks like:** Using data from the 1970s for a 2024 prediction
**Why it's wrong:** The world has changed
**Fix:** Weight recent data more heavily, or segment by era

---
## Reference Class Hierarchy

### Start Specific, Widen as Needed

**Level 1: Maximally Specific** (Try this first)
- Example: "Seed-stage B2B cybersecurity SaaS in US, 2020-2024"
- Check for data → If N ≥ 30, use this

**Level 2: Drop One Feature** (if Level 1 has no data)
- Example: "Seed-stage B2B SaaS in US, 2020-2024" (removed "cybersecurity")
- Check for data → If N ≥ 30, use this

**Level 3: Drop Two Features** (if Level 2 has no data)
- Example: "Seed-stage B2B SaaS, 2020-2024" (removed "US")
- Check for data → If N ≥ 30, use this

**Level 4: Generic Category** (Last resort)
- Example: "Seed-stage startups"
- Always has data, but high variance

**Rule:** Use the most specific level that still gives you N ≥ 30 data points.
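A minimal sketch of this "start specific, widen as needed" loop (Python; `count_cases` stands in for whatever data source you actually query, and the toy counts are invented for illustration):

```python
def pick_reference_class(features, count_cases, min_n=30):
    """Drop the least important feature until the class has at least min_n cases.
    `features` is ordered most- to least-important; `count_cases` is a lookup
    you supply (e.g., a dataset or database query)."""
    candidate = list(features)
    while candidate:
        n = count_cases(candidate)
        if n >= min_n:
            return candidate, n
        candidate.pop()   # drop the least important remaining feature
    return [], 0          # even the generic category had no data

features = ["seed-stage", "B2B", "SaaS", "2020-2024", "US", "cybersecurity"]

# Toy data source: pretend counts for classes of decreasing specificity
toy_counts = {6: 3, 5: 12, 4: 85, 3: 240, 2: 900, 1: 4000, 0: 0}
cls, n = pick_reference_class(features, lambda f: toy_counts[len(f)])
print(cls, n)  # ['seed-stage', 'B2B', 'SaaS', '2020-2024'] 85
```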
---

## Checklist: Is This a Good Reference Class?

Use this to validate your choice:

- [ ] **Sample size** ≥ 30 historical cases
- [ ] **Homogeneity**: Members are similar enough that averaging makes sense
- [ ] **Relevance**: Data is from an appropriate time period (last 10 years preferred)
- [ ] **Specificity**: Class is narrow enough to be meaningful
- [ ] **Data availability**: Base rate is published or calculable
- [ ] **No survivorship bias**: Includes failures, not just successes
- [ ] **No cherry-picking**: Class chosen before looking at the base rate
- [ ] **Causal relevance**: Features included actually affect outcomes

**If ≥ 6 checked:** Good reference class
**If 4-5 checked:** Acceptable, but increase uncertainty
**If < 4 checked:** Find a better reference class

---
## Advanced: Bayesian Reference Class Selection

When you have multiple plausible reference classes, you can use Bayesian reasoning:

### Step 1: Prior Distribution Over Classes
Assign a probability to each reference class being the "true" one

**Example:**
- P(Class = "B2B SaaS") = 60%
- P(Class = "All SaaS") = 30%
- P(Class = "All startups") = 10%

### Step 2: Likelihood of Observed Features
How likely is this specific case under each class?

### Step 3: Posterior Distribution
Update the class probabilities using Bayes' rule

### Step 4: Weighted Base Rate
Average the base rates weighted by the posterior probabilities

**This is advanced.** Default to the systematic selection method above unless you have strong quantitative skills.
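A compact sketch of Steps 1-4 (Python; the priors use the example above, while the likelihoods and base rates are invented for illustration):

```python
# Candidate reference classes: prior P(class), P(observed features | class), base rate
classes = {
    "B2B SaaS":     {"prior": 0.60, "likelihood": 0.70, "base_rate": 0.10},
    "All SaaS":     {"prior": 0.30, "likelihood": 0.40, "base_rate": 0.15},
    "All startups": {"prior": 0.10, "likelihood": 0.20, "base_rate": 0.08},
}

# Steps 2-3: posterior over classes via Bayes' rule
unnormalized = {name: c["prior"] * c["likelihood"] for name, c in classes.items()}
total = sum(unnormalized.values())
posterior = {name: w / total for name, w in unnormalized.items()}

# Step 4: base rates averaged by posterior weight
weighted_base_rate = sum(posterior[name] * classes[name]["base_rate"] for name in classes)
print({k: round(v, 2) for k, v in posterior.items()})  # {'B2B SaaS': 0.75, 'All SaaS': 0.21, 'All startups': 0.04}
print(round(weighted_base_rate, 3))                    # ≈ 0.11
```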
---

## Practical Workflow

### Quick Protocol (5 minutes)

1. **Name the core type:** "This is a [X]"
2. **Add 2-3 specificity layers:** Stage, category, geography
3. **Google the base rate:** "[Reference class] success rate"
4. **Sanity check:** Is N ≥ 30? Are the members similar?
5. **Use it:** This is your starting probability

### Rigorous Protocol (30 minutes)

1. Systematic selection (Steps 1-4 above)
2. Similarity scoring for validation
3. Check for structural regime changes
4. Consider multiple reference classes
5. Weighted ensemble if using multiple classes
6. Document assumptions and limitations

---

**Return to:** [Main Skill](../SKILL.md#interactive-menu)