Common Pitfalls in Reference Class Forecasting
The Traps That Even Experts Fall Into
1. Base Rate Neglect
What It Is
Ignoring statistical baselines in favor of specific case details.
Example: The Taxi Problem
Scenario: A taxi was involved in a hit-and-run. Witness says it was Blue.
- 85% of taxis in city are Green
- 15% of taxis are Blue
- Witness is 80% reliable in identifying colors
Most people say: 80% chance it was Blue (trusting the witness)
Correct answer:
P(Blue | Witness says Blue) = (0.15 × 0.80) / (0.15 × 0.80 + 0.85 × 0.20) ≈ 41%
Why? The base rate (only 15% of taxis are Blue) outweighs the witness's 80% reliability.
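A minimal sketch of that Bayes calculation, using only the numbers given above:

```python
# Bayes' rule for the taxi problem: P(Blue | witness says Blue)
p_blue = 0.15      # base rate: 15% of taxis are Blue
p_green = 0.85     # base rate: 85% of taxis are Green
p_correct = 0.80   # the witness identifies the true color 80% of the time

# The witness says "Blue" either because the taxi was Blue and they were right,
# or because it was Green and they were wrong.
p_says_blue = p_blue * p_correct + p_green * (1 - p_correct)

p_blue_given_says_blue = (p_blue * p_correct) / p_says_blue
print(f"P(Blue | witness says Blue) = {p_blue_given_says_blue:.0%}")  # ~41%
```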
In Forecasting
The Trap:
- Focusing on compelling story about this startup
- Ignoring that 90% of startups fail
The Fix:
- Start with 90% failure rate
- Then adjust based on specific evidence
- Weight base rate heavily (especially when base rate is extreme)
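One way to "adjust based on specific evidence" without losing the base rate is an odds-form Bayesian update. The sketch below uses an illustrative likelihood ratio of 3, which is an assumption, not a measured value:

```python
# Odds-form Bayesian update: start from the base rate, then adjust for specific evidence.
# The likelihood ratio below is an illustrative assumption, not a measured value.
p_fail_base = 0.90                                    # base rate: ~90% of startups fail
prior_odds_success = (1 - p_fail_base) / p_fail_base  # odds of success are 1 : 9

# Suppose the specific evidence (strong team, early traction) is judged to be
# 3x more likely to be observed if the startup will succeed than if it will fail.
likelihood_ratio = 3.0

posterior_odds = prior_odds_success * likelihood_ratio
p_success = posterior_odds / (1 + posterior_odds)
print(f"Adjusted P(success) = {p_success:.0%}")  # ~25%: above the base rate, still far from a gut-feel 70%
```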
2. "This Time Is Different" Bias
What It Is
Belief that current situation is unique and historical patterns don't apply.
Classic Examples
Financial Markets:
- "This bubble is different" (said before every bubble burst)
- "This time housing prices won't fall"
- "This cryptocurrency is different from previous scams"
Technology:
- "This social network will overtake Facebook" (dozens have failed)
- "This time AI will achieve AGI" (predicted for 60 years)
Startups:
- "Our team is better than average" (90% of founders say this)
- "Our market timing is perfect" (survivorship bias)
Why It Happens
- Narrative fallacy: Humans construct unique stories for everything
- Overconfidence: We overestimate our ability to judge uniqueness
- Availability bias: Recent cases feel more salient than statistical patterns
- Ego: Admitting you're "average" feels bad
The Fix: The Reversal Test
Question: "If someone else claimed their case was unique for the same reasons I'm claiming, would I believe them?"
If NO → You're applying special pleading to yourself
The Reality
- ~95% of cases that feel unique are actually normal
- The 5% that ARE unique still have some reference class (just more abstract)
- Uniqueness is a matter of degree, not binary
3. Overfitting to Small Samples
What It Is
Drawing strong conclusions from limited data points.
Example: The Hot Hand Fallacy
Basketball: A player makes 3 shots in a row → "Hot hand!"
Reality: Random sequences produce streaks; 3 in a row is not evidence of a change in skill.
In Reference Classes
The Trap:
- Finding N = 5 companies similar to yours
- All 5 succeeded
- Concluding 100% success rate
Why It's Wrong:
- Small samples have high variance
- Sampling error dominates signal
- Regression to the mean will occur
The Fix: Minimum Sample Size Rule
Minimum: N ≥ 30 for meaningful statistics
- N < 10: No statistical power
- N = 10-30: Weak signal, wide confidence intervals
- N > 30: Acceptable
- N > 100: Good
- N > 1000: Excellent
If N < 30:
- Widen reference class to get more data
- Acknowledge high uncertainty (wide CI)
- Don't trust extreme base rates from small samples
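To see why small N is so treacherous, here is a rough sketch using the normal-approximation confidence interval for a proportion (the 40% observed success rate is just an example):

```python
import math

def proportion_ci(successes, n, z=1.96):
    """95% normal-approximation confidence interval for a success rate."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return max(0.0, p - z * se), min(1.0, p + z * se)

# The same observed 40% success rate means very different things at different N.
for n in (5, 10, 30, 100, 1000):
    low, high = proportion_ci(round(0.4 * n), n)
    print(f"N={n:>4}: observed 40% -> 95% CI roughly [{low:.0%}, {high:.0%}]")
```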
4. Survivorship Bias
What It Is
Only looking at cases that "survived" and ignoring failures.
Classic Example: WW2 Bomber Armor
Observation: Returning bombers had bullet holes in the wings and tail
Naive conclusion: Reinforce the wings and tail
Truth: Planes shot in the engine didn't return (survivorship bias)
Correct action: Reinforce the engines
In Reference Classes
The Trap:
- "Reference class = Successful tech companies" (Apple, Google, Microsoft)
- Base rate = 100% success
- Ignoring all the failed companies
Why It's Wrong: You're only looking at winners, which biases the base rate upward.
The Fix: Include All Attempts
Correct reference class:
- "All companies that TRIED to do X"
- Not "All companies that SUCCEEDED at X"
Example:
- Wrong: "Companies like Facebook" → 100% success
- Right: "Social network startups 2004-2010" → 5% success
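A toy illustration of how much the denominator matters (all counts are hypothetical):

```python
# Hypothetical counts for "social network startups, 2004-2010"
survivors_we_remember = 10   # the handful of famous successes
all_attempts = 200           # everyone who tried, including the failures

survivor_only_rate = survivors_we_remember / survivors_we_remember  # 100%: denominator excludes failures
all_attempts_rate = survivors_we_remember / all_attempts            # 5%: denominator counts every attempt

print(f"Survivors-only 'base rate': {survivor_only_rate:.0%}")
print(f"All-attempts base rate:     {all_attempts_rate:.0%}")
```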
5. Regression to the Mean Neglect
What It Is
Failing to expect extreme outcomes to return to average.
The Sports Illustrated Jinx
Observation: Athletes on the SI cover often perform worse afterward
Naive explanation: Curse, pressure, jinx
Truth: They made the cover BECAUSE of an extreme (lucky) performance → regression to the mean is inevitable
In Forecasting
The Trap:
- Company has amazing quarter → Predict continued amazing performance
- Ignoring: Some of that was luck, which won't repeat
The Fix: Regression Formula
Predicted = Mean + r × (Observed - Mean)
Where r = the share of performance due to skill: r = skill / (skill + luck)
If 50% luck, 50% skill:
Predicted = Halfway between observed and mean
Examples
Startup with $10M ARR in Year 1:
- Mean for seed startups: $500K ARR
- Observed: $10M (extreme!)
- Likely some luck (viral moment, lucky timing)
- Predicted Year 2: Somewhere between $500K and $10M, not $20M
Student who aces one test:
- Don't predict straight A's (could have been easy test, lucky guessing)
- Predict performance closer to class average
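A short sketch applying the regression formula to the startup example above; the 50/50 skill-luck split (r = 0.5) is an assumption for illustration:

```python
def regress_to_mean(observed, mean, r):
    """Shrink an extreme observation toward the reference-class mean.
    r = fraction of performance attributed to skill (0 = all luck, 1 = all skill)."""
    return mean + r * (observed - mean)

mean_arr = 500_000         # typical seed-stage ARR, from the example above
observed_arr = 10_000_000  # this startup's Year 1 ARR
r = 0.5                    # assumption: half skill, half luck

print(f"Naive extrapolation:  ${observed_arr * 2:,.0f}")                            # $20,000,000
print(f"Regressed prediction: ${regress_to_mean(observed_arr, mean_arr, r):,.0f}")  # $5,250,000
```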
6. Availability Bias in Class Selection
What It Is
Choosing reference class based on what's memorable, not what's statistically valid.
Example: Terrorism Risk
After 9/11:
- Availability: Terrorism is extremely salient
- Many people chose reference class: "Major terrorist attacks on US"
- Base rate felt high (because it was memorable)
Reality:
- Correct reference class: "Risk of death from various causes"
- Terrorism: ~0.0001% annual risk
- Car accidents: ~0.01% annual risk (100× higher)
The Fix
- Don't use "what comes to mind" as reference class
- Use systematic search for historical data
- Weight by frequency, not memorability
7. Confirmation Bias in Reference Class Selection
What It Is
Choosing a reference class that supports your pre-existing belief.
Example: Political Predictions
If you want candidate X to win:
- Choose reference class: "Elections where candidate had high favorability"
- Ignore reference class: "Elections in this demographic"
If you want candidate X to lose:
- Choose reference class: "Candidates with this scandal type"
- Ignore reference class: "Incumbents in strong economy"
The Fix: Blind Selection
Process:
- Choose reference class BEFORE you know the base rate
- Write down selection criteria
- Only then look up the base rate
- Stick with it even if you don't like the answer
8. Ignoring Time Decay
What It Is
Using old data when conditions have changed.
Example: Newspaper Industry
Reference class: "Newspaper companies, 1950-2000"
Base rate: ~90% profitability
Problem: The internet fundamentally changed the industry
Reality in the 2010s: ~10% profitability
When Time Decay Matters
High decay (don't use old data):
- Technology industries (5-year half-life)
- Regulatory changes (e.g., crypto)
- Structural market shifts (e.g., remote work post-COVID)
Low decay (old data OK):
- Human behavior (still same evolutionary psychology)
- Physical laws (still same physics)
- Basic business economics (margins, etc.)
The Fix
- Use last 5-10 years as default
- Segment by era if structural change occurred
- Weight recent data more heavily
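One simple way to "weight recent data more heavily" is exponential decay by age. The sketch below assumes a 5-year half-life and made-up outcome data:

```python
# Each record: (years_ago, succeeded). The data and half-life are illustrative assumptions.
records = [(1, False), (2, False), (4, False), (10, True), (12, True), (15, True), (20, True)]
half_life = 5.0  # assumption: relevance halves every 5 years (cf. the tech half-life above)

def recency_weighted_base_rate(records, half_life):
    weights = [0.5 ** (age / half_life) for age, _ in records]
    weighted_hits = sum(w for w, (_, ok) in zip(weights, records) if ok)
    return weighted_hits / sum(weights)

unweighted = sum(ok for _, ok in records) / len(records)
print(f"Unweighted base rate:       {unweighted:.0%}")  # ~57%: dominated by old successes
print(f"Recency-weighted base rate: {recency_weighted_base_rate(records, half_life):.0%}")  # ~22%
```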
9. Causation Confusion
What It Is
Including features in reference class that correlate but don't cause outcomes.
Example: Ice Cream and Drowning
Observation: Ice cream sales correlate with drowning deaths
Naive reference class: "Days with high ice cream sales"
Base rate: Higher drowning rate
Problem: Ice cream doesn't cause drowning
Confound: Summer weather causes both
In Forecasting
The Trap: Adding irrelevant details to the reference class:
- "Startups founded by left-handed CEOs"
- "Projects started on Tuesdays"
- "People born under Scorpio"
Why It's Wrong: These features don't causally affect outcomes; they just add noise.
The Fix: Causal Screening
Ask: "Does this feature cause different outcomes, or just correlate?"
Include if causal:
- Business model (causes different unit economics)
- Market size (causes different TAM)
- Technology maturity (causes different risk)
Exclude if just correlation:
- Founder's birthday
- Office location aesthetics
- Random demographic details
10. Narrow Framing
What It Is
Defining the reference class too narrowly around the specific case.
Example
You're forecasting: "Will this specific bill pass Congress?"
Too narrow: "Bills with this exact text in this specific session" → N = 1, no data
Better: "Bills of this type in similar political climate" → N = 50, usable data
The Fix
If you can't find data, your reference class is too narrow. Go up one level of abstraction.
11. Extremeness Aversion
What It Is
Reluctance to use extreme base rates (close to 0% or 100%).
Psychological Bias
People feel uncomfortable saying:
- 5% chance
- 95% chance
They retreat to:
- 20% ("to be safe")
- 80% ("to be safe")
This is wrong. If the base rate is 5%, START at 5%.
When Extreme Base Rates Are Correct
- Drug development: 5-10% reach market
- Lottery tickets: 0.0001% chance
- Sun rising tomorrow: 99.9999% chance
The Fix
- Trust the base rate even if it feels extreme
- Extreme base rates are still better than gut feelings
- Use inside view to adjust, but don't automatically moderate
12. Scope Insensitivity
What It Is
Not adjusting forecasts proportionally to scale.
Example: Charity Donations
Study: People were willing to pay the same amount to save:
- 2,000 birds
- 20,000 birds
- 200,000 birds
Problem: Scope doesn't change emotional response
In Reference Classes
The Trap:
- "Startup raising $1M" feels the same as "Startup raising $10M"
- The reference class doesn't distinguish by scale
- So the base rate is wrong
The Fix:
- Segment reference class by scale
- "$1M raises" have different success rate than "$10M raises"
- Make sure scope is specific
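A small sketch of segmenting a reference class by scale before computing base rates (the records are hypothetical):

```python
from collections import defaultdict

# Hypothetical records: (amount_raised_usd, succeeded)
raises = [(1_000_000, True), (800_000, True), (1_200_000, False),
          (9_000_000, False), (10_000_000, False), (12_000_000, True)]

buckets = defaultdict(list)
for amount, succeeded in raises:
    label = "~$1M raises" if amount < 5_000_000 else "~$10M raises"  # crude scale buckets
    buckets[label].append(succeeded)

for label, outcomes in buckets.items():
    rate = sum(outcomes) / len(outcomes)
    print(f"{label}: {rate:.0%} success (N={len(outcomes)})")
```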
Avoiding Pitfalls: Quick Checklist
Before finalizing your reference class, check:
- Not neglecting base rate: Started with statistics, not story
- Not claiming uniqueness: Passed the reversal test
- Sample size ≥ 30: Not overfitting to a small sample
- Included failures: No survivorship bias
- Expected regression: Extreme outcomes won't persist
- Systematic selection: Not relying on availability
- Chose class first: No confirmation bias
- Recent data: Not ignoring time decay
- Causal features only: Not including random correlations
- Right level of abstraction: Not too narrow
- OK with extremes: Not moderating extreme base rates
- Scope-specific: Scale is accounted for
Return to: Main Skill