# Common Pitfalls in Reference Class Forecasting

## The Traps That Even Experts Fall Into

---

## 1. Base Rate Neglect

### What It Is

Ignoring statistical baselines in favor of specific case details.

### Example: The Taxi Problem

**Scenario:** A taxi was involved in a hit-and-run. A witness says it was Blue.

- 85% of taxis in the city are Green
- 15% of taxis are Blue
- The witness is 80% reliable in identifying colors

**Most people say:** 80% chance it was Blue (trusting the witness)

**Correct answer:**

```
P(Blue | Witness says Blue) = (0.80 × 0.15) / (0.80 × 0.15 + 0.20 × 0.85)
                            = 0.12 / 0.29 ≈ 41%
```

Why? The **base rate** (only 15% Blue taxis) dominates the witness's reliability.

### In Forecasting

**The Trap:**

- Focusing on a compelling story about this startup
- Ignoring that ~90% of startups fail

**The Fix:**

- Start with the 90% failure rate
- Then adjust based on specific evidence
- Weight the base rate heavily (especially when it is extreme)

---

## 2. "This Time Is Different" Bias

### What It Is

The belief that the current situation is unique and historical patterns don't apply.

### Classic Examples

**Financial Markets:**

- "This bubble is different" (said before every bubble burst)
- "This time housing prices won't fall"
- "This cryptocurrency is different from previous scams"

**Technology:**

- "This social network will overtake Facebook" (dozens have failed)
- "This time AI will achieve AGI" (predicted for 60 years)

**Startups:**

- "Our team is better than average" (90% of founders say this)
- "Our market timing is perfect" (survivorship bias)

### Why It Happens

1. **Narrative fallacy:** Humans construct unique stories for everything
2. **Overconfidence:** We overestimate our ability to judge uniqueness
3. **Availability bias:** Recent cases feel more salient than statistical patterns
4. **Ego:** Admitting you're "average" feels bad

### The Fix: The Reversal Test

**Question:** "If someone else claimed their case was unique for the same reasons I'm claiming, would I believe them?"

**If NO → you're applying special pleading to yourself.**

### The Reality

- ~95% of cases that feel unique are actually normal
- The 5% that ARE unique still have some reference class (just a more abstract one)
- Uniqueness is a matter of degree, not binary

---

## 3. Overfitting to Small Samples

### What It Is

Drawing strong conclusions from limited data points.

### Example: The Hot Hand Fallacy

**Basketball:** A player makes 3 shots in a row → "Hot hand!"

**Reality:** Random sequences produce streaks. Three in a row is not evidence of a change in skill.

### In Reference Classes

**The Trap:**

- Finding N = 5 companies similar to yours
- All 5 succeeded
- Concluding a 100% success rate

**Why It's Wrong:**

- Small samples have high variance
- Sampling error dominates the signal
- Regression to the mean will occur

### The Fix: Minimum Sample Size Rule

**Minimum:** N ≥ 30 for meaningful statistics

- N < 10: No statistical power
- N = 10-30: Weak signal, wide confidence intervals
- N > 30: Acceptable
- N > 100: Good
- N > 1000: Excellent

**If N < 30:**

1. Widen the reference class to get more data
2. Acknowledge high uncertainty (wide confidence intervals; see the sketch after this list)
3. Don't trust extreme base rates from small samples
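To see how little five data points pin down a base rate, here is a minimal sketch in plain Python. It uses a Wilson score interval as the uncertainty measure, which is one reasonable choice the section above does not prescribe; the function name and example counts are illustrative, not from the text.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion (z=1.96 -> ~95%)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    margin = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return max(0.0, center - margin), min(1.0, center + margin)

# "5 similar companies, all succeeded": the observed rate is 100%,
# but the true rate could plausibly be below 60%.
print(wilson_interval(5, 5))    # ~(0.57, 1.00)
print(wilson_interval(30, 30))  # ~(0.89, 1.00): more data, tighter interval
```

Five straight successes are consistent with a true rate barely above a coin flip, which is why small-N base rates should carry wide confidence intervals.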
---

## 4. Survivorship Bias

### What It Is

Only looking at cases that "survived" and ignoring the failures.

### Classic Example: WW2 Bomber Armor

**Observation:** Returning bombers had bullet holes in the wings and tail

**Naive conclusion:** Reinforce the wings and tail

**Truth:** Planes shot in the engine didn't return (survivorship bias)

**Correct action:** Reinforce the engines

### In Reference Classes

**The Trap:**

- "Reference class = successful tech companies" (Apple, Google, Microsoft)
- Base rate = 100% success
- Ignoring all the failed companies

**Why It's Wrong:** You're only looking at winners, which biases the base rate upward.

### The Fix: Include All Attempts

**Correct reference class:**

- "All companies that TRIED to do X"
- Not "All companies that SUCCEEDED at X"

**Example:**

- Wrong: "Companies like Facebook" → 100% success
- Right: "Social network startups 2004-2010" → 5% success

---

## 5. Regression to the Mean Neglect

### What It Is

Failing to expect extreme outcomes to return toward the average.

### The Sports Illustrated Jinx

**Observation:** Athletes on the SI cover often perform worse afterward

**Naive explanation:** Curse, pressure, jinx

**Truth:** They were on the cover BECAUSE of an extreme (lucky) performance → regression to the mean is inevitable

### In Forecasting

**The Trap:**

- Company has an amazing quarter → predict continued amazing performance
- Ignoring: Some of that was luck, which won't repeat

**The Fix: Regression Formula**

```
Predicted = Mean + r × (Observed - Mean)
```

Where `r` = skill / (skill + luck)

**If 50% luck, 50% skill:**

```
Predicted = Halfway between observed and mean
```

### Examples

**Startup with $10M ARR in Year 1:**

- Mean for seed startups: $500K ARR
- Observed: $10M (extreme!)
- Likely some luck (viral moment, lucky timing)
- Predicted Year 2: somewhere between $500K and $10M, not $20M (worked through in the sketch after this section)

**Student who aces one test:**

- Don't predict straight A's (could have been an easy test, lucky guessing)
- Predict performance closer to the class average
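A minimal sketch of the regression formula applied to the startup example above. The r = 0.5 skill share is the 50/50 luck-vs-skill assumption from the text, not a measured value, and the function name is illustrative.

```python
def regress_to_mean(observed: float, mean: float, r: float) -> float:
    """Predicted = Mean + r * (Observed - Mean), where r = skill / (skill + luck)."""
    return mean + r * (observed - mean)

# Seed startup: $10M ARR observed vs. a $500K reference-class mean.
# Assuming half the outperformance was luck (r = 0.5):
print(regress_to_mean(10_000_000, 500_000, r=0.5))  # 5_250_000.0 -> ~$5.3M, not $20M
```

Note the limiting cases: r = 1 (pure skill) predicts the observed $10M again, while r = 0 (pure luck) predicts full collapse back to the $500K mean.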
---

## 6. Availability Bias in Class Selection

### What It Is

Choosing a reference class based on what's memorable, not what's statistically valid.

### Example: Terrorism Risk

**After 9/11:**

- Availability: Terrorism was extremely salient
- Many people chose the reference class "major terrorist attacks on the US"
- The base rate felt high (because it was memorable)

**Reality:**

- Correct reference class: "Risk of death from various causes"
- Terrorism: ~0.0001% annual risk
- Car accidents: ~0.01% annual risk (100× higher)

### The Fix

1. **Don't use "what comes to mind" as the reference class**
2. **Use a systematic search** for historical data
3. **Weight by frequency, not memorability**

---

## 7. Confirmation Bias in Reference Class Selection

### What It Is

Choosing a reference class that supports your pre-existing belief.

### Example: Political Predictions

**If you want candidate X to win:**

- Choose reference class: "Elections where the candidate had high favorability"
- Ignore reference class: "Elections in this demographic"

**If you want candidate X to lose:**

- Choose reference class: "Candidates with this scandal type"
- Ignore reference class: "Incumbents in a strong economy"

### The Fix: Blind Selection

**Process:**

1. Choose the reference class BEFORE you know the base rate
2. Write down the selection criteria
3. Only then look up the base rate
4. Stick with it even if you don't like the answer

---

## 8. Ignoring Time Decay

### What It Is

Using old data when conditions have changed.

### Example: Newspaper Industry

**Reference class:** "Newspaper companies, 1950-2000"

**Base rate:** ~90% profitability

**Problem:** The internet fundamentally changed the industry

**Reality in the 2010s:** ~10% profitability

### When Time Decay Matters

**High decay (don't use old data):**

- Technology industries (~5-year half-life)
- Regulatory changes (e.g., crypto)
- Structural market shifts (e.g., remote work post-COVID)

**Low decay (old data OK):**

- Human behavior (still the same evolutionary psychology)
- Physical laws (still the same physics)
- Basic business economics (margins, etc.)

### The Fix

1. **Use the last 5-10 years** as the default
2. **Segment by era** if a structural change occurred
3. **Weight recent data more heavily** (see the sketch after this list)
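One way to implement "weight recent data more heavily" is exponential half-life weighting, a minimal sketch assuming the 5-year half-life the text mentions for technology industries. The function name and the data points are hypothetical, for illustration only.

```python
def decayed_base_rate(observations, current_year, half_life_years=5.0):
    """Weighted base rate where each observation's weight halves every
    `half_life_years`. `observations` is a list of (year, succeeded) pairs
    with succeeded as 1 or 0."""
    total_weight = weighted_successes = 0.0
    for year, succeeded in observations:
        weight = 0.5 ** ((current_year - year) / half_life_years)
        total_weight += weight
        weighted_successes += weight * succeeded
    return weighted_successes / total_weight

# Hypothetical outcomes: old successes, recent failures.
data = [(2005, 1), (2008, 1), (2012, 1), (2019, 0), (2022, 0), (2024, 0)]
print(decayed_base_rate(data, current_year=2025))  # ~0.14 vs. a naive 3/6 = 0.50
```

Because the recent failures carry most of the weight, the decayed base rate (~14%) tells a very different story than the unweighted one (50%), which is the point of segmenting or down-weighting old eras.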
---

## 9. Causation Confusion

### What It Is

Including features in the reference class that correlate with outcomes but don't cause them.

### Example: Ice Cream and Drowning

**Observation:** Ice cream sales correlate with drowning deaths

**Naive reference class:** "Days with high ice cream sales"

**Base rate:** Higher drowning rate

**Problem:** Ice cream doesn't cause drowning

**Confound:** Summer weather causes both

### In Forecasting

**The Trap:** Adding irrelevant details to the reference class:

- "Startups founded by left-handed CEOs"
- "Projects started on Tuesdays"
- "People born under Scorpio"

**Why It's Wrong:** These features don't causally affect outcomes; they just add noise.

### The Fix: Causal Screening

**Ask:** "Does this feature **cause** different outcomes, or just **correlate** with them?"

**Include if causal:**

- Business model (causes different unit economics)
- Market size (causes different TAM)
- Technology maturity (causes different risk)

**Exclude if just correlation:**

- Founder's birthday
- Office location aesthetics
- Random demographic details

---

## 10. Narrow Framing

### What It Is

Defining the reference class too narrowly around the specific case.

### Example

**You're forecasting:** "Will this specific bill pass Congress?"

**Too narrow:** "Bills with this exact text in this specific session" → N = 1, no data

**Better:** "Bills of this type in a similar political climate" → N = 50, usable data

### The Fix

If you can't find data, your reference class is too narrow. Go up one level of abstraction.

---

## 11. Extremeness Aversion

### What It Is

Reluctance to use extreme base rates (close to 0% or 100%).

### Psychological Bias

**People feel uncomfortable saying:**

- 5% chance
- 95% chance

**They retreat to:**

- 20% ("to be safe")
- 80% ("to be safe")

**This is wrong.** If the base rate is 5%, START at 5%.

### When Extreme Base Rates Are Correct

- Drug development: 5-10% of candidates reach market
- Lottery tickets: 0.0001% chance
- Sun rising tomorrow: 99.9999% chance

### The Fix

- **Trust the base rate** even if it feels extreme
- Extreme base rates are still better than gut feelings
- Use the inside view to adjust, but don't automatically moderate

---

## 12. Scope Insensitivity

### What It Is

Not adjusting forecasts proportionally to scale.

### Example: Charity Donations

**Study:** People were willing to pay the same amount to save:

- 2,000 birds
- 20,000 birds
- 200,000 birds

**Problem:** Scope doesn't change the emotional response

### In Reference Classes

**The Trap:**

- "Startup raising $1M" feels the same as "Startup raising $10M"
- The reference class doesn't distinguish by scale
- The base rate is wrong

**The Fix:**

- Segment the reference class by scale
- "$1M raises" have a different success rate than "$10M raises"
- Make sure the scope is specific

---

## Avoiding Pitfalls: Quick Checklist

Before finalizing your reference class, check:

- [ ] **Not neglecting the base rate** - Started with statistics, not story
- [ ] **Not claiming uniqueness** - Passed the reversal test
- [ ] **Sample size ≥ 30** - Not overfitting to a small sample
- [ ] **Included failures** - No survivorship bias
- [ ] **Expected regression** - Extreme outcomes won't persist
- [ ] **Systematic selection** - Not using availability bias
- [ ] **Chose class first** - Not confirmation bias
- [ ] **Recent data** - Not ignoring time decay
- [ ] **Causal features only** - Not including random correlations
- [ ] **Right level of abstraction** - Not too narrow
- [ ] **OK with extremes** - Not moderating extreme base rates
- [ ] **Scope-specific** - Scale is accounted for

---

**Return to:** [Main Skill](../SKILL.md#interactive-menu)