Initial commit

Zhongwei Li
2025-11-30 08:38:26 +08:00
commit 41d9f6b189
304 changed files with 98322 additions and 0 deletions

---
name: reference-class-forecasting
description: Use when starting a forecast to establish a statistical baseline (base rate) before analyzing specifics. Invoke when you need to anchor predictions in historical reality, avoid "this time is different" bias, or establish the outside view before inside-view analysis. Use when the user mentions base rates, reference classes, the outside view, or starting a new prediction.
---
# Reference Class Forecasting
## Table of Contents
- [What is Reference Class Forecasting?](#what-is-reference-class-forecasting)
- [When to Use This Skill](#when-to-use-this-skill)
- [Interactive Menu](#interactive-menu)
- [Quick Reference](#quick-reference)
- [Resource Files](#resource-files)
---
## What is Reference Class Forecasting?
Reference class forecasting is the practice of anchoring predictions in **historical reality** by identifying a class of similar past events and using their statistical frequency as a starting point. This is the "Outside View" - looking at what usually happens to things like this, before getting distracted by the specific details of "this case."
**Core Principle:** Assume this event is **average** until you have specific evidence proving otherwise.
**Why It Matters:**
- Defeats "inside view" bias (thinking your case is unique)
- Prevents base rate neglect (ignoring statistical baselines)
- Provides objective anchor before subjective analysis
- Forces humility and statistical thinking
---
## When to Use This Skill
Use this skill when:
- **Starting any forecast** - Establish base rate FIRST
- **Someone says "this time is different"** - Test if it really is
- **Making predictions about success/failure** - Find historical frequencies
- **Evaluating startup/project outcomes** - Anchor in class statistics
- **Challenged by confident predictions** - Ground in reality
- **Before detailed analysis** - Get outside view baseline
Do NOT use when:
- Event has literally never happened (novel situation)
- Working with deterministic physical laws
- Pure chaos with no patterns
---
## Interactive Menu
**What would you like to do?**
### Core Workflows
**1. [Find My Base Rate](#1-find-my-base-rate)** - Identify reference class and get statistical baseline
- Guided process to select correct reference class
- Search strategies for finding historical frequencies
- Validation that you have the right anchor
**2. [Test "This Time Is Different"](#2-test-this-time-is-different)** - Challenge uniqueness claims
- Reversal test for uniqueness bias
- Similarity matching framework
- Burden of proof calculator
**3. [Calculate Funnel Base Rates](#3-calculate-funnel-base-rates)** - Multi-stage probability chains
- When no single base rate exists
- Sequential probability modeling
- Product rule for compound events
**4. [Validate My Reference Class](#4-validate-my-reference-class)** - Ensure you chose the right comparison set
- Too broad vs too narrow test
- Homogeneity check
- Sample size evaluation
**5. [Learn the Framework](#5-learn-the-framework)** - Deep dive into methodology
- Read [Outside View Principles](resources/outside-view-principles.md)
- Read [Reference Class Selection Guide](resources/reference-class-selection.md)
- Read [Common Pitfalls](resources/common-pitfalls.md)
**6. Exit** - Return to main forecasting workflow
---
## 1. Find My Base Rate
**Let's establish your statistical baseline.**
### Step 1: What are you forecasting?
Tell me the specific event or outcome you're predicting.
**Example prompts:**
- "Will this startup succeed?"
- "Will this bill pass Congress?"
- "Will this project launch on time?"
---
### Step 2: Identify the Reference Class
I'll help you identify what bucket this belongs to.
**Framework:**
- **Too broad:** "All companies" → meaningless
- **Just right:** "Seed-stage B2B SaaS startups in fintech"
- **Too narrow:** "Companies founded by people named Steve in 2024" → no data
**Key Questions:**
1. What type of entity is this? (company, bill, project, person, etc.)
2. What stage/size/category?
3. What industry/domain?
4. What time period is relevant?
I'll work with you to refine this until we have a specific, searchable class.
---
### Step 3: Search for Historical Data
I'll help you find the base rate using:
- **Web search** for published statistics
- **Academic studies** on success rates
- **Government/industry reports**
- **Proxy metrics** if direct data unavailable
**Search Strategy:**
```
"historical success rate of [reference class]"
"[reference class] failure statistics"
"[reference class] survival rate"
"what percentage of [reference class]"
```
---
### Step 4: Set Your Anchor
Once we find the base rate, that becomes your **starting probability**.
**The Rule:**
> You are NOT allowed to move from this base rate until you have specific,
> evidence-based reasons in your "inside view" analysis.
**Default anchors if no data found:**
- Novel innovation: 10-20% (most innovations fail)
- Established industry: 50% (uncertain)
- Regulated/proven process: 70-80% (systems work)
**Next:** Return to [menu](#interactive-menu) or proceed to inside view analysis.
---
## 2. Test "This Time Is Different"
**Challenge uniqueness bias.**
When someone (including yourself) believes "this case is special," we need to stress-test that belief.
### The Uniqueness Audit
**Question 1: Similarity Matching**
- What are 5 historical cases that are most similar to this one?
- For each, what was the outcome?
- How is your case materially different from these?
**Question 2: The Reversal Test**
- If someone claimed a different case was "unique" for the same reasons you're claiming, would you accept it?
- Are you applying special pleading?
**Question 3: Burden of Proof**
The base rate says [X]%. You claim it should be [Y]%.
Calculate the gap: `|Y - X|`
**Required evidence strength:**
- Gap < 10%: Minimal evidence needed
- Gap 10-30%: Moderate evidence needed (2-3 specific factors)
- Gap > 30%: Extraordinary evidence needed (multiple independent strong signals)
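A minimal Python sketch of this burden-of-proof check, using the evidence tiers above (the function name and the percentage-point convention are illustrative):
```python
def evidence_required(base_rate: float, claimed: float) -> str:
    """Map the gap between the base rate and the claimed probability to an evidence tier."""
    gap = abs(claimed - base_rate)  # both expressed in percentage points, e.g. 10 for 10%
    if gap < 10:
        return "Minimal evidence needed"
    if gap <= 30:
        return "Moderate evidence needed (2-3 specific factors)"
    return "Extraordinary evidence needed (multiple independent strong signals)"

print(evidence_required(base_rate=10, claimed=45))  # gap of 35 points -> extraordinary evidence
```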
### Output
I'll tell you:
1. Whether "this time is different" is justified
2. How much you can reasonably adjust from the base rate
3. What evidence would be needed to justify larger moves
**Next:** Return to [menu](#interactive-menu)
---
## 3. Calculate Funnel Base Rates
**For multi-stage processes without a single base rate.**
### When to Use
- No direct statistic exists (e.g., "success rate of X")
- Event requires multiple sequential steps
- Each stage has independent probabilities
### The Funnel Method
**Example: "Will Bill X become law?"**
No direct data on "Bill X success rate," but we can model the funnel:
1. **Stage 1:** Bills introduced → Bills that reach committee
- P(committee | introduced) = ?
2. **Stage 2:** Bills in committee → Bills that reach floor vote
- P(floor | committee) = ?
3. **Stage 3:** Bills voted on → Bills that pass
- P(pass | floor vote) = ?
**Final Base Rate:**
```
P(law) = P(committee | introduced) × P(floor | committee) × P(pass | floor)
```
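A minimal Python sketch of the product-rule calculation; the stage probabilities below are illustrative placeholders, not real legislative statistics:
```python
# Product rule over the legislative funnel (stage rates are assumptions for illustration)
p_committee_given_introduced = 0.30
p_floor_given_committee = 0.50
p_pass_given_floor = 0.70

p_law = p_committee_given_introduced * p_floor_given_committee * p_pass_given_floor
print(f"Base rate for becoming law: {p_law:.1%}")  # 10.5%
```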
### Process
I'll help you:
1. **Decompose** the event into sequential stages
2. **Search** for statistics on each stage
3. **Multiply** probabilities using the product rule
4. **Validate** the model (are stages truly independent?)
### Common Funnels
- Startup success: Seed → Series A → Profitability → Exit
- Drug approval: Discovery → Trials → FDA → Market
- Project delivery: Planning → Development → Testing → Launch
**Next:** Return to [menu](#interactive-menu)
---
## 4. Validate My Reference Class
**Ensure you chose the right comparison set.**
### The Three Tests
**Test 1: Homogeneity**
- Are the members of this class actually similar enough?
- Is there high variance in outcomes?
- Should you subdivide further?
**Example:** "Tech startups" is too broad (consumer vs B2B vs hardware are very different). Subdivide.
---
**Test 2: Sample Size**
- Do you have enough historical cases?
- Minimum: 20-30 cases for meaningful statistics
- If N < 20: Widen the class or acknowledge high uncertainty
---
**Test 3: Relevance**
- Have conditions changed since the historical data?
- Are there structural differences (regulation, technology, market)?
- Time decay: Data from >10 years ago may be stale
### Validation Checklist
I'll walk you through:
- [ ] Class has 20+ historical examples
- [ ] Members are reasonably homogeneous
- [ ] Data is from relevant time period
- [ ] No major structural changes since data collection
- [ ] Class is specific enough to be meaningful
- [ ] Class is broad enough to have data
**Output:** Confidence level in your reference class (High/Medium/Low)
**Next:** Return to [menu](#interactive-menu)
---
## 5. Learn the Framework
**Deep dive into the methodology.**
### Resource Files
📄 **[Outside View Principles](resources/outside-view-principles.md)**
- Statistical thinking vs narrative thinking
- Why the outside view beats experts
- Kahneman's planning fallacy research
- When outside view fails
📄 **[Reference Class Selection Guide](resources/reference-class-selection.md)**
- Systematic method for choosing comparison sets
- Balancing specificity vs data availability
- Similarity metrics and matching
- Edge cases and judgment calls
📄 **[Common Pitfalls](resources/common-pitfalls.md)**
- Base rate neglect examples
- "This time is different" bias
- Overfitting to small samples
- Ignoring regression to the mean
- Availability bias in class selection
**Next:** Return to [menu](#interactive-menu)
---
## Quick Reference
### The Outside View Commandments
1. **Base Rate First:** Establish statistical baseline BEFORE analyzing specifics
2. **Assume Average:** Treat case as typical until proven otherwise
3. **Burden of Proof:** Large deviations from base rate require strong evidence
4. **Class Precision:** Reference class should be specific but data-rich
5. **No Narratives:** Resist compelling stories; trust frequencies
### One-Sentence Summary
> Find what usually happens to things like this, start there, and only move with evidence.
### Integration with Other Skills
- **Before:** Use `estimation-fermi` if you need to calculate base rate from components
- **After:** Use `bayesian-reasoning-calibration` to update from base rate with new evidence
- **Companion:** Use `scout-mindset-bias-check` to validate you're not cherry-picking the reference class
---
## Resource Files
📁 **resources/**
- [outside-view-principles.md](resources/outside-view-principles.md) - Theory and research
- [reference-class-selection.md](resources/reference-class-selection.md) - Systematic selection method
- [common-pitfalls.md](resources/common-pitfalls.md) - What to avoid
---
**Ready to start? Choose a number from the [menu](#interactive-menu) above.**

# Common Pitfalls in Reference Class Forecasting
## The Traps That Even Experts Fall Into
---
## 1. Base Rate Neglect
### What It Is
Ignoring statistical baselines in favor of specific case details.
### Example: The Taxi Problem
**Scenario:** A taxi was involved in a hit-and-run. Witness says it was Blue.
- 85% of taxis in city are Green
- 15% of taxis are Blue
- Witness is 80% reliable in identifying colors
**Most people say:** 80% chance it was Blue (trusting the witness)
**Correct answer:**
```
P(Blue | Witness says Blue) = 41%
```
Why? The low **base rate** (only 15% of taxis are Blue) outweighs the witness's 80% reliability and pulls the probability well below 80%.
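A minimal Python sketch of the Bayes' rule calculation behind that 41% figure:
```python
# Bayes' rule for the taxi problem: P(Blue | witness says Blue)
p_blue, p_green = 0.15, 0.85     # base rates of taxi colors
p_say_blue_if_blue = 0.80        # witness correctly identifies a Blue taxi
p_say_blue_if_green = 0.20       # witness mistakes a Green taxi for Blue

posterior = (p_say_blue_if_blue * p_blue) / (
    p_say_blue_if_blue * p_blue + p_say_blue_if_green * p_green
)
print(f"P(Blue | witness says Blue) = {posterior:.0%}")  # 41%
```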
### In Forecasting
**The Trap:**
- Focusing on compelling story about this startup
- Ignoring that 90% of startups fail
**The Fix:**
- Start with 90% failure rate
- Then adjust based on specific evidence
- Weight base rate heavily (especially when base rate is extreme)
---
## 2. "This Time Is Different" Bias
### What It Is
Belief that current situation is unique and historical patterns don't apply.
### Classic Examples
**Financial Markets:**
- "This bubble is different" (said before every bubble burst)
- "This time housing prices won't fall"
- "This cryptocurrency is different from previous scams"
**Technology:**
- "This social network will overtake Facebook" (dozens have failed)
- "This time AI will achieve AGI" (predicted for 60 years)
**Startups:**
- "Our team is better than average" (90% of founders say this)
- "Our market timing is perfect" (survivorship bias)
### Why It Happens
1. **Narrative fallacy:** Humans construct unique stories for everything
2. **Overconfidence:** We overestimate our ability to judge uniqueness
3. **Availability bias:** Recent cases feel more salient than statistical patterns
4. **Ego:** Admitting you're "average" feels bad
### The Fix: The Reversal Test
**Question:** "If someone else claimed their case was unique for the same reasons I'm claiming, would I believe them?"
**If NO → You're applying special pleading to yourself**
### The Reality
- ~95% of cases that feel unique are actually normal
- The 5% that ARE unique still have some reference class (just more abstract)
- Uniqueness is a matter of degree, not binary
---
## 3. Overfitting to Small Samples
### What It Is
Drawing strong conclusions from limited data points.
### Example: The Hot Hand Fallacy
**Basketball:** Player makes 3 shots in a row → "Hot hand!"
**Reality:** Random sequences produce streaks. 3-in-a-row is not evidence of skill change.
### In Reference Classes
**The Trap:**
- Finding N = 5 companies similar to yours
- All 5 succeeded
- Concluding 100% success rate
**Why It's Wrong:**
- Small samples have high variance
- Sampling error dominates signal
- Regression to the mean will occur
### The Fix: Minimum Sample Size Rule
**Minimum:** N ≥ 30 for meaningful statistics
- N < 10: No statistical power
- N = 10-30: Weak signal, wide confidence intervals
- N > 30: Acceptable
- N > 100: Good
- N > 1000: Excellent
**If N < 30:**
1. Widen reference class to get more data
2. Acknowledge high uncertainty (wide CI)
3. Don't trust extreme base rates from small samples
---
## 4. Survivorship Bias
### What It Is
Only looking at cases that "survived" and ignoring failures.
### Classic Example: WW2 Bomber Armor
**Observation:** Returning bombers had bullet holes in wings and tail
**Naive conclusion:** Reinforce wings and tail
**Truth:** Planes shot in the engine didn't return (survivorship bias)
**Correct action:** Reinforce engines
### In Reference Classes
**The Trap:**
- "Reference class = Successful tech companies" (Apple, Google, Microsoft)
- Base rate = 100% success
- Ignoring all the failed companies
**Why It's Wrong:**
You're only looking at winners, which biases the base rate upward.
### The Fix: Include All Attempts
**Correct reference class:**
- "All companies that TRIED to do X"
- Not "All companies that SUCCEEDED at X"
**Example:**
- Wrong: "Companies like Facebook" → 100% success
- Right: "Social network startups 2004-2010" → 5% success
---
## 5. Regression to the Mean Neglect
### What It Is
Failing to expect extreme outcomes to return to average.
### The Sports Illustrated Jinx
**Observation:** Athletes on SI cover often perform worse after
**Naive explanation:** Curse, pressure, jinx
**Truth:** They were on cover BECAUSE of extreme (lucky) performance → Regression to mean is inevitable
### In Forecasting
**The Trap:**
- Company has amazing quarter → Predict continued amazing performance
- Ignoring: Some of that was luck, which won't repeat
**The Fix: Regression Formula**
```
Predicted = Mean + r × (Observed - Mean)
```
Where `r` = skill/(skill + luck)
**If 50% luck, 50% skill:**
```
Predicted = Halfway between observed and mean
```
### Examples
**Startup with $10M ARR in Year 1:**
- Mean for seed startups: $500K ARR
- Observed: $10M (extreme!)
- Likely some luck (viral moment, lucky timing)
- Predicted Year 2: Somewhere between $500K and $10M, not $20M
**Student who aces one test:**
- Don't predict straight A's (could have been easy test, lucky guessing)
- Predict performance closer to class average
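A minimal Python sketch of the regression adjustment applied to the ARR example above, assuming a 50/50 skill-luck split (r = 0.5):
```python
# Regression-to-the-mean adjustment for the Year 1 ARR example
mean_arr = 500_000          # class average Year 1 ARR for seed startups
observed_arr = 10_000_000   # this startup's extreme Year 1 ARR
r = 0.5                     # assumed skill share: 50% skill, 50% luck

predicted_next = mean_arr + r * (observed_arr - mean_arr)
print(f"Predicted: ${predicted_next:,.0f}")  # $5,250,000, between the mean and the observed extreme
```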
---
## 6. Availability Bias in Class Selection
### What It Is
Choosing reference class based on what's memorable, not what's statistically valid.
### Example: Terrorism Risk
**After 9/11:**
- Availability: Terrorism is extremely salient
- Many people chose reference class: "Major terrorist attacks on US"
- Base rate felt high (because it was memorable)
**Reality:**
- Correct reference class: "Risk of death from various causes"
- Terrorism: ~0.0001% annual risk
- Car accidents: ~0.01% annual risk (100× higher)
### The Fix
1. **Don't use "what comes to mind" as reference class**
2. **Use systematic search** for historical data
3. **Weight by frequency, not memorability**
---
## 7. Confirmation Bias in Reference Class Selection
### What It Is
Choosing a reference class that supports your pre-existing belief.
### Example: Political Predictions
**If you want candidate X to win:**
- Choose reference class: "Elections where candidate had high favorability"
- Ignore reference class: "Elections in this demographic"
**If you want candidate X to lose:**
- Choose reference class: "Candidates with this scandal type"
- Ignore reference class: "Incumbents in strong economy"
### The Fix: Blind Selection
**Process:**
1. Choose reference class BEFORE you know the base rate
2. Write down selection criteria
3. Only then look up the base rate
4. Stick with it even if you don't like the answer
---
## 8. Ignoring Time Decay
### What It Is
Using old data when conditions have changed.
### Example: Newspaper Industry
**Reference class:** "Newspaper companies, 1950-2000"
**Base rate:** ~90% profitability
**Problem:** Internet fundamentally changed the industry
**Reality in 2010s:** ~10% profitability
### When Time Decay Matters
**High decay (don't use old data):**
- Technology industries (5-year half-life)
- Regulatory changes (e.g., crypto)
- Structural market shifts (e.g., remote work post-COVID)
**Low decay (old data OK):**
- Human behavior (still same evolutionary psychology)
- Physical laws (still same physics)
- Basic business economics (margins, etc.)
### The Fix
1. **Use last 5-10 years** as default
2. **Segment by era** if structural change occurred
3. **Weight recent data more heavily**
---
## 9. Causation Confusion
### What It Is
Including features in reference class that correlate but don't cause outcomes.
### Example: Ice Cream and Drowning
**Observation:** Ice cream sales correlate with drowning deaths
**Naive reference class:** "Days with high ice cream sales"
**Base rate:** Higher drowning rate
**Problem:** Ice cream doesn't cause drowning
**Confound:** Summer weather causes both
### In Forecasting
**The Trap:**
Adding irrelevant details to reference class:
- "Startups founded by left-handed CEOs"
- "Projects started on Tuesdays"
- "People born under Scorpio"
**Why It's Wrong:**
These features don't causally affect outcomes, they just add noise.
### The Fix: Causal Screening
**Ask:** "Does this feature **cause** different outcomes, or just **correlate**?"
**Include if causal:**
- Business model (causes different unit economics)
- Market size (causes different TAM)
- Technology maturity (causes different risk)
**Exclude if just correlation:**
- Founder's birthday
- Office location aesthetics
- Random demographic details
---
## 10. Narrow Framing
### What It Is
Defining the reference class too narrowly around the specific case.
### Example
**You're forecasting:** "Will this specific bill pass Congress?"
**Too narrow:** "Bills with this exact text in this specific session"
→ N = 1, no data
**Better:** "Bills of this type in similar political climate"
→ N = 50, usable data
### The Fix
If you can't find data, your reference class is too narrow. Go up one level of abstraction.
---
## 11. Extremeness Aversion
### What It Is
Reluctance to use extreme base rates (close to 0% or 100%).
### Psychological Bias
**People feel uncomfortable saying:**
- 5% chance
- 95% chance
**They retreat to:**
- 20% ("to be safe")
- 80% ("to be safe")
**This is wrong.** If the base rate is 5%, START at 5%.
### When Extreme Base Rates Are Correct
- Drug development: 5-10% reach market
- Lottery tickets: 0.0001% chance
- Sun rising tomorrow: 99.9999% chance
### The Fix
- **Trust the base rate** even if it feels extreme
- Extreme base rates are still better than gut feelings
- Use inside view to adjust, but don't automatically moderate
---
## 12. Scope Insensitivity
### What It Is
Not adjusting forecasts proportionally to scale.
### Example: Charity Donations
**Study:** People were willing to pay the same amount to save:
- 2,000 birds
- 20,000 birds
- 200,000 birds
**Problem:** Scope doesn't change emotional response
### In Reference Classes
**The Trap:**
- "Startup raising $1M" feels same as "Startup raising $10M"
- Reference class doesn't distinguish by scale
- Base rate is wrong
**The Fix:**
- Segment reference class by scale
- "$1M raises" have different success rate than "$10M raises"
- Make sure scope is specific
---
## Avoiding Pitfalls: Quick Checklist
Before finalizing your reference class, check:
- [ ] **Not neglecting base rate** - Started with statistics, not story
- [ ] **Not claiming uniqueness** - Passed reversal test
- [ ] **Sample size ≥ 30** - Not overfitting to small sample
- [ ] **Included failures** - No survivorship bias
- [ ] **Expected regression** - Extreme outcomes won't persist
- [ ] **Systematic selection** - Not using availability bias
- [ ] **Chose class first** - Not confirmation bias
- [ ] **Recent data** - Not ignoring time decay
- [ ] **Causal features only** - Not including random correlations
- [ ] **Right level of abstraction** - Not too narrow
- [ ] **OK with extremes** - Not moderating extreme base rates
- [ ] **Scope-specific** - Scale is accounted for
---
**Return to:** [Main Skill](../SKILL.md#interactive-menu)

View File

@@ -0,0 +1,219 @@
# Outside View Principles
## Theory and Foundation
### What is the Outside View?
The **Outside View** is a forecasting method that relies on statistical baselines from similar historical cases rather than detailed analysis of the specific case at hand.
**Coined by:** Daniel Kahneman and Amos Tversky
**Alternative names:** Reference class forecasting, actuarial prediction, statistical prediction
---
## The Two Views Framework
### Inside View (The Trap)
- Focuses on unique details of the specific case
- Constructs causal narratives
- Emphasizes what makes "this time different"
- Relies on expert intuition and judgment
- Feels more satisfying and controllable
**Example:** "Our startup will succeed because we have a great team, unique technology, strong market timing, and passionate founders."
### Outside View (The Discipline)
- Focuses on statistical patterns from similar cases
- Ignores unique narratives initially
- Emphasizes what usually happens to things like this
- Relies on base rates and frequencies
- Feels cold and impersonal
**Example:** "Seed-stage B2B SaaS startups have a 10% success rate. We start at 10%."
---
## Why the Outside View Wins
### Research Evidence
**The Planning Fallacy Study (Kahneman)**
- Students asked to predict thesis completion time
- Inside view: Average prediction = 33 days
- Actual average: 55 days
- Outside view (based on past students): 48 days
- **Result:** The outside view's error (~7 days) was about a third of the inside view's (~22 days)
**Expert Predictions vs Base Rates**
- Expert forecasters using inside view: 60% accuracy
- Simple base rate models: 70% accuracy
- **Result:** Simple statistical baselines beat expert judgment
**Why Experts Fail:**
1. **Overweight unique details** (availability bias)
2. **Construct plausible narratives** (hindsight bias)
3. **Underweight statistical patterns** (base rate neglect)
4. **Overconfident in causal understanding** (illusion of control)
---
## When Outside View Works Best
### High-Signal Situations
✓ Large historical datasets exist
✓ Cases are reasonably similar
✓ Outcomes are measurable
✓ No major structural changes
✓ Randomness plays a significant role
**Examples:**
- Startup success rates
- Construction project delays
- Drug approval timelines
- Movie box office performance
- Sports team performance
---
## When Outside View Fails
### Low-Signal Situations
✗ Truly novel events (no reference class)
✗ Structural regime changes (e.g., new technology disrupts all patterns)
✗ Extremely heterogeneous reference class
✗ Small sample sizes (N < 20)
✗ Deterministic physics-based systems
**Examples:**
- First moon landing (no reference class)
- Pandemic with novel pathogen (limited reference class)
- Cryptocurrency regulation (regime change)
- Your friend's personality (N = 1)
**What to do:** Use outside view as starting point, then heavily weight specific evidence
---
## Statistical Thinking vs Narrative Thinking
### Narrative Thinking (Human Default)
- Brain constructs causal stories
- Connects dots into coherent explanations
- Feels satisfying and convincing
- **Problem:** Narratives are selected for coherence, not accuracy
**Example narrative:** "Startup X will fail because the CEO is inexperienced, the market is crowded, and they're burning cash."
This might be true, but:
- Experienced CEOs also fail
- Crowded markets have winners
- Cash burn is normal for startups
The narrative cherry-picks evidence.
### Statistical Thinking (Discipline Required)
- Brain resists cold numbers
- Requires active effort to override intuition
- Feels unsatisfying and reductive
- **Benefit:** Statistics aggregate all past evidence, not just confirming cases
**Example statistical:** "80% of startups with this profile fail within 3 years. Start at 80% failure probability."
---
## The Planning Fallacy in Depth
### What It Is
Systematic tendency to underestimate time, costs, and risks while overestimating benefits.
### Why It Happens
1. **Focus on success plan:** Ignore failure modes
2. **Best-case scenario bias:** Assume things go smoothly
3. **Neglect of base rates:** "Our project is different"
4. **Anchoring on ideal conditions:** Forget reality intrudes
### The Fix: Outside View
Instead of asking "How long will our project take?" ask:
- "How long did similar projects take?"
- "What was the distribution of outcomes?"
- "What percentage ran late? By how much?"
**Rule:** Assume your project is **average** for its class until proven otherwise.
---
## Regression to the Mean
### The Phenomenon
Extreme outcomes tend to be followed by more average outcomes.
**Examples:**
- Hot hand in basketball → Returns to average
- Stellar quarterly earnings → Next quarter closer to mean
- Brilliant startup idea → Execution regresses to mean
### Implication for Forecasting
If you're predicting based on an extreme observation:
- **Adjust toward the mean** unless you have evidence the extreme is sustainable
- Extreme outcomes are often luck + skill; luck doesn't persist
**Formula:**
```
Predicted = Mean + r × (Observed - Mean)
```
Where `r` = correlation (skill component)
If 50% skill, 50% luck → r = 0.5 → Expect halfway between observed and mean
---
## Integration with Inside View
### The Proper Sequence
**Phase 1: Outside View (Base Rate)**
1. Identify reference class
2. Find base rate
3. Set starting probability = base rate
**Phase 2: Inside View (Adjustment)**
4. Identify specific evidence
5. Calculate how much evidence shifts probability
6. Apply Bayesian update
**Phase 3: Calibration**
7. Check confidence intervals
8. Stress test with premortem
9. Remove biases
**Never skip Phase 1.** Even if you plan to heavily adjust, the base rate is your anchor.
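As a minimal sketch of the Phase 2 update, here is an odds-form Bayesian adjustment from a base-rate anchor; the 10% base rate and the likelihood ratio of 3 are illustrative assumptions:
```python
# Odds-form Bayesian update from a base-rate anchor
base_rate = 0.10        # Phase 1: outside-view anchor
likelihood_ratio = 3.0  # Phase 2: combined strength of the specific evidence (assumed)

prior_odds = base_rate / (1 - base_rate)
posterior_odds = prior_odds * likelihood_ratio
posterior = posterior_odds / (1 + posterior_odds)
print(f"Updated probability: {posterior:.0%}")  # 25%
```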
---
## Common Objections (And Rebuttals)
### "But my case really IS different!"
**Response:** Maybe. But 90% of people say this, and 90% are wrong. Prove it with evidence, not narrative.
### "Base rates are too pessimistic!"
**Response:** Optimism doesn't change reality. If the base rate is 10%, being optimistic doesn't make it 50%.
### "I have insider knowledge!"
**Response:** Great! Use Bayesian updating to adjust from the base rate. But start with the base rate.
### "This feels too mechanical!"
**Response:** Good forecasting should feel mechanical. Intuition is for generating hypotheses, not estimating probabilities.
---
## Practical Takeaways
1. **Always start with base rate** - Non-negotiable
2. **Resist narrative seduction** - Stories feel true but aren't predictive
3. **Expect regression to mean** - Extreme outcomes are temporary
4. **Use inside view as update** - Not replacement for outside view
5. **Trust frequencies over judgment** - Especially when N is large
---
**Return to:** [Main Skill](../SKILL.md#interactive-menu)

# Reference Class Selection Guide
## The Art and Science of Choosing Comparison Sets
Selecting the right reference class is the most critical judgment call in forecasting. Too broad and the base rate is meaningless. Too narrow and you have no data.
---
## The Goldilocks Principle
### Too Broad
**Problem:** High variance, low signal
**Example:** "Companies" as reference class for a fintech startup
**Base rate:** ~50% fail? 90% fail? Meaningless.
**Why it fails:** Includes everything from lemonade stands to Apple
### Too Narrow
**Problem:** No data, overfitting
**Example:** "Fintech startups founded in Q2 2024 by Stanford CS grads in SF"
**Base rate:** N = 3 companies, no outcomes yet
**Why it fails:** So specific there's no statistical pattern
### Just Right
**Sweet spot:** Specific enough to be homogeneous, broad enough to have data
**Example:** "Seed-stage B2B SaaS startups in financial services"
**Base rate:** Can find N = 200+ companies with 5-year outcomes
**Why it works:** Specific enough to be meaningful, broad enough for statistics
---
## Systematic Selection Method
### Step 1: Define the Core Entity Type
**Question:** What is the fundamental category?
**Examples:**
- Company (startup, public company, nonprofit)
- Project (software, construction, research)
- Person (athlete, politician, scientist)
- Event (election, war, natural disaster)
- Product (consumer, enterprise, service)
**Output:** "This is a [TYPE]"
---
### Step 2: Add Specificity Layers
Work through these dimensions **in order of importance:**
#### Layer 1: Stage/Size
- Startups: Pre-seed, Seed, Series A, B, C, Growth
- Projects: Small (<$1M), Medium ($1-10M), Large (>$10M)
- People: Beginner, Intermediate, Expert
- Products: MVP, Version 1.0, Mature
#### Layer 2: Category/Domain
- Startups: B2B, B2C, B2B2C
- Industry: Fintech, Healthcare, SaaS, Hardware
- Projects: Software, Construction, Pharmaceutical
- People: Role (CEO, Engineer, Designer)
#### Layer 3: Geography/Market
- US, Europe, Global
- Urban, Rural, Suburban
- Developed, Emerging markets
#### Layer 4: Time Period
- Current decade (2020s)
- Previous decade (2010s)
- Historical (pre-2010)
**Output:** "This is a [Stage] [Category] [Geography] [Type] from [Time Period]"
**Example:** "This is a Seed-stage B2B SaaS startup in the US from 2020-2024"
---
### Step 3: Test for Data Availability
**Search queries:**
```
"[Reference Class] success rate"
"[Reference Class] statistics"
"[Reference Class] survival rate"
"How many [Reference Class] succeed"
```
**Data availability check:**
- ✓ Found published studies/reports → Good reference class
- ⚠ Found anecdotal data → Usable but weak
- ✗ No data found → Reference class too narrow
**If no data:** Remove least important specificity layer and retry
---
### Step 4: Validate Homogeneity
**Question:** Are members of this class similar enough that averaging makes sense?
**Test: Variance Check**
If you have outcome data, calculate variance:
- Low variance → Good reference class (outcomes cluster)
- High variance → Bad reference class (outcomes all over the place)
**Heuristic: The Substitution Test**
Pick any two members of the reference class at random.
**Ask:** "If I swapped one for the other, would the prediction change dramatically?"
- No → Good homogeneity
- Yes → Too broad, needs subdivision
**Example:**
- "Tech startups" → Swap consumer mobile app for enterprise database company → Prediction changes drastically → **Too broad**
- "Seed-stage B2B SaaS" → Swap CRM tool for analytics platform → Prediction mostly same → **Good homogeneity**
---
## Similarity Metrics
### When You Can't Find Exact Match
If no perfect reference class exists, use **similarity matching** to find nearest neighbors.
### Dimensions of Similarity
**For Startups:**
1. Business model (B2B, B2C, marketplace, SaaS)
2. Revenue model (subscription, transaction, ads)
3. Stage/funding (seed, Series A, etc.)
4. Team size
5. Market size
6. Technology complexity
**For Projects:**
1. Size (budget, team size, duration)
2. Complexity (simple, moderate, complex)
3. Technology maturity (proven, emerging, experimental)
4. Team experience
5. Dependencies (few, many)
**For People:**
1. Experience level
2. Domain expertise
3. Resources available
4. Historical track record
5. Contextual factors (support, environment)
### Similarity Scoring
**Method: Nearest Neighbors**
1. List all dimensions of similarity (5-7 dimensions)
2. For each dimension, score how similar the case is to reference class (0-10)
3. Average the scores
4. Threshold: If similarity < 7/10, the reference class may not apply
**Example:**
Comparing "AI startup in 2024" to "Software startups 2010-2020" reference class:
- Business model: 9/10 (same)
- Revenue model: 8/10 (mostly SaaS)
- Technology maturity: 4/10 (AI is newer)
- Market size: 7/10 (comparable)
- Team size: 8/10 (similar)
- Funding environment: 5/10 (tighter in 2024)
**Average: 6.8/10** → Marginal reference class; use with caution
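A minimal Python sketch of the scoring step, using the dimension scores from the example above:
```python
# Average the per-dimension similarity scores from the example
scores = {
    "business model": 9,
    "revenue model": 8,
    "technology maturity": 4,
    "market size": 7,
    "team size": 8,
    "funding environment": 5,
}
average = sum(scores.values()) / len(scores)
print(f"Similarity: {average:.1f}/10")  # 6.8 -> below the 7/10 threshold, use with caution
```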
---
## Edge Cases and Judgment Calls
### Case 1: Structural Regime Change
**Problem:** Conditions have changed fundamentally since historical data
**Examples:**
- Pre-internet vs post-internet business
- Pre-COVID vs post-COVID work patterns
- Pre-AI vs post-AI software development
**Solution:**
1. Segment data by era if possible
2. Use most recent data only
3. Adjust base rate for known structural differences
4. Increase uncertainty bounds
---
### Case 2: The N=1 Problem
**Problem:** The case is literally unique (first of its kind)
**Examples:**
- First moon landing
- First pandemic of a novel pathogen
- First AGI system
**Solution:**
1. **Widen the class** - Go up one abstraction level
- "First moon landing" → "First major engineering projects"
- "Novel pandemic" → "Past pandemics of any type"
2. **Component decomposition** - Break into parts that have reference classes
- "Moon landing" → Rocket success rate × Navigation accuracy × Life support reliability
3. **Expert aggregation** - When no data, aggregate expert predictions (but with humility)
---
### Case 3: Multiple Plausible Reference Classes
**Problem:** Event could belong to multiple classes with different base rates
**Example:** "Elon Musk starting a brain-computer interface company"
Possible reference classes:
- "Startups by serial entrepreneurs" → 40% success
- "Medical device startups" → 10% success
- "Moonshot technology ventures" → 5% success
- "Companies founded by Elon Musk" → 80% success
**Solution: Ensemble Averaging**
1. Identify all plausible reference classes
2. Find base rate for each
3. Weight by relevance/similarity
4. Calculate weighted average
**Example weights:**
- Medical device (40%): 10% × 0.4 = 4%
- Moonshot tech (30%): 5% × 0.3 = 1.5%
- Serial entrepreneur (20%): 40% × 0.2 = 8%
- Elon track record (10%): 80% × 0.1 = 8%
**Weighted base rate: 21.5%**
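A minimal Python sketch of the weighted ensemble, using the example's base rates and relevance weights:
```python
# Weighted average of base rates across plausible reference classes (weights from the example)
classes = [
    ("Medical device startups",          0.10, 0.40),
    ("Moonshot technology ventures",     0.05, 0.30),
    ("Startups by serial entrepreneurs", 0.40, 0.20),
    ("Companies founded by Elon Musk",   0.80, 0.10),
]
weighted_base_rate = sum(rate * weight for _, rate, weight in classes)
print(f"Weighted base rate: {weighted_base_rate:.1%}")  # 21.5%
```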
---
## Common Selection Mistakes
### Mistake 1: Cherry-Picking Success Examples
**What it looks like:** "Reference class = Companies like Apple, Google, Facebook"
**Why it's wrong:** Survivorship bias - only looking at winners
**Fix:** Include all attempts, not just successes
### Mistake 2: Availability Bias
**What it looks like:** Reference class = Recent, memorable cases
**Why it's wrong:** Recent events are overweighted in memory
**Fix:** Use systematic data collection, not what comes to mind
### Mistake 3: Confirmation Bias
**What it looks like:** Choosing reference class that supports your prior belief
**Why it's wrong:** You're reverse-engineering the answer
**Fix:** Choose reference class BEFORE looking at base rate
### Mistake 4: Overfitting to Irrelevant Details
**What it looks like:** "Female, left-handed CEOs who went to Ivy League schools"
**Why it's wrong:** Most details don't matter; you're adding noise
**Fix:** Only include features that causally affect outcomes
### Mistake 5: Ignoring Time Decay
**What it looks like:** Using data from 1970s for 2024 prediction
**Why it's wrong:** World has changed
**Fix:** Weight recent data more heavily, or segment by era
---
## Reference Class Hierarchy
### Start Specific, Widen as Needed
**Level 1: Maximally Specific** (Try this first)
- Example: "Seed-stage B2B cybersecurity SaaS in US, 2020-2024"
- Check for data → If N > 30, use this
**Level 2: Drop One Feature** (If L1 has no data)
- Example: "Seed-stage B2B SaaS in US, 2020-2024" (removed "cybersecurity")
- Check for data → If N > 30, use this
**Level 3: Drop Two Features** (If L2 has no data)
- Example: "Seed-stage B2B SaaS, 2020-2024" (removed "US")
- Check for data → If N > 30, use this
**Level 4: Generic Category** (Last resort)
- Example: "Seed-stage startups"
- Always has data, but high variance
**Rule:** Use the most specific level that still gives you N ≥ 30 data points.
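A minimal Python sketch of this widening rule; `get_sample_size` is a hypothetical placeholder for whatever data search you use:
```python
# Walk from the most specific class to the most generic until N >= 30
levels = [
    "Seed-stage B2B cybersecurity SaaS in US, 2020-2024",
    "Seed-stage B2B SaaS in US, 2020-2024",
    "Seed-stage B2B SaaS, 2020-2024",
    "Seed-stage startups",
]

def choose_reference_class(levels, get_sample_size, minimum=30):
    """Return the most specific reference class with at least `minimum` historical cases."""
    for level in levels:
        if get_sample_size(level) >= minimum:
            return level
    return levels[-1]  # fall back to the generic category, which always has data
```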
---
## Checklist: Is This a Good Reference Class?
Use this to validate your choice:
- [ ] **Sample size** ≥ 30 historical cases
- [ ] **Homogeneity**: Members are similar enough that averaging makes sense
- [ ] **Relevance**: Data is from appropriate time period (last 10 years preferred)
- [ ] **Specificity**: Class is narrow enough to be meaningful
- [ ] **Data availability**: Base rate is published or calculable
- [ ] **No survivorship bias**: Includes failures, not just successes
- [ ] **No cherry-picking**: Class chosen before looking at base rate
- [ ] **Causal relevance**: Features included actually affect outcomes
**If ≥ 6 checked:** Good reference class
**If 4-5 checked:** Acceptable, but increase uncertainty
**If < 4 checked:** Find a better reference class
---
## Advanced: Bayesian Reference Class Selection
When you have multiple plausible reference classes, you can use Bayesian reasoning:
### Step 1: Prior Distribution Over Classes
Assign probability to each reference class being the "true" one
**Example:**
- P(Class = "B2B SaaS") = 60%
- P(Class = "All SaaS") = 30%
- P(Class = "All startups") = 10%
### Step 2: Likelihood of Observed Features
How likely is this specific case under each class?
### Step 3: Posterior Distribution
Update class probabilities using Bayes' rule
### Step 4: Weighted Base Rate
Average base rates weighted by posterior probabilities
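A minimal Python sketch of the posterior-weighted base rate; the likelihoods and base rates below are illustrative assumptions, and the priors come from the example above:
```python
# Posterior-weighted base rate over candidate reference classes
classes = {
    "B2B SaaS":     {"prior": 0.60, "likelihood": 0.8, "base_rate": 0.10},
    "All SaaS":     {"prior": 0.30, "likelihood": 0.5, "base_rate": 0.15},
    "All startups": {"prior": 0.10, "likelihood": 0.2, "base_rate": 0.10},
}

evidence = sum(c["prior"] * c["likelihood"] for c in classes.values())
for c in classes.values():
    c["posterior"] = c["prior"] * c["likelihood"] / evidence  # Bayes' rule

weighted_base_rate = sum(c["posterior"] * c["base_rate"] for c in classes.values())
print(f"Posterior-weighted base rate: {weighted_base_rate:.1%}")
```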
**This is advanced.** Default to the systematic selection method above unless you have strong quantitative skills.
---
## Practical Workflow
### Quick Protocol (5 minutes)
1. **Name the core type:** "This is a [X]"
2. **Add 2-3 specificity layers:** Stage, category, geography
3. **Google the base rate:** "[Reference class] success rate"
4. **Sanity check:** Does N > 30? Are members similar?
5. **Use it:** This is your starting probability
### Rigorous Protocol (30 minutes)
1. Systematic selection (Steps 1-4 above)
2. Similarity scoring for validation
3. Check for structural regime changes
4. Consider multiple reference classes
5. Weighted ensemble if multiple classes
6. Document assumptions and limitations
---
**Return to:** [Main Skill](../SKILL.md#interactive-menu)