# Common Pitfalls in Reference Class Forecasting

## The Traps That Even Experts Fall Into

---
## 1. Base Rate Neglect

### What It Is
Ignoring statistical baselines in favor of the specific details of the case at hand.

### Example: The Taxi Problem

**Scenario:** A taxi was involved in a hit-and-run. A witness says it was Blue.
- 85% of taxis in the city are Green
- 15% of taxis are Blue
- The witness is 80% reliable at identifying colors

**Most people say:** 80% chance it was Blue (trusting the witness)

**Correct answer:**
```
P(Blue | Witness says Blue) ≈ 41%
```

Why? The **base rate** (15% Blue taxis) dominates the witness's reliability.
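To see where the 41% comes from, here is the Bayes' rule arithmetic as a minimal sketch (Python; the probabilities are the ones given in the scenario, and the variable names are just for illustration):

```python
# Priors and witness reliability from the taxi scenario
p_blue = 0.15                  # 15% of taxis are Blue
p_green = 0.85                 # 85% of taxis are Green
p_say_blue_given_blue = 0.80   # witness is right 80% of the time
p_say_blue_given_green = 0.20  # witness is wrong 20% of the time

# Bayes' rule: P(Blue | witness says Blue)
numerator = p_blue * p_say_blue_given_blue
evidence = numerator + p_green * p_say_blue_given_green
posterior = numerator / evidence

print(f"P(Blue | witness says Blue) = {posterior:.2f}")  # 0.41
```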
### In Forecasting

**The Trap:**
- Focusing on the compelling story about this particular startup
- Ignoring that roughly 90% of startups fail

**The Fix:**
- Start with the ~90% failure rate
- Then adjust based on the specific evidence
- Weight the base rate heavily (especially when it is extreme)

---
## 2. "This Time Is Different" Bias
|
||||
|
||||
### What It Is
|
||||
Belief that current situation is unique and historical patterns don't apply.
|
||||
|
||||
### Classic Examples
|
||||
|
||||
**Financial Markets:**
|
||||
- "This bubble is different" (said before every bubble burst)
|
||||
- "This time housing prices won't fall"
|
||||
- "This cryptocurrency is different from previous scams"
|
||||
|
||||
**Technology:**
|
||||
- "This social network will overtake Facebook" (dozens have failed)
|
||||
- "This time AI will achieve AGI" (predicted for 60 years)
|
||||
|
||||
**Startups:**
|
||||
- "Our team is better than average" (90% of founders say this)
|
||||
- "Our market timing is perfect" (survivorship bias)
|
||||
|
||||
### Why It Happens
|
||||
1. **Narrative fallacy:** Humans construct unique stories for everything
|
||||
2. **Overconfidence:** We overestimate our ability to judge uniqueness
|
||||
3. **Availability bias:** Recent cases feel more salient than statistical patterns
|
||||
4. **Ego:** Admitting you're "average" feels bad
|
||||
|
||||
### The Fix: The Reversal Test
|
||||
|
||||
**Question:** "If someone else claimed their case was unique for the same reasons I'm claiming, would I believe them?"
|
||||
|
||||
**If NO → You're applying special pleading to yourself**
|
||||
|
||||
### The Reality
|
||||
- ~95% of cases that feel unique are actually normal
|
||||
- The 5% that ARE unique still have some reference class (just more abstract)
|
||||
- Uniqueness is a matter of degree, not binary
|
||||
|
||||
---
|
||||
|
||||
## 3. Overfitting to Small Samples

### What It Is
Drawing strong conclusions from limited data points.

### Example: The Hot Hand Fallacy

**Basketball:** Player makes 3 shots in a row → "Hot hand!"
**Reality:** Random sequences produce streaks. Three in a row is not evidence that the player's skill has changed.

### In Reference Classes

**The Trap:**
- Finding N = 5 companies similar to yours
- All 5 succeeded
- Concluding a 100% success rate

**Why It's Wrong:**
- Small samples have high variance
- Sampling error dominates the signal
- Regression to the mean will occur

### The Fix: Minimum Sample Size Rule

**Minimum:** N ≥ 30 for meaningful statistics
- N < 10: No statistical power
- N = 10-29: Weak signal, wide confidence intervals
- N ≥ 30: Acceptable
- N ≥ 100: Good
- N ≥ 1000: Excellent

**If N < 30:**
1. Widen the reference class to get more data
2. Acknowledge the high uncertainty (wide CI; see the sketch below)
3. Don't trust extreme base rates from small samples
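To make the small-sample warning concrete, here is a minimal sketch of how wide the uncertainty on a base rate really is (Python, assuming a simple Beta posterior with a uniform prior; the sample sizes are illustrative):

```python
from scipy import stats

def base_rate_interval(successes, n, level=0.95):
    """Credible interval for a base rate under a Beta(1, 1) (uniform) prior."""
    posterior = stats.beta(1 + successes, 1 + n - successes)
    return posterior.interval(level)

# N = 5 similar companies, all succeeded: the "100% success rate" trap
print(base_rate_interval(5, 5))      # roughly (0.54, 1.00): almost no information
# N = 200 cases, 120 successes
print(base_rate_interval(120, 200))  # roughly (0.53, 0.67): a usable base rate
```

The point estimate from five cases tells you very little; the interval spans half the probability scale.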
---

## 4. Survivorship Bias

### What It Is
Only looking at cases that "survived" and ignoring failures.

### Classic Example: WW2 Bomber Armor

**Observation:** Returning bombers had bullet holes in wings and tail
**Naive conclusion:** Reinforce wings and tail
**Truth:** Planes shot in the engine didn't return (survivorship bias)
**Correct action:** Reinforce engines

### In Reference Classes

**The Trap:**
- "Reference class = Successful tech companies" (Apple, Google, Microsoft)
- Base rate = 100% success
- Ignoring all the failed companies

**Why It's Wrong:**
You're only looking at winners, which biases the base rate upward.

### The Fix: Include All Attempts

**Correct reference class:**
- "All companies that TRIED to do X"
- Not "All companies that SUCCEEDED at X"

**Example:**
- Wrong: "Companies like Facebook" → 100% success
- Right: "Social network startups 2004-2010" → 5% success

---
## 5. Regression to the Mean Neglect

### What It Is
Failing to expect extreme outcomes to be followed by more average ones.

### The Sports Illustrated Jinx

**Observation:** Athletes on the SI cover often perform worse afterward
**Naive explanation:** Curse, pressure, jinx
**Truth:** They were on the cover BECAUSE of an extreme (partly lucky) performance → regression to the mean was inevitable

### In Forecasting

**The Trap:**
- Company has an amazing quarter → predict continued amazing performance
- Ignoring that some of that was luck, which won't repeat

**The Fix: Regression Formula**
```
Predicted = Mean + r × (Observed - Mean)
```

Where `r` = skill / (skill + luck)

**If 50% luck, 50% skill:**
```
Predicted = halfway between observed and mean
```

### Examples

**Startup with $10M ARR in Year 1:**
- Mean for seed startups: $500K ARR
- Observed: $10M (extreme!)
- Likely some luck (viral moment, lucky timing)
- Predicted Year 2: somewhere between $500K and $10M, not $20M (worked through in the sketch below)
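Here is that startup example run through the regression formula as a minimal sketch (Python; the value of `r` is an assumption for illustration, not something the data gives you for free):

```python
def regress_to_mean(observed, mean, r):
    """Predicted = Mean + r × (Observed - Mean), where r is the persistent (skill) share."""
    return mean + r * (observed - mean)

mean_arr = 500_000         # typical seed-stage ARR (from the example)
observed_arr = 10_000_000  # this startup's Year 1 ARR

# Assumed: roughly 30% of the outperformance reflects durable skill
prediction = regress_to_mean(observed_arr, mean_arr, r=0.3)
print(f"Predicted Year 2 ARR: ${prediction:,.0f}")  # $3,350,000
```

With r = 0.5 the prediction lands halfway, at $5.25M; the larger the luck component, the stronger the pull back toward the mean.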
**Student who aces one test:**
- Don't predict straight A's (it could have been an easy test, or lucky guessing)
- Predict performance closer to the class average
---

## 6. Availability Bias in Class Selection

### What It Is
Choosing a reference class based on what's memorable, not what's statistically valid.

### Example: Terrorism Risk

**After 9/11:**
- Availability: Terrorism is extremely salient
- Many people chose the reference class: "Major terrorist attacks on the US"
- The base rate felt high (because it was memorable)

**Reality:**
- Correct reference class: "Risk of death from various causes"
- Terrorism: ~0.0001% annual risk
- Car accidents: ~0.01% annual risk (100× higher)

### The Fix
1. **Don't use "what comes to mind" as the reference class**
2. **Use systematic search** for historical data
3. **Weight by frequency, not memorability**

---
## 7. Confirmation Bias in Reference Class Selection

### What It Is
Choosing a reference class that supports your pre-existing belief.

### Example: Political Predictions

**If you want candidate X to win:**
- Choose the reference class: "Elections where the candidate had high favorability"
- Ignore the reference class: "Elections in this demographic"

**If you want candidate X to lose:**
- Choose the reference class: "Candidates with this type of scandal"
- Ignore the reference class: "Incumbents in a strong economy"

### The Fix: Blind Selection

**Process:**
1. Choose the reference class BEFORE you know the base rate
2. Write down the selection criteria
3. Only then look up the base rate
4. Stick with it even if you don't like the answer

---
## 8. Ignoring Time Decay

### What It Is
Using old data when conditions have changed.

### Example: Newspaper Industry

**Reference class:** "Newspaper companies, 1950-2000"
**Base rate:** ~90% profitability

**Problem:** The internet fundamentally changed the industry
**Reality in the 2010s:** ~10% profitability

### When Time Decay Matters

**High decay (don't use old data):**
- Technology industries (5-year half-life)
- Regulatory changes (e.g., crypto)
- Structural market shifts (e.g., remote work post-COVID)

**Low decay (old data OK):**
- Human behavior (still the same evolutionary psychology)
- Physical laws (still the same physics)
- Basic business economics (margins, etc.)

### The Fix
1. **Use the last 5-10 years** as the default
2. **Segment by era** if a structural change occurred
3. **Weight recent data more heavily** (see the sketch below)
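One simple way to "weight recent data more heavily" is an exponentially decaying weight by age. A minimal sketch (Python; the half-life and the yearly rates are made up for illustration):

```python
def decayed_base_rate(yearly_rates, current_year, half_life=5.0):
    """Average yearly base rates with exponentially decaying weights by age."""
    weights = {year: 0.5 ** ((current_year - year) / half_life) for year in yearly_rates}
    total = sum(weights.values())
    return sum(yearly_rates[y] * weights[y] / total for y in yearly_rates)

# Hypothetical success rates by cohort year
yearly_rates = {2010: 0.25, 2014: 0.20, 2018: 0.15, 2022: 0.10}
print(round(decayed_base_rate(yearly_rates, current_year=2024), 3))  # ≈ 0.14
```

The recent, lower rates dominate the estimate, which is the intended behavior when the industry has structurally changed.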
---

## 9. Causation Confusion

### What It Is
Including features in the reference class that correlate with outcomes but don't cause them.

### Example: Ice Cream and Drowning

**Observation:** Ice cream sales correlate with drowning deaths
**Naive reference class:** "Days with high ice cream sales"
**Base rate:** Higher drowning rate

**Problem:** Ice cream doesn't cause drowning
**Confound:** Summer weather causes both

### In Forecasting

**The Trap:**
Adding irrelevant details to the reference class:
- "Startups founded by left-handed CEOs"
- "Projects started on Tuesdays"
- "People born under Scorpio"

**Why It's Wrong:**
These features don't causally affect outcomes; they just add noise.

### The Fix: Causal Screening

**Ask:** "Does this feature **cause** different outcomes, or just **correlate** with them?"

**Include if causal:**
- Business model (causes different unit economics)
- Market size (bounds the addressable opportunity)
- Technology maturity (causes different risk)

**Exclude if just correlation:**
- Founder's birthday
- Office location aesthetics
- Random demographic details

---
## 10. Narrow Framing

### What It Is
Defining the reference class too narrowly around the specific case.

### Example

**You're forecasting:** "Will this specific bill pass Congress?"

**Too narrow:** "Bills with this exact text in this specific session"
→ N = 1, no data

**Better:** "Bills of this type in a similar political climate"
→ N = 50, usable data

### The Fix
If you can't find data, your reference class is too narrow. Go up one level of abstraction.

---
## 11. Extremeness Aversion

### What It Is
Reluctance to use extreme base rates (close to 0% or 100%).

### Psychological Bias

**People feel uncomfortable saying:**
- 5% chance
- 95% chance

**They retreat to:**
- 20% ("to be safe")
- 80% ("to be safe")

**This is wrong.** If the base rate is 5%, START at 5%.

### When Extreme Base Rates Are Correct

- Drug development: 5-10% reach market
- Lottery tickets: 0.0001% chance
- Sun rising tomorrow: 99.9999% chance

### The Fix
- **Trust the base rate** even if it feels extreme
- Extreme base rates are still better than gut feelings
- Use inside view to adjust, but don't automatically moderate

---
## 12. Scope Insensitivity

### What It Is
Not adjusting forecasts proportionally to scale.

### Example: Charity Donations

**Study:** People were willing to pay about the same amount to save:
- 2,000 birds
- 20,000 birds
- 200,000 birds

**Problem:** The scope doesn't change the emotional response

### In Reference Classes

**The Trap:**
- "Startup raising $1M" feels the same as "Startup raising $10M"
- The reference class doesn't distinguish by scale
- The resulting base rate is wrong

**The Fix:**
- Segment the reference class by scale
- "$1M raises" have a different success rate than "$10M raises"
- Make sure the scope is specified

---
## Avoiding Pitfalls: Quick Checklist

Before finalizing your reference class, check:

- [ ] **Not neglecting the base rate** - Started with statistics, not story
- [ ] **Not claiming uniqueness** - Passed the reversal test
- [ ] **Sample size ≥ 30** - Not overfitting to a small sample
- [ ] **Included failures** - No survivorship bias
- [ ] **Expected regression** - Extreme outcomes won't persist
- [ ] **Systematic selection** - Not relying on availability
- [ ] **Chose the class first** - No confirmation bias
- [ ] **Recent data** - Not ignoring time decay
- [ ] **Causal features only** - Not including random correlations
- [ ] **Right level of abstraction** - Not too narrow
- [ ] **OK with extremes** - Not moderating extreme base rates
- [ ] **Scope-specific** - Scale is accounted for

---

**Return to:** [Main Skill](../SKILL.md#interactive-menu)

---
# Outside View Principles

## Theory and Foundation

### What is the Outside View?

The **Outside View** is a forecasting method that relies on statistical baselines from similar historical cases rather than detailed analysis of the specific case at hand.

**Coined by:** Daniel Kahneman and Amos Tversky
**Alternative names:** Reference class forecasting, actuarial prediction, statistical prediction

---
## The Two Views Framework

### Inside View (The Trap)
- Focuses on unique details of the specific case
- Constructs causal narratives
- Emphasizes what makes "this time different"
- Relies on expert intuition and judgment
- Feels more satisfying and controllable

**Example:** "Our startup will succeed because we have a great team, unique technology, strong market timing, and passionate founders."

### Outside View (The Discipline)
- Focuses on statistical patterns from similar cases
- Ignores unique narratives initially
- Emphasizes what usually happens to things like this
- Relies on base rates and frequencies
- Feels cold and impersonal

**Example:** "Seed-stage B2B SaaS startups have a 10% success rate. We start at 10%."

---
## Why the Outside View Wins

### Research Evidence

**The Planning Fallacy Study (Kahneman)**
- Students were asked to predict their thesis completion time
- Inside view: average prediction = 33 days
- Actual average: 55 days
- Outside view (based on past students): 48 days
- **Result:** The outside-view estimate was roughly 3× closer to the actual outcome than the inside-view prediction

**Expert Predictions vs Base Rates**
- Expert forecasters using the inside view: ~60% accuracy
- Simple base rate models: ~70% accuracy
- **Result:** Simple statistical baselines often beat expert judgment

**Why Experts Fail:**
1. **Overweight unique details** (availability bias)
2. **Construct plausible narratives** (hindsight bias)
3. **Underweight statistical patterns** (base rate neglect)
4. **Overconfident in causal understanding** (illusion of control)

---
## When Outside View Works Best

### High-Signal Situations
✓ Large historical datasets exist
✓ Cases are reasonably similar
✓ Outcomes are measurable
✓ No major structural changes
✓ Randomness plays a significant role

**Examples:**
- Startup success rates
- Construction project delays
- Drug approval timelines
- Movie box office performance
- Sports team performance

---
## When Outside View Fails

### Low-Signal Situations
✗ Truly novel events (no reference class)
✗ Structural regime changes (e.g., new technology disrupts all patterns)
✗ Extremely heterogeneous reference class
✗ Small sample sizes (N < 20)
✗ Deterministic physics-based systems

**Examples:**
- First moon landing (no reference class)
- Pandemic with a novel pathogen (limited reference class)
- Cryptocurrency regulation (regime change)
- Your friend's personality (N = 1)

**What to do:** Use the outside view as a starting point, then weight the specific evidence heavily

---
## Statistical Thinking vs Narrative Thinking

### Narrative Thinking (Human Default)
- Brain constructs causal stories
- Connects dots into coherent explanations
- Feels satisfying and convincing
- **Problem:** Narratives are selected for coherence, not accuracy

**Narrative example:** "Startup X will fail because the CEO is inexperienced, the market is crowded, and they're burning cash."

This might be true, but:
- Experienced CEOs also fail
- Crowded markets have winners
- Cash burn is normal for startups

The narrative cherry-picks evidence.

### Statistical Thinking (Discipline Required)
- Brain resists cold numbers
- Requires active effort to override intuition
- Feels unsatisfying and reductive
- **Benefit:** Statistics aggregate all past evidence, not just confirming cases

**Statistical example:** "80% of startups with this profile fail within 3 years. Start at 80% failure probability."

---
## The Planning Fallacy in Depth

### What It Is
Systematic tendency to underestimate time, costs, and risks while overestimating benefits.

### Why It Happens
1. **Focus on the success plan:** Failure modes are ignored
2. **Best-case scenario bias:** Assume things go smoothly
3. **Neglect of base rates:** "Our project is different"
4. **Anchoring on ideal conditions:** Forgetting that reality intrudes

### The Fix: Outside View
Instead of asking "How long will our project take?" ask:
- "How long did similar projects take?"
- "What was the distribution of outcomes?"
- "What percentage ran late? By how much?"

**Rule:** Assume your project is **average** for its class until proven otherwise.

---
## Regression to the Mean

### The Phenomenon
Extreme outcomes tend to be followed by more average outcomes.

**Examples:**
- Hot hand in basketball → Returns to average
- Stellar quarterly earnings → Next quarter closer to mean
- Brilliant startup idea → Execution regresses to mean

### Implication for Forecasting
If you're predicting based on an extreme observation:
- **Adjust toward the mean** unless you have evidence the extreme is sustainable
- Extreme outcomes are often luck + skill; luck doesn't persist

**Formula:**
```
Predicted = Mean + r × (Observed - Mean)
```
Where `r` = correlation (skill component)

If 50% skill, 50% luck → r = 0.5 → Expect halfway between observed and mean

---
## Integration with Inside View

### The Proper Sequence

**Phase 1: Outside View (Base Rate)**
1. Identify the reference class
2. Find the base rate
3. Set the starting probability = base rate

**Phase 2: Inside View (Adjustment)**
4. Identify specific evidence
5. Calculate how much the evidence shifts the probability
6. Apply a Bayesian update

**Phase 3: Calibration**
7. Check confidence intervals
8. Stress test with a premortem
9. Remove biases

**Never skip Phase 1.** Even if you plan to adjust heavily, the base rate is your anchor.
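A minimal sketch of Phases 1-2 (Python; the base rate and likelihood ratios are placeholder numbers, not values taken from this guide):

```python
def bayes_update(prior, likelihood_ratio):
    """Update a probability given evidence with the stated likelihood ratio
    (P(evidence | outcome) / P(evidence | no outcome)), via the odds form of Bayes' rule."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Phase 1: outside view, start at the reference-class base rate
p = 0.10  # e.g., 10% success rate for the chosen reference class

# Phase 2: inside view, each piece of specific evidence is one update
p = bayes_update(p, 2.0)   # evidence judged twice as likely given success
p = bayes_update(p, 1.5)   # weaker supporting evidence

print(round(p, 3))  # 0.25
```

The anchor stays visible in the arithmetic: strong specific evidence moves you from 10% to 25%, not to 80%.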
---

## Common Objections (And Rebuttals)

### "But my case really IS different!"
**Response:** Maybe. But 90% of people say this, and 90% are wrong. Prove it with evidence, not narrative.

### "Base rates are too pessimistic!"
**Response:** Optimism doesn't change reality. If the base rate is 10%, being optimistic doesn't make it 50%.

### "I have insider knowledge!"
**Response:** Great! Use Bayesian updating to adjust from the base rate. But start with the base rate.

### "This feels too mechanical!"
**Response:** Good forecasting should feel mechanical. Intuition is for generating hypotheses, not estimating probabilities.

---
## Practical Takeaways

1. **Always start with the base rate** - Non-negotiable
2. **Resist narrative seduction** - Stories feel true but aren't predictive
3. **Expect regression to the mean** - Extreme outcomes are temporary
4. **Use the inside view as an update** - Not a replacement for the outside view
5. **Trust frequencies over judgment** - Especially when N is large

---

**Return to:** [Main Skill](../SKILL.md#interactive-menu)

---
# Reference Class Selection Guide

## The Art and Science of Choosing Comparison Sets

Selecting the right reference class is the most critical judgment call in forecasting. Too broad and the base rate is meaningless. Too narrow and you have no data.

---
## The Goldilocks Principle

### Too Broad
**Problem:** High variance, low signal
**Example:** "Companies" as a reference class for a fintech startup
**Base rate:** ~50% fail? 90% fail? Meaningless.
**Why it fails:** Includes everything from lemonade stands to Apple

### Too Narrow
**Problem:** No data, overfitting
**Example:** "Fintech startups founded in Q2 2024 by Stanford CS grads in SF"
**Base rate:** N = 3 companies, no outcomes yet
**Why it fails:** So specific there's no statistical pattern

### Just Right
**Sweet spot:** Specific enough to be homogeneous, broad enough to have data
**Example:** "Seed-stage B2B SaaS startups in financial services"
**Base rate:** Can find N = 200+ companies with 5-year outcomes
**Why it works:** Specific enough to be meaningful, broad enough for statistics

---
## Systematic Selection Method

### Step 1: Define the Core Entity Type

**Question:** What is the fundamental category?

**Examples:**
- Company (startup, public company, nonprofit)
- Project (software, construction, research)
- Person (athlete, politician, scientist)
- Event (election, war, natural disaster)
- Product (consumer, enterprise, service)

**Output:** "This is a [TYPE]"

---
### Step 2: Add Specificity Layers

Work through these dimensions **in order of importance:**

#### Layer 1: Stage/Size
- Startups: Pre-seed, Seed, Series A, B, C, Growth
- Projects: Small (<$1M), Medium ($1-10M), Large (>$10M)
- People: Beginner, Intermediate, Expert
- Products: MVP, Version 1.0, Mature

#### Layer 2: Category/Domain
- Startups: B2B, B2C, B2B2C
- Industry: Fintech, Healthcare, SaaS, Hardware
- Projects: Software, Construction, Pharmaceutical
- People: Role (CEO, Engineer, Designer)

#### Layer 3: Geography/Market
- US, Europe, Global
- Urban, Rural, Suburban
- Developed, Emerging markets

#### Layer 4: Time Period
- Current decade (2020s)
- Previous decade (2010s)
- Historical (pre-2010)

**Output:** "This is a [Stage] [Category] [Geography] [Type] from [Time Period]"

**Example:** "This is a Seed-stage B2B SaaS startup in the US from 2020-2024"

---
### Step 3: Test for Data Availability

**Search queries:**
```
"[Reference Class] success rate"
"[Reference Class] statistics"
"[Reference Class] survival rate"
"How many [Reference Class] succeed"
```

**Data availability check:**
- ✓ Found published studies/reports → Good reference class
- ⚠ Found anecdotal data → Usable but weak
- ✗ No data found → Reference class too narrow

**If no data:** Remove the least important specificity layer and retry

---
### Step 4: Validate Homogeneity

**Question:** Are members of this class similar enough that averaging makes sense?

**Test: Variance Check**
If you have outcome data, calculate variance:
- Low variance → Good reference class (outcomes cluster)
- High variance → Bad reference class (outcomes all over the place)

**Heuristic: The Substitution Test**
Pick any two members of the reference class at random.

**Ask:** "If I swapped one for the other, would the prediction change dramatically?"
- No → Good homogeneity
- Yes → Too broad, needs subdivision

**Example:**
- "Tech startups" → Swap consumer mobile app for enterprise database company → Prediction changes drastically → **Too broad**
- "Seed-stage B2B SaaS" → Swap CRM tool for analytics platform → Prediction mostly same → **Good homogeneity**

---
## Similarity Metrics

### When You Can't Find an Exact Match

If no perfect reference class exists, use **similarity matching** to find the nearest neighbors.

### Dimensions of Similarity

**For Startups:**
1. Business model (B2B, B2C, marketplace, SaaS)
2. Revenue model (subscription, transaction, ads)
3. Stage/funding (seed, Series A, etc.)
4. Team size
5. Market size
6. Technology complexity

**For Projects:**
1. Size (budget, team size, duration)
2. Complexity (simple, moderate, complex)
3. Technology maturity (proven, emerging, experimental)
4. Team experience
5. Dependencies (few, many)

**For People:**
1. Experience level
2. Domain expertise
3. Resources available
4. Historical track record
5. Contextual factors (support, environment)
### Similarity Scoring

**Method: Nearest Neighbors**

1. List all dimensions of similarity (5-7 dimensions)
2. For each dimension, score how similar the case is to the reference class (0-10)
3. Average the scores
4. Threshold: if similarity < 7/10, the reference class may not apply

**Example:**
Comparing an "AI startup in 2024" to the "Software startups 2010-2020" reference class:
- Business model: 9/10 (same)
- Revenue model: 8/10 (mostly SaaS)
- Technology maturity: 4/10 (AI is newer)
- Market size: 7/10 (comparable)
- Team size: 8/10 (similar)
- Funding environment: 5/10 (tighter in 2024)

**Average: 6.8/10** → Marginal reference class; use with caution
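A small sketch of this scoring step (Python; the dimensions and scores are the ones from the example above, equally weighted):

```python
# Similarity of "AI startup in 2024" to the "Software startups 2010-2020" class,
# scored 0-10 on each dimension (numbers from the example above)
scores = {
    "business model": 9,
    "revenue model": 8,
    "technology maturity": 4,
    "market size": 7,
    "team size": 8,
    "funding environment": 5,
}

similarity = sum(scores.values()) / len(scores)
print(f"Average similarity: {similarity:.1f}/10")  # 6.8/10

if similarity < 7:
    print("Marginal reference class: use with caution")
```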
---

## Edge Cases and Judgment Calls

### Case 1: Structural Regime Change

**Problem:** Conditions have changed fundamentally since the historical data was generated

**Examples:**
- Pre-internet vs post-internet business
- Pre-COVID vs post-COVID work patterns
- Pre-AI vs post-AI software development

**Solution:**
1. Segment the data by era if possible
2. Use only the most recent data
3. Adjust the base rate for known structural differences
4. Increase the uncertainty bounds

---
### Case 2: The N=1 Problem

**Problem:** The case is literally unique (first of its kind)

**Examples:**
- First moon landing
- First pandemic of a novel pathogen
- First AGI system

**Solution:**
1. **Widen the class** - Go up one abstraction level
   - "First moon landing" → "First major engineering projects"
   - "Novel pandemic" → "Past pandemics of any type"
2. **Component decomposition** - Break into parts that have reference classes
   - "Moon landing" → Rocket success rate × Navigation accuracy × Life support reliability
3. **Expert aggregation** - When no data, aggregate expert predictions (but with humility)

---
### Case 3: Multiple Plausible Reference Classes

**Problem:** The event could belong to multiple classes with different base rates

**Example:** "Elon Musk starting a brain-computer interface company"

Possible reference classes:
- "Startups by serial entrepreneurs" → 40% success
- "Medical device startups" → 10% success
- "Moonshot technology ventures" → 5% success
- "Companies founded by Elon Musk" → 80% success

**Solution: Ensemble Averaging**
1. Identify all plausible reference classes
2. Find the base rate for each
3. Weight by relevance/similarity
4. Calculate the weighted average

**Example weights:**
- Medical device (40%): 10% × 0.4 = 4%
- Moonshot tech (30%): 5% × 0.3 = 1.5%
- Serial entrepreneur (20%): 40% × 0.2 = 8%
- Elon track record (10%): 80% × 0.1 = 8%

**Weighted base rate: 21.5%**
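The same ensemble calculation as a minimal sketch (Python; the classes, base rates, and weights are the ones from the example above):

```python
# (base rate, weight) for each plausible reference class, from the example above
reference_classes = {
    "Medical device startups":          (0.10, 0.40),
    "Moonshot technology ventures":     (0.05, 0.30),
    "Startups by serial entrepreneurs": (0.40, 0.20),
    "Companies founded by Elon Musk":   (0.80, 0.10),
}

weighted_base_rate = sum(rate * weight for rate, weight in reference_classes.values())
print(f"Weighted base rate: {weighted_base_rate:.1%}")  # 21.5%
```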
---

## Common Selection Mistakes

### Mistake 1: Cherry-Picking Success Examples
**What it looks like:** "Reference class = Companies like Apple, Google, Facebook"
**Why it's wrong:** Survivorship bias - only looking at winners
**Fix:** Include all attempts, not just successes

### Mistake 2: Availability Bias
**What it looks like:** Reference class = Recent, memorable cases
**Why it's wrong:** Recent events are overweighted in memory
**Fix:** Use systematic data collection, not what comes to mind

### Mistake 3: Confirmation Bias
**What it looks like:** Choosing a reference class that supports your prior belief
**Why it's wrong:** You're reverse-engineering the answer
**Fix:** Choose the reference class BEFORE looking at the base rate

### Mistake 4: Overfitting to Irrelevant Details
**What it looks like:** "Female, left-handed CEOs who went to Ivy League schools"
**Why it's wrong:** Most details don't matter; you're adding noise
**Fix:** Only include features that causally affect outcomes

### Mistake 5: Ignoring Time Decay
**What it looks like:** Using data from the 1970s for a 2024 prediction
**Why it's wrong:** The world has changed
**Fix:** Weight recent data more heavily, or segment by era

---
## Reference Class Hierarchy

### Start Specific, Widen as Needed

**Level 1: Maximally Specific** (Try this first)
- Example: "Seed-stage B2B cybersecurity SaaS in US, 2020-2024"
- Check for data → If N ≥ 30, use this

**Level 2: Drop One Feature** (if Level 1 has no data)
- Example: "Seed-stage B2B SaaS in US, 2020-2024" (removed "cybersecurity")
- Check for data → If N ≥ 30, use this

**Level 3: Drop Two Features** (if Level 2 has no data)
- Example: "Seed-stage B2B SaaS, 2020-2024" (removed "US")
- Check for data → If N ≥ 30, use this

**Level 4: Generic Category** (Last resort)
- Example: "Seed-stage startups"
- Always has data, but high variance

**Rule:** Use the most specific level that still gives you N ≥ 30 data points.
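A minimal sketch of this "start specific, widen as needed" loop (Python; `count_cases` stands in for whatever data source you actually query, and the toy counts are invented for illustration):

```python
def pick_reference_class(features, count_cases, min_n=30):
    """Drop the least important feature until the class has at least min_n cases.
    `features` is ordered most- to least-important; `count_cases` is a lookup
    you supply (e.g., a dataset or database query)."""
    candidate = list(features)
    while candidate:
        n = count_cases(candidate)
        if n >= min_n:
            return candidate, n
        candidate.pop()   # drop the least important remaining feature
    return [], 0          # even the generic category had no data

features = ["seed-stage", "B2B", "SaaS", "2020-2024", "US", "cybersecurity"]

# Toy data source: pretend counts for classes of decreasing specificity
toy_counts = {6: 3, 5: 12, 4: 85, 3: 240, 2: 900, 1: 4000, 0: 0}
cls, n = pick_reference_class(features, lambda f: toy_counts[len(f)])
print(cls, n)  # ['seed-stage', 'B2B', 'SaaS', '2020-2024'] 85
```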
---

## Checklist: Is This a Good Reference Class?

Use this to validate your choice:

- [ ] **Sample size** ≥ 30 historical cases
- [ ] **Homogeneity**: Members are similar enough that averaging makes sense
- [ ] **Relevance**: Data is from an appropriate time period (last 10 years preferred)
- [ ] **Specificity**: Class is narrow enough to be meaningful
- [ ] **Data availability**: Base rate is published or calculable
- [ ] **No survivorship bias**: Includes failures, not just successes
- [ ] **No cherry-picking**: Class chosen before looking at the base rate
- [ ] **Causal relevance**: Features included actually affect outcomes

**If ≥ 6 checked:** Good reference class
**If 4-5 checked:** Acceptable, but increase uncertainty
**If < 4 checked:** Find a better reference class

---
## Advanced: Bayesian Reference Class Selection

When you have multiple plausible reference classes, you can use Bayesian reasoning:

### Step 1: Prior Distribution Over Classes
Assign a probability to each reference class being the "true" one

**Example:**
- P(Class = "B2B SaaS") = 60%
- P(Class = "All SaaS") = 30%
- P(Class = "All startups") = 10%

### Step 2: Likelihood of Observed Features
How likely is this specific case under each class?

### Step 3: Posterior Distribution
Update the class probabilities using Bayes' rule

### Step 4: Weighted Base Rate
Average the base rates weighted by the posterior probabilities

**This is advanced.** Default to the systematic selection method above unless you have strong quantitative skills.
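A compact sketch of Steps 1-4 (Python; the priors use the example above, while the likelihoods and base rates are invented for illustration):

```python
# Candidate reference classes: prior P(class), P(observed features | class), base rate
classes = {
    "B2B SaaS":     {"prior": 0.60, "likelihood": 0.70, "base_rate": 0.10},
    "All SaaS":     {"prior": 0.30, "likelihood": 0.40, "base_rate": 0.15},
    "All startups": {"prior": 0.10, "likelihood": 0.20, "base_rate": 0.08},
}

# Steps 2-3: posterior over classes via Bayes' rule
unnormalized = {name: c["prior"] * c["likelihood"] for name, c in classes.items()}
total = sum(unnormalized.values())
posterior = {name: w / total for name, w in unnormalized.items()}

# Step 4: base rates averaged by posterior weight
weighted_base_rate = sum(posterior[name] * classes[name]["base_rate"] for name in classes)
print({k: round(v, 2) for k, v in posterior.items()})  # {'B2B SaaS': 0.75, 'All SaaS': 0.21, 'All startups': 0.04}
print(round(weighted_base_rate, 3))                    # ≈ 0.11
```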
---

## Practical Workflow

### Quick Protocol (5 minutes)

1. **Name the core type:** "This is a [X]"
2. **Add 2-3 specificity layers:** Stage, category, geography
3. **Google the base rate:** "[Reference class] success rate"
4. **Sanity check:** Is N ≥ 30? Are the members similar?
5. **Use it:** This is your starting probability

### Rigorous Protocol (30 minutes)

1. Systematic selection (Steps 1-4 above)
2. Similarity scoring for validation
3. Check for structural regime changes
4. Consider multiple reference classes
5. Weighted ensemble if using multiple classes
6. Document assumptions and limitations

---

**Return to:** [Main Skill](../SKILL.md#interactive-menu)