# Debiasing Techniques
## A Practical Guide to Removing Bias from Forecasts
---
## The Systematic Debiasing Process
**The Four-Stage Framework:**
1. **Recognition** - Identify which biases are present, assess severity and direction
2. **Intervention** - Apply structured methods (not willpower), make bias mathematically impossible
3. **Validation** - Check if intervention worked, compare pre/post probabilities
4. **Institutionalization** - Build into routine process, create checklists, track effectiveness
**Key Principle:** You cannot "try harder" to avoid bias. Biases are unconscious. You need **systematic interventions**.
---
## Pre-Commitment Strategies
**Definition:** Locking in decision rules BEFORE seeing evidence, when you're still objective.
**Why it works:** Removes motivated reasoning by making updates automatic and mechanical.
### Technique 1: Pre-Registered Update Rules
**Before looking at evidence, write down:**
1. Current belief: "I believe X with Y% confidence"
2. Update rule: "If I observe Z, I will update to W%"
3. Decision criteria: "I will accept evidence Z as valid if it meets criteria Q"
**Example - Election Forecast:**
- Current: "Candidate A has 60% chance of winning"
- Update rule: "If next 3 polls show Candidate B ahead by >5%, I will update to 45%"
- Criteria: "Polls must be rated B+ or higher by 538, sample size >800, conducted in last 7 days"
**Prevents:** Cherry-picking polls, moving goalposts, asymmetric evidence standards
### Technique 2: Prediction Intervals with Triggers
**Method:** Set probability ranges that trigger re-evaluation.
**Example - Startup Valuation:**
```
If valuation announced is:
- <$50M → Update P(success) from 40% → 25%
- $50M-$100M → Keep at 40%
- $100M-$200M → Update to 55%
- >$200M → Update to 70%
```
Lock this in before you know the actual valuation.
**Prevents:** Post-hoc rationalization, scope insensitivity, anchoring
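A minimal sketch of the lock-in as code, using the hypothetical thresholds from the example; write and save it before the announcement:
```python
# Pre-committed update rule, written BEFORE the valuation is announced.
# Thresholds and probabilities are the hypothetical ones from the example above.
def updated_p_success(valuation_musd: float) -> float:
    """Return the pre-committed P(success) for a valuation announced in $M."""
    if valuation_musd < 50:
        return 0.25
    if valuation_musd <= 100:
        return 0.40  # unchanged from the 40% prior
    if valuation_musd <= 200:
        return 0.55
    return 0.70

print(updated_p_success(150))  # 0.55, regardless of how we feel about the news
```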
### Technique 3: Conditional Forecasting
**Method:** Make forecasts for different scenarios in advance.
**Example - Product Launch:**
- "If launch delayed >2 months: P(success) = 30%"
- "If launch on time: P(success) = 50%"
- "If launch early: P(success) = 45%"
When scenario occurs, use pre-committed probability.
**Prevents:** Status quo bias, under-updating when conditions change
---
## External Accountability
**Why it works:** Public predictions create a reputational stake. The incentive to be accurate outweighs the incentive to appear right.
### Technique 4: Forecasting Tournaments
**Platforms:** Good Judgment Open, Metaculus, Manifold Markets, PredictIt
**How to use:**
1. Make 50-100 forecasts over 6 months
2. Track your Brier score (lower = better; see the sketch below)
3. Review what you got wrong
4. Adjust your process
**Fixes:** Overconfidence, confirmation bias, motivated reasoning
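The Brier score in step 2 is the mean squared error between your stated probabilities and the 0/1 outcomes; a minimal sketch:
```python
def brier_score(forecasts: list[tuple[float, bool]]) -> float:
    """Mean squared error between stated probabilities and 0/1 outcomes.
    0.0 is perfect; always answering 50% scores 0.25."""
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)

history = [(0.7, True), (0.8, True), (0.6, False), (0.9, True)]  # toy data
print(round(brier_score(history), 3))  # 0.125
```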
### Technique 5: Public Prediction Logging
**Method:** Post forecasts publicly (Twitter, blog, Slack, email) before outcomes known.
**Format:**
```
Forecast: [Event]
Probability: X%
Date: [Today]
Resolution date: [When we'll know]
Reasoning: [2-3 sentences]
```
**Prevents:** Hindsight bias, selective memory, probability creep
### Technique 6: Forecasting Partners
**Method:** Find a "prediction buddy" who reviews your forecasts.
**Their job:** Ask "Why this probability and not 10% higher/lower?", point out motivated reasoning, suggest alternative reference classes, track systematic biases
**Prevents:** Blind spots, soldier mindset, lazy reasoning
---
## Algorithmic Aids
**Research finding:** Simple statistical formulas often match or beat expert judgment (a result going back to Meehl's clinical-versus-statistical prediction studies). Formulas are consistent; humans are noisy.
### Technique 7: Base Rate + Adjustment Formula
```
Final Probability = Base Rate + (Adjustment × Confidence)
```
**Example - Startup Success:**
- Base rate: 10% (seed startups reach $10M revenue)
- Specific evidence: Great team (+20%), weak market fit (-10%), strong competition (-5%) = Net +5%
- Confidence in evidence: 0.5
- Calculation: 10% + (5% × 0.5) = 12.5%
**Prevents:** Ignoring base rates, overweighting anecdotes, inconsistent weighting
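A minimal sketch of the formula, using the hypothetical evidence weights from the example and clamping the result to a valid probability:
```python
def adjusted_probability(base_rate: float, adjustments: dict[str, float],
                         confidence: float) -> float:
    """Base rate plus the net evidence adjustment, discounted by confidence
    in that evidence; clamped to stay a valid probability."""
    net = sum(adjustments.values())
    return min(1.0, max(0.0, base_rate + net * confidence))

evidence = {"great team": 0.20, "weak market fit": -0.10, "strong competition": -0.05}
print(round(adjusted_probability(0.10, evidence, confidence=0.5), 3))  # 0.125
```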
### Technique 8: Bayesian Update Calculator
**Formula:** Posterior Odds = Prior Odds × Likelihood Ratio
**Example - Medical Test:**
- Prior: 1% have disease X, Test: 90% true positive, 10% false positive, You test positive
- Prior odds: 1:99, Likelihood ratio: 0.9/0.1 = 9, Posterior odds: 9:99 = 1:11
- Posterior probability: 1/12 = 8.3%
**Lesson:** Even with a positive test, there is only an 8.3% chance of having the disease (the low base rate dominates).
**Tool:** Use online Bayes calculator or spreadsheet template.
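If you prefer a script to an online calculator, the odds-form update is only a few lines; a minimal sketch with the medical-test numbers:
```python
def bayes_update(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Posterior P(H|E) via the odds form: posterior odds = prior odds x LR."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * (p_e_given_h / p_e_given_not_h)
    return posterior_odds / (1 + posterior_odds)

# 1% base rate, 90% true-positive rate, 10% false-positive rate:
print(round(bayes_update(0.01, 0.90, 0.10), 3))  # 0.083 -> the 8.3% above
```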
### Technique 9: Ensemble Forecasting
**Method:** Use multiple methods, average results.
1. Reference class forecasting → X₁%
2. Inside view analysis → X₂%
3. Extrapolation from trends → X₃%
4. Expert consensus → X₄%
5. Weighted average: 0.4×(Ref class) + 0.3×(Inside) + 0.2×(Trends) + 0.1×(Expert)
**Prevents:** Over-reliance on single method, blind spots, methodology bias
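A minimal sketch of step 5's weighted average, with hypothetical estimates standing in for X₁-X₄:
```python
# One (probability, weight) pair per method; hypothetical estimates stand in
# for X1-X4. Weights are the 0.4/0.3/0.2/0.1 split from step 5.
estimates = [
    (0.30, 0.4),  # reference class forecasting
    (0.55, 0.3),  # inside view analysis
    (0.45, 0.2),  # trend extrapolation
    (0.50, 0.1),  # expert consensus
]
assert abs(sum(w for _, w in estimates) - 1.0) < 1e-9  # weights must sum to 1
ensemble = sum(p * w for p, w in estimates)
print(round(ensemble, 3))  # 0.425
```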
---
## Consider-the-Opposite Technique
**Core Question:** "What would have to be true for the opposite outcome to occur?"
### Technique 10: Steelman the Opposite View
**Method:**
1. State your forecast: "70% probability Event X"
2. Build STRONGEST case for opposite (don't strawman, steelman)
3. Articulate so well that someone holding that view would agree
4. Re-evaluate probability based on strength
**Example - AGI Timeline:**
- Your view: "70% chance AGI by 2030"
- Steelman: "Every previous AI timeline wrong by decades, current systems lack common sense, scaling may hit limits, regulatory slowdowns, no clear path from LLMs to reasoning, hardware constraints, reference class: major tech breakthroughs take 20-40 years"
- Re-evaluation: "Hmm, that's strong. Updating to 45%."
### Technique 11: Ideological Turing Test
**Method:** Write 200-word argument for opposite of your forecast. Show to someone who holds that view. Ask "Can you tell I don't believe this?"
**If they can tell:** You don't understand the opposing view
**If they can't tell:** You've properly steelmanned it
**Prevents:** Strawmanning, tribalism, missing legitimate counter-arguments
---
## Red Teaming and Devil's Advocate
### Technique 12: Structured Red Team Review
**Roles (60-min session):**
- **Pessimist:** "What's worst case?" (10 min)
- **Optimist:** "Are we underestimating success?" (10 min)
- **Historian:** Find historical analogies that contradict forecast (10 min)
- **Statistician:** Check the math, CI width (10 min)
- **Devil's Advocate:** Argue opposite conclusion (10 min)
- **Moderator (You):** Listen without defending, take notes, update (15 min synthesis)
**No rebuttals allowed - just listen.**
### Technique 13: Premortem + Pre-parade
**Premortem:** "It's 1 year from now, prediction was WRONG (too low). Why?"
**Pre-parade:** "It's 1 year from now, prediction was WRONG (too high). Why?"
**Method:** Assume wrong in each direction, generate 5-10 plausible reasons. If BOTH lists are plausible → confidence too high, widen range.
---
## Calibration Training
**Calibration:** When you say "70%", it happens 70% of the time.
### Technique 14: Trivia-Based Calibration
**Method:** Answer 50 trivia questions with confidence levels. Group by confidence bucket. Calculate actual accuracy in each.
**Example results:**
| Confidence | # Questions | Actual Accuracy |
|------------|-------------|-----------------|
| 60-70% | 12 | 50% (overconfident) |
| 70-80% | 15 | 60% (overconfident) |
| 80-90% | 8 | 75% (overconfident) |
**Fix:** Lower confidence levels by 15-20 points. Repeat monthly until calibrated.
**Resources:** Calibrate.app, PredictionBook.com, Good Judgment Open
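A minimal sketch of the bucket analysis, assuming your answers are stored as (stated confidence, correct?) pairs:
```python
from collections import defaultdict

def calibration_table(answers: list[tuple[float, bool]]) -> None:
    """Group (stated confidence, correct?) pairs into 10% buckets and
    print stated vs. actual accuracy for each."""
    buckets: dict[int, list[bool]] = defaultdict(list)
    for confidence, correct in answers:
        buckets[int(confidence * 10)].append(correct)  # 0.65 -> bucket 6 (60-70%)
    for b in sorted(buckets):
        hits = buckets[b]
        print(f"{b * 10}-{b * 10 + 10}%: n={len(hits)}, "
              f"actual accuracy {100 * sum(hits) // len(hits)}%")

calibration_table([(0.65, True), (0.65, False), (0.75, True),
                   (0.75, False), (0.75, False), (0.85, True)])
```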
### Technique 15: Confidence Interval Training
**Exercise:** Answer questions with 80% confidence intervals (population of Australia, year Eiffel Tower completed, Earth-Moon distance, etc.)
**Your 80% CIs should capture true answer 80% of time.**
**Most people:** Capture only 40-50% (too narrow = overconfident)
**Training:** Do 20 questions/week, track hit rate, widen intervals until you hit 80%
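A minimal sketch for tracking the hit rate, assuming each record is stored as (lower bound, upper bound, true value):
```python
def interval_hit_rate(records: list[tuple[float, float, float]]) -> float:
    """Fraction of your 80% confidence intervals containing the true value.
    Well below 0.8 means your intervals are too narrow (overconfidence)."""
    hits = sum(lo <= truth <= hi for lo, hi, truth in records)
    return hits / len(records)

records = [(20e6, 30e6, 26e6),       # population of Australia (~26M): hit
           (1850, 1880, 1889),       # Eiffel Tower completed (1889): miss
           (300e3, 450e3, 384.4e3)]  # Earth-Moon distance in km: hit
print(round(interval_hit_rate(records), 2))  # 0.67 -> widen your intervals
```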
---
## Keeping a Forecasting Journal
**Problem:** Memory unreliable - we remember hits, forget misses, unconsciously revise forecasts.
**Solution:** Written record
### Technique 16: Structured Forecast Log
**Format:**
```
=== FORECAST #[Number] ===
Date: [YYYY-MM-DD]
Question: [Precise, falsifiable]
Resolution Date: [When we'll know]
Base Rate: [Reference class frequency]
My Probability: [X%]
Confidence Interval: [Lower - Upper]
REASONING:
- Reference class: [Which, why]
- Evidence for: [Bullets]
- Evidence against: [Bullets]
- Main uncertainty: [What could change]
- Biases checked: [Techniques used]
OUTCOME (fill later):
Actual: [Yes/No or value]
Brier Score: [Calculated]
What I learned: [Post-mortem]
```
### Technique 17: Monthly Calibration Review
**Process (last day of month):**
1. Review all resolved forecasts
2. Calculate: Brier score, calibration plot, trend
3. Identify patterns: Which forecast types wrong? Which biases recurring?
4. Adjust process: Add techniques, adjust confidence levels
5. Set goals: "Reduce Brier score by 0.05", "Hit 75-85% actual accuracy on forecasts stated at 80% confidence"
---
## Practicing on Low-Stakes Predictions
**Problem:** High-stakes forecasts bad for learning (emotional, rare, outcome bias).
**Solution:** Practice on low-stakes, fast-resolving questions.
### Technique 18: Daily Micro-Forecasts
**Method:** Make 1-5 small predictions daily.
**Examples:** "Will it rain tomorrow?" (70%), "Email response <24h?" (80%), "Meeting on time?" (40%)
**Benefits:** Fast feedback (hours/days), low stakes, high volume (100+/month), rapid iteration
**Track in spreadsheet, calculate rolling Brier score weekly.**
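For the weekly rolling score, a minimal sketch that applies the Brier formula over only the most recent resolved forecasts:
```python
def rolling_brier(history: list[tuple[float, bool]], window: int = 30) -> float:
    """Brier score over only the most recent `window` resolved forecasts,
    so improvement (or drift) shows up quickly."""
    recent = history[-window:]
    return sum((p - outcome) ** 2 for p, outcome in recent) / len(recent)

log = [(0.7, True), (0.8, False), (0.4, False), (0.9, True)]  # toy log
print(round(rolling_brier(log, window=3), 3))  # 0.27
```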
### Technique 19: Sports Forecasting Practice
**Why sports:** Clear resolution, abundant data, frequent events, low stakes, good reference classes
**Method (weekly session):**
1. Pick 10 upcoming games
2. Research: team records, head-to-head, injuries, home/away splits
3. Make probability forecasts
4. Compare to Vegas odds (a well-calibrated baseline; conversion sketch below)
5. Track accuracy
**Goal:** Get within 5 percentage points of the Vegas-implied probability consistently
**Skills practiced:** Reference class, Bayesian updating, regression to mean, base rate anchoring
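Step 4 means converting moneylines into probabilities and stripping the bookmaker's margin (the "vig"). A minimal sketch for American-style odds, normalizing the two sides to sum to 1:
```python
def implied_probability(moneyline: int) -> float:
    """American moneyline odds -> raw implied probability (still includes vig)."""
    if moneyline < 0:
        return -moneyline / (-moneyline + 100)
    return 100 / (moneyline + 100)

def devig(home_ml: int, away_ml: int) -> tuple[float, float]:
    """Normalize both implied probabilities so they sum to 1 (vig removed)."""
    h, a = implied_probability(home_ml), implied_probability(away_ml)
    total = h + a
    return h / total, a / total

home, away = devig(-150, +130)
print(round(home, 3), round(away, 3))  # 0.58 0.42
```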
---
## Team Forecasting Protocols
**Team problems:** Groupthink, herding (anchor on first speaker), authority bias, social desirability
### Technique 20: Independent Then Combine
**Protocol:**
1. **Independent (15 min):** Each makes forecast individually, no discussion, submit to moderator
2. **Reveal (5 min):** Show all anonymously, display range, calculate median
3. **Discussion (20 min):** Outliers speak first, others respond, no splitting the difference
4. **Re-forecast (10 min):** Update independently, can stay at original
5. **Aggregate:** Median of final forecasts = team forecast (or weighted by track record, or extremize median)
**Prevents:** Anchoring on first speaker, groupthink, authority bias
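A minimal sketch of step 5's aggregation: take the median, then optionally extremize it (push it away from 50%, a common correction for shared information among forecasters; the exponent here is a hypothetical choice):
```python
import statistics

def extremize(p: float, a: float = 2.5) -> float:
    """Push a probability away from 0.5; a > 1 extremizes, a = 1 is identity."""
    return p ** a / (p ** a + (1 - p) ** a)

team = [0.55, 0.60, 0.70, 0.65, 0.40]  # final-round individual forecasts
median = statistics.median(team)
print(median, round(extremize(median), 3))  # 0.6 0.734
```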
### Technique 21: Delphi Method
**Protocol:** Multi-round expert elicitation
- **Round 1:** Forecast independently, provide reasoning, submit anonymously
- **Round 2:** See Round 1 summary (anonymized), read reasoning, revise if convinced
- **Round 3:** See Round 2 summary, make final forecast
- **Final:** Median of Round 3
**Prevents:** Loud voice dominance, social pressure, first-mover anchoring
**Use when:** High-stakes forecasts, diverse expert team
### Technique 22: Red Team / Blue Team Split
**Setup:**
- **Blue Team:** Argues forecast should be HIGH
- **Red Team:** Argues forecast should be LOW
- **Gray Team:** Judges and synthesizes
**Process (75 min):**
1. Preparation (20 min): Each team finds evidence for their side
2. Presentations (30 min): Blue presents (10), Red presents (10), Gray questions (10)
3. Deliberation (15 min): Gray weighs evidence, makes forecast
4. Debrief (10 min): All reconvene, discuss learning
**Prevents:** Confirmation bias, groupthink, missing arguments
---
## Quick Reference: Technique Selection Guide
**Overconfidence:** → Calibration training (#14, #15), Premortem (#13)
**Confirmation Bias:** → Consider-opposite (#10), Red teaming (#12), Steelman (#10)
**Anchoring:** → Independent-then-combine (#20), Pre-commitment (#1), Ensemble (#9)
**Base Rate Neglect:** → Base rate formula (#7), Reference class + adjustment (#7)
**Availability Bias:** → Statistical lookup, Forecasting journal (#16)
**Motivated Reasoning:** → Pre-commitment (#1, #2), Public predictions (#5), Tournaments (#4)
**Scope Insensitivity:** → Algorithmic scaling (#7), Reference class by magnitude
**Status Quo Bias:** → Pre-parade (#13), Consider-opposite (#10)
**Groupthink:** → Independent-then-combine (#20), Delphi (#21), Red/Blue teams (#22)
---
## Integration into Forecasting Workflow
**Before forecast:** Pre-commitment (#1, #2, #3), Independent (if team) (#20)
**During forecast:** Algorithmic aids (#7, #8, #9), Consider-opposite (#10, #11)
**After initial:** Red teaming (#12, #13), Calibration check (#14, #15)
**Before finalizing:** Journal entry (#16), Public logging (#5)
**After resolution:** Journal review (#17), Calibration analysis (#17)
**Ongoing practice:** Micro-forecasts (#18), Sports (#19), Tournaments (#4)
---
## The Minimum Viable Debiasing Process
**If you only do THREE things:**
**1. Pre-commit to update rules** - Write "If I see X, I'll update to Y" before evidence (prevents motivated reasoning)
**2. Keep a forecasting journal** - Log all forecasts with reasoning, review monthly (prevents hindsight bias)
**3. Practice on low-stakes predictions** - Make 5-10 micro-forecasts/week (reduces overconfidence)
---
## Summary Table
| Technique | Bias Addressed | Difficulty | Time |
|-----------|----------------|------------|------|
| Pre-registered updates (#1) | Motivated reasoning | Easy | 5 min |
| Prediction intervals (#2) | Scope insensitivity | Easy | 5 min |
| Conditional forecasting (#3) | Status quo bias | Medium | 10 min |
| Tournaments (#4) | Overconfidence | Easy | Ongoing |
| Public logging (#5) | Hindsight bias | Easy | 2 min |
| Forecasting partners (#6) | Blind spots | Medium | Ongoing |
| Base rate formula (#7) | Base rate neglect | Easy | 3 min |
| Bayesian calculator (#8) | Update errors | Medium | 5 min |
| Ensemble methods (#9) | Method bias | Medium | 15 min |
| Steelman opposite (#10) | Confirmation bias | Medium | 10 min |
| Turing test (#11) | Tribalism | Hard | 20 min |
| Red team review (#12) | Groupthink | Hard | 60 min |
| Premortem/Pre-parade (#13) | Overconfidence | Medium | 15 min |
| Trivia calibration (#14) | Overconfidence | Easy | 20 min |
| CI training (#15) | Overconfidence | Easy | 15 min |
| Forecast journal (#16) | Multiple | Easy | 5 min |
| Monthly review (#17) | Calibration drift | Medium | 30 min |
| Micro-forecasts (#18) | Overconfidence | Easy | 5 min/day |
| Sports practice (#19) | Multiple | Medium | 30 min/week |
| Independent-then-combine (#20) | Groupthink | Medium | 50 min |
| Delphi method (#21) | Authority bias | Hard | 90 min |
| Red/Blue teams (#22) | Confirmation bias | Hard | 75 min |
---
**Return to:** [Main Skill](../SKILL.md#interactive-menu)