Debiasing Techniques

A Practical Guide to Removing Bias from Forecasts


The Systematic Debiasing Process

The Four-Stage Framework:

  1. Recognition - Identify which biases are present, assess severity and direction
  2. Intervention - Apply structured methods (not willpower), make bias mathematically impossible
  3. Validation - Check if intervention worked, compare pre/post probabilities
  4. Institutionalization - Build into routine process, create checklists, track effectiveness

Key Principle: You cannot "try harder" to avoid bias. Biases are unconscious. You need systematic interventions.


Pre-Commitment Strategies

Definition: Locking in decision rules BEFORE seeing evidence, when you're still objective.

Why it works: Removes motivated reasoning by making updates automatic and mechanical.

Technique 1: Pre-Registered Update Rules

Before looking at evidence, write down:

  1. Current belief: "I believe X with Y% confidence"
  2. Update rule: "If I observe Z, I will update to W%"
  3. Decision criteria: "I will accept evidence Z as valid if it meets criteria Q"

Example - Election Forecast:

  • Current: "Candidate A has 60% chance of winning"
  • Update rule: "If next 3 polls show Candidate B ahead by >5%, I will update to 45%"
  • Criteria: "Polls must be rated B+ or higher by 538, sample size >800, conducted in last 7 days"

Prevents: Cherry-picking polls, moving goalposts, asymmetric evidence standards
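
A minimal sketch of how such a rule can be written down so the update becomes mechanical. The record structure, field names, and apply_rule helper are illustrative, not part of the guide:

```python
from dataclasses import dataclass

@dataclass
class PreRegisteredRule:
    """A pre-committed update rule, locked in before seeing any new evidence."""
    current_belief: str
    current_probability: float   # e.g. 0.60
    trigger: str                 # the observation that fires the rule
    updated_probability: float   # the pre-committed post-update probability
    validity_criteria: str       # what counts as acceptable evidence

# Election example from the text, locked in before any new polls arrive.
rule = PreRegisteredRule(
    current_belief="Candidate A wins",
    current_probability=0.60,
    trigger="Next 3 polls show Candidate B ahead by >5 points",
    updated_probability=0.45,
    validity_criteria="538 rating B+ or higher, sample size > 800, fielded in last 7 days",
)

def apply_rule(rule: PreRegisteredRule, trigger_observed: bool) -> float:
    """Return the probability the pre-commitment dictates -- no judgment call at update time."""
    return rule.updated_probability if trigger_observed else rule.current_probability

print(apply_rule(rule, trigger_observed=True))  # 0.45
```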

Technique 2: Prediction Intervals with Triggers

Method: Set probability ranges that trigger re-evaluation.

Example - Startup Valuation:

If valuation announced is:
- <$50M  → Update P(success) from 40% → 25%
- $50M-$100M → Keep at 40%
- $100M-$200M → Update to 55%
- >$200M → Update to 70%

Lock this in before you know the actual valuation.

Prevents: Post-hoc rationalization, scope insensitivity, anchoring
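
A sketch of how the bands above can be encoded before the announcement so the update is purely mechanical; the thresholds are the ones from the example, and the function name is illustrative:

```python
def precommitted_p_success(valuation_musd: float) -> float:
    """Return the pre-committed P(success) for an announced valuation (in $M).

    Bands are locked in before the actual valuation is known.
    """
    if valuation_musd < 50:
        return 0.25
    elif valuation_musd <= 100:
        return 0.40
    elif valuation_musd <= 200:
        return 0.55
    else:
        return 0.70

print(precommitted_p_success(130))  # 0.55, regardless of how exciting the announcement feels
```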

Technique 3: Conditional Forecasting

Method: Make forecasts for different scenarios in advance.

Example - Product Launch:

  • "If launch delayed >2 months: P(success) = 30%"
  • "If launch on time: P(success) = 50%"
  • "If launch early: P(success) = 45%"

When scenario occurs, use pre-committed probability.

Prevents: Status quo bias, under-updating when conditions change


External Accountability

Why it works: Public predictions create reputational stake. Incentive to be accurate > incentive to be "right."

Technique 4: Forecasting Tournaments

Platforms: Good Judgment Open, Metaculus, Manifold Markets, PredictIt

How to use:

  1. Make 50-100 forecasts over 6 months
  2. Track your Brier score (lower = better)
  3. Review what you got wrong
  4. Adjust your process

Fixes: Overconfidence, confirmation bias, motivated reasoning
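
For binary questions the Brier score is just the mean squared error between your probabilities and the 0/1 outcomes. A minimal sketch with invented data:

```python
def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared error between probability forecasts and binary outcomes (0 or 1).

    0.0 is perfect, 0.25 is what always saying 50% earns, lower is better.
    """
    assert len(forecasts) == len(outcomes)
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Three resolved questions: forecast 80%, 30%, 60%; outcomes yes, no, no.
print(round(brier_score([0.80, 0.30, 0.60], [1, 0, 0]), 3))  # 0.163
```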

Technique 5: Public Prediction Logging

Method: Post forecasts publicly (Twitter, blog, Slack, email) before outcomes known.

Format:

Forecast: [Event]
Probability: X%
Date: [Today]
Resolution date: [When we'll know]
Reasoning: [2-3 sentences]

Prevents: Hindsight bias, selective memory, probability creep

Technique 6: Forecasting Partners

Method: Find a "prediction buddy" who reviews your forecasts.

Their job: Ask "Why this probability and not 10% higher/lower?", point out motivated reasoning, suggest alternative reference classes, track systematic biases

Prevents: Blind spots, soldier mindset, lazy reasoning


Algorithmic Aids

Research finding: Simple formulas often match or outperform expert judgment, because a formula applies the same weights every time while human judgment is noisy.

Technique 7: Base Rate + Adjustment Formula

Final Probability = Base Rate + (Adjustment × Confidence)

Example - Startup Success:

  • Base rate: 10% (seed startups reach $10M revenue)
  • Specific evidence: Great team (+20%), weak market fit (-10%), strong competition (-5%) = Net +5%
  • Confidence in evidence: 0.5
  • Calculation: 10% + (5% × 0.5) = 12.5%

Prevents: Ignoring base rates, overweighting anecdotes, inconsistent weighting
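
The same calculation as code, using the numbers from the startup example; the function name is illustrative:

```python
def adjusted_probability(base_rate: float, net_adjustment: float, confidence: float) -> float:
    """Final probability = base rate + (net evidence adjustment x confidence in that evidence)."""
    return base_rate + net_adjustment * confidence

# Startup example: 10% base rate; +20% - 10% - 5% = +5% net evidence; 0.5 confidence.
net = 0.20 - 0.10 - 0.05
print(round(adjusted_probability(0.10, net, 0.5), 3))  # 0.125, i.e. 12.5%
```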

Technique 8: Bayesian Update Calculator

Formula: Posterior Odds = Prior Odds × Likelihood Ratio

Example - Medical Test:

  • Prior: 1% of people have disease X; test: 90% true positive rate, 10% false positive rate; you test positive
  • Prior odds: 1:99; likelihood ratio: 0.9/0.1 = 9; posterior odds: 9:99 = 1:11
  • Posterior probability: 1/12 ≈ 8.3%

Lesson: Even with positive test, only 8.3% chance (low base rate dominates).

Tool: Use online Bayes calculator or spreadsheet template.
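
As an alternative to an online calculator, the odds-form update is a few lines of code. A sketch that reproduces the medical-test example; the function name is illustrative:

```python
def bayes_update(prior: float, p_evidence_given_h: float, p_evidence_given_not_h: float) -> float:
    """Posterior probability via the odds form: posterior odds = prior odds x likelihood ratio."""
    prior_odds = prior / (1 - prior)
    likelihood_ratio = p_evidence_given_h / p_evidence_given_not_h
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Medical test: 1% prior, 90% true positive rate, 10% false positive rate.
print(round(bayes_update(0.01, 0.90, 0.10), 3))  # 0.083, i.e. about 8.3%
```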

Technique 9: Ensemble Forecasting

Method: Use multiple methods, average results.

  1. Reference class forecasting → X₁%
  2. Inside view analysis → X₂%
  3. Extrapolation from trends → X₃%
  4. Expert consensus → X₄%
  5. Weighted average: 0.4×(Ref class) + 0.3×(Inside) + 0.2×(Trends) + 0.1×(Expert)

Prevents: Over-reliance on single method, blind spots, methodology bias
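
A sketch of step 5's weighted average; the weights are the ones listed above, and the individual method estimates are invented placeholders:

```python
def ensemble_forecast(estimates: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of forecasts produced by different methods. Weights should sum to 1."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(weights[method] * estimates[method] for method in weights)

estimates = {"reference_class": 0.30, "inside_view": 0.50, "trends": 0.40, "experts": 0.45}
weights   = {"reference_class": 0.4,  "inside_view": 0.3,  "trends": 0.2,  "experts": 0.1}
print(round(ensemble_forecast(estimates, weights), 3))  # 0.12 + 0.15 + 0.08 + 0.045 = 0.395
```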


Consider-the-Opposite Technique

Core Question: "What would have to be true for the opposite outcome to occur?"

Technique 10: Steelman the Opposite View

Method:

  1. State your forecast: "70% probability Event X"
  2. Build STRONGEST case for opposite (don't strawman, steelman)
  3. Articulate so well that someone holding that view would agree
  4. Re-evaluate probability based on strength

Example - AGI Timeline:

  • Your view: "70% chance AGI by 2030"
  • Steelman: "Every previous AI timeline wrong by decades, current systems lack common sense, scaling may hit limits, regulatory slowdowns, no clear path from LLMs to reasoning, hardware constraints, reference class: major tech breakthroughs take 20-40 years"
  • Re-evaluation: "Hmm, that's strong. Updating to 45%."

Technique 11: Ideological Turing Test

Method: Write 200-word argument for opposite of your forecast. Show to someone who holds that view. Ask "Can you tell I don't believe this?"

If they can tell: You don't understand the opposing view.
If they can't tell: You've properly steelmanned it.

Prevents: Strawmanning, tribalism, missing legitimate counter-arguments


Red Teaming and Devil's Advocate

Technique 12: Structured Red Team Review

Roles (60-min session):

  • Pessimist: "What's worst case?" (10 min)
  • Optimist: "Are we underestimating success?" (10 min)
  • Historian: Find historical analogies that contradict forecast (10 min)
  • Statistician: Check the math, CI width (10 min)
  • Devil's Advocate: Argue opposite conclusion (10 min)
  • Moderator (You): Listen without defending, take notes, update (15 min synthesis)

No rebuttals allowed - just listen.

Technique 13: Premortem + Pre-parade

Premortem: "It's 1 year from now, prediction was WRONG (too low). Why?" Pre-parade: "It's 1 year from now, prediction was WRONG (too high). Why?"

Method: Assume wrong in each direction, generate 5-10 plausible reasons. If BOTH lists are plausible → confidence too high, widen range.


Calibration Training

Calibration: When you say "70%", it happens 70% of the time.

Technique 14: Trivia-Based Calibration

Method: Answer 50 trivia questions with confidence levels. Group by confidence bucket. Calculate actual accuracy in each.

Example results:

Confidence | # Questions | Actual Accuracy
60-70% | 12 | 50% (overconfident)
70-80% | 15 | 60% (overconfident)
80-90% | 8 | 75% (overconfident)

Fix: Lower confidence levels by 15-20 points. Repeat monthly until calibrated.

Resources: Calibrate.app, PredictionBook.com, Good Judgment Open
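
A sketch of the bucketing step: group answers by stated confidence and compare each bucket's stated confidence to its actual accuracy. The data and function name are invented for illustration:

```python
from collections import defaultdict

def calibration_report(answers: list[tuple[float, bool]], bucket_width_pct: int = 10) -> None:
    """Print actual accuracy per confidence bucket from (stated_confidence, was_correct) pairs."""
    buckets: dict[int, list[bool]] = defaultdict(list)
    for confidence, correct in answers:
        lower = int(confidence * 100) // bucket_width_pct * bucket_width_pct  # e.g. 0.73 -> 70
        buckets[lower].append(correct)
    for lower in sorted(buckets):
        results = buckets[lower]
        accuracy = sum(results) / len(results)
        print(f"{lower}%-{lower + bucket_width_pct}%: {len(results)} answers, {accuracy:.0%} correct")

# Invented trivia results: (stated confidence, got it right?)
calibration_report([(0.65, True), (0.65, False), (0.75, True), (0.75, False), (0.85, True)])
```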

Technique 15: Confidence Interval Training

Exercise: Answer questions with 80% confidence intervals (population of Australia, year Eiffel Tower completed, Earth-Moon distance, etc.)

Your 80% CIs should capture true answer 80% of time.

Most people: Capture only 40-50% (too narrow = overconfident)

Training: Do 20 questions/week, track hit rate, widen intervals until you hit 80%
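
A sketch of the hit-rate check: count how often your stated 80% intervals contained the true answer. The intervals are invented; the true values (Eiffel Tower completed in 1889, mean Earth-Moon distance about 384,400 km) are real:

```python
def interval_hit_rate(intervals: list[tuple[float, float]], truths: list[float]) -> float:
    """Fraction of (lower, upper) intervals that contain the true value.

    Well-calibrated 80% confidence intervals should score about 0.8.
    """
    hits = sum(low <= truth <= high for (low, high), truth in zip(intervals, truths))
    return hits / len(intervals)

# Two questions from the exercise: Eiffel Tower completion year, Earth-Moon distance (km).
my_80pct_intervals = [(1850, 1880), (300_000, 500_000)]
true_values = [1889, 384_400]
print(interval_hit_rate(my_80pct_intervals, true_values))  # 0.5 -- the first interval was too narrow
```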


Keeping a Forecasting Journal

Problem: Memory unreliable - we remember hits, forget misses, unconsciously revise forecasts.

Solution: Written record

Technique 16: Structured Forecast Log

Format:

=== FORECAST #[Number] ===
Date: [YYYY-MM-DD]
Question: [Precise, falsifiable]
Resolution Date: [When we'll know]
Base Rate: [Reference class frequency]
My Probability: [X%]
Confidence Interval: [Lower - Upper]

REASONING:
- Reference class: [Which, why]
- Evidence for: [Bullets]
- Evidence against: [Bullets]
- Main uncertainty: [What could change]
- Biases checked: [Techniques used]

OUTCOME (fill later):
Actual: [Yes/No or value]
Brier Score: [Calculated]
What I learned: [Post-mortem]

Technique 17: Monthly Calibration Review

Process (last day of month):

  1. Review all resolved forecasts
  2. Calculate: Brier score, calibration plot, trend
  3. Identify patterns: Which forecast types wrong? Which biases recurring?
  4. Adjust process: Add techniques, adjust confidence levels
  5. Set goals: "Reduce Brier by 0.05", "Achieve 75-85% calibration on 80% forecasts"

Practicing on Low-Stakes Predictions

Problem: High-stakes forecasts bad for learning (emotional, rare, outcome bias).

Solution: Practice on low-stakes, fast-resolving questions.

Technique 18: Daily Micro-Forecasts

Method: Make 1-5 small predictions daily.

Examples: "Will it rain tomorrow?" (70%), "Email response <24h?" (80%), "Meeting on time?" (40%)

Benefits: Fast feedback (hours/days), low stakes, high volume (100+/month), rapid iteration

Track in spreadsheet, calculate rolling Brier score weekly.
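
A sketch of that weekly rolling Brier calculation over a simple log; the log format and entries are illustrative:

```python
def rolling_brier(log: list[tuple[float, int]], window: int = 20) -> float:
    """Brier score over the most recent `window` resolved micro-forecasts.

    Each log entry is (forecast probability, outcome as 0 or 1).
    """
    recent = log[-window:]
    return sum((p - o) ** 2 for p, o in recent) / len(recent)

# Rain (70%, happened), email reply (80%, happened), meeting on time (40%, didn't), etc.
log = [(0.70, 1), (0.80, 1), (0.40, 0), (0.90, 0), (0.60, 1)]
print(round(rolling_brier(log), 3))  # 0.252
```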

Technique 19: Sports Forecasting Practice

Why sports: Clear resolution, abundant data, frequent events, low stakes, good reference classes

Method (weekly session):

  1. Pick 10 upcoming games
  2. Research: team records, head-to-head, injuries, home/away splits
  3. Make probability forecasts
  4. Compare to Vegas odds (well-calibrated baseline)
  5. Track accuracy

Goal: Get within 5% of Vegas odds consistently

Skills practiced: Reference class, Bayesian updating, regression to mean, base rate anchoring


Team Forecasting Protocols

Team problems: Groupthink, herding (anchor on first speaker), authority bias, social desirability

Technique 20: Independent Then Combine

Protocol:

  1. Independent (15 min): Each makes forecast individually, no discussion, submit to moderator
  2. Reveal (5 min): Show all anonymously, display range, calculate median
  3. Discussion (20 min): Outliers speak first, others respond, no splitting difference
  4. Re-forecast (10 min): Update independently, can stay at original
  5. Aggregate: Median of final forecasts = team forecast (or weighted by track record, or extremize median)

Prevents: Anchoring on first speaker, groupthink, authority bias
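
A sketch of step 5's aggregation: take the median of the final independent forecasts and, optionally, extremize it (push the aggregate away from 50%, a post-aggregation step used by some forecasting teams). The forecasts and the extremizing exponent are illustrative choices, not prescribed by the guide:

```python
from statistics import median

def extremize(p: float, alpha: float = 2.0) -> float:
    """Push an aggregated probability away from 0.5 by raising its odds to the power alpha."""
    odds = (p / (1 - p)) ** alpha
    return odds / (1 + odds)

final_forecasts = [0.55, 0.60, 0.62, 0.65, 0.70]   # each member's post-discussion forecast
team_forecast = median(final_forecasts)             # 0.62
print(team_forecast, round(extremize(team_forecast), 3))  # 0.62, and an extremized ~0.727
```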

Technique 21: Delphi Method

Protocol: Multi-round expert elicitation

  • Round 1: Forecast independently, provide reasoning, submit anonymously
  • Round 2: See Round 1 summary (anonymized), read reasoning, revise if convinced
  • Round 3: See Round 2 summary, make final forecast
  • Final: Median of Round 3

Prevents: Loud voice dominance, social pressure, first-mover anchoring

Use when: High-stakes forecasts, diverse expert team

Technique 22: Red Team / Blue Team Split

Setup:

  • Blue Team: Argues forecast should be HIGH
  • Red Team: Argues forecast should be LOW
  • Gray Team: Judges and synthesizes

Process (75 min):

  1. Preparation (20 min): Each team finds evidence for their side
  2. Presentations (30 min): Blue presents (10), Red presents (10), Gray questions (10)
  3. Deliberation (15 min): Gray weighs evidence, makes forecast
  4. Debrief (10 min): All reconvene, discuss learning

Prevents: Confirmation bias, groupthink, missing arguments


Quick Reference: Technique Selection Guide

Overconfidence → Calibration training (#14, #15), Premortem/Pre-parade (#13)
Confirmation bias → Steelman the opposite (#10), Red team review (#12)
Anchoring → Independent-then-combine (#20), Pre-commitment (#1), Ensemble methods (#9)
Base rate neglect → Base rate + adjustment formula (#7)
Availability bias → Statistical lookup, Forecasting journal (#16)
Motivated reasoning → Pre-commitment (#1, #2), Public logging (#5), Tournaments (#4)
Scope insensitivity → Algorithmic scaling (#7), Reference class by magnitude
Status quo bias → Pre-parade (#13), Consider-the-opposite (#10)
Groupthink → Independent-then-combine (#20), Delphi (#21), Red/Blue teams (#22)


Integration into Forecasting Workflow

Before forecast: Pre-commitment (#1, #2, #3); Independent forecasts if working as a team (#20)
During forecast: Algorithmic aids (#7, #8, #9); Consider-the-opposite (#10, #11)
After initial estimate: Red teaming (#12, #13); Calibration check (#14, #15)
Before finalizing: Journal entry (#16); Public logging (#5)
After resolution: Journal review and calibration analysis (#17)
Ongoing practice: Micro-forecasts (#18), Sports practice (#19), Tournaments (#4)


The Minimum Viable Debiasing Process

If you only do THREE things:

  1. Pre-commit to update rules - Write "If I see X, I'll update to Y" before seeing the evidence (prevents motivated reasoning)
  2. Keep a forecasting journal - Log all forecasts with reasoning, review monthly (prevents hindsight bias)
  3. Practice on low-stakes predictions - Make 5-10 micro-forecasts/week (reduces overconfidence)


Summary Table

Technique | Bias Addressed | Difficulty | Time
Pre-registered updates (#1) | Motivated reasoning | Easy | 5 min
Prediction intervals (#2) | Scope insensitivity | Easy | 5 min
Conditional forecasting (#3) | Status quo bias | Medium | 10 min
Tournaments (#4) | Overconfidence | Easy | Ongoing
Public logging (#5) | Hindsight bias | Easy | 2 min
Forecasting partners (#6) | Blind spots | Medium | Ongoing
Base rate formula (#7) | Base rate neglect | Easy | 3 min
Bayesian calculator (#8) | Update errors | Medium | 5 min
Ensemble methods (#9) | Method bias | Medium | 15 min
Steelman opposite (#10) | Confirmation bias | Medium | 10 min
Turing test (#11) | Tribalism | Hard | 20 min
Red team review (#12) | Groupthink | Hard | 60 min
Premortem/Pre-parade (#13) | Overconfidence | Medium | 15 min
Trivia calibration (#14) | Overconfidence | Easy | 20 min
CI training (#15) | Overconfidence | Easy | 15 min
Forecast journal (#16) | Multiple | Easy | 5 min
Monthly review (#17) | Calibration drift | Medium | 30 min
Micro-forecasts (#18) | Overconfidence | Easy | 5 min/day
Sports practice (#19) | Multiple | Medium | 30 min/week
Independent-then-combine (#20) | Groupthink | Medium | 50 min
Delphi method (#21) | Authority bias | Hard | 90 min
Red/Blue teams (#22) | Confirmation bias | Hard | 75 min

Return to: Main Skill