Initial commit
393
skills/market-mechanics-betting/resources/betting-theory.md
Normal file
@@ -0,0 +1,393 @@
|
||||
# Betting Theory Fundamentals
|
||||
|
||||
This resource explains the core theoretical foundations of rational betting, expected value, variance management, and market efficiency.
|
||||
|
||||
**Foundation for:** All betting and forecasting decisions
|
||||
|
||||
---
|
||||
|
||||
## Why Learn Betting Theory
|
||||
|
||||
**Core insight:** Betting theory separates decision quality from outcome quality. Make +EV decisions repeatedly and survive variance.
|
||||
|
||||
**Enables:**
|
||||
- Think probabilistically (convert beliefs to quantifiable edges)
|
||||
- Manage risk rationally (distinguish bad decisions from bad outcomes)
|
||||
- Avoid costly mistakes (identify predictable failure modes)
|
||||
- Optimize long-term growth (balance aggression with preservation)
|
||||
|
||||
**Research foundation:** Kelly (1956), Samuelson (1963), Thorp (1969), behavioral economics (Kahneman & Tversky), market efficiency (Fama).
|
||||
|
||||
---
|
||||
|
||||
## 1. Expected Value Framework
|
||||
|
||||
### Definition and Formula
|
||||
|
||||
**Expected Value (EV):** Probability-weighted average of all possible outcomes.
|
||||
|
||||
```
EV = Σ(Probability × Outcome)

Binary bet:
EV = (P_win × Amount_won) - (P_lose × Amount_lost)
```
|
||||
|
||||
**Example:**
|
||||
```
|
||||
Bet $100 on 60% event at even odds (+100)
|
||||
EV = (0.60 × $100) - (0.40 × $100) = $20
|
||||
EV% = +20% per $100 wagered
|
||||
```
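
As a minimal sketch, the binary-bet formula above can be written as a small function. The name `expected_value` and its parameters are illustrative assumptions, not part of this resource.

```python
def expected_value(p_win: float, amount_won: float, amount_lost: float) -> float:
    """EV of a binary bet: probability-weighted win minus probability-weighted loss."""
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("p_win must be between 0 and 1")
    return p_win * amount_won - (1.0 - p_win) * amount_lost


# The example above: $100 on a 60% event at even odds.
ev = expected_value(0.60, 100.0, 100.0)
ev_pct = ev / 100.0  # EV as a fraction of the $100 stake
print(f"EV = ${ev:.2f} ({ev_pct:+.0%} of stake)")
```

Expressing EV as a fraction of the stake makes bets of different sizes comparable against the thresholds in the next subsection.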
|
||||
|
||||
### Positive vs Negative EV
|
||||
|
||||
**Decision Framework:**
|
||||
- **EV > +5%:** Strong bet (after fees/uncertainty)
|
||||
- **EV = 0% to +5%:** Marginal (consider passing)
|
||||
- **EV < 0%:** Never bet (unless hedging)
|
||||
|
||||
**Critical Rule:** Judge decisions by EV, not outcomes. Good decisions lose sometimes; bad decisions win sometimes. Process matters in small samples; results only become informative over 100+ trials.
|
||||
|
||||
### Converting Market Odds to EV
|
||||
|
||||
**Step 1: Implied probability**
|
||||
```
Decimal odds: P = 1 / Odds
Example: 1.67 → 60%

American (+): P = 100 / (Odds + 100)
Example: +150 → 40%

American (-): P = |Odds| / (|Odds| + 100)
Example: -150 → 60%
```
|
||||
|
||||
**Step 2: Calculate edge**
|
||||
```
|
||||
Your probability: 70%
|
||||
Market probability: 60%
|
||||
Edge = 70% - 60% = +10%
|
||||
```
|
||||
|
||||
**Step 3: Calculate EV**
|
||||
```
|
||||
Bet $100 at 1.67 odds:
|
||||
EV = (0.70 × $67) - (0.30 × $100) = +$16.90 = +16.9%
|
||||
```
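
The three steps above can be sketched in code. The helper names (`decimal_to_prob`, `american_to_prob`, `ev_at_decimal_odds`) are hypothetical and chosen only for illustration; fees are ignored.

```python
def decimal_to_prob(decimal_odds: float) -> float:
    """Implied probability from decimal odds."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1")
    return 1.0 / decimal_odds


def american_to_prob(american_odds: float) -> float:
    """Implied probability from American odds (+150, -150, ...)."""
    if american_odds == 0:
        raise ValueError("American odds cannot be 0")
    if american_odds > 0:
        return 100.0 / (american_odds + 100.0)
    return abs(american_odds) / (abs(american_odds) + 100.0)


def ev_at_decimal_odds(p_yours: float, decimal_odds: float, stake: float) -> float:
    """EV of staking `stake` at `decimal_odds` when you believe the probability is `p_yours`."""
    net_win = stake * (decimal_odds - 1.0)  # profit if the bet wins
    return p_yours * net_win - (1.0 - p_yours) * stake


market_p = decimal_to_prob(1.67)            # about 0.60
edge = 0.70 - market_p                      # about +0.10
ev = ev_at_decimal_odds(0.70, 1.67, 100.0)  # about +16.9
print(f"market={market_p:.3f} edge={edge:+.3f} EV=${ev:.2f}")
```

Note that the edge in probability points and the EV per dollar staked are different quantities; the EV depends on the payout as well as on the probability gap.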
|
||||
|
||||
### Law of Large Numbers
|
||||
|
||||
**Key principle:** Observed frequency converges to true probability as sample size increases.
|
||||
|
||||
**Practical thresholds:**
|
||||
- 10 bets: High variance, might be down despite +EV
|
||||
- 100 bets: Convergence starting, likely near EV
|
||||
- 1000 bets: Results tightly centered around EV
|
||||
|
||||
**Application:** Don't judge strategy on <30 trials. Variance dominates small samples.
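
To see why small samples are dominated by variance, one rough check is the standard error of the average result per bet, which shrinks with the square root of the number of bets. The sketch below assumes independent, identically sized $100 even-odds bets at a 60% win rate (EV $20 per bet, per-bet standard deviation √9600 ≈ $98); the function name is an illustrative assumption.

```python
import math


def standard_error_of_mean(sigma_per_bet: float, n_bets: int) -> float:
    """Standard error of the average profit per bet over n independent bets."""
    if n_bets <= 0:
        raise ValueError("n_bets must be positive")
    return sigma_per_bet / math.sqrt(n_bets)


ev_per_bet = 20.0
sigma_per_bet = math.sqrt(9600)  # about $98 for the 60% even-odds bet

for n in (10, 100, 1000):
    se = standard_error_of_mean(sigma_per_bet, n)
    print(f"{n:>5} bets: average result = ${ev_per_bet:.0f} ± ${se:.1f} (1 standard error)")
```

At 10 bets the noise is larger than the EV itself; by 1000 bets it is a small fraction of it.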
|
||||
|
||||
---
|
||||
|
||||
## 2. Variance and Risk
|
||||
|
||||
### Standard Deviation
|
||||
|
||||
**Measures outcome dispersion around EV.**
|
||||
|
||||
**Formula:**
|
||||
```
|
||||
σ = √(P_win×(Win-EV)² + P_lose×(Loss-EV)²)
|
||||
```
|
||||
|
||||
**Example ($100 bet, 60% win, even odds):**
|
||||
```
|
||||
EV = $20
|
||||
σ = √(0.60×(100-20)² + 0.40×(-100-20)²)
|
||||
σ = √9600 = $98
|
||||
|
||||
Coefficient of Variation: σ/EV = $98/$20 = 4.9
|
||||
```
|
||||
|
||||
**Interpretation:** Standard deviation ($98) is 5× the EV ($20). Variance dominates signal.
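
The same calculation as a short sketch; `binary_bet_stats` is a hypothetical helper that returns EV, standard deviation, and coefficient of variation for a single binary bet.

```python
import math


def binary_bet_stats(p_win: float, win: float, loss: float) -> tuple[float, float, float]:
    """Return (EV, standard deviation, coefficient of variation) of one binary bet.

    `win` is the profit on a win and `loss` the (positive) amount lost otherwise.
    """
    p_lose = 1.0 - p_win
    ev = p_win * win - p_lose * loss
    variance = p_win * (win - ev) ** 2 + p_lose * (-loss - ev) ** 2
    sigma = math.sqrt(variance)
    cv = sigma / ev if ev != 0 else math.inf  # CV is undefined at zero EV
    return ev, sigma, cv


ev, sigma, cv = binary_bet_stats(0.60, 100.0, 100.0)
print(f"EV=${ev:.0f}, sigma=${sigma:.0f}, CV={cv:.1f}")
```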
|
||||
|
||||
### Volatility Categories
|
||||
|
||||
**Coefficient of Variation (CV = σ/EV):**
|
||||
- CV < 1: Low volatility (10-30 trials to see EV)
|
||||
- CV = 1-3: Moderate (30-50 trials)
|
||||
- CV = 3-10: High (50-100 trials)
|
||||
- CV > 10: Extreme (100+ trials)
|
||||
|
||||
**Higher CV requires:** Larger bankroll, more patience, stronger discipline.
|
||||
|
||||
### Risk of Ruin
|
||||
|
||||
**Probability of losing entire bankroll before profit.**
|
||||
|
||||
**Practical Guidelines:**
|
||||
|
||||
| Bet Size | Risk of Ruin | Assessment |
|----------|--------------|------------|
| 50% of bankroll | ~40% | Reckless |
| 25% of bankroll | ~20% | Aggressive |
| 10% of bankroll | ~5% | Moderate |
| 5% of bankroll | ~1% | Conservative |
| 2% of bankroll | ~0.1% | Very conservative |
|
||||
|
||||
**Kelly Criterion naturally manages risk of ruin. Never bet >10% of bankroll on single bet.**
|
||||
|
||||
### Managing Volatility
|
||||
|
||||
**1. Fractional Kelly (Primary Tool):**
|
||||
- Full Kelly: 100% variance, 40%+ drawdowns
|
||||
- Half Kelly: 25% variance, ~20% drawdowns
|
||||
- Quarter Kelly: 6% variance, ~10% drawdowns
|
||||
|
||||
**2. Diversification:**
|
||||
- Multiple uncorrelated +EV bets
|
||||
- Requires independence (correlation < 0.3)
|
||||
|
||||
**3. Expected Drawdown:**
|
||||
- Even optimal betting experiences 20-40% drawdowns
|
||||
- Mentally prepare for temporary losses
|
||||
- Don't confuse drawdown with -EV strategy
|
||||
|
||||
---
|
||||
|
||||
## 3. Bankroll Management
|
||||
|
||||
### Defining Your Bankroll
|
||||
|
||||
**Valid:** Money you can afford to lose entirely, separate from emergency fund, investment portfolio, daily expenses. **Starting:** $500-$5000 recreational, $10,000+ serious.
|
||||
|
||||
**NOT valid:** Money needed for bills, emergency fund, retirement, money you'd be devastated to lose.
|
||||
|
||||
### Separation Principle
|
||||
|
||||
**Why:** Prevents scared money and revenge betting. Clear accounting, tax clarity, risk containment.
|
||||
|
||||
**Implementation:** Separate betting account, never add money mid-downswing, withdraw profits periodically, stop if bankroll → $0.
|
||||
|
||||
### Growth vs Preservation
|
||||
|
||||
**Preservation (Default):** 1/4 to 1/2 Kelly, for most bettors and bankrolls <$5000
|
||||
**Growth (Advanced):** 1/2 to full Kelly, for large bankrolls and high variance tolerance (requires 2+ years track record)
|
||||
|
||||
### Dynamic Sizing
|
||||
|
||||
Bet size scales with bankroll. Example: $1000 bankroll at 5% = $50. After wins → $1500 → bet $75. After losses → $600 → bet $30.
|
||||
|
||||
**Recalculate:** Daily if >20% change, weekly (active), monthly (casual).
|
||||
|
||||
### Withdrawal Strategy
|
||||
|
||||
**Recommended:** When bankroll doubles, withdraw original amount, continue with profit (break-even if lose profit).
|
||||
**Conservative:** 50% profit monthly. **Aggressive:** Never withdraw (full compounding).
|
||||
|
||||
---
|
||||
|
||||
## 4. Market Efficiency
|
||||
|
||||
### Efficient Market Hypothesis
|
||||
|
||||
**Core claim:** Prices reflect all available information. **Reality:** Semi-strong efficient in liquid, mature markets.
|
||||
|
||||
**Market knows:** Published polls/news, historical base rates, expert commentary, obvious statistical patterns.
|
||||
|
||||
### Where Edges Exist
|
||||
|
||||
**1. Information Asymmetry:** Local knowledge, domain expertise
|
||||
**2. Model Superiority:** Better statistical model, proper extremizing
|
||||
**3. Lower Transaction Costs:** Market 5% fee vs your 0-1%
|
||||
**4. Behavioral Biases:** Recency bias, base rate neglect, narrative following
|
||||
**5. Market Immaturity:** Low liquidity, niche topics, few informed traders
|
||||
|
||||
**Before betting, ask:** "What information or model do I have that the market doesn't?"
|
||||
- Nothing → Pass | Vague → Pass | Specific → Investigate
|
||||
|
||||
### Trust vs Question Market
|
||||
|
||||
**Trust:** Liquid, mature, objective outcome, many informed participants, low emotion
|
||||
**Question:** Illiquid, new, subjective outcome, few informed participants, high emotion (politics, fandom)
|
||||
|
||||
---
|
||||
|
||||
## 5. Common Betting Mistakes
|
||||
|
||||
### Chasing Losses
|
||||
**What:** Increasing bet size after losses. **Why:** Loss aversion, emotional arousal.
|
||||
**Fix:** Never increase bet size after loss, use bankroll %, take break after 2+ losses.
|
||||
|
||||
### Tilt (Emotional Betting)
|
||||
**Triggers:** Bad beat, streaks, external stress. **Symptoms:** No analysis, ignoring Kelly, revenge betting.
|
||||
**Fix:** Pre-commit no bets when tilted. Checklist: Calm? Calculate EV? Kelly sizing? Betting for +EV not revenge?
|
||||
|
||||
### Overconfidence Bias
|
||||
**What:** Overestimating probability accuracy (90% when true is 70%).
|
||||
**Fix:** Track calibration, log predictions + outcomes, calculate curve quarterly. Do 70% predictions happen 70%?
|
||||
|
||||
### Ignoring Variance
|
||||
**What:** Judging strategy on <30 trials. Example: "Down 15% after 20 bets, strategy sucks" (normal variance).
|
||||
**Fix:** Require 50+ bets minimum, 100+ preferred, 200+ for high confidence.
|
||||
|
||||
### Outcome Bias
|
||||
**What:** Judging by results not process. +15% EV lost = good decision (bad outcome). -10% EV won = bad decision (lucky).
|
||||
**Fix:** Checklist: EV correct? Edge > threshold? Kelly fraction? Followed system? YES = good decision regardless of outcome.
|
||||
|
||||
### Hindsight Bias
|
||||
**What:** After outcome, "I knew it would happen."
|
||||
**Fix:** Pre-commit logging, write probability before event, don't revise after, accept 40% events happen 40%.
|
||||
|
||||
---
|
||||
|
||||
## 6. Integration with Kelly Criterion
|
||||
|
||||
### EV Drives Kelly
|
||||
|
||||
**Kelly derives from:** Expected value (edge), odds received, bankroll optimization (maximize log wealth).
|
||||
|
||||
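**Key relationship:** `f* = (bp - q) / b`, where `bp - q` is the expected return per dollar staked. Bet size scales with that edge; at even odds (b = 1) the two coincide: 10% edge → 10% Kelly, 5% edge → 5% Kelly, 0% edge → no bet.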
|
||||
|
||||
### Variance Tolerance
|
||||
|
||||
| Fraction | Variance (vs full Kelly) | Growth (vs full Kelly) | Typical drawdown |
|----------|----------|--------|----------|
| Full (1.0) | 100% | 100% | ~40% |
| Half (0.5) | 25% | 75% | ~20% |
| Quarter (0.25) | 6% | ~44% | ~10% |
|
||||
|
||||
### Bankruptcy Protection
|
||||
|
||||
Kelly never bets 100%: prevents ruin, keeps capital for next bet, scales down as bankroll shrinks. **Practical:** Stop if bankroll drops 80-90%.
|
||||
|
||||
---
|
||||
|
||||
## 7. Practical Examples for Forecasters
|
||||
|
||||
### Example 1: Election Prediction Market
|
||||
|
||||
**Scenario:** Market 55%, your forecast 65%, bankroll $2000
|
||||
|
||||
**Step 1: Edge**
|
||||
```
|
||||
Edge = 65% - 55% = +10%
|
||||
Threshold: 5%
|
||||
Decision: +10% > 5% → Proceed
|
||||
```
|
||||
|
||||
**Step 2: EV**
|
||||
```
|
||||
Bet $100 at 1.82 odds → Win $82
|
||||
EV = (0.65 × $82) - (0.35 × $100) = +$18.30 = +18.3%
|
||||
```
|
||||
|
||||
**Step 3: Kelly**
|
||||
```
|
||||
Full Kelly: 22.3%
|
||||
Half Kelly: 11.2%
|
||||
Bet: $2000 × 11.2% = $224
|
||||
```
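
The Kelly numbers in Step 3 follow from `f* = (bp - q) / b` with b = 0.82 and p = 0.65. A minimal sketch, assuming a hypothetical `kelly_fraction` helper that returns 0 when the edge is negative:

```python
def kelly_fraction(p_win: float, net_odds: float) -> float:
    """Full Kelly fraction f* = (b*p - q) / b, floored at 0 (no bet on negative edge)."""
    if net_odds <= 0:
        raise ValueError("net_odds must be positive")
    q = 1.0 - p_win
    return max(0.0, (net_odds * p_win - q) / net_odds)


bankroll = 2000.0
full_kelly = kelly_fraction(0.65, 0.82)  # about 0.223
half_kelly = 0.5 * full_kelly            # about 0.112
bet = bankroll * half_kelly              # about $223 before rounding
print(f"full={full_kelly:.1%} half={half_kelly:.1%} bet=${bet:.0f}")
```

The unrounded stake is about $223; the $224 above comes from rounding half Kelly to 11.2% first.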
|
||||
|
||||
### Example 2: Brier Score Tracking
|
||||
|
||||
**50 forecasts, goal: Brier < 0.15**
|
||||
|
||||
| Forecast | Your P | Outcome | (P-O)² |
|----------|--------|---------|--------|
| Event A | 80% | YES (1) | 0.04 |
| Event B | 30% | NO (0) | 0.09 |
| Event C | 90% | YES (1) | 0.01 |
| Event D | 60% | NO (0) | 0.36 |
| Event E | 70% | YES (1) | 0.09 |
|
||||
|
||||
**Brier:** 0.59 / 5 = 0.118 (Excellent)
|
||||
|
||||
**Analysis:** Event D large error normal (40% events happen). Don't game metric by avoiding 60% predictions.
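
A short sketch of the Brier calculation for the five forecasts in the table; `brier_score` is an illustrative name, and outcomes are coded 1 for YES and 0 for NO.

```python
def brier_score(pairs: list[tuple[float, int]]) -> float:
    """Mean squared difference between forecast probability and outcome (0 or 1)."""
    if not pairs:
        raise ValueError("need at least one forecast")
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)


forecasts = [
    (0.80, 1),  # Event A
    (0.30, 0),  # Event B
    (0.90, 1),  # Event C
    (0.60, 0),  # Event D
    (0.70, 1),  # Event E
]
print(f"Brier = {brier_score(forecasts):.3f}")
```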
|
||||
|
||||
### Example 3: Extremizing
|
||||
|
||||
**Forecasts:** You 72%, A 68%, B 75%, C 70%, Market 71%
|
||||
**Average:** 71.2%
|
||||
|
||||
**Extremize:**
|
||||
```
|
||||
Factor: 1.3 (moderate)
|
||||
Extremized = 50% + (71.2% - 50%) × 1.3 = 77.6% ≈ 78%
|
||||
|
||||
Edge: 78% - 71% = +7%
|
||||
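Half Kelly ≈ 3.5% of a $5000 bankroll = $175 bet
(uses the edge ≈ full-Kelly shortcut, which is conservative here:
the exact formula at a 71% price gives f* = (0.78 - 0.71) / (1 - 0.71) ≈ 24%)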
|
||||
```
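
The linear extremizing step used above can be sketched as follows. The function name and the clamp to [0, 1] are assumptions added for safety; other extremizing transforms exist (for example on the log-odds scale), and this sketch only mirrors the formula in this example.

```python
def extremize_linear(p: float, factor: float) -> float:
    """Push a probability away from 50% by `factor`, clamped to [0, 1]."""
    pushed = 0.5 + (p - 0.5) * factor
    return min(1.0, max(0.0, pushed))


forecasts = [0.72, 0.68, 0.75, 0.70, 0.71]  # you, A, B, C, market
average = sum(forecasts) / len(forecasts)    # 0.712
extremized = extremize_linear(average, 1.3)  # about 0.776
print(f"average={average:.3f} extremized={extremized:.3f}")
```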
|
||||
|
||||
### Example 4: Correlated Portfolio
|
||||
|
||||
**Scenario:** Democrats House (60% yours, 55% market) + Senate (55% yours, 50% market)
|
||||
**Correlation:** 0.7 (high)
|
||||
|
||||
**Naive (WRONG):**
|
||||
```
|
||||
Bet A: 5% × $10k = $500
|
||||
Bet B: 5% × $10k = $500
|
||||
Total: $1000 (10%)
|
||||
```
|
||||
|
||||
**Correct:**
|
||||
```
|
||||
Adjust for correlation: 1 - (0.7 × 0.5) = 0.65
|
||||
Bet A: $500 × 0.65 = $325
|
||||
Bet B: $500 × 0.65 = $325
|
||||
Total: $650 (6.5%)
|
||||
```
|
||||
|
||||
**Reasoning:** Positive correlation amplifies risk. Reduce sizing to maintain tolerance.
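
A minimal sketch of the sizing heuristic used in this example, which scales each stake by `1 - ρ/2`. It is a rule of thumb, not an exact portfolio Kelly solution, and `correlation_adjusted_bet` is a hypothetical helper name.

```python
def correlation_adjusted_bet(naive_bet: float, rho: float) -> float:
    """Scale a stake by (1 - rho/2), the heuristic correlation haircut."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must be between -1 and 1")
    return naive_bet * (1.0 - 0.5 * rho)


bankroll = 10_000.0
naive = 0.05 * bankroll  # $500 per bet before adjustment
bet_a = correlation_adjusted_bet(naive, 0.7)
bet_b = correlation_adjusted_bet(naive, 0.7)
total = bet_a + bet_b
print(f"each=${bet_a:.0f} total=${total:.0f} ({total / bankroll:.1%} of bankroll)")
```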
|
||||
|
||||
---
|
||||
|
||||
## Key Takeaways
|
||||
|
||||
### The 10 Commandments
|
||||
|
||||
1. **Expected Value is King** - Judge decisions by EV, not outcomes
|
||||
2. **Variance is Inevitable** - Embrace it; don't fight it
|
||||
3. **Bankroll is Sacred** - Protect it above all else
|
||||
4. **Kelly is Your Guide** - But use fractional (1/4 to 1/2)
|
||||
5. **Market is Usually Right** - You need edge to beat it
|
||||
6. **Discipline Over Impulse** - System beats emotion
|
||||
7. **Sample Size Matters** - 50+ bets before judgment
|
||||
8. **Calibration is Honesty** - Track it religiously
|
||||
9. **Correlations Kill** - Adjust for portfolio risk
|
||||
10. **Survival Enables Profit** - Can't win if bankrupt
|
||||
|
||||
### Mental Models
|
||||
|
||||
**Betting = Business**
|
||||
- Bankroll = Working capital
|
||||
- EV = Profit margin
|
||||
- Variance = Market volatility
|
||||
- Kelly = Capital allocation
|
||||
|
||||
**Decision Quality ≠ Outcome Quality**
|
||||
- Good decisions lose sometimes (variance)
|
||||
- Bad decisions win sometimes (luck)
|
||||
- Process > Results (small samples)
|
||||
- Results > Process (large samples 100+)
|
||||
|
||||
### Integration Workflow
|
||||
|
||||
**Before betting:**
|
||||
1. Make forecast (Bayesian, reference class)
|
||||
2. Calculate edge vs market
|
||||
3. Check edge > threshold (5%+)
|
||||
4. Use Kelly for sizing
|
||||
5. Execute and log
|
||||
|
||||
**After betting:**
|
||||
1. Track outcome
|
||||
2. Update calibration
|
||||
3. Calculate Brier score
|
||||
4. Don't judge single bet
|
||||
5. Evaluate after 50+ bets
|
||||
|
||||
---
|
||||
|
||||
**Return to:** [Main Skill](../SKILL.md#interactive-menu)
|
||||
494
skills/market-mechanics-betting/resources/kelly-criterion.md
Normal file
@@ -0,0 +1,494 @@
|
||||
# Kelly Criterion Deep Dive
|
||||
|
||||
Mathematical foundation for optimal bet sizing under uncertainty.
|
||||
|
||||
## Table of Contents
|
||||
|
||||
1. [Mathematical Derivation](#1-mathematical-derivation)
|
||||
2. [Formula Variations](#2-formula-variations)
|
||||
3. [Fractional Kelly](#3-fractional-kelly)
|
||||
4. [Extensions](#4-extensions)
|
||||
5. [Common Mistakes](#5-common-mistakes)
|
||||
6. [Practical Implementation](#6-practical-implementation)
|
||||
7. [Historical Examples](#7-historical-examples)
|
||||
8. [Comparison to Other Methods](#8-comparison-to-other-methods)
|
||||
|
||||
---
|
||||
|
||||
## 1. Mathematical Derivation
|
||||
|
||||
### The Core Question
|
||||
|
||||
**Problem**: What fraction of your bankroll maximizes long-term growth?
|
||||
|
||||
**Why it matters**: Bet too little → Leave money on the table. Bet too much → Risk ruin, high variance.
|
||||
|
||||
### Logarithmic Utility Framework
|
||||
|
||||
**Key insight**: Maximize expected logarithm of wealth, not expected wealth.
|
||||
|
||||
**Why log utility?**
|
||||
- Captures diminishing marginal utility ($1 matters more when you have $100 vs $1M)
|
||||
- Makes repeated multiplicative bets additive: log(AB) = log(A) + log(B)
|
||||
- Geometric mean emerges naturally (what matters for repeated bets)
|
||||
- Prevents betting 100% (avoids ruin)
|
||||
|
||||
### Derivation for Binary Bet
|
||||
|
||||
**Setup**:
|
||||
- Current bankroll: W
|
||||
- Bet fraction: f
|
||||
- Win probability: p, Loss probability: q = 1 - p
|
||||
- Net odds: b (bet $1, win $b net)
|
||||
|
||||
**Outcomes**:
|
||||
- Win (probability p): New wealth = W(1 + fb)
|
||||
- Lose (probability q): New wealth = W(1 - f)
|
||||
|
||||
**Expected log utility**:
|
||||
```
|
||||
E[log(W_new)] = p × log(1 + fb) + q × log(1 - f) + log(W)
|
||||
```
|
||||
|
||||
**Objective**: Maximize g(f) = p × log(1 + fb) + q × log(1 - f)
|
||||
|
||||
### Finding the Optimum
|
||||
|
||||
**Take derivative**:
|
||||
```
|
||||
dg/df = pb/(1 + fb) - q/(1 - f)
|
||||
```
|
||||
|
||||
**Set equal to zero and solve**:
|
||||
```
pb/(1 + fb) = q/(1 - f)
pb(1 - f) = q(1 + fb)
pb - pbf = q + qfb
pb - q = f(pb + qb) = fb(p + q) = fb

f* = (pb - q) / b = (bp - q) / b
```
|
||||
|
||||
**The Kelly Criterion**:
|
||||
```
f* = (bp - q) / b

Where:
f* = Optimal fraction to bet
b = Net odds received
p = Win probability
q = 1 - p
```
|
||||
|
||||
### Alternative Form
|
||||
|
||||
**Edge** = Expected return per dollar bet = bp - q
|
||||
|
||||
**Kelly formula**: f* = Edge / Odds = (bp - q) / b
|
||||
|
||||
**Example**: p = 60%, b = 1.0 (even money)
|
||||
- Edge = 0.6 × 1 - 0.4 = 0.2
|
||||
- f* = 0.2 / 1 = 20%
|
||||
|
||||
### Optimality
|
||||
|
||||
**Second derivative**: d²g/df² < 0 at f = f* → Maximum confirmed
|
||||
|
||||
**Growth rate**: G(f*) maximizes long-run geometric growth
|
||||
|
||||
**Comparison**:
|
||||
- f < f*: Lower growth (too conservative)
|
||||
- f > f*: Lower growth (too aggressive, variance dominates)
|
||||
- f > 2f*: Negative growth (eventual ruin)
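
The comparison above can be checked numerically by evaluating g(f) = p × log(1 + fb) + q × log(1 - f) at a few bet fractions. This is a sketch using the p = 60%, b = 1 example; `growth_rate` is an illustrative name.

```python
import math


def growth_rate(f: float, p: float, b: float) -> float:
    """Expected log growth per bet when staking fraction f of the bankroll."""
    return p * math.log(1.0 + f * b) + (1.0 - p) * math.log(1.0 - f)


p, b = 0.60, 1.0
f_star = (b * p - (1.0 - p)) / b  # 0.20

for label, f in [("half Kelly", 0.5 * f_star), ("full Kelly", f_star),
                 ("1.5x Kelly", 1.5 * f_star), ("2.25x Kelly", 2.25 * f_star)]:
    print(f"{label:>12}: f={f:.3f} g={growth_rate(f, p, b):+.4f}")

# A coarse grid search should land on f* as the maximizer.
grid = [i / 1000 for i in range(1, 999)]
best_f = max(grid, key=lambda f: growth_rate(f, p, b))
print(f"grid maximum at f={best_f:.3f}")
```

Growth falls off on both sides of f*, and it turns negative once the stake is a little beyond twice the Kelly fraction.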
|
||||
|
||||
---
|
||||
|
||||
## 2. Formula Variations
|
||||
|
||||
### Converting Market Odds
|
||||
|
||||
**Decimal odds** (e.g., 2.50): b = Decimal - 1 = 1.50
|
||||
|
||||
**American odds**:
|
||||
- Positive (+150): b = 150/100 = 1.50
|
||||
- Negative (-150): b = 100/150 = 0.667
|
||||
|
||||
**Fractional odds** (3/1): b = 3.0
|
||||
|
||||
**Implied probability**: Market p = 1/(b + 1)
|
||||
|
||||
### Multi-Outcome Bet
|
||||
|
||||
**Horse race**: Multiple options, bet on any with positive Kelly
|
||||
|
||||
**Formula for outcome i**:
|
||||
```
|
||||
f_i* = (p_i(b_i + 1) - 1) / b_i
|
||||
|
||||
If f_i* > 0: Bet f_i* on outcome i
|
||||
If f_i* ≤ 0: Don't bet
|
||||
```
|
||||
|
||||
### Continuous Outcomes (Merton's Formula)
|
||||
|
||||
**Stock market application**:
|
||||
```
|
||||
f* = μ / σ²
|
||||
|
||||
Where:
|
||||
μ = Expected excess return over the risk-free rate (drift)
|
||||
σ² = Variance of returns
|
||||
```
|
||||
|
||||
**Example**: μ = 8%, σ = 20% → f* = 0.08/0.04 = 2.0 (200%, use leverage)
|
||||
|
||||
**Reality**: Too aggressive, use fractional Kelly → 50-100% more reasonable
|
||||
|
||||
---
|
||||
|
||||
## 3. Fractional Kelly
|
||||
|
||||
### Why Fractional Kelly?
|
||||
|
||||
**Problems with full Kelly**:
|
||||
1. **Extreme volatility**: Wild swings, can lose 50%+ in bad runs
|
||||
2. **Model error**: If probability estimate wrong, full Kelly overbets dramatically
|
||||
3. **Practical ruin**: At full Kelly there is roughly a 1-in-3 chance of halving the bankroll before doubling it (continuous-time approximation)
|
||||
4. **Non-ergodic**: Most can't bet infinitely many times
|
||||
|
||||
### Formula
|
||||
|
||||
```
|
||||
Fractional Kelly = f* × Fraction
|
||||
|
||||
Common choices:
|
||||
- Half Kelly: f*/2
|
||||
- Quarter Kelly: f*/4 (recommended)
|
||||
- Third Kelly: f*/3
|
||||
```
|
||||
|
||||
### Growth vs. Variance Trade-off
|
||||
|
||||
| Strategy | Growth Rate (vs full Kelly) | Volatility (vs full Kelly) | Max Drawdown |
|----------|-------------|------------|--------------|
| Full Kelly | 100% | 100% | -50% |
| Half Kelly | ~75% | 50% | -25% |
| Quarter Kelly | ~44% | 25% | -12% |
|
||||
|
||||
**Key**: Half Kelly gives 75% of growth with 25% of variance → Better risk-adjusted return
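
The growth and variance figures can be approximated with a standard continuous-time result: betting a fraction c of full Kelly keeps about c × (2 - c) of the maximum growth rate and c² of the variance. The sketch below assumes that approximation (small edges, many bets); the function names are illustrative.

```python
def relative_growth(c: float) -> float:
    """Growth rate at c times full Kelly, as a share of the full-Kelly growth rate."""
    return c * (2.0 - c)


def relative_variance(c: float) -> float:
    """Variance of returns at c times full Kelly, as a share of full-Kelly variance."""
    return c ** 2


for name, c in [("Full", 1.0), ("Half", 0.5), ("Quarter", 0.25)]:
    print(f"{name:>7} Kelly: growth {relative_growth(c):.0%}, "
          f"variance {relative_variance(c):.0%}, volatility {c:.0%}")
```

Under this approximation half Kelly keeps 75% of the growth and quarter Kelly about 44%, while variance drops to 25% and about 6% respectively.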
|
||||
|
||||
### Robustness to Error
|
||||
|
||||
**Example**: You think p = 0.60, true p = 0.55, even money bet
|
||||
|
||||
**Full Kelly** (f = 20%):
|
||||
- Growth rate = 0.55×log(1.20) + 0.45×log(0.80) ≈ 0 (breakeven!)
|
||||
|
||||
**Half Kelly** (f = 10%):
|
||||
- Growth rate = 0.55×log(1.10) + 0.45×log(0.90) ≈ 0.005 (still positive)
|
||||
|
||||
**Lesson**: Overbetting much worse than underbetting. Fractional Kelly provides buffer.
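
The two growth rates above can be reproduced directly (natural log, even-money bet, true p = 0.55). This is a sketch with an illustrative `growth_rate` helper.

```python
import math


def growth_rate(f: float, p_true: float, b: float = 1.0) -> float:
    """Expected log growth per bet at stake fraction f, given the true win probability."""
    return p_true * math.log(1.0 + f * b) + (1.0 - p_true) * math.log(1.0 - f)


p_true = 0.55
full = growth_rate(0.20, p_true)  # sized from the mistaken p = 0.60
half = growth_rate(0.10, p_true)  # half of that mistaken Kelly
print(f"full Kelly on wrong p: {full:+.5f}")
print(f"half Kelly on wrong p: {half:+.5f}")
```

With a true p of 0.55 the correct full Kelly is 10%, so the "half Kelly" stake computed from the overconfident estimate happens to be the true optimum, while the "full Kelly" stake is twice the optimum and earns essentially nothing.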
|
||||
|
||||
### Recommended Fractions
|
||||
|
||||
| Situation | Fraction | Reasoning |
|-----------|----------|-----------|
| Professional gambler | 1/4 to 1/3 | Reduces career risk |
| High model uncertainty | 1/4 or less | Error buffer crucial |
| High confidence | 1/2 to 2/3 | Can use more aggression |
| Institutional | 1/4 to 1/3 | Drawdown = career risk |
|
||||
|
||||
**Default**: Quarter Kelly (1/4) for most real-world situations
|
||||
|
||||
---
|
||||
|
||||
## 4. Extensions
|
||||
|
||||
### Multiple Simultaneous Bets
|
||||
|
||||
**Matrix form** (N assets):
|
||||
```
|
||||
f* = Σ⁻¹ × μ
|
||||
|
||||
Where:
|
||||
Σ = Covariance matrix
|
||||
μ = Expected returns vector
|
||||
```
|
||||
|
||||
**Key insight**: Correlated bets reduce optimal sizing
|
||||
|
||||
**Heuristic**: Adjusted Kelly = Individual Kelly × (1 - ρ/2), where ρ = correlation
|
||||
|
||||
**Example**: ρ = 0.6, Individual Kelly = 15%
|
||||
- Adjusted: 15% × (1 - 0.3) = 10.5%
|
||||
|
||||
### Correlated Outcomes
|
||||
|
||||
**Common correlations**:
|
||||
- Political: Presidential + Senate races
|
||||
- Sports: Team championship + Player MVP
|
||||
- Markets: Tech stock A + Tech stock B
|
||||
|
||||
**Extreme cases**:
|
||||
- ρ = 1 (perfect correlation): Only bet on one
|
||||
- ρ = -1 (negative correlation): Bets hedge, can bet more
|
||||
- ρ = 0 (independent): No adjustment
|
||||
|
||||
### Dynamic Kelly
|
||||
|
||||
**Problem**: Probability changes over time (new information)
|
||||
|
||||
**Process**:
|
||||
1. Start with p₀, bet f₀*
|
||||
2. New information → Update to p₁ (Bayesian)
|
||||
3. Recalculate f₁*
|
||||
4. Rebalance (adjust bet size)
|
||||
|
||||
**Consideration**: Transaction costs limit rebalancing frequency
|
||||
|
||||
---
|
||||
|
||||
## 5. Common Mistakes
|
||||
|
||||
### Mistake 1: Full Kelly Overbet
|
||||
|
||||
**The error**: Using full Kelly in practice
|
||||
|
||||
**Why wrong**: Assumes perfect probability estimate (never true)
|
||||
|
||||
**Impact**: Bet 2×f* → Negative growth rate
|
||||
|
||||
**Fix**: Always use fractional Kelly (1/4 to 1/2)
|
||||
|
||||
### Mistake 2: Ignoring Model Error
|
||||
|
||||
**The error**: Treating probability as certain
|
||||
|
||||
**Adjustment**:
|
||||
```
|
||||
Uncertain Kelly = f* × Confidence
|
||||
|
||||
Example: f* = 20%, 80% confident → Bet 16%
|
||||
```
|
||||
|
||||
**Better**: Use fractional Kelly (implicitly adjusts)
|
||||
|
||||
### Mistake 3: Neglecting Bankruptcy
|
||||
|
||||
**Reality**: Finite games + estimation error → real ruin risk
|
||||
|
||||
**Drawdown stats** (full Kelly, p=0.55):
|
||||
- 25% chance of -40% before recovery
|
||||
- 10% chance of -50% before recovery
|
||||
|
||||
**Practical bankruptcy**: Client fires you, forced liquidation, can't maintain discipline
|
||||
|
||||
**Fix**: Use fractional Kelly, set stop-loss (if down 25%, pause)
|
||||
|
||||
### Mistake 4: Ignoring Correlation
|
||||
|
||||
**Example disaster**:
|
||||
- 10 bets, each Kelly 10%
|
||||
- All highly correlated (same theme)
|
||||
- Bet 100% total → Single adverse event → Large loss
|
||||
|
||||
**Fix**: Measure correlations, use portfolio Kelly, diversify themes
|
||||
|
||||
### Mistake 5: Misestimating Odds
|
||||
|
||||
**Common confusion**:
|
||||
- Decimal 2.0: b = 1.0 (not 2.0)
|
||||
- "3-to-1": b = 3.0 ✓
|
||||
- American +200: b = 2.0 (not 200)
|
||||
|
||||
**Fix**: Always convert to NET payout (b = total return - 1)
|
||||
|
||||
### Mistake 6: Static Bankroll
|
||||
|
||||
**Problem**: Calculate once, never update
|
||||
|
||||
**Fix**: Recalculate before each bet using current bankroll
|
||||
|
||||
---
|
||||
|
||||
## 6. Practical Implementation
|
||||
|
||||
### Step-by-Step Process
|
||||
|
||||
**1. Convert odds to net odds (b)**:
```python
# Decimal odds: b = decimal - 1
# American +150: b = 150/100 = 1.50
# American -150: b = 100/150 = 0.667
# Fractional 3/1: b = 3.0
```
|
||||
|
||||
**2. Determine probability**: Use forecasting process (base rates, Bayesian updating, etc.)
|
||||
|
||||
**3. Calculate edge**:
|
||||
```python
|
||||
edge = net_odds * probability - (1 - probability)
|
||||
```
|
||||
|
||||
**4. Calculate Kelly**:
|
||||
```python
|
||||
kelly_fraction = edge / net_odds
|
||||
```
|
||||
|
||||
**5. Apply fractional Kelly**:
|
||||
```python
|
||||
fraction = 0.25 # Quarter Kelly recommended
|
||||
adjusted_kelly = kelly_fraction * fraction
|
||||
```
|
||||
|
||||
**6. Calculate bet size**:
|
||||
```python
|
||||
bet_size = current_bankroll * adjusted_kelly
|
||||
```
|
||||
|
||||
**7. Execute and track**:
|
||||
- Record: Date, event, probability, odds, edge, Kelly%, bet
|
||||
- Set reminder for resolution
|
||||
- Note new information
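
Steps 1 to 6 can be combined into one function. This is a sketch under stated assumptions: `BetPlan` and `plan_bet` are hypothetical names, input is decimal odds, fees are ignored, and a negative edge yields a zero stake.

```python
from dataclasses import dataclass


@dataclass
class BetPlan:
    net_odds: float    # b: profit per $1 staked
    edge: float        # b*p - q: expected return per $1 staked
    full_kelly: float  # f*: full Kelly fraction of bankroll
    stake: float       # dollars to bet after applying the Kelly fraction


def plan_bet(probability: float, decimal_odds: float,
             bankroll: float, fraction: float = 0.25) -> BetPlan:
    """Size a bet from your probability, the market's decimal odds, and current bankroll."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1")
    net_odds = decimal_odds - 1.0
    edge = net_odds * probability - (1.0 - probability)
    full_kelly = max(0.0, edge / net_odds)  # never bet on a negative edge
    stake = bankroll * full_kelly * fraction
    return BetPlan(net_odds, edge, full_kelly, stake)


plan = plan_bet(probability=0.65, decimal_odds=2.20, bankroll=10_000.0)
print(f"b={plan.net_odds:.2f} edge={plan.edge:.2f} "
      f"full Kelly={plan.full_kelly:.1%} stake=${plan.stake:.0f}")
```

Passing the current bankroll on every call, rather than storing it, keeps the stake in line with step 6: sizing always uses the bankroll as it stands before the bet.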
|
||||
|
||||
### Position Tracking Template
|
||||
|
||||
```
Date: 2024-01-15
Event: Candidate A wins
Your probability: 65%
Market odds: 2.20 (implied 45%)
Net odds (b): 1.20
Edge: 0.43 (43%)
Full Kelly: 35.8%
Fractional (1/4): 9.0%
Bankroll: $10,000
Bet size: $896
Resolution: 2024-11-05
```
|
||||
|
||||
---
|
||||
|
||||
## 7. Historical Examples
|
||||
|
||||
### Ed Thorp - Blackjack (1960s)
|
||||
|
||||
**Application**: Card counting edge varies with count → Dynamic Kelly
|
||||
|
||||
**Implementation**:
|
||||
- True count +1: Edge ~0.5%, bet ~0.5% of bankroll
|
||||
- True count +5: Edge ~2.5%, bet ~2.5% of bankroll
|
||||
|
||||
**Results**: Turned $10k into $100k+, proved Kelly works in practice
|
||||
|
||||
**Lessons**: Used fractional Kelly (~1/2), dynamic sizing, managed "heat" (detection risk)
|
||||
|
||||
### Princeton-Newport Partners (1970s-1980s)
|
||||
|
||||
**Strategy**: Statistical arbitrage, convertible bonds
|
||||
|
||||
**Kelly application**: 1-3% per position, 50-100 positions (diversification)
|
||||
|
||||
**Results**: 19.1% annual (1969-1988), only 4 down months in 19 years, <5% max drawdown
|
||||
|
||||
**Lessons**: Fractional Kelly + diversification = low volatility, dominant strategy
|
||||
|
||||
### Renaissance Technologies / Medallion Fund
|
||||
|
||||
**Strategy**: Thousands of small edges, high frequency
|
||||
|
||||
**Kelly application**:
|
||||
- Each signal: 0.1-0.5% (tiny fractional Kelly)
|
||||
- Portfolio: 10,000+ positions
|
||||
- Leverage: 2-4× (portfolio Kelly supports with diversification)
|
||||
|
||||
**Results**: 66% annual (gross) over 30+ years, never down year
|
||||
|
||||
**Lessons**: Kelly optimal for repeated bets with edge. Diversification enables leverage. Discipline crucial.
|
||||
|
||||
### Warren Buffett (Implicit Kelly)
|
||||
|
||||
**Concentrated bets**: American Express (40%), Coca-Cola (25%), Apple (40%)
|
||||
|
||||
**Why Kelly-like**: High conviction → High p → Large Kelly → Large position
|
||||
|
||||
**Quote**: "Diversification is protection against ignorance."
|
||||
|
||||
**Lessons**: Kelly justifies concentration with edge. Still uses fractional (~40% max, not 100%).
|
||||
|
||||
---
|
||||
|
||||
## 8. Comparison to Other Methods
|
||||
|
||||
### Fixed Fraction
|
||||
|
||||
**Method**: Always bet same percentage
|
||||
|
||||
**Pros**: Simple, prevents ruin
|
||||
|
||||
**Cons**: Ignores edge, suboptimal growth
|
||||
|
||||
**When to use**: Don't trust probability estimates, want simplicity
|
||||
|
||||
### Martingale (Double After Loss)
|
||||
|
||||
**Method**: Double bet after each loss
|
||||
|
||||
**Fatal flaws**:
|
||||
- Requires infinite bankroll
|
||||
- Exponential growth (starting from a $10 bet, 10 straight losses → the next bet is $10,240)
|
||||
- Negative edge → lose faster
|
||||
- Betting limits prevent recovery
|
||||
|
||||
**Conclusion**: **NEVER use**. Mathematically certain to fail.
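
A short sketch of how quickly Martingale stakes grow. The $10 base bet is an assumption chosen to match the $10,240 figure above, and `martingale_bets` is an illustrative name.

```python
def martingale_bets(base_bet: int, losses: int) -> list[int]:
    """Stakes placed during a losing streak, plus the next stake required after it."""
    return [base_bet * 2 ** i for i in range(losses + 1)]


bets = martingale_bets(base_bet=10, losses=10)
lost_so_far = sum(bets[:-1])  # total of the 10 losing stakes
next_bet = bets[-1]           # stake needed to continue after 10 losses
print(f"lost so far: ${lost_so_far:,}; next bet: ${next_bet:,}")
```

Even if that next bet wins, the net gain for the whole sequence is only the original $10 base bet.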
|
||||
|
||||
### Fixed Amount
|
||||
|
||||
**Method**: Always bet same dollar amount
|
||||
|
||||
**Cons**: As bankroll changes, fraction changes inappropriately
|
||||
|
||||
**When to use**: Very small recreational betting
|
||||
|
||||
### Constant Proportion
|
||||
|
||||
**Method**: Fixed percentage, not optimized for edge
|
||||
|
||||
**Difference from Kelly**: Doesn't adjust for edge/odds
|
||||
|
||||
**Conclusion**: Better than fixed dollar, worse than Kelly
|
||||
|
||||
### Risk Parity
|
||||
|
||||
**Method**: Allocate to equalize risk contribution
|
||||
|
||||
**Difference from Kelly**: Doesn't use expected returns (ignores edge)
|
||||
|
||||
**When better**: Don't have reliable return estimates, defensive portfolio
|
||||
|
||||
**When Kelly better**: Have edge estimates, goal is growth
|
||||
|
||||
### Summary Comparison
|
||||
|
||||
| Method | Growth | Ruin Risk | When to Use |
|--------|--------|-----------|-------------|
| **Kelly** | Highest | None* | Active betting with edge |
| **Fractional Kelly** | High | Very low | **Real-world (recommended)** |
| **Fixed Fraction** | Medium | Low | Simple discipline |
| **Fixed Amount** | Low | Medium | Recreational only |
| **Martingale** | Negative | Certain | **NEVER** |
| **Risk Parity** | Low-Med | Low | Defensive portfolios |

*In theory Kelly carries no ruin risk, but model error creates practical risk → use fractional Kelly
|
||||
|
||||
**Final Recommendation**: **Quarter Kelly (f*/4)** for nearly all real-world scenarios.
|
||||
|
||||
---
|
||||
|
||||
## Return to Main Skill
|
||||
|
||||
[← Back to Market Mechanics & Betting](../skill.md)
|
||||
|
||||
**Related resources**:
|
||||
- [Betting Theory Fundamentals](betting-theory.md)
|
||||
- [Scoring Rules and Calibration](scoring-rules.md)
|
||||
494
skills/market-mechanics-betting/resources/scoring-rules.md
Normal file
@@ -0,0 +1,494 @@
|
||||
# Scoring Rules and Calibration
|
||||
|
||||
Comprehensive guide to proper scoring rules, calibration measurement, and forecast accuracy improvement.
|
||||
|
||||
## Table of Contents
|
||||
|
||||
1. [Proper Scoring Rules](#1-proper-scoring-rules)
|
||||
2. [Brier Score Deep Dive](#2-brier-score-deep-dive)
|
||||
3. [Log Score](#3-log-score-logarithmic-scoring-rule)
|
||||
4. [Calibration Curves](#4-calibration-curves)
|
||||
5. [Resolution Analysis](#5-resolution-analysis)
|
||||
6. [Sharpness](#6-sharpness)
|
||||
7. [Practical Calibration Training](#7-practical-calibration-training)
|
||||
8. [Comparison Table](#8-comparison-table-of-scoring-rules)
|
||||
|
||||
---
|
||||
|
||||
## 1. Proper Scoring Rules
|
||||
|
||||
### What is a Scoring Rule?
|
||||
|
||||
A **scoring rule** assigns a numerical score to a probabilistic forecast based on the forecast and actual outcome.
|
||||
|
||||
**Purpose:** Measure accuracy, incentivize honesty, enable comparison, track calibration over time.
|
||||
|
||||
### Strictly Proper vs Quasi-Proper
|
||||
|
||||
**Strictly Proper:** Reporting your true belief uniquely maximizes your expected score. No other probability gives better expected score.
|
||||
|
||||
**Why it matters:** Incentivizes honesty, eliminates gaming, optimizes for accurate beliefs.
|
||||
|
||||
**Quasi-Proper (more commonly just called "proper"):** True belief maximizes expected score, but other probabilities might tie. Less desirable for forecasting.
|
||||
|
||||
### Common Proper Scoring Rules
|
||||
|
||||
**1. Brier Score** (strictly proper)
|
||||
```
|
||||
Score = -(p - o)²   (reward form: higher is better)
|
||||
p = Your probability (0 to 1)
|
||||
o = Outcome (0 or 1)
|
||||
```
|
||||
|
||||
**2. Logarithmic Score** (strictly proper)
|
||||
```
|
||||
Score = log(p) if outcome occurs
|
||||
Score = log(1-p) if outcome doesn't occur
|
||||
```
|
||||
|
||||
**3. Spherical Score** (strictly proper)
|
||||
```
|
||||
Score = p / √(p² + (1-p)²)      if outcome occurs
Score = (1-p) / √(p² + (1-p)²)  if outcome doesn't occur
|
||||
```
|
||||
|
||||
### Common IMPROPER Scoring Rules (Avoid)
|
||||
|
||||
**Absolute Error:** `Score = -|p - o|` → Incentivizes extremes (NOT proper)
|
||||
|
||||
**Threshold Accuracy:** Binary right/wrong → Any report on the same side of 50% earns the same score, so calibration is ignored (NOT strictly proper)
|
||||
|
||||
**Example of gaming improper rules:**
|
||||
```
|
||||
Using absolute error (improper):
|
||||
True belief: 60% → Optimal report: 100% (dishonest)
|
||||
|
||||
Using Brier score (proper):
|
||||
True belief: 60% → Optimal report: 60% (honest)
|
||||
```
|
||||
|
||||
**Key Principle:** Only use strictly proper scoring rules for forecast evaluation.
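
The gaming example above can be checked numerically. The following is a minimal sketch, assuming a simple grid search over reports; the helper names (`expected_score`, `best_report`) are illustrative and not part of any library. Both rules are written so that higher is better.

```python
def absolute_error(p, o):
    """Improper rule: negated absolute error."""
    return -abs(p - o)


def brier(p, o):
    """Strictly proper rule: negated squared error."""
    return -((p - o) ** 2)


def expected_score(score, report, belief):
    """Expected score of `report` if the event occurs with probability `belief`."""
    return belief * score(report, 1) + (1 - belief) * score(report, 0)


def best_report(score, belief, steps=100):
    """Grid-search the report in [0, 1] that maximizes expected score."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda r: expected_score(score, r, belief))


belief = 0.60
print(best_report(absolute_error, belief))  # 1.0 (pushed to the extreme)
print(best_report(brier, belief))           # 0.6 (honest report)
```

Under absolute error the expected score is linear in the report, so the optimum always sits at 0 or 1; under Brier it is a parabola that peaks at the true belief.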
|
||||
|
||||
---
|
||||
|
||||
## 2. Brier Score Deep Dive
|
||||
|
||||
### Formula
|
||||
|
||||
**Single forecast:** `Brier = (p - o)²`
|
||||
|
||||
**Multiple forecasts:** `Brier = (1/N) × Σ(pi - oi)²`
|
||||
|
||||
**Range:** 0.00 (perfect) to 1.00 (worst). Lower is better.
|
||||
|
||||
### Calculation Examples
|
||||
|
||||
```
|
||||
90% Yes → (0.90-1)² = 0.01 (good) | 90% No → (0.90-0)² = 0.81 (bad)
|
||||
60% Yes → (0.60-1)² = 0.16 (medium) | 50% Any → 0.25 (baseline)
|
||||
```
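
As a rough sketch of the multiple-forecast formula, the mean Brier score over the four example forecasts can be computed like this (the function name `brier_score` is an assumption for illustration):

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between probabilities and 0/1 outcomes."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equally sized, non-empty lists")
    total = sum((p - o) ** 2 for p, o in zip(forecasts, outcomes))
    return total / len(forecasts)


# 90% Yes, 90% No, 60% Yes, 50% No
forecasts = [0.90, 0.90, 0.60, 0.50]
outcomes = [1, 0, 1, 0]
print(round(brier_score(forecasts, outcomes), 4))  # 0.3075
```

The single confident miss (90% No, 0.81) dominates the average, which is why one overconfident error can outweigh several good forecasts.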
|
||||
|
||||
### Brier Score Decomposition
|
||||
|
||||
**Murphy Decomposition:**
|
||||
```
|
||||
Brier Score = Reliability - Resolution + Uncertainty
|
||||
```
|
||||
|
||||
**Reliability (Calibration Error):** Are your probabilities correct on average? (Lower is better)
|
||||
|
||||
**Resolution:** Do you assign different probabilities to different outcomes? (Higher is better)
|
||||
|
||||
**Uncertainty:** Base rate variance (uncontrollable, depends on problem)
|
||||
|
||||
**Improving Brier:**
|
||||
1. Minimize reliability (fix calibration)
|
||||
2. Maximize resolution (differentiate forecasts)
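
The decomposition can be sketched in code. This version assumes forecasts are grouped by identical probability value (in which case the identity holds exactly); with wider bins it is only approximate. The function name and sample data are illustrative.

```python
from collections import defaultdict


def murphy_decomposition(forecasts, outcomes):
    """Return (reliability, resolution, uncertainty) for binary outcomes."""
    n = len(forecasts)
    base_rate = sum(outcomes) / n
    groups = defaultdict(list)  # forecast value -> observed outcomes
    for p, o in zip(forecasts, outcomes):
        groups[p].append(o)
    reliability = sum(
        len(obs) * (p - sum(obs) / len(obs)) ** 2 for p, obs in groups.items()
    ) / n
    resolution = sum(
        len(obs) * (sum(obs) / len(obs) - base_rate) ** 2 for obs in groups.values()
    ) / n
    uncertainty = base_rate * (1 - base_rate)
    return reliability, resolution, uncertainty


forecasts = [0.8] * 5 + [0.3] * 5
outcomes = [1, 1, 1, 0, 1] + [0, 1, 0, 0, 0]
rel, res, unc = murphy_decomposition(forecasts, outcomes)
print(round(rel, 3), round(res, 3), round(unc, 3))  # 0.005 0.09 0.25
print(round(rel - res + unc, 3))                    # 0.165, the mean Brier score
```

In this sample the 80% group is perfectly calibrated and the 30% group is slightly off (observed 20%), so almost all of the improvement over the 0.25 uncertainty term comes from resolution.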
|
||||
|
||||
### Brier Score Interpretation
|
||||
|
||||
| Brier Score | Quality | Description |
|
||||
|-------------|---------|-------------|
|
||||
| 0.00 - 0.05 | Exceptional | Near-perfect |
|
||||
| 0.05 - 0.10 | Excellent | Top tier |
|
||||
| 0.10 - 0.15 | Good | Skilled |
|
||||
| 0.15 - 0.20 | Average | Better than random |
|
||||
| 0.20 - 0.25 | Below Average | Approaching random |
|
||||
| 0.25+ | Poor | At or worse than random |
|
||||
|
||||
**Context matters:** Easy questions expect lower scores. Compare to baseline (0.25) and other forecasters.
|
||||
|
||||
### Improving Your Brier Score
|
||||
|
||||
**Path 1: Fix Calibration**
|
||||
|
||||
**If overconfident:** Events you call 80% happen only 60% of the time → Be less extreme, widen intervals
|
||||
|
||||
**If underconfident:** Events you call 60% happen 80% of the time → Be more extreme when you have evidence
|
||||
|
||||
**Path 2: Improve Resolution**
|
||||
|
||||
**Problem:** All forecasts near 50% → Differentiate easy vs hard questions, research more, be bold when warranted
|
||||
|
||||
**Balance:** `Good Forecaster = Well-Calibrated + High Resolution`
|
||||
|
||||
### Brier Skill Score
|
||||
|
||||
```
|
||||
BSS = 1 - (Your Brier / Baseline Brier)
|
||||
|
||||
Example:
|
||||
Your Brier: 0.12, Baseline: 0.25
|
||||
BSS = 1 - 0.48 = 0.52 (52% improvement over baseline)
|
||||
```
|
||||
|
||||
**Interpretation:** BSS = 1.00 (perfect), 0.00 (same as baseline), <0 (worse than baseline)
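
A minimal sketch of the skill score, assuming the baseline is the always-50% forecaster (Brier 0.25); the function name is illustrative:

```python
def brier_skill_score(brier, baseline_brier=0.25):
    """Fractional improvement over a baseline Brier score."""
    if baseline_brier <= 0:
        raise ValueError("baseline Brier must be positive")
    return 1 - brier / baseline_brier


print(round(brier_skill_score(0.12), 2))  # 0.52
print(round(brier_skill_score(0.30), 2))  # -0.2 (worse than baseline)
```

When a better baseline is available, such as the historical base rate or a market price, passing its Brier score as `baseline_brier` gives a stricter test of skill.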
|
||||
|
||||
---
|
||||
|
||||
## 3. Log Score (Logarithmic Scoring Rule)
|
||||
|
||||
### Formula
|
||||
|
||||
```
|
||||
Log Score = log₂(p) if outcome occurs
|
||||
Log Score = log₂(1-p) if outcome doesn't occur
|
||||
|
||||
Range: -∞ (worst) to 0 (perfect)
|
||||
Higher (less negative) is better
|
||||
```
|
||||
|
||||
### Calculation Examples
|
||||
|
||||
```
|
||||
90% Yes → -0.15 | 90% No → -3.32 (severe) | 50% Yes → -1.00
|
||||
99% No → -6.64 (catastrophic penalty for overconfidence)
|
||||
```
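
The examples above can be reproduced with a short sketch. Clipping probabilities away from 0 and 1 is an assumption added here to avoid an infinite penalty; the function name and `eps` value are illustrative.

```python
import math


def log_score(p, outcome, eps=1e-12):
    """Base-2 log score: log2 of the probability assigned to what happened."""
    p = min(max(p, eps), 1 - eps)  # assumption: clip to keep the score finite
    return math.log2(p if outcome else 1 - p)


print(round(log_score(0.90, 1), 2))  # -0.15
print(round(log_score(0.90, 0), 2))  # -3.32
print(round(log_score(0.50, 1), 2))  # -1.0
print(round(log_score(0.99, 0), 2))  # -6.64
```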
|
||||
|
||||
### Relationship to Information Theory
|
||||
|
||||
**Log score measures bits of surprise:**
|
||||
```
|
||||
Surprise = -log₂(p)
|
||||
|
||||
p = 50% → 1 bit surprise
|
||||
p = 25% → 2 bits surprise
|
||||
p = 12.5% → 3 bits surprise
|
||||
```
|
||||
|
||||
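**Connection to entropy:** The negative log score equals the cross-entropy between the observed outcome (treated as a 0/1 distribution) and the forecast distribution, so maximizing log score is the same as minimizing cross-entropy.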
|
||||
|
||||
### When to Use Log Score vs Brier
|
||||
|
||||
**Use Log Score when:**
|
||||
- Severe penalty for overconfidence desired
|
||||
- Tail risk matters (rare events important)
|
||||
- Information-theoretic interpretation useful
|
||||
- Comparing probabilistic models
|
||||
|
||||
**Use Brier Score when:**
|
||||
- Human forecasters (less punishing)
|
||||
- Easier interpretation (squared error)
|
||||
- Standard benchmark (more common)
|
||||
- Avoiding extreme penalties
|
||||
|
||||
**Key Difference:**
|
||||
|
||||
Brier: Quadratic penalty (grows with square)
|
||||
```
|
||||
Error: 10% → 0.01, 20% → 0.04, 30% → 0.09, 40% → 0.16
|
||||
```
|
||||
|
||||
Log: Logarithmic penalty (grows faster for extremes)
|
||||
```
|
||||
Forecast: 90% wrong → -3.3, 95% wrong → -4.3, 99% wrong → -6.6
|
||||
```
|
||||
|
||||
**Recommendation:** Default to Brier. Add Log for high-stakes or to penalize overconfidence. Track both for complete picture.
|
||||
|
||||
---
|
||||
|
||||
## 4. Calibration Curves
|
||||
|
||||
### What is a Calibration Curve?
|
||||
|
||||
**Visualization of forecast accuracy:**
|
||||
```
|
||||
Y-axis: Actual frequency (how often outcome occurred)
|
||||
X-axis: Stated probability (your forecasts)
|
||||
Perfect calibration: Diagonal line (y = x)
|
||||
```
|
||||
|
||||
**Example:**
|
||||
```
|
||||
Actual %
100 ┤                   ╱
 80 ┤               ●
 60 ┤           ●
 40 ┤       ●          ← Perfect calibration line
 20 ┤   ●
  0 └───────────────────────
    0   20  40  60  80  100
       Stated probability %
|
||||
```
|
||||
|
||||
### How to Create
|
||||
|
||||
**Step 1:** Collect 50+ forecasts and outcomes
|
||||
|
||||
**Step 2:** Bin by probability (0-10%, 10-20%, ..., 90-100%)
|
||||
|
||||
**Step 3:** For each bin, calculate actual frequency
|
||||
```
|
||||
Example: 60-70% bin
|
||||
Forecasts: 15 total, Outcomes: 9 Yes, 6 No
|
||||
Actual frequency: 9/15 = 60%
|
||||
Plot point: (65, 60)
|
||||
```
|
||||
|
||||
**Step 4:** Draw perfect calibration line (diagonal from (0,0) to (100,100))
|
||||
|
||||
**Step 5:** Compare points to line
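
The binning steps above can be sketched as follows. This is an illustrative helper, not a library function; it assumes equal-width bins and plots each bin at its mean forecast rather than the bin midpoint used in the example.

```python
def calibration_curve(forecasts, outcomes, n_bins=10):
    """Return (mean forecast, observed frequency, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(forecasts, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p = 1.0 goes in the last bin
        bins[idx].append((p, o))
    curve = []
    for members in bins:
        if members:
            count = len(members)
            mean_p = sum(p for p, _ in members) / count
            freq = sum(o for _, o in members) / count
            curve.append((mean_p, freq, count))
    return curve


# 15 forecasts in the 60-70% bin: 9 Yes, 6 No
forecasts = [0.65] * 15
outcomes = [1] * 9 + [0] * 6
for mean_p, freq, count in calibration_curve(forecasts, outcomes):
    print(f"{mean_p:.2f} {freq:.2f} {count}")  # 0.65 0.60 15
```

Returning the count alongside each point makes it easy to discount bins with too few forecasts to be reliable.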
|
||||
|
||||
### Over/Under Confidence Detection
|
||||
|
||||
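**Overconfidence:** Forecasts are more extreme than outcomes justify. Above 50%, points fall below the diagonal (said 90%, happened 70%); below 50%, they fall above it (said 10%, happened 30%). Fix: Be less extreme, widen intervals.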
|
||||
|
||||
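**Underconfidence:** Forecasts are less extreme than outcomes justify. Above 50%, points fall above the diagonal (said 90%, happened 95%); below 50%, they fall below it. Fix: Be more extreme when evidence is strong.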
|
||||
|
||||
**Sample size:** <10/bin unreliable, 10-20 weak, 20-50 moderate, 50+ strong evidence
|
||||
|
||||
---
|
||||
|
||||
## 5. Resolution Analysis
|
||||
|
||||
### What is Resolution?
|
||||
|
||||
**Resolution** measures ability to assign different probabilities to outcomes that actually differ.
|
||||
|
||||
**High resolution:** Events you call 90% happen much more than events you call 10% (good)
|
||||
|
||||
**Low resolution:** All forecasts near 50%, can't discriminate (bad)
|
||||
|
||||
### Formula
|
||||
|
||||
```
|
||||
Resolution = (1/N) × Σ nk(ok - ō)²
|
||||
|
||||
nk = Forecasts in bin k
|
||||
ok = Actual frequency in bin k
|
||||
ō = Overall base rate
|
||||
|
||||
Higher is better
|
||||
```
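
A small sketch of the resolution formula, assuming the forecasts have already been summarized as per-bin counts and observed frequencies (the input format and sample numbers are illustrative):

```python
def resolution(bins):
    """bins: list of (count, observed_frequency) pairs, one per forecast bin."""
    total = sum(n for n, _ in bins)
    base_rate = sum(n * freq for n, freq in bins) / total
    return sum(n * (freq - base_rate) ** 2 for n, freq in bins) / total


# Everything lands in one bin: no discrimination at all
print(resolution([(100, 0.6)]))                     # 0.0
# Two bins whose outcomes differ sharply (same 60% base rate)
print(round(resolution([(50, 0.9), (50, 0.3)]), 2)) # 0.09
```

Resolution depends only on how observed frequencies differ across bins, not on the probabilities stated, so it cannot be raised by relabeling forecasts; it requires actually sorting events into groups with different outcomes.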
|
||||
|
||||
### How to Improve Resolution
|
||||
|
||||
**Problem: Stuck at 50%**
|
||||
|
||||
Bad pattern: All forecasts 48-52% → Low resolution
|
||||
|
||||
Good pattern: Range from 20% to 90% → High resolution
|
||||
|
||||
**Strategies:**
|
||||
|
||||
1. **Gather discriminating information** - Find features that distinguish outcomes
|
||||
2. **Use decomposition** - Fermi, causal models, scenarios
|
||||
3. **Be bold when warranted** - If evidence strong → Say 85% not 65%
|
||||
4. **Update with evidence** - Start with base rate, update with Bayesian reasoning
|
||||
|
||||
### Calibration vs Resolution Tradeoff
|
||||
|
||||
```
|
||||
Perfect Calibration Only: Say 60% for everything when base rate is 60%
|
||||
→ Calibration: Perfect
|
||||
→ Resolution: Zero
|
||||
→ Brier: 0.24 (bad)
|
||||
|
||||
High Resolution Only: Say 10% or 90% (extremes) incorrectly
|
||||
→ Calibration: Poor
|
||||
→ Resolution: High
|
||||
→ Brier: Terrible
|
||||
|
||||
Optimal Balance: Well-calibrated AND high resolution
|
||||
→ Calibration: Good
|
||||
→ Resolution: High
|
||||
→ Brier: Minimized
|
||||
```
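
For a perfectly calibrated forecaster, an event given probability p occurs with probability p, so the expected squared error of that forecast is p(1-p)² + (1-p)p² = p(1-p). The sketch below uses that identity to compare two calibrated forecasters facing the same 60% base rate; the forecast mixes are illustrative.

```python
def expected_brier_if_calibrated(forecasts):
    """Expected Brier score when each stated probability is exactly right."""
    return sum(p * (1 - p) for p in forecasts) / len(forecasts)


constant = [0.60] * 10            # calibrated, zero resolution
sharp = [0.90] * 5 + [0.30] * 5   # calibrated, averages to the same 60%

print(round(expected_brier_if_calibrated(constant), 2))  # 0.24
print(round(expected_brier_if_calibrated(sharp), 2))     # 0.15
```

Both forecasters are equally well calibrated; the entire gap between 0.24 and 0.15 comes from resolution.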
|
||||
|
||||
**Best forecasters:** Well-calibrated (low reliability error) + High resolution (discriminate events) = Low Brier
|
||||
|
||||
**Recommendation:** Don't sacrifice resolution for perfect calibration. Be bold when evidence warrants.
|
||||
|
||||
---
|
||||
|
||||
## 6. Sharpness
|
||||
|
||||
### What is Sharpness?
|
||||
|
||||
**Sharpness** = Tendency to make extreme predictions (away from 50%) when appropriate.
|
||||
|
||||
**Sharp:** Predicts 5% or 95% when evidence supports it (decisive)
|
||||
|
||||
**Unsharp:** Stays near 50% (plays it safe, indecisive)
|
||||
|
||||
### Why Sharpness Matters
|
||||
|
||||
```
|
||||
Scenario: Base rate 60%
|
||||
|
||||
Unsharp forecaster: 50% for every event → Brier: 0.25, Usefulness: Low
|
||||
Sharp forecaster: Range 20-90% → Brier: 0.12 (if calibrated), Usefulness: High
|
||||
```
|
||||
|
||||
**Insight:** Extreme predictions improve Brier significantly when they are accurate and hurt badly when they are wrong. Solution: Be sharp only when you have the evidence to back it.
|
||||
|
||||
### Measuring Sharpness
|
||||
|
||||
```
|
||||
Sharpness = Variance of forecast probabilities
|
||||
|
||||
Forecaster A: [0.45, 0.50, 0.48, 0.52, 0.49] → Var = 0.0007 (unsharp)
|
||||
Forecaster B: [0.15, 0.85, 0.30, 0.90, 0.20] → Var ≈ 0.133 (sharp)
|
||||
```
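
A quick way to compute this measure, assuming the sample variance (n - 1 denominator) from Python's standard library; the population variance (`statistics.pvariance`) would give slightly smaller numbers:

```python
from statistics import variance

forecaster_a = [0.45, 0.50, 0.48, 0.52, 0.49]
forecaster_b = [0.15, 0.85, 0.30, 0.90, 0.20]

print(round(variance(forecaster_a), 4))  # 0.0007
print(round(variance(forecaster_b), 3))  # 0.133
```

Variance alone says nothing about accuracy, so it is best read alongside the calibration curve.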
|
||||
|
||||
### When to Be Sharp
|
||||
|
||||
**Be sharp (extreme probabilities) when:**
|
||||
- Strong discriminating evidence (multiple independent pieces align)
|
||||
- Easy questions (outcome nearly certain)
|
||||
- You have expertise (domain knowledge, track record)
|
||||
|
||||
**Stay moderate (near 50%) when:**
|
||||
- High uncertainty (limited information, conflicting evidence)
|
||||
- Hard questions (true probability near 50%)
|
||||
- No expertise (unfamiliar domain)
|
||||
|
||||
**Goal:** Sharp AND well-calibrated (extreme when warranted, accurate probabilities)
|
||||
|
||||
---
|
||||
|
||||
## 7. Practical Calibration Training
|
||||
|
||||
### Calibration Exercises
|
||||
|
||||
**Exercise Set 1:** Make 10 forecasts at each confidence level on verifiable questions (fair coin 50%, Paris capital 99%, two heads 25%, die shows 6 at 16.67%). Check: Of the 10 forecasts at 99%, did 9-10 come true? Of the 10 at 50%, did about 5?
|
||||
|
||||
**Exercise Set 2:** Make 20 "80% confident" predictions. Expected: 16/20 correct. Common: 12-14/20 (overconfident). What feels "80%" should be reported as "65%".
|
||||
|
||||
### Tracking Methods
|
||||
|
||||
**Method 1: Spreadsheet**
|
||||
```
|
||||
| Date | Question | Prob | Outcome | Brier | Notes |
|
||||
Monthly: Calculate mean Brier
|
||||
Quarterly: Generate calibration curve
|
||||
```
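
The same log can be kept in code instead of a spreadsheet. This is a minimal sketch; the `Forecast` record, its field names, and the sample entries are hypothetical placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Forecast:
    day: date
    question: str
    prob: float
    outcome: int  # 1 if the event happened, 0 otherwise

    @property
    def brier(self):
        return (self.prob - self.outcome) ** 2


def monthly_mean_brier(log):
    """Map (year, month) to the mean Brier score of forecasts made that month."""
    by_month = defaultdict(list)
    for f in log:
        by_month[(f.day.year, f.day.month)].append(f.brier)
    return {month: sum(s) / len(s) for month, s in by_month.items()}


log = [
    Forecast(date(2024, 1, 5), "Example question A", 0.80, 1),
    Forecast(date(2024, 1, 20), "Example question B", 0.70, 0),
    Forecast(date(2024, 2, 3), "Example question C", 0.60, 1),
]
for month, score in monthly_mean_brier(log).items():
    print(month, round(score, 3))  # (2024, 1) 0.265 then (2024, 2) 0.16
```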
|
||||
|
||||
**Method 2: Apps**
|
||||
- PredictionBook.com (free, tracks calibration)
|
||||
- Metaculus.com (forecasting platform)
|
||||
- Good Judgment Open (tournament)
|
||||
|
||||
**Method 3: Focused Practice**
|
||||
- Week 1: Make 20 predictions (focus on honesty)
|
||||
- Week 2: Check calibration curve (identify bias)
|
||||
- Week 3: Increase resolution (be bold)
|
||||
- Week 4: Balance calibration + resolution
|
||||
|
||||
### Training Drills
|
||||
|
||||
**Drill 1:** Generate 10 "90% CIs" for unknowns. Target: 9/10 contain true value. Common mistake: Only 5-7 (overconfident). Fix: Widen by 1.5×.
|
||||
|
||||
**Drill 2:** Bayesian practice - State prior, observe evidence, update posterior, check calibration.
|
||||
|
||||
**Drill 3:** Make 10 predictions >80% or <20%. Force extremes when "pretty sure". Track: Are >80% happening >80%?
|
||||
|
||||
---
|
||||
|
||||
## 8. Comparison Table of Scoring Rules
|
||||
|
||||
### Summary
|
||||
|
||||
| Feature | Brier | Log | Spherical | Threshold |
|
||||
|---------|-------|-----|-----------|-----------|
|
||||
| **Proper** | Strictly | Strictly | Strictly | Not strictly |
|
||||
| **Range** | 0 to 1 | -∞ to 0 | 0 to 1 | 0 to 1 |
|
||||
| **Penalty** | Quadratic | Logarithmic | Moderate | None |
|
||||
| **Interpretation** | Squared error | Bits surprise | Geometric | Binary |
|
||||
| **Usage** | Default | High-stakes | Rare | Avoid |
|
||||
| **Human-friendly** | Yes | Somewhat | No | Yes (misleading) |
|
||||
|
||||
### Detailed Comparison
|
||||
|
||||
**Brier Score**
|
||||
|
||||
Pros: Easy to interpret, standard in competitions, moderate penalty, good for humans
|
||||
|
||||
Cons: Less severe penalty for overconfidence
|
||||
|
||||
Best for: General forecasting, calibration training, standard benchmarking
|
||||
|
||||
**Log Score**
|
||||
|
||||
Pros: Severe penalty for overconfidence, information-theoretic, strongly incentivizes honesty
|
||||
|
||||
Cons: Too punishing for humans, infinite at 0%/100%, less intuitive
|
||||
|
||||
Best for: High-stakes forecasting, penalizing overconfidence, ML models, tail risk
|
||||
|
||||
**Spherical Score**
|
||||
|
||||
Pros: Strictly proper, bounded, geometric interpretation
|
||||
|
||||
Cons: Uncommon, complex formula, rarely used
|
||||
|
||||
Best for: Theoretical analysis only
|
||||
|
||||
**Threshold / Binary Accuracy**
|
||||
|
||||
Pros: Very intuitive, easy to explain
|
||||
|
||||
Cons: NOT strictly proper (any report on the right side of 50% scores the same), ignores calibration, can be gamed
|
||||
|
||||
Best for: Nothing (don't use for forecasting)
|
||||
|
||||
### When to Use Each
|
||||
|
||||
| Your Situation | Recommended |
|
||||
|----------------|-------------|
|
||||
| Starting out | **Brier** |
|
||||
| Experienced forecaster | **Brier** or **Log** |
|
||||
| High-stakes decisions | **Log** |
|
||||
| Comparing to benchmarks | **Brier** |
|
||||
| Building ML model | **Log** |
|
||||
| Personal tracking | **Brier** |
|
||||
| Teaching others | **Brier** |
|
||||
|
||||
**Recommendation:** Use **Brier** as default. Add **Log** for high-stakes or to penalize overconfidence.
|
||||
|
||||
### Conversion Example
|
||||
|
||||
**Forecast: 80%, Outcome: Yes**
|
||||
```
|
||||
Brier: (0.80-1)² = 0.04
|
||||
Log (base 2): log₂(0.80) = -0.322
|
||||
Spherical:    0.80/√(0.80²+0.20²) = 0.970
|
||||
```
|
||||
|
||||
**Forecast: 80%, Outcome: No**
|
||||
```
|
||||
Brier: (0.80-0)² = 0.64
|
||||
Log (base 2): log₂(0.20) = -2.322 (much worse penalty)
|
||||
Spherical: 0.20/√(0.80²+0.20²) = 0.243
|
||||
```
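
The two conversions above can be reproduced with a short sketch. The function names are illustrative; the Brier score is in its lower-is-better form, while the log and spherical scores are higher-is-better.

```python
import math


def brier(p, o):
    return (p - o) ** 2


def log2_score(p, o):
    return math.log2(p if o else 1 - p)


def spherical(p, o):
    # Probability assigned to what happened, normalized by the vector length
    return (p if o else 1 - p) / math.sqrt(p ** 2 + (1 - p) ** 2)


for outcome in (1, 0):
    scores = (brier(0.80, outcome), log2_score(0.80, outcome), spherical(0.80, outcome))
    print(" ".join(f"{s:.3f}" for s in scores))
# 0.040 -0.322 0.970
# 0.640 -2.322 0.243
```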
|
||||
|
||||
---
|
||||
|
||||
## Return to Main Skill
|
||||
|
||||
[← Back to Market Mechanics & Betting](../SKILL.md)
|
||||
|
||||
**Related Resources:**
|
||||
- [Betting Theory Fundamentals](betting-theory.md)
|
||||
- [Kelly Criterion Deep Dive](kelly-criterion.md)
|
||||
|
||||