Initial commit

This commit is contained in:
Zhongwei Li
2025-11-30 08:38:26 +08:00
commit 41d9f6b189
304 changed files with 98322 additions and 0 deletions


@@ -0,0 +1,393 @@
# Betting Theory Fundamentals
This resource explains the core theoretical foundations of rational betting, expected value, variance management, and market efficiency.
**Foundation for:** All betting and forecasting decisions
---
## Why Learn Betting Theory
**Core insight:** Betting theory separates decision quality from outcome quality. Make +EV decisions repeatedly and survive variance.
**Enables:**
- Think probabilistically (convert beliefs to quantifiable edges)
- Manage risk rationally (distinguish bad decisions from bad outcomes)
- Avoid costly mistakes (identify predictable failure modes)
- Optimize long-term growth (balance aggression with preservation)
**Research foundation:** Kelly (1956), Samuelson (1963), Thorp (1969), behavioral economics (Kahneman & Tversky), market efficiency (Fama).
---
## 1. Expected Value Framework
### Definition and Formula
**Expected Value (EV):** Probability-weighted average of all possible outcomes.
```
EV = Σ(Probability × Outcome)
Binary bet:
EV = (P_win × Amount_won) - (P_lose × Amount_lost)
```
**Example:**
```
Bet $100 on 60% event at even odds (+100)
EV = (0.60 × $100) - (0.40 × $100) = $20
EV% = +20% per $100 wagered
```
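The binary formula above can be sketched in a few lines of Python. The function name `binary_ev` and its parameters are illustrative assumptions, not part of any library.

```python
def binary_ev(p_win: float, amount_won: float, amount_lost: float) -> float:
    """Expected profit of a binary bet: P_win * win - P_lose * loss."""
    return p_win * amount_won - (1.0 - p_win) * amount_lost


# The example above: $100 on a 60% event at even odds.
stake = 100.0
ev = binary_ev(p_win=0.60, amount_won=100.0, amount_lost=stake)
print(f"EV = ${ev:.2f} ({ev / stake:+.1%} of stake)")
```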
### Positive vs Negative EV
**Decision Framework:**
- **EV > +5%:** Strong bet (after fees/uncertainty)
- **EV = 0% to +5%:** Marginal (consider passing)
- **EV < 0%:** Never bet (unless hedging)
**Critical Rule:** Judge decisions by EV, not outcomes. Good decisions lose sometimes; bad decisions win sometimes. Process matters in small samples; results matter over 100+ trials.
### Converting Market Odds to EV
**Step 1: Implied probability**
```
Decimal odds: P = 1 / Odds
Example: 1.67 → 60%
American (+): P = 100 / (Odds + 100)
Example: +150 → 40%
American (-): P = |Odds| / (|Odds| + 100)
Example: -150 → 60%
```
**Step 2: Calculate edge**
```
Your probability: 70%
Market probability: 60%
Edge = 70% - 60% = +10%
```
**Step 3: Calculate EV**
```
Bet $100 at 1.67 odds:
EV = (0.70 × $67) - (0.30 × $100) = +$16.90 = +16.9%
```
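The three steps above could be combined as in the following sketch. The helper names are hypothetical, and the implied probabilities ignore the bookmaker's margin (vig), so treat them as an upper bound on the market's true estimate.

```python
def implied_prob_decimal(decimal_odds: float) -> float:
    """Implied probability from decimal odds: P = 1 / odds."""
    return 1.0 / decimal_odds


def implied_prob_american(american_odds: float) -> float:
    """Implied probability from American odds (positive or negative)."""
    if american_odds > 0:
        return 100.0 / (american_odds + 100.0)
    return abs(american_odds) / (abs(american_odds) + 100.0)


def ev_per_dollar(your_prob: float, decimal_odds: float) -> float:
    """EV per $1 staked: win (odds - 1) with your_prob, lose $1 otherwise."""
    return your_prob * (decimal_odds - 1.0) - (1.0 - your_prob)


market_prob = implied_prob_decimal(1.67)   # Step 1
edge = 0.70 - market_prob                  # Step 2
ev = ev_per_dollar(0.70, 1.67)             # Step 3
print(f"market={market_prob:.1%} edge={edge:+.1%} EV={ev:+.1%}")
```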
### Law of Large Numbers
**Key principle:** Observed frequency converges to true probability as sample size increases.
**Practical thresholds:**
- 10 bets: High variance, might be down despite +EV
- 100 bets: Convergence starting, likely near EV
- 1000 bets: Results tightly centered around EV
**Application:** Don't judge strategy on <30 trials. Variance dominates small samples.
---
## 2. Variance and Risk
### Standard Deviation
**Measures outcome dispersion around EV.**
**Formula:**
```
σ = √(P_win×(Win-EV)² + P_lose×(Loss-EV)²)
```
**Example ($100 bet, 60% win, even odds):**
```
EV = $20
σ = √(0.60×(100-20)² + 0.40×(-100-20)²)
σ = √9600 = $98
Coefficient of Variation: σ/EV = $98/$20 = 4.9
```
**Interpretation:** Standard deviation ($98) is 5× the EV ($20). Variance dominates signal.
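A minimal sketch of the same calculation follows. `bet_stats` and `trials_for_signal` are hypothetical helpers. The trial-count rule is an added assumption (independent, identical bets and a z-standard-error criterion), not a formula from this resource.

```python
import math


def bet_stats(p_win: float, win: float, loss: float) -> tuple[float, float, float]:
    """Return (EV, standard deviation, coefficient of variation) for one binary bet."""
    ev = p_win * win - (1.0 - p_win) * loss
    variance = p_win * (win - ev) ** 2 + (1.0 - p_win) * (-loss - ev) ** 2
    sigma = math.sqrt(variance)
    return ev, sigma, sigma / ev  # CV is undefined when EV is 0


def trials_for_signal(cv: float, z: float = 2.0) -> float:
    """Bets needed before the average result sits z standard errors above zero.

    Assumption: the standard error of the mean shrinks as sigma / sqrt(n),
    so EV = z * sigma / sqrt(n) gives n = (z * CV) ** 2.
    """
    return (z * cv) ** 2


ev, sigma, cv = bet_stats(p_win=0.60, win=100.0, loss=100.0)
print(f"EV={ev:.0f} sigma={sigma:.0f} CV={cv:.1f} trials~{trials_for_signal(cv):.0f}")
```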
### Volatility Categories
**Coefficient of Variation (CV = σ/EV):**
- CV < 1: Low volatility (10-30 trials to see EV)
- CV = 1-3: Moderate (30-50 trials)
- CV = 3-10: High (50-100 trials)
- CV > 10: Extreme (100+ trials)
**Higher CV requires:** Larger bankroll, more patience, stronger discipline.
### Risk of Ruin
**Probability of losing entire bankroll before profit.**
**Practical Guidelines (rough rules of thumb; actual risk of ruin depends on your edge and the odds):**
| Bet Size | Risk of Ruin | Assessment |
|----------|--------------|------------|
| 50% of bankroll | ~40% | Reckless |
| 25% of bankroll | ~20% | Aggressive |
| 10% of bankroll | ~5% | Moderate |
| 5% of bankroll | ~1% | Conservative |
| 2% of bankroll | ~0.1% | Very conservative |
**Kelly Criterion naturally manages risk of ruin. Never bet >10% of bankroll on single bet.**
### Managing Volatility
**1. Fractional Kelly (Primary Tool):**
- Full Kelly: 100% variance, 40%+ drawdowns
- Half Kelly: 25% variance, ~20% drawdowns
- Quarter Kelly: 6% variance, ~10% drawdowns
**2. Diversification:**
- Multiple uncorrelated +EV bets
- Requires independence (correlation < 0.3)
**3. Expected Drawdown:**
- Even optimal betting experiences 20-40% drawdowns
- Mentally prepare for temporary losses
- Don't confuse drawdown with -EV strategy
---
## 3. Bankroll Management
### Defining Your Bankroll
**Valid:** Money you can afford to lose entirely, separate from emergency fund, investment portfolio, daily expenses. **Starting:** $500-$5000 recreational, $10,000+ serious.
**NOT valid:** Money needed for bills, emergency fund, retirement, money you'd be devastated to lose.
### Separation Principle
**Why:** Prevents scared money and revenge betting. Clear accounting, tax clarity, risk containment.
**Implementation:** Separate betting account, never add money mid-downswing, withdraw profits periodically, stop if bankroll → $0.
### Growth vs Preservation
**Preservation (Default):** 1/4 to 1/2 Kelly, for most bettors and bankrolls <$5000
**Growth (Advanced):** 1/2 to full Kelly, for large bankrolls and high variance tolerance (requires 2+ years track record)
### Dynamic Sizing
Bet size scales with bankroll. Example: $1000 bankroll at 5% = $50. After wins → $1500 → bet $75. After losses → $600 → bet $30.
**Recalculate:** Daily if >20% change, weekly (active), monthly (casual).
### Withdrawal Strategy
**Recommended:** When bankroll doubles, withdraw original amount, continue with profit (break-even if lose profit).
**Conservative:** 50% profit monthly. **Aggressive:** Never withdraw (full compounding).
---
## 4. Market Efficiency
### Efficient Market Hypothesis
**Core claim:** Prices reflect all available information. **Reality:** Semi-strong efficient in liquid, mature markets.
**Market knows:** Published polls/news, historical base rates, expert commentary, obvious statistical patterns.
### Where Edges Exist
**1. Information Asymmetry:** Local knowledge, domain expertise
**2. Model Superiority:** Better statistical model, proper extremizing
**3. Lower Transaction Costs:** Market 5% fee vs your 0-1%
**4. Behavioral Biases:** Recency bias, base rate neglect, narrative following
**5. Market Immaturity:** Low liquidity, niche topics, few informed traders
**Before betting, ask:** "What information or model do I have that the market doesn't?"
- Nothing → Pass | Vague → Pass | Specific → Investigate
### Trust vs Question Market
**Trust:** Liquid, mature, objective outcome, many informed participants, low emotion
**Question:** Illiquid, new, subjective outcome, few informed participants, high emotion (politics, fandom)
---
## 5. Common Betting Mistakes
### Chasing Losses
**What:** Increasing bet size after losses. **Why:** Loss aversion, emotional arousal.
**Fix:** Never increase bet size after loss, use bankroll %, take break after 2+ losses.
### Tilt (Emotional Betting)
**Triggers:** Bad beat, streaks, external stress. **Symptoms:** No analysis, ignoring Kelly, revenge betting.
**Fix:** Pre-commit no bets when tilted. Checklist: Calm? Calculate EV? Kelly sizing? Betting for +EV not revenge?
### Overconfidence Bias
**What:** Overestimating probability accuracy (90% when true is 70%).
**Fix:** Track calibration, log predictions + outcomes, calculate curve quarterly. Do 70% predictions happen 70%?
### Ignoring Variance
**What:** Judging strategy on <30 trials. Example: "Down 15% after 20 bets, strategy sucks" (normal variance).
**Fix:** Require 50+ bets minimum, 100+ preferred, 200+ for high confidence.
### Outcome Bias
**What:** Judging by results not process. +15% EV lost = good decision (bad outcome). -10% EV won = bad decision (lucky).
**Fix:** Checklist: EV correct? Edge > threshold? Kelly fraction? Followed system? YES = good decision regardless of outcome.
### Hindsight Bias
**What:** After outcome, "I knew it would happen."
**Fix:** Pre-commit logging, write probability before event, don't revise after, accept 40% events happen 40%.
---
## 6. Integration with Kelly Criterion
### EV Drives Kelly
**Kelly derives from:** Expected value (edge), odds received, bankroll optimization (maximize log wealth).
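**Key relationship:** `f* = (bp - q) / b`. Edge drives bet size. At even odds (b = 1), with edge measured as expected return per dollar bet: 10% edge → 10% Kelly, 5% edge → 5% Kelly, 0% edge → 0% bet. At other odds, divide the edge by b.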
### Variance Tolerance
| Fraction | Variance | Growth | Drawdown |
|----------|----------|--------|----------|
| Full (1.0) | 100% | 100% | ~40% |
| Half (0.5) | 25% | 75% | ~20% |
| Quarter (0.25) | 6% | ~44% | ~10% |
### Bankruptcy Protection
Kelly never bets 100%: prevents ruin, keeps capital for next bet, scales down as bankroll shrinks. **Practical:** Stop if bankroll drops 80-90%.
---
## 7. Practical Examples for Forecasters
### Example 1: Election Prediction Market
**Scenario:** Market 55%, your forecast 65%, bankroll $2000
**Step 1: Edge**
```
Edge = 65% - 55% = +10%
Threshold: 5%
Decision: +10% > 5% → Proceed
```
**Step 2: EV**
```
Bet $100 at 1.82 odds → Win $82
EV = (0.65 × $82) - (0.35 × $100) = +$18.30 = +18.3%
```
**Step 3: Kelly**
```
Full Kelly: 22.3%
Half Kelly: 11.2%
Bet: $2000 × 11.2% = $224
```
### Example 2: Brier Score Tracking
**Goal: Brier < 0.15 across 50 forecasts (5 shown)**
| Forecast | Your P | Outcome | (P-O)² |
|----------|--------|---------|--------|
| Event A | 80% | YES (1) | 0.04 |
| Event B | 30% | NO (0) | 0.09 |
| Event C | 90% | YES (1) | 0.01 |
| Event D | 60% | NO (0) | 0.36 |
| Event E | 70% | YES (1) | 0.09 |
**Brier:** 0.59 / 5 = 0.118 (Good; under the 0.15 goal)
**Analysis:** Event D large error normal (40% events happen). Don't game metric by avoiding 60% predictions.
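The table's arithmetic can be reproduced with a short sketch. `brier_score` is a hypothetical helper, and outcomes are coded 1 for YES and 0 for NO.

```python
def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared difference between stated probability and outcome (0 or 1)."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    squared_errors = [(p - o) ** 2 for p, o in zip(forecasts, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Events A to E from the table above.
probs = [0.80, 0.30, 0.90, 0.60, 0.70]
results = [1, 0, 1, 0, 1]
print(f"Brier = {brier_score(probs, results):.3f}")
```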
### Example 3: Extremizing
**Forecasts:** You 72%, A 68%, B 75%, C 70%, Market 71%
**Average:** 71.2%
**Extremize:**
```
Factor: 1.3 (moderate)
Extremized = 50% + (71.2% - 50%) × 1.3 = 77.6% ≈ 78%
Edge: 78% - 71% = +7%
Full Kelly at the market price: (0.78 - 0.71) / (1 - 0.71) ≈ 24.1%
Half Kelly ≈ 12.1% of $5000 ≈ $603 bet
```
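The averaging and extremizing steps might be sketched as below. `extremize_linear` mirrors the linear formula used in this example. The clipping to [0, 1] is an added safeguard, not something the example specifies.

```python
def extremize_linear(p: float, factor: float) -> float:
    """Push a probability away from 50% by a linear factor, clipped to [0, 1]."""
    pushed = 0.5 + (p - 0.5) * factor
    return min(1.0, max(0.0, pushed))


forecasts = [0.72, 0.68, 0.75, 0.70, 0.71]  # you, A, B, C, market
average = sum(forecasts) / len(forecasts)
extremized = extremize_linear(average, factor=1.3)
print(f"average={average:.1%} extremized={extremized:.1%}")
```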
### Example 4: Correlated Portfolio
**Scenario:** Democrats House (60% yours, 55% market) + Senate (55% yours, 50% market)
**Correlation:** 0.7 (high)
**Naive (WRONG):**
```
Bet A: 5% × $10k = $500
Bet B: 5% × $10k = $500
Total: $1000 (10%)
```
**Correct:**
```
Adjust for correlation: 1 - (0.7 × 0.5) = 0.65
Bet A: $500 × 0.65 = $325
Bet B: $500 × 0.65 = $325
Total: $650 (6.5%)
```
**Reasoning:** Positive correlation amplifies risk. Reduce sizing to maintain tolerance.
---
## Key Takeaways
### The 10 Commandments
1. **Expected Value is King** - Judge decisions by EV, not outcomes
2. **Variance is Inevitable** - Embrace it; don't fight it
3. **Bankroll is Sacred** - Protect it above all else
4. **Kelly is Your Guide** - But use fractional (1/4 to 1/2)
5. **Market is Usually Right** - You need edge to beat it
6. **Discipline Over Impulse** - System beats emotion
7. **Sample Size Matters** - 50+ bets before judgment
8. **Calibration is Honesty** - Track it religiously
9. **Correlations Kill** - Adjust for portfolio risk
10. **Survival Enables Profit** - Can't win if bankrupt
### Mental Models
**Betting = Business**
- Bankroll = Working capital
- EV = Profit margin
- Variance = Market volatility
- Kelly = Capital allocation
**Decision Quality ≠ Outcome Quality**
- Good decisions lose sometimes (variance)
- Bad decisions win sometimes (luck)
- Process > Results (small samples)
- Results > Process (large samples 100+)
### Integration Workflow
**Before betting:**
1. Make forecast (Bayesian, reference class)
2. Calculate edge vs market
3. Check edge > threshold (5%+)
4. Use Kelly for sizing
5. Execute and log
**After betting:**
1. Track outcome
2. Update calibration
3. Calculate Brier score
4. Don't judge single bet
5. Evaluate after 50+ bets
---
**Return to:** [Main Skill](../SKILL.md#interactive-menu)


@@ -0,0 +1,494 @@
# Kelly Criterion Deep Dive
Mathematical foundation for optimal bet sizing under uncertainty.
## Table of Contents
1. [Mathematical Derivation](#1-mathematical-derivation)
2. [Formula Variations](#2-formula-variations)
3. [Fractional Kelly](#3-fractional-kelly)
4. [Extensions](#4-extensions)
5. [Common Mistakes](#5-common-mistakes)
6. [Practical Implementation](#6-practical-implementation)
7. [Historical Examples](#7-historical-examples)
8. [Comparison to Other Methods](#8-comparison-to-other-methods)
---
## 1. Mathematical Derivation
### The Core Question
**Problem**: What fraction of your bankroll maximizes long-term growth?
**Why it matters**: Bet too little → Leave money on the table. Bet too much → Risk ruin, high variance.
### Logarithmic Utility Framework
**Key insight**: Maximize expected logarithm of wealth, not expected wealth.
**Why log utility?**
- Captures diminishing marginal utility ($1 matters more when you have $100 vs $1M)
- Makes repeated multiplicative bets additive: log(AB) = log(A) + log(B)
- Geometric mean emerges naturally (what matters for repeated bets)
- Prevents betting 100% (avoids ruin)
### Derivation for Binary Bet
**Setup**:
- Current bankroll: W
- Bet fraction: f
- Win probability: p, Loss probability: q = 1 - p
- Net odds: b (bet $1, win $b net)
**Outcomes**:
- Win (probability p): New wealth = W(1 + fb)
- Lose (probability q): New wealth = W(1 - f)
**Expected log utility**:
```
E[log(W_new)] = p × log(1 + fb) + q × log(1 - f) + log(W)
```
**Objective**: Maximize g(f) = p × log(1 + fb) + q × log(1 - f)
### Finding the Optimum
**Take derivative**:
```
dg/df = pb/(1 + fb) - q/(1 - f)
```
**Set equal to zero and solve**:
```
pb/(1 + fb) = q/(1 - f)
pb(1 - f) = q(1 + fb)
pb - pbf = q + qfb
pb - q = f(pb + qb) = fb(p + q) = fb
f* = (pb - q) / b = (bp - q) / b
```
**The Kelly Criterion**:
```
f* = (bp - q) / b
Where:
f* = Optimal fraction to bet
b = Net odds received
p = Win probability
q = 1 - p
```
### Alternative Form
**Edge** = Expected return per dollar bet = bp - q
**Kelly formula**: f* = Edge / Odds = (bp - q) / b
**Example**: p = 60%, b = 1.0 (even money)
- Edge = 0.6 × 1 - 0.4 = 0.2
- f* = 0.2 / 1 = 20%
### Optimality
**Second derivative**: d²g/df² < 0 at f = f* → Maximum confirmed
**Growth rate**: G(f*) maximizes long-run geometric growth
**Comparison**:
- f < f*: Lower growth (too conservative)
- f > f*: Lower growth (too aggressive, variance dominates)
- f > 2f*: Negative growth (eventual ruin)
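The optimum and the comparison above can be checked numerically. The sketch below uses hypothetical helper names and natural logarithms. It searches a grid of bet fractions for the highest growth rate and evaluates growth at twice the Kelly fraction.

```python
import math


def kelly_fraction(p: float, b: float) -> float:
    """Closed-form Kelly fraction f* = (bp - q) / b."""
    return (b * p - (1.0 - p)) / b


def log_growth(f: float, p: float, b: float) -> float:
    """Expected log growth per bet: g(f) = p*log(1 + f*b) + q*log(1 - f)."""
    return p * math.log(1.0 + f * b) + (1.0 - p) * math.log(1.0 - f)


p, b = 0.60, 1.0
f_star = kelly_fraction(p, b)

# Brute-force search over f in [0, 0.99] to confirm the closed form.
grid = [i / 1000 for i in range(0, 991)]
best_f = max(grid, key=lambda f: log_growth(f, p, b))

print(f"closed form f* = {f_star:.3f}, grid optimum = {best_f:.3f}")
print(f"g(f*)  = {log_growth(f_star, p, b):+.4f}")
print(f"g(2f*) = {log_growth(2 * f_star, p, b):+.4f}")
```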
---
## 2. Formula Variations
### Converting Market Odds
**Decimal odds** (e.g., 2.50): b = Decimal - 1 = 1.50
**American odds**:
- Positive (+150): b = 150/100 = 1.50
- Negative (-150): b = 100/150 = 0.667
**Fractional odds** (3/1): b = 3.0
**Implied probability**: Market p = 1/(b + 1)
### Multi-Outcome Bet
**Horse race**: Multiple options, bet on any with positive Kelly
**Formula for outcome i**:
```
f_i* = (p_i(b_i + 1) - 1) / b_i
If f_i* > 0: Bet f_i* on outcome i
If f_i* ≤ 0: Don't bet
```
### Continuous Outcomes (Merton's Formula)
**Stock market application**:
```
f* = μ / σ²
Where:
μ = Expected excess return over the risk-free rate (drift)
σ² = Variance of returns
```
**Example**: μ = 8%, σ = 20% → f* = 0.08/0.04 = 2.0 (200%, use leverage)
**Reality**: Too aggressive, use fractional Kelly → 50-100% more reasonable
---
## 3. Fractional Kelly
### Why Fractional Kelly?
**Problems with full Kelly**:
1. **Extreme volatility**: Wild swings, can lose 50%+ in bad runs
2. **Model error**: If probability estimate wrong, full Kelly overbets dramatically
3. **Practical ruin**: Roughly a 1-in-3 chance of halving the bankroll before doubling it (continuous-time approximation)
4. **Non-ergodic**: Most can't bet infinitely many times
### Formula
```
Fractional Kelly = f* × Fraction
Common choices:
- Half Kelly: f*/2
- Quarter Kelly: f*/4 (recommended)
- Third Kelly: f*/3
```
### Growth vs. Variance Trade-off
| Strategy | Growth Rate | Volatility | Max Drawdown |
|----------|-------------|------------|--------------|
| Full Kelly | 100% | 100% | -50% |
| Half Kelly | ~75% | 50% | -25% |
| Quarter Kelly | ~44% | 25% | -12% |
**Key**: Half Kelly gives 75% of growth with 25% of variance → Better risk-adjusted return
### Robustness to Error
**Example**: You think p = 0.60, true p = 0.55, even money bet
**Full Kelly** (f = 20%):
- Growth rate = 0.55×log(1.20) + 0.45×log(0.80) ≈ 0 (breakeven!)
**Half Kelly** (f = 10%):
- Growth rate = 0.55×log(1.10) + 0.45×log(0.90) ≈ 0.005 (still positive)
**Lesson**: Overbetting much worse than underbetting. Fractional Kelly provides buffer.
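The comparison above can be reproduced with a small sketch. `growth_when_wrong` is a hypothetical helper that sizes the bet from the believed probability but evaluates growth at the true probability, using natural logarithms.

```python
import math


def growth_when_wrong(believed_p: float, true_p: float, b: float,
                      kelly_multiplier: float) -> float:
    """Log growth per bet when sizing uses believed_p but reality follows true_p."""
    f_star = (b * believed_p - (1.0 - believed_p)) / b  # Kelly from the belief
    f = f_star * kelly_multiplier                       # e.g. 1.0 full, 0.5 half
    return true_p * math.log(1.0 + f * b) + (1.0 - true_p) * math.log(1.0 - f)


for label, multiplier in [("full", 1.0), ("half", 0.5), ("quarter", 0.25)]:
    g = growth_when_wrong(believed_p=0.60, true_p=0.55, b=1.0,
                          kelly_multiplier=multiplier)
    print(f"{label:>7} Kelly: growth per bet = {g:+.4f}")
```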
### Recommended Fractions
| Situation | Fraction | Reasoning |
|-----------|----------|-----------|
| Professional gambler | 1/4 to 1/3 | Reduces career risk |
| High model uncertainty | 1/4 or less | Error buffer crucial |
| High confidence | 1/2 to 2/3 | Can use more aggression |
| Institutional | 1/4 to 1/3 | Drawdown = career risk |
**Default**: Quarter Kelly (1/4) for most real-world situations
---
## 4. Extensions
### Multiple Simultaneous Bets
**Matrix form** (N assets):
```
f* = Σ⁻¹ × μ
Where:
Σ = Covariance matrix
μ = Expected returns vector
```
**Key insight**: Correlated bets reduce optimal sizing
**Heuristic**: Adjusted Kelly = Individual Kelly × (1 - ρ/2), where ρ = correlation
**Example**: ρ = 0.6, Individual Kelly = 15%
- Adjusted: 15% × (1 - 0.3) = 10.5%
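For two assets the matrix form can be written out by hand, which makes it easy to compare against the heuristic. This is a sketch under the continuous (mean-variance) approximation. The function names and the sample inputs (8% expected return, 20% volatility) are assumptions for illustration.

```python
def kelly_two_assets(mu1: float, mu2: float, sigma1: float, sigma2: float,
                     rho: float) -> tuple[float, float]:
    """Kelly weights f = inverse(Sigma) * mu for two assets (requires |rho| < 1)."""
    cov = rho * sigma1 * sigma2
    var1, var2 = sigma1 ** 2, sigma2 ** 2
    det = var1 * var2 - cov ** 2
    # Explicit inverse of the 2x2 covariance matrix applied to (mu1, mu2).
    f1 = (var2 * mu1 - cov * mu2) / det
    f2 = (var1 * mu2 - cov * mu1) / det
    return f1, f2


def heuristic_adjustment(individual_kelly: float, rho: float) -> float:
    """The rule of thumb above: individual Kelly * (1 - rho / 2)."""
    return individual_kelly * (1.0 - rho / 2.0)


independent, _ = kelly_two_assets(0.08, 0.08, 0.20, 0.20, rho=0.0)
correlated, _ = kelly_two_assets(0.08, 0.08, 0.20, 0.20, rho=0.6)
print(f"exact scale factor:     {correlated / independent:.3f}")
print(f"heuristic scale factor: {heuristic_adjustment(1.0, 0.6):.3f}")
```

For two identical assets the exact factor works out to 1 / (1 + ρ), so the heuristic is somewhat more aggressive than the matrix solution at high correlations.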
### Correlated Outcomes
**Common correlations**:
- Political: Presidential + Senate races
- Sports: Team championship + Player MVP
- Markets: Tech stock A + Tech stock B
**Extreme cases**:
- ρ = 1 (perfect correlation): Only bet on one
- ρ = -1 (negative correlation): Bets hedge, can bet more
- ρ = 0 (independent): No adjustment
### Dynamic Kelly
**Problem**: Probability changes over time (new information)
**Process**:
1. Start with p₀, bet f₀*
2. New information → Update to p₁ (Bayesian)
3. Recalculate f₁*
4. Rebalance (adjust bet size)
**Consideration**: Transaction costs limit rebalancing frequency
---
## 5. Common Mistakes
### Mistake 1: Full Kelly Overbet
**The error**: Using full Kelly in practice
**Why wrong**: Assumes perfect probability estimate (never true)
**Impact**: Bet 2×f* → Growth rate falls to roughly zero; bet more than that → Negative growth rate
**Fix**: Always use fractional Kelly (1/4 to 1/2)
### Mistake 2: Ignoring Model Error
**The error**: Treating probability as certain
**Adjustment**:
```
Uncertain Kelly = f* × Confidence
Example: f* = 20%, 80% confident → Bet 16%
```
**Better**: Use fractional Kelly (implicitly adjusts)
### Mistake 3: Neglecting Bankruptcy
**Reality**: Finite games + estimation error → real ruin risk
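**Drawdown stats** (full Kelly, continuous-time approximation, where the chance of ever falling to a fraction x of the current bankroll is x):
- ~60% chance of a -40% drawdown at some point
- ~50% chance of a -50% drawdown at some point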
**Practical bankruptcy**: Client fires you, forced liquidation, can't maintain discipline
**Fix**: Use fractional Kelly, set stop-loss (if down 25%, pause)
### Mistake 4: Ignoring Correlation
**Example disaster**:
- 10 bets, each Kelly 10%
- All highly correlated (same theme)
- Bet 100% total → Single adverse event → Large loss
**Fix**: Measure correlations, use portfolio Kelly, diversify themes
### Mistake 5: Misestimating Odds
**Common confusion**:
- Decimal 2.0: b = 1.0 (not 2.0)
- "3-to-1": b = 3.0 ✓
- American +200: b = 2.0 (not 200)
**Fix**: Always convert to NET payout (b = total return - 1)
### Mistake 6: Static Bankroll
**Problem**: Calculate once, never update
**Fix**: Recalculate before each bet using current bankroll
---
## 6. Practical Implementation
### Step-by-Step Process
**1. Convert odds to net odds (b)**:
```python
# Decimal odds: b = decimal - 1
# American +150: b = 150/100 = 1.50
# American -150: b = 100/150 = 0.667
# Fractional 3/1: b = 3.0
```
**2. Determine probability**: Use forecasting process (base rates, Bayesian updating, etc.)
**3. Calculate edge**:
```python
edge = net_odds * probability - (1 - probability)
```
**4. Calculate Kelly**:
```python
kelly_fraction = edge / net_odds
```
**5. Apply fractional Kelly**:
```python
fraction = 0.25 # Quarter Kelly recommended
adjusted_kelly = kelly_fraction * fraction
```
**6. Calculate bet size**:
```python
bet_size = current_bankroll * adjusted_kelly
```
**7. Execute and track**:
- Record: Date, event, probability, odds, edge, Kelly%, bet
- Set reminder for resolution
- Note new information
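Steps 1 and 3 to 6 could be combined into one runnable sketch. The function names are hypothetical. The zero floor for non-positive edges is an assumption that follows the "0% edge → 0% bet" rule.

```python
def net_odds_from_decimal(decimal_odds: float) -> float:
    """Net odds b (profit per $1 staked) from decimal odds."""
    return decimal_odds - 1.0


def net_odds_from_american(american_odds: float) -> float:
    """Net odds b from American odds: +150 -> 1.50, -150 -> 0.667."""
    if american_odds > 0:
        return american_odds / 100.0
    return 100.0 / abs(american_odds)


def kelly_bet_size(probability: float, net_odds: float, bankroll: float,
                   fraction: float = 0.25) -> float:
    """Dollar bet size using fractional Kelly; returns 0 when there is no edge."""
    edge = net_odds * probability - (1.0 - probability)
    kelly_fraction = edge / net_odds
    if kelly_fraction <= 0.0:
        return 0.0  # no positive edge: do not bet
    return bankroll * kelly_fraction * fraction


bet = kelly_bet_size(probability=0.65,
                     net_odds=net_odds_from_decimal(2.20),
                     bankroll=10_000.0)
print(f"Quarter-Kelly bet: ${bet:.2f}")
```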
### Position Tracking Template
```
Date: 2024-01-15
Event: Candidate A wins
Your probability: 65%
Market odds: 2.20 (implied 45%)
Net odds (b): 1.20
Edge: 0.43 (43%)
Full Kelly: 35.8%
Fractional (1/4): 9.0%
Bankroll: $10,000
Bet size: $896
Resolution: 2024-11-05
```
---
## 7. Historical Examples
### Ed Thorp - Blackjack (1960s)
**Application**: Card counting edge varies with count → Dynamic Kelly
**Implementation**:
- True count +1: Edge ~0.5%, bet ~0.5% of bankroll
- True count +5: Edge ~2.5%, bet ~2.5% of bankroll
**Results**: Turned $10k into $100k+, proved Kelly works in practice
**Lessons**: Used fractional Kelly (~1/2), dynamic sizing, managed "heat" (detection risk)
### Princeton-Newport Partners (1970s-1980s)
**Strategy**: Statistical arbitrage, convertible bonds
**Kelly application**: 1-3% per position, 50-100 positions (diversification)
**Results**: 19.1% annual (1969-1988), only 4 down months in 19 years, <5% max drawdown
**Lessons**: Fractional Kelly + diversification = low volatility, dominant strategy
### Renaissance Technologies / Medallion Fund
**Strategy**: Thousands of small edges, high frequency
**Kelly application**:
- Each signal: 0.1-0.5% (tiny fractional Kelly)
- Portfolio: 10,000+ positions
- Leverage: 2-4× (portfolio Kelly supports with diversification)
**Results**: 66% annual (gross) over 30+ years, never down year
**Lessons**: Kelly optimal for repeated bets with edge. Diversification enables leverage. Discipline crucial.
### Warren Buffett (Implicit Kelly)
**Concentrated bets**: American Express (40%), Coca-Cola (25%), Apple (40%)
**Why Kelly-like**: High conviction → High p → Large Kelly → Large position
**Quote**: "Diversification is protection against ignorance."
**Lessons**: Kelly justifies concentration with edge. Still uses fractional (~40% max, not 100%).
---
## 8. Comparison to Other Methods
### Fixed Fraction
**Method**: Always bet same percentage
**Pros**: Simple, prevents ruin
**Cons**: Ignores edge, suboptimal growth
**When to use**: Don't trust probability estimates, want simplicity
### Martingale (Double After Loss)
**Method**: Double bet after each loss
**Fatal flaws**:
- Requires infinite bankroll
- Exponential growth ($10 base bet, 10 straight losses → next bet is $10,240)
- Negative edge → lose faster
- Betting limits prevent recovery
**Conclusion**: **NEVER use**. Mathematically certain to fail.
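A short sketch shows how quickly the required stake grows. `martingale_state` is a hypothetical helper and the $10 base bet is an assumed starting stake.

```python
def martingale_state(base_bet: float, losses: int) -> tuple[float, float]:
    """Return (next bet, total already lost) after `losses` straight losses."""
    total_lost = sum(base_bet * 2 ** k for k in range(losses))
    next_bet = base_bet * 2 ** losses
    return next_bet, total_lost


for streak in (3, 5, 10):
    next_bet, lost = martingale_state(base_bet=10.0, losses=streak)
    print(f"{streak:>2} losses: already lost ${lost:,.0f}, next bet ${next_bet:,.0f}")
```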
### Fixed Amount
**Method**: Always bet same dollar amount
**Cons**: As bankroll changes, fraction changes inappropriately
**When to use**: Very small recreational betting
### Constant Proportion
**Method**: Fixed percentage, not optimized for edge
**Difference from Kelly**: Doesn't adjust for edge/odds
**Conclusion**: Better than fixed dollar, worse than Kelly
### Risk Parity
**Method**: Allocate to equalize risk contribution
**Difference from Kelly**: Doesn't use expected returns (ignores edge)
**When better**: Don't have reliable return estimates, defensive portfolio
**When Kelly better**: Have edge estimates, goal is growth
### Summary Comparison
| Method | Growth | Ruin Risk | When to Use |
|--------|--------|-----------|-------------|
| **Kelly** | Highest | None* | Active betting with edge |
| **Fractional Kelly** | High | Very low | **Real-world (recommended)** |
| **Fixed Fraction** | Medium | Low | Simple discipline |
| **Fixed Amount** | Low | Medium | Recreational only |
| **Martingale** | Negative | Certain | **NEVER** |
| **Risk Parity** | Low-Med | Low | Defensive portfolios |
*Kelly theoretically no ruin risk, but model error creates practical risk → use fractional Kelly
**Final Recommendation**: **Quarter Kelly (f*/4)** for nearly all real-world scenarios.
---
## Return to Main Skill
[← Back to Market Mechanics & Betting](../skill.md)
**Related resources**:
- [Betting Theory Fundamentals](betting-theory.md)
- [Scoring Rules and Calibration](scoring-rules.md)


@@ -0,0 +1,494 @@
# Scoring Rules and Calibration
Comprehensive guide to proper scoring rules, calibration measurement, and forecast accuracy improvement.
## Table of Contents
1. [Proper Scoring Rules](#1-proper-scoring-rules)
2. [Brier Score Deep Dive](#2-brier-score-deep-dive)
3. [Log Score](#3-log-score-logarithmic-scoring-rule)
4. [Calibration Curves](#4-calibration-curves)
5. [Resolution Analysis](#5-resolution-analysis)
6. [Sharpness](#6-sharpness)
7. [Practical Calibration Training](#7-practical-calibration-training)
8. [Comparison Table](#8-comparison-table-of-scoring-rules)
---
## 1. Proper Scoring Rules
### What is a Scoring Rule?
A **scoring rule** assigns a numerical score to a probabilistic forecast based on the forecast and actual outcome.
**Purpose:** Measure accuracy, incentivize honesty, enable comparison, track calibration over time.
### Strictly Proper vs Quasi-Proper
**Strictly Proper:** Reporting your true belief uniquely maximizes your expected score. No other probability gives better expected score.
**Why it matters:** Incentivizes honesty, eliminates gaming, optimizes for accurate beliefs.
**Quasi-Proper (proper, but not strictly):** True belief maximizes expected score, but other probabilities might tie. Less desirable for forecasting.
### Common Proper Scoring Rules
**1. Brier Score** (strictly proper)
```
Score = -(p - o)²   (negated so that higher is better)
p = Your probability (0 to 1)
o = Outcome (0 or 1)
```
**2. Logarithmic Score** (strictly proper)
```
Score = log(p) if outcome occurs
Score = log(1-p) if outcome doesn't occur
```
**3. Spherical Score** (strictly proper)
```
Score = p / √(p² + (1-p)²) if outcome occurs
Score = (1-p) / √(p² + (1-p)²) if outcome doesn't occur
```
### Common IMPROPER Scoring Rules (Avoid)
**Absolute Error:** `Score = -|p - o|` → Incentivizes extremes (NOT proper)
**Threshold Accuracy:** Binary right/wrong → Ignores calibration (NOT proper)
**Example of gaming improper rules:**
```
Using absolute error (improper):
True belief: 60% → Optimal report: 100% (dishonest)
Using Brier score (proper):
True belief: 60% → Optimal report: 60% (honest)
```
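The gaming example can be verified by brute force. The sketch below uses hypothetical helper names. It computes the expected score of every possible report on a 1% grid, given a true belief of 60%, and picks the best report under each rule.

```python
def brier_reward(report: float, outcome: int) -> float:
    """Negated squared error, so that higher is better."""
    return -((report - outcome) ** 2)


def absolute_reward(report: float, outcome: int) -> float:
    """Negated absolute error (an improper rule)."""
    return -abs(report - outcome)


def expected_score(report: float, belief: float, rule) -> float:
    """Expected score of a report when the event truly occurs with prob `belief`."""
    return belief * rule(report, 1) + (1.0 - belief) * rule(report, 0)


belief = 0.60
grid = [i / 100 for i in range(101)]
best_brier = max(grid, key=lambda r: expected_score(r, belief, brier_reward))
best_absolute = max(grid, key=lambda r: expected_score(r, belief, absolute_reward))
print(f"Brier-optimal report:    {best_brier:.2f}")
print(f"Absolute-optimal report: {best_absolute:.2f}")
```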
**Key Principle:** Only use strictly proper scoring rules for forecast evaluation.
---
## 2. Brier Score Deep Dive
### Formula
**Single forecast:** `Brier = (p - o)²`
**Multiple forecasts:** `Brier = (1/N) × Σ(pi - oi)²`
**Range:** 0.00 (perfect) to 1.00 (worst). Lower is better.
### Calculation Examples
```
90% Yes → (0.90-1)² = 0.01 (good) | 90% No → (0.90-0)² = 0.81 (bad)
60% Yes → (0.60-1)² = 0.16 (medium) | 50% Any → 0.25 (baseline)
```
### Brier Score Decomposition
**Murphy Decomposition:**
```
Brier Score = Reliability - Resolution + Uncertainty
```
**Reliability (Calibration Error):** Are your probabilities correct on average? (Lower is better)
**Resolution:** Do you assign different probabilities to different outcomes? (Higher is better)
**Uncertainty:** Base rate variance (uncontrollable, depends on problem)
**Improving Brier:**
1. Minimize reliability (fix calibration)
2. Maximize resolution (differentiate forecasts)
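The decomposition can be computed directly, as in the sketch below. It groups forecasts by their exact stated probability, an assumption under which the identity holds exactly. With wider bins it holds only approximately. `murphy_decomposition` is a hypothetical name.

```python
from collections import defaultdict


def murphy_decomposition(forecasts: list[float],
                         outcomes: list[int]) -> tuple[float, float, float]:
    """Return (reliability, resolution, uncertainty), grouping by stated probability."""
    n = len(forecasts)
    base_rate = sum(outcomes) / n
    groups: dict[float, list[int]] = defaultdict(list)
    for p, o in zip(forecasts, outcomes):
        groups[p].append(o)
    reliability = sum(len(obs) * (p - sum(obs) / len(obs)) ** 2
                      for p, obs in groups.items()) / n
    resolution = sum(len(obs) * (sum(obs) / len(obs) - base_rate) ** 2
                     for obs in groups.values()) / n
    uncertainty = base_rate * (1.0 - base_rate)
    return reliability, resolution, uncertainty


probs = [0.8] * 5 + [0.3] * 5
results = [1, 1, 1, 1, 0, 0, 0, 1, 0, 0]
rel, res, unc = murphy_decomposition(probs, results)
brier = sum((p - o) ** 2 for p, o in zip(probs, results)) / len(probs)
print(f"REL={rel:.3f} RES={res:.3f} UNC={unc:.3f} -> {rel - res + unc:.3f} (Brier {brier:.3f})")
```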
### Brier Score Interpretation
| Brier Score | Quality | Description |
|-------------|---------|-------------|
| 0.00 - 0.05 | Exceptional | Near-perfect |
| 0.05 - 0.10 | Excellent | Top tier |
| 0.10 - 0.15 | Good | Skilled |
| 0.15 - 0.20 | Average | Better than random |
| 0.20 - 0.25 | Below Average | Approaching random |
| 0.25+ | Poor | At or worse than random |
**Context matters:** Easy questions expect lower scores. Compare to baseline (0.25) and other forecasters.
### Improving Your Brier Score
**Path 1: Fix Calibration**
**If overconfident:** 80% predictions happen 60% → Be less extreme, widen intervals
**If underconfident:** 60% predictions happen 80% → Be more extreme when you have evidence
**Path 2: Improve Resolution**
**Problem:** All forecasts near 50% → Differentiate easy vs hard questions, research more, be bold when warranted
**Balance:** `Good Forecaster = Well-Calibrated + High Resolution`
### Brier Skill Score
```
BSS = 1 - (Your Brier / Baseline Brier)
Example:
Your Brier: 0.12, Baseline: 0.25
BSS = 1 - 0.48 = 0.52 (52% improvement over baseline)
```
**Interpretation:** BSS = 1.00 (perfect), 0.00 (same as baseline), <0 (worse than baseline)
---
## 3. Log Score (Logarithmic Scoring Rule)
### Formula
```
Log Score = log₂(p) if outcome occurs
Log Score = log₂(1-p) if outcome doesn't occur
Range: -∞ (worst) to 0 (perfect)
Higher (less negative) is better
```
### Calculation Examples
```
90% Yes → -0.15 | 90% No → -3.32 (severe) | 50% Yes → -1.00
99% No → -6.64 (catastrophic penalty for overconfidence)
```
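These values can be reproduced with the standard library. `log_score` is a hypothetical helper. A forecast of exactly 0% or 100% that turns out wrong has a score of -∞ in theory, and `math.log2` raises `ValueError` in that case.

```python
import math


def log_score(p: float, outcome: bool) -> float:
    """Base-2 log score: log2(p) if the event occurred, else log2(1 - p)."""
    return math.log2(p if outcome else 1.0 - p)


for p, outcome in [(0.90, True), (0.90, False), (0.50, True), (0.99, False)]:
    label = "Yes" if outcome else "No"
    print(f"{p:.0%} {label}: {log_score(p, outcome):.2f}")
```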
### Relationship to Information Theory
**Log score measures bits of surprise:**
```
Surprise = -log₂(p)
p = 50% → 1 bit surprise
p = 25% → 2 bits surprise
p = 12.5% → 3 bits surprise
```
**Connection to entropy:** The negative log score equals the cross-entropy between the true outcome and the forecast distribution.
### When to Use Log Score vs Brier
**Use Log Score when:**
- Severe penalty for overconfidence desired
- Tail risk matters (rare events important)
- Information-theoretic interpretation useful
- Comparing probabilistic models
**Use Brier Score when:**
- Human forecasters (less punishing)
- Easier interpretation (squared error)
- Standard benchmark (more common)
- Avoiding extreme penalties
**Key Difference:**
Brier: Quadratic penalty (grows with square)
```
Error: 10% → 0.01, 20% → 0.04, 30% → 0.09, 40% → 0.16
```
Log: Logarithmic penalty (grows faster for extremes)
```
Forecast: 90% wrong → -3.3, 95% wrong → -4.3, 99% wrong → -6.6
```
**Recommendation:** Default to Brier. Add Log for high-stakes or to penalize overconfidence. Track both for complete picture.
---
## 4. Calibration Curves
### What is a Calibration Curve?
**Visualization of forecast accuracy:**
```
Y-axis: Actual frequency (how often outcome occurred)
X-axis: Stated probability (your forecasts)
Perfect calibration: Diagonal line (y = x)
```
**Example:**
```
Actual %
100 ┤
 80 ┤                   ●
 60 ┤              ●
 40 ┤         ●  ← Perfect calibration line
 20 ┤    ●
  0 └─────────────────────────
    0    20   40   60   80   100
         Stated probability %
```
### How to Create
**Step 1:** Collect 50+ forecasts and outcomes
**Step 2:** Bin by probability (0-10%, 10-20%, ..., 90-100%)
**Step 3:** For each bin, calculate actual frequency
```
Example: 60-70% bin
Forecasts: 15 total, Outcomes: 9 Yes, 6 No
Actual frequency: 9/15 = 60%
Plot point: (65, 60)
```
**Step 4:** Draw perfect calibration line (diagonal from (0,0) to (100,100))
**Step 5:** Compare points to line
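Steps 2 and 3 might be sketched as follows. `calibration_table` is a hypothetical helper that uses equal-width bins and drops empty bins. The printed gap (actual minus stated) is negative for overconfident bins.

```python
def calibration_table(forecasts: list[float], outcomes: list[int],
                      n_bins: int = 10) -> list[tuple[float, float, int, float, float]]:
    """Rows of (bin_low, bin_high, count, mean stated prob, actual frequency)."""
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, o in zip(forecasts, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p = 1.0 goes in the top bin
        bins[index].append((p, o))
    rows = []
    for index, items in enumerate(bins):
        if not items:
            continue  # skip empty bins rather than plotting a meaningless point
        mean_p = sum(p for p, _ in items) / len(items)
        frequency = sum(o for _, o in items) / len(items)
        rows.append((index / n_bins, (index + 1) / n_bins, len(items), mean_p, frequency))
    return rows


# The 60-70% bin example: 15 forecasts, 9 Yes and 6 No.
rows = calibration_table([0.65] * 15, [1] * 9 + [0] * 6)
for low, high, count, mean_p, freq in rows:
    print(f"{low:.0%}-{high:.0%}: n={count} stated={mean_p:.0%} actual={freq:.0%} gap={freq - mean_p:+.0%}")
```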
### Over/Under Confidence Detection
**Overconfidence:** Points below diagonal (said 90%, happened 70%). Fix: Be less extreme, widen intervals.
**Underconfidence:** Points above diagonal (said 90%, happened 95%). Fix: Be more extreme when evidence is strong.
**Sample size:** <10/bin unreliable, 10-20 weak, 20-50 moderate, 50+ strong evidence
---
## 5. Resolution Analysis
### What is Resolution?
**Resolution** measures ability to assign different probabilities to outcomes that actually differ.
**High resolution:** Events you call 90% happen much more than events you call 10% (good)
**Low resolution:** All forecasts near 50%, can't discriminate (bad)
### Formula
```
Resolution = (1/N) × Σ nk(ok - ō)²
nk = Forecasts in bin k
ok = Actual frequency in bin k
ō = Overall base rate
Higher is better
```
### How to Improve Resolution
**Problem: Stuck at 50%**
Bad pattern: All forecasts 48-52% → Low resolution
Good pattern: Range from 20% to 90% → High resolution
**Strategies:**
1. **Gather discriminating information** - Find features that distinguish outcomes
2. **Use decomposition** - Fermi, causal models, scenarios
3. **Be bold when warranted** - If evidence strong → Say 85% not 65%
4. **Update with evidence** - Start with base rate, update with Bayesian reasoning
### Calibration vs Resolution Tradeoff
```
Perfect Calibration Only: Say 60% for everything when base rate is 60%
→ Calibration: Perfect
→ Resolution: Zero
→ Brier: 0.24 (bad)
High Resolution Only: Say 10% or 90% (extremes) incorrectly
→ Calibration: Poor
→ Resolution: High
→ Brier: Terrible
Optimal Balance: Well-calibrated AND high resolution
→ Calibration: Good
→ Resolution: High
→ Brier: Minimized
```
**Best forecasters:** Well-calibrated (low reliability error) + High resolution (discriminate events) = Low Brier
**Recommendation:** Don't sacrifice resolution for perfect calibration. Be bold when evidence warrants.
---
## 6. Sharpness
### What is Sharpness?
**Sharpness** = Tendency to make extreme predictions (away from 50%) when appropriate.
**Sharp:** Predicts 5% or 95% when evidence supports it (decisive)
**Unsharp:** Stays near 50% (plays it safe, indecisive)
### Why Sharpness Matters
```
Scenario: Base rate 60%
Unsharp forecaster: 50% for every event → Brier: 0.25, Usefulness: Low
Sharp forecaster: Range 20-90% → Brier: 0.12 (if calibrated), Usefulness: High
```
**Insight:** Extreme predictions improve Brier significantly when they are accurate and hurt badly when they are wrong. Solution: Be sharp only when the evidence supports it.
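
A worked check of the scenario, under two assumptions that are not in the original: a calibrated forecast p has expected Brier p(1 − p), and the sharp forecaster is modeled as saying 20% on 3 of every 7 events and 90% on the other 4 (which averages to the 60% base rate). The helper name `expected_brier_if_calibrated` is hypothetical.

```python
def expected_brier_if_calibrated(forecasts):
    """Expected Brier when each stated probability p comes true a fraction p of the time."""
    return sum(p * (1 - p) for p in forecasts) / len(forecasts)


# A 50% forecast scores (0.5 - outcome)**2 on every event, whatever happens,
# so this line holds even though 50% is not calibrated against a 60% base rate.
unsharp = [0.5] * 7

# Sharp and calibrated: the mean forecast (3*0.2 + 4*0.9) / 7 equals the 60% base rate.
sharp = [0.2] * 3 + [0.9] * 4

print("unsharp:", round(expected_brier_if_calibrated(unsharp), 3))
print("sharp:  ", round(expected_brier_if_calibrated(sharp), 3))
```

The 20%/90% mix reproduces the 0.12 figure quoted for the sharp forecaster.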
### Measuring Sharpness
```
Sharpness = Variance of forecast probabilities (sample variance, n-1 denominator)
Forecaster A: [0.45, 0.50, 0.48, 0.52, 0.49] → Var = 0.0007 (unsharp)
Forecaster B: [0.15, 0.85, 0.30, 0.90, 0.20] → Var = 0.1333 (sharp)
```
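
A minimal sketch of the same measurement using Python's standard library. `statistics.variance` is the sample variance (n − 1 denominator); using it rather than the population variance is an assumption about which definition is intended.

```python
from statistics import variance

forecaster_a = [0.45, 0.50, 0.48, 0.52, 0.49]  # hugs 50%
forecaster_b = [0.15, 0.85, 0.30, 0.90, 0.20]  # commits to extremes

sharpness_a = variance(forecaster_a)
sharpness_b = variance(forecaster_b)

print(f"A: {sharpness_a:.4f}")
print(f"B: {sharpness_b:.4f}")
```

Variance measures spread around the forecaster's own mean, so it says nothing about accuracy; pair it with a calibration check.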
### When to Be Sharp
**Be sharp (extreme probabilities) when:**
- Strong discriminating evidence (multiple independent pieces align)
- Easy questions (outcome nearly certain)
- You have expertise (domain knowledge, track record)
**Stay moderate (near 50%) when:**
- High uncertainty (limited information, conflicting evidence)
- Hard questions (true probability near 50%)
- No expertise (unfamiliar domain)
**Goal:** Sharp AND well-calibrated (extreme when warranted, accurate probabilities)
---
## 7. Practical Calibration Training
### Calibration Exercises
**Exercise Set 1:** Make 10 forecasts per probability level on verifiable questions (fair coin lands heads: 50%; Paris is the capital of France: 99%; two coin flips both heads: 25%; die shows 6: 16.67%). Check: Did the 99% forecasts come true 9-10 times? Did the 50% forecasts come true ~5 times?
**Exercise Set 2:** Make 20 "80% confident" predictions. Expected: 16/20 correct. Common: 12-14/20 (overconfident). If that is your result, what feels like "80%" should be reported as roughly "65%".
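
Before concluding you are overconfident from one batch of 20, it helps to ask how often a perfectly calibrated forecaster would score that low by chance. A minimal sketch using the binomial distribution; the helper name `prob_at_most` is hypothetical:

```python
from math import comb


def prob_at_most(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p): chance of k or fewer correct out of n."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# If your "80%" really means 80%, how likely is 14 or fewer correct out of 20?
print(f"P(<=14 of 20): {prob_at_most(14, 20, 0.80):.3f}")
# And 12 or fewer?
print(f"P(<=12 of 20): {prob_at_most(12, 20, 0.80):.3f}")
```

A single 14/20 result is fairly common even under perfect calibration, so treat it as a hint; a result of 12/20, or repeated shortfalls across batches, is much stronger evidence.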
### Tracking Methods
**Method 1: Spreadsheet**
```
| Date | Question | Prob | Outcome | Brier | Notes |
Monthly: Calculate mean Brier
Quarterly: Generate calibration curve
```
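
The spreadsheet columns map directly onto a small script. A minimal sketch, assuming rows of `(date, question, prob, outcome)` with ISO dates and 0/1 outcomes; the sample rows and the name `monthly_mean_brier` are made up for illustration:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical log rows: (date, question, prob, outcome), outcome is 1 (yes) or 0 (no).
log = [
    ("2025-01-05", "Example question A", 0.80, 1),
    ("2025-01-20", "Example question B", 0.30, 0),
    ("2025-02-11", "Example question C", 0.60, 0),
]


def monthly_mean_brier(rows):
    """Group per-forecast Brier scores by YYYY-MM and average each month."""
    by_month = defaultdict(list)
    for date, _question, prob, outcome in rows:
        by_month[date[:7]].append((prob - outcome) ** 2)
    return {month: mean(scores) for month, scores in sorted(by_month.items())}


for month, score in monthly_mean_brier(log).items():
    print(month, round(score, 3))
```
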
**Method 2: Apps**
- PredictionBook.com (free, tracks calibration)
- Metaculus.com (forecasting platform)
- Good Judgment Open (tournament)
**Method 3: Focused Practice**
- Week 1: Make 20 predictions (focus on honesty)
- Week 2: Check calibration curve (identify bias)
- Week 3: Increase resolution (be bold)
- Week 4: Balance calibration + resolution
### Training Drills
**Drill 1:** Generate 10 "90% CIs" for unknowns. Target: 9/10 contain true value. Common mistake: Only 5-7 (overconfident). Fix: Widen by 1.5×.
**Drill 2:** Bayesian practice - State prior, observe evidence, update posterior, check calibration.
**Drill 3:** Make 10 predictions >80% or <20%. Force extremes when "pretty sure". Track: Are your >80% forecasts coming true more than 80% of the time?
---
## 8. Comparison Table of Scoring Rules
### Summary
| Feature | Brier | Log | Spherical | Threshold |
|---------|-------|-----|-----------|-----------|
| **Proper** | Strictly | Strictly | Strictly | NO |
| **Range** | 0 to 1 | -∞ to 0 | 0 to 1 | 0 to 1 |
| **Penalty** | Quadratic | Logarithmic | Moderate | None |
| **Interpretation** | Squared error | Bits of surprise | Geometric | Binary |
| **Usage** | Default | High-stakes | Rare | Avoid |
| **Human-friendly** | Yes | Somewhat | No | Yes (misleading) |
### Detailed Comparison
**Brier Score**
Pros: Easy to interpret, standard in competitions, moderate penalty, good for humans
Cons: Less severe penalty for overconfidence
Best for: General forecasting, calibration training, standard benchmarking
**Log Score**
Pros: Severe penalty for overconfidence, information-theoretic, strongly incentivizes honesty
Cons: Too punishing for humans, infinite penalty when a 0% or 100% forecast is wrong, less intuitive
Best for: High-stakes forecasting, penalizing overconfidence, ML models, tail risk
**Spherical Score**
Pros: Strictly proper, bounded, geometric interpretation
Cons: Uncommon, complex formula, rarely used
Best for: Theoretical analysis only
**Threshold / Binary Accuracy**
Pros: Very intuitive, easy to explain
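Cons: Not strictly proper (any forecast on the correct side of 50% scores the same, so true probabilities go unrewarded and extremes go unpunished), ignores calibration, can be gamed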
Best for: Nothing (don't use for forecasting)
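
The difference in incentives can be checked numerically. A minimal sketch, assuming an event whose true probability is 60% and a threshold rule that counts any report above 50% as a "yes" prediction (ties at exactly 50% count as "no"); both function names are hypothetical:

```python
def expected_brier(report, true_p):
    """Expected Brier score (lower is better) when the event has probability true_p."""
    return true_p * (report - 1) ** 2 + (1 - true_p) * report**2


def expected_accuracy(report, true_p):
    """Expected threshold accuracy (higher is better): predict 'yes' iff report > 0.5."""
    return true_p if report > 0.5 else 1 - true_p


true_p = 0.6
grid = [i / 100 for i in range(1, 100)]

# Brier has a unique best report: the true probability.
best_brier_report = min(grid, key=lambda r: expected_brier(r, true_p))
print("best report under Brier:", best_brier_report)

# Threshold accuracy cannot tell 51% from 99%.
print(expected_accuracy(0.51, true_p), expected_accuracy(0.99, true_p))
```
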
### When to Use Each
| Your Situation | Recommended |
|----------------|-------------|
| Starting out | **Brier** |
| Experienced forecaster | **Brier** or **Log** |
| High-stakes decisions | **Log** |
| Comparing to benchmarks | **Brier** |
| Building ML model | **Log** |
| Personal tracking | **Brier** |
| Teaching others | **Brier** |
**Recommendation:** Use **Brier** as default. Add **Log** for high-stakes or to penalize overconfidence.
### Conversion Example
**Forecast: 80%, Outcome: Yes**
```
Brier: (0.80-1)² = 0.04
Log (base 2): log₂(0.80) = -0.322
Spherical: 0.80/√(0.80²+0.20²) = 0.970
```
**Forecast: 80%, Outcome: No**
```
Brier: (0.80-0)² = 0.64
Log (base 2): log₂(0.20) = -2.322 (much worse penalty)
Spherical: 0.20/√(0.80²+0.20²) = 0.243
```
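
The three rules can be computed side by side for any binary forecast. A minimal sketch; the function names are illustrative, `p` is the stated probability of "yes", and `outcome` is 1 or 0:

```python
from math import log2, sqrt


def brier(p, outcome):
    """Squared error between forecast and outcome (0 is best, 1 is worst)."""
    return (p - outcome) ** 2


def log_score(p, outcome):
    """Base-2 log of the probability given to what happened (0 is best).

    log2(0) raises ValueError: the score is unbounded for a wrong 0%/100% forecast.
    """
    return log2(p if outcome else 1 - p)


def spherical(p, outcome):
    """Probability given to what happened, divided by the forecast vector's length."""
    return (p if outcome else 1 - p) / sqrt(p**2 + (1 - p) ** 2)


for outcome in (1, 0):
    print(
        f"forecast 0.80, outcome {outcome}: "
        f"brier={brier(0.80, outcome):.3f} "
        f"log={log_score(0.80, outcome):.3f} "
        f"spherical={spherical(0.80, outcome):.3f}"
    )
```
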
---
## Return to Main Skill
[← Back to Market Mechanics & Betting](../SKILL.md)
**Related Resources:**
- [Betting Theory Fundamentals](betting-theory.md)
- [Kelly Criterion Deep Dive](kelly-criterion.md)