Bayesian Analysis: Feature Adoption Forecast
Question
Hypothesis: New sharing feature will achieve >20% adoption within 3 months of launch
Estimating: P(adoption >20%)
Timeframe: 3 months post-launch (results measured at month 3)
Matters because: Need 20% adoption to justify ongoing development investment. Below 20%, we should sunset the feature and reallocate resources.
Prior Belief (Before Evidence)
Base Rate
What's the general frequency of similar features achieving >20% adoption?
- Reference class: Previous features we've launched in this product category
- Historical data:
- Last 8 features launched: 5 achieved >20% adoption (62.5%)
- Industry benchmarks: Social sharing features average 15-25% adoption
- Our product has higher engagement than average
- Base rate: 60%
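A minimal sketch of how the 60% figure can be reconciled with the 5-of-8 history. The add-one (Laplace) smoothing below is an illustrative assumption, not something stated above; the 15-25% industry benchmark is a second reason to shade the raw 62.5% down slightly.
```python
# Sketch: deriving a base rate from the 5-of-8 launch history.
# The add-one (Laplace) smoothing is an illustrative assumption; it happens to
# land on the 60% base rate used above.
successes, launches = 5, 8

raw_rate = successes / launches                   # 5/8 = 62.5%
smoothed_rate = (successes + 1) / (launches + 2)  # (5+1)/(8+2) = 60.0%

print(f"Raw historical rate: {raw_rate:.1%}")
print(f"Smoothed base rate:  {smoothed_rate:.1%}")
```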
Adjustments
How is this case different from the base rate?
- Factor 1: Feature complexity - This feature is simpler than average (+5%)
- Previous successful features averaged 3 steps to use
- This feature is 1-click sharing
- Simpler features historically perform better
- Factor 2: Market timing - Competitive pressure is high (-10%)
- Two competitors launched similar features 6 months ago
- Early adopters may have already switched to competitors
- Late-to-market features typically see 15-20% lower adoption
- Factor 3: User research signals - Strong user request (+10%)
- Feature was #2 most requested in last user survey (450 responses)
- 72% said they would use it "frequently" or "very frequently"
- Strong stated intent typically correlates with 40-60% actual usage
Prior Probability
P(H) = 65%
Justification: Starting from 60% base rate, adjusted upward for simplicity (+5%) and strong user signals (+10%), adjusted down for late market entry (-10%). Net effect: 65% prior confidence that adoption will exceed 20%.
Range if uncertain: 55% to 75% (accounting for uncertainty in adjustment factors)
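A minimal sketch of the prior construction under the additive-adjustment treatment used in the justification above; the dictionary keys are just labels for the three factors.
```python
# Sketch: base rate plus additive percentage-point adjustments, as justified above.
base_rate = 0.60
adjustments = {
    "simpler feature (1-click)": +0.05,
    "late market entry":         -0.10,
    "strong user research":      +0.10,
}

prior = base_rate + sum(adjustments.values())
prior = min(max(prior, 0.0), 1.0)  # clamp to a valid probability

print(f"Prior P(H) = {prior:.0%}")  # 65%
```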
Evidence
What was observed: Beta test with 200 users showed 35% adoption (70 users actively used feature)
How diagnostic: This is moderately to strongly diagnostic evidence. Beta tests often show higher engagement than production (selection bias), but 35% is meaningfully above our 20% threshold. The question is whether this beta performance predicts production performance.
Likelihoods
P(E|H) = 75% - Probability of seeing 35% beta adoption IF true production adoption will be >20%
Reasoning:
- If production adoption will be >20%, beta should show higher (beta users are early adopters)
- Typical pattern: beta adoption is 1.5-2x production adoption for engaged features
- If production will be 22%, beta would likely be 33-44% → 35% fits this well
- If production will be 25%, beta would likely be 38-50% → 35% is on lower end but plausible
- 75% accounts for variance and beta-to-production conversion uncertainty
P(E|¬H) = 15% - Probability of seeing 35% beta adoption IF true production adoption will be ≤20%
Reasoning:
- If production adoption will be ≤20% (say, 15%), beta would typically be 22-30%
- Seeing 35% beta when production will be ≤20% would require unusual beta-to-production drop
- This could happen (beta selection bias, novelty effect wears off), but is uncommon
- 15% reflects that this scenario is possible but unlikely
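The sketch below makes the beta-to-production reasoning explicit, assuming the 1.5-2x multiplier cited above; both the expected beta range for each hypothetical production level and the production range implied by the observed 35% fall out of the same two numbers.
```python
# Sketch: the 1.5-2x beta-to-production multiplier assumed in the reasoning above.
beta_observed = 0.35
mult_low, mult_high = 1.5, 2.0

# Expected beta range for a few hypothetical production adoption levels
for production in (0.15, 0.20, 0.22, 0.25):
    print(f"production {production:.0%} -> expected beta "
          f"{production * mult_low:.1%}-{production * mult_high:.1%}")

# Production range implied by the observed beta number
implied_low = beta_observed / mult_high   # 17.5%
implied_high = beta_observed / mult_low   # ~23.3%
print(f"observed 35% beta implies ~{implied_low:.1%}-{implied_high:.1%} production adoption")
```
The implied ~17.5-23% production range straddles the 20% threshold, which is why the evidence is treated as meaningful but not overwhelming.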
Likelihood Ratio = 75% / 15% = 5.0
Interpretation: Evidence is moderately strong. A 35% beta result is 5 times more likely if production adoption will exceed 20% than if it won't. This is meaningful but not overwhelming evidence.
Bayesian Update
Calculation
Using odds form (simpler for this case):
Prior Odds = P(H) / P(¬H) = 65% / 35% = 1.86
Likelihood Ratio = 5.0
Posterior Odds = Prior Odds × LR = 1.86 × 5.0 = 9.3
Posterior Probability = Posterior Odds / (1 + Posterior Odds)
= 9.3 / 10.3
= 90.3%
Verification using probability form:
P(E) = [P(E|H) × P(H)] + [P(E|¬H) × P(¬H)]
P(E) = [75% × 65%] + [15% × 35%]
P(E) = 48.75% + 5.25% = 54%
P(H|E) = [P(E|H) × P(H)] / P(E)
P(H|E) = [75% × 65%] / 54%
P(H|E) = 48.75% / 54% = 90.3%
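A minimal sketch that reproduces both forms of the update, useful as a template when the inputs are revised.
```python
# Sketch: the update above in both odds form and probability form.
prior = 0.65
p_e_given_h = 0.75      # P(E|H)
p_e_given_not_h = 0.15  # P(E|¬H)

# Odds form
prior_odds = prior / (1 - prior)                  # ≈ 1.86
likelihood_ratio = p_e_given_h / p_e_given_not_h  # 5.0
posterior_odds = prior_odds * likelihood_ratio    # ≈ 9.3
posterior_odds_form = posterior_odds / (1 + posterior_odds)

# Probability form (explicit P(E) in the denominator)
p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)  # 0.54
posterior_prob_form = p_e_given_h * prior / p_e

print(f"odds form:        {posterior_odds_form:.1%}")  # 90.3%
print(f"probability form: {posterior_prob_form:.1%}")  # 90.3%
```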
Posterior Probability
P(H|E) = 90%
Change in Belief
- Prior: 65%
- Posterior: 90%
- Change: +25 percentage points
- Interpretation: Evidence strongly supports hypothesis. Beta test results meaningfully increased confidence that production adoption will exceed 20%.
Sensitivity Analysis
How sensitive is posterior to inputs?
If Prior was different:
| Prior | Posterior | Note |
|---|---|---|
| 50% | 83% | Even starting at coin-flip, evidence pushes to high confidence |
| 75% | 94% | Higher prior → very high posterior |
| 40% | 77% | Lower prior → still high confidence |
Finding: Posterior is somewhat robust. Evidence is strong enough that even with priors ranging from 40-75%, posterior stays in 77-94% range.
If P(E|H) was different:
| P(E|H) | LR | Posterior | Note |
|---|---|---|---|
| 60% | 4.0 | 88% | Less diagnostic evidence → still high confidence |
| 85% | 5.67 | 91% | More diagnostic evidence → very high confidence |
| 50% | 3.33 | 86% | Weaker evidence → still high confidence |
Finding: Posterior is moderately sensitive to P(E|H), but stays above 85% across the plausible range.
If P(E|¬H) was different:
| P(E|¬H) | LR | Posterior | Note |
|---|---|---|---|
| 25% | 3.0 | 85% | Less diagnostic → still high confidence |
| 10% | 7.5 | 93% | More diagnostic → very high confidence |
| 30% | 2.5 | 82% | Weaker evidence → moderate-high confidence |
Finding: Posterior is sensitive to P(E|¬H). If beta-to-production drop is common (higher P(E|¬H)), confidence decreases meaningfully.
Robustness: Conclusion is moderately robust. Across reasonable input ranges, posterior stays above 77%, supporting launch decision. Most sensitive to assumption about beta-to-production conversion rates.
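The sensitivity tables above can be regenerated (and re-checked after any input change) with a short sketch like this:
```python
# Sketch: regenerate the three sensitivity tables by varying one input at a time.
def posterior(prior: float, p_e_h: float, p_e_not_h: float) -> float:
    """Posterior P(H|E) via the odds form."""
    odds = (prior / (1 - prior)) * (p_e_h / p_e_not_h)
    return odds / (1 + odds)

BASE_PRIOR, BASE_PEH, BASE_PENH = 0.65, 0.75, 0.15

for p in (0.40, 0.50, 0.65, 0.75):
    print(f"prior={p:.0%}    -> posterior={posterior(p, BASE_PEH, BASE_PENH):.0%}")
for peh in (0.50, 0.60, 0.75, 0.85):
    print(f"P(E|H)={peh:.0%}  -> posterior={posterior(BASE_PRIOR, peh, BASE_PENH):.0%}")
for penh in (0.10, 0.15, 0.25, 0.30):
    print(f"P(E|¬H)={penh:.0%} -> posterior={posterior(BASE_PRIOR, BASE_PEH, penh):.0%}")
```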
Calibration Check
Am I overconfident?
- Did I anchor on initial belief?
- No - prior (65%) was based on base rates, not arbitrary
- Evidence substantially moved belief (+25pp)
- Not stuck at starting point
- Did I ignore base rates?
- No - explicitly used historical feature adoption (60%) as starting point
- Adjusted for known differences systematically
- Is my posterior extreme (>90% or <10%)?
- Yes - 90% is borderline extreme
- Check: Is evidence truly that strong?
- LR = 5.0 is moderately strong (not very strong)
- Prior was already high (65%)
- Combination pushes to 90%
- Concern: May be slightly overconfident
- Adjustment: Consider reporting as 85-90% range rather than point estimate
- Would an outside observer agree with my likelihoods?
- P(E|H) = 75%: Reasonable - beta users are engaged, expect higher than production
- P(E|¬H) = 15%: Potentially optimistic - beta selection bias could be stronger
- Alternative: If P(E|¬H) = 25%, posterior drops to 85% (more conservative)
Red flags:
- ✓ Posterior is not 100% or 0%
- ✓ Update magnitude (25pp) matches evidence strength (LR=5.0)
- ✓ Prior uses base rates
- ⚠ Posterior is at upper end (90%) - consider uncertainty range
Calibration adjustment: Report as 85-90% confidence range to account for uncertainty in likelihoods.
Limitations & Assumptions
Key assumptions:
- Beta users are representative of the broader user base
- Assumption: Beta users are 1.5-2x more engaged than average
- Risk: If beta users are much more engaged (3x), production adoption could be lower
- Impact: Could invalidate high posterior
- No major bugs or UX issues in production
- Assumption: Production experience will match beta experience
- Risk: Unforeseen technical issues could crater adoption
- Impact: Would make evidence misleading
- Competitive landscape stays stable
- Assumption: No major competitor moves in next 3 months
- Risk: Competitor could launch superior version
- Impact: Could reduce adoption below 20% despite strong beta
- Beta sample size is sufficient (n=200; see the sketch after this list)
- Assumption: 200 users is enough to estimate adoption
- Confidence interval: 35% ± 6.6% at 95% CI
- Impact: True beta adoption could be 28-42%, adding uncertainty
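A quick sketch of the confidence interval behind assumption 4, using the normal approximation to the binomial (a Wilson interval gives a very similar range at n=200).
```python
# Sketch: normal-approximation 95% CI for beta adoption (70 of 200 users).
import math

users, adopters = 200, 70
p_hat = adopters / users                                 # 0.35
standard_error = math.sqrt(p_hat * (1 - p_hat) / users)  # ≈ 0.034
margin = 1.96 * standard_error                           # ≈ 0.066

print(f"beta adoption: {p_hat:.0%} ± {margin:.1%} "
      f"(95% CI: {p_hat - margin:.1%} to {p_hat + margin:.1%})")  # ≈ 28.4% to 41.6%
```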
What could invalidate this analysis:
- Major product changes: If we significantly alter the feature post-beta, beta results become less predictive
- Different user segment: If we launch to a different user segment than beta testers, adoption patterns may differ
- Seasonal effects: If beta ran during high-engagement season and launch is during low season
- Discovery/onboarding issues: If users don't discover the feature in production (beta users were explicitly invited)
Uncertainty:
- Most uncertain about: P(E|¬H) = 15% - How often do features with ≤20% production adoption show 35% beta adoption?
- This is the key assumption
- If this is actually 25-30%, posterior drops to 82-85%
- Recommendation: Review historical beta-to-production conversion data
- Could be wrong if:
- Beta users are much more engaged than typical users (>2x multiplier)
- Novelty effect in beta wears off quickly in production
- Production launch has poor discoverability/onboarding
Decision Implications
Given posterior of 90% (range: 85-90%):
Recommended action: Proceed with launch, with the monitoring plan below in place
Rationale:
- 90% confidence exceeds decision threshold for feature launches
- Even conservative estimate (85%) supports launch
- Risk of failure (<20% adoption) is only 10-15%
- Cost of being wrong: Wasted 3 months of development effort
- Cost of not launching: Missing potential high-adoption feature
If decision threshold is:
- High confidence needed (>80%): ✅ LAUNCH - Exceeds threshold, proceed with production rollout
- Medium confidence (>60%): ✅ LAUNCH - Well above threshold, strong conviction
- Low bar (>40%): ✅ LAUNCH - Far exceeds minimum threshold
Monitoring plan (to validate forecast):
- Week 1: Check if adoption is on track for >6% (the 6%/10%/20% milestones assume a front-loaded adoption curve rather than linear growth toward the 20% month-3 target)
- If <4%: Red flag, investigate onboarding/discovery issues
- If >8%: Exceeding expectations, validate data quality
- Month 1: Check if adoption is trending toward >10%
- If <7%: Update forecast downward, consider intervention
- If >13%: Exceeding expectations, high confidence
- Month 3: Measure final adoption
- If <20%: Analyze what went wrong, calibrate future forecasts
- If >20%: Validate forecast accuracy, update priors for future features
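To make the check-ins mechanical, the milestone thresholds above can be encoded in a small helper; the checkpoint names and messages below are illustrative, not an existing tool.
```python
# Sketch: encode the monitoring thresholds above so weekly/monthly check-ins are mechanical.
# Checkpoint names and messages are illustrative.
MILESTONES = {
    # checkpoint: (red_flag_below, on_track_at_least, exceeding_above)
    "week_1":  (0.04, 0.06, 0.08),
    "month_1": (0.07, 0.10, 0.13),
}

def assess(checkpoint: str, observed_adoption: float) -> str:
    red_flag, target, exceeding = MILESTONES[checkpoint]
    if observed_adoption < red_flag:
        return "red flag: investigate onboarding/discovery"
    if observed_adoption > exceeding:
        return "exceeding expectations: validate data quality"
    if observed_adoption >= target:
        return "on track"
    return "below target: update forecast downward"

print(assess("week_1", 0.05))   # below target: update forecast downward
print(assess("month_1", 0.12))  # on track
```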
Next evidence to gather:
- Historical beta-to-production conversion rates: Review last 5-10 feature launches to calibrate P(E|¬H) more accurately
- User segment analysis: Compare beta user demographics to production user base
- Competitive feature adoption: Check competitors' sharing feature adoption rates
- Early production data: After 1 week of production, use actual adoption data for next Bayesian update
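Once week-1 production data arrives, the same machinery runs again with today's posterior as the new prior. The week-1 likelihoods below are placeholders to be calibrated against historical launches, not estimates made in this analysis; with these particular placeholders the revised posterior lands near the ~60% figure cited in the next section.
```python
# Sketch: sequential update once week-1 production data is in. The week-1
# likelihoods are PLACEHOLDERS to be calibrated against past launches.
def update(prior: float, p_e_h: float, p_e_not_h: float) -> float:
    odds = (prior / (1 - prior)) * (p_e_h / p_e_not_h)
    return odds / (1 + odds)

current_posterior = 0.90  # today's posterior becomes the next prior

# Hypothetical observation: week-1 adoption comes in below 3%
p_low_wk1_given_h = 0.10      # placeholder: P(such a weak week 1 | adoption will exceed 20%)
p_low_wk1_given_not_h = 0.55  # placeholder: P(such a weak week 1 | adoption will be ≤20%)

revised = update(current_posterior, p_low_wk1_given_h, p_low_wk1_given_not_h)
print(f"revised P(adoption > 20%) ≈ {revised:.0%}")  # ≈ 62% with these placeholders
```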
What would change our mind:
- Week 1 adoption <3%: Would update posterior down to ~60%, trigger investigation
- Competitor launches superior feature: Would need to recalculate with new competitive landscape
- Discovery of major beta sampling bias: If beta users are 5x more engaged, would significantly reduce confidence
Meta: Forecast Quality Assessment
Using rubric from rubric_bayesian_reasoning_calibration.json:
Self-assessment:
- Prior Quality: 4/5 (good base rate usage, clear adjustments)
- Likelihood Justification: 4/5 (clear reasoning, could use more empirical data)
- Evidence Diagnosticity: 4/5 (LR=5.0 is moderately strong)
- Calculation Correctness: 5/5 (verified with both odds and probability forms)
- Calibration & Realism: 3/5 (posterior is 90%, borderline extreme, flagged for review)
- Assumption Transparency: 4/5 (key assumptions stated clearly)
- Base Rate Usage: 5/5 (explicit base rate from historical data)
- Sensitivity Analysis: 4/5 (comprehensive sensitivity checks)
- Interpretation Quality: 4/5 (clear decision implications with thresholds)
- Avoidance of Common Errors: 4/5 (no prosecutor's fallacy, proper base rates)
Average: 4.1/5 - Meets "very good" threshold for medium-stakes decision
Decision: Forecast is sufficiently rigorous for feature launch decision (medium stakes). Primary area for improvement: gather more data on beta-to-production conversion to refine P(E|¬H) estimate.