<overview>
Debugging is applied scientific method. You observe a phenomenon (the bug), form hypotheses about its cause, design experiments to test those hypotheses, and revise based on evidence. This isn't metaphorical - it's literal experimental science.
</overview>


<principle name="falsifiability">
A good hypothesis can be proven wrong. If you can't design an experiment that could disprove it, it's not a useful hypothesis.

**Bad hypotheses** (unfalsifiable):
- "Something is wrong with the state"
- "The timing is off"
- "There's a race condition somewhere"
- "The library is buggy"

**Good hypotheses** (falsifiable):
- "The user state is being reset because the component remounts when the route changes"
- "The API call completes after the component unmounts, causing the 'state update on unmounted component' warning"
- "Two async operations are modifying the same array without locking, causing data loss"
- "The library's caching mechanism is returning stale data because our cache key doesn't include the timestamp"

**The difference**: Specificity. Good hypotheses make specific, testable claims.
</principle>
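
A falsifiable hypothesis maps directly onto an observable check. Here is a minimal sketch for the first good hypothesis above (the `UserPanel` component is a hypothetical stand-in, not from this document):

```javascript
// Hypothesis: "user state resets because the component remounts when the route changes."
// Log mount/unmount so a route change either produces a remount or it doesn't.
import { useEffect } from 'react';

function UserPanel({ user }) {
  useEffect(() => {
    console.log('[UserPanel] mounted');
    return () => console.log('[UserPanel] unmounted');
  }, []);

  return <div>{user.name}</div>;
}

export default UserPanel;
```

If navigation logs an unmount/mount pair, the hypothesis survives; if the component stays mounted while the state still resets, it is refuted and the cause lies elsewhere.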

<how_to_form>
**Process for forming hypotheses**:

1. **Observe the behavior precisely**
   - Not "it's broken"
   - But "the counter shows 3 after a single click when it should show 1"

2. **Ask "What could cause this?"**
   - List every possible cause you can think of
   - Don't judge them yet, just brainstorm

3. **Make each hypothesis specific**
   - Not "state is wrong"
   - But "state is being updated twice because handleClick is called twice"

4. **Identify what evidence would support/refute each**
   - If hypothesis X is true, I should see Y
   - If hypothesis X is false, I should see Z

<example>
**Observation**: Button click sometimes saves data, sometimes doesn't.

**Vague hypothesis**: "The save isn't working reliably"
❌ Unfalsifiable, not specific

**Specific hypotheses**:
1. "The save API call is timing out when the network is slow"
   - Testable: Check the network tab for timeout errors
   - Falsifiable: If all requests complete successfully, this is wrong

2. "The save button is being double-clicked, and the second request overwrites with stale data"
   - Testable: Add logging to count clicks
   - Falsifiable: If only one click is registered, this is wrong

3. "The save is successful but the UI doesn't update because the response is being ignored"
   - Testable: Check whether the API returns success
   - Falsifiable: If the UI updates on a successful response, this is wrong
</example>
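
A minimal sketch of the click-counting instrumentation for hypothesis 2 (the `handleSave` name and the `/api/save` endpoint are assumptions for illustration):

```javascript
// Count clicks and overlapping requests to test the double-click hypothesis.
let clickCount = 0;
let inFlight = 0;

async function handleSave(formData) {
  clickCount += 1;
  inFlight += 1;
  console.log(`[save] click #${clickCount}, requests in flight: ${inFlight}`);
  try {
    await fetch('/api/save', {   // hypothetical endpoint
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(formData),
    });
  } finally {
    inFlight -= 1;
  }
}
```

If the log ever shows more than one request in flight, hypothesis 2 gains support; if every save shows exactly one click and one request, it is refuted.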
</how_to_form>


<experimental_design>
An experiment is a test that produces evidence supporting or refuting a hypothesis.

**Good experiments**:
- Test one hypothesis at a time
- Have clear success/failure criteria
- Produce unambiguous results
- Are repeatable

**Bad experiments**:
- Test multiple things at once
- Have unclear outcomes ("maybe it works better?")
- Rely on subjective judgment
- Can't be reproduced

<framework>
For each hypothesis, design an experiment:

**1. Prediction**: If hypothesis H is true, then I will observe X
**2. Test setup**: What do I need to do to test this?
**3. Measurement**: What exactly am I measuring?
**4. Success criteria**: What result confirms H? What result refutes H?
**5. Run the experiment**: Execute the test
**6. Observe the result**: Record what actually happened
**7. Conclude**: Does this support or refute H?
</framework>

<example>
**Hypothesis**: "The component is re-rendering excessively because the parent is passing a new object reference on every render"

**1. Prediction**: If true, the component will re-render even when the object's values haven't changed

**2. Test setup**:
- Add console.log in component body to count renders
- Add console.log in parent to track when object is created
- Add useEffect with the object as dependency to log when it changes

**3. Measurement**: Count of renders and object creations

**4. Success criteria**:
- Confirms H: Component re-renders match parent renders, object reference changes each time
- Refutes H: Component only re-renders when object values actually change

**5. Run**: Execute the code with logging

**6. Observe**:
```
[Parent] Created user object
[Child] Rendering (1)
[Parent] Created user object
[Child] Rendering (2)
[Parent] Created user object
[Child] Rendering (3)
```

**7. Conclude**: CONFIRMED. New object every parent render → child re-renders
</example>
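
A minimal sketch of the logging described in step 2 of the test setup, assuming a hypothetical Parent/Child pair (the component names, the `user` object, and the button are illustrative):

```javascript
import { useEffect, useRef, useState } from 'react';

function Parent() {
  const [tick, setTick] = useState(0);
  const user = { name: 'Ada' };            // new object reference on every render
  console.log('[Parent] Created user object');
  return (
    <div>
      <button onClick={() => setTick(tick + 1)}>re-render parent</button>
      <Child user={user} />
    </div>
  );
}

function Child({ user }) {
  const renders = useRef(0);
  renders.current += 1;
  console.log(`[Child] Rendering (${renders.current})`);
  useEffect(() => {
    console.log('[Child] user reference changed');
  }, [user]);
  return <span>{user.name}</span>;
}
```

The natural follow-up experiment is to memoize the object (for example with `useMemo`) and confirm the child's render count stops climbing, which verifies the mechanism rather than just the correlation.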
</experimental_design>


<evidence_quality>
Not all evidence is equal. Learn to distinguish strong from weak evidence.

**Strong evidence**:
- Directly observable ("I can see in the logs that X happens")
- Repeatable ("This fails every time I do Y")
- Unambiguous ("The value is definitely null, not undefined")
- Independent ("This happens even in a fresh browser with no cache")

**Weak evidence**:
- Hearsay ("I think I saw this fail once")
- Non-repeatable ("It failed that one time but I can't reproduce it")
- Ambiguous ("Something seems off")
- Confounded ("It works after I restarted the server and cleared the cache and updated the package")

<examples>
**Strong**:
```javascript
console.log('User ID:', userId); // Output: User ID: undefined
console.log('Type:', typeof userId); // Output: Type: undefined
```
✅ Direct observation, unambiguous

**Weak**:
"I think the user ID might not be set correctly sometimes"
❌ Vague, not verified, uncertain

**Strong**:
```javascript
// processData, testData, and expected come from the code under test.
for (let i = 0; i < 100; i++) {
  const result = processData(testData);
  if (result !== expected) {
    console.log('Failed on iteration', i);
  }
}
// Output (summarized): fails on iterations 3, 7, 12, 23, 31, ...
```
✅ Repeatable, shows pattern

**Weak**:
"It usually works, but sometimes fails"
❌ Not quantified, no pattern identified
</examples>
</evidence_quality>


<decision_point>
Don't act too early (premature fix) or too late (analysis paralysis).

**Act when you can answer YES to all**:

1. **Do you understand the mechanism?**
   - Not just "what fails" but "why it fails"
   - Can you explain the chain of events that produces the bug?

2. **Can you reproduce it reliably?**
   - It either always reproduces, or you understand the conditions that trigger it
   - If you can't reproduce it, you don't understand it yet

3. **Do you have evidence, not just theory?**
   - You've observed the behavior directly
   - You've logged the values, traced the execution
   - You're not guessing

4. **Have you ruled out alternatives?**
   - You've considered other hypotheses
   - Evidence contradicts the alternatives
   - This is the most likely cause, not just the first idea

**Don't act if**:
- "I think it might be X" - Too uncertain
- "This could be the issue" - Not confident enough
- "Let me try changing Y and see" - Random changes, not hypothesis-driven
- "I'll fix it and if it works, great" - Outcome-based, not understanding-based

<example>
**Too early** (don't act):
- Hypothesis: "Maybe the API is slow"
- Evidence: None, just a guess
- Action: Add caching
- Result: Bug persists, and now you have caching to debug too

**Right time** (act):
- Hypothesis: "API response is missing the 'status' field when the user is inactive, causing the app to crash"
- Evidence:
  - Logged API response for an active user: has the 'status' field
  - Logged API response for an inactive user: missing the 'status' field
  - Logged app behavior: crashes when accessing the undefined status
- Action: Add a defensive check for the missing status field
- Result: Bug fixed because you understood the cause
</example>
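
A minimal sketch of the defensive check described above (the response shape and the `'inactive'` fallback value are assumptions):

```javascript
// Inactive users can come back without a 'status' field; normalize the
// response so downstream code never reads an undefined property.
function normalizeUser(response) {
  return {
    ...response,
    status: response.status ?? 'inactive',   // assumed fallback value
  };
}
```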
</decision_point>


<recovery>
You will be wrong sometimes. This is normal. The skill is recovering gracefully.

**When your hypothesis is disproven**:

1. **Acknowledge it explicitly**
   - "This hypothesis was wrong because [evidence]"
   - Don't gloss over it or rationalize
   - Intellectual honesty with yourself

2. **Extract the learning**
   - What did this experiment teach you?
   - What did you rule out?
   - What new information do you have?

3. **Revise your understanding**
   - Update your mental model
   - What does the evidence actually suggest?

4. **Form new hypotheses**
   - Based on what you now know
   - Don't just jump to your second guess - let the evidence point to the next hypothesis

5. **Don't get attached to hypotheses**
   - You're not your ideas
   - Being wrong quickly is better than being wrong slowly

<example>
**Initial hypothesis**: "The memory leak is caused by event listeners not being cleaned up"

**Experiment**: Check Chrome DevTools for listener counts
**Result**: Listener count stays stable, doesn't grow over time

**Recovery**:
1. ✅ "Event listeners are NOT the cause. The count doesn't increase."
2. ✅ "I've ruled out event listeners as the culprit"
3. ✅ "But the memory profile shows objects accumulating. What objects? Let me check the heap snapshot..."
4. ✅ "New hypothesis: Large arrays are being cached and never released. Let me test by checking the heap for array sizes..."

This is good debugging. Wrong hypothesis, quick recovery, better understanding.
</example>
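
A minimal sketch of how the revised hypothesis in step 4 could be instrumented (the `appCache` Map and the 10-second interval are assumptions, not from this document):

```javascript
// Periodically log how much the cache holds; unbounded growth while the app
// is idle supports the "cached arrays are never released" hypothesis.
setInterval(() => {
  const cachedArrays = [...appCache.values()];   // hypothetical Map of cached arrays
  const totalItems = cachedArrays.reduce((sum, arr) => sum + arr.length, 0);
  console.log(`[cache] entries: ${cachedArrays.length}, total items: ${totalItems}`);
}, 10_000);
```

If both numbers plateau while memory keeps climbing, this hypothesis is refuted too, and the heap snapshot is the next place to look.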
</recovery>


<multiple_hypotheses>
Don't fall in love with your first hypothesis. Generate multiple alternatives.

**Strategy**: "Strong inference" - Design experiments that differentiate between competing hypotheses.

<example>
**Problem**: Form submission fails intermittently

**Competing hypotheses**:
1. Network timeout
2. Validation failure
3. Race condition with auto-save
4. Server-side rate limiting

**Design an experiment that differentiates**:

Add logging at each stage:
```javascript
try {
  console.log('[1] Starting validation');
  const validation = await validate(formData);
  console.log('[1] Validation passed:', validation);

  console.log('[2] Starting submission');
  const response = await api.submit(formData);
  console.log('[2] Response received:', response.status);

  console.log('[3] Updating UI');
  updateUI(response);
  console.log('[3] Complete');
} catch (error) {
  console.log('[ERROR] Failed at stage:', error);
}
```

**Observe results**:
- Fails at [2] with a timeout error → Hypothesis 1
- Fails at [1] with a validation error → Hypothesis 2
- Succeeds but [3] has wrong data → Hypothesis 3
- Fails at [2] with a 429 status → Hypothesis 4

**One experiment differentiates between four hypotheses.**
</example>
</multiple_hypotheses>


<workflow>
```
1. Observe unexpected behavior
   ↓
2. Form specific hypotheses (plural)
   ↓
3. For each hypothesis: What would prove/disprove?
   ↓
4. Design experiment to test
   ↓
5. Run experiment
   ↓
6. Observe results
   ↓
7. Evaluate: Confirmed, refuted, or inconclusive?
   ↓
8a. If CONFIRMED → Design fix based on understanding
8b. If REFUTED → Return to step 2 with new hypotheses
8c. If INCONCLUSIVE → Redesign experiment or gather more data
```

**Key insight**: This is a loop, not a line. You'll cycle through multiple times. That's expected.
</workflow>


<pitfalls>

**Pitfall: Testing multiple hypotheses at once**
- You change three things and it works
- Which one fixed it? You don't know
- Solution: Test one hypothesis at a time

**Pitfall: Confirmation bias in experiments**
- You only look for evidence that confirms your hypothesis
- You ignore evidence that contradicts it
- Solution: Actively seek disconfirming evidence

**Pitfall: Acting on weak evidence**
- "It seems like maybe this could be..."
- Solution: Wait for strong, unambiguous evidence

**Pitfall: Not documenting results**
- You forget what you tested
- You repeat the same experiments
- Solution: Write down each hypothesis and its result (a minimal log sketch appears at the end of this section)

**Pitfall: Giving up on the scientific method**
- Under pressure, you start making random changes
- "Let me just try this..."
- Solution: Double down on rigor when pressure increases
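
As referenced above, a minimal sketch of a hypothesis log kept as plain data (the field names and entries are illustrative, not from this document):

```javascript
// One entry per hypothesis tested; append as you go so nothing gets re-run.
const hypothesisLog = [
  {
    hypothesis: 'Save API call times out on slow networks',
    experiment: 'Watched the network tab with throttling set to Slow 3G',
    result: 'All requests completed; no timeouts',
    verdict: 'refuted',
  },
  {
    hypothesis: 'Double-click sends two overlapping save requests',
    experiment: 'Counted clicks and in-flight requests in the save handler',
    result: 'Two requests in flight on every failure',
    verdict: 'supported',
  },
];
```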
</pitfalls>


<excellence>
**Great debuggers**:
- Form multiple competing hypotheses
- Design clever experiments that differentiate between them
- Follow the evidence wherever it leads
- Revise their beliefs when proven wrong
- Act only when they have strong evidence
- Understand the mechanism, not just the symptom

This is the difference between guessing and debugging.
</excellence>