
Debugging is applied scientific method. You observe a phenomenon (the bug), form hypotheses about its cause, design experiments to test those hypotheses, and revise based on evidence. This isn't metaphorical - it's literal experimental science. A good hypothesis can be proven wrong. If you can't design an experiment that could disprove it, it's not a useful hypothesis.

Bad hypotheses (unfalsifiable):

  • "Something is wrong with the state"
  • "The timing is off"
  • "There's a race condition somewhere"
  • "The library is buggy"

Good hypotheses (falsifiable):

  • "The user state is being reset because the component remounts when the route changes"
  • "The API call completes after the component unmounts, causing the state update on unmounted component warning"
  • "Two async operations are modifying the same array without locking, causing data loss"
  • "The library's caching mechanism is returning stale data because our cache key doesn't include the timestamp"

The difference: Specificity. Good hypotheses make specific, testable claims.

<how_to_form> Process for forming hypotheses:

  1. Observe the behavior precisely

    • Not "it's broken"
    • But "the counter shows 3 when clicking once, should show 1"
  2. Ask "What could cause this?"

    • List every possible cause you can think of
    • Don't judge them yet, just brainstorm
  3. Make each hypothesis specific

    • Not "state is wrong"
    • But "state is being updated twice because handleClick is called twice"
  4. Identify what evidence would support/refute each

    • If hypothesis X is true, I should see Y
    • If hypothesis X is false, I should see Z
**Observation**: Button click sometimes saves data, sometimes doesn't.

Vague hypothesis: "The save isn't working reliably" Unfalsifiable, not specific

Specific hypotheses:

  1. "The save API call is timing out when network is slow"

    • Testable: Check network tab for timeout errors
    • Falsifiable: If all requests complete successfully, this is wrong
  2. "The save button is being double-clicked, and the second request overwrites with stale data"

    • Testable: Add logging to count clicks (see the sketch below)
    • Falsifiable: If only one click is registered, this is wrong
  3. "The save is successful but the UI doesn't update because the response is being ignored"

    • Testable: Check if API returns success
    • Falsifiable: If UI updates on successful response, this is wrong </how_to_form>
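For instance, hypothesis 2 could be instrumented with a click counter on the save handler. This is a minimal sketch only: the `handleSave` name and the `/api/save` endpoint are hypothetical, not taken from the original code.

```javascript
// Hypothetical instrumentation for hypothesis 2: count how many times the
// save handler actually fires for a single user interaction.
let saveClickCount = 0;

async function handleSave(formData) {
  saveClickCount += 1;
  console.log(`[save] handler invoked ${saveClickCount} time(s)`);

  // '/api/save' is a placeholder endpoint for illustration.
  const response = await fetch('/api/save', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(formData),
  });
  console.log('[save] response status:', response.status);
  return response;
}
```

If one click produces two "handler invoked" lines, hypothesis 2 survives; if it produces exactly one, the evidence refutes it.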

<experimental_design> An experiment is a test that produces evidence supporting or refuting a hypothesis.

Good experiments:

  • Test one hypothesis at a time
  • Have clear success/failure criteria
  • Produce unambiguous results
  • Are repeatable

Bad experiments:

  • Test multiple things at once
  • Have unclear outcomes ("maybe it works better?")
  • Rely on subjective judgment
  • Can't be reproduced
For each hypothesis, design an experiment:

1. Prediction: If hypothesis H is true, then I will observe X
2. Test setup: What do I need to do to test this?
3. Measurement: What exactly am I measuring?
4. Success criteria: What result confirms H? What result refutes H?
5. Run the experiment: Execute the test
6. Observe the result: Record what actually happened
7. Conclude: Does this support or refute H?

**Hypothesis**: "The component is re-rendering excessively because the parent is passing a new object reference on every render"

1. Prediction: If true, the component will re-render even when the object's values haven't changed

2. Test setup (see the sketch below):

  • Add console.log in component body to count renders
  • Add console.log in parent to track when object is created
  • Add useEffect with the object as dependency to log when it changes

3. Measurement: Count of renders and object creations

4. Success criteria:

  • Confirms H: Component re-renders match parent renders, object reference changes each time
  • Refutes H: Component only re-renders when object values actually change

5. Run: Execute the code with logging

6. Observe:

```
[Parent] Created user object
[Child] Rendering (1)
[Parent] Created user object
[Child] Rendering (2)
[Parent] Created user object
[Child] Rendering (3)
```

7. Conclude: CONFIRMED. New object every parent render → child re-renders </experimental_design>
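The logging setup from step 2 might look roughly like the sketch below. It is illustrative only: the `Parent`/`Child` component names and the `user` object are assumptions, not taken from the original code.

```javascript
import { useEffect, useRef, useState } from 'react';

function Parent() {
  const [count, setCount] = useState(0);

  // A new object reference is created on every render (the suspected cause).
  const user = { name: 'Ada', role: 'admin' };
  console.log('[Parent] Created user object');

  return (
    <div>
      <button onClick={() => setCount(count + 1)}>Re-render parent</button>
      <Child user={user} />
    </div>
  );
}

function Child({ user }) {
  // Count how many times the child renders.
  const renders = useRef(0);
  renders.current += 1;
  console.log(`[Child] Rendering (${renders.current})`);

  // Log whenever the user prop's reference changes.
  useEffect(() => {
    console.log('[Child] user prop reference changed');
  }, [user]);

  return <p>{user.name}</p>;
}
```

If the child logs a render and a reference-change message on every parent render, the hypothesis is confirmed; memoizing the object (for example with useMemo) would then be a fix grounded in the mechanism rather than a guess.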

<evidence_quality> Not all evidence is equal. Learn to distinguish strong from weak evidence.

Strong evidence:

  • Directly observable ("I can see in the logs that X happens")
  • Repeatable ("This fails every time I do Y")
  • Unambiguous ("The value is definitely null, not undefined")
  • Independent ("This happens even in a fresh browser with no cache")

Weak evidence:

  • Hearsay ("I think I saw this fail once")
  • Non-repeatable ("It failed that one time but I can't reproduce it")
  • Ambiguous ("Something seems off")
  • Confounded ("It works after I restarted the server and cleared the cache and updated the package")
**Strong**:

```javascript
console.log('User ID:', userId); // Output: User ID: undefined
console.log('Type:', typeof userId); // Output: Type: undefined
```

Direct observation, unambiguous

Weak: "I think the user ID might not be set correctly sometimes" Vague, not verified, uncertain

Strong:

```javascript
// Run the operation many times and collect the iterations that fail.
const failures = [];
for (let i = 0; i < 100; i++) {
  const result = processData(testData);
  if (result !== expected) {
    failures.push(i);
  }
}
console.log('Failed on iterations:', failures.join(', '));
// Output: Failed on iterations: 3, 7, 12, 23, 31...
```

Repeatable, shows pattern

Weak: "It usually works, but sometimes fails" Not quantified, no pattern identified </evidence_quality>

<decision_point> Don't act too early (premature fix) or too late (analysis paralysis).

Act when you can answer YES to all:

  1. Do you understand the mechanism?

    • Not just "what fails" but "why it fails"
    • Can you explain the chain of events that produces the bug?
  2. Can you reproduce it reliably?

    • Either always reproduces, or you understand the conditions that trigger it
    • If you can't reproduce, you don't understand it yet
  3. Do you have evidence, not just theory?

    • You've observed the behavior directly
    • You've logged the values, traced the execution
    • You're not guessing
  4. Have you ruled out alternatives?

    • You've considered other hypotheses
    • Evidence contradicts the alternatives
    • This is the most likely cause, not just the first idea

Don't act if:

  • "I think it might be X" - Too uncertain
  • "This could be the issue" - Not confident enough
  • "Let me try changing Y and see" - Random changes, not hypothesis-driven
  • "I'll fix it and if it works, great" - Outcome-based, not understanding-based
**Too early** (don't act):

  • Hypothesis: "Maybe the API is slow"
  • Evidence: None, just a guess
  • Action: Add caching
  • Result: Bug persists, now you have caching to debug too

Right time (act):

  • Hypothesis: "API response is missing the 'status' field when user is inactive, causing the app to crash"
  • Evidence:
    • Logged API response for active user: has 'status' field
    • Logged API response for inactive user: missing 'status' field
    • Logged app behavior: crashes on accessing undefined status
  • Action: Add defensive check for missing status field
  • Result: Bug fixed because you understood the cause </decision_point>
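The defensive check from the "right time" example could be as small as the sketch below. The `normalizeUser` helper and the `'inactive'` fallback value are assumptions for illustration, not part of the original code.

```javascript
// Guard against an API response that omits 'status' for inactive users.
// normalizeUser is a hypothetical helper; the 'inactive' fallback is an assumption.
function normalizeUser(apiUser) {
  return {
    ...apiUser,
    status: apiUser.status ?? 'inactive',
  };
}
```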
You will be wrong sometimes. This is normal. The skill is recovering gracefully.

When your hypothesis is disproven:

  1. Acknowledge it explicitly

    • "This hypothesis was wrong because [evidence]"
    • Don't gloss over it or rationalize
    • Intellectual honesty with yourself
  2. Extract the learning

    • What did this experiment teach you?
    • What did you rule out?
    • What new information do you have?
  3. Revise your understanding

    • Update your mental model
    • What does the evidence actually suggest?
  4. Form new hypotheses

    • Based on what you now know
    • Avoid just moving to "second-guess" - use the evidence
  5. Don't get attached to hypotheses

    • You're not your ideas
    • Being wrong quickly is better than being wrong slowly
**Initial hypothesis**: "The memory leak is caused by event listeners not being cleaned up"

Experiment: Check Chrome DevTools for listener counts

Result: Listener count stays stable, doesn't grow over time

Recovery:

  1. "Event listeners are NOT the cause. The count doesn't increase."
  2. "I've ruled out event listeners as the culprit"
  3. "But the memory profile shows objects accumulating. What objects? Let me check the heap snapshot..."
  4. "New hypothesis: Large arrays are being cached and never released. Let me test by checking the heap for array sizes..."

This is good debugging. Wrong hypothesis, quick recovery, better understanding.
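If the suspected cache lives in application code, the new hypothesis can also be probed directly. A sketch, assuming a hypothetical in-memory `cache` Map that holds arrays (not from the original code):

```javascript
// Hypothetical in-memory cache suspected of holding large arrays forever.
const cache = new Map();

// Every 10 seconds, log how many entries the cache holds and a rough item count.
setInterval(() => {
  let totalItems = 0;
  for (const value of cache.values()) {
    totalItems += Array.isArray(value) ? value.length : 1;
  }
  console.log(`[cache] entries: ${cache.size}, total array items: ${totalItems}`);
}, 10_000);
```

If these numbers climb without bound while the app is idle, the hypothesis survives; if they plateau, look elsewhere.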

<multiple_hypotheses> Don't fall in love with your first hypothesis. Generate multiple alternatives.

Strategy: "Strong inference" - Design experiments that differentiate between competing hypotheses.

**Problem**: Form submission fails intermittently

Competing hypotheses:

  1. Network timeout
  2. Validation failure
  3. Race condition with auto-save
  4. Server-side rate limiting

Design experiment that differentiates:

Add logging at each stage:

```javascript
try {
  console.log('[1] Starting validation');
  const validation = await validate(formData);
  console.log('[1] Validation passed:', validation);

  console.log('[2] Starting submission');
  const response = await api.submit(formData);
  console.log('[2] Response received:', response.status);

  console.log('[3] Updating UI');
  updateUI(response);
  console.log('[3] Complete');
} catch (error) {
  console.log('[ERROR] Failed at stage:', error);
}
```

Observe results:

  • Fails at [2] with timeout error → Hypothesis 1
  • Fails at [1] with validation error → Hypothesis 2
  • Succeeds but [3] has wrong data → Hypothesis 3
  • Fails at [2] with 429 status → Hypothesis 4

One experiment, differentiates between four hypotheses. </multiple_hypotheses>

```
1. Observe unexpected behavior
   ↓
2. Form specific hypotheses (plural)
   ↓
3. For each hypothesis: What would prove/disprove?
   ↓
4. Design experiment to test
   ↓
5. Run experiment
   ↓
6. Observe results
   ↓
7. Evaluate: Confirmed, refuted, or inconclusive?
   ↓
8a. If CONFIRMED → Design fix based on understanding
8b. If REFUTED → Return to step 2 with new hypotheses
8c. If INCONCLUSIVE → Redesign experiment or gather more data
```

Key insight: This is a loop, not a line. You'll cycle through multiple times. That's expected.

Pitfall: Testing multiple hypotheses at once

  • You change three things and it works
  • Which one fixed it? You don't know
  • Solution: Test one hypothesis at a time

Pitfall: Confirmation bias in experiments

  • You only look for evidence that confirms your hypothesis
  • You ignore evidence that contradicts it
  • Solution: Actively seek disconfirming evidence

Pitfall: Acting on weak evidence

  • "It seems like maybe this could be..."
  • Solution: Wait for strong, unambiguous evidence

Pitfall: Not documenting results

  • You forget what you tested
  • You repeat the same experiments
  • Solution: Write down each hypothesis and its result
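One lightweight way to do this is a running log kept next to the bug report. The structure below is just one possible convention, not a prescribed format:

```javascript
// A running record of hypotheses, experiments, and outcomes for one bug.
const debugLog = [
  {
    hypothesis: 'Event listeners are not cleaned up',
    experiment: 'Watch listener count in DevTools for 10 minutes',
    result: 'REFUTED: count stayed flat',
  },
  {
    hypothesis: 'Large arrays are cached and never released',
    experiment: 'Compare two heap snapshots taken 5 minutes apart',
    result: 'PENDING',
  },
];
```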

Pitfall: Giving up on the scientific method

  • Under pressure, you start making random changes
  • "Let me just try this..."
  • Solution: Double down on rigor when pressure increases
**Great debuggers**:

  • Form multiple competing hypotheses
  • Design clever experiments that differentiate between them
  • Follow the evidence wherever it leads
  • Revise their beliefs when proven wrong
  • Act only when they have strong evidence
  • Understand the mechanism, not just the symptom

This is the difference between guessing and debugging.