
These are systematic approaches to narrowing down bugs. Each technique is a tool in your debugging toolkit. The skill is knowing which tool to use when.

**Binary Search (Divide and Conquer)**

**When to use**: Large codebase, long execution path, or many possible failure points.

How it works: Cut the problem space in half repeatedly until you isolate the issue.

In practice:

  1. Identify the boundaries: Where does it work? Where does it fail?
  2. Find the midpoint: Add logging/testing at the middle of the execution path
  3. Determine which half: Does the bug occur before or after the midpoint?
  4. Repeat: Cut that half in half, test again
  5. Converge: Keep halving until you find the exact line

Problem: API request returns wrong data
  1. Test: Does the data leave the database correctly? YES
  2. Test: Does the data reach the frontend correctly? NO
  3. Test: Does the data leave the API route correctly? YES
  4. Test: Does the data survive serialization? NO
  5. Found it: Bug is in the serialization layer

You just eliminated 90% of the code in 4 tests.
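
As a minimal sketch, here is what checkpoint logging on a hypothetical four-stage pipeline looks like (all function names are made up for illustration):

```javascript
// Hypothetical pipeline: db → DTO → serialization → response.
// Log at the midpoint first, then halve toward the failing stage.
const fetchFromDb = () => [{ id: 1, name: 'Ada' }];       // stage 1
const toDto = (rows) => rows.map((r) => ({ ...r }));      // stage 2
const serialize = (dto) => JSON.stringify({ data: dto }); // stage 3
const respond = (body) => console.log('response:', body); // stage 4

function handleRequest() {
  const rows = fetchFromDb();
  const dto = toDto(rows);
  console.log('[midpoint] dto:', dto); // correct here? then the bug is downstream
  const body = serialize(dto);
  respond(body);
}

handleRequest();
```

If the midpoint log looks right, you only instrument stages 3–4 next; if it looks wrong, only stages 1–2. Each check halves the remaining search space.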

**Variant**: Commenting out code to find the breaking change.
  1. Comment out the second half of a function
  2. Does it work now? The bug is in the commented section
  3. Uncomment half of that, repeat
  4. Converge on the problematic lines

Warning: Only works for code you can safely comment out. Don't use for initialization code.

**Rubber Duck Debugging**

**When to use**: You're stuck, confused, or your mental model doesn't match reality.

How it works: Explain the problem out loud (to a rubber duck, a colleague, or in writing) in complete detail.

Why it works: Articulating forces you to:

  • Make assumptions explicit
  • Notice gaps in your understanding
  • Hear how convoluted your explanation sounds
  • Realize what you haven't actually verified

In practice:

Write or say out loud:

  1. "The system should do X"
  2. "Instead it does Y"
  3. "I think this is because Z"
  4. "The code path is: A → B → C → D"
  5. "I've verified that..." (List what you've actually tested)
  6. "I'm assuming that..." (List assumptions)

Often you'll spot the bug mid-explanation: "Wait, I never actually verified that B returns what I think it does."

"So when the user clicks the button, it calls handleClick, which dispatches an action, which... wait, does the reducer actually handle this action type? Let me check... Oh. The reducer is looking for 'UPDATE_USER' but I'm dispatching 'USER_UPDATE'." **When to use**: Complex system, many moving parts, unclear which part is failing.

How it works: Strip away everything until you have the smallest possible code that reproduces the bug.

Why it works:

  • Removes distractions
  • Isolates the actual issue
  • Often reveals the bug during the stripping process
  • Makes it easier to reason about

Process:

  1. Copy the failing code to a new file
  2. Remove one piece (a dependency, a function, a feature)
  3. Test: Does it still reproduce?
    • YES: Keep it removed, continue
    • NO: Put it back, it's needed
  4. Repeat until you have the bare minimum
  5. The bug is now obvious in the stripped-down code

Start with: a 500-line React component with 15 props, 8 hooks, 3 contexts

End with:

```javascript
import { useState, useEffect } from 'react';

function MinimalRepro() {
  const [count, setCount] = useState(0);

  useEffect(() => {
    setCount(count + 1); // Bug: infinite loop, missing dependency array
  });

  return <div>{count}</div>;
}
```

The bug was hidden in complexity. Minimal reproduction made it obvious.
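
For contrast, a minimal sketch of the fixed version: an empty dependency array makes the effect run once, and the functional updater avoids reading stale state.

```javascript
import { useState, useEffect } from 'react';

function MinimalRepro() {
  const [count, setCount] = useState(0);

  useEffect(() => {
    setCount((c) => c + 1); // functional updater; no stale `count` read
  }, []); // empty dependency array: effect fires only on mount

  return <div>{count}</div>;
}
```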

**Working Backwards**

**When to use**: You know what the correct output should be, but don't know why you're not getting it.

How it works: Start from the desired end state and trace backwards through the execution path.

Process:

  1. Define the desired output precisely
  2. Ask: What function produces this output?
  3. Test that function: Give it the input it should receive. Does it produce correct output?
    • YES: The bug is earlier (wrong input to this function)
    • NO: The bug is here
  4. Repeat backwards through the call stack
  5. Find the divergence point: Where does expected vs actual first differ?

Problem: UI shows "User not found" when user exists

Trace backwards:

  1. UI displays: user.error → Is this the right value to display? YES
  2. Component receives: user.error = "User not found" → Is this correct? NO, should be null
  3. API returns: { error: "User not found" } → Why?
  4. Database query: SELECT * FROM users WHERE id = 'undefined' → AH!
  5. Found it: The user ID is 'undefined' (string) instead of a number

Working backwards revealed the bug was in how the ID was passed to the query.
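
A minimal sketch of step 3, testing a suspect function in isolation with the input it *should* receive (`findUser` is hypothetical):

```javascript
// Stand-in for the real lookup: returns the user for a numeric id.
const findUser = (id) => (id === 1 ? { id: 1, name: 'Ada' } : null);

// Correct input → correct output, so this function is not the bug;
// keep tracing backwards to whatever produced the id.
console.log(findUser(1));           // { id: 1, name: 'Ada' }

// The observed bad input reproduces the failure exactly:
console.log(findUser('undefined')); // null → "User not found"
```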

**Differential Debugging**

**When to use**: Something used to work and now doesn't. A feature works in one environment but not another.

How it works: Compare the working vs broken states to find what's different.

Questions to ask:

Time-based (it worked, now it doesn't):

  • What changed in the code since it worked?
  • What changed in the environment? (Node version, OS, dependencies)
  • What changed in the data? (Database schema, API responses)
  • What changed in the configuration?

Environment-based (works in dev, fails in prod):

  • What's different between environments?
  • Configuration values
  • Environment variables
  • Network conditions
  • Data volume
  • Third-party service behavior

Process:

  1. Make a list of differences between working and broken
  2. Test each difference in isolation
  3. Find the difference that causes the failure
  4. That difference reveals the root cause

Works locally, fails in CI:

Differences:

  • Node version: Same ✓
  • Environment variables: Same ✓
  • Timezone: Different! ✗

Test: Set local timezone to UTC (like CI).
Result: Now fails locally too.

Found it: Date comparison logic assumes local timezone
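
A sketch of the kind of date logic that fails this way (`isToday` is hypothetical):

```javascript
// Compares local-time calendar fields, so the answer depends on the
// machine's timezone — the hidden difference between local and CI.
function isToday(isoString) {
  const d = new Date(isoString);
  const now = new Date();
  return d.getDate() === now.getDate() &&
         d.getMonth() === now.getMonth() &&
         d.getFullYear() === now.getFullYear();
}

// Run as `TZ=UTC node repro.js` vs `TZ=America/New_York node repro.js`
// (on Unix-like systems): a timestamp near midnight UTC flips between
// "today" and "yesterday".
console.log(isToday('2025-11-28T23:30:00Z'));
```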

**Observability First**

**When to use**: Always. Before making any fix.

Why it matters: You can't fix what you can't see. Add visibility before changing behavior.

Approaches:

1. Strategic logging

```javascript
// Not this (useless):
console.log('in function');

// This (useful):
console.log('[handleSubmit] Input:', { email, password: '***' });
console.log('[handleSubmit] Validation result:', validationResult);
console.log('[handleSubmit] API response:', response);
```

2. Assertion checks

```javascript
function processUser(user) {
  console.assert(user !== null, 'User is null!');
  console.assert(user.id !== undefined, 'User ID is undefined!');
  // ... rest of function
}
```

3. Timing measurements

```javascript
console.time('Database query');
const result = await db.query(sql);
console.timeEnd('Database query');
```

4. Stack traces at key points

```javascript
console.log('[updateUser] Called from:', new Error().stack);
```

The workflow:

  1. Add logging/instrumentation at suspected points
  2. Run the code
  3. Observe the output
  4. Form hypothesis based on what you see
  5. Only then make changes

Don't code in the dark. Light up the execution path first.

**Comment Out Everything**

**When to use**: Many possible interactions, unclear which code is causing the issue.

How it works:

  1. Comment out everything in a function/file
  2. Verify the bug is gone
  3. Uncomment one piece at a time
  4. After each uncomment, test
  5. When the bug returns, you found the culprit

**Variant**: For config files, reset to defaults and add back one setting at a time.


Problem: Some middleware breaks requests, but you have 8 middleware functions.

```javascript
app.use(helmet());                           // Uncomment, test → works
app.use(cors());                             // Uncomment, test → works
app.use(compression());                      // Uncomment, test → works
app.use(bodyParser.json({ limit: '50mb' })); // Uncomment, test → BREAKS

// Found it: Body size limit too high causes memory issues
```
**Git Bisect**

**When to use**: Feature worked in the past, broke at some unknown commit.

How it works: Binary search through git history to find the breaking commit.

Process:

```bash
git bisect start
git bisect bad          # mark the current (broken) commit
git bisect good abc123  # mark the last known-good commit

# git checks out the midpoint commit; test it, then mark it:
git bisect bad          # still broken → bug is in the older half
git bisect good         # works → bug is in the newer half
# Repeat until git names the first bad commit.
```

Why it's powerful: Turns "it broke sometime in the last 100 commits" into "it broke in commit abc123" in ~7 tests (log₂ 100 ≈ 7).

100 commits between working and broken:

  • Manual testing: up to 100 commits to check
  • Git bisect: ~7 commits to check

Time saved: Massive
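
If you have a test script that exits non-zero on failure, git can run the whole search unattended (the script name here is hypothetical):

```bash
git bisect start
git bisect bad HEAD
git bisect good abc123
git bisect run ./test.sh  # git re-runs the script at each midpoint
git bisect reset          # return to your original branch when done
```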

<decision_tree>
Large codebase, many files:
→ Binary search / Divide and conquer

Confused about what's happening:
→ Rubber duck debugging
→ Observability first (add logging)

Complex system with many interactions:
→ Minimal reproduction

Know the desired output:
→ Working backwards

Used to work, now doesn't:
→ Differential debugging
→ Git bisect

Many possible causes:
→ Comment out everything
→ Binary search

Always:
→ Observability first before making changes
</decision_tree>

<combining_techniques>
Often you'll use multiple techniques together:

  1. Differential debugging to identify what changed
  2. Binary search to narrow down where in the code
  3. Observability first to add logging at that point
  4. Rubber duck to articulate what you're seeing
  5. Minimal reproduction to isolate just that behavior
  6. Working backwards to find the root cause

Techniques compose. Use as many as needed.
</combining_techniques>