gh-iciakky-cc-general-skills/skills/debug/references/debugging_checklist.md
2025-11-29 18:47:55 +08:00

Debugging Checklist

This checklist provides detailed action items for each step of the debugging workflow.

Step 1: Observe Without Preconception ✓

Evidence Collection:

  • Review user's bug report or issue description
  • Examine error messages and stack traces
  • Check application logs (stderr, stdout, application-specific logs)
  • Review monitoring dashboards (if available)
  • Inspect recent code changes (git diff, git log)
  • Document current environment (OS, versions, dependencies)
  • Capture configuration (config files, environment variables, CLI arguments)
  • Take a screenshot or recording if the error is visual
  • Note exact steps to reproduce
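
As an illustration of the "document current environment" item, the following is a minimal sketch that gathers OS, interpreter version, CLI arguments, and selected package versions using only the Python standard library. The function name `snapshot_environment` and the choice of fields are assumptions for this example, not part of the checklist; environment variables are deliberately left out because they may contain secrets.

```python
import json
import platform
import sys
from importlib import metadata


def snapshot_environment(packages=()):
    """Collect basic environment facts worth attaching to a bug report."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this interpreter
    return {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        "argv": sys.argv,
        "packages": versions,
    }


if __name__ == "__main__":
    # "pip" is only an example package name; list the ones relevant to the bug.
    print(json.dumps(snapshot_environment(["pip"]), indent=2))
```

Saving this output next to the bug report makes it easier to compare a failing machine against a working one later.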

Documentation:

  • Create investigation log file
  • Record timestamp and initial observations
  • List all data sources consulted
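
One possible way to keep such an investigation log is an append-only text file with one timestamped line per observation. The helper below is a sketch under that assumption; the name `log_observation` and the line format are made up for illustration.

```python
from datetime import datetime, timezone
from pathlib import Path


def log_observation(log_path, source, note):
    """Append one timestamped observation and return the line written."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    line = f"{stamp} [{source}] {note}"
    # Append mode keeps earlier observations intact.
    with Path(log_path).open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")
    return line
```

Recording the data source with every note (for example `log_observation("investigation.log", "stderr", "timeout after 30 s")`) doubles as the list of sources consulted.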

Step 2: Classify and Isolate Facts ✓

Symptom Analysis:

  • List all observable symptoms
  • Distinguish symptoms from potential causes
  • Identify what changed recently (code, config, dependencies, infrastructure)

Scope Narrowing:

  • Test across different environments (dev, staging, production)
  • Test across different platforms (Windows, Linux, macOS)
  • Test across different browsers (if web application)
  • Test with different input data
  • Test with different configurations
  • Identify minimal reproduction case
  • Test against the last known working version to confirm a regression
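
Finding a minimal reproduction case can sometimes be automated when the failing input is a list of items (lines, records, arguments). The sketch below uses a simple greedy reduction, not a full delta-debugging algorithm; `minimize` and the `fails` predicate are hypothetical names, and `fails` is assumed to return True whenever the bug still reproduces.

```python
def minimize(items, fails):
    """Greedily drop items while `fails(items)` stays true.

    Returns a list from which no single item can be removed without
    the failure disappearing (1-minimal, not necessarily the smallest).
    """
    current = list(items)
    if not fails(current):
        raise ValueError("the starting input does not reproduce the failure")
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if fails(candidate):
                # The item at index i is not needed to trigger the bug.
                current = candidate
                changed = True
                break
    return current
```

For example, if a failure needs both 3 and 7 to be present, `minimize(range(10), lambda xs: 3 in xs and 7 in xs)` reduces the ten-item input to `[3, 7]`.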

Component Isolation:

  • List all involved components/modules
  • Mark components known to work correctly
  • Highlight suspicious components
  • Draw dependency diagram if complex

Step 3: Build Differential Diagnosis List ✓

Infrastructure Issues:

  • Network connectivity problems
  • DNS resolution failures
  • Load balancer misconfiguration
  • Firewall/security group blocking
  • Resource exhaustion (CPU, memory, disk)
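
Two of the infrastructure candidates above, DNS resolution and disk exhaustion, can be checked quickly from the standard library. This is a rough sketch with hypothetical helper names; it does not cover load balancers, firewalls, CPU, or memory.

```python
import shutil
import socket


def resolves(hostname):
    """Return True if the system resolver can resolve `hostname`."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def disk_free_fraction(path="."):
    """Return the free fraction (0.0 to 1.0) of the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total if usage.total else 0.0
```

A failed lookup or a free fraction close to zero does not prove the root cause, but it tells you which hypothesis to test first.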

Application Issues:

  • Cache staleness or corruption
  • Database connection pool exhaustion
  • Database deadlocks or slow queries
  • Third-party API failures or timeouts
  • Memory leaks
  • Race conditions or threading issues
  • Incorrect error handling
  • Missing or incorrect input validation

Configuration Issues:

  • Environment variable mismatch
  • Configuration file errors
  • Version incompatibility
  • Missing dependencies
  • Permission problems

Code Issues:

  • Logic errors in recent changes
  • Null pointer/undefined errors
  • Type mismatches
  • Off-by-one errors
  • Incorrect assumptions

Step 4: Apply Elimination and Deductive Reasoning ✓

Hypothesis Testing:

  • Rank hypotheses by likelihood
  • Design test for most likely hypothesis
  • Execute test and document result
  • If hypothesis invalidated, mark as eliminated
  • If hypothesis confirmed, design further verification
  • Move to next hypothesis if needed

Reasoning Documentation:

  • Document "If X, then Y" statements
  • Record why each hypothesis was eliminated
  • Note which tests ruled out which possibilities
  • Maintain chain of reasoning for review
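
The ranking, testing, and documentation items above can be tracked in a small data structure. The following is one possible sketch; the `Hypothesis` fields, the status names, and the subjective `likelihood` score are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    claim: str
    likelihood: float          # subjective estimate in [0, 1]
    prediction: str            # the "If X, then Y" statement to test
    status: str = "open"       # open | eliminated | confirmed
    evidence: list = field(default_factory=list)


def next_to_test(hypotheses):
    """Return the most likely hypothesis that is still open, or None."""
    open_items = [h for h in hypotheses if h.status == "open"]
    return max(open_items, key=lambda h: h.likelihood, default=None)


def record_result(hypothesis, prediction_held, note):
    """Mark a hypothesis confirmed or eliminated and keep the reason."""
    hypothesis.status = "confirmed" if prediction_held else "eliminated"
    hypothesis.evidence.append(note)
```

Keeping the `evidence` notes on eliminated hypotheses preserves the chain of reasoning, so a reviewer can see which test ruled out which possibility.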

Narrowing Down:

  • Eliminate external factors first (network, APIs)
  • Then infrastructure (resources, configuration)
  • Then application-level issues (cache, database)
  • Finally code-level issues (logic, types)

Step 5: Experimental Verification ✓

Preparation:

  • Create git branch for experiments
  • Back up current state (checkpoint)
  • Document experiment plan

Experimentation:

  • Add logging/instrumentation to suspected area
  • Add debug breakpoints if using debugger
  • Create controlled test case
  • Run experiment and capture output
  • Compare actual vs expected behavior
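
For the logging/instrumentation item, a decorator is one low-risk way to observe a suspected function without editing its body. The sketch below uses the standard `logging` module; `traced`, the logger name `debug_trace`, and `parse_port` are hypothetical names chosen for illustration.

```python
import functools
import logging

logger = logging.getLogger("debug_trace")


def traced(func):
    """Log arguments, return value, and exceptions of the wrapped function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.debug("call %s args=%r kwargs=%r", func.__name__, args, kwargs)
        try:
            result = func(*args, **kwargs)
        except Exception:
            # logger.exception records the traceback, then we re-raise unchanged.
            logger.exception("%s raised", func.__name__)
            raise
        logger.debug("return %s -> %r", func.__name__, result)
        return result
    return wrapper


@traced
def parse_port(value):  # hypothetical function under suspicion
    return int(value)
```

Calling `logging.basicConfig(level=logging.DEBUG)` before the experiment makes the trace visible; removing the decorator afterwards restores the original behavior.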

Research:

  • Search GitHub issues for similar problems
  • Check Stack Overflow for related questions
  • Review official documentation for edge cases
  • Check release notes for known issues
  • Consult language/framework changelog

Validation:

  • Can the issue be reproduced consistently?
  • Does the evidence match the hypothesis?
  • Are there alternative explanations?
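
To answer "can the issue be reproduced consistently?" with a number instead of an impression, the reproduction steps can be wrapped in a callable and run repeatedly. This is a sketch; `failure_rate` and `reproduce` are hypothetical names, and a failure is assumed to surface as a raised exception.

```python
def failure_rate(reproduce, runs=20):
    """Run `reproduce` repeatedly and return the fraction of runs that raised."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    failures = 0
    for _ in range(runs):
        try:
            reproduce()
        except Exception:
            failures += 1
    return failures / runs
```

A rate of 1.0 suggests a deterministic bug, while an intermediate rate points toward timing, ordering, or environment-dependent causes.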

Step 6: Locate and Implement Fix ✓

Root Cause Confirmation:

  • Identify exact file and line number
  • Understand why the code fails
  • Confirm this is root cause, not symptom

Solution Design:

  • Consider multiple fix approaches
  • Evaluate side effects of each approach
  • Choose most elegant and maintainable solution
  • Ensure fix doesn't introduce new issues

Implementation:

  • Implement the fix
  • Add comments explaining the fix
  • Update related documentation
  • Add test case to prevent regression
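
A regression test pins the exact failing case so the bug cannot return unnoticed. The example below is purely illustrative: `page_count` and its off-by-one bug are invented for this sketch and do not come from any real project.

```python
import unittest


def page_count(total_items, page_size):
    """Number of pages needed; hypothetical function that was fixed."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # Ceiling division. The (hypothetical) bug used total_items // page_size,
    # which dropped the final partial page.
    return (total_items + page_size - 1) // page_size


class PageCountRegressionTest(unittest.TestCase):
    def test_partial_last_page_is_counted(self):
        # The input from the original bug report: 11 items, 10 per page.
        self.assertEqual(page_count(11, 10), 2)

    def test_exact_multiple_and_empty(self):
        self.assertEqual(page_count(10, 10), 1)
        self.assertEqual(page_count(0, 10), 0)
```

Tests like these can be run with `python -m unittest`; naming the test after the symptom makes a future failure self-explanatory.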

Verification:

  • Confirm the fix resolves the original issue
  • Run existing test suite
  • Test edge cases
  • Verify no new issues introduced

Step 7: Prevention Mechanism ✓

Stability Verification:

  • Run full test suite
  • Perform integration testing
  • Test in staging environment
  • Monitor for unexpected behavior

Documentation:

  • Update CLAUDE.md or project docs
  • Document root cause
  • Document fix and reasoning
  • Add to knowledge base

Prevention Measures:

  • Add automated test for this scenario
  • Add validation/assertions to prevent recurrence
  • Update error messages for clarity
  • Add monitoring/alerting if applicable
  • Share learnings with team
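
As a sketch of the validation and clearer-error-message items, the function below checks a configuration value at load time and says exactly what was wrong. The key `timeout_seconds` and the function `load_timeout` are hypothetical and stand in for whatever value caused the original bug.

```python
def load_timeout(config):
    """Read `timeout_seconds` from a config mapping, failing loudly on bad values."""
    if "timeout_seconds" not in config:
        raise KeyError("config is missing required key 'timeout_seconds'")
    raw = config["timeout_seconds"]
    try:
        timeout = float(raw)
    except (TypeError, ValueError):
        # Include the offending value so the message is actionable.
        raise ValueError(f"timeout_seconds must be a number, got {raw!r}") from None
    if timeout <= 0:
        raise ValueError(f"timeout_seconds must be positive, got {timeout}")
    return timeout
```

Failing at load time with a specific message moves the error close to its cause instead of letting it surface later as an unrelated symptom.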

Post-Mortem:

  • Review what went well
  • Identify what could improve
  • Update debugging procedures if needed
  • Celebrate the fix! 🎉