zhongwei/gh-iciakky-cc-general-skills

Fork 0

Files

Zhongwei Li e732da8316 Initial commit

2025-11-29 18:47:55 +08:00

25 KiB

Raw Blame History

Systematic Debugging Methodology

Guiding Principle: Occam's Razor for Debugging

When facing mysterious errors, the root cause is almost always simpler than it appears.

Common error distribution in real-world debugging:

70%: Configuration issues (wrong parameters, missing flags, incorrect paths)
20%: Environment issues (missing dependencies, version mismatches, path problems)
8%: Data format issues (encoding, structure assumptions)
2%: Actual bugs in external APIs or fundamental incompatibilities

Core Rule: Exhaust simple hypotheses before considering complex ones. Authentication failures are configuration problems 99% of the time, not API design flaws.

1. Hypothesis Priority Framework

Always Start Here: The "Boring Checklist"

Before investigating complex hypotheses, verify these fundamentals:

Configuration Loading
- Are environment variables actually loaded? (Not just "file exists")
- Are config files in the correct location relative to execution path?
- Are there hidden default parameters overriding your explicit config?
Exact Version/Name Matching
- Library versions exact match with documentation examples?
- API endpoint names exactly correct (not similar, not partially matching)?
- Model/service names character-perfect? (e.g., preview-09-2025 ≠ preview-2025-09)
First-Hand Error Visibility
- Have you personally executed the failing code?
- Have you seen the complete error output (not summarized or truncated)?
- Are there error details hidden in non-obvious places (WebSocket close codes, HTTP headers)?

Hypothesis Ordering Template

## Problem: [Description]

### Priority 1: Configuration Issues (Check First)
- [ ] Hypothesis 1a: Config file not loaded from expected path
      Verification: Add logging immediately after config load
- [ ] Hypothesis 1b: SDK using unexpected default parameters
      Verification: Log all parameters passed to SDK constructor
- [ ] Hypothesis 1c: Authentication credentials format issue
      Verification: Check credential string length, prefix, encoding

### Priority 2: Environment Issues
- [ ] Hypothesis 2a: Dependency version mismatch
      Verification: Check package.json vs installed versions
- [ ] Hypothesis 2b: Runtime environment differences
      Verification: Compare working vs failing environment variables

### Priority 3: Data Format Issues
- [ ] Hypothesis 3a: Incorrect data structure assumptions
      Verification: Log raw data structure, don't assume
- [ ] Hypothesis 3b: Encoding mismatch (UTF-8, base64, binary)
      Verification: Inspect first few bytes/characters

### Priority 4: Complex Issues (Only After Above Exhausted)
- [ ] Hypothesis 4a: API limitation or bug
      Verification: Find official documentation or working examples
- [ ] Hypothesis 4b: Fundamental incompatibility
      Verification: Find counter-examples proving it can work

2. Isolation Strategy: Diagnostic Scripts

The Most Powerful Debugging Technique: Create minimal, single-purpose scripts that test ONE thing at a time.

Why This Works

Eliminates confounding variables: Complex scripts have multiple failure points
Provides clear pass/fail criteria: If this 10-line script fails, the problem is X
Builds confidence incrementally: Each passing diagnostic narrows the search space
Creates reusable verification tools: Same scripts can verify fixes

Diagnostic Script Principles

One Variable Per Script: Test configuration loading separately from API calls separately from data processing
Verbose Logging: Log every step, every value, every assumption
Explicit Success Criteria: Script should clearly output PASS or FAIL
No Assumptions: Check everything explicitly, even "obvious" things

Template: Configuration Diagnostic

// diagnose-config.js
// Purpose: Verify configuration loading and format
// Success criteria: All checks pass with ✓

console.log('=== Configuration Diagnostic ===\n');

// 1. Environment
console.log('--- Environment ---');
console.log('Node version:', process.version);
console.log('Working directory:', process.cwd());
console.log('Script location:', __dirname);

// 2. Config File Loading
console.log('\n--- Config File ---');
const fs = require('fs');
const path = require('path');

const configPath = path.join(__dirname, '.env');
console.log('Expected path:', configPath);
console.log('File exists:', fs.existsSync(configPath));

if (fs.existsSync(configPath)) {
  const content = fs.readFileSync(configPath, 'utf-8');
  console.log('File size:', content.length, 'bytes');
  console.log('Lines:', content.split('\n').length);
}

// 3. Environment Variable
require('dotenv').config({ path: configPath });
console.log('\n--- Environment Variable ---');
console.log('API_KEY defined:', !!process.env.API_KEY);
console.log('API_KEY length:', process.env.API_KEY?.length);
console.log('API_KEY prefix:', process.env.API_KEY?.slice(0, 4));
console.log('API_KEY has whitespace:', /\s/.test(process.env.API_KEY || ''));

// 4. Format Validation
console.log('\n--- Format Validation ---');
const expectedPrefix = 'AIza'; // Example for Google API keys
const expectedLength = 39;

const prefixMatch = process.env.API_KEY?.startsWith(expectedPrefix);
const lengthMatch = process.env.API_KEY?.length === expectedLength;

console.log(prefixMatch ? '✓' : '✗', 'Prefix matches expected format');
console.log(lengthMatch ? '✓' : '✗', 'Length matches expected format');

// 5. SDK Constructor (No Network Call)
console.log('\n--- SDK Initialization ---');
try {
  const SDK = require('./sdk'); // Your SDK
  const client = new SDK({
    apiKey: process.env.API_KEY,
    // Log all parameters, including defaults
  });
  console.log('✓ SDK initialized without errors');
  console.log('Client config:', JSON.stringify(client.config, null, 2));
} catch (error) {
  console.log('✗ SDK initialization failed:', error.message);
}

console.log('\n=== Diagnostic Complete ===');

Template: Data Structure Diagnostic

// diagnose-data-structure.js
// Purpose: Inspect actual data structure without assumptions
// Success criteria: Understand exact structure and location of target data

function inspectObject(obj, path = 'root', maxDepth = 3, currentDepth = 0) {
  const indent = '  '.repeat(currentDepth);

  console.log(`${indent}${path}:`);
  console.log(`${indent}  Type: ${typeof obj}`);
  console.log(`${indent}  Null: ${obj === null}`);
  console.log(`${indent}  Undefined: ${obj === undefined}`);

  if (obj === null || obj === undefined) return;

  if (typeof obj === 'object') {
    if (Buffer.isBuffer(obj)) {
      console.log(`${indent}  [Buffer: ${obj.length} bytes]`);
      console.log(`${indent}  First 20 bytes: ${obj.slice(0, 20).toString('hex')}`);
      return;
    }

    if (Array.isArray(obj)) {
      console.log(`${indent}  [Array: ${obj.length} items]`);
      if (obj.length > 0 && currentDepth < maxDepth) {
        inspectObject(obj[0], `${path}[0]`, maxDepth, currentDepth + 1);
      }
      return;
    }

    const keys = Object.keys(obj);
    console.log(`${indent}  Keys: [${keys.join(', ')}]`);

    if (currentDepth < maxDepth) {
      for (const key of keys.slice(0, 5)) { // Limit to first 5 keys
        inspectObject(obj[key], `${path}.${key}`, maxDepth, currentDepth + 1);
      }
    }
  } else if (typeof obj === 'string') {
    console.log(`${indent}  Length: ${obj.length}`);
    console.log(`${indent}  Preview: "${obj.slice(0, 50)}${obj.length > 50 ? '...' : ''}"`);

    // Check encoding hints
    const isBase64Like = /^[A-Za-z0-9+/=]+$/.test(obj);
    const isHexLike = /^[0-9a-fA-F]+$/.test(obj);
    console.log(`${indent}  Looks like base64: ${isBase64Like}`);
    console.log(`${indent}  Looks like hex: ${isHexLike}`);
  } else {
    console.log(`${indent}  Value: ${obj}`);
  }
}

// Usage example with API response
const response = getAPIResponse(); // Your actual API call
console.log('=== Raw Response Structure ===\n');
inspectObject(response);

// Specific path investigation
console.log('\n=== Investigating Suspected Path ===');
console.log('response.data exists:', !!response.data);
console.log('response.body exists:', !!response.body);
console.log('response.payload exists:', !!response.payload);

// If you're looking for binary data
console.log('\n=== Binary Data Search ===');
function findBuffers(obj, path = 'root') {
  if (Buffer.isBuffer(obj)) {
    console.log(`Found Buffer at: ${path} (${obj.length} bytes)`);
    return;
  }
  if (typeof obj === 'object' && obj !== null) {
    for (const key in obj) {
      findBuffers(obj[key], `${path}.${key}`);
    }
  }
}
findBuffers(response);

Template: API Interaction Diagnostic

// diagnose-api-minimal.js
// Purpose: Minimal API call to isolate authentication from functionality
// Success criteria: Connection succeeds, response received (any response)

console.log('=== Minimal API Test ===\n');

const API = require('./api-client');

async function testMinimalConnection() {
  console.log('1. Creating client...');
  const client = new API({
    apiKey: process.env.API_KEY,
    // Start with absolute minimal config
  });

  console.log('2. Attempting connection...');
  try {
    await client.connect();
    console.log('✓ Connection established');
  } catch (error) {
    console.log('✗ Connection failed:', error.message);
    console.log('Error code:', error.code);
    console.log('Error details:', JSON.stringify(error, null, 2));
    process.exit(1);
  }

  console.log('3. Sending minimal request...');
  try {
    // Simplest possible request
    const response = await client.send({ message: 'ping' });
    console.log('✓ Response received');
    console.log('Response type:', typeof response);
    console.log('Response keys:', Object.keys(response));
  } catch (error) {
    console.log('✗ Request failed:', error.message);
  }

  console.log('4. Closing connection...');
  await client.close();
  console.log('✓ Test complete');
}

testMinimalConnection().catch(console.error);

Real-World Example: WebSocket Authentication Mystery

Problem: WebSocket connection immediately closes with code 1007 "API key not valid"

❌ Initial Approach (jumping to complex hypotheses):

"Maybe the API doesn't support this model"
"Maybe the feature flag is disabled for my account"
"Maybe there's a rate limit"

✅ Diagnostic Script Approach:

// diagnose-websocket-auth.js
console.log('=== WebSocket Auth Diagnostic ===\n');

// Step 1: Verify API key loading
console.log('--- API Key ---');
console.log('Loaded:', !!process.env.API_KEY);
console.log('Length:', process.env.API_KEY?.length);
console.log('Format:', process.env.API_KEY?.slice(0, 4) + '...' + process.env.API_KEY?.slice(-4));

// Step 2: Check SDK configuration
console.log('\n--- SDK Config ---');
const sdk = new SDK({
  apiKey: process.env.API_KEY,
});

// KEY DISCOVERY: Log the actual endpoint being used
console.log('Endpoint:', sdk.endpoint); // Revealed: Using Vertex AI endpoint!

// Step 3: Check SDK defaults
console.log('Default vertexai:', sdk.config.vertexai); // Revealed: true by default!

// Step 4: Test with explicit configuration
console.log('\n--- Testing explicit vertexai: false ---');
const sdkFixed = new SDK({
  apiKey: process.env.API_KEY,
  vertexai: false, // Explicit override
});
console.log('Endpoint:', sdkFixed.endpoint); // Now using correct endpoint!

Result: The diagnostic revealed that the SDK defaulted to vertexai: true, sending requests to Vertex AI endpoint instead of the Gemini Developer API endpoint. The fix was a single parameter.

Time saved: This 15-line diagnostic script found the issue in 2 minutes. The alternative (reading SDK source code or trial-and-error config changes) would have taken hours.

3. Comparison with Working Examples

Why This Is Critical

When you have a working example (different language, different version, official sample), it's a treasure map showing the correct configuration.

Working example exists → Problem is in the differences

Systematic Comparison Process

Find the Working Example
- Official SDK examples (preferred)
- Successful previous implementations in your codebase
- Community examples with verified success (check issues/discussions)

Compare Layer by Layer

## Comparison Checklist

### Language/Runtime
- [ ] Working: Python 3.11, Failing: Node.js 20
- [ ] Any known language-specific issues?

### SDK Versions
- [ ] Working: v2.1.0, Failing: v2.3.1
- [ ] Check changelog between versions

### Configuration Parameters
Working:
```python
client = Client(api_key=key)  # Only 1 parameter

Failing:

const client = new Client({ apiKey: key, ...manyOtherParams });

What are those other params? What are their defaults?

API Endpoint

Working: api.service.com, Failing: ???
Log actual endpoint used by SDK

Request Format

Compare actual HTTP/WebSocket frames sent (use network inspector)

Identify Hidden Differences

Common gotchas:
- Default parameters: JavaScript SDK has vertexai: true default, Python doesn't have this parameter
- Authentication methods: One uses header, another uses query param
- Endpoint URLs: SDKs may auto-select endpoints based on config
- Retry behavior: One SDK retries automatically, hiding transient failures

Example: Cross-Language Comparison

Problem: Python POC works, JavaScript POC fails with authentication error

Comparison:

# Python (WORKING)
import google.generativeai as genai

genai.configure(api_key=api_key)  # Simple, one-line config
model = genai.GenerativeModel('gemini-2.5-flash')
response = model.generate_content('Hello')

// JavaScript (FAILING)
const { GoogleGenerativeAI } = require('@google/generative-ai');

const ai = new GoogleGenerativeAI({
  apiKey: process.env.API_KEY,
  // What's different?
});

Investigation:

Python library source: genai.configure() only sets API key, no other parameters
JavaScript SDK docs: Constructor accepts vertexai parameter (default: true)
Hypothesis: JavaScript defaulting to Vertex AI endpoint

Verification:

const ai = new GoogleGenerativeAI({
  apiKey: process.env.API_KEY,
  vertexai: false,  // Match Python's implicit behavior
});

Result: Fixed. The difference was an implicit vs explicit endpoint selection.

4. Data Structure Verification: Never Assume

The Anti-Pattern

// ❌ Assumption-based code
const audioData = response.data; // Assuming 'data' contains audio
audioFile.write(audioData);
// Result: Writes undefined or wrong data, produces corrupted file

The Correct Pattern

// ✅ Verification-first code

// Step 1: Inspect actual structure
console.log('Response keys:', Object.keys(response));
console.log('Response structure:', JSON.stringify(response, null, 2).slice(0, 500));

// Step 2: Search for target data
function findAudioData(obj, path = 'response') {
  if (Buffer.isBuffer(obj)) {
    console.log(`Found Buffer at ${path}: ${obj.length} bytes`);
  }
  if (typeof obj === 'object' && obj !== null) {
    for (const [key, value] of Object.entries(obj)) {
      if (key.includes('audio') || key.includes('data')) {
        console.log(`Candidate at ${path}.${key}:`, typeof value);
      }
      findAudioData(value, `${path}.${key}`);
    }
  }
}
findAudioData(response);

// Step 3: Verify encoding
const candidateData = response.serverContent.modelTurn.parts[0].inlineData.data;
console.log('Data type:', typeof candidateData);
console.log('First 50 chars:', candidateData.slice(0, 50));
console.log('Looks like base64:', /^[A-Za-z0-9+/=]+$/.test(candidateData));

// Step 4: Test decoding
const decoded = Buffer.from(candidateData, 'base64');
console.log('Decoded size:', decoded.length, 'bytes');
console.log('First 10 bytes (hex):', decoded.slice(0, 10).toString('hex'));

// Step 5: Use verified data
audioFile.write(decoded); // Now confident this is correct

Layer-by-Layer Verification Template

// For nested data structures (e.g., API responses, message objects)

function verifyPath(obj, pathString) {
  console.log(`\n=== Verifying: ${pathString} ===`);

  const parts = pathString.split('.');
  let current = obj;
  let currentPath = 'root';

  for (const part of parts) {
    currentPath += `.${part}`;

    console.log(`Checking ${currentPath}...`);

    if (current === null || current === undefined) {
      console.log(`✗ Path broken at ${currentPath}: value is ${current}`);
      return null;
    }

    if (typeof current !== 'object') {
      console.log(`✗ Path broken at ${currentPath}: not an object (${typeof current})`);
      return null;
    }

    if (!(part in current)) {
      console.log(`✗ Key "${part}" doesn't exist`);
      console.log(`  Available keys:`, Object.keys(current));
      return null;
    }

    console.log(`✓ ${part} exists`);
    current = current[part];

    if (Array.isArray(current)) {
      console.log(`  (Array with ${current.length} items)`);
    } else if (Buffer.isBuffer(current)) {
      console.log(`  (Buffer with ${current.length} bytes)`);
    } else {
      console.log(`  (${typeof current})`);
    }
  }

  console.log(`\n✓ Full path verified: ${pathString}`);
  return current;
}

// Usage
const audioData = verifyPath(
  response,
  'serverContent.modelTurn.parts.0.inlineData.data'
);

5. Evidence Quality Hierarchy

Not all evidence is equal. Rank your evidence sources:

Tier 1: Direct Verification (Strongest)

Running code that succeeds/fails in front of you
Network traffic you personally captured
Logs you personally generated with verbose flags
Binary data you inspected byte-by-byte

Tier 2: Official Sources

Official API documentation (with version number matching yours)
Official SDK examples (with version number matching yours)
Official changelog entries

Tier 3: Working Examples

Community examples with verified success (stars, recent activity)
Stack Overflow answers with upvotes and recent dates
Your own previous successful implementations

Tier 4: Problem Reports

GitHub Issues (open or closed)
Stack Overflow questions (problems, not solutions)
Forum discussions

Tier 5: Speculation (Weakest)

"I think this API doesn't support..."
"This probably means..."
Assumptions based on API names or parameter names

Applying the Hierarchy

Scenario: Investigating why authentication fails

## Evidence Analysis

### Hypothesis: API doesn't support this authentication method

Evidence collected:
1. [Tier 4] GitHub Issue #123: User reports auth failure (OPEN, no resolution)
2. [Tier 5] Parameter named "beta" suggests experimental feature
3. [Tier 2] Official docs state: "Authentication via API key is supported"
4. [Tier 3] Example repo uses API key successfully (last updated 2 months ago)

Conclusion:
- Tier 2 (official docs) contradicts hypothesis
- Tier 3 (working example) disproves hypothesis
- Tier 4 evidence (open issue) indicates others hit same problem, but doesn't prove API limitation
- Hypothesis REJECTED: Auth method IS supported, problem is likely in configuration

Key Principle: Higher-tier evidence always overrules lower-tier evidence.

6. The First-Hand Execution Rule

Rule: Before forming conclusions, personally execute the failing code and observe the complete output.

Why This Matters

Second-hand error reports often omit critical details:

Full error messages (users summarize or truncate)
Error codes (users paste message but not code)
Preceding warnings (users skip "unimportant" output)
Environment differences (users assume their env is "normal")

Checklist Before Forming Hypothesis

Have I executed the failing code myself?
Have I seen the complete console output (not summarized)?
Have I checked for errors in non-obvious places (close codes, HTTP status, exit codes)?
Have I added extra logging to expose internal state?

Example: The Hidden Close Code

Second-hand report: "The WebSocket connection closes immediately with no error"

Assumptions formed:

"No error" → Maybe timeout?
"Closes immediately" → Maybe connection refused?

First-hand execution:

// Added logging
websocket.on('close', (code, reason) => {
  console.log('Close code:', code);      // Revealed: 1007
  console.log('Close reason:', reason);  // Revealed: "API key not valid"
});

Result: Error WAS present, just not logged by default. The close code (1007) immediately pointed to authentication issue.

7. Debugging Session Template

Use this template to structure your investigation:

## Problem Statement
[Concise description of unexpected behavior]

## Environment
- Runtime: [Node.js 20.11.0, Python 3.11, etc.]
- Library versions: [Exact versions from package.json/requirements.txt]
- OS: [If potentially relevant]

## Reproduction
[Minimal code that reproduces the issue]

## Expected vs Actual
- Expected: [What should happen]
- Actual: [What actually happens, with exact error messages]

## Hypothesis Priority List

### Priority 1: Configuration (Check First)
- [ ] Hypothesis 1a: [Specific config issue]
      Evidence needed: [What would prove/disprove this]
      Diagnostic: [Script or test to verify]

### Priority 2: Environment
- [ ] Hypothesis 2a: [Specific env issue]
      Evidence needed: [...]
      Diagnostic: [...]

### Priority 3: Data Format
[...]

### Priority 4: Complex Issues (Only if above exhausted)
[...]

## Evidence Collected

### [Hypothesis 1a]
- **Status**: ✓ PROVEN / ✗ DISPROVEN / ⚠ INCONCLUSIVE
- **Evidence tier**: [1-5]
- **Details**: [What you found]
- **Source**: [Where this evidence came from]

[Repeat for each hypothesis]

## Working Examples Comparison

### Python Implementation (WORKING)
```python
[Code]

JavaScript Implementation (FAILING)

[Code]

Differences Identified

[Difference 1]
[Difference 2]

Solution

Root Cause

[Final verified cause, with evidence tier]

Fix Applied

[Exact code change]

Verification

[How you verified the fix works]

Lessons Learned

[What would make this faster next time]


---

## 8. Common Anti-Patterns to Avoid

### Anti-Pattern 1: "Debugging by Modification"

**❌ Wrong**:
```javascript
// Try random changes hoping something works
const client = new API({ apiKey: key, timeout: 5000 });  // Doesn't work
const client = new API({ apiKey: key, timeout: 10000 }); // Doesn't work
const client = new API({ apiKey: key, retry: true });    // Doesn't work
// [30 more random attempts...]

✅ Right:

// Diagnose THEN fix
// 1. Create diagnostic to understand current behavior
// 2. Form hypothesis based on diagnostic output
// 3. Make targeted change
// 4. Verify with diagnostic

Anti-Pattern 2: "Complex First"

❌ Wrong: "The API must not support this feature with this model configuration"

✅ Right: "Let me first check if my API key is even loading correctly"

Anti-Pattern 3: "Assumption Stacking"

❌ Wrong:

// Assuming response.data exists
// Assuming it's a Buffer
// Assuming it's in the right format
fs.writeFileSync('output.wav', response.data);