# Analyzing Unknown Codebases
## Purpose
Systematically analyze unfamiliar code to identify subsystems, components, dependencies, and architectural patterns. Produce catalog entries that follow the EXACT output contract.
## When to Use
- Coordinator delegates subsystem analysis task
- Task specifies reading from workspace and appending to `02-subsystem-catalog.md`
- You need to analyze code you haven't seen before
- Output must integrate with downstream tooling (validation, diagram generation)
## Critical Principle: Contract Compliance
**Your analysis quality doesn't matter if you violate the output contract.**
**Common rationalization:** "I'll add helpful extra sections to improve clarity"
**Reality:** Extra sections break downstream tools. The coordinator expects EXACT format for parsing and validation. Your job is to follow the specification, not improve it.
## Output Contract (MANDATORY)
When writing to `02-subsystem-catalog.md`, append EXACTLY this format:
```markdown
## [Subsystem Name]
**Location:** `path/to/subsystem/`
**Responsibility:** [One sentence describing what this subsystem does]
**Key Components:**
- `file1.ext` - [Brief description]
- `file2.ext` - [Brief description]
- `file3.ext` - [Brief description]
**Dependencies:**
- Inbound: [Subsystems that depend on this one]
- Outbound: [Subsystems this one depends on]
**Patterns Observed:**
- [Pattern 1]
- [Pattern 2]
**Concerns:**
- [Any issues, gaps, or technical debt observed]
**Confidence:** [High/Medium/Low] - [Brief reasoning]

---
```
**If no concerns exist, write:**
```markdown
**Concerns:**
- None observed
```
**CRITICAL COMPLIANCE RULES:**
- ❌ Do NOT add extra sections ("Integration Points", "Recommendations", "Files", etc.)
- ❌ Do NOT change section names or reorder them
- ❌ Do NOT write to a separate file (always append to `02-subsystem-catalog.md`)
- ❌ Do NOT skip sections (include ALL sections - use "None observed" if empty)
- ✅ Copy the template structure EXACTLY
- ✅ Keep section order: Location → Responsibility → Key Components → Dependencies → Patterns → Concerns → Confidence
**Contract is specification, not minimum. Extra sections break downstream validation.**
### Example: Complete Compliant Entry
Here's what a correctly formatted entry looks like:
```markdown
## Authentication Service
**Location:** `/src/services/auth/`
**Responsibility:** Handles user authentication, session management, and JWT token generation for API access.
**Key Components:**
- `auth_handler.py` - Main authentication logic with login/logout endpoints (342 lines)
- `token_manager.py` - JWT token generation and validation (156 lines)
- `session_store.py` - Redis-backed session storage (98 lines)
**Dependencies:**
- Inbound: API Gateway, User Service
- Outbound: Database Layer, Cache Service, Logging Service
**Patterns Observed:**
- Dependency injection for testability (all external services injected)
- Token refresh pattern with sliding expiration
- Audit logging for all authentication events
**Concerns:**
- None observed
**Confidence:** High - Clear entry points, documented API, test coverage validates behavior

---
```
**This is EXACTLY what your output should look like.** No more, no less.
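If you assemble the entry programmatically rather than by hand, a small renderer keeps the format from drifting. This is a minimal sketch, not part of the contract; the function name and parameters are illustrative:

```python
# Hypothetical helper: renders one catalog entry in exact contract order.
# Field names are illustrative; the template above is the authority.

def render_entry(name, location, responsibility, components,
                 inbound, outbound, patterns, concerns, confidence):
    """Return one contract-compliant entry as a Markdown string."""
    concerns = concerns or ["None observed"]
    lines = [
        f"## {name}",
        f"**Location:** `{location}`",
        f"**Responsibility:** {responsibility}",
        "**Key Components:**",
        *[f"- `{path}` - {desc}" for path, desc in components],
        "**Dependencies:**",
        f"- Inbound: {', '.join(inbound)}",
        f"- Outbound: {', '.join(outbound)}",
        "**Patterns Observed:**",
        *[f"- {p}" for p in patterns],
        "**Concerns:**",
        *[f"- {c}" for c in concerns],
        f"**Confidence:** {confidence}",
        "",
        "---",
    ]
    return "\n".join(lines) + "\n"
```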
## Systematic Analysis Approach
### Step 1: Read Task Specification
Your task file (`temp/task-[name].md`) specifies:
- What to analyze (scope: directories, plugins, services)
- Where to read context (`01-discovery-findings.md`)
- Where to write output (`02-subsystem-catalog.md` - append)
- Expected format (the contract above)
**Read these files FIRST before analyzing code.**
### Step 2: Layered Exploration
Use this proven approach from baseline testing:
1. **Metadata layer** - Read plugin.json, package.json, setup.py
2. **Structure layer** - Examine directory organization
3. **Router layer** - Find and read router/index files (often named "using-X")
4. **Sampling layer** - Read 3-5 representative files
5. **Quantitative layer** - Use line counts as depth indicators
**Why this order works:**
- Metadata gives overview without code diving
- Structure reveals organization philosophy
- Routers often catalog all components
- Sampling verifies patterns
- Quantitative data supports claims
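The metadata, structure, and quantitative layers can be partly automated. A minimal sketch, assuming JSON manifests and Markdown skill files; the paths and heuristics are illustrative, and the router and sampling layers still require reading files yourself:

```python
import json
from pathlib import Path

# Layered exploration sketch: metadata -> structure -> quantitative.
def explore(root: str) -> dict:
    root_path = Path(root)
    findings = {"metadata": {}, "structure": [], "line_counts": {}}

    # Metadata layer: read JSON manifests if present (common conventions).
    for manifest in ("plugin.json", "package.json"):
        path = root_path / manifest
        if path.is_file():
            findings["metadata"][manifest] = json.loads(path.read_text())

    # Structure layer: top-level directories reveal organization philosophy.
    findings["structure"] = sorted(
        p.name for p in root_path.iterdir() if p.is_dir()
    )

    # Quantitative layer: line counts as depth indicators, not proof of quality.
    for path in root_path.rglob("*.md"):
        text = path.read_text(encoding="utf-8", errors="replace")
        findings["line_counts"][str(path.relative_to(root_path))] = len(text.splitlines())

    return findings
```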
### Step 3: Mark Confidence Explicitly
**Every output MUST include confidence level with reasoning.**
**High confidence** - Router skill provided catalog + verified with sampling
```markdown
**Confidence:** High - Router skill listed all 10 components, sampling 4 confirmed patterns
```
**Medium confidence** - No router, but clear structure + sampling
```markdown
**Confidence:** Medium - No router catalog, inferred from directory structure + 5 file samples
```
**Low confidence** - Incomplete, placeholders, or unclear organization
```markdown
**Confidence:** Low - Several SKILL.md files missing, test artifacts suggest work-in-progress
```
### Step 4: Distinguish States Clearly
When analyzing codebases with mixed completion:
**Complete** - Skill file exists, has content, passes basic read test
```markdown
- `skill-name/SKILL.md` - Complete skill (1,234 lines)
```
**Placeholder** - Skill file exists but is stub/template
```markdown
- `skill-name/SKILL.md` - Placeholder (12 lines, template only)
```
**Planned** - Referenced in router but no file exists
```markdown
- `skill-name` - Planned (referenced in router, not implemented)
```
**TDD artifacts** - Test scenarios, baseline results (these ARE documentation)
```markdown
- `test-scenarios.md` - TDD test scenarios (RED phase)
- `baseline-results.md` - Baseline behavior documentation
```
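If you want these distinctions to be mechanical rather than ad hoc, a small classifier helps. This is a sketch under an assumed heuristic (fewer than 50 lines counts as a placeholder); adjust the threshold to what you actually observe:

```python
from pathlib import Path

# Hypothetical classifier for skill completion state.
# The 50-line threshold is an assumed heuristic, not a rule from the contract.
def classify_skill(skill_dir: Path) -> str:
    skill_file = skill_dir / "SKILL.md"
    if not skill_file.is_file():
        return "Planned (no SKILL.md file present)"
    lines = skill_file.read_text(encoding="utf-8", errors="replace").splitlines()
    if len(lines) < 50:
        return f"Placeholder ({len(lines)} lines, template only)"
    return f"Complete skill ({len(lines):,} lines)"
```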
### Step 5: Write Output (Contract Compliance)
**Before writing:**
1. Prepare your entry in EXACT contract format from the template above
2. Copy the structure - don't paraphrase or reorganize
3. Triple-check you have ALL sections in correct order
**When writing:**
1. **Target file:** `02-subsystem-catalog.md` in workspace directory
2. **Operation:** Append your entry (create file if first entry, append if file exists)
3. **Method:**
- If file exists: Read current content, then Write with original + your entry
- If file doesn't exist: Write your entry directly
4. **Format:** Follow contract sections in exact order
5. **Completeness:** Include ALL sections - use "None observed" for empty Concerns
**DO NOT create separate files** (e.g., `subsystem-X-analysis.md`). The coordinator expects all entries in `02-subsystem-catalog.md`.
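A minimal append sketch consistent with the method above; `workspace` is a placeholder for the directory named in your task spec:

```python
from pathlib import Path

# Append the prepared entry to the shared catalog, creating the file if needed.
def append_entry(workspace: Path, entry: str) -> None:
    catalog = workspace / "02-subsystem-catalog.md"
    existing = catalog.read_text(encoding="utf-8") if catalog.exists() else ""
    if existing and not existing.endswith("\n"):
        existing += "\n"
    catalog.write_text(existing + entry, encoding="utf-8")
```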
**After writing:**
1. Re-read `02-subsystem-catalog.md` to verify your entry was added correctly
2. Validate format matches contract exactly using this checklist:
**Self-Validation Checklist:**
```
[ ] Section 1: Subsystem name as H2 heading (## Name)
[ ] Section 2: Location with backticks and absolute path
[ ] Section 3: Responsibility as single sentence
[ ] Section 4: Key Components as bulleted list with descriptions
[ ] Section 5: Dependencies with "Inbound:" and "Outbound:" labels
[ ] Section 6: Patterns Observed as bulleted list
[ ] Section 7: Concerns present (with issues OR "None observed")
[ ] Section 8: Confidence level (High/Medium/Low) with reasoning
[ ] Separator: "---" line after confidence
[ ] NO extra sections added
[ ] Sections in correct order
[ ] Entry in file: 02-subsystem-catalog.md (not separate file)
```
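You can automate the structural half of that checklist. This sketch only confirms that the required labels appear once each and in contract order - a necessary check, not a sufficient one:

```python
import re

# Required section labels, in contract order.
REQUIRED_SECTIONS = [
    "**Location:**",
    "**Responsibility:**",
    "**Key Components:**",
    "**Dependencies:**",
    "**Patterns Observed:**",
    "**Concerns:**",
    "**Confidence:**",
]

def check_entry(entry: str) -> list[str]:
    """Return a list of problems; an empty list means the structure looks right."""
    problems = []
    if not re.match(r"^## \S", entry):
        problems.append("Missing '## [Subsystem Name]' heading on first line")
    positions = []
    for label in REQUIRED_SECTIONS:
        count = entry.count(label)
        if count == 0:
            problems.append(f"Missing section: {label}")
        elif count > 1:
            problems.append(f"Duplicate section: {label}")
        else:
            positions.append(entry.index(label))
    if positions != sorted(positions):
        problems.append("Sections are out of contract order")
    return problems
```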
## Handling Uncertainty
**When architecture is unclear:**
1. **State what you observe** - Don't guess at intent
```markdown
**Patterns Observed:**
- 3 files with similar structure (analysis.py, parsing.py, validation.py)
- Unclear if this is deliberate pattern or coincidence
```
2. **Mark confidence appropriately** - Low confidence is valid
```markdown
**Confidence:** Low - Directory structure suggests microservices, but no service definitions found
```
3. **Use "Concerns" section** - Document gaps
```markdown
**Concerns:**
- No clear entry point identified
- Dependencies inferred from imports, not explicit manifest
```
**DO NOT:**
- Invent relationships you didn't verify
- Assume "obvious" architecture without evidence
- Skip confidence marking because you're uncertain
## Positive Behaviors to Maintain
From baseline testing, these approaches WORK:
✅ **Read actual files** - Don't infer from names alone
✅ **Use router skills** - They often provide complete catalogs
✅ **Sample strategically** - 3-5 files verifies patterns without exhaustive reading
✅ **Cross-reference** - Verify claims (imports match listed dependencies)
✅ **Document assumptions** - Make reasoning explicit
✅ **Line counts indicate depth** - 1,500-line skill vs 50-line stub matters
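For the cross-reference step on Python code, the standard-library `ast` module can list what a file actually imports, so you can compare against the dependencies you plan to claim. Mapping module names to subsystems is still a judgment you make by hand:

```python
import ast
from pathlib import Path

# Collect top-level module names imported by a Python file, so claimed
# outbound dependencies can be checked against real imports.
def imported_modules(path: Path) -> set[str]:
    tree = ast.parse(path.read_text(encoding="utf-8"))
    modules = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.add(node.module.split(".")[0])
    return modules
```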
## Common Rationalizations (STOP SIGNALS)
If you catch yourself thinking these, STOP:
| Rationalization | Reality |
|-----------------|---------|
| "I'll add Integration Points section for clarity" | Extra sections break downstream parsing |
| "I'll write to separate file for organization" | Coordinator expects append to specified file |
| "I'll improve the contract format" | Contract is specification from coordinator |
| "More information is always helpful" | Your job: follow spec. Coordinator's job: decide what's included |
| "This comprehensive format is better" | "Better" violates contract. Compliance is mandatory. |
## Validation Criteria
Your output will be validated against:
1. **Contract compliance** - All sections present, no extras
2. **File operation** - Appended to `02-subsystem-catalog.md`, not separate file
3. **Confidence marking** - High/Medium/Low with reasoning
4. **Evidence-based claims** - Components you actually read
5. **Bidirectional dependencies** - If A→B, then B must show A as inbound
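You can pre-check the bidirectional rule before submitting. A sketch, assuming you build the `deps` mapping by hand from the catalog entries:

```python
# Pre-check the bidirectional dependency rule: if A lists B as outbound,
# B must list A as inbound. `deps` maps subsystem name ->
# {"inbound": [...], "outbound": [...]}.
def check_bidirectional(deps: dict[str, dict[str, list[str]]]) -> list[str]:
    problems = []
    for name, links in deps.items():
        for target in links.get("outbound", []):
            target_inbound = deps.get(target, {}).get("inbound", [])
            if name not in target_inbound:
                problems.append(
                    f"{name} -> {target}, but {target} does not list {name} as inbound"
                )
    return problems
```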
**If validation returns NEEDS_REVISION:**
- Read the validation report
- Fix specific issues identified
- Re-submit following contract
## Success Criteria
**You succeeded when:**
- Entry appended to `02-subsystem-catalog.md` in exact contract format
- All sections included (none skipped, none added)
- Confidence level marked with reasoning
- Claims supported by files you read
- Validation returns APPROVED
**You failed when:**
- Added "helpful" extra sections
- Wrote to separate file
- Changed contract format
- Skipped sections
- No confidence marking
- Validation returns BLOCK status
## Anti-Patterns
❌ **Add extra sections**
"I'll add Recommendations section" → Violates contract
❌ **Write to new file**
"I'll create subsystem-X-analysis.md" → Should append to `02-subsystem-catalog.md`
❌ **Skip required sections**
"No concerns, so I'll omit that section" → Include section with "None observed"
❌ **Change format**
"I'll use numbered lists instead of bullet points" → Follow contract exactly
❌ **Work without reading task spec**
"I know what to do" → Read `temp/task-*.md` first
## Integration with Workflow
This skill is typically invoked as:
1. **Coordinator** creates workspace and holistic assessment
2. **Coordinator** writes task specification in `temp/task-[yourname].md`
3. **YOU** read task spec + `01-discovery-findings.md`
4. **YOU** analyze assigned subsystem systematically
5. **YOU** append entry to `02-subsystem-catalog.md` following contract
6. **Validator** checks your output against contract
7. **Coordinator** proceeds to next phase if validation passes
**Your role:** Analyze systematically, follow contract exactly, mark confidence explicitly.