# Analyzing Unknown Codebases
## Purpose
Systematically analyze unfamiliar code to identify subsystems, components, dependencies, and architectural patterns. Produce catalog entries that follow the EXACT output contract.
## When to Use
- Coordinator delegates subsystem analysis task
- Task specifies reading from workspace and appending to `02-subsystem-catalog.md`
- You need to analyze code you haven't seen before
- Output must integrate with downstream tooling (validation, diagram generation)
## Critical Principle: Contract Compliance
**Your analysis quality doesn't matter if you violate the output contract.**
**Common rationalization:** "I'll add helpful extra sections to improve clarity"
**Reality:** Extra sections break downstream tools. The coordinator expects EXACT format for parsing and validation. Your job is to follow the specification, not improve it.
## Output Contract (MANDATORY)
When writing to `02-subsystem-catalog.md`, append EXACTLY this format:
```markdown
## [Subsystem Name]
**Location:** `path/to/subsystem/`
**Responsibility:** [One sentence describing what this subsystem does]
**Key Components:**
- `file1.ext` - [Brief description]
- `file2.ext` - [Brief description]
- `file3.ext` - [Brief description]
**Dependencies:**
- Inbound: [Subsystems that depend on this one]
- Outbound: [Subsystems this one depends on]
**Patterns Observed:**
- [Pattern 1]
- [Pattern 2]
**Concerns:**
- [Any issues, gaps, or technical debt observed]
**Confidence:** [High/Medium/Low] - [Brief reasoning]

---
```
**If no concerns exist, write:**
```markdown
**Concerns:**
- None observed
```
**CRITICAL COMPLIANCE RULES:**
- ❌ Do NOT add extra sections ("Integration Points", "Recommendations", "Files", etc.)
- ❌ Do NOT change section names or reorder them
- ❌ Do NOT write to a separate file (always append to `02-subsystem-catalog.md`)
- ❌ Do NOT skip sections (include ALL sections - use "None observed" if empty)
- ✅ Copy the template structure EXACTLY
- ✅ Keep section order: Location → Responsibility → Key Components → Dependencies → Patterns → Concerns → Confidence
**Contract is specification, not minimum. Extra sections break downstream validation.**
### Example: Complete Compliant Entry
Here's what a correctly formatted entry looks like:
```markdown
## Authentication Service
**Location:** `/src/services/auth/`
**Responsibility:** Handles user authentication, session management, and JWT token generation for API access.
**Key Components:**
- `auth_handler.py` - Main authentication logic with login/logout endpoints (342 lines)
- `token_manager.py` - JWT token generation and validation (156 lines)
- `session_store.py` - Redis-backed session storage (98 lines)
**Dependencies:**
- Inbound: API Gateway, User Service
- Outbound: Database Layer, Cache Service, Logging Service
**Patterns Observed:**
- Dependency injection for testability (all external services injected)
- Token refresh pattern with sliding expiration
- Audit logging for all authentication events
**Concerns:**
- None observed
**Confidence:** High - Clear entry points, documented API, test coverage validates behavior

---
```
**This is EXACTLY what your output should look like.** No more, no less.
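If you assemble the entry programmatically rather than by hand, a small renderer keeps the format from drifting. This is a minimal sketch, not part of the contract; the function name and parameters are illustrative:

```python
# Hypothetical helper: renders one catalog entry in exact contract order.
# Field names are illustrative; the template above is the authority.

def render_entry(name, location, responsibility, components,
                 inbound, outbound, patterns, concerns, confidence):
    """Return one contract-compliant entry as a Markdown string."""
    concerns = concerns or ["None observed"]
    lines = [
        f"## {name}",
        f"**Location:** `{location}`",
        f"**Responsibility:** {responsibility}",
        "**Key Components:**",
        *[f"- `{path}` - {desc}" for path, desc in components],
        "**Dependencies:**",
        f"- Inbound: {', '.join(inbound)}",
        f"- Outbound: {', '.join(outbound)}",
        "**Patterns Observed:**",
        *[f"- {p}" for p in patterns],
        "**Concerns:**",
        *[f"- {c}" for c in concerns],
        f"**Confidence:** {confidence}",
        "",
        "---",
    ]
    return "\n".join(lines) + "\n"
```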
## Systematic Analysis Approach
### Step 1: Read Task Specification
Your task file (`temp/task-[name].md`) specifies:
- What to analyze (scope: directories, plugins, services)
- Where to read context (`01-discovery-findings.md`)
- Where to write output (`02-subsystem-catalog.md` - append)
- Expected format (the contract above)
**Read these files FIRST before analyzing code.**
### Step 2: Layered Exploration
Use this proven approach from baseline testing:
1. **Metadata layer** - Read plugin.json, package.json, setup.py
2. **Structure layer** - Examine directory organization
3. **Router layer** - Find and read router/index files (often named "using-X")
4. **Sampling layer** - Read 3-5 representative files
5. **Quantitative layer** - Use line counts as depth indicators
**Why this order works:**
- Metadata gives overview without code diving
- Structure reveals organization philosophy
- Routers often catalog all components
- Sampling verifies patterns
- Quantitative data supports claims
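The metadata, structure, and quantitative layers can be partly automated. A minimal sketch, assuming JSON manifests and Markdown skill files; the paths and heuristics are illustrative, and the router and sampling layers still require reading files yourself:

```python
import json
from pathlib import Path

# Layered exploration sketch: metadata -> structure -> quantitative.
def explore(root: str) -> dict:
    root_path = Path(root)
    findings = {"metadata": {}, "structure": [], "line_counts": {}}

    # Metadata layer: read JSON manifests if present (common conventions).
    for manifest in ("plugin.json", "package.json"):
        path = root_path / manifest
        if path.is_file():
            findings["metadata"][manifest] = json.loads(path.read_text())

    # Structure layer: top-level directories reveal organization philosophy.
    findings["structure"] = sorted(
        p.name for p in root_path.iterdir() if p.is_dir()
    )

    # Quantitative layer: line counts as depth indicators, not proof of quality.
    for path in root_path.rglob("*.md"):
        text = path.read_text(encoding="utf-8", errors="replace")
        findings["line_counts"][str(path.relative_to(root_path))] = len(text.splitlines())

    return findings
```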
### Step 3: Mark Confidence Explicitly
**Every output MUST include confidence level with reasoning.**
**High confidence** - Router skill provided catalog + verified with sampling
```markdown
**Confidence:** High - Router skill listed all 10 components, sampling 4 confirmed patterns
```
**Medium confidence** - No router, but clear structure + sampling
```markdown
**Confidence:** Medium - No router catalog, inferred from directory structure + 5 file samples
```
**Low confidence** - Incomplete, placeholders, or unclear organization
```markdown
**Confidence:** Low - Several SKILL.md files missing, test artifacts suggest work-in-progress
```
### Step 4: Distinguish States Clearly
When analyzing codebases with mixed completion:
**Complete** - Skill file exists, has content, passes basic read test
```markdown
- `skill-name/SKILL.md` - Complete skill (1,234 lines)
```
**Placeholder** - Skill file exists but is stub/template
```markdown
- `skill-name/SKILL.md` - Placeholder (12 lines, template only)
```
**Planned** - Referenced in router but no file exists
```markdown
- `skill-name` - Planned (referenced in router, not implemented)
```
**TDD artifacts** - Test scenarios, baseline results (these ARE documentation)
```markdown
- `test-scenarios.md` - TDD test scenarios (RED phase)
- `baseline-results.md` - Baseline behavior documentation
```
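If you want these distinctions to be mechanical rather than ad hoc, a small classifier helps. This is a sketch under an assumed heuristic (fewer than 50 lines counts as a placeholder); adjust the threshold to what you actually observe:

```python
from pathlib import Path

# Hypothetical classifier for skill completion state.
# The 50-line threshold is an assumed heuristic, not a rule from the contract.
def classify_skill(skill_dir: Path) -> str:
    skill_file = skill_dir / "SKILL.md"
    if not skill_file.is_file():
        return "Planned (no SKILL.md file present)"
    lines = skill_file.read_text(encoding="utf-8", errors="replace").splitlines()
    if len(lines) < 50:
        return f"Placeholder ({len(lines)} lines, template only)"
    return f"Complete skill ({len(lines):,} lines)"
```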
### Step 5: Write Output (Contract Compliance)
**Before writing:**
1. Prepare your entry in EXACT contract format from the template above
2. Copy the structure - don't paraphrase or reorganize
3. Triple-check you have ALL sections in correct order
**When writing:**
1. **Target file:** `02-subsystem-catalog.md` in workspace directory
2. **Operation:** Append your entry (create file if first entry, append if file exists)
3. **Method:**
- If file exists: Read current content, then Write with original + your entry
- If file doesn't exist: Write your entry directly
4. **Format:** Follow contract sections in exact order
5. **Completeness:** Include ALL sections - use "None observed" for empty Concerns
**DO NOT create separate files** (e.g., `subsystem-X-analysis.md`). The coordinator expects all entries in `02-subsystem-catalog.md`.
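A minimal append sketch consistent with the method above; `workspace` is a placeholder for the directory named in your task spec:

```python
from pathlib import Path

# Append the prepared entry to the shared catalog, creating the file if needed.
def append_entry(workspace: Path, entry: str) -> None:
    catalog = workspace / "02-subsystem-catalog.md"
    existing = catalog.read_text(encoding="utf-8") if catalog.exists() else ""
    if existing and not existing.endswith("\n"):
        existing += "\n"
    catalog.write_text(existing + entry, encoding="utf-8")
```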
**After writing:**
1. Re-read `02-subsystem-catalog.md` to verify your entry was added correctly
2. Validate format matches contract exactly using this checklist:
**Self-Validation Checklist:**
```
[ ] Section 1: Subsystem name as H2 heading (## Name)
[ ] Section 2: Location with backticks and absolute path
[ ] Section 3: Responsibility as single sentence
[ ] Section 4: Key Components as bulleted list with descriptions
[ ] Section 5: Dependencies with "Inbound:" and "Outbound:" labels
[ ] Section 6: Patterns Observed as bulleted list
[ ] Section 7: Concerns present (with issues OR "None observed")
[ ] Section 8: Confidence level (High/Medium/Low) with reasoning
[ ] Separator: "---" line after confidence
[ ] NO extra sections added
[ ] Sections in correct order
[ ] Entry in file: 02-subsystem-catalog.md (not separate file)
```
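You can automate the structural half of that checklist. This sketch only confirms that the required labels appear once each and in contract order - a necessary check, not a sufficient one:

```python
import re

# Required section labels, in contract order.
REQUIRED_SECTIONS = [
    "**Location:**",
    "**Responsibility:**",
    "**Key Components:**",
    "**Dependencies:**",
    "**Patterns Observed:**",
    "**Concerns:**",
    "**Confidence:**",
]

def check_entry(entry: str) -> list[str]:
    """Return a list of problems; an empty list means the structure looks right."""
    problems = []
    if not re.match(r"^## \S", entry):
        problems.append("Missing '## [Subsystem Name]' heading on first line")
    positions = []
    for label in REQUIRED_SECTIONS:
        count = entry.count(label)
        if count == 0:
            problems.append(f"Missing section: {label}")
        elif count > 1:
            problems.append(f"Duplicate section: {label}")
        else:
            positions.append(entry.index(label))
    if positions != sorted(positions):
        problems.append("Sections are out of contract order")
    return problems
```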
## Handling Uncertainty
**When architecture is unclear:**
1. **State what you observe** - Don't guess at intent
```markdown
**Patterns Observed:**
- 3 files with similar structure (analysis.py, parsing.py, validation.py)
- Unclear if this is deliberate pattern or coincidence
```
2. **Mark confidence appropriately** - Low confidence is valid
```markdown
**Confidence:** Low - Directory structure suggests microservices, but no service definitions found
```
3. **Use "Concerns" section** - Document gaps
```markdown
**Concerns:**
- No clear entry point identified
- Dependencies inferred from imports, not explicit manifest
```
**DO NOT:**
- Invent relationships you didn't verify
- Assume "obvious" architecture without evidence
- Skip confidence marking because you're uncertain
## Positive Behaviors to Maintain
From baseline testing, these approaches WORK:
✅ **Read actual files** - Don't infer from names alone
✅ **Use router skills** - They often provide complete catalogs
✅ **Sample strategically** - 3-5 files verifies patterns without exhaustive reading
✅ **Cross-reference** - Verify claims (imports match listed dependencies)
✅ **Document assumptions** - Make reasoning explicit
✅ **Line counts indicate depth** - 1,500-line skill vs 50-line stub matters
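For the cross-reference step on Python code, the standard-library `ast` module can list what a file actually imports, so you can compare against the dependencies you plan to claim. Mapping module names to subsystems is still a judgment you make by hand:

```python
import ast
from pathlib import Path

# Collect top-level module names imported by a Python file, so claimed
# outbound dependencies can be checked against real imports.
def imported_modules(path: Path) -> set[str]:
    tree = ast.parse(path.read_text(encoding="utf-8"))
    modules = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.add(node.module.split(".")[0])
    return modules
```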
## Common Rationalizations (STOP SIGNALS)
If you catch yourself thinking these, STOP:
| Rationalization | Reality |
|-----------------|---------|
| "I'll add Integration Points section for clarity" | Extra sections break downstream parsing |
| "I'll write to separate file for organization" | Coordinator expects append to specified file |
| "I'll improve the contract format" | Contract is specification from coordinator |
| "More information is always helpful" | Your job: follow spec. Coordinator's job: decide what's included |
| "This comprehensive format is better" | "Better" violates contract. Compliance is mandatory. |
## Validation Criteria
Your output will be validated against:
1. **Contract compliance** - All sections present, no extras
2. **File operation** - Appended to `02-subsystem-catalog.md`, not separate file
3. **Confidence marking** - High/Medium/Low with reasoning
4. **Evidence-based claims** - Components you actually read
5. **Bidirectional dependencies** - If A→B, then B must show A as inbound
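You can pre-check the bidirectional rule before submitting. A sketch, assuming you build the `deps` mapping by hand from the catalog entries:

```python
# Pre-check the bidirectional dependency rule: if A lists B as outbound,
# B must list A as inbound. `deps` maps subsystem name ->
# {"inbound": [...], "outbound": [...]}.
def check_bidirectional(deps: dict[str, dict[str, list[str]]]) -> list[str]:
    problems = []
    for name, links in deps.items():
        for target in links.get("outbound", []):
            target_inbound = deps.get(target, {}).get("inbound", [])
            if name not in target_inbound:
                problems.append(
                    f"{name} -> {target}, but {target} does not list {name} as inbound"
                )
    return problems
```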
**If validation returns NEEDS_REVISION:**
- Read the validation report
- Fix specific issues identified
- Re-submit following contract
## Success Criteria
**You succeeded when:**
- Entry appended to `02-subsystem-catalog.md` in exact contract format
- All sections included (none skipped, none added)
- Confidence level marked with reasoning
- Claims supported by files you read
- Validation returns APPROVED
**You failed when:**
- Added "helpful" extra sections
- Wrote to separate file
- Changed contract format
- Skipped sections
- No confidence marking
- Validation returns BLOCK status
## Anti-Patterns
❌ **Add extra sections**
"I'll add Recommendations section" → Violates contract
❌ **Write to new file**
"I'll create subsystem-X-analysis.md" → Should append to `02-subsystem-catalog.md`
❌ **Skip required sections**
"No concerns, so I'll omit that section" → Include section with "None observed"
❌ **Change format**
"I'll use numbered lists instead of bullet points" → Follow contract exactly
❌ **Work without reading task spec**
"I know what to do" → Read `temp/task-*.md` first
## Integration with Workflow
This skill is typically invoked as:
1. **Coordinator** creates workspace and holistic assessment
2. **Coordinator** writes task specification in `temp/task-[yourname].md`
3. **YOU** read task spec + `01-discovery-findings.md`
4. **YOU** analyze assigned subsystem systematically
5. **YOU** append entry to `02-subsystem-catalog.md` following contract
6. **Validator** checks your output against contract
7. **Coordinator** proceeds to next phase if validation passes
**Your role:** Analyze systematically, follow contract exactly, mark confidence explicitly.