Initial commit
skills/using-system-archaeologist/analyzing-unknown-codebases.md
# Analyzing Unknown Codebases

## Purpose

Systematically analyze unfamiliar code to identify subsystems, components, dependencies, and architectural patterns. Produce catalog entries that follow EXACT output contracts.

## When to Use

- Coordinator delegates subsystem analysis task
- Task specifies reading from workspace and appending to `02-subsystem-catalog.md`
- You need to analyze code you haven't seen before
- Output must integrate with downstream tooling (validation, diagram generation)

## Critical Principle: Contract Compliance

**Your analysis quality doesn't matter if you violate the output contract.**

**Common rationalization:** "I'll add helpful extra sections to improve clarity"

**Reality:** Extra sections break downstream tools. The coordinator expects EXACT format for parsing and validation. Your job is to follow the specification, not improve it.

## Output Contract (MANDATORY)

When writing to `02-subsystem-catalog.md`, append EXACTLY this format:

```markdown
## [Subsystem Name]

**Location:** `path/to/subsystem/`

**Responsibility:** [One sentence describing what this subsystem does]

**Key Components:**
- `file1.ext` - [Brief description]
- `file2.ext` - [Brief description]
- `file3.ext` - [Brief description]

**Dependencies:**
- Inbound: [Subsystems that depend on this one]
- Outbound: [Subsystems this one depends on]

**Patterns Observed:**
- [Pattern 1]
- [Pattern 2]

**Concerns:**
- [Any issues, gaps, or technical debt observed]

**Confidence:** [High/Medium/Low] - [Brief reasoning]

```

**If no concerns exist, write:**
```markdown
**Concerns:**
- None observed
```

**CRITICAL COMPLIANCE RULES:**
- ❌ Do not add extra sections ("Integration Points", "Recommendations", "Files", etc.)
- ❌ Do not change section names or reorder them
- ❌ Do not write to a separate file (you must append to `02-subsystem-catalog.md`)
- ❌ Do not skip sections (include ALL sections; use "None observed" if empty)
- ✅ Copy the template structure EXACTLY
- ✅ Keep section order: Location → Responsibility → Key Components → Dependencies → Patterns Observed → Concerns → Confidence

**The contract is a specification, not a minimum. Extra sections break downstream validation.**

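The rules above can be checked mechanically before you write anything. The following is a minimal sketch, not the coordinator's actual validator: `check_entry`, `REQUIRED_LABELS`, and `LABEL_RE` are hypothetical names, and the sketch assumes every section label appears in bold at the start of a line, as in the template.

```python
import re

# Section labels in the order the contract requires.
REQUIRED_LABELS = [
    "Location",
    "Responsibility",
    "Key Components",
    "Dependencies",
    "Patterns Observed",
    "Concerns",
    "Confidence",
]

# Matches a bold label at the start of a line, e.g. "**Location:** ...".
LABEL_RE = re.compile(r"^\*\*([^*:]+):\*\*", re.MULTILINE)


def check_entry(entry: str) -> list[str]:
    """Return a list of contract problems; an empty list means none were found."""
    problems = []
    lines = entry.strip().splitlines()
    if not lines or not lines[0].startswith("## "):
        problems.append("entry must start with an H2 heading")
    found = LABEL_RE.findall(entry)
    for label in REQUIRED_LABELS:
        if label not in found:
            problems.append(f"missing section: {label}")
    for label in found:
        if label not in REQUIRED_LABELS:
            problems.append(f"extra section: {label}")
    # Only report ordering once the set of sections is otherwise correct.
    if not problems and found != REQUIRED_LABELS:
        problems.append("sections out of order or duplicated")
    return problems
```

This only inspects the heading and the section labels. It does not judge the content of each section, so it complements a careful read of your entry rather than replacing one.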
### Example: Complete Compliant Entry

Here's what a correctly formatted entry looks like:

```markdown
## Authentication Service

**Location:** `/src/services/auth/`

**Responsibility:** Handles user authentication, session management, and JWT token generation for API access.

**Key Components:**
- `auth_handler.py` - Main authentication logic with login/logout endpoints (342 lines)
- `token_manager.py` - JWT token generation and validation (156 lines)
- `session_store.py` - Redis-backed session storage (98 lines)

**Dependencies:**
- Inbound: API Gateway, User Service
- Outbound: Database Layer, Cache Service, Logging Service

**Patterns Observed:**
- Dependency injection for testability (all external services injected)
- Token refresh pattern with sliding expiration
- Audit logging for all authentication events

**Concerns:**
- None observed

**Confidence:** High - Clear entry points, documented API, test coverage validates behavior

```

**This is EXACTLY what your output should look like.** No more, no less.

## Systematic Analysis Approach

### Step 1: Read Task Specification

Your task file (`temp/task-[name].md`) specifies:
- What to analyze (scope: directories, plugins, services)
- Where to read context (`01-discovery-findings.md`)
- Where to write output (`02-subsystem-catalog.md` - append)
- Expected format (the contract above)

**Read these files FIRST before analyzing code.**

### Step 2: Layered Exploration

Use this proven approach from baseline testing:

1. **Metadata layer** - Read plugin.json, package.json, setup.py
2. **Structure layer** - Examine directory organization
3. **Router layer** - Find and read router/index files (often named "using-X")
4. **Sampling layer** - Read 3-5 representative files
5. **Quantitative layer** - Use line counts as depth indicators

**Why this order works:**
- Metadata gives overview without code diving
- Structure reveals organization philosophy
- Routers often catalog all components
- Sampling verifies patterns
- Quantitative data supports claims

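For the quantitative layer, line counts can be gathered in one pass instead of opening each file by hand. This is a minimal sketch under stated assumptions: `line_counts` is a hypothetical helper, and the default suffixes (`.md`, `.py`) are illustrative and should be adjusted to the codebase you are analyzing.

```python
from pathlib import Path


def line_counts(root: str, suffixes: tuple[str, ...] = (".md", ".py")) -> dict[str, int]:
    """Map each matching file under root (relative POSIX path) to its line count."""
    base = Path(root)
    counts = {}
    for path in sorted(base.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            # errors="replace" keeps the scan going on files with odd encodings.
            text = path.read_text(encoding="utf-8", errors="replace")
            counts[path.relative_to(base).as_posix()] = len(text.splitlines())
    return counts
```

The resulting numbers are the kind of evidence that can go in parentheses after each Key Components entry, for example `(342 lines)`.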
### Step 3: Mark Confidence Explicitly

**Every output MUST include confidence level with reasoning.**

**High confidence** - Router skill provided catalog + verified with sampling
```markdown
**Confidence:** High - Router skill listed all 10 components, sampling 4 confirmed patterns
```

**Medium confidence** - No router, but clear structure + sampling
```markdown
**Confidence:** Medium - No router catalog, inferred from directory structure + 5 file samples
```

**Low confidence** - Incomplete, placeholders, or unclear organization
```markdown
**Confidence:** Low - Several SKILL.md files missing, test artifacts suggest work-in-progress
```

### Step 4: Distinguish States Clearly

When analyzing codebases with mixed completion:

**Complete** - Skill file exists, has content, passes basic read test
```markdown
- `skill-name/SKILL.md` - Complete skill (1,234 lines)
```

**Placeholder** - Skill file exists but is stub/template
```markdown
- `skill-name/SKILL.md` - Placeholder (12 lines, template only)
```

**Planned** - Referenced in router but no file exists
```markdown
- `skill-name` - Planned (referenced in router, not implemented)
```

**TDD artifacts** - Test scenarios, baseline results (these ARE documentation)
```markdown
- `test-scenarios.md` - TDD test scenarios (RED phase)
- `baseline-results.md` - Baseline behavior documentation
```

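A first-pass triage of these states can be scripted. The sketch below is an illustration only: `classify_skill` and the 20-line cutoff `PLACEHOLDER_MAX_LINES` are assumptions, not part of the contract, and the function assumes the skill name was already found in a router. A short file is not proof of a stub, so read anything near the cutoff before labeling it.

```python
from pathlib import Path

# Hypothetical cutoff: files this short are treated as stubs. Tune per codebase.
PLACEHOLDER_MAX_LINES = 20


def classify_skill(skill_dir: str) -> str:
    """Label a router-referenced skill directory as Planned, Placeholder, or Complete."""
    skill_file = Path(skill_dir) / "SKILL.md"
    if not skill_file.is_file():
        # Referenced by the router, but nothing on disk yet.
        return "Planned"
    lines = skill_file.read_text(encoding="utf-8", errors="replace").splitlines()
    if len(lines) <= PLACEHOLDER_MAX_LINES:
        return "Placeholder"
    return "Complete"
```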
### Step 5: Write Output (Contract Compliance)

**Before writing:**
1. Prepare your entry in EXACT contract format from the template above
2. Copy the structure - don't paraphrase or reorganize
3. Triple-check you have ALL sections in correct order

**When writing:**
1. **Target file:** `02-subsystem-catalog.md` in workspace directory
2. **Operation:** Append your entry (create file if first entry, append if file exists)
3. **Method:**
   - If file exists: Read current content, then Write with original + your entry
   - If file doesn't exist: Write your entry directly
4. **Format:** Follow contract sections in exact order
5. **Completeness:** Include ALL sections - use "None observed" for empty Concerns

**DO NOT create separate files** (e.g., `subsystem-X-analysis.md`). The coordinator expects all entries in `02-subsystem-catalog.md`.

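The read-then-write method described above can be sketched in a few lines. This is an illustration under assumptions, not the required tooling: `append_entry` is a hypothetical helper, and separating entries with one blank line is a choice made here rather than something the contract states.

```python
from pathlib import Path


def append_entry(workspace: str, entry: str) -> Path:
    """Append one catalog entry, creating the catalog file if it is missing."""
    catalog = Path(workspace) / "02-subsystem-catalog.md"
    entry = entry.strip() + "\n"
    # Read what is already there so earlier entries are never overwritten.
    existing = ""
    if catalog.exists():
        existing = catalog.read_text(encoding="utf-8").rstrip("\n")
    separator = "\n\n" if existing else ""
    catalog.write_text(existing + separator + entry, encoding="utf-8")
    return catalog
```

Returning the path makes it easy to re-read the file immediately afterwards and confirm the entry landed where expected.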
**After writing:**
1. Re-read `02-subsystem-catalog.md` to verify your entry was added correctly
2. Validate format matches contract exactly using this checklist:

**Self-Validation Checklist:**
```
[ ] Section 1: Subsystem name as H2 heading (## Name)
[ ] Section 2: Location with backticks and absolute path
[ ] Section 3: Responsibility as single sentence
[ ] Section 4: Key Components as bulleted list with descriptions
[ ] Section 5: Dependencies with "Inbound:" and "Outbound:" labels
[ ] Section 6: Patterns Observed as bulleted list
[ ] Section 7: Concerns present (with issues OR "None observed")
[ ] Section 8: Confidence level (High/Medium/Low) with reasoning
[ ] Separator: "---" line after confidence
[ ] NO extra sections added
[ ] Sections in correct order
[ ] Entry in file: 02-subsystem-catalog.md (not separate file)
```

## Handling Uncertainty

**When architecture is unclear:**

1. **State what you observe** - Don't guess at intent
   ```markdown
   **Patterns Observed:**
   - 3 files with similar structure (analysis.py, parsing.py, validation.py)
   - Unclear if this is deliberate pattern or coincidence
   ```

2. **Mark confidence appropriately** - Low confidence is valid
   ```markdown
   **Confidence:** Low - Directory structure suggests microservices, but no service definitions found
   ```

3. **Use "Concerns" section** - Document gaps
   ```markdown
   **Concerns:**
   - No clear entry point identified
   - Dependencies inferred from imports, not explicit manifest
   ```

**DO NOT:**
- Invent relationships you didn't verify
- Assume "obvious" architecture without evidence
- Skip confidence marking because you're uncertain

## Positive Behaviors to Maintain

From baseline testing, these approaches WORK:

- ✅ **Read actual files** - Don't infer from names alone
- ✅ **Use router skills** - They often provide complete catalogs
- ✅ **Sample strategically** - 3-5 files verifies patterns without exhaustive reading
- ✅ **Cross-reference** - Verify claims (imports match listed dependencies)
- ✅ **Document assumptions** - Make reasoning explicit
- ✅ **Line counts indicate depth** - 1,500-line skill vs 50-line stub matters

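Cross-referencing imports against listed dependencies can be partly automated when the subsystem is written in Python. The sketch below assumes Python sources and uses the standard-library `ast` module; `imported_modules` is a hypothetical helper, and mapping module names to subsystem names is still left to you.

```python
import ast


def imported_modules(source: str) -> set[str]:
    """Return the top-level module names imported by Python source code."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative imports stay
            # inside the same package and are skipped here.
            modules.add(node.module.split(".")[0])
    return modules
```

If a module shows up here but no matching subsystem appears under Outbound, either the dependency list is incomplete or the import is unused. Both are worth noting under Concerns.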
## Common Rationalizations (STOP SIGNALS)

If you catch yourself thinking these, STOP:

| Rationalization | Reality |
|-----------------|---------|
| "I'll add Integration Points section for clarity" | Extra sections break downstream parsing |
| "I'll write to separate file for organization" | Coordinator expects append to specified file |
| "I'll improve the contract format" | Contract is specification from coordinator |
| "More information is always helpful" | Your job: follow spec. Coordinator's job: decide what's included |
| "This comprehensive format is better" | "Better" violates contract. Compliance is mandatory. |

## Validation Criteria

Your output will be validated against:

1. **Contract compliance** - All sections present, no extras
2. **File operation** - Appended to `02-subsystem-catalog.md`, not separate file
3. **Confidence marking** - High/Medium/Low with reasoning
4. **Evidence-based claims** - Components you actually read
5. **Bidirectional dependencies** - If A→B, then B must show A as inbound

**If validation returns NEEDS_REVISION:**
- Read the validation report
- Fix specific issues identified
- Re-submit following contract

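The bidirectional dependency criterion is easy to get wrong when entries are written at different times. Here is a minimal sketch of the check, assuming the catalog has already been parsed into a dictionary keyed by subsystem name with `"inbound"` and `"outbound"` sets; that data shape and the name `missing_inbound` are assumptions for illustration, not the validator's real interface.

```python
def missing_inbound(catalog: dict[str, dict[str, set[str]]]) -> list[tuple[str, str]]:
    """Return (source, target) pairs where target does not list source as inbound."""
    gaps = []
    for source, deps in sorted(catalog.items()):
        for target in sorted(deps.get("outbound", set())):
            target_deps = catalog.get(target)
            # Targets without a catalog entry yet cannot be checked; skip them.
            if target_deps is not None and source not in target_deps.get("inbound", set()):
                gaps.append((source, target))
    return gaps
```

An empty result means every outbound dependency that points at a cataloged subsystem is mirrored as inbound on the other side.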
## Success Criteria

**You succeeded when:**
- Entry appended to `02-subsystem-catalog.md` in exact contract format
- All sections included (none skipped, none added)
- Confidence level marked with reasoning
- Claims supported by files you read
- Validation returns APPROVED

**You failed when:**
- Added "helpful" extra sections
- Wrote to separate file
- Changed contract format
- Skipped sections
- Omitted confidence marking
- Validation returns BLOCK status

## Anti-Patterns

❌ **Add extra sections**
"I'll add Recommendations section" → Violates contract

❌ **Write to new file**
"I'll create subsystem-X-analysis.md" → Should append to `02-subsystem-catalog.md`

❌ **Skip required sections**
"No concerns, so I'll omit that section" → Include section with "None observed"

❌ **Change format**
"I'll use numbered lists instead of bullet points" → Follow contract exactly

❌ **Work without reading task spec**
"I know what to do" → Read `temp/task-*.md` first

## Integration with Workflow

This skill is typically invoked as:

1. **Coordinator** creates workspace and holistic assessment
2. **Coordinator** writes task specification in `temp/task-[yourname].md`
3. **YOU** read task spec + `01-discovery-findings.md`
4. **YOU** analyze assigned subsystem systematically
5. **YOU** append entry to `02-subsystem-catalog.md` following contract
6. **Validator** checks your output against contract
7. **Coordinator** proceeds to next phase if validation passes

**Your role:** Analyze systematically, follow contract exactly, mark confidence explicitly.