# Analyzing Unknown Codebases

## Purpose

Systematically analyze unfamiliar code to identify subsystems, components, dependencies, and architectural patterns. Produce catalog entries that follow EXACT output contracts.
## When to Use

- Coordinator delegates subsystem analysis task
- Task specifies reading from workspace and appending to `02-subsystem-catalog.md`
- You need to analyze code you haven't seen before
- Output must integrate with downstream tooling (validation, diagram generation)
## Critical Principle: Contract Compliance

**Your analysis quality doesn't matter if you violate the output contract.**

**Common rationalization:** "I'll add helpful extra sections to improve clarity"

**Reality:** Extra sections break downstream tools. The coordinator expects EXACT format for parsing and validation. Your job is to follow the specification, not improve it.
## Output Contract (MANDATORY)

When writing to `02-subsystem-catalog.md`, append EXACTLY this format:

```markdown
## [Subsystem Name]

**Location:** `path/to/subsystem/`

**Responsibility:** [One sentence describing what this subsystem does]

**Key Components:**
- `file1.ext` - [Brief description]
- `file2.ext` - [Brief description]
- `file3.ext` - [Brief description]

**Dependencies:**
- Inbound: [Subsystems that depend on this one]
- Outbound: [Subsystems this one depends on]

**Patterns Observed:**
- [Pattern 1]
- [Pattern 2]

**Concerns:**
- [Any issues, gaps, or technical debt observed]

**Confidence:** [High/Medium/Low] - [Brief reasoning]

---
```
**If no concerns exist, write:**

```markdown
**Concerns:**
- None observed
```

**CRITICAL COMPLIANCE RULES:**

- ❌ Do NOT add extra sections ("Integration Points", "Recommendations", "Files", etc.)
- ❌ Do NOT change section names or reorder them
- ❌ Do NOT write to a separate file (must append to `02-subsystem-catalog.md`)
- ❌ Do NOT skip sections (include ALL sections - use "None observed" if empty)
- ✅ Copy the template structure EXACTLY
- ✅ Keep section order: Location → Responsibility → Key Components → Dependencies → Patterns → Concerns → Confidence

**Contract is specification, not minimum. Extra sections break downstream validation.**
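To make the stakes concrete, here is a minimal sketch (hypothetical; it is NOT the coordinator's actual tooling) of how a downstream validator might parse an entry. Names like `check_entry` and `REQUIRED_HEADERS` are illustrative; the point is that parsing keys on the exact bold section headers in the exact order, so an extra or reordered section is a hard failure, not a bonus.

```python
import re

# Headers the contract requires, in the exact order the entry must use.
REQUIRED_HEADERS = [
    "**Location:**",
    "**Responsibility:**",
    "**Key Components:**",
    "**Dependencies:**",
    "**Patterns Observed:**",
    "**Concerns:**",
    "**Confidence:**",
]

def check_entry(entry_text: str) -> list[str]:
    """Return a list of contract violations for one '## Subsystem' entry."""
    problems = []
    # Collect every bold '**Something:**' header that starts a line.
    found = re.findall(r"^(\*\*[^*]+:\*\*)", entry_text, flags=re.MULTILINE)
    # Unknown headers (e.g. '**Recommendations:**') are violations, not bonuses.
    for header in found:
        if header not in REQUIRED_HEADERS:
            problems.append(f"unexpected section: {header}")
    # Missing or out-of-order required headers are also violations.
    if [h for h in found if h in REQUIRED_HEADERS] != REQUIRED_HEADERS:
        problems.append("required sections missing or out of order")
    return problems
```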
### Example: Complete Compliant Entry

Here's what a correctly formatted entry looks like:

```markdown
## Authentication Service

**Location:** `/src/services/auth/`

**Responsibility:** Handles user authentication, session management, and JWT token generation for API access.

**Key Components:**
- `auth_handler.py` - Main authentication logic with login/logout endpoints (342 lines)
- `token_manager.py` - JWT token generation and validation (156 lines)
- `session_store.py` - Redis-backed session storage (98 lines)

**Dependencies:**
- Inbound: API Gateway, User Service
- Outbound: Database Layer, Cache Service, Logging Service

**Patterns Observed:**
- Dependency injection for testability (all external services injected)
- Token refresh pattern with sliding expiration
- Audit logging for all authentication events

**Concerns:**
- None observed

**Confidence:** High - Clear entry points, documented API, test coverage validates behavior

---
```

**This is EXACTLY what your output should look like.** No more, no less.
## Systematic Analysis Approach

### Step 1: Read Task Specification

Your task file (`temp/task-[name].md`) specifies:

- What to analyze (scope: directories, plugins, services)
- Where to read context (`01-discovery-findings.md`)
- Where to write output (`02-subsystem-catalog.md` - append)
- Expected format (the contract above)

**Read these files FIRST before analyzing code.**
### Step 2: Layered Exploration

Use this proven approach from baseline testing:

1. **Metadata layer** - Read plugin.json, package.json, setup.py
2. **Structure layer** - Examine directory organization
3. **Router layer** - Find and read router/index files (often named "using-X")
4. **Sampling layer** - Read 3-5 representative files
5. **Quantitative layer** - Use line counts as depth indicators

**Why this order works:**

- Metadata gives overview without code diving
- Structure reveals organization philosophy
- Routers often catalog all components
- Sampling verifies patterns
- Quantitative data supports claims
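For illustration, here is a minimal sketch of the metadata and quantitative layers (the `survey` helper, the extra `pyproject.toml` manifest, and the sample path are assumptions; adapt them to the codebase at hand):

```python
from pathlib import Path

# Manifests worth reading first; extend for the ecosystem you are analyzing.
METADATA_FILES = {"plugin.json", "package.json", "setup.py", "pyproject.toml"}

def survey(root: str) -> None:
    """Print manifests and per-file line counts for a quick depth overview."""
    root_path = Path(root)

    # Metadata layer: surface the manifests before diving into code.
    for manifest in sorted(p for p in root_path.rglob("*") if p.name in METADATA_FILES):
        print(f"[metadata] {manifest.relative_to(root_path)}")

    # Quantitative layer: line counts hint at depth (1,500-line skill vs 50-line stub).
    for source in sorted(root_path.rglob("*.py")):
        lines = len(source.read_text(encoding="utf-8", errors="ignore").splitlines())
        print(f"[{lines:>5} lines] {source.relative_to(root_path)}")

survey("path/to/subsystem")  # hypothetical path, as in the template above
```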
### Step 3: Mark Confidence Explicitly

**Every output MUST include confidence level with reasoning.**

**High confidence** - Router skill provided catalog + verified with sampling

```markdown
**Confidence:** High - Router skill listed all 10 components, sampling 4 confirmed patterns
```

**Medium confidence** - No router, but clear structure + sampling

```markdown
**Confidence:** Medium - No router catalog, inferred from directory structure + 5 file samples
```

**Low confidence** - Incomplete, placeholders, or unclear organization

```markdown
**Confidence:** Low - Several SKILL.md files missing, test artifacts suggest work-in-progress
```
### Step 4: Distinguish States Clearly

When analyzing codebases with mixed completion:

**Complete** - Skill file exists, has content, passes basic read test

```markdown
- `skill-name/SKILL.md` - Complete skill (1,234 lines)
```

**Placeholder** - Skill file exists but is stub/template

```markdown
- `skill-name/SKILL.md` - Placeholder (12 lines, template only)
```

**Planned** - Referenced in router but no file exists

```markdown
- `skill-name` - Planned (referenced in router, not implemented)
```

**TDD artifacts** - Test scenarios, baseline results (these ARE documentation)

```markdown
- `test-scenarios.md` - TDD test scenarios (RED phase)
- `baseline-results.md` - Baseline behavior documentation
```
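The same distinction can be made mechanical. A rough sketch follows; the `classify_skill` helper and the 30-line threshold are illustrative assumptions, not part of the contract:

```python
from pathlib import Path

# Illustrative threshold: anything shorter is treated as a stub/template.
PLACEHOLDER_MAX_LINES = 30

def classify_skill(skill_dir: Path, referenced_in_router: bool) -> str:
    """Classify one skill directory as Complete, Placeholder, or Planned."""
    skill_file = skill_dir / "SKILL.md"
    if not skill_file.exists():
        # Mentioned by the router but never implemented.
        return "Planned (referenced in router, not implemented)" if referenced_in_router else "Not present"
    line_count = len(skill_file.read_text(encoding="utf-8", errors="ignore").splitlines())
    if line_count <= PLACEHOLDER_MAX_LINES:
        return f"Placeholder ({line_count} lines, template only)"
    return f"Complete ({line_count:,} lines)"
```

Whatever heuristic you use, record it in the entry (e.g. "12 lines, template only") so the reader can see why a component was classified that way.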
### Step 5: Write Output (Contract Compliance)

**Before writing:**

1. Prepare your entry in the EXACT contract format from the template above
2. Copy the structure - don't paraphrase or reorganize
3. Triple-check you have ALL sections in the correct order

**When writing:**

1. **Target file:** `02-subsystem-catalog.md` in the workspace directory
2. **Operation:** Append your entry (create the file if this is the first entry, append if it already exists)
3. **Method** (sketched below):
   - If the file exists: read its current content, then write the original content followed by your entry
   - If the file doesn't exist: write your entry directly
4. **Format:** Follow the contract sections in exact order
5. **Completeness:** Include ALL sections - use "None observed" for an empty Concerns section

**DO NOT create separate files** (e.g., `subsystem-X-analysis.md`). The coordinator expects all entries in `02-subsystem-catalog.md`.
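A minimal sketch of the append step, assuming plain file I/O and a catalog path relative to the current directory (the `append_entry` helper and `entry` string are placeholders):

```python
from pathlib import Path

CATALOG = Path("02-subsystem-catalog.md")  # adjust to the workspace directory

def append_entry(entry: str) -> None:
    """Append one contract-formatted entry, creating the catalog if needed."""
    existing = CATALOG.read_text(encoding="utf-8") if CATALOG.exists() else ""
    # Keep prior entries intact and make sure ours starts on a fresh line.
    separator = "" if not existing or existing.endswith("\n") else "\n"
    CATALOG.write_text(existing + separator + entry, encoding="utf-8")
```

Whatever tooling you use, the invariant is the same: one file, prior entries preserved, your entry appended at the end.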
**After writing:**

1. Re-read `02-subsystem-catalog.md` to verify your entry was added correctly
2. Validate format matches contract exactly using this checklist:

**Self-Validation Checklist:**

```
[ ] Section 1: Subsystem name as H2 heading (## Name)
[ ] Section 2: Location with backticks and absolute path
[ ] Section 3: Responsibility as single sentence
[ ] Section 4: Key Components as bulleted list with descriptions
[ ] Section 5: Dependencies with "Inbound:" and "Outbound:" labels
[ ] Section 6: Patterns Observed as bulleted list
[ ] Section 7: Concerns present (with issues OR "None observed")
[ ] Section 8: Confidence level (High/Medium/Low) with reasoning
[ ] Separator: "---" line after confidence
[ ] NO extra sections added
[ ] Sections in correct order
[ ] Entry in file: 02-subsystem-catalog.md (not separate file)
```
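The re-read step can be equally lightweight. A sketch, assuming the catalog sits in the current directory and using the example subsystem name from above; deeper format checks follow the checklist or a parser like the earlier sketch:

```python
from pathlib import Path

def verify_appended(subsystem_name: str) -> bool:
    """Confirm the entry landed in the shared catalog, not a separate file."""
    catalog = Path("02-subsystem-catalog.md")
    if not catalog.exists():
        return False
    text = catalog.read_text(encoding="utf-8")
    # Exactly one entry for this subsystem, and the file contains a Confidence line.
    return text.count(f"## {subsystem_name}") == 1 and "**Confidence:**" in text

print(verify_appended("Authentication Service"))  # example entry from above
```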
## Handling Uncertainty

**When architecture is unclear:**

1. **State what you observe** - Don't guess at intent

   ```markdown
   **Patterns Observed:**
   - 3 files with similar structure (analysis.py, parsing.py, validation.py)
   - Unclear if this is a deliberate pattern or coincidence
   ```

2. **Mark confidence appropriately** - Low confidence is valid

   ```markdown
   **Confidence:** Low - Directory structure suggests microservices, but no service definitions found
   ```

3. **Use the "Concerns" section** - Document gaps

   ```markdown
   **Concerns:**
   - No clear entry point identified
   - Dependencies inferred from imports, not explicit manifest
   ```

**DO NOT:**

- Invent relationships you didn't verify
- Assume "obvious" architecture without evidence
- Skip confidence marking because you're uncertain
## Positive Behaviors to Maintain

From baseline testing, these approaches WORK:

✅ **Read actual files** - Don't infer from names alone

✅ **Use router skills** - They often provide complete catalogs

✅ **Sample strategically** - Reading 3-5 representative files verifies patterns without exhaustive coverage

✅ **Cross-reference** - Verify claims (imports match listed dependencies)

✅ **Document assumptions** - Make reasoning explicit

✅ **Line counts indicate depth** - A 1,500-line skill vs a 50-line stub matters
## Common Rationalizations (STOP SIGNALS)

If you catch yourself thinking these, STOP:

| Rationalization | Reality |
|-----------------|---------|
| "I'll add Integration Points section for clarity" | Extra sections break downstream parsing |
| "I'll write to separate file for organization" | Coordinator expects append to specified file |
| "I'll improve the contract format" | Contract is specification from coordinator |
| "More information is always helpful" | Your job: follow spec. Coordinator's job: decide what's included |
| "This comprehensive format is better" | "Better" violates contract. Compliance is mandatory. |
## Validation Criteria

Your output will be validated against:

1. **Contract compliance** - All sections present, no extras
2. **File operation** - Appended to `02-subsystem-catalog.md`, not separate file
3. **Confidence marking** - High/Medium/Low with reasoning
4. **Evidence-based claims** - Components you actually read
5. **Bidirectional dependencies** - If A→B, then B must show A as inbound (see the sketch below)
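Criterion 5 can be spot-checked mechanically. A rough sketch, assuming entries have already been parsed into a mapping from subsystem name to inbound/outbound sets (the parsing itself is omitted, and the sample data is hypothetical):

```python
def check_bidirectional(catalog: dict[str, dict[str, set[str]]]) -> list[str]:
    """Report A->B edges where B does not list A as inbound."""
    problems = []
    for name, deps in catalog.items():
        for target in deps["outbound"]:
            target_inbound = catalog.get(target, {}).get("inbound", set())
            if name not in target_inbound:
                problems.append(f"{name} -> {target}: {target} missing inbound {name}")
    return problems

# Hypothetical data loosely mirroring the Authentication Service example:
catalog = {
    "Authentication Service": {
        "inbound": {"API Gateway", "User Service"},
        "outbound": {"Database Layer"},
    },
    "Database Layer": {"inbound": set(), "outbound": set()},
}
print(check_bidirectional(catalog))  # flags the missing inbound edge
```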
**If validation returns NEEDS_REVISION:**

- Read the validation report
- Fix specific issues identified
- Re-submit following contract
## Success Criteria

**You succeeded when:**

- Entry appended to `02-subsystem-catalog.md` in exact contract format
- All sections included (none skipped, none added)
- Confidence level marked with reasoning
- Claims supported by files you read
- Validation returns APPROVED

**You failed when:**

- Added "helpful" extra sections
- Wrote to separate file
- Changed contract format
- Skipped sections
- No confidence marking
- Validation returns BLOCK status
## Anti-Patterns

❌ **Add extra sections**

"I'll add a Recommendations section" → Violates contract

❌ **Write to new file**

"I'll create subsystem-X-analysis.md" → Should append to `02-subsystem-catalog.md`

❌ **Skip required sections**

"No concerns, so I'll omit that section" → Include the section with "None observed"

❌ **Change format**

"I'll use numbered lists instead of bullet points" → Follow the contract exactly

❌ **Work without reading task spec**

"I know what to do" → Read `temp/task-*.md` first
## Integration with Workflow

This skill is typically invoked as follows:

1. **Coordinator** creates workspace and holistic assessment
2. **Coordinator** writes task specification in `temp/task-[yourname].md`
3. **YOU** read task spec + `01-discovery-findings.md`
4. **YOU** analyze assigned subsystem systematically
5. **YOU** append entry to `02-subsystem-catalog.md` following contract
6. **Validator** checks your output against contract
7. **Coordinator** proceeds to next phase if validation passes

**Your role:** Analyze systematically, follow contract exactly, mark confidence explicitly.